Proceedings Volume 9416

Medical Imaging 2015: Image Perception, Observer Performance, and Technology Assessment

Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 27 April 2015
Contents: 10 Sessions, 52 Papers, 0 Presentations
Conference: SPIE Medical Imaging 2015
Volume Number: 9416

Table of Contents

  • Front Matter: Volume 9416
  • Keynote and Breast I
  • Breast II
  • Observer Performance Evaluation
  • CT
  • Model Observers I
  • Visual Search
  • Model Observers II
  • Technology Assessment
  • Poster Session
Front Matter: Volume 9416
Front Matter: Volume 9416
This PDF file contains the front matter associated with SPIE Proceedings Volume 9416, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
Keynote and Breast I
Incorporating breast tomosynthesis into radiology residency: Does trainee experience in breast imaging translate into improved performance with this new modality?
Lars J. Grimm, Jing Zhang, Karen S. Johnson, et al.
Digital breast tomosynthesis (DBT) is a powerful new imaging modality that has the potential to transform breast cancer screening practices. The advantages over mammography include improved sensitivity and specificity as well as the detection of additional invasive cancers. While this modality holds many advantages, the best means of incorporating DBT into radiology training programs is currently not well understood. The initial performance of a trainee in DBT might depend on the amount of previous radiology training, in particular breast imaging experience. In our study, we tested the DBT interpretive skills of radiology trainees with different levels of breast imaging training, but with no prior DBT experience. We recruited 16 radiology trainees to review 60 DBT studies. A fellowship-trained expert breast radiologist reviewed the studies and provided the gold standard interpretations. Receiver operating characteristic analysis was used to evaluate the performance of the trainees. Our results show that there is no notable difference in trainee performance in DBT, regardless of the number of years of general radiology experience. These results provide guidance to breast imaging educators as they prepare new curricula to teach DBT. These curricula should provide a base training which will likely be suitable for all trainee levels.
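The abstract reports ROC analysis without stating how the curves were fitted. As a minimal sketch of one common nonparametric option, assuming each trainee gives an ordinal confidence rating per study, the empirical area under the ROC curve equals the Wilcoxon-Mann-Whitney statistic. The ratings below are hypothetical; binormal curve fitting and multireader significance testing are not shown.

```python
from itertools import product


def empirical_auc(abnormal_ratings, normal_ratings):
    """Wilcoxon-Mann-Whitney estimate of the area under the empirical ROC curve.

    Every (abnormal, normal) pair contributes 1 if the abnormal case was rated
    higher, 0.5 for a tie, and 0 otherwise; the result is the mean over pairs.
    """
    if not abnormal_ratings or not normal_ratings:
        raise ValueError("need at least one abnormal and one normal case")
    score = 0.0
    for a, n in product(abnormal_ratings, normal_ratings):
        if a > n:
            score += 1.0
        elif a == n:
            score += 0.5
    return score / (len(abnormal_ratings) * len(normal_ratings))


# Hypothetical 5-point confidence ratings from one trainee.
abnormal = [5, 4, 3]
normal = [1, 2, 3]
print(round(empirical_auc(abnormal, normal), 4))  # 0.9444
```

Counting ties as one half is what makes this estimate equal to the trapezoidal area under the empirical ROC curve.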
Detection of calcification clusters in digital breast tomosynthesis slices at different dose levels utilizing a SRSAR reconstruction and JAFROC
P. Timberg, M. Dustler, H. Petersson, et al.
Purpose: To investigate detection performance for calcification clusters in reconstructed digital breast tomosynthesis (DBT) slices at different dose levels using a Super Resolution and Statistical Artifact Reduction (SRSAR) reconstruction method. Method: Simulated calcifications with irregular profiles (0.2 mm diameter) were combined to form clusters that were added to projection images (1–3 clusters per abnormal image) acquired on a DBT system (Mammomat Inspiration, Siemens). The projection images were dose-reduced in software to form 35 abnormal cases and 25 normal cases as if acquired at 100%, 75%, and 50% dose level (AGD of approximately 1.6 mGy for a 53 mm standard breast, measured according to EUREF v0.15). Standard filtered back projection (FBP) and SRSAR reconstruction (using IRIS iterative reconstruction filters and outlier detection based on maximum-intensity and average-intensity projections) were used to reconstruct single central slices for a free-response task (60 images per observer and dose level). Six observers participated; their task was to detect the clusters and assign a confidence rating in randomly presented images from the whole image set (balanced by dose level). Trials were separated by one week to reduce possible memory bias. The outcome was analyzed for statistical differences using jackknife alternative free-response receiver operating characteristic (JAFROC) analysis. Results: The results indicate that it is possible to reduce the dose by 50% with SRSAR without jeopardizing cluster detection. Conclusions: The detection performance for clusters can be maintained at a lower dose level by using SRSAR reconstruction.
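JAFROC analysis is built on the AFROC figure of merit: the probability that a lesion (here, a cluster) receives a higher rating than the highest-rated false-positive mark on a normal image. A minimal, unweighted sketch follows, assuming only normal images contribute false positives and that unmarked lesions and mark-free normal images are assigned a rating of negative infinity. The reader data are hypothetical. The significance test itself, which recomputes the figure of merit with each case left out in turn and analyzes the resulting jackknife pseudovalues with ANOVA, is not shown.

```python
import math

NOT_MARKED = -math.inf  # rating for an unmarked lesion or a normal image without marks


def psi(fp_rating, lesion_rating):
    """Kernel: 1 if the lesion outranks the false positive, 0.5 on a tie, else 0."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0


def jafroc_fom(normal_images, abnormal_images):
    """Unweighted AFROC figure of merit of the kind analyzed by JAFROC.

    normal_images: one list of false-positive mark ratings per normal image.
    abnormal_images: one list of lesion ratings per abnormal image, with
    NOT_MARKED for every lesion (cluster) the reader missed.
    """
    highest_fp = [max(marks, default=NOT_MARKED) for marks in normal_images]
    lesions = [rating for image in abnormal_images for rating in image]
    total = sum(psi(x, y) for x in highest_fp for y in lesions)
    return total / (len(highest_fp) * len(lesions))


# Hypothetical reader data: three normal images and two abnormal images
# (the second abnormal image has one cluster that was not marked).
normals = [[], [2], [4]]
abnormals = [[5, 3], [NOT_MARKED]]
print(round(jafroc_fom(normals, abnormals), 4))  # 0.6111
```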
Breast II
Inter- and intra-observer variations in the delineation of lesions in mammograms
Thomas Buelow, Harald S. Heese, Ruediger Grewer, et al.
Many clinical and research tasks require the delineation of lesions in radiological images. There is a variety of methods available for deriving such delineations, ranging from freehand manual contouring and manual positioning of low-parameter graphical objects to (semi-)automatic computerized segmentation methods. In this paper we investigate the impact of the chosen segmentation method on the inter-observer variability of the resulting contour. Three different methods are compared, namely (1) manual positioning of an ellipse, (2) an automatic segmentation method, called live-segmentation, which takes the current mouse pointer position as input and is updated in real time as the user hovers with the mouse over the image, and (3) free-form segmentation, realized by allowing the user to pull the result of method (2) to image positions that the contour is required to pass through. Each of the three methods was used by three experienced radiologists to delineate a set of 215 round breast lesion images in digital mammograms. Agreement between contours was assessed by computing the Dice coefficient. The median Dice coefficient for the ellipses placed by different readers was 0.85. The intra-reader Dice coefficient comparing ellipses and live-segmentations was 0.84, showing that the live-segmentation results agree with ellipse segmentations to the same extent as readers agree with each other on ellipse placement. Inter-observer agreement when using the live-segmentation was higher than for the ellipses (median Dice = 0.91 vs. 0.85), showing that the live-segmentation is a more reproducible alternative to ellipse placement.
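The Dice coefficient used above is 2|A ∩ B| / (|A| + |B|) for two segmented regions A and B. A minimal sketch, assuming contours have been rasterized into sets of pixel coordinates; the `ellipse_mask` helper and the three reader ellipses are hypothetical and only illustrate how a median over reader pairs could be formed.

```python
import statistics
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of (x, y) pixels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def ellipse_mask(cx, cy, rx, ry, width, height):
    """Pixels whose centres fall inside an axis-aligned ellipse (hypothetical helper)."""
    return {
        (x, y)
        for y in range(height)
        for x in range(width)
        if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
    }


# Three hypothetical readers placing slightly different ellipses on one lesion.
readers = [
    ellipse_mask(20, 20, 10, 8, 40, 40),
    ellipse_mask(21, 20, 10, 8, 40, 40),
    ellipse_mask(20, 21, 9, 8, 40, 40),
]
pairwise = [dice(a, b) for a, b in combinations(readers, 2)]
print(round(statistics.median(pairwise), 3))
```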
Computational assessment of mammography accreditation phantom images and correlation with human observer analysis
Bruno Barufaldi, Kristen C. Lau, Homero Schiabel, et al.
Routine performance of basic test procedures and dose measurements is essential for assuring the high quality of mammograms. International guidelines recommend that breast care providers ensure that mammography systems consistently produce high-quality images using as low a radiation dose as is reasonably achievable. The main purpose of this research is to develop a framework to monitor radiation dose and image quality in a mixed breast screening and diagnostic imaging environment using an automated tracking system. This study presents a module of this framework, consisting of a computerized system to measure the image quality of the American College of Radiology mammography accreditation phantom. The methods developed combine correlation approaches, matched filters, and data mining techniques, and have been used to analyze radiological images of the accreditation phantom. The classification of structures of interest is based on reports produced by four trained readers. As previously reported, human observers show great variation in their analysis because of the subjectivity of human visual inspection. The software tool was trained with three sets of 60 phantom images to generate decision trees using the software WEKA (Waikato Environment for Knowledge Analysis). When tested with 240 images during the classification step, the tool correctly classified 88%, 99%, and 98% of fibers, speck groups, and masses, respectively. The variation between the computer classification and human reading was comparable to the variation between human readers. This computerized system not only automates the quality control procedure in mammography but also reduces the subjectivity of expert evaluation of the phantom images.
iDensity: an automatic Gabor filter-based algorithm for breast density assessment
Ziba Gamdonkar, Kevin Tay, Will Ryder, et al.
Although many semi-automated and automated algorithms for breast density assessment have recently been proposed, none of them has been widely accepted. In this study a novel automated algorithm, named iDensity, inspired by the human visual system, is proposed for classifying mammograms into four breast density categories corresponding to the Breast Imaging Reporting and Data System (BI-RADS). For each BI-RADS category, 80 cases were taken from the normal volumes of the Digital Database for Screening Mammography (DDSM). For each case, only the left mediolateral oblique (MLO) view was used. After image calibration using the tables provided for each DDSM scanner, the pectoral muscle and background were removed. Images were median-filtered and downsampled, then filtered by a filter bank consisting of Gabor filters at six orientations and three scales, as well as a Gaussian filter. Three gray-level histogram-based features and three second-order statistical features were extracted from each filtered image. Using the extracted features, mammograms were initially separated into two groups, low or high density; in a second stage, the low-density group was subdivided into BI-RADS I or II, and the high-density group into BI-RADS III or IV. The algorithm achieved a sensitivity of 95% and a specificity of 94% in the first stage, a sensitivity of 89% and a specificity of 95% when classifying BI-RADS I and II cases, and a sensitivity of 88% and a specificity of 91% when classifying BI-RADS III and IV cases.
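A filter bank of six orientations by three scales, as described above, can be sketched as follows. The abstract does not give the wavelengths, envelope widths, aspect ratio, or the exact histogram and second-order features, so the `SCALES` values, `gamma=0.5`, and the mean/standard-deviation features below are assumptions chosen only to show the structure. The separate Gaussian filter and the two-stage classifier are not shown.

```python
import math
import statistics

# Assumed parameters: the abstract specifies six orientations and three scales
# but not the wavelengths, envelope widths, or aspect ratio actually used.
ORIENTATIONS = [k * math.pi / 6 for k in range(6)]  # 0, 30, ..., 150 degrees
SCALES = [(4.0, 2.0), (8.0, 4.0), (16.0, 8.0)]      # (wavelength, sigma) in pixels


def gabor_kernel(wavelength, theta, sigma, gamma=0.5):
    """Real (even-symmetric) Gabor kernel, returned as a list of rows."""
    half = math.ceil(3 * sigma)  # cover about +/- 3 sigma of the envelope
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def gabor_bank():
    """Three scales x six orientations = 18 kernels."""
    return [gabor_kernel(lam, th, sig) for lam, sig in SCALES for th in ORIENTATIONS]


def filter_valid(image, kernel):
    """2D correlation without padding (the "valid" region only)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def response_features(response):
    """Two illustrative first-order features (not the paper's feature list)."""
    values = [v for row in response for v in row]
    return statistics.fmean(values), statistics.pstdev(values)


# Usage on a hypothetical striped test image with the smallest-scale kernel.
image = [[(x // 4) % 2 * 100.0 for x in range(24)] for _ in range(24)]
print(response_features(filter_valid(image, gabor_kernel(8.0, 0.0, 2.0))))
```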
Assessment of mass detection performance in contrast enhanced digital mammography
Ann-Katherine Carton, Pablo Milioni de Carvalho, Zhijin Li, et al.
We address the detectability of contrast-agent-enhancing masses in contrast-agent enhanced spectral mammography (CESM), a dual-energy (DE) technique that provides functional projection images of breast tissue perfusion and vascularity, using simulated CESM images. First, the realism of simulated CESM images from anthropomorphic breast software phantoms, generated with a software X-ray imaging platform, was validated. Breast texture was characterized by power-law coefficients calculated in data sets of real clinical and simulated images. We also performed a two-alternative forced-choice (2-AFC) psychophysical experiment in which simulated and real images were presented side by side to an experienced radiologist to test whether real images could be distinguished from simulated ones. The texture in our simulated CESM images was found to have a fairly realistic appearance. Next, the relative performance of human readers and previously developed mathematical observers was assessed for the detection of iodine-enhancing mass lesions containing different contrast agent concentrations. A four-alternative forced-choice (4-AFC) task was designed; the task for the model and human observers was to identify which of four simulated DE recombined images contained an iodine-enhancing mass. Our results showed that the non-prewhitening (NPW) and NPW with eye filter (NPWE) models largely outperform human observers. After introduction of an internal noise component, both models approached human performance. The channelized Hotelling observer (CHO) performs slightly worse than the average human observer. There is still work to be done in improving model observers as predictors of human-observer performance, and larger trials could improve our test statistics. We hope that in the future this framework of software breast phantoms, virtual image acquisition and processing, and mathematical observers can help optimize CESM imaging techniques.
The relationship between socio-economic status and cancer detection at screening
Sian Taylor-Phillips, Toyin Ogboye, Tom Hamborg, et al.
It is well known that socio-economic status is a strong predictor of screening attendance, with women of higher socio-economic status more likely to attend breast cancer screening. We investigated whether socio-economic status was related to the detection of cancer at breast screening centres. In two separate projects, we combined UK data from the population census, the screening information systems, and the cancer registry. Five years of data from all 81 screening centres in the UK were collected. Only women who had previously attended screening were included. The study was given ethical approval by the University of Warwick Biomedical Research Ethics Committee (reference SDR-232-07-2012). Generalised linear models with a log-normal link function were fitted to investigate the relationship between predictors and the age-corrected cancer detection rate at each centre. We found that screening centres serving areas with lower average socio-economic status had lower cancer detection rates, even after correcting for the age distribution of the population. This may be because higher socio-economic status is correlated with some risk factors for breast cancer, such as nulliparity (never bearing children). When adjusting for the age, ethnicity, and socio-economic status of the screened population (rather than for age alone), we found that the standardised detection ratio (SDR) can change by up to 0.11.
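The abstract does not define the SDR formula. Assuming the usual indirect-standardization form, the ratio of observed cancers to the number expected if each age band had a reference detection rate, a centre's SDR can be sketched as below. The age bands, screened counts, and reference rates are hypothetical. Adjusting for ethnicity and socio-economic status as well would change the expected count, which is how the SDR can shift.

```python
def standardised_detection_ratio(observed, screened_by_age, expected_rate_by_age):
    """Observed cancers divided by the number expected from reference age-specific rates.

    Assumes indirect standardization; the reference rates are inputs, not derived here.
    """
    expected = sum(n * expected_rate_by_age[band] for band, n in screened_by_age.items())
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


screened = {"50-54": 4000, "55-59": 3000, "60-64": 3000}  # hypothetical women screened
rates = {"50-54": 0.004, "55-59": 0.005, "60-64": 0.006}  # hypothetical cancers per woman
print(round(standardised_detection_ratio(52, screened, rates), 3))  # 1.061
```

An SDR above 1 means the centre detected more cancers than the reference rates predict for its age mix.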
The impact of mammographic imaging systems on density measurement
The purpose of this study is to investigate whether having mammograms acquired on equipment from different manufacturers affects a woman's breast density (BD) measurement. The data set comprised 40 cases, each containing a combined image of the left craniocaudal (LCC) and left mediolateral oblique (LMLO) views. These images were obtained from 20 women aged 42–89 years. The images were acquired on two imaging systems (GE and Hologic) one year apart. Volumetric BD was assessed using Volpara Density Grade (VDG) and average BD% (AvBD%). Twenty American Board of Radiology (ABR) examiners assessed the same images using the BI-RADS BD scale 1–4. Statistical comparisons were performed on the means using Mann-Whitney tests, on correlation using Spearman's rank correlation coefficient, and on agreement using Cohen's kappa. The absolute median BI-RADS difference between GE and Hologic was 0.225 (2.00 versus 2.00; p<0.043). The VDG measures for GE were not statistically different from Hologic (2.00 versus 2.00; p<0.877); likewise, the median AvBD% for the GE and Hologic systems showed no difference (6.51 versus 6.79; p<0.935). BI-RADS for the GE and Hologic systems showed a strong positive correlation (ρ=0.904; p<0.001), while VDG (ρ=0.978; p<0.001) and AvBD% (ρ=0.973; p<0.001) showed very strong positive correlations. There was substantial agreement between the GE and Hologic systems for BI-RADS density as measured by Cohen's kappa (κ=0.692; p<0.001); however, the systems demonstrated almost perfect agreement for VDG (κ=0.933; p<0.001).
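Cohen's kappa corrects observed agreement p_o for the agreement p_e expected by chance from each system's marginal category frequencies: κ = (p_o − p_e) / (1 − p_e). The abstract does not say whether a weighted kappa was used; the sketch below is the unweighted form, and the paired density grades are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two paired lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the product of the marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n ** 2
    if expected == 1.0:
        return 1.0  # degenerate case: both systems used one identical category
    return (observed - expected) / (1 - expected)


# Hypothetical density grades (1-4) for the same eight cases on two systems.
system_1 = [1, 2, 2, 3, 4, 4, 2, 3]
system_2 = [1, 2, 3, 3, 4, 4, 2, 2]
print(round(cohens_kappa(system_1, system_2), 4))  # 0.6522
```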
Observer Performance Evaluation
A phantom-based JAFROC observer study of two CT reconstruction methods: the search for optimisation of lesion detection and effective dose
Purpose: To investigate the dose saving potential of iterative reconstruction (IR) in a computed tomography (CT) examination of the thorax.

Materials and Methods: An anthropomorphic chest phantom containing various configurations of simulated lesions (5, 8, 10, and 12 mm; +100, -630, and -800 Hounsfield units (HU)) was imaged on a modern CT system over a range of tube currents (20, 40, 60, and 80 mA). Images were reconstructed with IR and filtered back projection (FBP). An ATOM 701D (CIRS, Norfolk, VA) dosimetry phantom was used to measure organ dose, from which effective dose was calculated. Eleven observers (15.11±8.75 years of experience) completed a free-response study, localizing lesions in 544 single CT image slices. A modified jackknife alternative free-response receiver operating characteristic (JAFROC) analysis was performed to test for significant effects of two factors: reconstruction method and tube current. Alpha was set at 0.05 to control the Type I error.

Results: In the modified JAFROC analysis of reconstruction method, there was no statistically significant difference in lesion detection performance between FBP and IR when figures of merit were averaged over tube current (F(1,10)=0.08, p=0.789). In the tube current analysis, significant differences were found between multiple pairs of tube current settings when figures of merit were averaged over reconstruction method (F(3,10)=16.96, p<0.001).

Conclusion: The free-response study suggests that lesion detection can be optimized at 40 mA in this phantom model, corresponding to a measured effective dose of 0.97 mSv. In high-contrast regions, the diagnostic value of IR compared with FBP is less clear.
A multireader diagnostic performance study of low-contrast detectability on a third-generation dual-source CT scanner: filtered back projection versus advanced modeled iterative reconstruction
Justin Solomon, Achille Mileto, Juan Carlos Ramirez-Giraldo, et al.
The purpose of this work was to compare CT low-contrast detectability between two reconstruction algorithms, filtered back projection (FBP) and advanced modeled iterative reconstruction (ADMIRE). A phantom was designed with a range of low-contrast circular inserts representing 5 contrast levels and 3 sizes. The phantom was imaged on a third-generation dual-source CT scanner (SOMATOM Definition Force, Siemens Healthcare) at various dose levels (CTDIvol 0.74–5.8 mGy). Images were reconstructed using different slice thicknesses (0.6–5 mm) and reconstruction algorithms (FBP and ADMIRE with strengths 3–5) and were assessed by eleven blinded and independent readers in a two-alternative forced-choice (2AFC) detection experiment. In a second observer experiment, observers scored the images based on the total number of visible object groups. Detection performance increased with increasing contrast, size, and dose, with accuracy ranging from 50% (i.e., guessing) to 87% and an average inter-observer variability of ±7%. The use of ADMIRE-3 increased performance by 5.2%, resulting in an estimated dose reduction potential of 56–60%. The results from the second experiment also showed an increased number of visible object groups with increasing dose, slice thickness, and ADMIRE strength. The score difference between FBP and ADMIRE was 0.9, 1.3, and 2.1 for ADMIRE strengths 3, 4, and 5, respectively, resulting in estimated dose reduction potentials of 4–80%. Overall, the data indicate the potential to image at reduced doses while maintaining comparable image quality when using ADMIRE instead of FBP.
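2AFC accuracy is often converted to a detectability index so that conditions can be compared on an interval scale. Under the standard equal-variance Gaussian model, PC = Φ(d′/√2), so d′ = √2 Φ⁻¹(PC): 50% correct (guessing) maps to d′ = 0 and the reported 87% maps to about 1.59. The abstract does not say whether this conversion was used; the sketch is illustrative only.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def dprime_from_2afc(pc):
    """Detectability index implied by 2AFC proportion correct (equal-variance Gaussian model)."""
    if not 0.0 < pc < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    return math.sqrt(2.0) * _STD_NORMAL.inv_cdf(pc)


def pc_from_dprime(d):
    """Inverse mapping: expected 2AFC proportion correct for a given d'."""
    return _STD_NORMAL.cdf(d / math.sqrt(2.0))


for pc in (0.50, 0.87):
    print(pc, round(dprime_from_2afc(pc), 2))  # 0.5 -> 0.0, 0.87 -> about 1.59
```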
Demonstration of multi- and single-reader sample size program for diagnostic studies software
The recently released software Multi- and Single-Reader Sample Size Program for Diagnostic Studies, written by Kevin Schartz and Stephen Hillis, performs sample size computations for diagnostic reader-performance studies. The program computes the sample size needed to detect a specified difference in a reader performance measure between two modalities when using the analysis methods initially proposed by Dorfman, Berbaum, and Metz (DBM) and by Obuchowski and Rockette (OR), and later unified and improved by Hillis and colleagues. A commonly used reader performance measure is the area under the receiver operating characteristic curve. The program can be used with common reader-performance measures that can be estimated parametrically or nonparametrically. It has an intuitive step-by-step interface that walks the user through entering the required information. Features of the software include the following: (1) choice of several study designs; (2) choice of inputs obtained from either OR or DBM analyses; (3) choice of three different inference situations: both readers and cases random, readers fixed and cases random, and readers random and cases fixed; (4) choice of two types of hypotheses: equivalence or noninferiority; (5) choice of two output formats: power for specified case and reader sample sizes, or a listing of case-reader combinations that provide a specified power; (6) choice of single- or multi-reader analyses; and (7) functionality in Windows, Mac OS, and Linux.
Low contrast detectability in CT for human and model observers in multi-slice data sets
Alexandre Ba, Damien Racine, Julien G. Ott, et al.
Task-based medical image quality is often assessed by model observers for single-slice images. The goal of this study was to determine whether model observers can predict human detection performance for low-contrast signals in CT for clinical multi-slice (ms) images. We collected 24 different data subsets from a low-contrast phantom: 3 dose levels (40, 90, and 150 mAs), 4 signals (6 and 8 mm diameter; 10 and 20 HU at 120 kV), and 2 reconstruction algorithms (FBP and iterative reconstruction (IR)). Images were assessed by human and model observers in 4-alternative forced-choice (4AFC) experiments with ms data sets in a signal-known-exactly (SKE) paradigm. Model observers with single (msCHOa) and multiple (msCHOb) templates were implemented in a train-and-test analysis with dense difference-of-Gaussian (DDoG) and Gabor spatial channels. For human observers, percent correct increased with dose and was higher for IR images than for FBP images in all investigated conditions. All implemented model observers overestimated human performance in every condition except one case (6 mm and 10 HU) for msCHOa and msCHOb with Gabor channels. Adding internal noise yielded good agreement, but required independent fits for each reconstruction method. In general, msCHOb showed higher detection performance than msCHOa with both types of channels. Gabor channels were less efficient than DDoG channels in this context. These results support further development of 3D analysis techniques for low-contrast CT.
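A channelized Hotelling observer (CHO) reduces each image to a few channel outputs v, then applies the template w = S⁻¹(v̄_present − v̄_absent), where S is the average class covariance, trained on one set of images and tested on another. The sketch below shows that single-slice train/test algebra on simulated channel outputs; the multi-slice msCHOa/msCHOb variants are not shown. The DDoG radial profile is included for reference, but its defaults (`sigma0=0.005`, `alpha=1.4`, `q=1.67`) are assumptions, and computing channel outputs from real images (which requires a Fourier transform) is omitted.

```python
import math
import random
import statistics


def ddog_profile(rho, j, sigma0=0.005, alpha=1.4, q=1.67):
    """Radial frequency response of dense difference-of-Gaussians channel j (assumed defaults)."""
    sigma_j = sigma0 * alpha ** j
    return math.exp(-0.5 * (rho / (q * sigma_j)) ** 2) - math.exp(-0.5 * (rho / sigma_j) ** 2)


def mean_vector(samples):
    return [statistics.fmean(col) for col in zip(*samples)]


def covariance(samples):
    """Unbiased sample covariance of a list of equal-length vectors."""
    mu = mean_vector(samples)
    p, n = len(mu), len(samples)
    cov = [[0.0] * p for _ in range(p)]
    for s in samples:
        d = [x - m for x, m in zip(s, mu)]
        for i in range(p):
            for k in range(p):
                cov[i][k] += d[i] * d[k] / (n - 1)
    return cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def hotelling_template(present, absent):
    """w = S^-1 (mean_present - mean_absent), with S the average class covariance."""
    delta = [a - b for a, b in zip(mean_vector(present), mean_vector(absent))]
    s_p, s_a = covariance(present), covariance(absent)
    pooled = [[(x + y) / 2 for x, y in zip(r1, r2)] for r1, r2 in zip(s_p, s_a)]
    return solve(pooled, delta)


def auc(pos, neg):
    """Wilcoxon-Mann-Whitney AUC of test-set decision variables."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate_channels(rng, n, shift):
    """Hypothetical channel outputs: independent unit-variance Gaussians plus a mean shift."""
    return [[rng.gauss(m, 1.0) for m in shift] for _ in range(n)]


rng = random.Random(7)
shift = [1.0, 0.5, 0.0]  # hypothetical signal effect per channel
train_p, train_a = simulate_channels(rng, 200, shift), simulate_channels(rng, 200, [0.0] * 3)
test_p, test_a = simulate_channels(rng, 200, shift), simulate_channels(rng, 200, [0.0] * 3)

w = hotelling_template(train_p, train_a)  # train on one data set...


def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))


print(round(auc([score(v) for v in test_p], [score(v) for v in test_a]), 3))  # ...test on another
```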
CT
Influence of the grayscale on phantom-based image quality assessment in x-ray computed tomography
F. Noo, K. Hahn, H. C. Davidson, et al.
Radiation dose associated with CT scans has become an important concern in medical imaging. Fortunately, there are many pathways to reducing dose. The difficulty, however, is ensuring that image quality is not degraded as the dose is reduced. A preferred method to assess image quality is ROC analysis, possibly with a search process. For early assessment of new imaging solutions, using human observers and real patient data is rarely practical; instead, studies involving phantoms and model observers are often preferred. We present an experimental result that sheds light on how the grayscale window affects human observer performance in a typical phantom-based study, together with an analysis that clarifies how the grayscale window affects the statistics of the image. These studies provide a better understanding of the possible consequences of not including a grayscale window in studies with model observers, as is typical.
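The grayscale window maps CT numbers to display gray levels: values below level − width/2 become black, values above level + width/2 become white, and values in between are scaled linearly. The abstract does not give the windowing function used, so a linear window with clipping, as is common for CT display, is assumed here. The second part illustrates why windowing can change image statistics: noise around a mean near the window edge is partly clipped, so displayed noise is no longer a scaled copy of the CT-number noise.

```python
import random
import statistics


def apply_window(hu_values, level, width, out_max=255):
    """Map CT numbers (HU) to display gray levels with a linear window and clipping."""
    lo, hi = level - width / 2.0, level + width / 2.0
    out = []
    for v in hu_values:
        if v <= lo:
            out.append(0)
        elif v >= hi:
            out.append(out_max)
        else:
            out.append(round((v - lo) / width * out_max))
    return out


print(apply_window([-1000, -160, 40, 240, 1000], level=40, width=400))
# [0, 0, 128, 255, 255]

# Hypothetical noisy region whose mean (230 HU) sits just below the upper window edge.
rng = random.Random(3)
noisy = [rng.gauss(230.0, 20.0) for _ in range(5000)]
shown = apply_window(noisy, level=40, width=400)
print(statistics.pstdev(noisy) * 255 / 400, statistics.pstdev(shown))
```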
Combination of detection and estimation tasks using channelized scanning linear observer for CT imaging systems
Maintaining or even improving image quality while lowering patient dose is a constant goal in clinical CT imaging. Iterative reconstruction (IR) algorithms have been designed to help reduce dose and/or provide better image quality. In this work, the channelized scanning linear observer (CSLO) is applied to study combined detection and estimation task performance using CT image data. The purpose of this work is to design a task-based approach to quantitatively evaluate image quality for different reconstruction algorithms. Low-contrast objects embedded in head-size and body-size phantoms are imaged multiple times and reconstructed by FBP and an IR algorithm. Independent signal-present and signal-absent ROIs cropped from the images are channelized by difference-of-Gaussians (DOG) channels for CSLO training and testing. Estimation receiver operating characteristic (EROC) curves and the area under the EROC curve (EAUC) are calculated by the CSLO as the figure of merit. The One-Shot method is used to compute the variance of the EAUC values. Results suggest that the IR algorithm studied in this work could reduce dose by approximately 54% while achieving image quality comparable to conventional FBP reconstruction for the combined detection and estimation tasks.
What observer models best reflect low-contrast detectability in CT?
The purpose of this work was to compare CT low-contrast detectability as measured via human perception experiments with observer model surrogates of image quality measured directly from the images. A phantom was designed with a range of low-contrast circular inserts representing 5 contrast levels and 3 sizes. The phantom was imaged repeatedly (20 times) on a third-generation dual-source CT scanner (SOMATOM Definition Force, Siemens Healthcare). Images were reconstructed at 0.6 mm slice thickness using filtered back projection (FBP) and advanced modeled iterative reconstruction (ADMIRE) and were assessed by eleven blinded and independent readers in a two-alternative forced-choice (2AFC) detection experiment. The human scores were taken as the accuracy averaged across observers. The predicted performance was computed directly from the images for several traditional image quality metrics and model observers, including contrast-to-noise ratio (CNR), area-weighted CNR (CNRa), non-prewhitening matched filter (NPW), non-prewhitening matched filter with an eye filter (NPWE), channelized Hotelling observer (CHO), and channelized Hotelling observer with internal noise (CHOi). The correlation between model observer predictions and human performance was assessed using linear regression analysis, with the coefficient of determination (R²) used as the goodness-of-fit metric to determine how well each model observer predicts human performance. R² was 0.11, 0.71, 0.73, 0.77, 0.60, and 0.72 for CNR, CNRa, NPW, NPWE, CHO, and CHOi, respectively. The findings show that NPW, NPWE, and CHOi all correlate strongly with human performance and could be used to optimize scan and reconstruction settings.
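The two simplest quantities in this comparison can be sketched directly: CNR as the absolute difference of signal and background means divided by the background standard deviation (definitions vary; this is one common form), and R² of an ordinary least-squares line fitted to human accuracy against a model's prediction. The abstract does not define CNRa, so it is not implemented. The model outputs and proportions correct below are hypothetical.

```python
import statistics


def cnr(signal_roi, background_roi):
    """Contrast-to-noise ratio: |mean difference| / background standard deviation."""
    contrast = abs(statistics.fmean(signal_roi) - statistics.fmean(background_roi))
    return contrast / statistics.stdev(background_roi)


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


print(cnr([12, 14, 13], [10, 10.5, 9.5]))  # 6.0
model_metric = [0.8, 1.1, 1.9, 2.4, 3.2]   # hypothetical model-observer outputs
human_pc = [0.55, 0.62, 0.71, 0.80, 0.86]  # hypothetical observer-averaged 2AFC accuracy
print(round(r_squared(model_metric, human_pc), 3))
```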
CT image quality evaluation for detection of signals with unknown location, size, contrast and shape using unsupervised methods
Aria X. Pezeshk, Lucretiu Popescu, Berkman Sahiner
The advent of new image reconstruction and image processing techniques for CT images has increased the need for robust objective image quality assessment methods. One of the most common quality assessment methods is the measurement of signal detectability for a known signal at a known location using supervised classification techniques. However, this method requires a large number of simulations or physical measurements, and its underlying assumptions may be considered clinically unrealistic. In this study we focus on objective assessment of image quality in terms of detection of a signal with unknown location, size, shape, and contrast. We explore several unsupervised saliency detection methods which assume no knowledge about the signal, along with a template matching technique which uses information about the signal's size and shape in the object domain, for simulated phantoms that have been reconstructed using filtered back projection (FBP) and iterative reconstruction algorithms (IRA). The performance of each of the image reconstruction algorithms is then measured using the area under the localization receiver operating characteristic curve (LROC) and exponential transformation of the free response operating characteristic curve (EFROC). Our results indicate that unsupervised saliency detection methods can be effectively used to determine image quality in terms of signal detectability for unknown signals given only a small number of sample images.
Impact of number of repeated scans on model observer performance for a low-contrast detection task in CT
In previous investigations of CT image quality, channelized Hotelling observer (CHO) models have been shown to represent human observer performance well in several phantom-based detection/discrimination tasks. In these studies, a large number of independent images was necessary to estimate the expectation images and covariance matrices for each test condition. The purpose of this study is to investigate how the number of repeated scans affects the precision and accuracy of CHO performance in a signal-known-exactly detection task. A phantom containing 21 low-contrast objects (3 contrast levels and 7 sizes) was scanned with a 128-slice CT scanner at three dose levels. For each dose level, 100 independent images were acquired for each test condition. All images were reconstructed using filtered backprojection (FBP) and a commercial iterative reconstruction algorithm. For each combination of dose level and reconstruction method, the low-contrast detectability, quantified as the area under the receiver operating characteristic curve (Az), was calculated using a previously validated CHO model. To determine the dependence of CHO performance on the number of repeated scans, the Az value was calculated using all 100 repeated scans for different numbers of channel filters, for each object size and contrast, and for different dose/reconstruction settings. The Az values were also calculated using randomly selected subsets of the scans (from 10 to 90 scans in increments of 10). Using the Az from the 100 scans as the reference, the accuracy of Az values calculated from fewer scans was determined, and the minimal number of scans was subsequently derived. For the studied signal-known-exactly detection task, the results demonstrated that the minimal number of scans depends on dose level, object size, contrast level, and channel filters.
Using the Wiener estimator to determine optimal imaging parameters in a synthetic-collimator SPECT system used for small animal imaging
Alexander Lin, Lindsay C. Johnson, Sepideh Shokouhi, et al.
In synthetic-collimator SPECT imaging, two detectors are placed at different distances behind a multi-pinhole aperture. This configuration allows for image detection at different magnifications and photon energies, resulting in higher overall sensitivity while maintaining high resolution. Image multiplexing, the undesired overlap between projections caused by photon-origin uncertainty, may occur in both detector planes and is often present in the second (back) detector plane because of its greater magnification. However, artifact-free image reconstruction is possible by combining data from both the front detector (little to no multiplexing) and the back detector (noticeable multiplexing). When the two detectors are used in tandem, spatial resolution is increased, allowing for a higher sensitivity-to-detector-area ratio. Because detector distances and pinhole spacings can vary widely in synthetic-collimator SPECT systems, a large parameter space must be examined to determine optimal imaging configurations. We chose to assess image quality based on the task of estimating activity in various regions of a mouse brain. Phantom objects were simulated using mouse brain data from the Magnetic Resonance Microimaging Neurological Atlas (MRM NeAt) and projected at different angles through models of a synthetic-collimator SPECT system developed by collaborators at Vanderbilt University. Uptake in the different brain regions was modeled as normally distributed about predetermined means and variances. We computed the performance of the Wiener estimator for the task of estimating activity in different regions of the mouse brain. Our results demonstrate the utility of the method for optimizing synthetic-collimator system design.
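For a linear imaging model g = Hθ + n with Gaussian-distributed regional uptakes θ, the Wiener estimator is θ̂ = θ̄ + W(g − Hθ̄) with W = K_θ Hᵀ (H K_θ Hᵀ + K_n)⁻¹, and its ensemble mean-squared error is tr(K_θ − W H K_θ). A figure of merit of this kind can be compared across candidate system configurations. The sketch assumes zero-mean noise with known covariance. The 3-bin, 2-region system matrix and all covariances are hypothetical stand-ins for the Vanderbilt system model, and the abstract does not state which figure of merit was used.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve A X = B (B a matrix) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def wiener_operator(h, k_theta, k_noise):
    """W = K_theta H^T (H K_theta H^T + K_n)^-1 for the linear model g = H theta + n."""
    k_theta_g = matmul(k_theta, transpose(h))  # cross-covariance of theta and g
    k_g = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(h, k_theta_g), k_noise)]
    # W K_g = K_theta_g and K_g is symmetric, so W^T = K_g^-1 K_theta_g^T.
    return transpose(solve(k_g, transpose(k_theta_g)))


def wiener_estimate(w, h, theta_mean, g):
    """theta_hat = theta_mean + W (g - H theta_mean), assuming zero-mean noise."""
    g_mean = [row[0] for row in matmul(h, [[t] for t in theta_mean])]
    resid = [[gi - mi] for gi, mi in zip(g, g_mean)]
    return [t + c[0] for t, c in zip(theta_mean, matmul(w, resid))]


def ensemble_mse(w, h, k_theta):
    """EMSE = tr(K_theta - W H K_theta), the trace of the estimation-error covariance."""
    whk = matmul(w, matmul(h, k_theta))
    return sum(k_theta[i][i] - whk[i][i] for i in range(len(k_theta)))


# Hypothetical system: 3 detector bins viewing 2 brain regions.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k_theta = [[4.0, 0.0], [0.0, 1.0]]                             # prior uptake variances
k_noise = [[0.5 if i == j else 0.0 for j in range(3)] for i in range(3)]
w = wiener_operator(h, k_theta, k_noise)
print(wiener_estimate(w, h, [10.0, 5.0], [11.0, 5.5, 16.0]), ensemble_mse(w, h, k_theta))
```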
Model Observers I
Optimization of energy window and evaluation of scatter compensation methods in MPS using the ideal observer with model mismatch
Michael Ghaly, Jonathan M. Links, Eric Frey
In this work, we used the ideal observer (IO) and the IO with model mismatch (IO-MM), applied in the projection domain, and an anthropomorphic channelized Hotelling observer (CHO), applied to reconstructed images, to optimize the acquisition energy window width and evaluate various scatter compensation methods in the context of a myocardial perfusion SPECT defect detection task. The IO has perfect knowledge of the image formation process and thus reflects performance with perfect compensation for image-degrading factors. Consequently, using the IO to optimize imaging systems could lead to parameters that are suboptimal compared with those optimized for humans interpreting SPECT images reconstructed with imperfect or no compensation. The IO-MM allows imperfect system models to be incorporated into the IO optimization process. We found that with near-perfect scatter compensation, the optimal energy windows for the IO and CHO were similar; in its absence, the IO-MM gave a better prediction of the optimal energy window for the CHO with the different scatter compensation methods. These data suggest that the IO-MM may be useful for projection-domain optimization when model mismatch is significant, and that the IO is useful when followed by reconstruction with good models of the image formation process.
Approximate maximum likelihood estimation of scanning observer templates
In localization tasks, an observer is asked to give the location of some target or feature of interest in an image. Scanning linear observer models incorporate the search implicit in this task through convolution of an observer template with the image being evaluated. Such models are becoming increasingly popular as predictors of human performance for validating medical imaging methodology. In addition to convolution, scanning models may use internal noise components to model inconsistencies in human observer responses. In this work, we build a probabilistic mathematical model of this process and show how it can, in principle, be used to obtain estimates of the observer template using maximum likelihood methods. The main difficulty of this approach is that a closed-form probability distribution for the maximal-response location is not generally available in the presence of internal noise. However, for a given image we can generate an empirical distribution of maximal locations using Monte Carlo sampling. We show that this distribution is well approximated by applying an exponential function to the scanning template output. We also evaluate log-likelihood functions on the basis of this approximate distribution. Using 1,000 trials of simulated data as a validation test set, we find that a plot of the approximate log-likelihood function along a single parameter related to the template profile achieves its maximum value near the true value used in the simulation. This finding holds regardless of whether the trials are correctly localized or not. In a second validation study evaluating a parameter related to the relative magnitude of internal noise, only the incorrectly localized images produce a maximum in the approximate log-likelihood function that is near the true value of the parameter.
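The abstract does not give the exact form of the approximation, so the following is only a sketch under stated assumptions: internal noise is taken to be independent Gaussian noise added at each candidate location, and "applying an exponential function to the scanning template output" is interpreted as a normalized exponential (softmax) with a hypothetical sharpness parameter `beta`. The grid search in `fit_beta` mimics evaluating the approximate log-likelihood along a single parameter.

```python
import math
import random


def monte_carlo_max_location(responses, noise_sd, n_samples=20000, seed=0):
    """Empirical distribution of the argmax location when independent Gaussian
    internal noise (an assumption) is added to each scanning-template response."""
    rng = random.Random(seed)
    counts = [0] * len(responses)
    for _ in range(n_samples):
        noisy = [r + rng.gauss(0.0, noise_sd) for r in responses]
        counts[max(range(len(noisy)), key=noisy.__getitem__)] += 1
    return [c / n_samples for c in counts]


def exponential_approximation(responses, beta):
    """Approximate P(max at location i) as exp(beta*r_i) / sum_j exp(beta*r_j)."""
    m = max(responses)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(beta * (r - m)) for r in responses]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(responses_per_trial, reported_locations, beta):
    """Approximate log-likelihood of the locations an observer reported."""
    return sum(
        math.log(exponential_approximation(responses, beta)[loc])
        for responses, loc in zip(responses_per_trial, reported_locations)
    )


def fit_beta(responses_per_trial, reported_locations, betas):
    """Grid search for the beta that maximizes the approximate log-likelihood."""
    return max(betas, key=lambda b: log_likelihood(responses_per_trial, reported_locations, b))
```

Comparing `monte_carlo_max_location` with `exponential_approximation` on the same responses is one way to check how good the approximation is for a given noise level.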
The effect of signal variability on the histograms of anthropomorphic channel outputs: factors resulting in non-normally distributed data
Fatma E. A. Elshahaby, Michael Ghaly, Abhinav K. Jha, et al.
Model observers are widely used in medical imaging for the optimization and evaluation of instrumentation, acquisition parameters, and image reconstruction and processing methods. The channelized Hotelling observer (CHO) is a commonly used model observer in nuclear medicine and has seen increasing use in other modalities. An anthropomorphic CHO consists of a set of channels that model some aspects of the human visual system and the Hotelling observer, which is the optimal linear discriminant. The optimality of the CHO is based on the assumption that the channel outputs for data with and without the signal present have a multivariate normal distribution with equal class covariance matrices. The channel outputs result from the dot product of channel templates with input images and are thus sums of a large number of random variables. The central limit theorem is therefore often invoked to justify the assumption that the channel outputs are normally distributed. In this work, we aim to examine this assumption for realistically simulated nuclear medicine images when various types of signal variability are present.
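The CHO computation described above can be written in common notation (symbols are our choice, not the paper's): U holds the channel templates as columns, g is an image with N pixels, subscripts 1 and 0 denote the signal-present and signal-absent classes, and K₁, K₀ are the channel-output covariance matrices. The first line makes the central-limit argument concrete, since each channel output is a weighted sum over N pixel values. The AUC relation in the last line holds only when t is normally distributed with equal variance in both classes, which is the assumption this study examines.

```latex
\begin{gathered}
v_k = \sum_{n=1}^{N} u_{nk}\, g_n, \qquad \mathbf{v} = \mathbf{U}^{\mathsf{T}}\mathbf{g}, \\
\Delta\bar{\mathbf{v}} = \bar{\mathbf{v}}_{1} - \bar{\mathbf{v}}_{0}, \qquad
\mathbf{S} = \tfrac{1}{2}\left(\mathbf{K}_{1} + \mathbf{K}_{0}\right), \qquad
\mathbf{w} = \mathbf{S}^{-1}\Delta\bar{\mathbf{v}}, \\
t = \mathbf{w}^{\mathsf{T}}\mathbf{v}, \qquad
\mathrm{SNR}^{2} = \Delta\bar{\mathbf{v}}^{\mathsf{T}}\mathbf{S}^{-1}\Delta\bar{\mathbf{v}}, \qquad
\mathrm{AUC} = \Phi\!\left(\mathrm{SNR}/\sqrt{2}\right).
\end{gathered}
```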
Visual Search
Prevalence learning and decision making in a visual search task: an equivalent ideal observer approach
Xin He, Frank Samuelson, Rongping Zeng, et al.
Research studies have observed an influence of target prevalence on observer performance in visual search tasks. The goal of this work is to develop models for prevalence effects on visual search. In a recent study by Wolfe et al., a large-scale observer study was conducted to understand the effects of varying target prevalence on visual search. In particular, 12 observers were recruited to perform 1000 trials of simulated baggage search as target prevalence varied sinusoidally from high to low and back to high. We attempted to model observers’ behavior in prevalence learning and decision making. We modeled the observer as an equivalent ideal observer (EIO) with a prior belief about the signal prevalence. The use of the EIO allows ideal observer mathematics to be applied to characterize real observers’ performance reading real-life images. For each new image, the observer updates the belief about prevalence and adjusts his/her decision threshold according to utility theory. The model results agree well with the experimental results from the Wolfe study. The proposed models provide theoretical insight into how observers learn prevalence and adjust their decision threshold.
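The update-then-threshold loop can be sketched as follows, under assumptions that are ours rather than the paper's: the prevalence belief is a Beta distribution updated by conjugate Beta-Bernoulli counting, the true label is revealed after each trial, and utilities are fixed. The threshold is the standard expected-utility result for an observer that thresholds a likelihood ratio; the EIO in the paper may use a different update rule. `PrevalenceBelief`, `lr_threshold`, and `decide` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PrevalenceBelief:
    """Hypothetical Beta(alpha, beta) belief about target prevalence."""
    alpha: float = 1.0
    beta: float = 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, target_present: bool) -> None:
        # Conjugate Beta-Bernoulli update; assumes the true label is revealed
        # (feedback) after each trial.
        if target_present:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def lr_threshold(prevalence, u_tp, u_fn, u_tn, u_fp):
    """Likelihood-ratio threshold that maximizes expected utility: respond
    'target present' when LR > ((1 - p) / p) * (u_tn - u_fp) / (u_tp - u_fn)."""
    return ((1.0 - prevalence) / prevalence) * (u_tn - u_fp) / (u_tp - u_fn)


def decide(likelihood_ratio, belief, utilities):
    """Decision for one image using the current prevalence belief.
    utilities = (u_tp, u_fn, u_tn, u_fp)."""
    return likelihood_ratio > lr_threshold(belief.mean(), *utilities)
```

As believed prevalence falls, the threshold rises, so the same image evidence is more often called "target absent"; this is one way a model of this kind can produce more misses at low prevalence.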
Ideal and visual-search observers: accounting for anatomical noise in search tasks with planar nuclear imaging
Model observers have frequently been used for hardware optimization of imaging systems. For model observers to reliably mimic human performance, it is important to account for the sources of variation in the images. Detection-localization tasks are complicated by anatomical noise present in the images. Several scanning observers have been proposed for such tasks. The most popular of these, the channelized Hotelling observer (CHO), incorporates anatomical variations through covariance matrices. We propose the visual-search (VS) observer as an alternative to the CHO to account for anatomical noise. The VS observer is a two-step process that first identifies suspicious tumor candidates and then performs a detailed analysis on them. The identification of suspicious candidates (search) implicitly accounts for anatomical noise. In this study we present a comparison of these two observers with human observers. The application considered is collimator optimization for planar nuclear imaging. Both observers show similar trends in performance, with the VS observer slightly closer to human performance.
Priming cases disturb visual search patterns in screening mammography
Sarah J. Lewis, Warren M. Reed, Alvin N. K. Tan, et al.
Rationale and Objectives: To investigate the effect of inserting obvious cancers into a screening set of mammograms on the visual search of radiologists. Previous research presents conflicting evidence as to the impact of priming in scenarios where prevalence is naturally low, such as in screening mammography.

Materials and Methods: An observer performance and eye-position analysis study was performed. Four expert breast radiologists were asked to interpret two sets of 40 screening mammograms. The Control Set contained 36 normal and 4 malignant cases (at case positions #9, #14, #25, and #37). The Primed Set contained 34 of the same normal cases and the same 4 malignant cases (in the same positions), plus 2 “primer” malignant cases replacing 2 normal cases (at positions #20 and #34). Primer cases were defined as lower-difficulty cases containing salient malignant features, inserted before cases of greater difficulty.

Results: The Wilcoxon signed-rank test indicated no significant differences in sensitivity or specificity between the two sets (P > 0.05). The fixation count on the malignant cases (#25, #37) in the Primed Set, viewed after the primer cases (#20, #34), decreased significantly (Z = -2.330, P = 0.020). Most false-negative errors in the Primed Set were sampling errors (75%), compared with 25% in the Control Set.

Conclusion: The overall performance of radiologists is not affected by the inclusion of obvious cancer cases. However, changes in visual search behavior, as measured by eye-position recording, suggest that the inclusion of priming cases disturbs visual search in screening mammography.
Fractal analysis of radiologists' visual scanning pattern in screening mammography
Folami T. Alamudun, Hong-Jun Yoon, Kathy Hudson, et al.
Several researchers have investigated radiologists’ visual scanning patterns with respect to features such as total time examining a case, time to first hit on true lesions, number of hits, etc. The purpose of this study was to examine the complexity of radiologists’ visual scanning patterns when viewing 4-view mammographic cases, as they typically do in clinical practice. Gaze data were collected from 10 readers (3 breast imaging experts and 7 radiology residents) while they reviewed 100 screening mammograms (24 normal, 26 benign, 50 malignant). The radiologists’ scanpaths across the 4 mammographic views were mapped to a single 2-D image plane. Then, fractal analysis was applied to the composite 4-view scanpaths. For each case, the complexity of each radiologist’s scanpath was measured using the fractal dimension estimated with the box-counting method. The association between the fractal dimension of the radiologists’ visual scanpaths, case pathology, case density, and radiologist experience was evaluated using fixed-effects ANOVA. ANOVA showed that the complexity of the radiologists’ visual search pattern in screening mammography depends on case-specific attributes (breast parenchymal density and case pathology) as well as on a reader attribute, namely experience level. Visual scanning patterns differ significantly for benign and malignant cases compared with normal cases. There is also substantial inter-observer variability that cannot be explained by experience level alone.
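A minimal box-counting sketch follows. It assumes the composite scanpath has already been normalized to the unit square and that the straight segments between consecutive gaze points are densely sampled. The grid resolutions and sampling density are illustrative choices, not parameters reported in the abstract, and `scanpath_to_points` is a hypothetical preprocessing helper.

```python
import math


def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
    """Estimate the box-counting (fractal) dimension of 2-D points in [0, 1]^2.
    For each grid resolution n (box side 1/n), count the occupied boxes N(n);
    the dimension is the least-squares slope of log N(n) against log n."""
    xs, ys = [], []
    for n in scales:
        occupied = {
            # clamp coordinates equal to 1.0 into the last box
            (min(int(x * n), n - 1), min(int(y * n), n - 1))
            for x, y in points
        }
        xs.append(math.log(n))
        ys.append(math.log(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def scanpath_to_points(gaze, samples_per_segment=50):
    """Densely sample the straight segments between consecutive gaze positions,
    which are assumed to be normalized to [0, 1] x [0, 1] already."""
    points = []
    for (x0, y0), (x1, y1) in zip(gaze, gaze[1:]):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    points.append(gaze[-1])
    return points
```

As a sanity check, a straight scanpath gives a dimension of 1, and points that fill the plane uniformly give a dimension close to 2. Real scanpaths fall in between, and more convoluted search patterns give larger values.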
Temporal stability of visual search-driven biometrics
Hong-Jun Yoon, Tandy R. Carmichael, Georgia Tourassi
We previously showed the potential of using an individual’s visual search pattern as a biometric. That study focused on viewing images displaying dot patterns with different spatial relationships to determine which pattern is more effective in establishing the identity of an individual. In this follow-up study we investigated the temporal stability of this biometric. We performed an experiment in which 16 individuals were asked to search for a predetermined feature of a random-dot pattern while we tracked their eye movements. Each participant completed four testing sessions consisting of two dot patterns repeated twice. One dot pattern displayed concentric circles shifted to the left or right side of the screen and overlaid with visual noise, and participants were asked which side the circles were centered on. The second dot pattern displayed a number of circles (between 0 and 4) scattered on the screen and overlaid with visual noise, and participants were asked how many circles they could identify. Each session contained 5 untracked tutorial questions and 50 tracked test questions (200 total tracked questions per participant). To create each participant’s "fingerprint", we constructed a Hidden Markov Model (HMM) from the gaze data representing the underlying visual search and cognitive process. The accuracy of the derived HMMs was evaluated using cross-validation for various time-dependent train-test conditions. Subject identification accuracy ranged from 17.6% to 41.8% across all conditions, which is significantly higher than random guessing (1/16 = 6.25%). The results suggest that the visual search pattern is a promising, temporally stable personalized fingerprint of perceptual organization.
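Identification with per-subject HMMs amounts to scoring a new gaze sequence under each subject's model and picking the most likely subject. The sketch below assumes discrete emissions, for example gaze samples quantized into screen regions, which is our assumption because the abstract does not describe the observation model. It shows only the scaled forward algorithm and the argmax step; training each HMM (for example with Baum-Welch) is omitted. The model dictionary layout is hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-emission HMM.
    start[i], trans[i][j], and emit[i][symbol] are probabilities."""
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def identify(obs, models):
    """Return the subject whose HMM (start, trans, emit) gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Cross-validated accuracy would then be the fraction of held-out sequences for which `identify` returns the correct subject.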
Towards using eye-tracking data to develop visual-search observers for x-ray breast imaging
Visual-search (VS) model observers have the potential to provide reliable predictions of human-observer performance in detection-localization tasks. The purpose of this work was to examine some characteristics of human gaze on breast images with the goal of informing the design of our VS observers. Using a helmet-mounted eye-tracking system, we recorded the movement of gaze from human observers as they searched for masses in sets of 2D digital breast tomosynthesis (DBT) images. The masses in this study had a single profile. The DBT images were extracted from image volumes reconstructed with filtered back-projection and penalized maximum-likelihood methods. Fixation times associated with observer points of interest (POIs) were computed from the observer data. The fixation times were then compared to sets of morphological feature values extracted from the images. These features, extracted as cross-correlations involving the mass profile and the test image, included the matched filter (MF), gradient MF, and Laplacian MF. For this initial investigation, we computed correlation coefficients between the fixation times and the feature values.
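The matched-filter feature and the correlation step can be sketched in pure Python. The sketch assumes the MF feature map is the "valid" 2-D cross-correlation of the test image with the known mass profile. The gradient and Laplacian MF variants would correlate with filtered versions of the profile and are not reproduced here. Function names are illustrative.

```python
import math


def cross_correlate_valid(image, template):
    """2-D cross-correlation over the 'valid' region:
    out[r][c] = sum_{i,j} image[r + i][c + j] * template[i][j]."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] * template[i][j]
                for i in range(th) for j in range(tw))
            for c in range(iw - tw + 1)
        ]
        for r in range(ih - th + 1)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

To relate gaze to features, one would look up the MF value at each observer POI, for example `mf[r][c]` for a POI at (r, c) in feature-map coordinates, and pass those values and the corresponding fixation times to `pearson_r`.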
Model Observers II
Improving lesion detectability in PET imaging with a penalized likelihood reconstruction algorithm
Kristen A. Wangerin, Sangtae Ahn, Steven G. Ross, et al.
Ordered Subset Expectation Maximization (OSEM) is currently the most widely used image reconstruction algorithm for clinical PET. However, OSEM does not necessarily provide optimal image quality, and a number of alternative algorithms have been explored. We have recently shown that a penalized likelihood image reconstruction algorithm using the relative difference penalty, block sequential regularized expectation maximization (BSREM), achieves more accurate lesion quantitation than OSEM and, importantly, maintains acceptable visual image quality in clinical whole-body PET. The goal of this work was to evaluate lesion detectability with BSREM versus OSEM. We performed a two-alternative forced-choice study using 81 patient datasets with lesions of varying contrast inserted into the liver and lung. At matched image noise, BSREM and OSEM showed equivalent detectability in the lungs, and BSREM outperformed OSEM in the liver. These results suggest that BSREM provides not only improved quantitation and clinically acceptable visual image quality, as previously shown, but also improved lesion detectability compared with OSEM. We then modeled this detectability study, applying both nonprewhitening (NPW) and channelized Hotelling (CHO) model observers to the reconstructed images. The CHO model observer showed good agreement with the human observers, suggesting that we can apply this model to future studies with varying simulation and reconstruction parameters.
SVM-based visual-search model observers for PET tumor detection
Howard C. Gifford, Anando Sen, Robert Azencott
Many search-capable model observers follow task paradigms that specify clinically unrealistic prior knowledge about the anatomical backgrounds in study images. Visual-search (VS) observers, which implement distinct, feature-based candidate search and analysis stages, may provide a means of avoiding such paradigms. However, VS observers that conduct single-feature analysis have not been reliable in the absence of any background information. We investigated whether a VS observer based on multifeature analysis can overcome this background dependence. The testbed was a localization ROC (LROC) study with simulated whole-body PET images. Four target-dependent morphological features were defined in terms of 2D cross-correlations involving a known tumor profile and the test image. The feature values at the candidate locations in a set of training images were fed to a support-vector machine (SVM) to compute a linear discriminant that classified locations as tumor-present or tumor-absent. The LROC performance of this SVM-based VS observer was compared against the performances of human observers and a pair of existing model observers.
The use of kernel local Fisher discriminant analysis for the channelization of the Hotelling model observer
It is resource-intensive to conduct human studies for task-based assessment of medical image quality and system optimization. Thus, numerical model observers have been developed as surrogates for human observers. The Hotelling observer (HO) is the optimal linear observer for signal-detection tasks, but the high dimensionality of imaging data results in a heavy computational burden. Channelization is often used to approximate the HO through a dimensionality reduction step, but producing channelized images without losing significant image information remains a key challenge. Kernel local Fisher discriminant analysis (KLFDA) uses kernel techniques to perform supervised dimensionality reduction, finding an embedding transformation that maximizes between-class separability and preserves within-class local structure in the low-dimensional manifold. It is powerful for classification tasks, especially when the distribution of a class is multimodal. Such multimodality can be observed in many practical clinical tasks. For example, primary and metastatic lesions may both appear in medical imaging studies, but the distributions of their typical characteristics (e.g., size) may be very different. In this study, we propose to use KLFDA as a novel channelization method. The dimension of the embedded manifold (i.e., the result of KLFDA) is the counterpart of the number of channels in state-of-the-art linear channelization. We present a simulation study to demonstrate the potential usefulness of KLFDA for building channelized HOs (CHOs) and generating reliable decision statistics for clinical tasks. We show that the performance of the CHO with KLFDA channels is comparable to that of the benchmark CHOs.
Evaluation of six channelized Hotelling observers in combination with a contrast sensitivity function to predict human observer performance
Standard methods to quantify image quality (IQ) may not be adequate for clinical images, since they depend on uniform backgrounds and linearity. Statistical model observers are not subject to these limitations and might be suitable for IQ evaluation of clinical images. One of these statistical model observers is the channelized Hotelling observer (CHO), in which the images are filtered by a set of channels. The aim of this study was to evaluate six different channel sets, with an additional filter to simulate the human contrast sensitivity function (CSF), in terms of their ability to predict human observer performance. For this evaluation, a two-alternative forced-choice experiment was performed with two types of background structure (white noise (WN) and clustered lumpy background (CLB)), 5 disk-shaped objects with different diameters, and 3 different signal energies. The results show that the correlation between human and model observers has a diameter dependency for some channel sets in combination with CLBs. The addition of the CSF reduces this diameter dependency and in some cases improves the correlation coefficient between human and model observers. For the CLB, the Partial Least Squares channel set shows the highest correlation with the human observer (r2=0.71), and for WN backgrounds it was the Gabor channel set with CSF (r2=0.72). This study showed that for some channel sets there is a high correlation between human and model observers, which suggests that the CHO has potential as a tool for IQ analysis of digital mammography systems.
On anthropomorphic decision making in a model observer
Ali R. N. Avanaki, Kathryn S. Espig, Tom R. L. Kimpe, et al.
By analyzing human readers’ performance in detecting small round lesions in simulated digital breast tomosynthesis backgrounds in a location-known-exactly scenario, we have developed a model observer that better predicts human performance across different levels of background complexity (i.e., anatomical and quantum noise). Our analysis indicates that human observers perform a lesion detection task by combining a number of sub-decisions, each an indicator of the presence of a lesion in the image stack. This is in contrast to a channelized Hotelling observer (CHO), in which the detection task is conducted holistically by thresholding a single decision variable formed from an optimally weighted linear combination of channels. However, it seems that the sub-par performance of human readers relative to the CHO cannot be fully explained by their reliance on sub-decisions, or perhaps we do not consider a sufficient number of sub-decisions. To bridge the gap between the performance of human readers and that of the model observer based on sub-decisions, we use an additive noise model whose power is modulated by the level of background complexity. The proposed model observer better predicts the rapid drop in human detection performance with increasing background complexity.
Technology Assessment
Comparison of two stand-alone CADe systems at multiple operating points
Berkman Sahiner, Weijie Chen, Aria Pezeshk, et al.
Computer-aided detection (CADe) systems are typically designed to work at a given operating point: the device displays a mark if and only if the level of suspiciousness of a region of interest is above a fixed threshold. To compare the standalone performance of two systems, one approach is to select the parameters of the systems to yield a target false-positive rate that defines the operating point, and to compare the sensitivities at that operating point. Increasingly, CADe developers offer multiple operating points, so comparing two CADe systems involves multiple comparisons. To control the Type I error, a multiple-comparison correction is needed to keep the family-wise error rate (FWER) below a given alpha level. The sensitivities of a single modality at different operating points are correlated. In addition, the sensitivities of the two modalities at the same or different operating points are also likely to be correlated. It has been shown in the literature that when test statistics are correlated, well-known methods for controlling the FWER are conservative. In this study, we compared the FWER and power of three methods, namely the Bonferroni, step-up, and adjusted step-up methods, in comparing the sensitivities of two CADe systems at multiple operating points, where the adjusted step-up method uses the estimated correlations. Our results indicate that the adjusted step-up method has a substantial advantage over the other two methods in terms of both FWER and power.
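The two unadjusted corrections can be sketched as follows. The sketch assumes that "step-up" refers to Hochberg's step-up procedure, which the abstract does not state explicitly. The correlation-adjusted step-up method used in the study depends on estimated correlations between sensitivities and is not reproduced here.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def hochberg_step_up(p_values, alpha=0.05):
    """Hochberg step-up: with p-values sorted ascending, find the largest k
    such that p_(k) <= alpha / (m - k + 1) and reject the k smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k in range(m, 0, -1):  # step up, starting from the largest p-value
        if p_values[order[k - 1]] <= alpha / (m - k + 1):
            for idx in order[:k]:
                reject[idx] = True
            break
    return reject
```

For p-values of 0.01, 0.04, and 0.03 at three operating points, Bonferroni rejects only the first hypothesis, while the step-up procedure rejects all three. This example shows the kind of power difference the study quantifies.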
Feasibility of using a biowatch to monitor GSR as a measure of radiologists' stress and fatigue
Elizabeth A. Krupinski, Lea MacKinnon, Bruce I. Reiner
We have been investigating the impact of fatigue on the diagnostic performance of radiologists interpreting medical images. In previous studies we found evidence that eye strain could be objectively measured and that it correlates highly with degradations in diagnostic accuracy as radiologists work long hours. Eye strain, however, can be difficult to measure non-invasively and continuously over the work day, so we have been investigating other ways to measure physiological stress and fatigue. In this study we evaluated the feasibility of using a commercially available biowatch to measure galvanic skin response (GSR), a well-known indicator of stress. Ten radiology residents wore the biowatch for about 8 hours during their normal work day, and data were automatically collected at 10 Hz. They completed the Swedish Occupational Fatigue Inventory (SOFI) at the start and end of the day. GSR values (microsiemens) ranged from 0.14 to 38.27, with an average of 0.50 (median 0.28). Overall, GSR tended to be fairly constant as the day progressed, but there were definite spikes indicating higher levels of stress. SOFI scores indicated greater levels of fatigue and stress at the end of the work day. Although further work is needed, GSR measurements obtained via an easy-to-wear watch may provide a means to monitor stress/fatigue and alert radiologists when to take a break from interpreting images to avoid making errors.
Augmenting real-time video with virtual models for enhanced visualization for simulation, teaching, training and guidance
Michael Potter, Alexander Bensch, Alexander Dawson-Elli, et al.
In minimally invasive surgical interventions, direct visualization of the target area is often not available. Instead, clinicians rely on images from various sources, along with surgical navigation systems, for guidance. These spatial localization and tracking systems function much like the Global Positioning System (GPS) familiar from everyday use. In this work we demonstrate how the video feed from a typical camera, which could mimic a laparoscopic or endoscopic camera used during an interventional procedure, can be used to identify the pose of the camera with respect to the viewed scene and to augment the video feed with computer-generated information, such as renderings of internal anatomy not visible beneath the imaged surface, resulting in a simple augmented reality environment. This paper describes the software and hardware environment and the methodology for augmenting the real world with virtual models extracted from medical images to provide enhanced visualization beyond the surface view achieved using traditional imaging. Following intrinsic and extrinsic camera calibration, the technique was implemented and demonstrated using a LEGO structure phantom, as well as a 3D-printed patient-specific left atrial phantom. We assessed the quality of the overlay in terms of fiducial localization, fiducial registration, and target registration errors, as well as the overlay offset error. Using the software extensions we developed in conjunction with common webcams, it is possible to achieve tracking accuracy comparable to that of significantly more expensive hardware, with target registration errors on the order of 2 mm.
Objective evaluation of methods to track motion from clinical cardiac-gated tagged MRI without the use of a gold standard
Cardiac-gated MRI is widely used for measuring parameters related to heart motion. More specifically, gated tagged MRI is the preferred modality for estimating local deformation (strain) and rotational motion (twist) of myocardial tissue. Many methods have been proposed to estimate cardiac motion from gated MRI sequences. However, when dealing with clinical data, evaluating these methods is problematic because of the absence of gold standards for cardiac motion. To overcome this problem, a linear regression scheme known as regression-without-truth (RWT) was previously proposed. RWT uses priors to model the distribution of true values, thus enabling us to assess image-analysis algorithms without knowledge of the ground truth. Furthermore, it allows methods to be ranked by means of an objective figure of merit γ (i.e., precision). In this work we apply RWT to compare the performance of several gated MRI motion-tracking methods (e.g., non-rigid registration, feature-based, harmonic phase) at the task of estimating myocardial strain and left-ventricle (LV) twist from a population of 18 clinical human cardiac-gated tagged MRI studies.
Developing a clinical utility framework to evaluate prediction models in radiogenomics
Yirong Wu, Jie Liu, Alejandro Munoz del Rio, et al.
Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called “radiogenomics.” Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to develop a clinical decision framework based on utility analysis to assess prediction models for breast cancer. Our data come from a retrospective case-control study, collecting Gail model risk factors, genetic variants (single nucleotide polymorphisms, SNPs), and mammographic features in the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We first constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail+SNP, and (3) Gail+SNP+BI-RADS. Then, we generated ROC curves for the three models. After assigning utility values to each category of findings (true negative, false positive, false negative, and true positive), we sought the optimal operating points on the ROC curves that achieve the maximum expected utility (MEU) of breast cancer diagnosis. We used McNemar’s test to compare the predictive performance of the three models. We found that SNPs and BI-RADS features augmented the baseline Gail model in terms of the area under the ROC curve (AUC) and MEU. SNPs improved the sensitivity of the Gail model (0.276 vs. 0.147) and reduced its specificity (0.855 vs. 0.912). When mammographic features were also added, sensitivity increased to 0.457 and specificity to 0.872. SNPs and mammographic features played a significant role in breast cancer risk estimation (p-value < 0.001). Our decision framework, comprising utility analysis and McNemar’s test, provides a novel way to evaluate prediction models in the realm of radiogenomics.
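The two evaluation steps can be sketched as follows, under stated assumptions. The utilities and ROC points are placeholders, not values from the study. McNemar’s test is written in its common continuity-corrected form on the discordant counts b and c. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so P(X > x) = erfc(sqrt(x / 2)). Function names are illustrative.

```python
import math


def expected_utility(tpr, fpr, prevalence, u_tp, u_fn, u_tn, u_fp):
    """Expected utility per case at one ROC operating point (TPR, FPR)."""
    return (prevalence * (tpr * u_tp + (1.0 - tpr) * u_fn)
            + (1.0 - prevalence) * ((1.0 - fpr) * u_tn + fpr * u_fp))


def best_operating_point(roc_points, prevalence, utilities):
    """Return the (fpr, tpr) point with maximum expected utility.
    utilities = (u_tp, u_fn, u_tn, u_fp)."""
    return max(roc_points, key=lambda pt: expected_utility(pt[1], pt[0], prevalence, *utilities))


def mcnemar(b, c):
    """Continuity-corrected McNemar test on discordant counts b and c.
    Returns (chi-square statistic with 1 df, two-sided p-value)."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, with prevalence 0.5 and symmetric utilities `(1, 0, 1, 0)`, expected utility reduces to the mean of sensitivity and specificity. In that case the MEU point is the ROC point that maximizes their sum.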
Poster Session
Evaluation of a simulation procedure designed to recognize shape and contour of suspicious masses in mammography
Maria A. Z. Sousa, Paula N. Siqueira, Homero Schiabel
A large number of breast phantoms have been developed for quality testing, characterization of imaging systems and computer-aided diagnosis schemes, dosimetry, and image perception. The realism of these phantoms is important for ensuring the accuracy of results and a wider range of applications. In this work, we consider a phantom that uses PVC films to simulate nodules inserted in the breast parenchyma, designed for classification of malignant and benign signals according to the BI-RADS® standard. The investigation includes analysis of radiographic density, mass shape, and the corresponding contour as outlined by experienced radiologists. The material was cut based on lesion margins found in 44 clinical cases, which were divided into circumscribed and spiculated structures. Tests were performed to check the specialists’ ability to distinguish the contours compared with actual cases, while shape accuracy was determined quantitatively with evaluation metrics. The results showed the applicability of the chosen material, which produced radiological image patterns very similar to actual ones.
Implementation and value of using a split-plot reader design in a study of digital breast tomosynthesis in a breast cancer assessment clinic
The rapid evolution of medical imaging has led to an increased number of recurrent trials, primarily to ensure that the efficacy of new imaging techniques is known. The cost of such trials in time and resources is usually high. Recruiting participants for a medium to large reader study is often very challenging, as the large number of cases to be read discourages involvement in the trial. We aim to evaluate the efficacy of Digital Breast Tomosynthesis (DBT) in a recall assessment clinic in Australia in a prospective multi-reader multi-case (MRMC) trial. Conducting such a study with the more commonly used fully crossed MRMC study design would require more cases and more cases read per reader, which was not viable in our setting. With the aim of performing a cost-effective yet statistically efficient clinical trial, we evaluated alternative study designs, particularly the split-plot MRMC study design, and compared and contrasted it with the more commonly used fully crossed MRMC study design. Our results suggest that the split-plot design could be very beneficial for medium to large clinical trials, and that the cost of conducting such trials can be greatly reduced without adversely affecting the variance of the study. We have also noted an inverse dependency between the number of required readers and the number of cases needed to achieve a target variance. This suggests that the split-plot design could also be very beneficial for studies that focus on cases that are hard to procure or readers that are hard to recruit. We believe that our results may be relevant to other researchers seeking to design medium to large clinical trials.
Evaluating RVUs as a measure of workload for use in assessing fatigue
Elizabeth A. Krupinski, Lea MacKinnon, Karl Hasselbach, et al.
Physician work is not well defined, and existing measures do not take into account all of the activities and tasks involved in interpreting cases. We observed 3 MSK radiologists reading 100 cases, recording the types of cases, whether residents/fellows were present, total time per case, time spent teaching, and time spent on interruptions. Residents/fellows were present for 65% of the cases, and on average cases took significantly longer to read when they were present. Overall, prior studies were accessed for 25% of the cases, more often for radiographs and CT than for MRI and US, and time per case was significantly longer when prior studies were included. Interruptions (calls to/from other clinicians, talking to technologists, discussing case protocols, and technical problems) took up 9.24% of the time, and all interruptions occurred during case review. We downloaded RVU data for the 3 radiologists and correlated them with the actual times per case; the overall correlation was 0.215. For a given RVU value, the actual time spent on a case varies, and radiologists spend more time per case than assigned RVUs account for. This underestimation contributes to expectations of increased workload, potentially leading to more cases being read in less time, which in turn increases fatigue and stress and could raise error rates. To better address fatigue and stress in the radiology department, we need to better understand the pressures radiologists face and possibly reevaluate the RVU system.
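The abstract reports an overall correlation of 0.215 between RVUs and actual time per case but does not name the coefficient. A minimal sketch, assuming Pearson's product-moment correlation over paired (RVU, minutes) values, one pair per case. Under that assumption, r = 0.215 means RVUs explain only about 4.6% of the variance in reading time (0.215² ≈ 0.046).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired sequences, e.g. RVU vs. minutes per case."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                    # variation in x
    syy = sum((b - my) ** 2 for b in y)                    # variation in y
    return sxy / math.sqrt(sxx * syy)
```

Perfectly proportional data such as `pearson_r([1, 2, 3], [2, 4, 6])` return 1.0; the weak value reported here reflects how widely reading time varies for a given RVU.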
Characterization of breast density in women from Lima, Peru
Fanny L. Casado, Susan Manrique, Jorge Guerrero, et al.
Data from GLOBOCAN show that around 4,000 Peruvian women are diagnosed with breast cancer every year, and 36% of these new cases present at advanced stages (III and IV). There is therefore an urgent need to strengthen current screening and early detection strategies. The American College of Radiology (ACR) breast density classification is a risk assessment and quality assurance tool in mammography that standardizes reporting and makes reports easier for non-radiologists to interpret. In our sample of Peruvian women, 45.3% had breast density classified as ACR II, 32.3% as ACR I, 19.7% as ACR III, and only 2.7% as ACR IV. In addition, premenopausal women were more likely than postmenopausal women to have breast density types ACR III and IV. These results resemble those of other populations in that most breast densities are classified as ACR I and II, but the distribution across all four ACR types is distinctive. Our results are consistent with epidemiological evidence suggesting that the Peruvian population may have a different stratification of risk based on its particular genetic and/or ethnic background. The present work will aid in developing novel strategies for the screening and early detection of breast malignancies.
Sparsity-driven ideal observer for computed medical imaging systems
The Bayesian ideal observer (IO) has been widely advocated to guide hardware optimization. However, except in special cases, computing the IO test statistic is burdensome and requires an appropriate stochastic object model that may be difficult to determine in practice. Modern reconstruction methods, referred to as sparse reconstruction methods, exploit the fact that objects of interest typically possess sparse representations and have proven to be highly effective at reconstructing images from under-sampled measurement data. Moreover, in computed imaging approaches that employ compressive sensing concepts, imaging hardware and image reconstruction are innately coupled technologies. In this work, we propose a sparsity-driven IO (SD-IO) to guide the optimization of data acquisition parameters for modern computed imaging systems. The SD-IO employs a variational Bayesian inference method to estimate the posterior distribution and calculates an approximate likelihood ratio analytically as its test statistic. Since it assumes knowledge of low-level statistical properties of the object that are related to sparsity, the SD-IO exploits the same statistical information regarding the object that is utilized by highly effective sparse image reconstruction methods. Preliminary simulation results are presented to demonstrate the feasibility of the SD-IO calculation.
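For reference, the IO test statistic is the likelihood ratio of the measured data g under the signal-present and signal-absent hypotheses, written below in standard notation (g for data, f for the object; the symbols are ours, not the paper's). The integral over f is where the stochastic object model pr(f | H_j) enters, and it is what makes exact IO computation burdensome. Per the abstract, the SD-IO instead approximates this ratio analytically from a variational Bayesian posterior built on sparsity assumptions; the details of that approximation are not given here.

```latex
\Lambda(\mathbf{g}) = \frac{\mathrm{pr}(\mathbf{g} \mid H_1)}{\mathrm{pr}(\mathbf{g} \mid H_0)},
\qquad
\mathrm{pr}(\mathbf{g} \mid H_j) = \int \mathrm{pr}(\mathbf{g} \mid \mathbf{f}, H_j)\,
  \mathrm{pr}(\mathbf{f} \mid H_j)\, d\mathbf{f},
\qquad
\text{decide } H_1 \text{ if } \Lambda(\mathbf{g}) > \tau
```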
Evaluation of angiogram visualization methods for fast and reliable aneurysm diagnosis
Žiga Lesar, Ciril Bohak, Matija Marolt
In this paper we present the results of an evaluation of different visualization methods for volumetric angiogram data: ray casting, marching cubes, and multi-level partition of unity implicits. Ray casting offers several options, namely isosurface extraction, maximum intensity projection, and alpha compositing, each producing fundamentally different results. Different visualization methods suit different needs, so this choice is crucial in diagnosis and decision-making processes. We also evaluate visual effects such as ambient occlusion, screen space ambient occlusion, and depth of field. Some visualization methods include transparency, so we address the question of whether this additional visual information is relevant. We employ transfer functions to map data values to color and transparency, allowing us to view or hide particular tissues. All the methods presented in this paper were developed using OpenCL, striving for real-time rendering and quality interaction. An evaluation was conducted to assess the suitability of the visualization methods. The results show the superiority of isosurface extraction with ambient occlusion effects. Visual effects may positively or negatively affect the perception of depth, motion, and relative positions in space.
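The three ray-casting options differ only in how the samples along each viewing ray are combined into one pixel. The per-ray sketch below is in plain Python for readability (the paper's renderer used OpenCL). The transfer function `vessel_tf` and its 0.3 threshold are hypothetical, chosen only to show how a transfer function hides or reveals tissue. Marching cubes and partition of unity implicits, which extract surfaces rather than compositing rays, are not shown.

```python
from typing import Callable, List, Optional, Tuple

RGBA = Tuple[float, float, float, float]


def mip(samples: List[float]) -> float:
    """Maximum intensity projection: the pixel takes the largest sample on the ray."""
    return max(samples)


def first_hit(samples: List[float], iso: float) -> Optional[int]:
    """Isosurface ray casting: index of the first sample at or above the iso value."""
    for i, s in enumerate(samples):
        if s >= iso:
            return i
    return None


def composite_front_to_back(samples: List[float],
                            transfer: Callable[[float], RGBA],
                            opacity_cutoff: float = 0.99) -> RGBA:
    """Front-to-back alpha compositing of samples ordered from the eye outward."""
    r = g = b = a = 0.0
    for s in samples:
        sr, sg, sb, sa = transfer(s)
        weight = (1.0 - a) * sa      # share of this sample not hidden by what lies in front
        r += weight * sr
        g += weight * sg
        b += weight * sb
        a += weight
        if a >= opacity_cutoff:      # early ray termination: the rest is invisible
            break
    return (r, g, b, a)


def vessel_tf(s: float) -> RGBA:
    """Hypothetical transfer function: hide values below 0.3, draw the rest semi-opaque red."""
    return (1.0, 0.0, 0.0, 0.8) if s >= 0.3 else (0.0, 0.0, 0.0, 0.0)
```

For a ray with samples `[0.1, 0.5, 0.9, 0.2]`, MIP returns 0.9, `first_hit(ray, 0.4)` returns index 1, and compositing with `vessel_tf` yields approximately `(0.96, 0.0, 0.0, 0.96)`. The second visible sample contributes only 0.16 because the first already covers 80% of the pixel, which is how transparency conveys depth ordering.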
Investigation of methods for calibration of classifier scores to probability of disease
Classifier scores in many diagnostic devices, such as computer-aided diagnosis systems, are usually on an arbitrary scale whose meaning is unclear. Calibrating classifier scores to a meaningful scale such as the probability of disease is potentially useful when such scores are used by a physician or by another algorithm. In this work, we investigated the properties of two methods for calibrating classifier scores to probability of disease. The first is a semiparametric method in which the likelihood ratio for each score is estimated from a semiparametric proper receiver operating characteristic model, and an estimate of the probability of disease is then obtained using Bayes' theorem, assuming a known prevalence of disease. The second is a nonparametric method based on isotonic regression via the pool-adjacent-violators algorithm. We employed the mean square error (MSE) and the Brier score to evaluate the two methods under two paradigms: (a) the dataset used to construct the score-to-probability mapping function is also used to calculate the performance metric (MSE or Brier score) (resubstitution); (b) an independent test dataset is used to calculate the performance metric (independent). Under our simulation conditions, the semiparametric method was superior to the nonparametric method at small to medium sample sizes, and the two methods appear to converge at large sample sizes. Our simulation results also indicate that the resubstitution bias may depend on the performance metric and, for the semiparametric method, that the resubstitution bias is small when a reasonable number of cases (> 100 cases per class) is available.
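A minimal sketch of the nonparametric method and of the Bayes step used by the semiparametric one. Pool-adjacent-violators fits a non-decreasing step function from score to observed disease fraction. The Brier score is the mean squared difference between predicted probability and the 0/1 truth. The semiparametric proper ROC fit that supplies the likelihood ratio is not reproduced; `prob_from_lr` simply assumes the likelihood ratio and prevalence are given. All function names are illustrative.

```python
import bisect


def pav_calibrate(scores, labels):
    """Isotonic regression of 0/1 labels on scores via pool-adjacent-violators.

    Returns a list of (upper_score, probability) steps with non-decreasing
    probability. Tied scores are pooled first so they share one probability.
    """
    groups = []                                   # [label_sum, count, score]
    for s, y in sorted(zip(scores, labels)):
        if groups and groups[-1][2] == s:
            groups[-1][0] += y
            groups[-1][1] += 1
        else:
            groups.append([float(y), 1, s])
    blocks = []                                   # pooled [label_sum, count, upper_score]
    for g in groups:
        blocks.append(list(g))
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            y_sum, n, upper = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += n
            blocks[-1][2] = upper
    return [(upper, y_sum / n) for y_sum, n, upper in blocks]


def apply_calibration(steps, score):
    """Probability of the first step whose upper score is >= score (clamped at the ends)."""
    uppers = [u for u, _ in steps]
    i = min(bisect.bisect_left(uppers, score), len(steps) - 1)
    return steps[i][1]


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 truth."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def prob_from_lr(lr, prevalence):
    """Bayes' theorem: P(disease | score) from the likelihood ratio and the prevalence."""
    return prevalence * lr / (prevalence * lr + (1.0 - prevalence))
```

With scores `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]` and labels `[0, 1, 0, 0, 1, 1]`, PAV pools the middle three scores into one step with probability 1/3, and the Brier score of the fitted values on those same six cases is 1/9. Scoring on the fitting data like this is the resubstitution paradigm described above, which can be optimistic compared with an independent test set.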
Image-domain sampling properties of the Hotelling Observer in CT using filtered back-projection
Adrian A. Sanchez, Emil Y. Sidky, Xiaochuan Pan
The Hotelling Observer (HO) [1], along with its channelized variants [2], has been proposed for image quality evaluation in x-ray CT [3,4]. In this work, we investigate HO performance for a detection task in parallel-beam FBP as a function of two image-domain sampling parameters, namely pixel size and field-of-view. These two parameters are of central importance in adapting HO methods to use in CT, since the large number of pixels in a single image makes direct computation of HO performance for a full image infeasible in most cases. Reduction of the number of image pixels and/or restriction of the image to a region-of-interest (ROI) has the potential to make direct computation of HO statistics feasible in CT, provided that the signal and noise properties lead to redundant information in some regions of the image. For small signals, we hypothesize that reduction of image pixel size and enlargement of the image field-of-view are approximately equivalent means of gaining additional information relevant to a detection task. The rationale for this hypothesis is that the backprojection operation in FBP introduces long range correlations so that, for small signals, the reconstructed signal outside of a small ROI is not linearly independent of the signal within the ROI. In this work, we perform a preliminary investigation of this hypothesis by sweeping these two sampling parameters and computing HO performance for a signal detection task.
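HO detectability is SNR² = Δsᵀ K⁻¹ Δs, where Δs is the mean signal difference image and K the image covariance. A minimal NumPy sketch, assuming flattened images and equal-size signal-present and signal-absent sample sets (function names are illustrative). It also shows why the pixel count matters: K has one row and one column per pixel, so a 512 × 512 image needs a 262,144 × 262,144 matrix (about 6.9 × 10^10 entries), and a sample covariance is only invertible when there are more images than pixels.

```python
import numpy as np


def hotelling_snr_known(delta_s, K):
    """HO detectability when the mean signal difference and covariance are known."""
    delta_s = np.asarray(delta_s, dtype=float)
    w = np.linalg.solve(np.asarray(K, dtype=float), delta_s)  # Hotelling template K^{-1} delta_s
    return float(np.sqrt(delta_s @ w))


def hotelling_snr_sampled(present, absent):
    """HO detectability estimated from sample images.

    present, absent: arrays of shape (n_images, n_pixels). The pooled sample
    covariance is only invertible when n_images comfortably exceeds n_pixels.
    """
    delta_s = present.mean(axis=0) - absent.mean(axis=0)
    K = 0.5 * (np.cov(present, rowvar=False) + np.cov(absent, rowvar=False))
    return hotelling_snr_known(delta_s, K)
```

Restricting the image to an ROI or coarsening the pixels shrinks K, which is exactly the trade-off the abstract proposes to study.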
The role of digital tomosynthesis in reducing the number of equivocal breast reportings
Maram Alakhras, Claudia Mello-Thoms, Mary Rickard, et al.
Purpose

To compare radiologists’ confidence in assessing breast cancer using combined digital mammography (DM) and digital breast tomosynthesis (DBT) compared with DM alone as a function of previous experience with DBT.

Materials and Methods

Institutional ethics approval was obtained. Twenty-three experienced breast radiologists reviewed 50 cases in two modes, DM alone and DM+DBT. Twenty-seven cases presented with breast cancer. Each radiologist was asked to detect breast lesions and give a confidence score of 1-5 (1- Normal, 2- Benign, 3- Equivocal, 4- Suspicious, 5- Malignant). Radiologists were divided into three sub-groups according to their prior experience with DBT (none, workshop experience, and clinical experience). Confidence scores using DM+DBT were compared with DM alone for all readers combined and for each DBT experience subgroup. Statistical analyses were carried out in GraphPad Prism 5 using the Wilcoxon signed-rank test, with statistical significance set at p < 0.05.

Results

Confidence scores were higher for true positive cancer cases using DM+DBT compared with DM alone for all readers (p < 0.0001). Confidence scores for normal cases were lower (indicating greater confidence in the non-cancer diagnosis) with DM+DBT compared with DM alone for all readers (p= 0.018) and readers with no prior DBT experience (p= 0.035).

Conclusion

Adding DBT to DM increases radiologists' confidence in scoring cancer and normal/benign cases. This finding appears to apply across radiologists with varying levels of DBT experience; however, further work involving greater numbers of radiologists is required.
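The study compared paired confidence scores (DM alone vs. DM+DBT) with the Wilcoxon signed-rank test. A minimal pure-Python sketch using the normal approximation, with average ranks and the standard tie correction, which matters for 1-5 scores where ties are common. The abstract does not state how scores were paired (for example per reader and case), so the function takes generic pairs. Statistical packages may instead report exact p-values for small samples; this is an illustration, not a reproduction of the Prism analysis.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test on paired scores (normal approximation).

    Zero differences are dropped, tied |differences| share their average rank,
    and the variance includes the usual tie correction.
    Returns (W_plus, z, p_two_sided).
    """
    d = [b - a for a, b in zip(before, after) if b != a]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        t = j - i + 1                      # size of this tie group
        avg_rank = (i + j) / 2.0 + 1.0     # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)   # sum of ranks of increases
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))                  # 2 * (1 - Phi(|z|))
    return w_plus, z, p
```

A positive z means scores tended to rise with DM+DBT, which is the direction the Results report for true positive cancer cases.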
Investigation on viewing direction dependent detectability in a reconstructed 3D volume for a cone beam CT system
In medical imaging systems, several factors (e.g., reconstruction algorithm, noise structure, target size, and contrast) affect detection performance and need to be considered for object detection. In a cone beam CT system, FDK reconstruction produces different noise structures in axial and coronal slices, so we analyzed the direction-dependent detectability of objects using the detection SNR of a channelized Hotelling observer. To calculate the detection SNR, a difference-of-Gaussian channel model with 10 channels was implemented, and 20 sphere objects with radii from 0.25 mm to 5 mm in 0.25 mm steps, reconstructed by the FDK algorithm, were used as object templates. Covariance matrices in the axial and coronal directions were estimated from 3000 reconstructed noise volumes, and the SNR ratio between the axial and coronal directions was then calculated. The corresponding 2D noise power spectra were also calculated. The results show that the SNR ratio decreases as the object size increases, falling below 1 when the object radius exceeds 2.5 mm. This is because axial noise power is higher in the high-frequency band while coronal noise power is higher in the low-frequency band; therefore, small objects are more detectable in coronal images and large objects in axial images. Our results indicate that using coronal slices is more beneficial for improving the detectability of small objects in a cone beam CT system.
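A minimal NumPy sketch of a channelized Hotelling observer with 10 difference-of-Gaussian (DoG) channels on a single 2D slice. Channel j is defined in the frequency domain as exp(-½(ρ/(Q σ_j))²) − exp(-½(ρ/σ_j)²) with σ_j = σ₀ αʲ. The default values σ₀ = 0.015, α = 1.4, Q = 1.67 (cycles/pixel) are illustrative assumptions, not the paper's settings, and the function names are hypothetical. To compare directions you would call `cho_snr` once with axial noise slices and once with coronal noise slices and take the ratio.

```python
import numpy as np


def dog_channels(n, n_channels=10, sigma0=0.015, alpha=1.4, q=1.67):
    """Spatial-domain DoG channel templates for an n x n slice, shape (n*n, n_channels).

    Default parameter values are assumptions for illustration only.
    """
    f = np.fft.fftfreq(n)                     # cycles/pixel
    fx, fy = np.meshgrid(f, f, indexing="ij")
    rho = np.hypot(fx, fy)
    cols = []
    for j in range(n_channels):
        s = sigma0 * alpha ** j
        c_freq = np.exp(-0.5 * (rho / (q * s)) ** 2) - np.exp(-0.5 * (rho / s) ** 2)
        # The channel is real and radially symmetric, so its inverse FFT is real;
        # fftshift moves the template's centre to pixel (n//2, n//2).
        c_space = np.real(np.fft.fftshift(np.fft.ifft2(c_freq)))
        cols.append(c_space.ravel())
    return np.stack(cols, axis=1)


def cho_snr(U, delta_s, noise_images):
    """Channelized Hotelling SNR.

    U: (n_pixels, n_channels) channel matrix; delta_s: mean signal image centred at
    (n//2, n//2); noise_images: (n_samples, n_pixels) noise-only realizations.
    """
    dv = U.T @ np.ravel(delta_s)              # mean channel-output difference
    v = noise_images @ U                      # channel outputs, one row per realization
    Kv = np.cov(v, rowvar=False)              # n_channels x n_channels covariance
    return float(np.sqrt(dv @ np.linalg.solve(Kv, dv)))
```

Channelization reduces the covariance to 10 × 10, which is why a few thousand noise volumes suffice here, whereas a full-pixel Hotelling observer would need far more.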
The effect of NPS calculation method on power-law coefficient estimation accuracy in breast texture modeling
In breast X-ray imaging, breast texture has been characterized by a radial noise power spectrum (NPS) with an inverse power-law shape, NPS(f) ∝ 1/f^β. The technique for estimating the radial power-law coefficient β is typically based on averaging 2-dimensional NPS calculated from partly overlapping image regions, each weighted by a suitable window function. Linear regression of the logarithm of the 1-dimensional (radial) NPS against the logarithm of radial frequency, over a selected frequency range, then gives β as the negative of the fitted slope. For each step in this process, several alternative techniques have been proposed. This paper investigates the effect of image region-of-interest (ROI) size, image data windowing, and alternative ways of determining radial frequency on the bias, variance, and root mean square error (RMSE) of the estimated β. The effects of these three factors were derived analytically and evaluated using synthetic images with known β varying from 1 to 4, covering the range of textures encountered in 2D and 3D breast X-ray imaging. Our results indicate that the RMSE in the estimated β is smallest when the ROIs are multiplied by an appropriate window function and either no radial averaging or radial averaging with small frequency bins is applied. The ROI size yielding the smallest RMSE depends on several factors and needs to be validated with numerical simulations. In clinical practice, however, the choice of ROI size may need to balance the RMSE inherent to the applied β estimation technique against the need to encompass the breast texture range so as to obtain an accurate NPS shape. With 2.56 cm × 2.56 cm ROIs, a 2D Hann window, and no radial frequency averaging, the RMSE in the estimated β ranges from 0.04 to 0.1 for true β values of 1 and 4.
Although many subtleties of real images were not modeled, in order to simplify the derivations, this work illustrates the limits of commonly used algorithm steps for estimating accurate β values.
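A minimal NumPy sketch of the estimation chain described above, assuming NPS(f) = α/f^β with α a hypothetical scale factor. It averages periodograms of half-overlapping, Hann-windowed ROIs, then regresses log NPS on log radial frequency over every 2D frequency sample in a chosen band (that is, no radial averaging). The ROI size (64 pixels), overlap, and fitting band (4/ROI to 0.25 cycles/pixel) are assumptions; a 2.56 cm ROI corresponds to 2.56 cm divided by the detector pixel pitch. `power_law_image` is one way to synthesize texture with known β for testing, not necessarily the authors' generator.

```python
import numpy as np


def power_law_image(n, beta, rng):
    """Synthetic n x n periodic texture whose NPS follows 1/f**beta (zero mean)."""
    f = np.fft.fftfreq(n)
    fr = np.hypot(*np.meshgrid(f, f, indexing="ij"))
    fr[0, 0] = 1.0                       # placeholder to avoid dividing by zero
    amplitude = fr ** (-beta / 2.0)      # amplitude ~ f**(-beta/2)  =>  power ~ f**(-beta)
    amplitude[0, 0] = 0.0                # no DC component
    spectrum = np.fft.fft2(rng.standard_normal((n, n))) * amplitude
    return np.real(np.fft.ifft2(spectrum))


def estimate_beta(image, roi=64, step=32, fmin=None, fmax=0.25, hann=True):
    """Estimate beta in NPS(f) = alpha / f**beta from a 2D image.

    Averages 2D periodograms of overlapping ROIs (optionally Hann-windowed), then
    fits log NPS against log radial frequency using every 2D frequency sample in
    [fmin, fmax] cycles/pixel, i.e. without radial averaging.
    """
    if fmin is None:
        fmin = 4.0 / roi                 # stay clear of the window's main lobe around DC
    win = np.outer(np.hanning(roi), np.hanning(roi)) if hann else np.ones((roi, roi))
    h, w = image.shape
    acc = np.zeros((roi, roi))
    count = 0
    for y in range(0, h - roi + 1, step):
        for x in range(0, w - roi + 1, step):
            patch = image[y:y + roi, x:x + roi]
            patch = (patch - patch.mean()) * win   # remove the ROI mean, then window
            acc += np.abs(np.fft.fft2(patch)) ** 2
            count += 1
    nps = acc / count
    f = np.fft.fftfreq(roi)
    fr = np.hypot(*np.meshgrid(f, f, indexing="ij"))
    mask = (fr >= fmin) & (fr <= fmax)
    slope, _intercept = np.polyfit(np.log(fr[mask]), np.log(nps[mask]), 1)
    return -slope                        # log NPS = log alpha - beta * log f
```

Setting `hann=False` or shrinking `fmin` toward DC is an easy way to see, on synthetic images, how windowing and the fitting band shift the estimate, which is the kind of effect the paper quantifies.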
Comparing prediction models for radiographic exposures
W. Ching, J. Robinson, M. F. McEntee
During radiographic exposures the milliampere-seconds (mAs), kilovoltage peak (kVp) and source-to-image distance can be adjusted for variations in patient thicknesses. Several exposure adjustment systems have been developed to assist with this selection. This study compares the accuracy of four systems to predict the required mAs for pelvic radiographs taken on a direct digital radiography system (DDR).

Sixty radiographs were obtained by adjusting mAs to compensate for varying combinations of source-to-image distance (SID), kVp, and patient thickness. The 25% rule, the DuPont™ Bit System, and the DigiBit system were compared to determine which of the three most accurately predicted the mAs required for an increase in patient thickness. Similarly, the 15% rule, the DuPont™ Bit System, and the DigiBit system were compared for an increase in kVp. The exposure index (EI) was used as an indication of exposure to the DDR. For each exposure combination, the mAs was adjusted until an EI of 1500 ± 2% was achieved.

The 25% rule was the most accurate at predicting the mAs required for an increase in patient thickness, with 53% of the mAs predictions correct. The DigiBit system was the most accurate at predicting mAs needed for changes in kVp, with 33% of predictions correct.

This study demonstrated that the 25% rule and DigiBit system were the most accurate predictors of mAs required for an increase in patient thickness and kVp respectively. The DigiBit system worked well in both scenarios as it is a single exposure adjustment system that considers a variety of exposure factors.
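Two of the adjustments involved here have simple textbook forms, sketched below: the inverse-square (exposure maintenance) rule for a change in SID, and the 15% rule, under which each 15% increase in kVp roughly doubles receptor exposure, so mAs is halved. The 25% rule for thickness and the DuPont™ Bit and DigiBit systems are not modeled because the abstract does not give their formulas. Function names and example values are hypothetical.

```python
def mas_for_new_sid(mas: float, sid_old: float, sid_new: float) -> float:
    """Inverse-square rule: mAs scales with the square of SID (any consistent unit)."""
    return mas * (sid_new / sid_old) ** 2


def apply_15_percent_rule(kvp: float, mas: float, steps: int = 1):
    """15% rule: each 15% kVp increase roughly doubles receptor exposure,
    so mAs is halved per step to keep exposure about the same. Returns (kVp, mAs)."""
    return kvp * 1.15 ** steps, mas / 2 ** steps
```

For example, moving from 100 cm to 180 cm SID at 10 mAs calls for about 32.4 mAs, and raising 70 kVp at 20 mAs by one 15% step gives about 80.5 kVp at 10 mAs. These are rules of thumb, which is consistent with the modest accuracy (33-53% correct predictions) reported above.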
Objective evaluation of reconstruction methods for quantitative SPECT imaging in the absence of ground truth
Abhinav K. Jha, Na Song, Brian Caffo, et al.
Quantitative single-photon emission computed tomography (SPECT) imaging is emerging as an important tool in clinical studies and biomedical research, so there is a need to optimize and evaluate the systems and algorithms being developed for quantitative SPECT imaging. An appropriate objective way to evaluate these systems is to compare their performance in the end task required in quantitative SPECT imaging, such as estimating the mean activity concentration in a volume of interest (VOI) in a patient image. This objective evaluation can be performed if the true value of the estimated parameter is known, i.e., if we have a gold standard. However, this gold standard is very rarely known in human studies, so no-gold-standard techniques are required to optimize and evaluate systems and algorithms. In this work, we developed a no-gold-standard technique to objectively evaluate reconstruction methods used in quantitative SPECT when the parameter to be estimated is the mean activity concentration in a VOI. We studied the performance of the technique with realistic simulated image data generated from an object database consisting of five phantom anatomies with all possible combinations of five sets of organ uptakes, where each anatomy consisted of eight different organ VOIs. The results indicate that the method provided accurate ranking of the reconstruction methods. We also demonstrated the application of consistency checks to test the no-gold-standard output.
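The abstract does not state the model behind its no-gold-standard technique. As background only, no-gold-standard (regression-without-truth) methods in this literature commonly assume that each reconstruction method m maps the unknown true value θ_p for patient p linearly, with Gaussian noise. The slopes, intercepts, and noise levels are then estimated by maximum likelihood, integrating out θ_p under an assumed prior, and methods are ranked by the noise-to-slope ratio. Treat the following as an assumed formulation, not necessarily the exact one used in this paper.

```latex
\hat{\theta}_{p,m} = a_m\,\theta_p + b_m + \varepsilon_{p,m},
\qquad \varepsilon_{p,m} \sim \mathcal{N}\!\left(0, \sigma_m^2\right),
\qquad p = 1,\dots,P,\; m = 1,\dots,M

\mathcal{L}\left(\{a_m, b_m, \sigma_m\}\right) = \prod_{p=1}^{P} \int
  \prod_{m=1}^{M} \mathcal{N}\!\left(\hat{\theta}_{p,m};\, a_m\theta_p + b_m,\, \sigma_m^2\right)
  \mathrm{pr}(\theta_p)\, d\theta_p,
\qquad \mathrm{NSR}_m = \frac{\sigma_m}{a_m}
```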
Experience in reading digital images may decrease observer accuracy in mammography
Mohammad A. Rawashdeh, Sarah J. Lewis, Warwick Lee, et al.
Rationale and Objectives: To identify parameters linked to higher levels of performance in screening mammography. In particular we explored whether experience in reading digital cases enhances radiologists’ performance.

Methods: A total of 60 cases, 20 containing cancers and 40 showing no abnormality, were presented to 129 breast readers. Each case comprised four images. Each reader was asked to identify and locate any malignancies using a 1-5 confidence scale. All images were displayed on 5MP monitors supported by radiology workstations with full image manipulation capabilities. Reader performance was assessed with the jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM). Details were obtained from each reader regarding their experience, qualifications, and breast reading activities. Spearman correlation and Mann-Whitney U tests were used for statistical analysis.

Results: Higher performance was positively related to the number of years professionally qualified (r = 0.18; P < 0.05), number of years reading breast images (r = 0.24; P < 0.01), number of mammography images read per year (r = 0.28; P < 0.001), and number of hours reading mammographic images per week (r = 0.19; P < 0.04). Unexpectedly, higher performance was inversely linked to previous experience with digital images (r = -0.17; P < 0.05); further analysis demonstrated that this finding was due to changes in specificity.

Conclusion: The suggestion from this study that readers with experience in reporting digital images may exhibit a reduced ability to correctly identify normal appearances requires further investigation. Higher performance is linked to the number of cases read per year.
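The correlations in the Results were computed with Spearman's method, which relates ranks rather than raw values and so suits reader characteristics such as years of experience against JAFROC FOM. A minimal pure-Python sketch computing Spearman's rho as Pearson's r on average ranks, which handles the ties common in such survey data. P-values and the Mann-Whitney U test are not shown, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r on average ranks (undefined if either input is all ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Because only ranks matter, any monotonic relationship gives rho = 1, for example `spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])`. A small negative value such as the reported -0.17 indicates only a weak tendency for the FOM to fall as digital experience rises.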