Proceedings Volume 6917

Medical Imaging 2008: Image Perception, Observer Performance, and Technology Assessment

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 3 April 2008
Contents: 8 Sessions, 52 Papers, 0 Presentations
Conference: Medical Imaging 2008
Volume Number: 6917

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 6917
  • Keynote and Eye Movement
  • Technology Assessment
  • ROC and Its Variants
  • Image Perception and Quality
  • Image Display
  • Model Observers
  • Poster Session
Front Matter: Volume 6917
Front Matter: Volume 6917
This PDF file contains the front matter associated with SPIE Proceedings Volume 6917, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.
Keynote and Eye Movement
Performance changes in lung nodule detection following perceptual feedback of eye movements
T. Donovan, D. J. Manning, T. Crawford
In a previously reported study we demonstrated that expert performance can decline following perceptual feedback of eye movements in the relatively simple radiological task of wrist fracture detection. This study was carried out to determine if the same effect could be observed using a more complicated radiological task of identifying lung nodules on chest radiographs. Four groups (n=10 in each group) of observers with different levels of expertise were tested. The groups were naïve observers, level 1 radiography students, level 2 radiography students and experts. Feedback was presented to the observers in the form of their scan paths and fixations. Half the observers had feedback and half had no perceptual feedback. JAFROC analysis was used to measure observer performance. A repeated measures ANOVA was carried out. There was no significant difference between the pre and post "no feedback" conditions. There was a significant difference between the pre and post "feedback" conditions, with a significant improvement following feedback (F(1,16) = 6.6, p = 0.021). Overall, the mean percentage improvement was small (3.3%), with most of the improvement due to the level 1 group, where the percentage increase in the figure of merit (FOM) was 8.4%, and this was significant (p<0.05). Eye tracking metrics indicate that the expert and naïve observers were less affected by feedback or a second look, whereas there were mixed results between the level 1 and level 2 students, possibly reflecting the different search strategies used. Perceptual feedback may be beneficial for those early in their training.
How much agreement is there in the visual search strategy of experts reading mammograms?
Previously we have shown that the eyes of expert breast imagers are attracted to the location of a malignant mass in a mammogram in less than 2 seconds after image onset. Moreover, the longer they take to visually fixate the location of the mass, the less likely it is that they will report it. We conjectured that this behavior was due to the formation of the initial hypothesis about the image (i.e., 'normal' - no lesions to report, or 'abnormal' - possible lesions to report). This initial hypothesis is formed as a result of a difference template between the experts' expectations of the image and the actual image. Hence, when the image is displayed, the expert detects the areas that do not correspond to their 'a priori expectation', and these areas get assigned weights according to the magnitude of the perturbation. The radiologist then uses eye movements to guide the high resolution fovea to each of these locations, in order to resolve each perturbation. To accomplish this task successfully the radiologist uses not only the local features in the area but also lateral comparisons with selected background locations, and this comprises the radiologist's visual search strategy. Eye-position tracking studies seem to suggest that no two radiologists search the breast parenchyma alike, which makes one wonder whether successful search models can be developed. In this study we show that there is more to the experts' search strategy than meets the eye.
Technology Assessment
Comprehensive evaluation of an image segmentation technique for measuring tumor volume from CT images
Xiang Deng, Haibin Huang, Lei Zhu, et al.
Comprehensive quantitative evaluation of tumor segmentation techniques on large scale clinical data sets is crucial for routine clinical use of CT based tumor volumetry for cancer diagnosis and treatment response evaluation. In this paper, we present a systematic validation study of a semi-automatic image segmentation technique for measuring tumor volume from CT images. The segmentation algorithm was tested using clinical data of 200 tumors in 107 patients with liver, lung, lymphoma and other types of cancer. The performance was evaluated using both accuracy and reproducibility. The accuracy was assessed using 7 commonly used metrics that can provide complementary information regarding the quality of the segmentation results. The reproducibility was measured by the variation of the volume measurements from 10 independent segmentations. The effects of disease type, lesion size and slice thickness of image data on the accuracy measures were also analyzed. Our results demonstrate that the tumor segmentation algorithm showed good correlation with ground truth for all four lesion types (r = 0.97, 0.99, 0.97, 0.98, p < 0.0001 for liver, lung, lymphoma and other, respectively). The segmentation algorithm can produce relatively reproducible volume measurements on all lesion types (coefficient of variation in the range of 10-20%). Our results show that the algorithm is insensitive to lesion size (coefficient of determination close to 0) and slice thickness of image data (p > 0.90). The validation framework used in this study has the potential to facilitate the development of new tumor segmentation algorithms and assist large scale evaluation of segmentation techniques for other clinical applications.
Performance study of a globally elastic locally rigid matching algorithm for follow-up chest CT
Rafael Wiemker, Bartjan de Hoop, Sven Kabus, et al.
A real-time matching algorithm for follow-up chest CT scans can significantly reduce the workload on radiologists by automatically finding the corresponding location in the first or second scan, respectively. The objective of this study was to assess the accuracy of a fast and versatile single-point registration algorithm for thoracic CT scans. The matching algorithm is based on automatic lung segmentations in both CT scans, individually for left and right lung. Whenever the user clicks on an arbitrary structure in the lung, the coarse position of the corresponding point in the other scan is identified by comparing the volume percentiles of the lungs. Then the position is refined by optimizing the gray value cross-correlation of a local volume of interest. The algorithm is able to register any structure in or near the lungs, but is of clinical interest in particular with respect to lung nodules and airways. For validation, CT scan pairs were used in which the patients were scanned twice in one session, using low-dose non-contrast-enhanced chest CT scans (0.75 mm collimation). Between these scans, patients got off and on the table to simulate a follow-up scan. 291 nodules were evaluated. Average nodule diameter was 9.5 mm (range 2.9 - 74.1 mm). Automatic registration succeeded in 95.2% of all cases (277 / 291). In successfully registered nodules, average registration consistency was 1.1 mm. The real-time matching proved to be an accurate and useful tool for radiologists evaluating follow-up chest CT scans to assess possible nodule growth.
PET/CT detectability and classification of simulated pulmonary lesions using an SUV correction scheme
Andrew N. Morrow, Kenneth L. Matthews II, Steven Bujenovic M.D.
Positron emission tomography (PET) and computed tomography (CT) together are a powerful diagnostic tool, but imperfect image quality allows false positive and false negative diagnoses to be made by any observer despite experience and training. This work investigates the effect of PET acquisition mode, reconstruction method and a standardized uptake value (SUV) correction scheme on the classification of lesions as benign or malignant in PET/CT images, in an anthropomorphic phantom. The scheme accounts for partial volume effect (PVE) and PET resolution. The observer draws a region of interest (ROI) around the lesion using the CT dataset. A simulated homogeneous PET lesion of the same shape as the drawn ROI is blurred with the point spread function (PSF) of the PET scanner to estimate the PVE, providing a scaling factor to produce a corrected SUV. Computer simulations showed that the accuracy of the corrected PET values depends on variations in the CT-drawn boundary and the position of the lesion with respect to the PET image matrix, especially for smaller lesions. Correction accuracy was affected slightly by mismatch of the simulation PSF and the actual scanner PSF. The receiver operating characteristic (ROC) study resulted in several observations. Using observer drawn ROIs, scaled tumor-background ratios (TBRs) more accurately represented actual TBRs than unscaled TBRs. For the PET images, 3D OSEM outperformed 2D OSEM, 3D OSEM outperformed 3D FBP, and 2D OSEM outperformed 2D FBP. The correction scheme significantly increased sensitivity and slightly increased accuracy for all acquisition and reconstruction modes at the cost of a small decrease in specificity.
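The correction idea in this abstract (blur a uniform lesion of the ROI's shape with the scanner PSF, then use the result as a scaling factor) can be illustrated with a minimal one-dimensional sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian PSF, the 1D profile, the use of the mean blurred value inside the ROI as the scaling factor, the neglect of background spill-in, and all function names.

```python
import math

def gaussian_kernel(fwhm_mm, pixel_mm):
    """Normalized 1D Gaussian PSF sampled on the pixel grid (assumed PSF shape)."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / pixel_mm
    radius = max(1, int(math.ceil(4 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(profile, kernel):
    """Convolve a 1D profile with the kernel, treating outside values as zero."""
    r = len(kernel) // 2
    out = []
    for i in range(len(profile)):
        acc = 0.0
        for k, wk in enumerate(kernel):
            j = i + k - r
            if 0 <= j < len(profile):
                acc += wk * profile[j]
        out.append(acc)
    return out

def recovery_coefficient(roi_mask, kernel):
    """Mean value inside the ROI after blurring a unit-activity lesion of that shape."""
    blurred = blur([1.0 if m else 0.0 for m in roi_mask], kernel)
    inside = [b for b, m in zip(blurred, roi_mask) if m]
    return sum(inside) / len(inside)

def corrected_suv(measured_suv, roi_mask, kernel):
    """Scale the measured SUV by the estimated partial volume loss."""
    return measured_suv / recovery_coefficient(roi_mask, kernel)

# Hypothetical example: 2 mm pixels, 6 mm FWHM, small (3 px) and large (21 px) lesions.
kernel = gaussian_kernel(fwhm_mm=6.0, pixel_mm=2.0)
small_roi = [19 <= i <= 21 for i in range(41)]
large_roi = [10 <= i <= 30 for i in range(41)]
rc_small = recovery_coefficient(small_roi, kernel)
rc_large = recovery_coefficient(large_roi, kernel)
```

In this sketch the small lesion has the lower recovery coefficient and therefore receives the larger upward correction, which is the qualitative behavior one expects from a partial volume correction.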
Comparing signal-based and case-based methodologies for CAD assessment in a detection task
We are investigating the potential for differences in study conclusions when assessing the estimated impact of a computer-aided detection (CAD) system on readers' performance using a signal-based performance analysis derived from Free-response Receiver Operating Characteristics (FROC) versus a case-based performance analysis derived from Receiver Operating Characteristics (ROC) analysis. To consider this question, we utilized reader data from a CAD assessment study based on 100 mammographic background images to which fixed-size and fixed-intensity Gaussian signals were added, generating a low- and high-intensity set. The study thus allowed CAD assessment in two situations: when CAD sensitivity was 1) superior to or 2) equivalent to or lower than that of the average reader. Seven readers were asked to review each set using CAD in both second-reader and concurrent modes. Signal-based detection results were analyzed using the area under the FROC curve below 0.5 false positives per image. Case-based decision results were analyzed using the area under the parametric ROC curve. The results were consistent between the signal-based and case-based analyses for the low-intensity set, suggesting that CAD in both reading modes can increase reader signal-based detection and case-based decision accuracies. For the high-intensity set, the signal-based and case-based analyses suggested different conclusions regarding the utility of CAD, although neither analysis resulted in statistical significance.
Performance evaluation of image processing algorithms in digital mammography
Federica Zanca, Chantal Van Ongeval M.D., Jurgen Jacobs, et al.
The purpose of the study is to evaluate the performance of different image processing algorithms in terms of representation of microcalcification clusters in digital mammograms. Clusters were simulated in clinical raw ("for processing") images. The entire dataset of images consisted of 200 normal mammograms, selected out of our clinical routine cases and acquired with a Siemens Novation DR system. In 100 of the normal images a total of 142 clusters were simulated; the remaining 100 normal mammograms served as true negative input cases. Both abnormal and normal images were processed with 5 commercially available processing algorithms: Siemens OpView1 and Siemens OpView2, Agfa Musica1, Sectra Mamea AB Sigmoid and IMS Raffaello Mammo 1.2. Five observers were asked to locate and score the cluster(s) in each image, by means of a dedicated software tool. Observer performance was assessed using the JAFROC Figure of Merit. FROC curves, fitted using the IDCA method, have also been calculated. JAFROC analysis revealed significant differences among the image processing algorithms in the detection of microcalcification clusters (p=0.0000369). Calculated average Figures of Merit are: 0.758 for Siemens OpView2, 0.747 for IMS Processing 1.2, 0.736 for Agfa Musica1 processing, 0.706 for Sectra Mamea AB Sigmoid processing and 0.703 for Siemens OpView1. This study is a first step towards a quantitative assessment of image processing in terms of cluster detection in clinical mammograms. Although we showed a significant difference among the image processing algorithms, this method does not on its own allow for a global performance ranking of the investigated algorithms.
ROC and Its Variants
A novel partial area index of receiver operating characteristic (ROC) curve
Appropriate validation of segmentation algorithms is important for clinical acceptance of those methods. Receiver operating characteristic (ROC) analysis provides the most comprehensive description of the accuracy of image segmentation. The total area under an ROC curve (AUC) is widely used as a performance index in ROC analysis. However, a large part of the ROC curve is in the clinically irrelevant range, and the total area can be misleading in some clinical situations. In this paper, we propose a partial area index of ROC curves, which measures the segmentation performance in a clinically relevant range decided by learning from subjective ratings. The boundary of the range is defined by a linear cost function of false positive fraction (FPF) and true positive fraction (TPF). The cost factors of FPF and TPF are learned by maximizing Kendall's coefficient of concordance (KCC) between the partial areas and the subjective ratings. Experimental results show that our method gives a large cost factor on FPF and a small cost factor on TPF on a tumor data set. This is consistent with the fact that a large FPF is generally harder to accept in tumor segmentation. Our method is able to determine the optimal range for the partial area index of ROC analysis, and this partial area index is more appropriate than AUC for evaluating segmentation performance.
Investigation of methods for analyzing location specific observer performance data
We examined the statistical powers of three methods for analyzing FROC mark-rating data, namely ROC, JAFROC and IDCA. Two classes of observers were simulated: a designer-level CAD algorithm and a human observer. A search-model based simulator was used with the average numbers of false positives per image ranging from 0.21 for the human observer to 10 for CAD. Model parameters were chosen to yield 80% and 85% areas under the predicted ROC curves for both classes of observers and inter-image and inter-modality correlations of 0.1, 0.5 and 0.9 were investigated. The area under the FROC curve up to abscissa α (ranging from 0.18 to 6.7) was used as the IDCA figure-of-merit; the other methods used their well-known figures of merit. For IDCA power increased with α so it should be chosen as large as possible consistent with the need for overlap of the two FROC curves in the x-direction. For CAD the IDCA method yielded the highest statistical power. Surprisingly, JAFROC yielded the highest statistical power for human observers, even greater than IDCA which, unlike JAFROC, uses all the marks. The largest difference occurred for conservative reporting styles and high data correlation: e.g., 0.3453 for JAFROC vs. 0.2672 for IDCA. One reason is that unlike IDCA, the JAFROC figure of merit is sensitive to unmarked normal images and unmarked lesions. In all cases the ROC method yielded the least statistical power and entailed a substantial statistical power penalty (e.g., 24% for ROC vs. 41% for JAFROC). For human observers JAFROC should be used and for designer-level CAD data IDCA should be used and use of the ROC method for localization studies is discouraged.
Comparing agreement measures
Agreement is estimated by comparing correlated/paired scores (e.g. the scores from two doctors reading the same set of images), using measures such as the correlation coefficient and measures of concordance. Some variance estimation techniques for these measures are also available in the literature. In this work, we compared four agreement measures: the widely used Pearson's product moment correlation coefficient, Kendall's tau, and two measures that are generalizations of AUC, the area under the receiver operating characteristic (ROC) curve. The generalization allows for ordinal truth that is polytomous (multi-state) or even continuous instead of just binary, and thus AUC is a special case. We investigate how these measures behave in a multi-reader multi-case (MRMC) simulation experiment as we change the intrinsic correlation and number of rating levels. We also investigate a few variance estimation techniques for these measures that are available in the literature. These agreement measures will help investigators developing model observers to compare their models against a human on a case-by-case basis instead of with a summary figure of merit that requires and is limited by binary truth, like AUC. The model observer AUC can equal the human observer AUC, while making very different decisions on a case-by-case basis.
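The closing point of this abstract, that two observers can have equal AUC while disagreeing case by case, can be shown with a small toy example. The sketch below uses textbook definitions of Kendall's tau (the tau-a variant) and the empirical Mann-Whitney AUC; the scores are invented for illustration and the generalized AUC measures studied in the paper are not reproduced here.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1      # pair ordered the same way by both raters
            elif prod < 0:
                s -= 1      # pair ordered in opposite ways
    return s / (n * (n - 1) / 2)

def empirical_auc(scores, truth):
    """Mann-Whitney estimate of AUC for binary truth (ties count one half)."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented scores for four cases: two normal (0) and two abnormal (1).
truth = [0, 0, 1, 1]
human = [1, 2, 3, 4]
model = [2, 1, 4, 3]

auc_human = empirical_auc(human, truth)
auc_model = empirical_auc(model, truth)
agreement = kendall_tau_a(human, model)
```

Both observers separate the classes perfectly, so their AUCs are identical, yet they rank the cases differently within each class and the case-by-case agreement is only one third.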
Three-class ROC analysis: a sequential decision model developed for the diagnostic task rest-stress myocardial perfusion SPECT imaging
Xin He, Eric C. Frey
Previously we have developed a decision model for three-class ROC analysis where classification among the three classes is made simultaneously, i.e., with a single decision. In this paper, an alternative sequential decision model was developed for the specific three-class diagnostic procedure of rest-stress myocardial perfusion SPECT (MPS) imaging. This sequential decision model was developed based on the fact that sometimes this diagnostic task is performed using a two-step process. First, the stress (99mTc) image is read to determine whether a patient is normal or abnormal based on the presence of a defect in the stress image. If a defect is found, the rest (201Tl) image is then read to determine whether this is a reversible defect or a fixed defect based on the presence of a defect on the rest image. In fact, in some MPS protocols where sequential stress/rest imaging is performed, the rest imaging is not performed if there is no defect in the stress image. Therefore, the three-class task is decomposed into a sequence of two two-class tasks. For this task we determined, by maximizing the expected utility of both steps of the decision process, that log likelihood ratios were the optimal decision variables and provide the optimal ROC surface under the assumption that incorrect decisions have equal utilities under the same hypothesis. The properties of the sequential decision model were then studied. We found that the sequential decision model shares most of the features of a 2-class ROC curve. While this model was developed in the context of rest-stress MPS, it may have applications to other two-step diagnostic tasks.
Optimality of a utility-based performance metric for ROC analysis
Darrin C. Edwards, Charles E. Metz
We previously introduced a utility-based ROC performance metric, the "surface-averaged expected cost" (SAEC), to address difficulties which arise in generalizing the well-known area under the ROC curve (AUC) to classification tasks with more than two classes. In a two-class classification task, the SAEC can be shown explicitly to be twice the area above the conventional ROC curve (1-AUC) divided by the arclength along the ROC curve. In the present work, we show that for a variety of two-class tasks under the binormal model, the SAEC obtained for the proper decision variable (the likelihood ratio of the latent decision variable) is less than that obtained for the conventional decision variable (i.e., using the latent decision variable directly). We also justify this result using a readily derived property of the arclength along the ROC curve under a given data model. Numerical studies as well as theoretical analysis suggest that the behavior of the SAEC is consistent with that of the AUC performance metric, in the sense that the optimal value of this quantity is achieved by the ideal observer.
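The two-class relation stated in this abstract, SAEC equal to twice the area above the ROC curve divided by the arclength along the curve, can be evaluated numerically. The sketch below is only an illustration under stated assumptions: it uses the conventional binormal ROC curve with hypothetical parameters a and b, approximates area and arclength with a polyline, and does not implement the proper (likelihood ratio) decision variable discussed in the paper. The function names are invented.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def binormal_roc(a, b, n=2000):
    """Sample the conventional binormal ROC curve TPF = Phi(a + b * Phi^-1(FPF))."""
    points = [(0.0, 0.0)]
    for i in range(1, n):
        fpf = i / n
        tpf = _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))
        points.append((fpf, tpf))
    points.append((1.0, 1.0))
    return points

def saec_two_class(points):
    """Two-class SAEC: 2 * (1 - AUC) / arclength, both from a polyline approximation."""
    auc = 0.0
    arclength = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += 0.5 * (y0 + y1) * (x1 - x0)        # trapezoidal area under the curve
        arclength += math.hypot(x1 - x0, y1 - y0)  # length of this segment
    return 2.0 * (1.0 - auc) / arclength

saec_chance = saec_two_class(binormal_roc(a=0.0, b=1.0))   # chance diagonal
saec_better = saec_two_class(binormal_roc(a=2.0, b=1.0))   # well separated classes
```

For the chance diagonal the area above the curve is 0.5 and the arclength is the square root of 2, so the sketch returns 1/sqrt(2); a better-performing curve gives a smaller value, consistent with SAEC being a cost.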
Image Perception and Quality
Individualised training to address variability of radiologists' performance
Shanghua Sun, Paul Taylor, Louise Wilkinson, et al.
Computer-based tools are increasingly used for training and the continuing professional development of radiologists. We propose an adaptive training system to support individualised learning in mammography, based on a set of real cases, which are annotated with educational content by experienced breast radiologists. The system has knowledge of the strengths and weaknesses of each radiologist's performance: each radiologist is assessed to compute a profile showing how they perform on different sets of cases, classified by type of abnormality, breast density, and perceptual difficulty. We also assess variability in cognitive aspects of image perception, classifying errors made by radiologists as errors of search, recognition or decision. This is a novel element in our approach. The profile is used to select cases to present to the radiologist. The intelligent and flexible presentation of these cases distinguishes our system from existing training tools. The training cases are organised and indexed by an ontology we have developed for breast radiologist training, which is consistent with the radiologists' profile. Hence, the training system is able to select appropriate cases to compose an individualised training path, addressing the variability of the radiologists' performance. A substantial part of the system, the ontology, has been evaluated on a large number of cases, and the training system is under implementation for further evaluation.
Image perception by expert readers as a function of patient skin entrance dose levels in digital radiography
T. Lehnert, H. Korkusuz, F. Khan, et al.
In this study, image quality was based on required clinical criteria, in order to investigate to what degree entrance dose could be lowered and what kind of added filtration can be used without impinging on radiologist confidence levels in diagnosing. Images were taken of extremities from a cadaver using stepwise decreasing dose levels and variation of added filtration (no filtration, aluminum, aluminum/copper) under digital projection radiography (Kodak DirectView DR7500). The starting point dose level for all body parts imaged was the current x-ray technique. Two experienced and two resident radiologists were presented with the images in a blinded fashion and rated each with an image quality score from 1 to 9, with 9 indicating very satisfied and 1 indicating very unsatisfied (loss of diagnostic value). The readers were not aware of which dose level and added filtration corresponded to which image. Dose levels considered were 100%, 75%, 50% and 25% of the normal and customary x-ray techniques used for the particular body part and projection. Images were reviewed on a clinical diagnostic workstation with no time limits imposed. Readers were also able to change the image presentation by adjusting the window width and level. Without added filtration, the mean image quality score was 6.3 (dose level 100%), 6.2 (dose level 75%), 5.3 (dose level 50%) and 4.4 (dose level 25%). With added aluminum filtration, the mean image quality score was 6.3 (dose level 100%), 6.0 (dose level 75%), 5.1 (dose level 50%) and 4.2 (dose level 25%). With aluminum/copper filtration, the mean image quality score was 6.0 (dose level 100%), 6.1 (dose level 75%), 5.0 (dose level 50%) and 3.8 (dose level 25%). Regardless of the added filtration, a differentiation between dose levels 100% and 75% was possible in 38.9%, between dose levels 75% and 50% in 66.7%, and between dose levels 50% and 25% in 70.0% of the cases.
It is possible, in the case of extremities, to lower entrance doses to 75% of the normal value (a 25% reduction in dose), with simultaneous use of added aluminum or aluminum/copper filtration, without compromising the required diagnostic value.
Toward perceptually driven image retrieval in mammography: a pilot observer study to assess visual similarity of masses
Maciej A. Mazurowski, Brian P. Harrawood, Jacek M. Zurada, et al.
Development of a fully automated system retrieving visually similar images is a task that could be helpful as the basis of a computer-assisted diagnostic (CADx) tool in mammography. Our study aims at a better understanding of the concept of visual similarity as it pertains to mammographic masses. Such understanding is a necessary step for building effective perceptually-driven image retrieval systems. In our study we deconstruct the concept of visual mass similarity into three components: similarity of size, similarity of shape, and similarity of margin. We present the results of a pilot observer study to determine the importance of each component when human observers assess the overall similarity of two masses. Seven observers of various expertise participated in the study: 1 highly experienced mammographer, 1 expert in visual perception, 3 CAD researchers, and 2 novices. Each observer assessed the similarity between 100 pairs of mammographic regions of interest (ROIs) depicting benign and malignant masses. Visual similarity was assessed in four categories (shape, size, margin, overall) using a web-based interface and a 10-point rating scale. Preliminary analysis of the results suggests the following. First, there is moderate agreement between observers in similarity assessment for all mentioned categories. Second, all components substantially affect the overall similarity rating, with mass margin having the highest significance and mass size having the lowest significance relative to the other factors. These findings varied somewhat based on the observer's expertise. Third, some low-level morphological features extracted from the masses can be used to mimic the overall visual similarity ratings and their specific components.
Existence and perception of textural information predictive of atypical nevi: preliminary insights
Paul Wighton, Tim K. Lee, David McLean M.D., et al.
Texture is known to predict atypicality in pigmented skin lesions. This paper describes an experiment that was conducted to determine 1) if this textural information is present in the center of skin lesions, and 2) how color affects the perception of this information. Images of pigmented skin lesions from three categories were shown to subjects in such a way that only textural information could be perceived; other factors known to predict atypicality were removed or held constant. These images were shown in both color and grayscale. Each subject assigned a score of atypicality to each image. The experiment was conducted on 5 subjects of varying backgrounds, including one expert. Each subject's accuracy under each modality was measured by calculating the volume under a 3-way ROC surface. The modalities were compared using the Dorfman-Berbaum-Metz (DBM) method of ROC analysis, giving a p-value of 0.8611. Therefore the null hypothesis that there is no difference between the predictive power of the modalities cannot be rejected. Also, a two one-sided test of equivalence (TOST) was performed giving a p-value pair of < 0.01; strong evidence that the textural information is independent of color. Additionally, the subjects' accuracies were compared to a set of random readers using the DBM and TOST methods. This was done for accuracies under the color modality, the grayscale modality and both modalities simultaneously. The results (all p-values < 0.001) confirm the existence of textural information predictive of atypia in the center of pigmented skin lesions.
Mass detection on mammograms: signal variations and performance changes for human and model observers
C. Castella, K. Kinkel M.D., M. P. Eckstein, et al.
We studied the influence of signal variability on human and model observer performances for a detection task with mammographic backgrounds and computer generated clustered lumpy backgrounds (CLB). We used synthetic yet realistic masses and backgrounds that have been validated by radiologists during previous studies, ensuring conditions close to the clinical situation. Four trained non-physician observers participated in two-alternative forced-choice (2-AFC) experiments. They were asked to detect synthetic masses superimposed on real mammographic backgrounds or CLB. Separate experiments were conducted with sets of benign and malignant masses. Results under the signal-known-exactly (SKE) paradigm were compared with signal-known-statistically (SKS) experiments. In the latter case, the signal was chosen randomly for each of the 1,400 2-AFC trials (image pairs) among a set of 50 masses with similar dimensions, and the observers did not know which signal was present. Human observers' results were then compared with model observers (channelized Hotelling with Difference-of-Gaussian and Gabor channels) in the same experimental conditions. Results show that the performance of the human observers does not differ significantly when benign masses are superimposed on real images or on CLB with locally matched gray level mean and standard deviation. For both benign and malignant masses, the performance does not differ significantly between SKE and SKS experiments, when the signals' dimensions do not vary throughout the experiment. However, there is a performance drop when the SKS signals' dimensions vary from 5.5 to 9.5 mm in the same experiment. Noise level in the model observers can be adjusted to reproduce human observers' proportion of correct answers in the 2-AFC task within 5% accuracy for most conditions.
Weighted perceptual difference model (case-PDM) for MR image quality evaluation
The perceptual difference model (Case-PDM) is being used to quantify image quality of fast, parallel MR acquisitions and reconstruction algorithms by comparing to slower, full k-space, high quality reference images. To date, most perceptual difference models average a single scalar image quality metric over a large region of interest. In this paper, we create an alternative metric weighted to image processing features. Spatial filters were applied to the reference image to create edge and flat region images, then weighted and aggregated to create "structural" images which in turn spatially weighted the perceptual difference maps. We optimized the scale of the spatial filters and weighting scheme with an exhaustive search so as to improve the linear correlation coefficient between human ratings and weighted Case-PDM, across a large set of MR reconstruction test images of varying quality. Human ratings were obtained from a modified Double Stimulus Continuous Quality Scale experiment. For 5 different images (3 different brain, 1 cardiac, and 1 phantom images), r values [weighted PDM, average PDM] were improved ([0.96, 0.94], [0.93, 0.91], [0.97, 0.95], [0.97, 0.91], [0.96, 0.95]) in all cases. The method is robust across subjects and anatomy; that is, scores maintain a high correlation with human ratings even if the test dataset is different from the training dataset.
Image Display
Image splitting techniques for a dual layer high dynamic range LCD display
Liquid crystal displays (LCD) are replacing analog film in radiology and make it possible to reduce diagnosis times. Their typical dynamic range, however, can be too low for some applications, and their poor ability to reproduce low luminance areas represents a critical drawback. The black level of an LCD can be drastically improved by stacking two liquid crystal panels in series. In this way the global transmittance is the pointwise product of the transmittances of the two panels and the theoretical dynamic range is squared. Such a high dynamic range (HDR) display also permits the reproduction of a larger number of gray levels, increasing the bit depth of the device. The two panels, however, are placed a small distance from each other due to mechanical constraints, and this introduces a parallax error when the display is observed off-axis. A complex, spatially-adaptive algorithm is therefore necessary to generate the images used to drive the two panels. In this paper, we describe the characteristics of a prototype dual-layer HDR display and discuss the issues involved in the image splitting algorithms. We propose some solutions and analyze their performance, giving a measure of the capabilities and limitations of the device.
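The pointwise-product relation can be sketched with the simplest possible split, a pixelwise square root sent to both panels. This is only an assumed baseline for illustration: it ignores parallax and is not the spatially-adaptive algorithm the paper proposes. The function names and the 1000:1 panel contrast ratio are hypothetical.

```python
from math import sqrt


def sqrt_split(target):
    """Split a target transmittance image (values in [0, 1]) into two panel images.

    Each panel receives the pixelwise square root of the target, so the
    pointwise product of the two panels reproduces the target. Parallax
    between the layers is ignored in this sketch.
    """
    back = [[sqrt(t) for t in row] for row in target]
    front = [row[:] for row in back]
    return back, front


def combine(back, front):
    """Global transmittance of two panels in series: the pointwise product."""
    return [[b * f for b, f in zip(rb, rf)] for rb, rf in zip(back, front)]


def stacked_dynamic_range(panel_contrast_ratio: float) -> float:
    """Theoretical dynamic range of two identical panels stacked in series."""
    return panel_contrast_ratio ** 2


# Hypothetical 2x3 target image and a hypothetical 1000:1 panel.
target = [[0.0, 0.01, 0.25], [0.5, 0.81, 1.0]]
back, front = sqrt_split(target)
recombined = combine(back, front)
theoretical_range = stacked_dynamic_range(1000.0)
```

A practical split would typically blur one layer and compensate on the other so that off-axis viewing does not misalign fine detail, which is where the complexity mentioned in the abstract arises.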
Detection of low contrast test patterns on an LCD with different luminance and illuminance settings
The DICOM part 14 grayscale standard display function provides one way of harmonizing image appearance under different monitor luminance settings. This function is based on ideal observer conditions where the eye is always adapted to the target luminance and thereby also at peak contrast sensitivity. Clinical workstations are however often exposed to variations in ambient light due to a sub-optimal reading room light environment. Also, clinical images are inhomogeneous and low-contrast patterns must be detected even at luminance levels that differ from the eye adaptation level. All deviations from ideal luminance conditions cause the observer to detect patterns with reduced eye sensitivity but the magnitude of this reduction is unclear. The purpose of this paper was to quantify the effect different luminance settings have on the contrast threshold. A method to display well-defined sinusoidal low-contrast test patterns on an LCD has previously been developed and was used in this study. The observers were exposed to light from three different areas: 1) A small sinusoidal test pattern. 2) The remainder of the display surface. 3) Ambient light from outside the display area covering most of the observer's field of view. By adjusting the luminance from each of these three areas, two major effects could be quantified. The first effect was similar to Barten's f-factor where the target luminance differs from the observer's adaptation level while the second effect concerned the influence of areas outside the display surface. When a luminance range of 1-350 cd/m² was used, the contrast needed to detect a dark object in a gray surrounding was almost doubled compared to a dark object in a dark surrounding. Ambient light from outside the display area has a moderate effect on the contrast threshold, except for the combination of high ambient light and dark objects where the contrast threshold increased considerably.
Visual adaptation: softcopy image contribution to the observer's field of view
R. J. Toomey, K. Curran, C. D'Helft, et al.
Purpose: Detection of low-contrast details is highly dependent on the adaptation state of the eye. It is therefore important that the average luminance of the observer's field of view (FOV) matches that of softcopy radiological images. This study establishes the percentage of FOV filled by workstations at various viewing distances. Methods: Five observers stood at viewing distances of 20, 30 and 50 cm from a homogeneous white surface and were instructed to continuously focus on a fixed object at a height-appropriate level. A dark indicator was held at this object and then moved steadily until the observer could no longer perceive it in his/her peripheral vision. This was performed at 0°, 90°, 180° and 270° clockwise from the median sagittal plane. Distances were recorded, radii calculated and observer and mean FOV areas established. These values were then compared with areas of typical high and low specification workstations. Results: Individual and mean FOVs were 7660, 15463 and 30075 cm² at viewing distances of 20, 30 and 50 cm respectively. High and low specification monitors with respective areas of 1576.25 and 921.25 cm² contributed between 5 and 21% and between 3 and 12% respectively to the total FOV, depending on observer distance. Limited inter-observer variances were noted. Conclusions: Radiology workstations typically comprise between only 3 and 21% of the observer's FOV. This demonstrates the importance of measuring ambient light levels and surface reflection coefficients in order to maximise adaptation and the observer's perception of low contrast detail and minimise eye strain.
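The reported percentages follow directly from dividing each monitor area by the FOV area at each viewing distance. The short sketch below reproduces that arithmetic from the figures quoted in the abstract; the variable and function names are illustrative only.

```python
# FOV areas (cm^2) at each viewing distance (cm), as quoted in the abstract.
fov_cm2 = {20: 7660.0, 30: 15463.0, 50: 30075.0}

# Monitor areas (cm^2), as quoted in the abstract.
monitor_cm2 = {"high": 1576.25, "low": 921.25}


def fov_share_percent(monitor_area: float, fov_area: float) -> float:
    """Percentage of the field of view occupied by the monitor."""
    return 100.0 * monitor_area / fov_area


# shares[spec][distance] -> percentage of FOV filled by that monitor
shares = {
    spec: {dist: fov_share_percent(area, fov) for dist, fov in fov_cm2.items()}
    for spec, area in monitor_cm2.items()
}

for spec, by_distance in shares.items():
    for dist, pct in by_distance.items():
        print(f"{spec} specification at {dist} cm: {pct:.1f}% of FOV")
```

The largest share occurs at the shortest viewing distance, where the FOV area is smallest, and the smallest share at 50 cm.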
Achieving consistent color and grayscale presentation on medical color displays
Color displays are increasingly used for medical imaging, replacing the traditional monochrome displays in radiology for multi-modality applications, 3D representation applications, etc. Color displays are also used increasingly because of the widespread application of Tele-Medicine, Tele-Dermatology and Digital Pathology. At this time, there is no concerted effort to establish calibration procedures for this diverse range of color displays in Telemedicine and in other areas of the medical field. Using a colorimeter to measure the display luminance and chrominance properties, as well as some processing software, we developed a first attempt at a color calibration protocol for the medical imaging field.
Assessment of temporal display using observers
Most assessments of display performance are limited to studying the display quality for static images. However, dynamic scenes constitute a large fraction of medical images and are becoming more widespread due to the increase in the number of images to be interpreted. The image quality of a dynamic scene is affected by the display's temporal characteristics and the human visual system's temporal response. We propose to use a computational observer to understand the effect of image browsing speed in medical displays. We use a 3D cluster lumpy background to study the effect of different browsing speeds using liquid crystal display (LCD) temporal response measurements reported in our previous work. The image set is then analyzed by the computational observer. This allows us to quantify the effect of the slow temporal response of medical LCDs on the performance of the anthropomorphic observer. Slow temporal response of the display device affects the lesion contrast and the observer performance. The human visual system also adds to the complexity of image perception of dynamic scenes. A human observer study was used to validate the computational observer results.
Model Observers
Model observers to predict human performance in LROC studies of SPECT reconstruction using anatomical priors
We investigate the use of linear model observers to predict human performance in a localization ROC (LROC) study. The task is to locate gallium-avid tumors in simulated SPECT images of a digital phantom. Our study is intended to find the optimal strength of smoothing priors incorporating various degrees of anatomical knowledge. Although humans reading the images must perform a search task, our models ignore search by assuming the lesion location is known. We use area under the model ROC curve to predict human area under the LROC curve. We used three models: the non-prewhitening matched filter (NPWMF), the channelized nonprewhitening (CNPW) observer, and the channelized Hotelling observer (CHO). All models have access to noise-free reconstructions, which are used to compute the signal template. The NPWMF model does a poor job of predicting human performance. The CNPW and CHO models do a somewhat better job, but still do not qualitatively capture the human results. None of the models accurately predicts the smoothing strength which maximizes human performance.
Optimizing breast-tomosynthesis acquisition parameters with scanning model observers
In breast tomosynthesis (BT), multiple x-ray projections obtained over a limited angular span are reconstructed to produce a three-dimensional (3D) volume. This 3D imagery can lead to reduced structural masking effects compared to conventional mammography. Accordingly, there has been considerable interest in optimizing acquisition and reconstruction parameters associated with BT. In this work, we evaluate the use of a scanning model observer and localization ROC (LROC) methodology for performing a task-based optimization of the angular span and number of projection angles for a simulated BT system. The observer was applied to extracted slices of 3D volumes reconstructed with filtered backprojection. Both "background-known-exactly" (BKE) and "quasi-BKE" (QBKE) tasks were conducted. The latter task attempts to account for limited observer training by preserving structural noise in the detection task. Reduced noise in the form of fewer projections was important with the BKE task, although wider angular spans were also advantageous. Higher sampling densities may improve performance for the more-realistic QBKE tasks.
Markov-chain Monte Carlo for the performance of a channelized-ideal observer in detection tasks with non-Gaussian lumpy backgrounds
The Bayesian ideal observer is optimal among all observers and sets an upper bound for observer performance in binary detection tasks. This observer provides a quantitative measure of diagnostic performance of an imaging system, summarized by the area under the receiver operating characteristic curve (AUC), and thus should be used for image quality assessment whenever possible. However, computation of ideal-observer performance is difficult because this observer requires the full description of the statistical properties of the signal-absent and signal-present data, which are often unknown in tasks involving complex backgrounds. Furthermore, the dimension of the integrals that need to be calculated for the observer is huge. To estimate ideal-observer performance in detection tasks with non-Gaussian lumpy backgrounds, Kupinski et al. developed a Markov-chain Monte Carlo (MCMC) method, but this method has a disadvantage of long computation times. In an attempt to reduce the computation load and still approximate ideal-observer performance, Park et al. investigated a channelized-ideal observer (CIO) in similar tasks and found that the CIO with singular vectors of the imaging system approximated the performance of the ideal observer. But, in that work, an extension of the Kupinski MCMC was used for calculating the performance of the CIO and it did not reduce the computational burden. In the current work, we propose a new MCMC method, which we call a CIO-MCMC, to speed up the computation of the CIO. We use singular vectors of the imaging system as efficient channels for the ideal observer. Our results show that the CIO-MCMC has the potential to speed up the computation of ideal observer performance with a large number of channels.
Singular vectors of a linear imaging system as efficient channels for the ideal observer in detection tasks involving non-Gaussian distributed lumpy images
The Bayesian ideal observer sets an upper bound for diagnostic performance of an imaging system in binary detection tasks. Thus, this observer should be used for image quality assessment whenever possible. However, it is difficult to compute ideal-observer performance because the probability density functions of the data, required for the observer, are often unknown in tasks involving complex backgrounds. Furthermore, the dimension of the integrals that need to be calculated for the observer is huge. To attempt to reduce the dimensionality of the problem, and yet still approximate ideal-observer performance, a channelized-ideal observer (CIO) with Laguerre-Gauss channels was previously investigated for detecting a Gaussian signal at a known location in non-Gaussian lumpy images. While the CIO with Laguerre-Gauss channels had, in some cases, approximated ideal-observer performance, there was still a gap between the mean performance of the ideal observer and the CIO. Moreover, it is not clear how to choose efficient channels for the ideal observer. In the current work, we investigate the use of singular vectors of a linear imaging system as efficient channels for the ideal observer in the same tasks. Singular value decomposition of the imaging system is performed to obtain its singular vectors. Singular vectors most relevant to the signal and background images are chosen as candidate channels. Results indicate that the singular vectors are not only more efficient than Laguerre-Gauss channels, but are also highly efficient for the ideal observer. The results further demonstrate that singular vectors strongly associated with the signal-only image are the most efficient channels.
Comparison of variable and fixed focal length cone beam CT in diagnostic imaging
We use a task-based study to objectively evaluate the effect of variable versus fixed focal length in determining the position of a lesion in helical cone-beam computed tomography (HCBCT). This method will be used to assess whether variable focal length CBCT scans provide a measurable improvement in estimating lesion position relative to fixed focal length CBCT in diagnostic applications. In this simulation study a 1 cm diameter spherical lesion is placed at four different positions within a three-dimensional Shepp-Logan head phantom. The axial plane is taken to point along the z-axis, which is also the central axis of the helix. The lesion is placed at the center of the Shepp-Logan phantom, at positions displaced ±5 cm in x, and at a position displaced 5 cm in y. Four different scans of pitch length 10 cm are then performed using 128 views over 360° with a 100×300 pixel (20 cm×60 cm) detector. Two scans have a fixed focal length of 50 cm between the X-ray source and the center of rotation (COR), varying only in the starting angle of the source (0° and 90°). We call this the circular configuration. The other two scans have a variable focal length following the curvature of the head phantom and ranging from 37.5 cm to 50 cm. We call this the elliptical configuration. The detector rotates with the source but maintains a constant distance of 30 cm from the COR. A likelihood gridding technique is used to assess bias and variance in the position estimates determined from each scan configuration. We find that the biases are small relative to the variances, and have no apparent preferred direction. Of the 24 circular to elliptical comparisons made, we find that in 14 cases the elliptical scan has a smaller variance that is statistically significant (p ≤ 0.05). By contrast, we find no statistically significant cases in which the circular scan gives a smaller variance compared to the elliptical scan. We conclude that using a variable focal length adapted to the contours of the head phantom provides more precise results, but caution that this is a limited pilot study and many more factors will be accounted for in future work.
Accurate computation of the Hotelling template for SKE/BKE detection tasks
An accurate method for evaluating the Hotelling observer for large linear systems is generalized. The method involves solving an m-channel channelized Hotelling observer where the channels are refined in an iterative manner. Challenging numerical examples are shown in order to illustrate the method and give a sense of the convergence rates as a function of m.
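For orientation, the standard textbook definitions of the Hotelling template and its channelized counterpart can be written as below. The symbols (data vector g, covariance K, channel matrix U) are assumed notation, not the paper's own, and the abstract does not specify how the channels are refined at each iteration.

```latex
% Hotelling observer for an SKE/BKE task (assumed notation)
% \Delta\bar{g}: mean signal-present data minus mean signal-absent data
% K: covariance matrix of the N-dimensional data vector g
w_{\mathrm{Hot}} = K^{-1}\,\Delta\bar{g}, \qquad
\mathrm{SNR}^2_{\mathrm{Hot}} = \Delta\bar{g}^{\mathsf{T}} K^{-1}\,\Delta\bar{g}

% Channelized Hotelling observer with an N \times m channel matrix U
v = U^{\mathsf{T}} g, \qquad
w_{\mathrm{CHO}} = \left(U^{\mathsf{T}} K U\right)^{-1} U^{\mathsf{T}} \Delta\bar{g}, \qquad
\mathrm{SNR}^2_{\mathrm{CHO}} = \Delta\bar{g}^{\mathsf{T}} U \left(U^{\mathsf{T}} K U\right)^{-1} U^{\mathsf{T}} \Delta\bar{g}

% The channelized value never exceeds the full Hotelling value
\mathrm{SNR}^2_{\mathrm{CHO}} \le \mathrm{SNR}^2_{\mathrm{Hot}},
\quad \text{with equality when } K^{-1}\Delta\bar{g} \in \operatorname{span}(U)
```

The equality condition suggests why refining channels can converge to the Hotelling result: once the column space of U contains the Hotelling template, only an m × m system needs to be solved rather than an N × N one.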
Poster Session
VGC analysis: application of the ROC methodology to visual grading tasks
To determine clinical image quality in radiography, visual grading of the reproduction of important anatomical landmarks is often used. The rating data from the observers in a visual grading study with multiple scale steps is ordinal, meaning that non-parametric rank-invariant statistical methods are required. However, many visual grading methods incorrectly use parametric statistical methods. This work describes how the methodology developed in receiver operating characteristics (ROC) analysis for characterising the difference in the observer's response to the signal and no-signal distributions can be applied to visual grading data for characterising the difference in perceived image quality between two systems. The method is termed visual grading characteristics (VGC) analysis. In a VGC study, the task of the observer is to rate her confidence about the fulfilment of image quality criteria. Using ROC software, the given ratings for the two systems are then used to determine the VGC curve, which describes the relationship between the proportions of fulfilled image criteria for the two compared systems for all possible decision thresholds. As a single measure of the difference in image quality between the two compared systems, the area under the VGC curve can be used.
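A rank-based estimate of the area under the VGC curve can be sketched as follows. Note the substitution: the abstract determines the curve with ROC software, whereas this sketch uses the non-parametric trapezoidal (Mann-Whitney) estimate, which needs only the ordinal ratings. The function name and the rating data are hypothetical.

```python
def vgc_area(ratings_reference, ratings_test):
    """Trapezoidal (Mann-Whitney) estimate of the area under the VGC curve.

    Returns P(test rating > reference rating) + 0.5 * P(ratings are equal),
    taken over all pairs of ratings. A value of 0.5 indicates no difference
    in perceived image quality between the two systems.
    """
    if not ratings_reference or not ratings_test:
        raise ValueError("both rating lists must be non-empty")
    score = 0.0
    for t in ratings_test:
        for r in ratings_reference:
            if t > r:
                score += 1.0
            elif t == r:
                score += 0.5
    return score / (len(ratings_reference) * len(ratings_test))


# Hypothetical ordinal ratings on a 5-step scale for two imaging systems.
reference_ratings = [1, 2, 2, 3, 3, 4]
test_ratings = [2, 3, 3, 4, 4, 5]
area = vgc_area(reference_ratings, test_ratings)
```

Because the estimate depends only on the rank order of the ratings, it respects the ordinal nature of visual grading data that the abstract emphasises.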
A method of ROC analysis by applying item response theory (IRT) to results of 1/0 judgments on the presence or absence of abnormal findings in CT image readings
Toru Matsumoto, Akira Furukawa, Kanae Nisizawa, et al.
The purpose of this study is to develop a method of ROC analysis to evaluate both the ability of individual readers to detect abnormal findings and the detectability of abnormal findings in individual cases by applying item response theory to the results of 1/0 judgments on presence of abnormal findings in CT image readings. The validity of the method was verified by the following data and methods. Twenty-four readers searched for abnormal findings in 25 cases for which there were chest CT images with defined abnormal findings. From the 1/0 judgment data for the 25 cases with CT images (column) read by the 24 readers (row), each reader's potential ability to detect the abnormal findings (θ), the rate of "1" judgment by each reader, i.e. confidence level for TP and FP, P(θ), and the individual image response characteristic curves with the image as the item were calculated, from which ROC curves that represent the ability of each reader to detect abnormal findings were created. In addition, from the 1/0 judgment data for the 25 cases with CT images (row) read by the 24 readers (column), the potential detectability of abnormal findings for each CT image (θ) and the rate of "1" judgment for the image by readers, i.e. confidence level for TP and FP, P(θ), were calculated, from which ROC curves that represent the detectability of the abnormal finding in each case were created.
Relations between physical properties of local and global image-based elements and the performance of human observers in lung nodule detection
Aim: The study aims to help our understanding of the relationship between physical characteristics of local and global image features and the location of visual attention by observers. Background: Neurological visual pathways are specified at least in part by particular spatial frequency ranges at different orientations. High spatial frequencies, which carry the information of local perturbations like edges, are assembled mainly by foveal vision, whereas peripheral vision provides more global information coded by low frequencies. Recent visual-search studies in mammography (C. Mello-Thoms et al.) have shown that observers allocate visual attention to regions of the image depending on: i) spatial frequency characteristics of regions that capture attention and ii) the level of experience of the observer. Both aspects are considered in this study. Methods: A spatial frequency analysis of postero-anterior (PA) chest images containing pulmonary nodules has been performed by wavelet packet transforms at different scales. This image analysis has provided regional physical information over the whole image field on locations both with nodules present and nodules absent. The relationship between such properties as spatial frequency, orientation, scales, contrast, and phase of localised perturbations has been compared with eye-tracked search strategies and decision performance of observers with different levels of expertise. Results: The work is in progress and the results of this initial stage of the project will be presented with a critical appraisal of the methods used.
Reconstruction filters and contrast detail curves in CT
In this study, we investigated the effect of CT reconstruction filters in abdominal CT images of a male anthropomorphic phantom. A GE Light Speed CT 4-slice scanner was used to scan the abdomen of an adult Rando phantom. Cross sectional images of the phantom were reconstructed using four reconstruction filters: (1) soft tissue with the lowest noise; (2) detail (relative noise 1.7); (3) bone (relative noise 4.5); and (4) edge (relative noise 7.7). A two Alternate Forced Choice (AFC) experimental paradigm was used to estimate the intensity needed to achieve 92% correct (i.e., I92%). Four observers measured detection performance for five lesions with sizes ranging from 2.5 to 12.5 mm for each of these four reconstruction filters. Contrast detail curves obtained in images of an anthropomorphic phantom were not straight lines, but were best fitted by a second order polynomial. Results from the four readers show similar trends with modest inter-observer differences, with a measured coefficient of variation of the absolute performance levels of ~22%. All reconstruction filters had similarly shaped contrast detail curves except for the smallest details, where the frequency response of the filters differed most significantly. Increasing the noise level always reduced detection performance, and a doubling of image noise resulted in an average drop in detection performance of ~20%. The key findings of this study are that (a) the Rose model can provide reasonable predictions as to how changes in lesion size affect observer detection; (b) the shape of CT contrast detail curves is affected only very slightly by the reconstruction filter; (c) changes in reconstruction filter noise can predict qualitative changes in observer detection performance, but are poor direct predictors of the quantitative changes of imaging performance.
Inter-reader variability in alternate forced choice studies
In this study, we investigated differences in detection performance for twelve observers who each generated a CT contrast detail curve. An anthropomorphic newborn phantom's abdomen was imaged using a GE Light Speed CT scanner (4-slice). Alternate Forced Choice (AFC) experiments were performed with lesion sizes ranging from 2.5 to 12.5 mm to determine the intensity needed to achieve 92% correct (I92%). Following training, twelve readers (2 technologists, 4 college students, 4 medical students, and 2 radiology residents) each generated a single contrast detail curve. Eight readers produced approximately linear contrast detail curves while the remaining four readers required a second order polynomial fit because of reduced performance when detecting the largest (i.e., 12.5 mm) lesion. For the three smallest lesions, the coefficient of variation between the twelve readers was ~12%, which increased with increasing lesion size to ~23% for the 12.5 mm lesion. The ratio of the maximum I92% to minimum I92% values was ~1.6 for the smallest lesions, which increased to a factor of ~2.1 for the 12.5 mm lesion. Our results show that minimizing inter-reader variability in our AFC experiments could be achieved by eliminating the largest lesion, which causes detection problems in one third of observers. The combined experimental data showed that the slope of the contrast detail curve was -0.42, lower than the value of -1.0 predicted by the Rose model, suggesting that the noise texture in CT associated with both quantum mottle and anatomic structure is an important factor affecting detection of these lesions.
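The slope quoted for a contrast detail curve is the gradient of log threshold against log lesion size. A minimal sketch of that fit, using ordinary least squares on log-transformed values, is given below. The intermediate lesion sizes and both sets of threshold values are synthetic, generated only to show that a curve with threshold proportional to 1/size yields a slope of -1.0 while a shallower curve yields -0.42; none of it is data from the study.

```python
from math import log10


def loglog_slope(sizes_mm, thresholds):
    """Least-squares slope of log10(threshold) versus log10(lesion size)."""
    if len(sizes_mm) != len(thresholds) or len(sizes_mm) < 2:
        raise ValueError("need at least two matched size/threshold pairs")
    xs = [log10(s) for s in sizes_mm]
    ys = [log10(t) for t in thresholds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical lesion sizes spanning the 2.5 to 12.5 mm range in the abstract.
sizes = [2.5, 5.0, 7.5, 10.0, 12.5]

# Synthetic thresholds: inversely proportional to size (Rose-model behaviour).
rose_like = [100.0 / s for s in sizes]

# Synthetic thresholds following the shallower power law reported in the abstract.
shallow = [100.0 * s ** -0.42 for s in sizes]

rose_slope = loglog_slope(sizes, rose_like)
shallow_slope = loglog_slope(sizes, shallow)
```

A reader whose curve bends upward at the largest lesion, as described for four of the twelve readers, would not be well summarised by a single slope, which is why a second order polynomial was needed in those cases.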
Mammographic interpretation training: how useful is handheld technology?
In the UK a national self-assessment scheme (PERFORMS) for mammographers is undertaken as part of the National Health Service Breast Screening Programme. Where appropriate, further training is suggested to improve performance. Ideally, such training would be on-demand; that is whenever and wherever an individual decides to undertake it. To use a portable device for such a purpose would be attractive on many levels. However, it is not known whether handheld technology can be used effectively for viewing mammographic images. Previous studies indicate the potential for viewing medical images with fairly low spatial resolution (e.g. CT, MRI) on PDAs. In this study, we set out to investigate factors that might affect the feasibility of using PDAs as a training technology for examining large, high resolution mammographic images. Two studies are reported. First, 20 mammographers examined a series of mammograms presented on a PDA, specifying the location of any abnormality. Secondly, a group of technologists examined a series of mammograms presented at different sizes and resolutions to mimic presentation on a PDA and their eye movements were recorded. The results indicate the potential for using PDAs to show such large, high resolution images if suitable human-computer interaction (HCI) techniques are employed.
How are false negative cases perceived by mammographers? Which abnormalities are misinterpreted and which go undetected?
A radiographic 'false negative' or a case which has been 'missed' can be categorised in terms of errors of search (where gaze does not fall upon the abnormality); detection (a perceptual error where the abnormality may be physically 'seen' but remains undetected) and misinterpretation (a perceptual error whereby an abnormality, although detected, is not deemed worthy of further assessment). This study aims to investigate perceptual errors in mammographic film-reading and will focus on the latter two error types, namely errors of misinterpretation and errors of non-detection. Previous research has shown, on a self-assessment scheme of recent and difficult breast-screening cases, that certain feature types are susceptible to errors of misinterpretation and others to errors of non-detection. This self-assessment scheme, 'PERFORMS' (Personal Performance in Mammographic Screening), is undertaken by the majority (at present over 90%) of breast-screening mammographers in the UK Breast Screening Programme. The scheme is completed biannually and confidentially and participants receive immediate and detailed feedback on their performance. Feedback from the scheme includes information detailing their false negative decisions including case classifications (benign or malignant), feature type (masses, calcification, asymmetries, architectural distortions and others) and case perception error (percentage of misinterpretation and percentage of non-detection). Results from a recent round of PERFORMS (n=506) revealed that certain feature types had significantly higher percentages of error overall (including architectural distortion and asymmetries), and that these feature types also showed significant differences for error type. Implications for real-life screening practice were explored using real-life self-reported data on years of screening experience.
Measurement of visual strain in radiologists
We hypothesized that the current practice of radiology produces oculomotor fatigue that reduces diagnostic accuracy. The initial step in testing this hypothesis is to measure visual strain. We are approaching this by measuring visual accommodation of radiologists before and after diagnostic viewing work. We measure accommodation using the WAM-5500 Auto Refkeratometer from Grand Seiko, which collects refractive measurements and pupil diameter measurements. The radiologists focus on a simple target while accommodation is measured. The target distances are varied from near to far starting at 20 cm target distance from the eye to 183 cm. The data are compared for prior to and after long-term diagnostic viewing. Results indicate that we are successfully measuring visual accommodation. Accommodation at long distances does not seem to differ before and after diagnostic reading. Accommodation at near distances however does differ, with decreased ability to accommodate after many hours of diagnostic reading. Since near distances are crucial during diagnostic reading, this could have a substantial impact on diagnostic accuracy (the next phase of the project).
Learning from others: effects of viewing another person's eye movements while searching for chest nodules
Damien Litchfield, Linden J. Ball, Tim Donovan, et al.
We report a study that investigated whether experienced and inexperienced radiographers benefit from knowing where another person looked during pulmonary nodule detection. Twenty-four undergraduate radiographers (1 year of experience) and 24 postgraduate radiographers (5+ years of experience) searched 42 chest x-rays for nodules and rated how confident they were in their decisions. Eye movements were also recorded. Performance was compared across three within-participant conditions: (1) free search - where radiographers could identify nodules as normal; (2) image preview - where radiographers were first shown each chest x-ray for 20 seconds before they could then proceed to mark the location of any nodules; and (3) eye movement preview - which was identical to image preview except that the 20 second viewing period displayed an overlay of the real-time eye movements of another radiographer's scanpath for that image. For this preview condition half of each group were shown where a novice radiographer looked, and the other half were shown where an experienced radiologist looked. This was not made known to the participants until after the experiment. Performance was assessed using JAFROC analysis. Both groups of radiographers performed better in the eye movement preview condition compared with the image preview or free search conditions, with inexperienced radiographers improving the most. We discuss our findings in terms of the task-specific information interpreted from eye movement previews, task difficulty across images, and whether it matters if radiographers are previewing the eye movements of an expert or a novice.
Assembling a prototype resonance electrical impedance spectroscopy system for breast tissue signal detection: preliminary assessment
Jules Sumkin, Bin Zheng, Michelle Gruss, et al.
Using electrical impedance spectroscopy (EIS) technology to detect breast abnormalities in general and cancer in particular has been attracting research interest for decades. Large clinical tests suggest that current EIS systems can achieve high specificity (≥ 90%) at a relatively low sensitivity ranging from 15% to 35%. In this study, we explore a new resonance frequency based electrical impedance spectroscopy (REIS) technology to measure breast tissue EIS signals in vivo, which aims to be more sensitive to small tissue changes. Through collaboration between our imaging research group and a commercial company, a unique prototype REIS system has been assembled and preliminary signal acquisition has commenced. This REIS system has two detection probes mounted at the two ends of a Y-shaped support device with a probe separation of 60 mm. During REIS measurement, one probe touches the nipple and the other touches an outer point of the breast. The electronic system continuously generates sweeps of multi-frequency electrical pulses ranging from 100 to 4100 kHz. The maximum voltage and current applied to the probes are 1.5 V and 30 mA, respectively. Once a "record" command is entered, multi-frequency sweeps are recorded every 12 seconds until the program receives a "stop recording" command. In our imaging center, we have collected REIS measurements from 150 women under an IRB approved protocol. The database includes 58 biopsy cases, 78 screening negative cases, and other "recalled" cases (for additional imaging procedures). We measured eight signal features from the effective REIS sweep of each breast. We applied a multi-feature based artificial neural network (ANN) to classify between "biopsy" and normal "non-biopsy" breasts. The ANN performance is evaluated using a leave-one-out validation method and ROC analysis. We conducted two experiments. The first experiment attempted to classify 58 "biopsy" breasts and 58 "non-biopsy" breasts acquired from 58 women, each having one breast recommended for biopsy. The second experiment attempted to classify 58 "biopsy" breasts and 58 negative breasts from the set of screening negative cases. The areas under the ROC curves are 0.679 ± 0.033 and 0.606 ± 0.035 for the first and the second experiment, respectively. The preliminary results demonstrate that (1) even with this rudimentary system with only one pair of probes, there is a measurable signal of changes in breast tissue, demonstrating the feasibility of applying REIS technology for identifying at least some women with highly suspicious breast abnormalities, and (2) the electromagnetic asymmetry between the two breasts may be more sensitive in detecting changes in the abnormal breast. To further improve the REIS system performance, we are currently designing a new REIS system with multiple electrical probes and a more sophisticated analysis scheme.
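The evaluation scheme named in this abstract (leave-one-out validation followed by ROC analysis) can be illustrated with a minimal sketch. This is not the authors' ANN: a nearest-centroid classifier is swapped in as a simple stand-in, the feature values are made-up toy data, and all function names are hypothetical. The area under the ROC curve is computed as the Mann-Whitney statistic.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """Area under the ROC curve as the Mann-Whitney statistic (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def leave_one_out_scores(features, labels):
    """Score every case with a classifier trained on all the other cases.

    Stand-in classifier (an assumption, not the ANN from the abstract):
    nearest centroid. A larger score means "more like the positive class".
    """
    scores = []
    for i, held_out in enumerate(features):
        train = [(f, y) for j, (f, y) in enumerate(zip(features, labels)) if j != i]
        pos_c = centroid([f for f, y in train if y == 1])
        neg_c = centroid([f for f, y in train if y == 0])
        scores.append(math.dist(held_out, neg_c) - math.dist(held_out, pos_c))
    return scores


# Toy, made-up feature vectors: 1 = "biopsy", 0 = "non-biopsy".
features = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [2.0, 1.0], [2.1, 0.8], [1.9, 1.1]]
labels = [1, 1, 1, 0, 0, 0]

scores = leave_one_out_scores(features, labels)
pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]
auc = auc_mann_whitney(pos, neg)
print(f"leave-one-out AUC on toy data: {auc:.3f}")
```

Because each case is scored by a model that never saw it, the resulting AUC is less optimistically biased than one computed on the training data itself.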
Perceptual assessment of multiple stent deployment
Craig K. Abbey, Arian Teymoorian, Wade Schoonveld, et al.
Evaluating the placement of multiple coronary stents requires fine judgments of distance between two or more deployed stents in order to determine if there is continuous coverage without a gap or overlap between the two. These judgments are made difficult by limited system resolution, noise, relatively low contrast of the deployed stent, and stent motion during the cardiac cycle. In this work, we assess the effect of frame rate and number of frames used in a sequence on the detection accuracy of gaps between the stents. Both of these factors can be used to reduce patient dose. We use real X-ray coronary angiograms as backgrounds along with stents imaged separately with Lucite for similar beam attenuation. Stents and simulated guidewires are embedded in the angiograms by adding optical densities after scatter subtraction. Realistic motion is rendered by manually synchronizing the stent densities to vascular features in each image. We find no significant difference between the different frame rates or sequence lengths, indicating potential savings in dose.
An automated system for the analysis of peri-prosthetic osteolysis progression
The purpose of this work is to evaluate the performance of a computer based analysis system aimed at the quantitative detection of changes in hip osteolytic lesions in subjects with hip implants. The computer system is based on the supervised segmentation of a baseline x-ray computed-tomography (CT) scan and an automated segmentation of a follow-up CT scan using an object based tracking algorithm. The segmentation process outlines the pelvic bone and lesions present in the pelvis. The size and CT density of the osteolytic lesions are computed in both baseline and follow-up segmentations, and the changes in both quantities are evaluated. The system analysis consisted of a direct comparison of the quantitative results obtained from an expert manual segmentation with the quantitative results obtained using the automated system on 20 subjects. The system bias was evaluated by performing forward and backward analysis of the CT data. Furthermore, the stability of the proposed tracking system was compared to the variability of the manual tracking. The results show that the system enhances the human ability to detect changes in lesion size and density regardless of the inherent observer variability in the definition of the baseline manual segmentation.
Performance assessment of multi-frequency processing of ICU chest images for enhanced visualization of tubes and catheters
Xiaohui Wang, Mary E. Couwenhoven, David H. Foos, et al.
An image-processing method has been developed to improve the visibility of tube and catheter features in portable chest x-ray (CXR) images captured in the intensive care unit (ICU). The image-processing method is based on a multi-frequency approach, wherein the input image is decomposed into different spatial frequency bands, and those bands that contain the tube and catheter signals are individually enhanced by nonlinear boosting functions. Using a random sampling strategy, 50 cases were retrospectively selected for the study from a large database of portable CXR images that had been collected from multiple institutions over a two-year period. All images used in the study were captured using photo-stimulable, storage phosphor computed radiography (CR) systems. Each image was processed two ways. The images were processed with default image processing parameters such as those used in clinical settings (control). The 50 images were then separately processed using the new tube and catheter enhancement algorithm (test). Three board-certified radiologists participated in a reader study to assess differences in both detection-confidence performance and diagnostic efficiency between the control and test images. Images were evaluated on a diagnostic-quality, 3-megapixel monochrome monitor. Two scenarios were studied: the baseline scenario, representative of today's workflow (a single-control image presented with the window/level adjustments enabled) vs. the test scenario (a control/test image pair presented with a toggle enabled and the window/level settings disabled). The radiologists were asked to read the images in each scenario as they normally would for clinical diagnosis. Trend analysis indicates that the test scenario offers improved reading efficiency while providing as good or better detection capability compared to the baseline scenario.
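The multi-frequency idea described in this abstract (decompose into spatial frequency bands, boost selected bands nonlinearly, recombine) can be sketched on a one-dimensional intensity profile. The abstract does not specify the decomposition or the boosting function, so everything here is an assumption chosen for illustration: bands are differences of successive box smoothings, and the hypothetical `boost` function amplifies small-amplitude detail more than large-amplitude detail.

```python
def box_smooth(signal, radius):
    """Moving average whose window is truncated at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def decompose(signal, radii):
    """Split a signal into band-pass layers plus a low-pass residual.

    The layers and the residual sum back to the original signal.
    """
    bands = []
    current = list(signal)
    for radius in radii:
        smooth = box_smooth(current, radius)
        bands.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return bands, current


def boost(value, gain, knee):
    """Assumed nonlinear boost: about `gain` for small |value|, tending to 1 for large |value|."""
    return value * (1.0 + (gain - 1.0) * knee / (knee + abs(value)))


def enhance(signal, radii, boosted_bands, gain, knee):
    """Recombine the bands, boosting only those whose index is in `boosted_bands`."""
    bands, residual = decompose(signal, radii)
    out = list(residual)
    for index, band in enumerate(bands):
        for i, v in enumerate(band):
            out[i] += boost(v, gain, knee) if index in boosted_bands else v
    return out


# A thin, tube-like feature on a flat background (toy data).
profile = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
unchanged = enhance(profile, radii=[1, 2], boosted_bands={0}, gain=1.0, knee=1.0)
enhanced = enhance(profile, radii=[1, 2], boosted_bands={0}, gain=2.0, knee=1.0)
print([round(v, 3) for v in enhanced])
```

With a gain of 1 the recombination reproduces the input, which is a useful sanity check before any band is boosted.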
Steady-state sweep visual evoked potential processing denoised by wavelet transform
Heinar A. Weiderpass, Jorge F. Yamamoto, Solange R. Salomão, et al.
Visually evoked potential (VEP) is a very small electrical signal originating in the visual cortex in response to periodic visual stimulation. Sweep-VEP is a modified VEP procedure used to measure grating visual acuity in non-verbal and preverbal patients. This biopotential is buried in a large amount of electroencephalographic (EEG) noise and movement-related artifact. The signal-to-noise ratio (SNR) plays a dominant role in determining both systematic and statistical errors. The purpose of this study is to present a method based on the wavelet transform technique for filtering and extracting steady-state sweep-VEP. Counter-phase sine-wave luminance gratings modulated at 6 Hz were used as stimuli to determine sweep-VEP grating acuity thresholds. The amplitude and phase of the second-harmonic (12 Hz) pattern reversal response were analyzed using the fast Fourier transform after the wavelet filtering. The wavelet transform method was used to decompose the VEP signal into wavelet coefficients by a discrete wavelet analysis to determine which coefficients yield significant activity at the corresponding frequency. In a subsequent step only the significant coefficients were retained and the remainder were set to zero, allowing a reconstruction of the VEP signal. This procedure filtered out other frequencies that were considered noise. Numerical simulations and analyses of human VEP data showed that this method provided higher SNR than the classical recursive least squares (RLS) method. An additional advantage was a more appropriate phase analysis, showing more realistic second-harmonic amplitude values during phase breaks.
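The filtering step described above (discrete wavelet decomposition, zeroing of non-significant coefficients, reconstruction) can be sketched as follows. The abstract names neither the wavelet nor the significance criterion, so this sketch assumes an orthonormal Haar wavelet and a plain magnitude threshold; the function names and the toy signal are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full multi-level orthonormal Haar DWT; len(signal) must be a power of two.

    Returns (approx, details) with details ordered from coarsest to finest.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / SQRT2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
        details.insert(0, detail)
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward by rebuilding one level at a time, coarsest first."""
    current = list(approx)
    for detail in details:
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        current = rebuilt
    return current


def keep_significant(details, threshold):
    """Assumed criterion: keep coefficients with magnitude >= threshold, zero the rest."""
    return [[c if abs(c) >= threshold else 0.0 for c in level] for level in details]


# Toy data: a square wave plus a small alternating disturbance.
clean = [4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0]
noisy = [4.1, 3.9, 4.1, 3.9, -3.9, -4.1, -3.9, -4.1]

approx, details = haar_forward(noisy)
roundtrip = haar_inverse(approx, details)
denoised = haar_inverse(approx, keep_significant(details, threshold=1.0))
print([round(v, 6) for v in denoised])
```

In this toy case the disturbance lives entirely in the finest-scale coefficients, so zeroing the small coefficients recovers the square wave; real VEP data would need a criterion tied to the stimulus frequency, as the abstract indicates.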
A strategy to optimize CT pediatric dose with a visual discrimination model
Daniel Gutierrez, François Gudinchet, Leonor T. Alamo-Maestre, et al.
Technological developments in computed tomography (CT) have led to a drastic increase in its clinical utilization, creating concerns about patient exposure. To better control dose to patients, we propose a methodology to find an objective compromise between dose and image quality by means of a visual discrimination model. A GE LightSpeed-Ultra scanner was used to perform the acquisitions. A QRM 3D low contrast resolution phantom (QRM - Germany) was scanned using CTDIvol values in the range of 1.7 to 103 mGy. Raw data obtained with the highest CTDIvol were afterwards processed to simulate dose reductions by white noise addition. Noise realism of the simulations was verified by comparing the shape and amplitude of normalized noise power spectra (NNPS) and standard deviation measurements. Patient images were acquired using the Diagnostic Reference Levels (DRL) proposed in Switzerland. Dose reduction was then simulated, as for the QRM phantom, to obtain five different CTDIvol levels, down to 3.0 mGy. Image quality of phantom images was assessed with the Sarnoff JNDmetrix visual discrimination model and compared to an assessment made by means of the ROC methodology, taken as a reference. For patient images a similar approach was taken but using as reference the Visual Grading Analysis (VGA) method. A relationship between Sarnoff JNDmetrix and ROC results was established for low contrast detection in phantom images, demonstrating that the Sarnoff JNDmetrix can be used for qualification of images with highly correlated noise. Patient image qualification showed a threshold of conspicuity loss only for children over 35 kg.
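Simulating a lower dose by adding white noise, as described above, can be illustrated with a simplified model. The sketch assumes quantum-limited noise whose variance is inversely proportional to dose, so that independent noise with standard deviation sigma_ref * sqrt(D_ref / D_target - 1) raises the total noise to the level expected at the target dose. This is not the authors' raw-data procedure; the function names and the reference noise level are hypothetical, and only the CTDIvol values come from the abstract.

```python
import math
import random


def added_noise_sigma(sigma_ref, dose_ref, dose_target):
    """Standard deviation of the extra noise needed to emulate dose_target.

    Assumes noise variance scales as 1/dose and that the added noise is
    independent of the noise already present in the reference data.
    """
    if not 0 < dose_target <= dose_ref:
        raise ValueError("dose_target must be positive and not exceed dose_ref")
    return sigma_ref * math.sqrt(dose_ref / dose_target - 1.0)


def simulate_lower_dose(values, sigma_ref, dose_ref, dose_target, rng):
    """Add zero-mean Gaussian white noise to each sample of the reference data."""
    sigma_add = added_noise_sigma(sigma_ref, dose_ref, dose_target)
    return [v + rng.gauss(0.0, sigma_add) for v in values]


# Reference data at 103 mGy, emulating 3.0 mGy. sigma_ref = 2.0 is a made-up value.
rng = random.Random(0)
reference = [100.0] * 8
low_dose = simulate_lower_dose(reference, sigma_ref=2.0, dose_ref=103.0,
                               dose_target=3.0, rng=rng)
print(round(added_noise_sigma(2.0, 103.0, 3.0), 3))
```

Purely white added noise does not reproduce the correlated noise texture of reconstructed CT images, which is why the abstract reports checking noise realism against measured noise power spectra.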
Assessment of scanning model observers with hybrid SPECT images
H. C. Gifford, P. H. Pretorius, M. A. King
The purpose of this work was to test procedures for applying scanning model observers in order to predict human-observer lesion-detection performance with hybrid images. Hybrid images consist of clinical backgrounds with simulated abnormalities. The basis for this investigation was detection and localization of solitary pulmonary nodules (SPN) in SPECT lung images, and our overall goal has been to determine the extent to which detection of SPN could be improved by proper modeling of the acquisition physics during the iterative reconstruction process. Towards this end, we conducted human-observer localization ROC (LROC) studies to optimize the number of iterations and the postfiltering of four rescaled block-iterative (RBI) reconstruction strategies with various combinations of attenuation correction (AC), scatter correction (SC), and system-resolution correction (RC). This observer data was then used to evaluate a scanning channelized nonprewhitening model observer. A standard "background-known-exactly" (BKE) task formulation overstated the prior knowledge and training that human observers had about the hybrid images. Results from a quasi-BKE task that preserved some degree of structural noise in the detection task demonstrated better agreement with the humans.
SPECT image system optimization using ideal observer for detection and localization
We consider the problem of optimizing collimator characteristics for a simple emission tomographic imaging system. We use the performance of two different ideal observers to carry out the optimization. The first ideal observer applies to signal detection when signal location is unknown and background is variable, and the second ideal observer (one proposed previously by our group) to the more realistic task of signal detection and localization with signal location unknown and background variable. The two observers operate on sinogram data to deliver scalar figures of merit AROC and ALROC, respectively. We considered three different collimators that span a range of efficiency-resolution tradeoffs. Our central question is this: For optimizing the collimator in an emission tomographic system, does adding a localization requirement to a detection task yield an efficiency-resolution tradeoff that differs from that for the detection-only task? Our simulations with a simple SPECT imaging system show that as the localization requirement becomes more stringent, the optimal collimator shifts from a low-resolution, high-efficiency version toward a higher-resolution, lower-efficiency version. We had previously observed such behavior for a planar pinhole imaging system. In our simulations, we used a simplified model of tomographic imaging and a simple model for object background variability. This allowed us to avoid the severe computational complexity associated with ideal-observer performance calculations. Thus the more realistic task (i.e. localization included) resulted for this case in a different optimal collimator.
Noise reduction effect in super-high resolution LCDs using independent sub-pixel driving technology
Katsuhiro Ichikawa, Yoshikazu Nishi, Shigeo Hayashi, et al.
We have developed and reported a super-high resolution liquid crystal display (SHR-LCD) using a new resolution enhancement technology, independent sub-pixel driving (ISD), that utilizes the three sub-pixels in each pixel element. This technology achieves a threefold resolution enhancement of monochrome LCDs and improves the depiction of fine details such as micro-calcifications in mammography and bone structures. Furthermore, the ISD technology brings not only resolution enhancement but also a noise reduction effect, owing to the high-resolution data sampling used when displaying clinical images. In this study, we examined the efficacy of the newly developed LCDs through noise power spectrum (NPS) measurements and perceptual comparisons of phantom images and clinical images. A 15 mega-pixel (MP) SHR-LCD derived from a 5MP LCD and a 6MP SHR-LCD derived from a 2MP LCD were used for the measurement and the evaluation. In the NPS measurements, noise was clearly reduced on all the SHR-LCDs. The degree of NPS improvement varied according to the sub-sampling ratio of the data sampling implemented during image display, and the 6MP LCD showed the greater improvement. In the perceptual evaluation of the quality-control phantom images and the low-contrast images of mammographic micro-calcifications, all the SHR-LCDs provided higher performance than the conventional LCDs. These results showed that the SHR-LCDs using the ISD technology have an excellent ability to display high-resolution clinical images.
Mammography workstation design: effect on mammographer behaviour and the risk of musculoskeletal disorders
In the UK Breast Screening Programme there is a growing transition from film to digital mammography, and consequently a change in mammography workstation ergonomics. This paper investigates the effect of the change for radiologists, including their comfort, likelihood of developing musculoskeletal disorders (MSDs), and work practices. Three workstation types were investigated: one with all film mammograms; one with digital mammograms alongside film mammograms from the previous screening round; and one with digital mammograms alongside digitised film mammograms from the previous screening round. Mammographers were video-taped whilst conducting work sessions at each of the workstations. Event based Rapid Upper Limb Assessment (RULA) postural analysis showed no overall increase in MSD risk level in the switch from the film to digital workstation. The average number of visual glances at the prior mammograms per case, measured by analysis of recorded video footage, showed an increase if the prior mammograms were digitised rather than displayed on a multi-viewer (p<.05). This finding has potential implications for mammographer performance in the transition to digital mammography in the UK.
Influence of monitor characteristics on the signals detection present in the mammographic phantom image
The objective of this study was to compare the detection of microcalcifications and fibers on phantom images read on monitors versus read as analog images. A total of 180 films were obtained with 3 different mammographic units under different exposure conditions and digitized using the Lumiscan75 scanner. The ALVIM statistical phantom (4.5 cm) was used, with two 1 cm acrylic plates placed over it to give a 6.5 cm thickness. Software named QualIM was developed to manage the images and the database that stores the specialists' readings, and to let the specialists use digital image-manipulation tools. The images were displayed on 4 monitors: CRT Philips 19" (2.9 Mpixels/8 bits), CRT Philips 22" (3.0 Mpixels/8 bits), LCD Clinton (3.0 Mpixels/10 bits) and LCD Barco (5.0 Mpixels/14 bits). The same group of images was also displayed on a view box appropriate for mammograms (3200 cd/m²), and reading performance on it was taken as the reference. The software generates, in real time, Kappa values for microcalcification and fiber detection, histograms, ROC curves and true/false positive parameters. Readings on the Barco monitor produced superior results compared with all the others, suggesting that there were no losses in the digitization process. Readings on the Clinton monitor were similar to the view box results and superior to those on both Philips monitors for the detection of objects on the 6.5 cm phantom images. Specialist performance with the Philips 22" was comparable to the view box and the Clinton only for the 4.5 cm images. The monitors' spatial and contrast resolutions influenced the specialists' reading performance, suggesting that these characteristics are relevant to lesion detection performance in mammographic exams.
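The abstract does not say which Kappa statistic the QualIM software reports. Assuming it is Cohen's kappa between a reader's detections and the known phantom contents, the calculation can be sketched as below; the function name and the toy ratings are hypothetical.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two rating lists, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Fraction of items on which the two lists agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each list's marginal frequencies.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when both lists use one identical category")
    return (observed - expected) / (1.0 - expected)


# Toy data: 1 = object present / detected, 0 = absent / not detected.
truth = [1, 1, 0, 0]
reader = [1, 0, 0, 0]
kappa = cohens_kappa(truth, reader)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which makes it convenient for comparing monitors against the view box reference.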
Radiological image presentation requires consideration of human adaptation characteristics
N. M. O'Connell, R. J. Toomey, M. McEntee, et al.
Visualisation of anatomical or pathological image data is highly dependent on the eye's ability to discriminate between image brightnesses, and this is best achieved when these data are presented to the viewer at luminance levels to which the eye is adapted. Current ambient light recommendations are often linked to overall monitor luminance, but this relies on specific regions of interest matching overall monitor brightness. The current work investigates the luminances of specific regions of interest within three image-types: postero-anterior (PA) chest; PA wrist; computerised tomography (CT) of the head. Luminance levels were measured within the hilar region and peripheral lung, the distal radius, and the supra-ventricular grey matter. For each image type average monitor luminances were calculated with a calibrated photometer at ambient light levels of 0, 100 and 400 lux. Thirty samples of each image-type were employed, resulting in a total of over 6,000 measurements. Results demonstrate that average monitor luminances varied from clinically-significant values by up to a factor of 4, 2 and 6 for chest, wrist and CT head images, respectively. Values for the thoracic hilum and wrist were higher, and for the peripheral lung and CT brain lower, than overall monitor levels. The ambient light level had no impact on the results. The results demonstrate that clinically important radiological information for common radiological examinations is not being presented to the viewer in a way that facilitates optimised visual adaptation and subsequent interpretation. The importance of image-processing algorithms focussing on clinically-significant anatomical regions instead of radiographic projections is highlighted.
Searching in axial and 3D CT visualisations
Peter Phillips, David Manning, Trevor Crawford, et al.
Traditional diagnostic modalities have been, for the most part, static two-dimensional images displayed on film or computer screen. More recent diagnostic modalities are solely computer-based and consist of large data-sets of multiple images. Image perception and visual search using these new modalities are complicated by the need to interact with the computer in order to navigate through the data. This paper reports the late-breaking results from two small studies into visual search within two types of CT Colonography (CTC) visualisations. The twelve novice observers in the study were taking part in a week-long course in CTC and were tested at the beginning and end of the course. A number of expert observers were also recorded. The two visualisations used in the study were 2D axial view and 3D colon fly-through. In both cases, searching was performed by inspecting the colon wall, but by two distinct mechanisms. The first study recorded observer eye-gaze and image navigation in a CTC axial view. The search strategy was to follow the lumen of the colon and detect abnormalities in the colon wall. The observer used the physical computer interface to navigate through the set of axial images to perform this task. The 3D fly-through study recorded observer eye-gaze whilst watching a recording of a computed flight through the colon lumen. Unlike the axial view there was no computer control, so inspection of the colon surface was dictated by the speed of flight through the colon.