Proceedings Volume 6146

Medical Imaging 2006: Image Perception, Observer Performance, and Technology Assessment

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 6 March 2006
Contents: 7 Sessions, 44 Papers, 0 Presentations
Conference: Medical Imaging 2006
Volume Number: 6146

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Technology Assessment and Impact
  • Keynote and Image Display
  • ROC Methods
  • Observer Performance Evaluation
  • Model Observers
  • Image Perception
  • Poster Session
Technology Assessment and Impact
Can radiologists recognize that a computer has identified cancers that they have overlooked?
Robert M. Nishikawa, Alexandra Edwards, Robert A. Schmidt, et al.
For computer-aided detection (CADe) to be effective, the computer must be able to identify cancers that a radiologist misses clinically, and the radiologist must be able to recognize that a cancer was missed when he or she reviews the computer output. Several papers indicate that CADe can detect clinically missed cancers. The purpose of this study is to examine whether radiologists can use the CADe output effectively to detect more cancers. Three hundred mammographic cases, which included current and previous exams, were collected: 66 cases contained a missed cancer that was recognized in retrospect and 234 were normal cases. These were analyzed by a commercial CADe system. An observer study with eight MQSA-qualified radiologists was conducted using a sequential reading method: the radiologist viewed the mammograms and scored the case, then reviewed the CADe output and rescored the case. The computer had a sensitivity of 55% with an average of 0.59 false detections per image. For all cancers (n=69), the radiologists had a sensitivity of 58% with no aid and 64% with aid (p=0.002). In cases where the computer detected the cancer in all views in which the cancer was visible (n=17), the radiologists had a sensitivity of 74% unaided, which increased to 85% aided (p=0.02). In cases where the computer missed the cancer in one view (n=21), the radiologists had a sensitivity of 65% unaided and 72% aided (p<0.001). The radiologists, on average, ignored 20% of all correct computer prompts.
Actual versus intended use of CAD systems in the clinical environment
Bin Zheng, Denise Chough, Perrin Ronald, et al.
Computer-aided detection (CAD) systems were designed and approved for use as a "second reader": radiologists are expected to interpret mammograms and detect suspected abnormalities (i.e., micro-calcification clusters and masses) independently before viewing the CAD results. It is not clear, however, whether radiologists in a busy clinical environment follow this intended use. In this study, we observed ten experienced radiologists during the clinical reading of 635 mammography examinations and recorded their workflow pattern in terms of the use of CAD. The observations suggest that, for detecting micro-calcification clusters, only a few radiologists actually used a magnifying glass to carefully and systematically scan all images. Areas in which no CAD cues for micro-calcifications were identified were largely discarded. The majority of radiologists used CAD for the identification of micro-calcification clusters almost as a "pre-screening" tool. In less than 15% of cases with CAD cues for micro-calcification clusters did the radiologists actually scan the complete set of images for possible additional clusters. The majority of more careful searches were performed by only three radiologists, who voluntarily admitted that they knew they were an exception with regard to their reading style and that they believed they were also "slower". CAD marks of possible masses were often discarded by the majority of the radiologists, in particular when cues appeared on only one view. We found a large difference between the use of CAD for the detection of micro-calcification clusters and for masses. In addition, radiologists frequently used CAD in a manner that is substantially different from that originally intended.
Mass detection in mammographic ROIs using Watson filters
Swatee Singh, Alan Baydush, Brian Harrawood, et al.
Human vision models have been shown to capture the response of the visual system; their incorporation into the classification stage of a Computer Aided Detection system could improve performance. This study seeks to improve the performance of an automated breast mass detection system by using the Watson filter model versus a Laguerre-Gauss Channelized Hotelling Observer (LG-CHO). The LG-CHO and the Watson filter model were trained and tested on a database of 512x512 ROIs acquired from the Digital Database for Screening Mammography, consisting of 800 total ROIs: 200 malignant, 200 benign and 400 normal. Half of the ROIs were used to train the weights for ten LG-CHO templates that were later used during the testing stage. For the Watson filter model, the training cases were used to optimize the frequency filter parameter empirically to yield the best ROC Az performance. This set of filter parameters was then tested on the remaining cases. The training Az values for the LG-CHO and the Watson filter were 0.896 +/- 0.016 and 0.924 +/- 0.014, respectively. The testing Az values were 0.849 +/- 0.019 and 0.888 +/- 0.017, respectively. With a p-value of 0.029, the difference in testing performance was statistically significant, implying that the Watson filter model holds promise for better detection of masses.
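As background for readers unfamiliar with the LG-CHO baseline mentioned above, the sketch below shows one common way to build a Laguerre-Gauss channelized Hotelling observer and score it with a nonparametric Az. It is a generic illustration under assumed parameters (channel width a, ten channels, synthetic ROIs); it is not the authors' implementation and does not use their data.

```python
# Minimal LG-CHO sketch: Laguerre-Gauss channels, Hotelling template, Az scoring.
import numpy as np
from numpy.polynomial.laguerre import Laguerre

def lg_channels(size, n_channels=10, a=30.0):
    """Laguerre-Gauss channel images, returned as columns of a (size*size, n) matrix."""
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    r2 = x**2 + y**2
    chans = []
    for j in range(n_channels):
        Lj = Laguerre.basis(j)                       # j-th Laguerre polynomial
        u = (np.sqrt(2) / a) * np.exp(-np.pi * r2 / a**2) * Lj(2 * np.pi * r2 / a**2)
        chans.append(u.ravel())
    return np.stack(chans, axis=1)

def cho_template(signal_rois, normal_rois, channels):
    """Hotelling template in channel space, estimated from training ROIs."""
    vs = np.array([channels.T @ roi.ravel() for roi in signal_rois])
    vn = np.array([channels.T @ roi.ravel() for roi in normal_rois])
    S = 0.5 * (np.cov(vs, rowvar=False) + np.cov(vn, rowvar=False))
    return np.linalg.solve(S, vs.mean(0) - vn.mean(0))

def az(scores_signal, scores_normal):
    """Nonparametric area under the ROC curve (Wilcoxon statistic)."""
    diffs = scores_signal[:, None] - scores_normal[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

# Usage with synthetic stand-ins for the mammographic ROIs:
rng = np.random.default_rng(0)
size = 64
sig = [rng.normal(size=(size, size)) + 0.3 for _ in range(50)]
nrm = [rng.normal(size=(size, size)) for _ in range(50)]
U = lg_channels(size)
w = cho_template(sig[:25], nrm[:25], U)
s_sig = np.array([w @ (U.T @ roi.ravel()) for roi in sig[25:]])
s_nrm = np.array([w @ (U.T @ roi.ravel()) for roi in nrm[25:]])
print("test Az ~", az(s_sig, s_nrm))
```

In practice the channel width and number of channels are tuned on the training ROIs, much as the Watson filter parameter is tuned empirically in the study above.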
A method for assessing the uncertainty in feature selection tasks
Feature selection is a common task in the development of computer-aided diagnosis techniques and other areas of research where there is a need to identify discriminative variables that can be used to separate two classes, e.g., individuals with a certain type of disease and those without. Feature-selection results based on a small sample are not reliable if they are not reproducible in replicated experiments. However, when unlimited large samples are not available, it is often difficult to assess sample size with respect to the reliability of the results of feature selection. We propose a method that could be used for assessing the uncertainty in the results of feature selection. We compute a joint likelihood function of observed ROC data of multiple features of interest conditional on a joint ROC model for the multiple features. From the likelihood function, we compute the posterior distribution (i.e., relative probability) of competing joint ROC models of the multiple features for giving rise to the observed ROC data. The posterior distribution can be used for statistical inference of the performance of the multiple features.
Comparative performance analysis for computer-aided lung nodule detection and segmentation on ultra-low-dose vs. standard dose CT
Rafael Wiemker, Patrik Rogalla, Roland Opfer, et al.
The performance of computer-aided lung nodule detection (CAD) and computer-aided nodule volumetry is compared between standard-dose (70-100 mAs) and ultra-low-dose CT images (5-10 mAs). A direct quantitative performance comparison was possible, since for each patient both an ultra-low-dose and a standard-dose CT scan were acquired within the same examination session. The data sets were recorded with a multi-slice CT scanner at the Charité university hospital Berlin with 1 mm slice thickness. Our computer-aided nodule detection and segmentation algorithms were deployed on both ultra-low-dose and standard-dose CT data without any dose-specific fine-tuning or preprocessing. As a reference standard, 292 nodules from 20 patients were visually identified, each nodule in both the ultra-low-dose and standard-dose data sets. The CAD performance was analyzed by means of multiple FROC curves for different lower thresholds of the nodule diameter. For nodules with a volume-equivalent diameter of 4 mm or larger (149 nodule pairs), we observed a detection rate of 88% at a median false-positive rate of 2 per patient in standard-dose images, and a detection rate of 86% in ultra-low-dose images, also at 2 FPs per patient. Including even smaller nodules of 2 mm or larger (272 nodule pairs), we observed a detection rate of 86% in standard-dose images and 84% in ultra-low-dose images, both at a rate of 5 FPs per patient. Moreover, we observed a correlation of 94% between the volume-equivalent nodule diameter as automatically measured on ultra-low-dose versus standard-dose images, indicating that ultra-low-dose CT is also feasible for growth-rate assessment in follow-up examinations. The comparable performance of lung nodule CAD in ultra-low-dose and standard-dose images is of particular interest with respect to lung cancer screening of asymptomatic patients.
Keynote and Image Display
Recent developments and contemporary issues for diagnostic technology assessment
Robert F. Wagner
This paper summarizes recent themes of interest to this conference as well as the conferences of the Medical Image Perception Society (MIPS).
Potential use of a large-screen display for interpreting radiographic images
Elizabeth Krupinski, Hans Roehrig, William Berger, et al.
Radiology has readily made the transition to the digital reading room. One commodity left behind in the move to digital displays, however, is display real estate. Even with multiple monitors, radiologists cannot display numerous images as they did on a film alternator. We evaluated a large-screen rear-projection display (Philips Electronics) for potential use in radiology. The color display had a resolution of 1920 x 1080 and a 44-inch diagonal. For comparison we used the IBM 9-Mpixel color display (22-inch diagonal) set to a comparable resolution and maximum luminance. Diagnostic accuracy for a series of bone images with subtle fractures and six observers was comparable (F = 0.3170, p = 0.5743) to that with the traditional computer monitor. Viewing time, however, was significantly shorter (t = 6.723, p < 0.0001) with the large display for both normal and fracture images. On average, readers sat significantly closer (t = 5.578, p = 0.0026) to the small display than to the large display. Four of the six radiologists preferred the smaller display, judging it to yield a sharper image. Half of the readers thought the black level was better with the large display and half with the small display. Most of the radiologists thought the large-screen display has potential for use in conferencing situations or those in which multiple viewers need to see images simultaneously.
Impact of defective pixels in AMLCDs on the perception of medical images
Tom Kimpe, Yuri Sneyders
In LCD displays, each pixel has its own individual transistor that controls the transmittance of that pixel. Occasionally these transistors short or otherwise malfunction, resulting in a defective pixel that always shows the same brightness. With the ever-increasing resolution of displays, the number of defective pixels per display increases accordingly. State-of-the-art processes are capable of producing displays with no more than one faulty transistor out of 3 million. A five-megapixel medical LCD panel contains 15 million individual sub-pixels (3 sub-pixels per pixel), each having an individual transistor. This means that a five-megapixel display will on average have 5 failing pixels. This paper investigates the visibility of defective pixels and analyzes their possible impact on the perception of medical images. JND simulations were performed to study the effect of defective pixels on medical images. Our results indicate that defective LCD pixels can mask subtle features in medical images over an unexpectedly broad area around the defect and therefore may reduce the quality of diagnosis for specific high-demanding areas such as mammography. As a second contribution, an innovative solution is proposed: a specialized image processing algorithm can make defective pixels completely invisible and, moreover, can recover the information at the defect so that the radiologist perceives the medical image correctly. This correction algorithm has been validated with both JND simulations and psychovisual tests.
Assessment of the influence of display veiling glare on observer and model performance
We evaluated human observer and model (JNDmetrix) performance to assess whether the veiling glare of a digital display influences performance in softcopy interpretation of mammographic images. 160 mammographic images, half containing a single mass, were processed to simulate four levels of veiling glare: none, comparable to a typical cathode ray tube (CRT) display, double that of a CRT, and quadruple that of a CRT. Six radiologist observers were shown the images in a randomized presentation order on a liquid crystal display (LCD) that had negligible veiling glare. The JNDmetrix human visual system model also analyzed the images. Receiver operating characteristic (ROC) techniques showed that performance declined with increasing veiling glare (F = 6.884, p = 0.0035). Quadruple veiling glare yielded significantly lower performance than the lower veiling glare levels. The JNDmetrix model did not predict a reduction in performance with changes in veiling glare, and its correlation with the human observer data was modest (0.588). Display veiling glare may influence observer performance, but only at very high levels.
ROC Methods
Optimization of an ROC hypersurface constructed only from an observer's within-class sensitivities
Darrin C. Edwards, Charles E. Metz
We have shown in previous work that an ideal observer in a classification task with N classes achieves the optimal receiver operating characteristic (ROC) hypersurface in a Neyman-Pearson sense. That is, the hypersurface obtained by taking one of the ideal observer's misclassification probabilities as a function of the other N^2-N-1 misclassification probabilities is never above the corresponding hypersurface obtained by any other observer. Due to the inherent complexity of evaluating observer performance in an N-class classification task with N>2, some researchers have suggested a generally incomplete but more tractable evaluation in terms of a hypersurface plotting only the N "sensitivities" (the probabilities of correctly classifying observations in the various classes). An N-class observer generally has up to N^2-N-1 degrees of freedom, so a given sensitivity will still vary when the other N-1 are held fixed; a well-defined hypersurface can be constructed by considering only the maximum possible value of one sensitivity for each achievable value of the other N-1. We show that optimal performance in terms of this generally incomplete performance descriptor, in a Neyman-Pearson sense, is still achieved by the N-class ideal observer. That is, the hypersurface obtained by taking the maximal value of one of the ideal observer's correct classification probabilities as a function of the other N-1 is never below the corresponding hypersurface obtained by any other observer.
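For readers who prefer a formula, the sensitivity-hypersurface claim can be restated compactly; the notation below is introduced here for illustration and may differ from the paper's.

```latex
% For an N-class observer O, let s_i = P(decide class i | truth is class i).
% For each achievable (s_2,...,s_N), define the maximal first sensitivity
%   S_O(s_2,\dots,s_N) = \sup\{\, s_1 \text{ achievable by } O
%                         \text{ with } s_2,\dots,s_N \text{ held fixed} \,\}.
% The stated result is Neyman-Pearson optimality of the ideal observer (IO):
\[
  S_{\mathrm{IO}}(s_2,\dots,s_N) \;\ge\; S_{O}(s_2,\dots,s_N)
  \qquad \text{for every observer } O \text{ and all achievable } (s_2,\dots,s_N).
\]
```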
Optimal observer framework and categorization observer framework for three-class ROC analysis
X. He, E. C. Frey
ROC analysis has been an important tool for system evaluation and optimization in medical imaging. Despite its success in evaluating binary classification tasks, ROC analysis does not provide a direct way to evaluate performance on classification tasks that involve more than two diagnostic alternatives. We have previously developed a three-class ROC analysis method that provides a practical way to evaluate three-class task performance. Based on two-class ROC analysis and the proposed three-class ROC analysis method, this work proposes two frameworks for three-class ROC analysis: the optimal observer framework and the categorization observer framework. The optimal observer framework seeks three-class decision rules and decision variables based on a formal decision strategy; it provides an ROC surface for system comparison on the basis of optimal performance with respect to this strategy. A categorization procedure is the generalization to 3-D of a two-alternative forced-choice procedure and is an important concept in the categorization observer framework. The categorization observer framework seeks three-class decision rules, decision variables and an ROC surface such that task performance as measured by the volume under the ROC surface (VUS) and the percent correct in the categorization procedure are equal. We then show how our previously proposed three-class ROC method fits into both frameworks.
Performance analysis of 3-class classifiers: properties of the 3D ROC surface and the normalized volume under the surface
Many computer-aided diagnostic problems involve three-class classification (e.g., abnormal (malignant), normal, and benign classes). A general treatment of the performance of a 3-class classifier results in a complex six-dimensional (6D) ROC space for which no analysis tools are available at present. A practical paradigm for performance analysis and a figure of merit (FOM) are yet to be developed. The area Az under the ROC curve plays an important role in the evaluation of two-class classifiers. An analogous FOM for three-class classification is highly desirable. We have been investigating conditions for reducing the 6D ROC space to 3D by constraining the utilities of some of the decisions in the classification task. These assumptions lead to a 3D ROC space in which the true-positive fraction (TPF) of one class can be expressed as a function of the two types of false-positive fractions (FPFs) from the other two classes. In this study, we investigated the properties of the 3D ROC surface for a maximum-likelihood classifier under the condition that the three class distributions in the feature space are multivariate normal with equal covariance matrices. Under the same conditions, we studied the properties of the normalized volume under the surface (NVUS) that was previously proposed as an FOM for a 3-class classifier. Under the equal-covariance Gaussian assumption, it can be shown that the probability density functions of the decision variables follow a bivariate log-normal distribution. By considering these probability density functions, we can express the TPF in terms of the FPFs and numerically integrate over the domain of the TPF to obtain the NVUS. We compared the NVUS value obtained by directly integrating the probability density functions to that obtained from a Monte Carlo simulation study in which the 3D ROC surface was generated by empirical "optimal" classification of case samples in the multi-dimensional feature space following the assumed distributions. The NVUS values obtained from the direct integration method and the Monte Carlo simulation were found to be in good agreement. Our results indicate that the NVUS exhibits many of the intuitive properties that a proper FOM should satisfy and may be used as a performance index under the conditions that we imposed on the utilities.
Exploring FROC paradigm: initial experience with clinical applications
The design of receiver operating characteristic (ROC)-based paradigms and data analysis methods that capture and adequately address the complexity of clinical decision-making will facilitate the development and evaluation of new image acquisition, reconstruction and processing techniques. We compare the JAFROC (jackknife free-response ROC) paradigm and analysis to the traditional ROC paradigm in two image evaluation studies. The first study is designed to address the advantages of the "free response" features of the JAFROC paradigm in a breast lesion detection task. We developed tools allowing the acquisition of FROC-type rating data and used them on a set of simulated scintimammography images. Four observers participated in a small preliminary study. Rating data were then analyzed using traditional ROC and JAFROC techniques. The second study is aimed at comparing the diagnostic quality of myocardial perfusion SPECT (MPS) images obtained with different quantitative image reconstruction and compensation methods in an ongoing clinical trial. The observer assesses the status of each of the three main vascular territories for each patient. This experimental set-up uses standardized locations of the vascular territories on myocardial polar plot images and a fixed number of three rating scores per patient. We compare results from the newly available JAFROC analysis with the traditional ROC analysis technique previously applied in similar studies, using a set of data from the ongoing clinical trial. Comparison of the two analysis methodologies reveals generally consistent behavior, corresponding to theoretical predictions.
LROC assessment of nonlinear filtering methods in Ga-67 SPECT imaging
Stijn De Clercq, Steven Staelens, Jan De Beenhouwer, et al.
In emission tomography, iterative reconstruction is usually followed by a linear smoothing filter to make the images more appropriate for visual inspection and diagnosis by a physician. This results in a global blurring of the images, smoothing across edges and possibly discarding valuable image information for detection tasks. The purpose of this study is to investigate what possible advantages a non-linear, edge-preserving postfilter could have for lesion detection in Ga-67 SPECT imaging. Image quality can be defined based on the task that has to be performed on the image. This study used LROC observer studies based on a dataset created by CPU-intensive GATE Monte Carlo simulations of a voxelized digital phantom. The filters considered in this study were a linear Gaussian filter, a bilateral filter, the Perona-Malik anisotropic diffusion filter and the Catte filtering scheme. The 3D MCAT software phantom was used to simulate the distribution of Ga-67 citrate in the abdomen. Tumor-present cases had a 1-cm diameter tumor randomly placed near the edges of the anatomical boundaries of the kidneys, bone, liver and spleen. Our data set was generated from a single noisy background simulation using the bootstrap method, to significantly reduce the simulation time and to allow for a larger observer data set. Lesions were simulated separately and added to the background afterwards. The images were then reconstructed with an iterative approach, using a sufficiently large number of MLEM iterations to establish convergence. The output of a numerical observer was used in a simplex optimization method to estimate an optimal set of parameters for each postfilter. No significant improvement was found for edge-preserving filtering techniques over standard linear Gaussian filtering.
Observer Performance Evaluation
Incorporating detection tasks into the assessment of CT image quality
E. M. Scalzetti, W. Huda, K. M. Ogden, et al.
The purpose of this study was to compare traditional and task-dependent assessments of CT image quality. Chest CT examinations were obtained with a standard protocol for subjects participating in a lung cancer-screening project. Images were selected for patients whose weight ranged from 45 kg to 159 kg. Six ABR-certified radiologists subjectively ranked these images using a traditional six-point ranking scheme that ranged from 1 (inadequate) to 6 (excellent). Three subtle diagnostic tasks were identified: (1) a lung section containing a sub-centimeter nodule of ground-glass opacity in an upper lung; (2) a mediastinal section with a lymph node of soft tissue density in the mediastinum; (3) a liver section with a rounded low-attenuation lesion in the liver periphery. Each observer was asked to estimate the probability of detecting each type of lesion in the appropriate CT section using a six-point scale ranging from 1 (< 10%) to 6 (> 90%). Traditional and task-dependent measures of image quality were plotted as a function of patient weight. For the lung section, task-dependent evaluations were very similar to those obtained using the traditional scoring scheme, but with larger inter-observer differences. Task-dependent evaluations for the mediastinal section showed no obvious trend with subject weight, whereas the traditional score decreased from ~4.9 for smaller subjects to ~3.3 for the larger subjects. Task-dependent evaluations for the liver section showed a decreasing trend from ~4.1 for the smaller subjects to ~1.9 for the larger subjects, whereas the traditional evaluation had a markedly narrower range of scores. A task-dependent method of assessing CT image quality can be implemented with relative ease, and is likely to be more meaningful in the clinical setting.
A comparison of 2D and 3D evaluation methods for pulmonary embolism detection in CT images
Atilla P. Kiraly, Carol L. Novak, David P. Naidich, et al.
Pulmonary embolism (PE) is a life-threatening disease, requiring rapid diagnosis and treatment. Contrast enhanced computed tomographic (CT) images of the lungs allow physicians to confirm or rule out PE, but the large number of images per study and the complexity of lung anatomy may cause some emboli to be overlooked. We evaluated a novel three-dimensional (3D) visualization technique for detecting PE, and compared it with traditional 2D axial interpretation. Three readers independently marked 10 cases using the 3D method, and a separate interpretation was performed at a later date using only source axial images. An experienced thoracic radiologist adjudicated all marks, classifying clots according to location and confidence. There were a total of 8 positive examinations with 69 validated emboli. 44 (64%) of the clots were segmental while 12 (17%) proved subsegmental. Using the traditional 2D method for examination, readers detected a mean of 45 PE for 66% sensitivity. Using the 3D method, readers detected a mean of 35 PE (50% sensitivity). Combining both methods, readers detected a mean of 51 PE (74% sensitivity), significantly higher than either single method (p<0.001). Considered by arterial level, significant improvement was observed for detection of segmental and subsegmental clots (p<0.001) when comparing combined reading with either single method. The mean number of false positives per patient was 0.23 for both 2D and 3D readings and 0.4 for combined reading. 3D visualization of pulmonary arteries allowed readers to detect a significant number of additional emboli not detected during 2D axial interpretations and thus may lead to a more accurate diagnosis of PE.
Analyzing the effect of dose reduction on the detection of mammographic lesions using mathematical observer models
The purpose of this study was to determine the effect of dose reduction on the detectability of breast lesions in mammograms. Mammograms with dose levels corresponding to 50% and 25% of the original clinically relevant exposure levels were simulated. Detection of masses and microcalcifications embedded in these mammograms was analyzed by four mathematical observer models, namely the Hotelling Observer, the Non-prewhitening Matched Filter with Eye Filter (NPWE), and the Laguerre-Gauss and Gabor Channelized Hotelling Observers. Performance was measured in terms of ROC curves and the area under the ROC curve (AUC) under the Signal Known Exactly but Variable (SKEV) task paradigm. The Gabor Channelized Hotelling Observer predicted a deterioration in the detectability of benign masses. The other algorithmic observers, however, did not indicate statistically significant differences in the detectability of masses and microcalcifications with reduction in dose. Detection of microcalcifications was affected more than the detection of masses. Overall, the results indicate that there is potential for reducing the radiation dose level in mammographic screening procedures without severely compromising the detectability of lesions.
Parallel reconstructions of MRI: evaluation using detection and perceptual difference studies
Parallel imaging using multiple coils and sub-sampled k-space data is a promising fast MR image acquisition technique. We used detection studies and perceptual difference models on image data with ¼ sampling to evaluate three different reconstruction methods: a regularization method developed by Ying and Liang of UIUC, a simplified regularization method, and an iterative method. We also included images obtained from a full complement of k-space data as "gold standard" images. Detection studies were performed using a simulated dark tumor added to MR images of bovine liver. We found that human detection depended strongly on the reconstruction method used, with the simplified regularization and UIUC methods achieving better performance than the iterative method. We also evaluated images using detection with a Channelized Hotelling Observer (CHO) model and with a Perceptual Difference Model (PDM). Both predicted the same trends as observed in the human detection studies. We are encouraged that the PDM gives trends similar to those of the detection studies. Its ease of use and applicability to a variety of MR imaging situations make it attractive for evaluating image quality in a variety of MR studies.
Using perceptual difference model to improve GRAPPA reconstruction in MRI
GRAPPA is a popular reconstruction technique in parallel imaging. In GRAPPA, a least-squares technique is used to solve the over-determined equations and obtain the "fitting" coefficients for the reconstruction. We developed the Robust GRAPPA method, whereby robust estimation techniques are used to estimate the coefficients while discounting k-space data outliers. One implementation, Slow Robust GRAPPA, used iteratively re-weighted techniques; it was compared to an ad hoc Fast Robust GRAPPA implementation. We evaluated these new algorithms using the Perceptual Difference Model (PDM). The PDM has already been successfully applied to a variety of MR applications. We systematically investigated independent variables including algorithm, outer reduction factor, total reduction factor, outlier ratio, and noise across multiple image datasets, giving 9000 images. We conclude that the Fast Robust GRAPPA method gives results very similar to Slow Robust GRAPPA and that both give significant improvements compared to standard GRAPPA. The PDM is very helpful in designing and optimizing MR reconstruction algorithms.
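The abstract describes replacing the ordinary least-squares fit of GRAPPA coefficients with a robust, iteratively re-weighted fit that discounts k-space outliers. The sketch below illustrates that general idea with Huber-type weights on a synthetic over-determined system; the matrix A, vector b, and weight function are placeholders, not the paper's calibration data or exact weighting scheme.

```python
# Iteratively re-weighted least squares (IRLS) with Huber weights: a generic
# robust fit of an over-determined system, in the spirit of "Slow Robust GRAPPA".
import numpy as np

def irls(A, b, n_iter=20, delta=1.0):
    """Robustly solve A x ~ b by down-weighting rows with large residuals."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]                    # ordinary LS start
    for _ in range(n_iter):
        r = A @ x - b
        w = np.where(np.abs(r) <= delta, 1.0,
                     delta / np.maximum(np.abs(r), 1e-12))      # Huber weights
        sw = np.sqrt(w)
        x = np.linalg.lstsq(A * sw[:, None], sw * b, rcond=None)[0]
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 8))
x_true = rng.normal(size=8)
b = A @ x_true + 0.01 * rng.normal(size=200)
b[:5] += 10.0                                                   # a few gross outliers
print("plain LS error :", np.linalg.norm(np.linalg.lstsq(A, b, rcond=None)[0] - x_true))
print("robust LS error:", np.linalg.norm(irls(A, b) - x_true))
```

The few gross outliers barely perturb the robust solution, which is the behavior the Robust GRAPPA approach relies on.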
Model Observers
The efficiency of reading around learned backgrounds
Miguel P. Eckstein, Binh T. Pham, Craig K. Abbey, et al.
Most metrics of medical image quality typically treat all variability components of the background as a Gaussian noise process. This includes task-based model observers (the non-prewhitening matched filter without and with an eye filter, NPW and NPWE; the Hotelling and Channelized Hotelling observers) as well as Fourier metrics of medical image quality based on the noise power spectra. However, many investigators have observed that, unlike many of these models/metrics, physicians can often discount signal-looking structures that are part of the normal anatomic background. This process has been referred to as reading around the background or noise. The purpose of this paper is to develop an experimental framework to systematically study the ability of human observers to read around learned backgrounds and compare their ability to that of an optimal ideal observer which has knowledge of the background. We measured human localization performance for one of twelve targets in the presence of a fixed background consisting of randomly placed Gaussians with random contrasts and sizes, plus white noise. Performance was compared to a condition in which the test images contained only white noise, but with higher contrast. Human performance was compared to standard model observers that treat the background as a Gaussian noise process (NPW, NPWE and Hotelling), a Fourier-based prewhitening matched filter, and an ideal observer. The Hotelling, NPW and NPWE models, as well as the Fourier-based prewhitening matched filter, predicted higher performance for the white noise test images than for the background plus white noise. In contrast, ideal and human performance was higher for the background-plus-white-noise condition. Furthermore, human performance exceeded that of the NPW, NPWE and Hotelling models and reached an efficiency of 19% relative to the ideal observer. Our results demonstrate that for some types of images, human signal localization performance is consistent with the use of knowledge about the high-order moments of the backgrounds to discount signal-looking structures that belong to the background. In such scenarios, model observers and metrics that either ignore the background or treat it as a Gaussian process (Hotelling, Channelized Hotelling, task-based SNR) underpredict human performance.
Observer efficiency in boundary discrimination tasks related to assessment of breast lesions with ultrasound
Craig K. Abbey, Roger J. Zemp, Jie Liu, et al.
The statistical efficiency of human observers in diagnostic tasks is an important measure of how effectively task-relevant information in the image is being utilized. Most efficiency studies have investigated efficiency in terms of contrast or size effects. In many cases, malignant lesions will have similar contrast to normal or benign objects, but can be distinguished by properties of their boundary. We investigate this issue in the framework of malignant/benign discrimination tasks for the breast with ultrasound. In order to identify effects in terms of specific features and to control for other effects such as aberration or specular reflections, we simulate the formation of beam-formed radio-frequency (RF) data. We consider three tasks related to lesion boundaries: boundary eccentricity, boundary sharpness, and detection of boundary spiculations. We also consider standard detection and contrast discrimination tasks. We find that human observers exhibit surprisingly low efficiency with respect to the ideal observer acting on RF data in boundary discrimination tasks (0.08%-3.3%), and that the efficiency of human observers is substantially increased by Wiener-filtering the RF frame data. We also find that a limitation on efficiency is the computation of an envelope image from the RF data recorded by the transducer. Approximations to the ideal observer acting on the envelope images indicate that humans may be substantially more efficient (10%-75%) with respect to the envelope ideal observers. Our work suggests that significant diagnostic information may be lost in standard envelope processing in the formation of ultrasonic images.
Performance of a channelized-ideal observer using Laguerre-Gauss channels for detecting a Gaussian signal at a known location in different lumpy backgrounds
The Bayesian ideal observer provides a measure of image quality since it uses all available statistical information in the image data. A channelized-ideal observer (CIO), which reduces the dimensionality of the integrals that need to be calculated for the ideal observer, has been introduced in the past. The goal of the CIO is to approximate the performance of the ideal observer in certain detection tasks. In this work, a CIO using Laguerre-Gauss (LG) channels is employed for detecting a rotationally symmetric Gaussian signal at a known location in a non-Gaussian distributed lumpy background. The mean number of lumps in the lumpy background is varied to see the impact of image statistics on the performance of this CIO and of a channelized-Hotelling observer (CHO) using the same channels. The width parameter of the LG channels is also varied to see its impact on observer performance. A Markov-chain Monte Carlo (MCMC) method is employed to determine the performance of the CIO using large numbers of LG channels. Simulation results show that the CIO is a better observer than the CHO for this task. The results also indicate that the performance of the CIO approaches that of the ideal observer as the mean number of lumps in the lumpy background decreases. This implies that LG channels may be efficient for the CIO to approximate the performance of the ideal observer in tasks using non-Gaussian distributed lumpy backgrounds.
Human efficiency for detecting Gaussian signals in non-Gaussian distributed lumpy backgrounds using different display characteristics and scaling methods
A psychophysical study measuring human efficiency relative to the ideal observer for detecting a Gaussian signal at a known location in a non-Gaussian distributed lumpy background was conducted previously by the author. The study found that human efficiency was less than 2.2%, while the ideal observer achieved 0.95 in terms of the area under the receiver operating characteristic curve. In this work, a psychophysical study was conducted with a number of changes that substantially improve upon the previous study design. First, a DICOM-calibrated monitor was used for the human-observer performance measurements, whereas an uncalibrated LCD monitor was used in the prior study. Second, two scaling methods for displaying image values were employed to see how scaling affects human performance on the same task. Third, a random order of image pairs was chosen for each observer to reduce any correlations in human performance that were likely caused by the fixed ordering of the image pairs in the prior study. Human efficiency relative to the ideal observer was found to be less than 3.8% at the same ideal-observer performance level as above. Our variance analysis indicates that neither scaling nor display made a significant difference in human performance on the tasks in the two studies. That is, we cannot say that either of these factors caused the low human efficiency in these studies. Therefore, we conclude that the use of highly non-Gaussian distributed lumpy backgrounds in the tasks may have been the major source of the low human efficiency.
Image Perception
Variability in the interpretation of mammograms: Do similar decisions entail similar visual sampling strategies?
Eye position tracking has shown that, both in searching for lung nodules and in searching for breast cancer, most malignant lesions that are not reported do in fact attract the radiologist's visual attention, often for as long as cancers that are reported. Hence, detection is not the main problem for most radiologists; rather, image interpretation is the underlying reason why detected findings go unreported. According to models of image perception, the decision to report or to dismiss a perceived finding is based not only on the conspicuity of the local elements but also on the selection of certain areas of the background parenchyma, against which the radiologist compares the finding to determine its uniqueness. This sampling of the background corresponds to a visual search strategy. Nonetheless, several studies have shown that the final pattern observed in visual search is particular to each observer, and that even when the same observer searches the same image the pattern is likely to be different the second time around. This has led to the assumption that visual search is random. In this study we compare the visual sampling strategies of experienced mammographers as they search a case set of mammograms looking for benign and malignant masses. We determine whether similar decisions, that is, decisions to report the same findings, entail the sampling of similar areas of the background, regardless of the order in which these areas were sampled.
Lesion detection using an a-contrario detector in simulated digital mammograms
Burgess showed that lesion detectability has a non-trivial behavior on textured mammographic backgrounds: threshold detectability occurs when the log contrast is linearly related to the log size with positive slope. Grosjean et al. proposed the a-contrario detector as an acceptable observer for detection on such backgrounds. In this study, we quantitatively simulated projected breast images containing lesions with a variety of sizes and thicknesses, for a 55-mm-thick, 50/50 glandular breast and with different textured background types generated by the power-law filtered noise model proposed by Burgess. The acquisition parameters used in the simulation correspond to the optimal techniques provided by a digital mammography system for that specific breast. Images were automatically scored by the a-contrario detector in order to find the minimum lesion thickness needed to reach the detection threshold. Taking into account the Fourier spectrum properties of the breast texture and using the a-contrario observer as a new metric for the detection task, we found the same detection slopes as described by Burgess. With our quantitative simulation, which includes a realistic image chain of a digital mammography system, and with the implementation of a novel detection process, we found that for the considered lesion sizes, lesions are easier to detect on textures with a high value of the power-law exponent.
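A minimal sketch of the power-law filtered noise background model referred to above (after Burgess): white Gaussian noise is shaped in the Fourier domain so its power spectrum falls off as 1/f^beta. The exponent and image size are illustrative values, not those used in the study.

```python
# Power-law ("1/f^beta") filtered-noise background generator.
import numpy as np

def power_law_background(size=256, beta=3.0, seed=0):
    """White Gaussian noise filtered so its power spectrum falls as 1/f^beta."""
    rng = np.random.default_rng(seed)
    white = rng.normal(size=(size, size))
    fx = np.fft.fftfreq(size)
    f = np.hypot(*np.meshgrid(fx, fx, indexing="ij"))   # radial spatial frequency
    f[0, 0] = f[0, 1]                                    # avoid division by zero at DC
    filt = f ** (-beta / 2.0)                            # amplitude filter -> power ~ 1/f^beta
    bg = np.fft.ifft2(np.fft.fft2(white) * filt).real
    return (bg - bg.mean()) / bg.std()                   # zero mean, unit variance texture

bg = power_law_background()
print(bg.shape, round(bg.std(), 3))
```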
Lesion removal and lesion addition algorithms in lung volumetric data sets for perception studies
Mark T. Madsen, Kevin S. Berbaum, Andrew Ellingson M.D., et al.
Image perception studies of medical images provide important information about how radiologists interpret images and insights for reducing reading errors. In the past, perception studies have been difficult to perform using clinical imaging studies because of the problems associated with obtaining images demonstrating proven abnormalities and appropriate normal control images. We developed and evaluated interactive software that allows the seamless removal of abnormal areas from CT lung image sets. We have also developed interactive software for capturing lung lesions in a database from which they can be added to lung CT studies. The efficacy of the software in removing abnormal areas from lung CT studies was evaluated psychophysically by having radiologists select the one altered image from a display of four. The software for adding lesions was evaluated by having radiologists classify displayed CT slices with lesions as real or artificial, scaled to three levels of confidence. The results of these experiments demonstrated that the radiologists had difficulty distinguishing the raw clinical images from those that had been altered. We conclude that this software can be used to create experimental normal control and "proven" lesion data sets for volumetric CT of the lung fields. We also note that this software can be easily adapted to work with tissue other than lung and that it can be adapted to other digital imaging modalities.
Mammographic texture synthesis using genetic programming and clustered lumpy background
Cyril Castella, Karen Kinkel, François Descombes, et al.
In this work we investigated the digital synthesis of images that mimic the real textures observed in mammograms. Such images could be produced in unlimited numbers with tunable statistical properties in order to study human and model observer performance in perception experiments. We used the previously developed clustered lumpy background (CLB) technique and optimized its parameters with a genetic algorithm (GA). In order to maximize the realism of the textures, we combined the objective GA approach with psychophysical experiments involving the judgments of radiologists. Thirty-six statistical features were computed and averaged over 1000 regions of interest from real mammograms. The same features were measured for the synthetic textures, and the Mahalanobis distance was used to quantify the similarity of the features between the real and synthetic textures. This similarity, as measured by the Mahalanobis distance, was used as the GA fitness function for evolving the free CLB parameters. In the psychophysical approach, experienced radiologists were asked to rate the realism of synthetic images by considering typical structures that are expected to be found in real mammograms: glandular and fatty areas, and fiber crossings. Results show that CLB images found via optimization with the GA are significantly closer to real mammograms than previously published images. Moreover, the psychophysical experiments confirm that all the above-mentioned structures are reproduced well in the generated images. This means that we can generate an arbitrarily large database of textures mimicking mammograms with traceable statistical properties.
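The GA fitness described above reduces to a Mahalanobis distance between the feature statistics of synthetic textures and those of real mammograms. A minimal sketch, with placeholder feature arrays (36 features per ROI, as in the abstract):

```python
# Mahalanobis-distance fitness: how far the synthetic-texture feature mean lies
# from the distribution of features measured on real mammogram ROIs.
import numpy as np

def mahalanobis_fitness(real_features, synth_features):
    """Lower values mean the synthetic textures look statistically more 'real'."""
    mu = real_features.mean(axis=0)
    cov = np.cov(real_features, rowvar=False)
    d = synth_features.mean(axis=0) - mu
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

rng = np.random.default_rng(2)
real = rng.normal(size=(1000, 36))           # stand-in for 1000 real-mammogram ROIs
synth = rng.normal(loc=0.1, size=(200, 36))  # stand-in for one CLB parameter setting
print("fitness (lower is better):", mahalanobis_fitness(real, synth))
```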
Perceptually limited modality-adaptive medical image watermarking
Increasingly widespread communication of medical images across multiple user systems raises concerns for image security, for which digital image watermarking offers one solution. While digital image watermarking methods have been widely studied, much less attention has been paid to their application in medical imaging situations, due partially to speculations on loss in viewer performance due to degradation of image information. Such concerns are addressed if the amount of information lost due to watermarking can be kept at minimal levels and certainly well below visual perception thresholds. This paper describes a method for applying watermarks to medical images on a locally varying basis, so as to ensure that the visual impact of changes in pixel values is minimal. The method uses an adaptive approach based on 8x8 blocks of pixels, and takes into account the imaging modality and local image contents according to common perceptual information models, when determining the amount of watermark payload information to be encoded. It is assumed that "light" watermarking is desirable, and some typical examples of watermarks that might be used (e.g. patient identifiers and examination details) are provided to substantiate this position. Experimental results for typical CT and MR images are presented, and the performance of the method across a range of different choices of parameters is analysed. This would be useful in situations where images are manipulated on and transferred between many different independent image storage systems (including PACS), which would not allow the integrity of the image data to be assured.
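To make the block-adaptive idea concrete, the sketch below embeds watermark bits only in the least-significant bits of high-contrast 8x8 blocks, so flat (diagnostically sensitive) regions carry little or no payload. This is an illustrative scheme under assumed parameters; it is not the perceptual model or embedding algorithm of the paper.

```python
# Block-adaptive LSB watermark embedding: payload goes only into 8x8 blocks
# whose local contrast exceeds a threshold.
import numpy as np

def embed(image, payload_bits, max_bits_per_block=2, contrast_threshold=5.0):
    """Embed payload bits into the LSB planes of high-contrast 8x8 blocks."""
    img = image.astype(np.uint16)                      # working copy
    bit_iter = iter(payload_bits)
    h, w = img.shape
    for by in range(0, h - 7, 8):
        for bx in range(0, w - 7, 8):
            block = img[by:by+8, bx:bx+8]
            if block.std() < contrast_threshold:       # leave flat regions untouched
                continue
            for b in range(max_bits_per_block):        # a few LSB planes per block
                for idx in range(64):
                    bit = next(bit_iter, None)
                    if bit is None:                     # payload exhausted
                        return img
                    y, x = divmod(idx, 8)
                    block[y, x] = (block[y, x] & ~np.uint16(1 << b)) | np.uint16(int(bit) << b)
    return img

rng = np.random.default_rng(3)
ct = rng.integers(0, 4096, size=(64, 64)).astype(np.uint16)   # stand-in for a CT slice
bits = rng.integers(0, 2, size=500)                           # e.g. encoded patient identifiers
marked = embed(ct, bits)
print("max pixel change:", int(np.abs(marked.astype(int) - ct.astype(int)).max()))
```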
Optimum ambient lighting conditions for the viewing of softcopy radiological images
Purpose: The aim of this work is to determine the optimum ambient lighting conditions for viewing softcopy radiological images on LCD. Materials and Methods: The study measured the diagnostic performance of observers viewing images on a liquid crystal display (LCD) monitor under different ambient lighting conditions: 480, 100, 40, 25 and 7 lux. An ROC analysis was performed as a measure of diagnostic performance. A set of 30 postero-anterior wrist images was used, 15 of which had fractures present; the remainder were normal. These were evaluated by 79 experienced, American Board of Radiology certified radiologists. Results: The observers performed better at 40 and 25 lux than at 480 and 100 lux. At 7 lux, the observers' performance was generally similar to that at 480 and 100 lux. Conclusion: Using the previously recommended ambient lighting level of 100 lux resulted in no improvement over typical office lighting of 480 lux. Lower ambient lighting levels in the range 40-25 lux improve diagnostic performance over higher levels. Lowering ambient lighting to 7 lux (almost complete darkness apart from the light emanating from the monitor) reduces diagnostic performance to a level equal to that of typical office lighting. It is clearly important to control ambient lighting to ensure that diagnostic performance is maximized.
Poster Session
Does mammographic practice affect film reading style: breast screening vs. symptomatic radiologists?
In the UK there are two groups of radiologists who routinely read mammographic cases: Symptomatic and Screening Radiologists. We examined the performance of these two film-reading populations, Breast Screening Radiologists and Symptomatic Radiologists, to evaluate if there were group differences in their "style" of reading the same set of cases. Specifically we looked at each group's sensitivity and specificity measures. In addition we investigated if there were any individual group differences apparent in the cases which they found challenging and what (if any) were the characteristics of those cases. Data from 66 Breast Screening Radiologists and a matched group of 66 Symptomatic Radiologists were compared over a number of years (360 cases). Results are presented which demonstrate that whilst the two groups show overall similarities in performance there exist subtle underlying differences which we attribute to the differences in their everyday experience of the types of cases that they read. In conclusion, we argue that these differences are related to the volume of cases which UK Screening Radiologists read in order to maintain skill level.
A proposal of the diagnosis-dynamic characteristic (DDC) model describing the relation between search time and confidence levels for a dichotomous judgment, and its application to ROC curve generation
Toru Matsumoto, Nobuo Fukuda, Akira Furukawa, et al.
When physicians inspect an image, they build up a degree of confidence that the image is abnormal, p(t), or normal, n(t) [n(t) = 1 - p(t)]. After an infinite inspection time, the confidence reaches the equilibrium levels p* = p(∞) and n* = n(∞). There is a psychological conflict between the decisions "normal" and "abnormal". We assume that the decision "normal" is distracted by the decision "abnormal" by a factor of k(1 + ap), and in the inverse direction by a factor of k(1 + bn), where k (> 0) is a parameter related to image quality and the skill of the physician, and a and b are unknown constants. At equilibrium the conflict satisfies k(1 + ap*)n* = k(1 + bn*)p*, and we define the parameter c = (2p* - 1)/[p*(1 - p*)]. We further assume that the rate of change of the confidence level with time, dp/dt, is proportional to [k(1 + ap)n - k(1 + bn)p], i.e. to k[-cp^2 + (c - 2)p + 1]. Solving this differential equation gives t(p) and p(t) in terms of the parameters k, c and S, where S (0-1) is an arbitrarily chosen value related to the probability of "abnormal" before the inspection begins (S = p(0)). Image reading studies were performed with CT images. ROC curves were generated both by the traditional four-step score-based method and from the confidence level p estimated from the t(p) equation of the DDC model using the observed judgment time. It was concluded that ROC curves can be generated by measuring the time taken for a dichotomous judgment and applying the DDC model, without subjective scores of diagnostic confidence.
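A minimal numerical sketch of the confidence dynamics as reconstructed from this abstract, dp/dt = k[-cp^2 + (c - 2)p + 1] with p(0) = S. The parameter values are illustrative, and the Runge-Kutta integration is simply a convenient stand-in for the closed-form t(p) the authors derive.

```python
# Integrate the DDC confidence dynamics and read off the equilibrium confidence p*.
import numpy as np

def confidence_curve(k=1.0, c=0.5, S=0.5, t_max=10.0, n=1000):
    """RK4 integration of dp/dt = k*(-c*p**2 + (c - 2)*p + 1), p(0) = S."""
    t = np.linspace(0.0, t_max, n)
    dt = t[1] - t[0]
    p = np.empty(n)
    p[0] = S
    f = lambda q: k * (-c * q**2 + (c - 2.0) * q + 1.0)
    for i in range(n - 1):
        k1 = f(p[i]); k2 = f(p[i] + 0.5 * dt * k1)
        k3 = f(p[i] + 0.5 * dt * k2); k4 = f(p[i] + dt * k3)
        p[i + 1] = p[i] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return t, p

t, p = confidence_curve()
print("equilibrium confidence p* ~", round(p[-1], 3))
```

Interpolating the resulting p(t) at an observed judgment time gives the confidence estimate from which the ROC curve is built.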
Observer performance detecting signals in globally non-stationary oriented noise
Most studies of signal detection tasks for medical images have used backgrounds that are, or are assumed to be, statistically stationary. However, medical images usually have statistically non-stationary properties. Few studies have addressed how humans detect signals in non-stationary backgrounds. In particular, it is unknown whether humans can adapt their strategy to the different local statistical properties of non-stationary backgrounds. In this paper, we measured human performance in detecting a signal embedded in statistically non-stationary noise and in statistically stationary noise. Test images were designed so that the performance of model observers that assume statistical stationarity and make no use of differences in local statistics would be constant across both conditions. In contrast, the performance of an ideal model observer that uses local statistics is about 140% higher in the non-stationary backgrounds than in the stationary ones. Human performance was 30% higher in the non-stationary backgrounds. We conclude that humans can adapt their strategy to the local statistical properties of non-stationary backgrounds (although suboptimally compared to the ideal observer) and that model observers that derive their templates based on stationarity assumptions may be inadequate to predict human performance in some non-stationary backgrounds.
An applicability research on JND model
In a medical radiation department, the physician makes a diagnosis by surveying images presented on a CRT or on film. Owing to the inherent characteristics of radiographs, pathological features typically appear as small, low-contrast image signals in a noisy background. Investigating the detectability of such features by human vision helps determine under what conditions just-noticeable differences (JNDs) can be detected, and hence how to choose appropriate x-ray tube voltage and current for the exposure. In this work, software implementing an improved four-alternative forced-choice experiment was developed, and the experiment was performed to test observer performance in more than 600 groups. ROC analysis identified a range of image parameters over which the Rose model is satisfied. The main results are as follows: when observers detect a JND with 50% true-positive fraction, the minimum contrast is approximately 1% at a background intensity of 20% of the maximum intensity, and the value of k in the Rose model varies from approximately 2 to 3 as the target area changes. The minimum contrast decreases when the background intensity is above 20%.
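For reference, the Rose model criterion invoked above is usually written as follows (standard textbook form, with symbols chosen here for illustration, not the paper's notation):

```latex
% A disc of area A and contrast C in a background of mean photon fluence \Phi
% is at the detection threshold when its signal-to-noise ratio reaches the
% Rose constant k (found above to lie roughly between 2 and 3):
\[
  \mathrm{SNR} \;=\; C\,\sqrt{\Phi\,A} \;\ge\; k .
\]
```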
How does mass lesion detection vary with display window width and radiation exposure in computed radiography?
K. M. Ogden, W. Huda, V. Garg, et al.
In this study, we investigated lesion detection using a computed radiography (CR) imaging system in the absence of any anatomical background. The parameters investigated were lesion size, display window width, and radiation exposure. Uniform CR exposures were obtained for an air kerma at the receptor between approximately 1 μGy and 10 μGy, which are typical values for radiographic imaging. We added simulated spherical lesions ranging from 1 to 5.5 mm. Observer detection performance was measured using a four-alternative forced choice (4-AFC) method that determines the lesion intensity (contrast) needed to achieve 92% accuracy (I92%). We generated contrast-detail curves (I92% vs lesion size) for the various radiation doses, and display contrast settings that were varied by a factor of approximately eight. All contrast-detail curves showed a linear relationship between detection threshold intensity (I92%) and lesion size. Measured contrast-detail slopes were -0.65 at 1 μGy and -0.71 at 10 μGy. Increasing the radiation exposure by a factor of ten improved detection of 1 mm lesions by a factor of 2.25 and of 5.5 mm lesions by a factor of 1.8. Decreasing display contrast by a factor of four reduced lesion visibility by ~25% for exposures > 5 μGy, and by ~10% for exposures of 1 μGy. Reductions in lesion visibility with lower display contrast were similar for all lesion sizes. Detection performance generally improves when the radiation dose and/or display contrast are increased. At the lowest radiation dose (1 μGy), display contrast is less important than at doses above 5 μGy.
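The contrast-detail slopes quoted above come from a straight-line fit of log threshold intensity against log lesion size. A minimal sketch with made-up placeholder numbers (not the study's measurements):

```python
# Fit a contrast-detail slope: log threshold intensity I_92% vs. log lesion diameter.
import numpy as np

diam_mm = np.array([1.0, 2.0, 3.0, 4.0, 5.5])      # lesion diameters
i92 = np.array([40.0, 26.0, 20.0, 17.0, 13.5])     # hypothetical detection thresholds
slope, intercept = np.polyfit(np.log10(diam_mm), np.log10(i92), 1)
print("contrast-detail slope ~", round(slope, 2))   # negative: larger lesions need less contrast
```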
First validation of a new phantom for global quality control in digital mammography
H. Bosmans, K. Nijs, K. Young, et al.
As part of an EC-funded project, a design for a new phantom has been proposed that consists of a smaller contrast-detail part than the CDMAM phantom and that contains items for other parts of an acceptance protocol for digital mammography. A first prototype of the "DIGIMAM" has been produced. Both the CDMAM phantom and the DIGIMAM phantom were then used on a series of systems and read as part of a multi-centre study. The results with the new phantom were very similar to those obtained with the CDMAM phantom: readers scored differently from each other and there was an overlap in the scores for the different systems. A system with a poor CDMAM score also had the worst DIGIMAM score. Reading time, however, was significantly reduced. There was promising agreement between automated reading of the CDMAM and the scores of the DIGIMAM phantom. In order to reduce the subjectivity of the readings, computerized reading of the DIGIMAM should be developed. In a second version of the phantom, we propose to add more disks of the same size and contrast in each square to improve the statistical power of each reading.
Characterization of a new generation of computed radiography system based on line scanning and phosphor needles
Octavian Dragusin, Frank Rogge, Herman Pauwels, et al.
A new-generation CR system based on phosphor needles and a digitizer with line-scan technology was compared with a CR system in clinical use. Purely technical tests and more clinically oriented tests were run on both systems, including calculation of the DQE, signal-to-noise and contrast-to-noise ratios from aluminum inserts, contrast-detail analysis with the CDRAD phantom, and the use of anthropomorphic phantoms (wrist, chest and skull) scored by a radiologist. X-ray exposures were acquired at various dose levels and at 50 kV, 70 kV and 125 kV. For detector doses above 0.8 μGy, all noise-related measurements showed the superiority of the new technology. The MTF confirmed the improvement in sharpness: between 1 and 3 lp/mm, increases ranged from 20% to 50%. Further work should be devoted to determining the dose levels required in the plate for the different radiological applications.
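A minimal sketch of a contrast-to-noise ratio measurement of the kind mentioned above, assuming simple rectangular ROIs over an aluminum insert and over the uniform background; the ROI coordinates and the synthetic image are illustrative, not data from the study.

```python
import numpy as np

def cnr(image, signal_roi, background_roi):
    """Contrast-to-noise ratio: (mean_signal - mean_background) / std_background.

    signal_roi / background_roi are (row_slice, col_slice) tuples selecting
    regions over the insert and over the uniform background.
    """
    sig = image[signal_roi]
    bkg = image[background_roi]
    return (sig.mean() - bkg.mean()) / bkg.std()

# Illustrative use on a synthetic flat-field image with a higher-signal square.
img = np.random.normal(1000.0, 20.0, size=(256, 256))
img[100:140, 100:140] += 60.0
print(f"CNR ≈ {cnr(img, (slice(100, 140), slice(100, 140)), (slice(10, 50), slice(10, 50))):.1f}")
```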
Potential for lower absorbed dose in digital mammography: a JAFROC experiment using clinical hybrid images with simulated dose reduction
Purpose: To determine how image quality, as it relates to tumor detection, is affected by reducing the absorbed dose to 50% and 30% of the clinical level, represented by an average glandular dose (AGD) of 1.3 mGy for a standard breast according to European guidelines. Materials and methods: 90 normal, unprocessed images were acquired from the screening department using a full-field digital mammography (FFDM) unit, the Mammomat Novation (Siemens). Into 40 of these, one to three simulated tumors were inserted per image at various positions; these tumors represented irregularly shaped malignant masses. Dose reduction was simulated in all 90 images by adding simulated quantum noise to represent acquisition at 50% and 30% of the original dose, resulting in 270 images, which were subsequently processed for final display. Four radiologists participated in a free-response receiver operating characteristic (FROC) study in which they searched for and marked suspicious positions of the masses and rated their degree of suspicion on a one-to-four scale. Using the jackknife FROC (JAFROC) method, a score between 0 and 1 (where 1 represents best performance), referred to as a figure of merit (FOM), was calculated for each dose level. Results: The FOM was 0.73, 0.70, and 0.68 for the 100%, 50% and 30% dose levels, respectively. Analysis of variance (ANOVA) testing for statistically significant differences between any two of the three FOMs showed that they were not statistically distinguishable (p = 0.26). Conclusion: For the masses used in this experiment, there was no significant change in detection performance when quantum noise was increased, indicating a potential for dose reduction.
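A simplified stand-in (not the authors' JAFROC software) for the figure of merit used above: the probability that a true lesion's rating exceeds the highest-rated false mark on a normal image. The full JAFROC method additionally uses jackknife pseudovalues for significance testing; the ratings below are invented for illustration.

```python
def jafroc_like_fom(lesion_ratings, highest_fp_per_normal):
    """Probability that a true lesion's rating exceeds the highest-rated
    false mark on a normal image; ties count 0.5, and unmarked lesions or
    unmarked normal images contribute a rating of -inf."""
    wins, pairs = 0.0, 0
    for lr in lesion_ratings:
        for fp in highest_fp_per_normal:
            wins += 1.0 if lr > fp else (0.5 if lr == fp else 0.0)
            pairs += 1
    return wins / pairs

# Invented 1-4 suspicion ratings: one entry per lesion across the abnormal
# images (-inf = lesion not marked) and the highest-rated false mark per
# normal image (-inf = no marks on that image).
lesions = [4, 3, 2, float("-inf"), 3]
normals = [1, float("-inf"), 2, float("-inf")]
print(f"FOM ≈ {jafroc_like_fom(lesions, normals):.2f}")
```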
Development and implementation of a user friendly and automated environment for the creation of databases of digital mammograms with simulated microcalcifications
Federica Zanca, Jurgen Jacobs, Paula Pöyry, et al.
A database of raw composite mammograms containing simulated microcalcifications was generated. Such databases can be used for technology assessment, quality assurance, and comparison of different processing algorithms or visualization modalities in digital mammography. Clinical mammograms were selected and fully documented for this purpose. Microcalcifications were simulated in the mammography images following a methodology developed and validated in earlier work by our group. To create microcalcification templates, specimens containing lesions of different morphological types were acquired. From a basic set of (ideal) microcalcification templates, a set of templates specific to the systems under study was generated; the necessary inputs are the system MTF and attenuation values of aluminum sheets of different thicknesses. To make the whole process less time-consuming and applicable on a large scale, dedicated software tools for the creation of composite images have been developed. Automatic analysis of scores from observer performance studies, in terms of microcalcification detectability on the composite images, is also implemented. We report on the functionalities foreseen in these new software tools. Simulated microcalcifications were successfully created and inserted into raw images from the Siemens Novation DR, the AGFA DM1000 and the AGFA CR MM2.0.
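A rough sketch, under our own assumptions rather than the group's validated procedure, of how an ideal template might be adapted to a specific system: the template is blurred with the system MTF in the frequency domain and its contrast rescaled (in practice derived from the aluminum attenuation measurements). The Gaussian MTF and scaling factor below are placeholders.

```python
import numpy as np

def adapt_template(ideal_template, mtf_2d, contrast_scale):
    """Blur an ideal template with a sampled 2-D system MTF (same shape,
    centred on the zero-frequency origin) and rescale its contrast."""
    spectrum = np.fft.fft2(ideal_template)
    blurred = np.real(np.fft.ifft2(spectrum * np.fft.ifftshift(mtf_2d)))
    return contrast_scale * blurred

# Placeholder Gaussian MTF and a single-pixel "ideal" microcalcification.
n = 64
fx = np.fft.fftshift(np.fft.fftfreq(n))
mtf = np.exp(-(fx[:, None] ** 2 + fx[None, :] ** 2) / (2 * 0.15 ** 2))
template = np.zeros((n, n))
template[n // 2, n // 2] = 1.0
adapted = adapt_template(template, mtf, contrast_scale=0.8)
```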
The effect of data set size on computer-aided diagnosis of breast cancer: comparing decision fusion to a linear discriminant
Jonathan L. Jesneck, Loren W. Nolte, Jay A. Baker M.D., et al.
Data sets with relatively few observations (cases) in medical research are common, especially if the data are expensive or difficult to collect. Such small sample sizes usually do not provide enough information for computer models to learn data patterns well enough for good prediction and generalization. As a model that may be able to maintain good classification performance in the presence of limited data, we used decision fusion. In this study, we investigated the effect of sample size on the generalization ability of both linear discriminant analysis (LDA) and decision fusion. Subsets of large data sets were selected by a bootstrap sampling method, which allowed us to estimate the mean and standard deviation of the classification performance as a function of data set size. We applied the models to two breast cancer data sets and compared the models using receiver operating characteristic (ROC) analysis. For the more challenging calcification data set, decision fusion reached its maximum classification performance of AUC = 0.80±0.04 at 50 samples and pAUC = 0.34±0.05 at 100 samples. The LDA reached a lower performance and required many more cases, with a maximum of AUC = 0.68±0.04 and pAUC = 0.12±0.05 at 450 samples. For the mass data set, the two classifiers had more similar performance, with AUC = 0.92±0.02 and pAUC = 0.48±0.02 at 50 samples for decision fusion and AUC = 0.92±0.03 and pAUC = 0.55±0.04 at 500 samples for the LDA.
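A minimal sketch of the bootstrap-style learning-curve estimation described above, using scikit-learn's LDA and synthetic features in place of the breast-cancer data; the subset sizes, feature dimensions and hold-out scheme are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-in for a CAD feature set: 1000 cases, 8 features.
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)
X[y == 1] += 0.5  # give the "malignant" class a small mean shift

X_test, y_test = X[800:], y[800:]
for n in (50, 100, 200, 400):
    aucs = []
    for _ in range(100):  # bootstrap-sampled training subsets of size n
        idx = rng.choice(800, size=n, replace=True)
        lda = LinearDiscriminantAnalysis().fit(X[idx], y[idx])
        aucs.append(roc_auc_score(y_test, lda.decision_function(X_test)))
    print(f"n={n:4d}  AUC = {np.mean(aucs):.2f} ± {np.std(aucs):.2f}")
```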
Comparison of sensitivity and reading time for the use of computer aided detection (CAD) of pulmonary nodules at MDCT as concurrent or second reader
F. Beyer, L. Zierott, E. M. Fallenberg, et al.
Purpose: To compare sensitivity and reading time when using CAD as a second reader versus a concurrent reader. Materials and Methods: Fifty chest MDCT scans obtained for clinical indications were analysed independently by four radiologists on two occasions: first with CAD as a concurrent reader (CAD results displayed simultaneously with the radiologist's primary reading), then, after a median of 14 weeks, with CAD as a second reader (CAD results shown after completion of a reading session without CAD). A prototype version of Siemens LungCAD (Siemens, Malvern, USA) was used. Sensitivities and reading times for detecting nodules ≥4 mm were recorded for concurrent reading, reading without CAD, and second reading. False-positive findings were eliminated in a consensus conference. Student's t-test was used to compare sensitivities and reading times. Results: 108 true-positive nodules were found. Mean sensitivity was .68 for reading without CAD, .68 for concurrent reading and .75 for second reading. Differences in sensitivity were significant between concurrent and second reading (p<.001) and between reading without CAD and second reading (p=.001). Mean reading time for concurrent reading was significantly shorter (274 s) than for reading without CAD (294 s; p=.04) and for second reading (337 s; p<.001). New work to be presented: To our knowledge this is the first study to compare sensitivities and reading times between the use of CAD as a concurrent versus a second reader. Conclusion: CAD can be used either as a concurrent reader, to speed up reading of chest CT cases for pulmonary nodules without loss of sensitivity, or (but not both) as a second reader, to increase sensitivity at the cost of longer reading time.
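A small sketch of the paired comparison described above, with per-reader sensitivities under two reading modes compared by Student's t-test; the numbers are invented for illustration.

```python
from scipy import stats

# Invented per-reader sensitivities for four readers under two reading modes.
sens_concurrent = [0.66, 0.70, 0.65, 0.71]
sens_second = [0.74, 0.77, 0.72, 0.77]

t, p = stats.ttest_rel(sens_concurrent, sens_second)  # paired t-test across readers
print(f"paired t-test: t = {t:.2f}, p = {p:.3f}")
```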
Potential effect of CAD systems on the detection of actionable nodules in chest CT scans during routine reporting
Dag Wormanns, Florian Beyer, Arnauld Butzbach, et al.
The purpose of the presented study was to determine the impact of two different CAD systems, used as concurrent readers for the detection of actionable nodules (>4 mm), on the interpretation of chest CT scans during routine reporting. Fifty consecutive MDCT scans (1 mm or 1.25 mm slice thickness, 0.8 mm reconstruction increment) were selected from clinical routine. All cases were read by a resident and a staff radiologist, and a written report was available in the radiology information system (RIS). The RIS report mentioned at least one actionable pulmonary nodule in 18 cases (36%) and did not report any pulmonary nodule in the remaining 32 cases. Two different recent CAD systems, Siemens LungCare NEV and MEDIAN CAD-Lung, were independently applied to the 50 CT scans as concurrent readers with two radiologists. The two radiologists independently reviewed the CAD results and determined whether each CAD finding was a true positive or a false positive. Patients were classified into two groups: group A if at least one actionable nodule was detected and group B if no actionable nodules were found. The effect of CAD on routine reporting was simulated as the set union of the findings of routine reporting and CAD, thus applying CAD as a concurrent reader. According to the RIS report, group A (patients with at least one actionable nodule) contained 18 cases (36% of all 50 cases) and group B contained 32 cases. Application of a CAD system as a concurrent reader resulted in the detection of additional CT scans with actionable nodules and reclassification into group A in 16 and 18 cases (radiologist 1 and radiologist 2, respectively) with Siemens NEV, and in 19 and 18 cases with MEDIAN CAD-Lung. In seven cases MEDIAN CAD-Lung, and in four cases Siemens NEV, reclassified a case into group A while the other CAD system missed the relevant finding. Sensitivity on a per-nodule (>4 mm) basis was .45 for Siemens NEV and .55 for MEDIAN CAD-Lung; the difference was not significant (p=.077). In our study, the use of CAD as a concurrent reader in routine reporting doubled the percentage of patients with actionable nodules larger than 4 mm.
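A minimal sketch of the simulation described above: the nodule findings from the routine RIS report are merged with the true-positive CAD findings by set union, and a case is placed in group A if the union contains at least one actionable nodule. Case identifiers and findings are invented.

```python
# Invented findings per case: nodule identifiers reported in the RIS and
# true-positive nodules detected by CAD (after false positives were removed).
ris_findings = {"case01": {"nod1"}, "case02": set(), "case03": set()}
cad_findings = {"case01": {"nod1", "nod2"}, "case02": {"nod3"}, "case03": set()}

group_a = [c for c in ris_findings
           if ris_findings[c] | cad_findings.get(c, set())]  # union is non-empty
group_b = [c for c in ris_findings if c not in group_a]
print("group A (>=1 actionable nodule):", group_a)
print("group B (no actionable nodules):", group_b)
```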
Explanation of the mechanism by which CAD assistance improves diagnostic performance when reading CT images
Toru Matsumoto, Shinichi Wada, Shinji Yamamoto, et al.
The purpose of our research is to clarify the mechanism by which a reader (physician or radiological technologist) more effectively identifies abnormal findings in CT images from lung cancer screening when using a CAD system. A method for estimating the 2×2 decision matrix between the reader and the CAD system, and between the reader and the reader-with-CAD, was investigated. We assume the following scenario: first, the reader judges whether abnormal findings are present (1) or absent (0) in each patient's CT image without the CAD results; second, the reader makes the same judgment with the CAD results. We express the correlation between the two sets of decisions, for abnormal cases and for normal cases, by the phi correlation coefficient φ = (cd − ab)/√((a+c)(b+d)(b+c)(a+d)), where a, b, c, d are the entries of the 2×2 decision matrix (a: detected by the first decision maker only, b: by the second only, c: by both, d: by neither). If TPR1 = (a+c)/n, TPR2 = (b+c)/n and TPR3 = (a+b+c)/n for abnormal cases, then TPR3 = TPR1 + TPR2 − TPR1×TPR2 − φ√(TPR1(1−TPR1)TPR2(1−TPR2)), and therefore a = n(TPR3 − TPR2), b = n(TPR3 − TPR1), c = n(TPR1 + TPR2 − TPR3), and d = n(1.0 − TPR3). This theory was applied to experimental data: 41 students interpreted the same CT images without instruction [no training]; a second interpretation was performed after they had been instructed on how to interpret CT images [training]; and a third was assisted by a virtual CAD [training + CAD]. The mechanism by which the complementary strengths of the reader and the CAD system combine during CT interpretation was thus investigated both theoretically and experimentally. We conclude that estimating the 2×2 decision matrix between the "presence"/"absence" decisions of the reader and of the CAD system explains the mechanism by which diagnostic performance improves with a CAD system.
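The relations stated above can be turned directly into a small routine that reconstructs the 2×2 decision matrix and the phi coefficient from the observed detection rates; the values below are illustrative.

```python
import math

def decision_matrix(tpr1, tpr2, tpr3, n):
    """Reconstruct the 2x2 matrix (a: first decision maker only, b: second
    only, c: both, d: neither) from TPR1 = (a+c)/n, TPR2 = (b+c)/n and
    TPR3 = (a+b+c)/n, following the relations given in the abstract."""
    a = n * (tpr3 - tpr2)
    b = n * (tpr3 - tpr1)
    c = n * (tpr1 + tpr2 - tpr3)
    d = n * (1.0 - tpr3)
    phi = (c * d - a * b) / math.sqrt((a + c) * (b + d) * (b + c) * (a + d))
    return a, b, c, d, phi

# Illustrative detection rates for 100 abnormal cases.
print(decision_matrix(tpr1=0.60, tpr2=0.55, tpr3=0.80, n=100))
```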
Automatic image quality assessment for uterine cervical imagery
Uterine cervical cancer is the second most common cancer among women worldwide. However, its death rate can be dramatically reduced by appropriate treatment, if early detection is available. We are developing a Computer-Aided-Diagnosis (CAD) system to facilitate colposcopic examinations for cervical cancer screening and diagnosis. Unfortunately, the effort to develop fully automated cervical cancer diagnostic algorithms is hindered by the paucity of high quality, standardized imaging data. The limited quality of cervical imagery can be attributed to several factors, including: incorrect instrumental settings or positioning, glint (specular reflection), blur due to poor focus, and physical contaminants. Glint eliminates the color information in affected pixels and can therefore introduce artifacts in feature extraction algorithms. Instrumental settings that result in an inadequate dynamic range or an overly constrained region of interest can reduce or eliminate pixel information and thus make image analysis algorithms unreliable. Poor focus causes image blur with a consequent loss of texture information. In addition, a variety of physical contaminants, such as blood, can obscure the desired scene and reduce or eliminate diagnostic information from affected areas. Thus, automated feedback should be provided to the colposcopist as a means to promote corrective actions. In this paper, we describe automated image quality assessment techniques, which include region of interest detection and assessment, contrast dynamic range assessment, blur detection, and contaminant detection. We have tested these algorithms using clinical colposcopic imagery, and plan to implement these algorithms in a CAD system designed to simplify high quality data acquisition. Moreover, these algorithms may also be suitable for image quality assessment in telemedicine applications.
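Two of the quality checks described above can be sketched with generic image-processing operations: blur via the variance of the Laplacian (low variance suggests poor focus) and glint via near-saturated pixels with little colour. The thresholds and colour handling here are illustrative assumptions, not the authors' algorithms.

```python
import numpy as np
from scipy import ndimage

def blur_score(gray_image):
    """Variance of the Laplacian; small values indicate a blurred image."""
    return float(ndimage.laplace(gray_image.astype(float)).var())

def glint_mask(rgb_image, brightness_thresh=240, saturation_thresh=0.15):
    """Flag near-saturated pixels with little colour as specular glint."""
    rgb = rgb_image.astype(float)
    brightness = rgb.max(axis=-1)
    saturation = (brightness - rgb.min(axis=-1)) / np.maximum(brightness, 1e-6)
    return (brightness >= brightness_thresh) & (saturation <= saturation_thresh)

# Example usage: focus = blur_score(gray); glint_fraction = glint_mask(rgb).mean()
```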
A study on the performance evaluation of computer-aided diagnosis for detecting pulmonary nodules for the various CT reconstruction
The purpose of this study was to evaluate the performance of a computer-aided diagnosis (CAD) system for detecting pulmonary nodules across the range of CT image qualities encountered in low-dose CT cancer screening. Sixty-three chest examinations with sixty-four pulmonary nodules, consisting mainly of ground-glass opacities (GGO), were used. All CT images were acquired with a four-detector-row multi-slice CT scanner (Asteion, Toshiba Medical Systems, Japan) at a 0.75-second rotation time and 30 mA. After the examinations, CT image reconstructions were performed for every data set using seven reconstruction kernels and three slice thicknesses. In total, twenty-one data sets per patient were investigated, i.e., 1323 data sets comprising about 60 thousand CT images (30.1 GB). Nodule detection was carried out using a computer-aided diagnosis system developed by Fujitsu Ltd, Japan. The mean nodule size was 0.69±0.28 (SD) cm (range, 0.3-1.7 cm). At a slice thickness of 8 mm, the CAD system identified 42 to 48 of the 64 nodules across the seven reconstruction kernels, yielding a true-positive rate (TPR) of 65% to 75%. At a slice thickness of 5 mm, the CAD system yielded a TPR of 70% to 80%; at 10 mm, the TPR ranged from 50% to 64%. Some kernels yielded a relatively high TPR with a high false-positive (FP) rate, while others showed high sensitivity with a relatively low FP rate. CT image data sets reconstructed under multiple conditions are useful for assessing the robustness of a CAD system for detecting pulmonary nodules in multi-slice low-dose CT screening.