Proceedings Volume 7627

Medical Imaging 2010: Image Perception, Observer Performance, and Technology Assessment

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 23 February 2010
Contents: 10 Sessions, 50 Papers, 0 Presentations
Conference: SPIE Medical Imaging 2010
Volume Number: 7627

Table of Contents

  • Front Matter: Volume 7627
  • Keynote and Breast Lesions
  • Image Display and Presentation
  • Eyetracking and Vision
  • Technology Assessment and Impact
  • Human Performance
  • Model Observers
  • ROC and Decision Metrics
  • Characterization and Training
  • Poster Session
Front Matter: Volume 7627
Front Matter: Volume 7627
This PDF file contains the front matter associated with SPIE Proceedings Volume 7627, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Keynote and Breast Lesions
Maintaining quality in the UK breast screening program
Breast screening in the UK has been implemented for over 20 years and annually nearly two million women are now screened with an estimated 1,400 lives saved. Nationally, some 700 individuals interpret screening mammograms in almost 110 screening centres. Currently, women aged 50 to 70 are invited for screening every three years and by 2012 this age range will increase to 47 - 73 years. There is a rapid ongoing transition from using film mammograms to full field digital mammography such that in 2010 every screening centre will be partly digital. An early, and long running, concern has been how to ensure the highest quality of imaging interpretation across the UK, an issue enhanced by the use of a three year screening interval. To partly address this question a self assessment scheme was developed in 1988 and subsequently implemented nationally in the UK as a virtually mandatory activity. The scheme is detailed from its beginnings, through its various developments to current incarnation and future plans. This encompasses both radiological (single view screening, two view screening, mammographic film and full field digital mammography) as well as design changes (cases reported by means of: form filling; PDA; tablet PC; iPhone, and the internet). The scheme provides a rich data source which is regularly studied to examine different aspects of radiological performance. Overall it aids screening radiologists by giving them regular access to a range of difficult exemplar cases together with feedback on their performance as compared to their peers.
Rating scales for observer performance studies
Robert M. Nishikawa, Yulei Jiang, Charles E. Metz
We compared the performance of radiologists reading a set of screening mammograms with and without CADe as measured by the BI-RADS assessment scale to that measured by a 9-point rating scale. Eight MQSA radiologists read 300 screening mammograms, of which 66 cases contained at least one cancer and 234 were normal based on two-year follow-up. Both without and then with CADe, the radiologists gave their BI-RADS assessment for each case and, for each suspicious lesion in the image, reported their confidence on a 9-point scale (1=no evidence for recall; 5=equivocal; 9=overwhelming evidence for recall) that the lesion needed to be worked up. The radiologists were instructed to read the cases as they would clinically. We used MRMC ROC analysis employing PROPROC curve fitting to analyze the data, once for the BI-RADS data and again for that collected on the 9-point scale. Given that the radiologists were reading screening mammograms and were instructed to read in their normal clinical manner, not all radiologists used the full BI-RADS scale. Two radiologists used only BI-RADS 0,1 and 2, three used the full scale, and three used the full scale but employed categories 3, 4 and 5 sparingly. This mimics what occurs clinically, according to the literature. The BI-RADS and the 9-point rating scales gave similar results in terms of AUC. However, the 95% CIs of the estimates of AUC were substantially smaller for the 9-point scale.
Evaluating the realism of synthetically generated mammographic lesions: an observer study
Michael Berks, David Barbosa da Silva, Caroline Boggis, et al.
A method has been developed for generating synthetic masses that exhibit the appearance of real breast cancers in mammograms. To be clinically useful, the synthetic masses must appear sufficiently realistic, even to expert mammogram readers. This paper presents the results of an observer study in which 10 expert mammogram readers at the Nightingale Centre, Manchester attempted to distinguish between real and synthetically generated masses. Each reader rated a set of 30 real and 30 synthetic masses on a scale ranging from "definitely real" to "definitely synthetic". ROC curves were fitted to their responses and the area-under-curve (AUC) was used to quantify the ability of a reader to identify synthetic masses. The mean AUC was 0.70±0.09, showing the readers were able to identify synthetic masses at a rate statistically better than chance and suggesting that further improvements must be made to the mass synthesis method. Analysis of individual AUC scores showed reader performance was not affected by job type (radiologist versus breast physician/radiographer) or experience.
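To illustrate how an AUC can summarize a reader's ability to separate synthetic from real masses, here is a minimal sketch. It uses the nonparametric (Mann-Whitney) AUC estimate rather than the fitted ROC curves described in the abstract, and the function name, the 1-5 rating scale, and the ratings themselves are hypothetical, not data from the study.

```python
def empirical_auc(positive_scores, negative_scores):
    """Mann-Whitney AUC estimate: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # synthetic mass rated more "synthetic" than a real one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical ratings (1 = "definitely real", 5 = "definitely synthetic").
synthetic_ratings = [4, 5, 3, 2, 4]
real_ratings = [1, 2, 3, 2, 4]

auc = empirical_auc(synthetic_ratings, real_ratings)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 would mean the reader cannot tell synthetic from real masses, which is the outcome a mass synthesis method would aim for.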
Image Display and Presentation
Use of a visual discrimination model to detect compression artifacts in virtual pathology images
A major issue in telepathology is the extremely large and growing size of digitized "virtual" slides, which can require several gigabytes of storage and cause significant delays in data transmission for remote image interpretation and interactive visualization by pathologists. Compression can reduce this massive amount of virtual slide data, but reversible (lossless) methods limit data reduction to less than 50%, while lossy compression can degrade image quality and diagnostic accuracy. "Visually lossless" compression offers the potential for using higher compression levels without noticeable artifacts, but requires a rate-control strategy that adapts to image content and loss visibility. We investigated the utility of a visual discrimination model (VDM) and other distortion metrics for predicting JPEG 2000 bit rates corresponding to visually lossless compression of virtual slides for breast biopsy specimens. Threshold bit rates were determined experimentally with human observers for a variety of tissue regions cropped from virtual slides. For test images compressed to their visually lossless thresholds, just-noticeable difference (JND) metrics computed by the VDM were nearly constant at the 95th percentile level or higher, and were significantly less variable than peak signal-to-noise ratio (PSNR) and Structural Similarity (SSIM) metrics. Our results suggest that VDM metrics could be used to guide the compression of virtual slides to achieve visually lossless compression while providing 5 to 12 times the data reduction of reversible methods.
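For readers unfamiliar with PSNR, one of the distortion metrics the abstract compares against the VDM's JND metrics, the following sketch shows the standard definition. The function name and pixel values are hypothetical, and an 8-bit peak value of 255 is assumed. It does not reproduce the VDM or SSIM computations used in the paper.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between the reference and the compressed image.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical flattened 8-bit pixel values before and after lossy compression.
original = [10, 20, 30, 40]
compressed = [11, 19, 31, 39]
print(f"PSNR = {psnr(original, compressed):.2f} dB")
```

Because PSNR depends only on pixel-wise error, two images with the same PSNR can differ in how visible their artifacts are, which is consistent with the abstract's finding that PSNR varied more than the JND metrics at the visually lossless threshold.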
Spatial noise suppression for LCD displays: noise contrast
LCD monitors used for radiologic diagnosis exhibit a random spatial gain-fluctuation pattern that can be characterized as multiplicative fixed-pattern noise. Our goal is to use a monitor-particular map of this noise to pre-process mammograms for display. This noise map is obtained by processing a captured image of the monitor screen. In previous work we have demonstrated a method that produces a measurable reduction in the noise level. In this paper we describe the results of our first observer studies and of computer analysis using a visual discrimination model. The result proved to be negative, i.e. the pre-processing provided an insufficient degree of improvement to be judged diagnostically effective. In light of these first results, we have been examining our assumptions and processing methods. We begin with a brief introduction then describe the construction of the noise map and its use in pre-processing. The two key advances follow: the observer-study and the computer-modeling results. Finally, we describe our efforts to understand the relation of our measurement, modeling, and observer results.
High-fidelity color video reproduction of open surgery by six-band camera
Masahiro Yamaguchi, Yuri Murakami, Hiroyuki Hashizume, et al.
The video capture of surgery is becoming widely used, but the colors reproduced by conventional RGB-based video systems differ from the original. In this work, 6-band multispectral video was applied to open surgery for high-fidelity color reproduction, and medical doctors visually evaluated the reproduced image quality. As a result, 1) the 6-band video system was rated significantly higher in "color reproducibility," "fidelity," and "material appearance." 2) The perceived color differences between 6-band/RGB and 6-band/3-band were significant. 3) Color videos from 6-band data were transmitted via network, and roughly adequate quality was obtained at a bit rate of 15 Mbps. These results show the potential of multispectral technology for the improvement of surgical video quality.
DICOM GSPS affects on contrast detection threshold
David L. Leong, Tamara Miner Haygood, Gary J. Whitman, et al.
While previous research has been done to determine the contrast detection threshold in medical images, we have found it difficult to translate the results into settings that can be used for the optimization of image quality. Since many of these papers were done before the widespread use of DICOM GSPS calibrated monitors, how the GSPS affects the detection threshold and whether the median background intensity shift has been minimized by GSPS remain unknown. We set out to determine if the median background affected the detection of a low-contrast object in a clustered lumpy background, which simulated a mammography image. Our results show that shifts in the median background intensity did not affect the detection performance. The contrast detection threshold appears close to +3 gray levels above the background.
Comprehensive quantitative image quality evaluation of compressed sensing MRI reconstructions using a weighted perceptual difference model (Case-PDM): selective evaluation, disturbance calibration, and aggregative evaluation of noise, blur, aliasing, and oil-painting artifacts
Jun Miao, Feng Huang, David L. Wilson
The perceptual difference model (Case-PDM) is being used to quantify image quality of fast MR acquisitions and sparse reconstruction algorithms as compared to slower, full k-space, high quality reference images. To date, most perceptual difference models average image quality over a wide range of image degradations and assume that the observer has no bias towards any of them. Here, we create metrics weighted to different types of artifacts, calibrated to a human observer's preference, and then aggregate them to produce a comprehensive evaluation. The selective PDM is tuned using test images from an input reference image degraded by noise, blur, aliasing, or "oil-painting." For each artifact, responses of cortex channels in the PDM are normalized to be weights used for selective evaluation. A pair comparison experiment based on functional measurement theory was used to calibrate the selective PDM score of each artifact to its measured disturbance. Test images of varying quality were from an identical reference image degraded by one type of artifact. We found that human observers rated aliasing > blur > oil-painting > noise. In order to validate the new evaluation approach, PDM scores were compared to human ratings across a large set of compressed sensing MR reconstruction test images of varying quality. Human ratings (i.e. overall, noise, blur, aliasing, and oil-painting ratings) were obtained from a modified Double Stimulus Continuous Quality Scale experiment. For 3 brain images (transverse, sagittal, and coronal planes), averaged r values [comprehensive-PDM, noise-PDM, blur-PDM, aliasing-PDM, oil-painting-PDM] were [0.947±0.010, 0.827±0.028, 0.913±0.005, 0.941±0.016, 0.884±0.025]. We conclude the weighted Case-PDM is useful for selectively evaluating MR reconstruction artifacts and the proposed comprehensive PDM score can faithfully represent human evaluation, especially when demonstrating artifact bias, of compressed sensing reconstructed MR images.
A gaze-contingent high-dynamic range display for medical imaging applications
The grayscale resolution of current liquid crystal display technology limits its applications in medical imaging where wide dynamic range and dense grayscales are required. We propose an approach that dynamically processes the display image such that the luminance and contrast of the gazed area are optimized. A gaze-contingent interactive display system based on an 8-bit LCD and an eye-tracker was implemented to emulate the proposed concept for a high-dynamic range display.
Eyetracking and Vision
The assessment of stroke multidimensional CT and MR imaging using eye movement analysis: does modality preference enhance observer performance?
Lindsey Cooper, Alastair Gale, Janak Saada, et al.
Although CT and MR imaging is now commonplace in the radiology department, few studies have examined complex interpretative tasks such as the reading of multidimensional brain CT or MRI scans from the observer performance perspective, especially with reference to Stroke. Modality performance studies have demonstrated a similar sensitivity of less than 50% for both conventional modalities, with neither modality proving superior to the other in Stroke observer performance tasks (Mohr, 1995; Lansberg, 2000; Wintermark, 2007). Visual search studies have not extensively explored stroke imaging and an in-depth, comparative eye-movement study between CT and MRI has not yet been conducted. A computer-based, eye-tracking study was designed to assess diagnostic accuracy and interpretation in stroke CT and MR imagery. Forty-eight predetermined clinical cases, with five images per case, were presented to participants (novices, trainees and radiologists; n=28). The presence or absence of abnormalities was rated on a four-point Likert scale and their locations reported. Results highlight differences in visual search patterns amongst novice, trainee and expert observers; the most marked differences occurred between novice readers and experts. In terms of modality differences, novice and expert readers spent longer appraising CT images than MR, compared with trainees, who spent longer appraising MR than CT images. Image analysis trends did not appear to differ between modalities, but time spent within clinical images, accuracy and relative confidence performing the task did differ between CT and MR reader groups. To date, few studies have explored observer performance in neuroradiology and the present study examines multi-slice image appraisal by comparing matched pairs of CT and MRI Stroke cases.
Breast screening: visual search as an aid for digital mammographic interpretation training
Yan Chen, Ann Turnbull, Jonathan James, et al.
Digital mammography is gradually being introduced across all breast screening centres in the UK during 2010. This provides increased training opportunities using lower resolution, lower cost and more widely available devices, in addition to the clinical digital mammography workstations. This study examined how experienced breast screening personnel performed when they examined sets of difficult DICOM two-view screening cases in three conditions: on GE digital mammography workstations, on a standard LCD monitor (using a DICOM viewer) and on an iPhone (running Osirix software). In each condition they either viewed the full images unaided or were permitted to use the post-processing manipulations of pan, zoom and window level/width adjustments. For each case they had to report the feature type, rate their confidence on the presence of abnormality, classify the case and specify case density. Their visual search behaviour was recorded throughout using a head mounted eye tracker. Additionally, aspects of their real life screening performance and performance on a national self assessment scheme were examined. Data indicate that screening experience plays a major role in doing well on the self assessment scheme. Task performance was best on the clinical workstation. However, the data also suggest that a DICOM viewer that runs on a PC or laptop with a standard LCD display and allows viewing digital images in full resolution can support impressive cancer detection performance. The iPhone is not ideal for examining full images due to the amount of scrolling and zooming required. Overall, the results indicate that low cost devices could be used to provide additional tailored training as long as device resolution and HCI aspects are carefully considered.
Visual search characteristics in mammogram reading: SFM vs. FFDM
Previous studies comparing radiologists' behavior when reading mammograms that were 'digitized' from screen-film mammography (SFM) films and 'digital' mammograms obtained through the use of Full-Field Digital Mammography (FFDM) systems either showed a significant advantage to FFDM (mostly when a double-reading strategy was used) or no difference in performance between the two modalities (primarily when a single-reader strategy was used). The visual search characteristics used by radiologists in the reading of 'digitized' mammograms have been previously studied, but few perceptual studies have been carried out to characterize radiologists' behavior when reading 'digital' mammograms. In this paper we will compare the perceptual features related to the radiologists' reading of 'digital' mammograms with those related to the reading of 'digitized' mammograms. In addition, we will use spatial frequency analysis to characterize such a behavior and compare it to the one employed in the reading of 'digitized' mammograms.
Eye-position recording during brain MRI examination to identify and characterize steps of glioma diagnosis
MRI is an essential tool for brain glioma diagnosis thanks to its ability to produce images in any plane and to its numerous sequences adapted to both anatomic and functional imaging. In this paper, we investigate the use of an eyetracking system to explore relationships between visual scanning patterns and the glioma diagnostic process during brain MRI analysis. We divide the analyzed screen into Areas of Interest (AOIs), each AOI corresponding to one sequence. Analyzing the temporal organization of fixation locations within and between AOIs splits the diagnostic process into different steps. The analysis of saccadic amplitudes reveals clear delineation of three sequential steps. During the first step (characterized by large saccades), a radiologist performs a short review on all sequences and on the patient report. In the second step (characterized by short saccades), a radiologist sequentially and systematically scans all the slices of each sequence. The fixation duration in one AOI depends on the number of slices, on the lesion subtlety and on the lesion contrast in the sequence to be analyzed. In order to improve the detection, localization and characterization of the glioma, the radiologist compares sequences during the third step (characterized by large saccades). Eye-position recording enables one to identify each elementary task implemented during the diagnostic process of glioma detection and characterization on brain MRI. Total dwell time associated with one MRI sequence (one AOI) and contrast in the primary lesion area enable one to estimate the amount and subtleties of diagnosis criteria provided by the sequence. From this information, one could establish some rules to optimize brain MRI compression (depending on the sequence to be compressed).
Reading a radiologist's mind: monitoring rising and falling interest levels while scanning chest x-rays
Radiological images constitute a special class of images that are captured (or computed) specifically for the purpose of diagnosing patients. However, because these are not "natural" images, radiologists must be trained to interpret them through a process called "perceptual learning". However, because perceptual learning is implicit, experienced radiologists may sometimes find it difficult to explicitly (i.e. verbally) train less experienced colleagues. As a result, current methods of training can take years before a new radiologist is fully competent to independently interpret medical images. We hypothesize that eye tracking technology (coupled with multimedia technology) can be used to accelerate the process of perceptual training, through a Hebbian learning process. This would be accomplished by providing a radiologist-in-training with real-time feedback as he/she is fixating on important regions of an image. Of course this requires that the training system have information about what regions of an image are important - information that could presumably be solicited from experienced radiologists. However, our previous work has suggested that experienced radiologists are not always aware of those regions of an image that attract their attention, but are not clinically significant - information that is very important to a radiologist in training. This paper discusses a study in which local entropy computations were done on scan path data, and were found to provide a quantitative measure of the moment-by-moment interest level of radiologists as they scanned chest x-rays. The results also showed a striking contrast between the moment-by-moment deployment of attention between experienced radiologists and radiologists in training.
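One way to picture a "local entropy" computed on scan path data is a sliding-window Shannon entropy over the sequence of fixated image regions. The sketch below is only an illustration under that assumption; the abstract does not specify the authors' computation, and the region labels, window size, and function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution in a sequence."""
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def local_entropy(scan_path, window):
    """Entropy of each run of `window` consecutive fixations along the scan path."""
    return [
        shannon_entropy(scan_path[i:i + window])
        for i in range(len(scan_path) - window + 1)
    ]


# Hypothetical scan path: each letter is the image region hit by one fixation.
scan_path = ["A", "A", "A", "B", "C", "D", "D", "D"]
profile = local_entropy(scan_path, window=3)
print([round(h, 2) for h in profile])
```

Under this assumed reading, low values mark stretches where the gaze dwells on one region and high values mark stretches where it moves between regions, giving a moment-by-moment profile along the scan path.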
Technology Assessment and Impact
Effects of fixed-rate CT projection data compression on perceived and measured CT image quality
Albert Wegener, Naveen Chandra, Yi Ling, et al.
Compression of computed tomography (CT) projection data reduces CT scanner bandwidth and storage costs. Since fixed-rate compression guarantees predictable bandwidth, fixed-rate compression is preferable to lossless compression, but fixed-rate compression can introduce image artifacts. This research demonstrates clinically acceptable image quality at 3:1 compression as judged by a radiologist and as estimated by an image quality metric called local structural similarity (SSIM). We examine other common, quantitative image quality metrics from image processing, including peak signal-to-noise ratio (PSNR), contrast-to-noise ratio (CNR), and difference image statistics to quantify the magnitude and location of image artifacts caused by fixed-rate compression of CT projection data. Masking effects caused by local contrast, air and bone pixels, and image reconstruction effects at the image's periphery and iso-center explain why artifacts introduced by compression are not noticed by radiologists. SSIM metrics in this study nearly always exceed 0.98 (even at 4:1 compression ratios), which is considered visually indistinguishable. The excellent correlation of local SSIM and subjective image quality assessment confirms that fixed-rate 3:1 projection data compression on CT images does not affect clinical diagnosis and is rarely noticed. Local SSIM metrics can be used to significantly reduce the number of viewed images in medical image quality studies.
Image fade in computed radiography is exacerbated by increased kVp
Mark McEntee, Michelle Foley
Processing of computed radiography image plates involves a time delay between image acquisition and processing. This delay can vary from 1 to 30 minutes depending on the clinical situation. This study aims to examine the effects of latent image fade on the low contrast resolution of computed radiography images over time and to establish whether kVp has an influence on image fade. Three observers assessed the images. Image quality figures were calculated as a measure of contrast detail resolution. Significant reductions in low contrast resolution were seen to occur as the delay between exposure and processing increased. The image quality decreased by 2.8% at 5 min, 6% at 10 min and 8.8% at 20 min. Deterioration then continued at a steady rate until the plateau of 20.7% was reached at 3 hours; finally, at 15 hours the deterioration was 40.49% compared to the reference image. The rate of this deterioration increased with kVp. At 20 min the low contrast resolution had decreased by 8.8% at 75 kVp, 9.6% at 80 kVp, 10.15% at 85 kVp and 16.8% at 95 kVp. A substantial decrease in low contrast resolution of CR images results from long delays between exposure and processing, and this is exacerbated by higher kVps. Radiographers will lose at least 8.8% in low contrast resolution of an image with a processing delay of 20 minutes.
Optimal processing of isotropic 3D black-blood MRI for accurate estimation of vessel wall thickness
Bernard Chiu, Niranjan Balu, Li Dong, et al.
Quantification of vessel wall thickness is important in longitudinal monitoring of atherosclerosis. Black-blood MRI has been useful in measuring vessel wall thickness. Studies using two-dimensional (2D) imaging protocols measured wall thickness by matching the arterial wall and lumen boundaries on an acquisition plane. If the acquisition plane is oblique to the artery, the wall thickness would be overestimated by a factor that is dependent on the obliqueness angle. This problem can be understood as a three-dimensional (3D) surface mismatch problem, and we evaluated the effect of this problem by comparing the thickness measurements obtained using a 2D contour matching method and a 3D surface matching method. In addition to the surface mismatch problem, two other parameters may affect the wall thickness estimation: reslicing angle and slice thickness. We measured the wall thickness using images resliced perpendicular to the centerline of the vessel and quantified the difference between the thickness measurements obtained from parallel and centerline-based resliced images. Images obtained from a 2D MRI protocol typically have a slice thickness of 2mm, while the 3D MRI technique applied in this study produced images with sub-millimeter isotropic voxel size. To investigate the effect of slice thickness, we simulated 2mm-thick images by averaging the 3D black-blood image. Our results show that the wall thickness measured from 2mm-thick images was overestimated, especially in the carotid artery, which is associated with a larger obliqueness angle. This result underscores the advantage of the 3D isotropic acquisition technique in wall thickness measurement, especially in more tortuous vessels.
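The dependence of the overestimation on the obliqueness angle can be sketched with simple geometry. Assuming (as a simplification not stated in the abstract) a locally straight, cylindrical vessel with uniform wall thickness, an imaging plane tilted by an angle θ away from the perpendicular cross-section stretches the wall by up to a factor of 1/cos θ along the direction of tilt. The function name and values below are hypothetical.

```python
import math


def apparent_thickness(true_thickness_mm, obliqueness_deg):
    """Worst-case wall thickness seen on a plane tilted by `obliqueness_deg`
    from the perpendicular cross-section of a straight cylindrical vessel."""
    if not 0.0 <= obliqueness_deg < 90.0:
        raise ValueError("obliqueness angle must be in [0, 90) degrees")
    theta = math.radians(obliqueness_deg)
    return true_thickness_mm / math.cos(theta)


# Hypothetical 1.0 mm wall viewed at increasing obliqueness angles.
for angle in (0, 15, 30, 45):
    measured = apparent_thickness(1.0, angle)
    print(f"{angle:2d} deg -> {measured:.3f} mm")
```

Under this simplified model the overestimation is small for modest angles and grows quickly for more oblique planes, which is in line with the abstract's observation that the effect is larger where the obliqueness angle is larger.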
Color calibration and color-managed medical displays: does the calibration method matter?
Hans Roehrig, Kelly Rehm, Louis D. Silverstein, et al.
Our laboratory has investigated the efficacy of a suite of color calibration and monitor profiling packages which employ a variety of color measurement sensors. Each of the methods computes gamma correction tables for the red, green and blue color channels of a monitor that attempt to: a) match a desired luminance range and tone reproduction curve; and b) maintain a target neutral point across the range of grey values. All of the methods examined here produce International Color Consortium (ICC) profiles that describe the color rendering capabilities of the monitor after calibration. Color profiles incorporate a transfer matrix that establishes the relationship between RGB driving levels and the International Commission on Illumination (CIE) XYZ (tristimulus) values of the resulting on-screen color; the matrix is developed by displaying color patches of known RGB values on the monitor and measuring the tristimulus values with a sensor. The number and chromatic distribution of color patches vary across methods and are usually not under user control. In this work we examine the effect of employing differing calibration and profiling methods on rendition of color images. A series of color patches encoded in sRGB color space were presented on the monitor using color-management software that utilized the ICC profile produced by each method. The patches were displayed on the calibrated monitor and measured with a Minolta CS200 colorimeter. Differences in intended and achieved luminance and chromaticity were computed using the CIE DE2000 color-difference metric, in which a value of ΔE = 1 is generally considered to be approximately one just noticeable difference (JND) in color. We observed between one and 17 JNDs for individual colors, depending on calibration method and target.
Evaluating segmentation algorithms for diffusion-weighted MR images: a task-based approach
Abhinav K. Jha, Matthew A. Kupinski, Jeffrey J. Rodríguez, et al.
Apparent Diffusion Coefficient (ADC) of lesions obtained from Diffusion Weighted Magnetic Resonance Imaging is an emerging biomarker for evaluating anti-cancer therapy response. To compute the lesion's ADC, accurate lesion segmentation must be performed. To quantitatively compare these lesion segmentation algorithms, standard methods are used currently. However, the end task from these images is accurate ADC estimation, and these standard methods don't evaluate the segmentation algorithms on this task-based measure. Moreover, standard methods rely on the highly unlikely scenario of there being perfectly manually segmented lesions. In this paper, we present two methods for quantitatively comparing segmentation algorithms on the above task-based measure; the first method compares them given good manual segmentations from a radiologist, the second compares them even in absence of good manual segmentations.
Human Performance
Does reader visual fatigue impact interpretation accuracy?
To measure the impact of reader visual fatigue by assessing symptoms, the ability to keep the eye focused on the display and diagnostic accuracy. Twenty radiology residents and 20 radiologists were given a diagnostic performance test containing 60 skeletal radiographic studies, half with fractures, before and after a day of clinical reading. Diagnostic accuracy was measured using area under the proper binormal curve (AUC). Error in visual accommodation was measured before and after each test session and subjects completed the Swedish Occupational Fatigue Inventory (SOFI) and the oculomotor strain subscale of the Simulator Sickness Questionnaire (SSQ) before each session. Average AUC was 0.89 for the before-work test and 0.85 for the after-work test (F(1,36) = 4.15, p = 0.049 < 0.05). There was significantly greater error in accommodation after the clinical workday (F(1,14829) = 7.81, p = 0.005 < 0.01), and after the reading test (F(1,14829) = 839.33, p < 0.0001). SOFI measures of lack of energy, physical discomfort and sleepiness were higher after a day of clinical reading (p < 0.05). The SSQ measure of oculomotor symptoms (i.e., difficulty focusing, blurred vision) was significantly higher after a day of clinical reading (F(1,75) = 20.38, p < 0.0001). Radiologists are visually fatigued by their clinical reading workday. This reduces their ability to focus on diagnostic images and to accurately interpret them.
The varying effects of ambient lighting on low contrast detection tasks
Mark F. McEntee, Barbara Martin
AIM: The aim of this study was to determine if there is a significant difference between the detection of low-contrast objects on a 1MP review monitor and a 3MP primary monitor. METHOD: The monitors compared were a 1MP NEC Multisync 1980SXi and a 3MP Barco Coronis MFGD 3420. The low-contrast detectability of these monitors was compared at a high ambient light setting (73 lx) equivalent to that of a ward or intensive care unit in the clinical setting and a low setting (20 lx) which reflected that used in reporting rooms in standard practice. The comparison was made using a CDRAD test tool and visualisation of nasogastric tubes and a central line. RESULTS: Image quality results for both the psychophysical and diagnostic performance tests were substantially higher for the 3MP monitor than those obtained for the 1MP. Significant differences (p≤0.000) existed between the IQF results for the 2 monitors. Image quality results were higher at the lower ambient light setting for both monitors. CONCLUSION: Contrast visualisation is significantly improved through the use of primary monitors. Review monitors are adequate for the visualisation of lines and NG tubes in low and high light settings.
Nuisance levels of noise effects radiologists' performance
Mark F. McEntee, Amina Coffey, John Ryan, et al.
This study aimed to measure the sound levels in Irish x-ray departments. The study then established whether these levels of noise have an impact on radiologists' performance. Noise levels were recorded 10 times within each of 14 environments in 4 hospitals, 11 of which were locations where radiologic images are judged. Twenty-six senior radiologists were then asked to detect up to three nodular lesions within 30 posteroanterior chest x-ray images in the absence and presence of noise at the amplitude demonstrated in the clinical environment. The results demonstrated that noise amplitudes rarely exceeded that encountered with normal conversation, with the maximum mean value for an image-viewing environment being 56.1 dB. This level of noise did not impair the ability of radiologists to identify chest lesions, with figures of merit of 0.68, 0.69, and 0.68 with noise and 0.65, 0.68, and 0.67 without noise for chest radiologists, non-chest radiologists, and all radiologists, respectively. Using the DBM MRMC method, performance was significantly better with noise than in the absence of noise at the 90% confidence level (p=0.077). Further studies are required to establish whether other aspects of diagnosis, such as recall and attention, are impaired, and to examine the effects of more unexpected noise on performance.
Impact of adaptation time on contrast sensitivity
Dörte Apelt, Hans Strasburger, Jan Klein, et al.
For softcopy-reading of mammograms, a room illuminance of 10 lx is recommended in standard procedures. Room illuminance affects both the maximal monitor contrast and the global luminance adaptation of the visual system. A radiologist observer has to adapt to low luminance levels when entering the reading room. Since the observer's sensitivity to low-contrast patterns depends on adaptation state and processes, it would be expected that the contrast sensitivity is lower at the beginning of a reading session. We investigated the effect of an initial time of dark adaptation on the contrast sensitivity. A study with eight observers was conducted in the context of mammographic softcopy-reading. Using Gabor patterns with varying spatial frequency, orientation, and contrast level as stimuli in an orientation discrimination task, the intra-observer contrast sensitivity was determined for foveal vision. Before performing the discrimination task, the observers adapted for two minutes to an average illuminance of 450 lx. Thereafter, contrast thresholds were repeatedly measured at 10 lx room illuminance over a course of 15 minutes. The results show no significant variations in contrast sensitivity during the 15-minute period. Thus, it can be concluded that taking an initial adaptation time does not affect the perception of low-contrast objects in mammographic images presented in the typical softcopy-reading environment. Therefore, the reading performance would not be negatively influenced if the observer started reading mammograms immediately. The results can be used to optimize the workflow in the radiology reading room.
Spatial resolution and chest nodule detection: an interesting incidental finding
This study reports an incidental finding from a larger work. It examines the relationship between spatial resolution and nodule detection for chest radiographs. Twelve examining radiologists with the American Board of Radiology read thirty chest radiographs in two conditions: full (1500 × 1500 pixel) resolution, and 300 × 300 pixel resolution linearly interpolated to 1500 × 1500 pixels. All images were surrounded by a 10-pixel sharp grey border to aid in focussing the observer's eye when viewing the comparatively unsharp interpolated images. Fifteen of the images contained a single simulated pulmonary nodule. Observers were asked to rate their confidence that a nodule was present on each radiograph on a scale of 1 (least confidence, certain no lesion is present) to 6 (most confidence, certain a lesion is present). All other abnormalities were to be ignored. No windowing, levelling or magnification of the images was permitted and viewing distance was constrained to approximately 70 cm. Images were displayed on a 3 megapixel greyscale monitor. Receiver operating characteristic (ROC) analysis was applied to the results of the readings using the Dorfman-Berbaum-Metz multiple-reader, multiple-case method. No statistically significant differences were found, either with readers and cases treated as random or with cases treated as fixed. Low spatial frequency information appears to be sufficient for the detection of chest lesions of the type used in this study.
Model Observers
Validation of the NPWE model observer to predict the CDMAM performance
We investigate signal-known-exactly (SKE) detection and free localization tasks in correlated noise having a power-law power spectrum. In all cases the target is an additive focal Gaussian "bump" signal. We compare the performance of human observers in psychophysical studies to that of the Bayesian ideal observer for both tasks in the form of the observer efficiency. We find efficiencies that range from 40% to 60% in the SKE detection task, consistent with previous works. Observer efficiency is considerably higher in the free-localization task, ranging from 60% to 80%. Direct estimation of the human-observer spatial-frequency weights shows clear evidence of a shift to higher frequencies with increasing power-law exponent in both tasks. Our results suggest that human observers are performing some sort of limited prewhitening of stimuli in both tasks.
The use of steerable channels for detecting asymmetrical signals with random orientations
Bart Goossens, Ljiljana Platiša, Ewout Vansteenkiste, et al.
In the optimization of medical imaging systems, there is a stringent need to shift from human observer studies to numerical observer studies, because of both cost and time limitations. Numerical models give an objective measure for the quality of displayed images for a given task and can be designed to predict the performance of medical specialists performing the same task. For the task of signal detection, the channelized Hotelling observer (CHO) has been successfully used, although several studies indicate an overefficiency of the CHO compared to human observers. One of the main causes of this overefficiency is attributed to the intrinsic uncertainty about the signal (such as its orientation) that a human observer is dealing with. Deeper knowledge of the discrepancies between the CHO and the human observer may provide extra insight into the processing of the human visual system and this knowledge can be utilized to better fine-tune medical imaging systems. In this paper, we investigate the optimal detection of asymmetrical signals with statistically known random orientation, based on joint detection and estimation theory. We derive the optimal channelized observer for this task and we show that the optimal detection in channel space requires the use of steerable channels, which are used in steerable pyramid transforms in image processing. Even though the use of CHOs for signal-known-statistically (SKS) tasks has not been studied so far, our findings indicate that CHO models can be further extended to incorporate intrinsic uncertainty about the signal to behave closer to humans. Experimental results are provided to illustrate these findings.
Personalized numerical observer
Jovan G. Brankov, P. Hendrik Pretorius
It is widely accepted that medical image quality should be assessed using task-based criteria, such as human-observer (HO) performance in a lesion-detection (scoring) task. HO studies are too time consuming and costly to be used for image quality assessment during development of either reconstruction methods or imaging systems. Therefore, a numerical observer (NO), a HO surrogate, is highly desirable. In the past, we have proposed and successfully tested a NO based on a supervised-learning approach (namely a support vector machine) for cardiac gated SPECT image quality assessment. In the supervised-learning approach, the goal is to identify the relationship between measured image features and HO myocardium defect likelihood scores. Thus far we have treated multiple HO readers by simply averaging or pooling their respective scores. Due to observer variability, this may be suboptimal and less accurate. Therefore, in this work, our goal is to predict individual observer scores independently in the hope of better capturing some relevant lesion-detection mechanism of the human observers. This is even more important as there are many ways to get equivalent observer performance (measured by area under the receiver operating characteristic curve), and simply predicting some joint (average or pooled) score alone is not likely to succeed.
Using channelized Hotelling observers to quantify temporal effects of medical liquid crystal displays on detection performance
Ljiljana Platiša, Bart Goossens, Ewout Vansteenkiste, et al.
Clinical practice is rapidly moving in the direction of volumetric imaging. Often, radiologists interpret these images on liquid crystal displays at browsing rates of 30 frames per second or higher. However, recent studies suggest that the slow response of the display can compromise image quality. In order to quantify the temporal effect of medical displays on detection performance, we investigate two designs of a multi-slice channelized Hotelling observer (msCHO) model in the task of detecting a single-slice signal in multi-slice simulated images. The design of msCHO models is inspired by simplifying assumptions about how humans observe while viewing in the stack-browsing mode. For comparison, we consider a standard CHO applied only on the slice where the signal is located, recently used in a similar study. We refer to it as a single-slice CHO (ssCHO). Overall, our results confirm previous findings that the slow response of displays degrades the detection performance of the observers. More specifically, the observed performance range of msCHO designs is higher compared to the ssCHO, suggesting that the extent and rate of degradation, though significant, may be less drastic than previously estimated by the ssCHO. In particular, the difference between msCHO and ssCHO is more significant for higher browsing speeds than for slow image sequences or static images. This, together with their design criteria driven by the assumptions about humans, makes the msCHO models promising candidates for further studies aimed at building anthropomorphic observer models for the stack-mode image presentation.
Rapid performance evaluation for ideal FROC and AFROC observers
FROC and AFROC analyses are useful in medical imaging to characterize detection performance for the case of multiple lesions. We had previously developed ideal FROC and AFROC observers. Their performance is ideal in that they maximize the area or any partial area under the FROC or AFROC curve. Such observers could be useful in imaging system optimization or in assessing human observer efficiency. However, the performance evaluation of these ideal observers is impractically computationally complex. We propose 3 reasonable assumptions under which the ideal observers reduce approximately to a particular form of a scan-statistic observer. Performance for the "scan-statistic-reduced ideal observer" can be evaluated far more rapidly, albeit with slight error, than that of the originally proposed ideal observer. Through simulations, we confirm the accuracy of our approximate ideal observers. We also compare the performance of our approximate ideal observer with that of a conventional scan-statistic observer and show that the performance of our approximate ideal observer is significantly greater.
Model observers for complex discrimination tasks: assessments of multiple coronary stent placements
Sheng Zhang, Craig K. Abbey, Arian Teymoorian, et al.
As an important clinical task, evaluating the placement of multiple coronary stents requires fine judgments of distance between stents. However, making these judgments is limited by low system resolution, noise, low contrast of the deployed stent, and stent motion during the cardiac cycle. We use task performance as a figure of merit for optimizing image display parameters. In previous work, we described our simulation procedure in detail, and also reported results of human observers for a visual task involving discrimination of 4 gap sizes under various frame rates and number of frames. Here, we report the results of three spatial model observers (i.e. NPW, NPWE, and PWMF) and two temporal sensitivity functions (i.e. transient and sustained) for the same task. Under signal known exactly conditions, we find that model observers can be used to predict human observers in terms of discrimination accuracy by adding internal noise.
ROC and Decision Metrics
Influencing clinicians and healthcare managers: can ROC be more persuasive?
S. Taylor-Phillips, M. G. Wallis, A. Duncan, et al.
Receiver Operating Characteristic analysis provides a reliable and cost effective performance measurement tool, without using full clinical trials. However, when ROC analysis shows that performance is statistically superior in one condition than another it is difficult to relate this result to effects in practice, or even to determine whether it is clinically significant. In this paper we present two concurrent analyses, using ROC methods alongside single threshold recall rate data, and suggest that reporting both provides complementary data. Four mammographers read 160 difficult cases (41% malignant) twice, with and without prior mammograms. Lesion location and probability of malignancy were reported for each case and analyzed using JAFROC. Concurrently each participant chose recall or return to screen for each case. JAFROC analysis showed that the presence of prior mammograms improved performance (p<.05). Single threshold data showed a trend towards a 26% increase in the number of false positive recalls without prior mammograms (p=.056). If this trend were present throughout the NHS Breast Screening Programme then discarding prior mammograms would correspond to an increase in recall rate from 4.6% to 5.3%, and 12,414 extra women recalled annually for assessment. Whilst ROC methods account for all possible thresholds of recall and have higher power, providing a single threshold example of false positive, false negative, and recall rates when reporting results could be more influential for clinicians. This paper discusses whether this is a useful additional method of presenting data, or whether it is misleading and inaccurate.
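The recall-rate figures quoted in this abstract can be related to each other with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and because the quoted rates (4.6% and 5.3%) are rounded, the implied number of women screened is approximate. It comes out at roughly 1.77 million per year.

```python
def extra_recalls(women_screened, old_rate_pct, new_rate_pct):
    """Extra women recalled per year when the recall rate rises.

    Rates are given in percent, e.g. 4.6 for 4.6%.
    """
    return women_screened * (new_rate_pct - old_rate_pct) / 100.0


def implied_screened(extra_women, old_rate_pct, new_rate_pct):
    """Annual screening volume implied by a quoted number of extra recalls.

    This simply inverts extra_recalls(); it is an approximation because
    the quoted rates are rounded to one decimal place.
    """
    return extra_women * 100.0 / (new_rate_pct - old_rate_pct)


if __name__ == "__main__":
    # Figures quoted in the abstract: 4.6% -> 5.3%, 12,414 extra recalls.
    volume = implied_screened(12414, 4.6, 5.3)
    print(f"Implied annual screening volume: about {volume:,.0f} women")
```

Presenting the same effect as an absolute number of women, as the abstract proposes, only requires an estimate of the annual screening volume in addition to the two rates.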
Behavior of the decision variables of the three-class ideal observer for univariate trinormal data
Darrin C. Edwards, Charles E. Metz
We are attempting to extend receiver operating characteristic (ROC) analysis to tasks with more than two classes. This is difficult because of the rapid increase in complexity, in general, of both observer behavior and evaluation of its performance, as the number of classes involved increases. Many researchers have proposed addressing this complexity by imposing simplifications on the model; for example, by using univariate data rather than the bivariate data employed by the general three-class ideal observer. We have investigated a univariate trinormal model for the underlying data of a three-class ideal observer. Although a reasonably complete description of the ideal observer's behavior in this case is attainable, this behavior is more complicated than might intuitively be expected.
Therapy operating characteristic (TOC) curves and their application to the evaluation of segmentation algorithms
Harrison H. Barrett, Donald W. Wilson, Matthew A. Kupinski, et al.
This paper presents a general framework for assessing imaging systems and image-analysis methods on the basis of therapeutic rather than diagnostic efficacy. By analogy to receiver operating characteristic (ROC) curves, it introduces the Therapy Operating Characteristic or TOC curve, which is a plot of the probability of tumor control vs. the probability of normal-tissue complications as the overall level of a radiotherapy treatment beam is varied. The proposed figure of merit is the area under the TOC, denoted AUTOC. If the treatment planning algorithm is held constant, AUTOC is a metric for the imaging and image-analysis components, and in particular for segmentation algorithms that are used to delineate tumors and normal tissues. On the other hand, for a given set of segmented images, AUTOC can also be used as a metric for the treatment plan itself. A general mathematical theory of TOC and AUTOC is presented and then specialized to segmentation problems. Practical approaches to implementation of the theory in both simulation and clinical studies are presented. The method is illustrated with a brief study of segmentation methods for prostate cancer.
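The TOC construction described in this abstract can be sketched numerically. The example below is a minimal illustration, not the authors' method: it assumes hypothetical logistic dose-response curves for tumor control probability (TCP) and normal-tissue complication probability (NTCP), sweeps an overall dose level, and integrates TCP against NTCP with the trapezoidal rule. All function names and parameter values are assumptions made for illustration.

```python
import math


def logistic(dose, d50, slope):
    """Hypothetical dose-response model: probability rises with dose."""
    return 1.0 / (1.0 + math.exp(-slope * (dose - d50)))


def toc_points(doses, tcp, ntcp):
    """Return TOC points (NTCP, TCP) as the overall dose level is varied.

    The end points (0, 0) and (1, 1) are added so that the curve spans
    the full range, as is customary for ROC-like curves.
    """
    points = sorted((ntcp(d), tcp(d)) for d in doses)
    return [(0.0, 0.0)] + points + [(1.0, 1.0)]


def autoc(points):
    """Area under the TOC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    doses = range(0, 151)  # illustrative dose levels, arbitrary units
    # Assumed parameters: the tumor responds at a lower dose than normal tissue.
    tcp = lambda d: logistic(d, d50=60.0, slope=0.2)
    ntcp = lambda d: logistic(d, d50=80.0, slope=0.2)
    print(f"AUTOC = {autoc(toc_points(doses, tcp, ntcp)):.3f}")
```

In this sketch an AUTOC of 0.5 corresponds to a treatment that harms normal tissue exactly as readily as it controls the tumor, while values approaching 1 indicate tumor control at low complication probability.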
Binary ROC curve and three-class 2-D ROC surface
Xin He, Eric C. Frey
In this work, we compared the binary ROC curve and the 2-D three-class ROC surface. We found that the 2-D three-class ROC surface shares most of the properties of the binary ROC curve. In particular, the curve (or surface) has the same DOF as the decision rule that is used to generate it; the curve (or surface) contains all the optimal operating points under various decision criteria (note that for three-class, the equal error utility assumption is assumed, same below); different decision criteria result in the same ROC curve (or surface) by considering all the possible prior information; the ROC curve (or surface) uniquely corresponds to one pair (or triplet) of likelihood ratio distributions, assuming such a pair (or triplet) exists; the ROC curve (or surface) is concave; and the ROC curve (or surface) provides graphic information as well as a figure-of-merit for task performance. We thus conclude that the 2-D ROC surface may be preferable as a starting point for comparing systems used to perform 3-class diagnostic tasks.
Measuring modality ordering consistency of observer performance paradigms
D. P. Chakraborty, Federica Zanca
Two observer performance paradigms applied to the same modalities, readers and cases are said to order the modalities consistently if both confirm the same sign (positive or negative) of the figure of merit difference. The aim of this work was to develop a modality ordering consistency measure. The paradigms considered were receiver operating characteristic (ROC) and jackknife alternative free-response ROC (JAFROC). Clinical FROC data from a previous study were used. Using the highest rating method, ROC ratings were inferred from FROC ratings. JAFROC analyses of the FROC data and Dorfman-Berbaum-Metz multiple-reader multiple-case (DBM-MRMC) analysis of the inferred ROC data showed significant and consistent differences in the two figures of merit. Additionally, 2000 bootstrap data sets were sampled and analyzed by JAFROC and DBM-MRMC. It was found that a positive JAFROC figure of merit difference was 101 times more likely when the ROC difference was positive than when the ROC difference was negative (odds ratio = 101). Valid modality ordering consistency (or inconsistency) claims are possible only when both figure of merit differences are statistically significant. For those bootstraps where both JAFROC and ROC yielded significant differences there were no inconsistent orderings. The effect of artificially degrading JAFROC performance was investigated. It was found that the odds ratio was more sensitive to the degradation. The results in this work are likely to be optimistic. A more realistic test of modality ordering consistency would require two separate studies (FROC and ROC) using the same readers and cases.
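Two steps mentioned in this abstract lend themselves to a short sketch: inferring a case-level ROC rating from FROC mark ratings by the highest rating method, and computing an odds ratio from a 2×2 table of bootstrap outcomes. The code below is a hedged illustration; the function names, the default rating for unmarked cases, and the example counts are hypothetical and are not taken from the study.

```python
def highest_rating(mark_ratings, no_mark_rating=0):
    """Infer a case-level ROC rating from FROC mark ratings.

    The highest rating method assigns each case the rating of its
    highest-rated mark. The value used for a case with no marks is an
    assumption here (anything below the lowest allowed mark rating).
    """
    return max(mark_ratings) if mark_ratings else no_mark_rating


def odds_ratio(both_pos, jafroc_neg_roc_pos, jafroc_pos_roc_neg, both_neg):
    """Odds ratio for a 2x2 table of figure-of-merit difference signs.

    both_pos:            bootstraps with JAFROC diff > 0 and ROC diff > 0
    jafroc_neg_roc_pos:  bootstraps with JAFROC diff < 0 and ROC diff > 0
    jafroc_pos_roc_neg:  bootstraps with JAFROC diff > 0 and ROC diff < 0
    both_neg:            bootstraps with JAFROC diff < 0 and ROC diff < 0
    """
    odds_given_roc_pos = both_pos / jafroc_neg_roc_pos
    odds_given_roc_neg = jafroc_pos_roc_neg / both_neg
    return odds_given_roc_pos / odds_given_roc_neg


if __name__ == "__main__":
    # Hypothetical FROC marks for three cases (ratings on a 1-4 scale).
    cases = [[2, 4, 1], [], [3]]
    print([highest_rating(marks) for marks in cases])
    # Hypothetical bootstrap counts, for illustration only.
    print(odds_ratio(10, 2, 5, 20))
```

An odds ratio well above 1 means that the two paradigms tend to agree on the sign of the modality difference; an odds ratio of 1 would indicate no association between the two orderings.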
Characterization and Training
The effect of image interpretation training on the fracture recognition performance of radiographers
Mark F. McEntee, Naomi Bergin
AIM: The aim of this study is to measure the effect of medical image interpretation training on radiographers' ability to detect wrist fractures. To achieve this, the study aimed to establish any differences in performance between radiographers without image interpretation training and radiographers with interpretation training. In the course of the study, the effects of number of years of radiographic experience and previous image interpretation experience were also examined. METHOD: An FROC study was performed to assess nine radiographers undergoing medical image interpretation training and to compare their performance with nineteen radiographers, from a previous study, without similar training. The radiographers evaluated thirty postero-anterior wrist images, in carefully monitored conditions, which included normal and abnormal distal radii. The results were evaluated with Receiver Operating Characteristic (ROC) analysis. AUC, sensitivity, specificity, and average times were statistically compared using a one-way ANOVA. RESULTS: The study showed there was no statistical difference between the groups of radiographers' AUC values (p≤0.98). There was no statistical difference in sensitivity (p≤0.31), while there was an improved performance noted in specificity (p≤0.06). The study found there was little correlation between increasing years of radiographic experience and improved performance (p≤0.52), but it was noted there was an improvement when radiographers' previous image interpretation experience was considered (p≤0.04). It was seen there was a weak correlation between increasing time spent on interpretation and improved performance (p≤0.70). CONCLUSION: This work demonstrates that experienced technologists have inherent image interpretation skills that improve with training, allowing their performance to be comparable to that of non-specialist radiologists.
User modeling for improved computer-aided training in radiology: initial experience
Although mammography is an efficient screening modality for breast cancer, interpretation of mammographic images is a difficult task and notable variability between radiologists' performance has been documented. A significant factor impacting radiologists' diagnostic performance is adequate training. In this study we propose a new paradigm for computer-assisted training in radiology based on constructing user models for radiologists-in-training that capture individual error making patterns. Such user models are developed and trained to use image features for prediction of the extent of error made by a particular radiologist for a variety of cases and therefore estimate the difficulty of different types of cases for that radiologist. The constructed user model can be used to develop a personalized training protocol for the radiologist-in-training that focuses on cases that may pose a particular difficulty to the trainee. We initially demonstrate the concept of building individual user models for the task of breast mass diagnosis. Data collected from three resident observers at Duke University were used for the experiments. The results indicate that the proposed models are capable of learning to distinguish difficult and easy cases for each observer with moderate accuracy, which shows promise for the proposed concept.
A novel methodology for display 2D MTF evaluation: the pixel spread function (PxSF)
MTF (Modulation Transfer Function) is used as a metric for the sharpness of the images displayed on a monitor. However, MTF is often only measured in one dimension (usually horizontally or vertically). The goal of this work is to provide a methodology that allows measuring the MTF of a display correctly in 2D. Existing methodologies are analyzed to determine if they are satisfactory. We concluded that all of the currently used methodologies have shortcomings. To overcome the limitations of the existing methodologies, a new methodology is introduced, the Pixel Spread Function. In this methodology, an impulse to the display is approximated by a single lit pixel on a uniform background. The measurements are performed at a very close distance, to limit the degradations introduced by the measurement system. By applying signal processing, the 2D MTF has been obtained for this methodology. After averaging the results for multiple images and pixel positions and removing the camera degradation, the methodology proved to be reproducible and easy to perform. To validate the proposed methodology, a mathematical model has been created that allows simulating the MTF. Since the geometrical structure of the pixel has the largest impact on the obtained result, the model is based on it. Other degradations in the system (aberrations, diffraction...) are approximated by an additional blurring step. The theoretical and experimental results were compared, and it was concluded that the methodology is valid.
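By definition, the MTF is the magnitude of the Fourier transform of the point (here, pixel) spread function, normalized to 1 at zero frequency. The sketch below shows that relationship for a tiny image using a direct 2D discrete Fourier transform in plain Python. It is a simplified illustration under stated assumptions: the abstract does not detail the signal processing actually used, and the averaging over images and removal of camera degradation are not modelled. The function names and the uniform background subtraction step are hypothetical.

```python
import cmath
import math


def dft2_magnitude(image):
    """Magnitude of the 2D DFT of a small image (list of rows).

    A direct O(N^4) evaluation, adequate only for tiny illustrative inputs.
    """
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = -2j * math.pi * (u * y / rows + v * x / cols)
                    acc += image[y][x] * cmath.exp(phase)
            magnitude[u][v] = abs(acc)
    return magnitude


def mtf_2d(pixel_spread, background=0.0):
    """2D MTF from a measured pixel spread function.

    The uniform background level is subtracted first (an assumption about
    the preprocessing), then the DFT magnitude is normalized by its
    zero-frequency value so that MTF(0, 0) = 1.
    """
    psf = [[value - background for value in row] for row in pixel_spread]
    magnitude = dft2_magnitude(psf)
    dc = magnitude[0][0]
    return [[m / dc for m in row] for row in magnitude]


if __name__ == "__main__":
    # Toy measurement: one lit pixel blurred horizontally over two samples.
    measured = [
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.6, 0.6, 0.1],
        [0.1, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.1],
    ]
    for row in mtf_2d(measured, background=0.1):
        print(["%.2f" % value for value in row])
```

In this toy case the blur is purely horizontal, so the MTF falls off along the horizontal frequency axis while staying at 1 along the vertical one, which is the kind of directional difference a one-dimensional measurement would miss.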
Mammographic feature type and reader variability by occupation: an ROC study
Previous work has outlined that certain mammographic appearances feature more prominently in readers' false negative responses on a self-assessment scheme. Bi-annually 600 breast-screening film-readers complete at least one round of the Personal Performance in Mammographic Screening (PERFORMS) self-assessment scheme in the UK. The main occupational groups in UK Breast Screening can be categorised thus: Radiologists, Technologists and Symptomatics. Previous work has shown that these groups can vary in their reading 'style' and accuracy on self-assessed cases. These groups could be said to contain individuals each with (arguably) pronounced differences in their real life reading experience: symptomatic readers routinely read a large number of cases with abnormal appearances, and Technologists (specially trained to read films) do not have the same medical background as breast-screening Radiologists. We aimed to examine overall (national) and group (occupational) differences in terms of ROC analysis on those mammographic cases with different mammographic appearance (feature type). Several main feature types were identified, namely: Well Defined Mass (WDM), Ill Defined Mass (IDM), Spiculate Mass (SPIC), Architectural Distortions (AD), Asymmetry (ASYM) and Calcification (CALC). Results are discussed in light of differences in real-life practice for each of the occupational groups and how this may impact on accuracy over certain mammographic appearances.
Toward validation of a 3D structured background model for breast imaging
I. Reiser, S. Lee, K. Little, et al.
Breast tomosynthesis is a novel modality for breast imaging that aims to provide partial depth resolution of the tissue structure. In order to optimize tomosynthesis acquisition and reconstruction parameters, it is necessary to model the structured breast background. The purpose of this work was to investigate whether filtered noise could be used as a structure surrogate. Human performance in a SKE detection task was determined through 2-AFC experiments using tomosynthesis backgrounds extracted from 55 normal breasts. Mathematically defined lesions were projected and reconstructed using the same acquisition and reconstruction parameters as the clinical data. Signal diameters were 0.05, 0.2 and 0.8 cm. Performance in the center projection as well as a reconstructed slice through the signal center was determined. The gray-scale volume was binarized and attenuation coefficients of adipose or fibroglandular tissue were assigned to the voxels. This volume was then projected and reconstructed and performance of a pre-whitening observer was computed for the slice and center projections. Human performance in clinical backgrounds was predicted by a pre-whitening observer model, both in projection images and reconstructed slices. This indicates that pre-whitening observer performance is a good predictor of human performance and can be used to predict human performance in the simulated backgrounds. When comparing model observer performance in the two background types, comparable performance was found. This indicates that structured background based on filtered noise may be useful in tomosynthesis system optimization. In conclusion, for a SKE detection task, similar performance was reached in clinical and filtered noise backgrounds. This indicates that detection performance based on the filtered noise background model may be used to predict performance in actual breast backgrounds.
Fuzzy description of skin lesions
Nikolaos Laskaris, Lucia Ballerini, Robert B. Fisher, et al.
We propose a system for describing skin lesion images based on a human perception model. Pigmented skin lesions including melanoma and other types of skin cancer as well as non-malignant lesions are used. Works on classification of skin lesions already exist but they mainly concentrate on melanoma. The novelty of our work is that our system gives skin lesion images a semantic label in a manner similar to humans. This work consists of two parts: first we capture the way users perceive each lesion, second we train a machine learning system that simulates how people describe images. For the first part, we choose 5 attributes: colour (light to dark), colour uniformity (uniform to non-uniform), symmetry (symmetric to non-symmetric), border (regular to irregular), texture (smooth to rough). Using a web based form we asked people to pick a value of each attribute for each lesion. In the second part, we extract 93 features from each lesion and we train a machine learning algorithm using such features as input and the values of the human attributes as output. Results are quite promising, especially for the colour related attributes, where our system classifies over 80% of the lesions into the same semantic classes as humans.
Poster Session
Automated detection of analyzable metaphase chromosome cells depicted on scanned digital microscopic images
Yuchen Qiu, Xingwei Wang, Xiaodong Chen, et al.
Visually searching for analyzable metaphase chromosome cells under microscopes is quite time-consuming and difficult. To improve detection efficiency, consistency, and diagnostic accuracy, an automated microscopic image scanning system was developed and tested to directly acquire digital images with sufficient spatial resolution for clinical diagnosis. A computer-aided detection (CAD) scheme was also developed and integrated into the image scanning system to search for and detect the regions of interest (ROI) that contain analyzable metaphase chromosome cells in the large volume of scanned images acquired from one specimen. Thus, the cytogeneticists only need to observe and interpret the limited number of ROIs. In this study, the high-resolution microscopic image scanning and CAD performance was investigated and evaluated using nine sets of images scanned from either bone marrow (three) or blood (six) specimens for diagnosis of leukemia. The automated CAD-selection results were compared with the visual selection. In the experiment, the cytogeneticists first visually searched for the analyzable metaphase chromosome cells from specimens under microscopes. The specimens were also automatically scanned, and the CAD scheme was then applied to detect and save ROIs containing analyzable cells while deleting the others. The automatically selected ROIs were then examined by a panel of three cytogeneticists. From the scanned images, CAD selected more analyzable cells than the initial visual examinations of the cytogeneticists in both blood and bone marrow specimens. In general, CAD had higher performance in analyzing blood specimens. Even in the three bone marrow specimens, CAD selected 50, 22, and 9 ROIs, respectively. In addition to matching the initial visual selection of 9, 7, and 5 analyzable cells in these three specimens, the cytogeneticists also selected 41, 15, and 4 new analyzable cells, which were missed in the initial visual search.
This experiment showed the feasibility of applying this CAD-guided high-resolution microscopic image scanning system to prescreen and select ROIs that may contain analyzable metaphase chromosome cells. The success and the further improvement of this automated scanning system may have great impact on the future clinical practice in genetic laboratories to detect and diagnose diseases.
Implication of the first decision on visual information-sampling in the spatial frequency domain in pulmonary nodule recognition
Aim: To investigate the impact on visual sampling strategy and pulmonary nodule recognition of image-based properties of background locations in dwelled regions where the first overt decision was made. Background: Recent studies in mammography show that the first overt decision (TP or FP) has an influence on further image reading, including the correctness of the following decisions. Furthermore, a correlation between the spatial frequency properties of the local background following decision sites and the first decision correctness has been reported. Methods: Subjects with different radiological experience were eye tracked during detection of pulmonary nodules from PA chest radiographs. The number of outcomes and the overall quality of performance are analysed in terms of the cases where correct or incorrect decisions were made. JAFROC methodology is applied. The spatial frequency properties of selected local backgrounds related to certain decisions were studied. ANOVA was used to compare the logarithmic values of energy carried by non-redundant stationary wavelet packet coefficients. Results: A strong correlation has been found between the number of TPs as a first decision and the JAFROC score (r = 0.74). The number of FPs as a first decision was found to be negatively correlated with the JAFROC score (r = -0.75). Moreover, the differential spatial frequency profile outcomes depend on the correctness of the first decision.
Effect of background detail on CD curve slope in CT head images
Kent M. Ogden, Walter Huda, Sameer Tipnis, et al.
The purpose of this work was to analyze the influence of background structure on the slopes of contrast-detail (CD) curves in CT images acquired in uniform phantoms, anthropomorphic head phantoms, and in clinical head CT images. Alternative forced-choice (AFC) studies were performed using CT images acquired in uniform (water) phantoms, anthropomorphic (RANDO and ATOM) phantoms, and clinical head scans. The AFC experiments measure the lesion contrast (I92%) that corresponds to 92% detection efficiency. The AFC experimental results were plotted as a function of lesion size to produce CD curves, and the slopes of the curves determined when plotted on log-log axes. The Rose model of detection predicts a slope of -1.0 for disk lesions in uniform backgrounds and white noise. CD curve slopes showed a progression that depends on the complexity of the background structure in the CT images. For uniform water phantom images, the slope averaged -0.9, which is close to that predicted by the Rose model. For the anthropomorphic phantoms, the slope averaged -0.56, and for the patient scans the average slope was -0.20. The slope of CD curves depends strongly on the background structure of the images in which the lesions are embedded, with increased background structure leading to decreased CD curve slopes. The Rose model reasonably predicts the slopes for CD curves acquired in uniform phantoms, but is a poor predictor of slopes in clinical head images.
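As a rough illustration of how a CD curve slope can be estimated, the sketch below fits a straight line to log contrast versus log lesion size by ordinary least squares. The function name `cd_curve_slope` and the data values are hypothetical and are not taken from the study; the example data are constructed so that threshold contrast is inversely proportional to size, which is the behaviour the Rose model predicts.

```python
import math

def cd_curve_slope(sizes, contrasts):
    """Least-squares slope of log10(contrast) versus log10(size)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in contrasts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical lesion sizes and threshold contrasts (arbitrary units),
# chosen so that contrast is inversely proportional to size.
sizes = [1.0, 2.0, 4.0, 8.0]
rose_like_contrasts = [10.0 / s for s in sizes]

slope = cd_curve_slope(sizes, rose_like_contrasts)
print(f"fitted slope: {slope:.2f}")
```

With measured thresholds in place of the synthetic values, a fitted slope closer to zero would indicate that threshold contrast depends less on lesion size, as the abstract reports for structured backgrounds.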
A support vector machine designed to identify breasts at high risk using multi-probe generated REIS signals: a preliminary assessment
David Gur, Bin Zheng, Dror Lederman, et al.
A new resonance-frequency based electronic impedance spectroscopy (REIS) system with multi-probes, including one central probe and six external probes that are designed to contact the breast skin in a circular form with a radius of 60 millimeters to the central ("nipple") probe, has been assembled and installed in our breast imaging facility. We are conducting a prospective clinical study to test the performance of this REIS system in identifying younger women (< 50 years old) at higher risk for having or developing breast cancer. In this preliminary analysis, we selected a subset of 100 examinations. Among these, 50 examinations were recommended for a biopsy due to detection of a highly suspicious breast lesion and 50 were determined negative during mammography screening. The REIS output signal sweeps that we used to compute an initial feature set included both amplitude and phase information representing differences between corresponding (matched) EIS signal values acquired from the left and right breasts. A genetic algorithm was applied to reduce the feature set and optimize a support vector machine (SVM) to classify the REIS examinations into "biopsy recommended" and "non-biopsy recommended" groups. Using the leave-one-case-out testing method, the classification performance as measured by the area under the receiver operating characteristic (ROC) curve was 0.816 ± 0.042. This pilot analysis suggests that the new multi-probe-based REIS system could potentially be used as a risk stratification tool to identify pre-screened young women who are at higher risk of having or developing breast cancer.
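For readers unfamiliar with the figure of merit quoted above, the following minimal sketch computes an empirical area under the ROC curve from classifier scores using the rank (Mann-Whitney) formulation. It is not the authors' SVM or genetic algorithm pipeline; the function name `auc_from_scores` and the scores are hypothetical.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Empirical AUC: the fraction of (positive, negative) pairs in which
    the positive case scores higher, counting ties as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier outputs, e.g. one held-out score per case
# collected over a leave-one-case-out loop.
biopsy_scores = [0.9, 0.8, 0.4]
negative_scores = [0.7, 0.3, 0.2]

auc = auc_from_scores(biopsy_scores, negative_scores)
print(f"AUC = {auc:.3f}")
```

In a leave-one-case-out setting, each score would come from a model trained on all other cases, so that no case contributes to the training of the model that scores it.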
Comparison of algorithms for ultrasound image segmentation without ground truth
Image segmentation is a prerequisite to medical image analysis. A variety of segmentation algorithms have been proposed, and most are evaluated on a small dataset or based on classification of a single feature. The lack of a gold standard (ground truth) further adds to the discrepancy in these comparisons. This work proposes a new methodology for comparing image segmentation algorithms without ground truth by building a matrix called the region-correlation matrix. Subsequently, suitable distance measures are proposed for quantitative assessment of similarity. The first measure takes into account the degree of region overlap or identical match. The second considers the degree of splitting or misclassification by using an appropriate penalty term. These measures are shown to satisfy the axioms of a quasi-metric. They are applied in a comparative analysis of synthetic segmentation maps to show their direct correlation with human intuition of similar segmentation. Since ultrasound images are difficult to segment and usually lack a ground truth, the measures are further used to compare the recently proposed spectral clustering algorithm (encoding spatial and edge information) with standard k-means on abdominal ultrasound images. Improving the parameterization and enlarging the feature space for k-means steadily increased segmentation quality to that of spectral clustering.
Optimization of detector thickness for single slice helical CT with ROC study
Helical CT scanning is acknowledged to be a very useful scanning mode. Normally, the speed of bed movement per rotation (pitch) of a helical CT is fixed to meet the required scanning speed. To reduce system cost, single-slice helical CT (SSHCT) is chosen in many applications. A practically useful question is how to design the detector to obtain optimal performance of an SSHCT in a detection task. In this work, we applied an ROC study to optimize the detector thickness along the direction of the rotation axis for our SSHCT. Numerical simulations followed by human observer studies were performed. Compound Gaussian noise is modeled in our numerical simulations for objects both with and without lesions. An analytical FBP reconstruction method with rebinning is used to reconstruct the noisy data. The reconstructions show that thin detectors lead to artifacts, and that thick detectors lead to lesion blurring and lower contrast. All of these affect lesion detection in practical imaging applications. According to our ROC tests on images from five choices of detector thickness, optimal performance is obtained when the detector thickness is around 1 to 1.25 times the helical pitch. Moreover, we find that the optimal point is about the same under different noise levels. Other figures of merit, including SNR and HTC, are also calculated and examined in this work. These results agree well with the AUC results, which suggests that they could serve well as indicators for system optimization when few non-linear physical effects and reconstruction processing steps are involved.
Quantification of radiographic image quality based on patient anatomical contrast-to-noise ratio: a preliminary study with chest images
Yuan Lin, Xiaohui Wang, William J. Sehnert, et al.
The quality of a digital radiograph for diagnostic imaging depends on many factors, such as the capture system DQE and MTF, the exposure technique factors, the patient anatomy, and the particular image processing method and processing parameters used. Therefore, the overall image quality as perceived by radiologists is determined by a combination of these factors. This work explores objective image quality metrics derived directly from display-ready patient images. A preliminary study was conducted based on a multi-frequency analysis of anatomy contrast and noise magnitude from 250 computed radiography (CR) chest radiographs (150 PA, 50 AP captured with anti-scatter grids, and 50 AP without grids). The contrast and noise values were evaluated in different sub-bands separately according to their frequency properties. The contrast-to-noise ratio (CNR) was calculated, and the results correlated well with the human observers' overall impression of the images captured with and without grids.
Efficacy of fractal analysis in identifying glaucomatous damage
P. Y. Kim, K. M. Iftekharuddin, P. Gunvant, et al.
In this work, we propose a novel fractal-based technique to analyze a pseudo 2D representation of a 1D retinal nerve fiber layer (RNFL) thickness measurement data vector set for early detection of glaucoma. In our proposed technique, we first convert the 1D RNFL data vector sets into pseudo 2D images and then exploit a 2D fractal analysis (FA) technique to obtain the representative features. These 2D fractal-based features are further processed using principal component analysis (PCA), and the final classification between normal and glaucomatous eyes is obtained using Fisher's linear discriminant analysis (LDA). An independent dataset is used for training and testing the classifier. The technique is used on randomly selected GDx variable corneal compensator (VCC) eye data from 227 study participants (116 patients with glaucoma and 111 patients with healthy eyes). We compute sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) for statistical performance comparison with other known techniques. Our classification performance shows that the fractal-based technique is superior to the standard machine classifier Nerve Fiber Indicator (NFI).
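The sensitivity and specificity mentioned above can be computed from binary predictions as in the minimal sketch below. The labels and predictions are hypothetical placeholders (1 = glaucomatous, 0 = healthy), not data from the study, and the function name `sensitivity_specificity` is an assumption made for illustration.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions,
    where 1 marks the positive (diseased) class and 0 the negative class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical ground truth and classifier decisions for eight eyes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both values depend on the decision threshold applied to the classifier output, which is why a threshold-independent summary such as the AUROC is typically reported alongside them.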