Proceedings Volume 8673

Medical Imaging 2013: Image Perception, Observer Performance, and Technology Assessment

View the digital version of this volume in the SPIE Digital Library.

Volume Details

Date Published: 11 April 2013
Contents: 9 Sessions, 63 Papers, 0 Presentations
Conference: SPIE Medical Imaging 2013
Volume Number: 8673

Table of Contents

  • Front Matter: Volume 8673
  • Keynote and Visual Search
  • Image Perception
  • ROC
  • Model Observers
  • Tech Assessment
  • Observer Performance: Breast
  • Observer Performance: General
  • Poster Session
Front Matter: Volume 8673
Front Matter: Volume 8673
This PDF file contains the front matter associated with SPIE Proceedings Volume 8673, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.
Keynote and Visual Search
Investigating the association of eye gaze pattern and diagnostic error in mammography
Sophie Voisin, Frank Pinto, Songhua Xu, et al.
The objective of this study was to investigate the association between gaze patterns and the diagnostic performance of radiologists for the task of assessing the likelihood of malignancy of mammographic masses. Six radiologists (2 expert breast imagers and 4 radiology residents with varying levels of training) assessed the likelihood of malignancy of 40 biopsy-proven mammographic masses (20 malignant and 20 benign) on a computer monitor. Gaze data were collected using a commercial remote eye tracker. Upon reviewing each mass, the radiologists were asked to provide their assessment of the probability of malignancy of the depicted mass as well as a rating of the perceived difficulty of the diagnostic task. The collected gaze data were analyzed using established algorithms. Various quantitative metrics were extracted to characterize the recorded gaze patterns. The extracted metrics were correlated with the radiologists’ diagnostic decisions and perceived complexity scores. Results showed that the association between radiologists’ gaze metrics and their error-making patterns varies, not only with the radiologists’ experience level but also among individuals. However, some gaze metrics appear to correlate with diagnostic error and perceived complexity more consistently. These results suggest that although gaze patterns are generally associated with diagnostic error and the perceived difficulty of the diagnostic task, there are substantial differences among individuals that are not explained simply by the training level of the individual performing the diagnostic task.
High throughput screening for mammography using a human-computer interface with rapid serial visual presentation (RSVP)
Chris Hope, Annette Sterr, Premkumar Elangovan, et al.
The steady rise of the breast cancer screening population, coupled with data expansion produced by new digital screening technologies (tomosynthesis/CT), motivates the development of new, more efficient image screening processes. Rapid Serial Visual Presentation (RSVP) is a new fast-content recognition approach which uses electroencephalography to record brain activity elicited by fast bursts of image data. These brain responses are then subjected to machine classification methods to reveal the expert’s ‘reflex’ response and classify images according to the presence or absence of particular targets. The benefit of this method is that images can be presented at high temporal rates (~10 per second), faster than that required for fully conscious detection, facilitating a high throughput of image (screening) material. In the present paper we present the first application of RSVP to medical image data, and demonstrate how cortically coupled computer vision can be successfully applied to breast cancer screening. Whilst prior RSVP work has utilised multichannel approaches, we also present the first RSVP results demonstrating a discriminatory response on a single electrode, with an ROC area under the curve of 0.62 to 0.86 using a simple Fisher discriminator for classification. This increases to 0.75 to 0.94 when multiple electrodes are used in combination.
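The area under the ROC curve quoted above can be estimated directly from classifier scores. The following is a minimal sketch, not the authors' pipeline: it assumes each image already has a scalar discriminant output, and the score lists and the function name `empirical_auc` are hypothetical. It uses the Mann-Whitney form of the AUC, the fraction of target/non-target pairs that are ranked correctly, with ties counted as half.

```python
def empirical_auc(target_scores, nontarget_scores):
    """Mann-Whitney estimate of the area under the ROC curve.

    Counts the fraction of (target, non-target) pairs in which the
    target image received the higher score; ties count as one half.
    """
    wins = 0.0
    for t in target_scores:
        for n in nontarget_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(target_scores) * len(nontarget_scores))


# Hypothetical discriminant outputs (illustrative values only).
target = [0.9, 0.4, 0.7, 0.3]
nontarget = [0.2, 0.5, 0.1, 0.3]
auc = empirical_auc(target, nontarget)
print(f"empirical AUC = {auc:.5f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of target and non-target images.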
Image Perception
A novel graphical user interface for high-efficacy modeling of human perceptual similarity opinions
We present a novel graphical user interface (GUI) that facilitates high-efficacy collection of perceptual similarity opinions of a user in an effective and intuitive manner. The GUI is based on a hybrid mechanism that combines ranking and rating. Namely, it presents a base image for rating its similarity to seven peripheral images that are simultaneously displayed in a circular layout. The user is asked to report the base image’s pairwise similarity to each peripheral image on a fixed scale while preserving the relative ranking among all peripheral images. The collected data are then used to predict the user’s subjective opinions regarding the perceptual similarity of images. We tested this new approach against two methods commonly used in perceptual similarity studies: (1) a ranking method that presents triplets of images for selecting the image pair with the highest internal similarity and (2) a rating method that presents pairs of images for rating their relative similarity on a fixed scale. We aimed to determine which data collection method was the most time efficient and effective for predicting a user’s perceptual opinions regarding the similarity of mammographic masses. Our study was conducted with eight individuals. By using the proposed GUI, we were able to derive individual perceptual similarity profiles with a prediction accuracy ranging from 76.83% to 92.06% which was 41.4% to 46.9% more accurate than those derived with the other two data collection GUIs. The accuracy improvement was statistically significant.
BREAST: a novel method to improve the diagnostic efficacy of mammography
P. C. Brennan, K. Tapia, J. Ryan, et al.
High quality breast imaging and accurate image assessment are critical to the early diagnosis, treatment and management of women with breast cancer. Breast Screen Reader Assessment Strategy (BREAST) provides a platform, accessible by researchers and clinicians world-wide, which will contain image databases, algorithms to assess reader performance and on-line systems for image evaluation. The platform will contribute to the diagnostic efficacy of breast imaging in Australia and beyond on two fronts: reducing errors in mammography, and transforming our assessment of novel technologies and techniques. Mammography is the primary diagnostic tool for detecting breast cancer, with over 800,000 women X-rayed each year in Australia; however, it fails to detect 30% of breast cancers, with a number of missed cancers being visible on the image [1-6]. BREAST will monitor the mistakes, identify reasons for mammographic errors, and facilitate innovative solutions to reduce error rates. The BREAST platform has the potential to enable expert assessment of breast imaging innovations, anywhere in the world where experts or innovations are located. Currently, innovations are often assessed by limited numbers of individuals who happen to be located close to the innovation, resulting in equivocal studies with low statistical power. BREAST will transform this paradigm by enabling large numbers of experts to assess any new method or technology using our embedded evaluation methods. We are confident that this world-first system will play an important part in the future efficacy of breast imaging.
Perception in screening mammography: Can insertion of obvious cases enhance detection?
Sarah J. Lewis, Mariusz W. Pietrzyk, Robert C. K. Nurthen, et al.
Purpose: To determine whether a strategy of inserting obvious cancers can improve the detection of subsequent abnormal cases in screening mammography sets. Method: Eight experienced breast imaging radiologists (mammographers) were asked to interpret 40 mammography cases in two sittings and localise any malignancies present. Two differing conditions were presented to participants. In Condition 1, there were 36 normal images interspersed with 4 abnormal cases determined to be of medium to high difficulty. Condition 2 differed in that two normal cases were replaced with two obvious malignant cases. These two obvious cases were placed shortly before two subtle malignancies. In both sittings, participants were told they were viewing a screening mammography set. Results: There was no statistical difference in the location sensitivity between the 2 conditions. There was decreased overall specificity in Condition 2 (p = 0.43). Conclusion: Preliminary findings suggest that the insertion of more easily observed abnormal cases into image sets does not improve performance and may in fact result in lower specificity. Further analysis of participants’ eye positions and search strategies may offer some explanation of our findings.
Is grandma like a lichen planus? The problem of image perception and knowledge retention in pathology
Claudia Mello-Thoms, Elizabeth Legowski, Eugene Tseytlin
Medicine is the science of acquiring a lot of obscure knowledge and the art of knowing when to apply it, even if only once in a physician’s lifetime. Although medical experts seem to have it all figured out, being significantly better and faster than trainees, many studies have suggested that it is not only the amount of knowledge – which comes with experience – that differentiates the experts, but it is also how the knowledge is structured in memory. To acquire new knowledge, trainees will first encode both ‘surface’ (i.e., irrelevant) and ‘structural’ (relevant) features, and repeated presentations of the material will allow for dismissal of the unimportant elements from memory. However, just because knowledge has been encoded it does not mean that it is safely guarded in the physician’s memory; as with any information, if it is not tended to, it will slowly decay, and eventually it may be completely forgotten. In this study we investigated knowledge retention in a specific sub-domain of Pathology which is rarely, if ever, used by trainees. We wanted to determine the relationship between the way long-term memory is accessed (i.e., through recognition or free recall) and trainee performance. We also sought to determine whether access to long-term memory through either mechanism led to better transfer of newly acquired knowledge to never before seen cases.
Characterization of human observer detection in AFC volumetric detection tasks
Ivan Diaz, Sabine Kobbe-Schmidt, Francis R. Verdun, et al.
Model observers applied to 3D images should take into account the speed at which the frames are displayed. We performed a series of 3D 2-AFC (Alternative Forced Choice) detection tasks on volumetric stacks of CT lung images. The strategy used by radiologists and naïve observers was assessed using an eye-tracker. In a first set of experiments, the observers were restricted to reading the images at fixed scrolling speeds and were shown each alternative only once. They were then allowed to scroll through the image stacks at will. The experiment was then modified by adding two more choices (4-AFC), varying the position of the signal in the stack, and using a lower contrast. The performance of the observers did not depend on the speed, contrast, or experience. However, the naïve observers exhibited a markedly different pattern of scrolling from the radiologists. We were able to determine a histogram of scrolling speeds in frames per second. The scrolling speed at the moment the signal was detected was estimated at 20 fps. This was higher than that estimated by the radiologists.
ROC
The impact of using a JAFROC or ROC approach on the conclusions of a typical observer performance study
Mohammad Rawashdeh, Warwick Lee, Mariusz Pietrzyk, et al.
Purpose: The current study aims to compare ROC with JAFROC methodologies to investigate how the choice of available analytical approaches in observer studies can impact upon study conclusions.
Methods and materials: A total of 129 readers independently reviewed 60 mammographic cases, 20 of which were biopsy proven cases (abnormal) and 40 were normal. Each case consisted of the four standard cranio-caudal (CC) and medio-lateral oblique (MLO) projections. Readers were asked to interpret and locate any presence of cancer, and levels of confidence were scored on a scale of 1-5. Radiology workstations supporting 5MP diagnostic monitors and with full image manipulation tools were used to display all images. JAFROC and ROC methodologies were used and figures of merit and Az values respectively were correlated against key reader characteristics such as experience, qualifications, breast reading practices and physical characteristics using Spearman techniques.
Results: Correlation analysis between reader characteristics and JAFROC analysis demonstrated that four key characteristics were linked to performance: years of qualification as a radiologist (p=0.05, r=0.18), years reading mammograms (p=0.01, r=0.24), number of mammograms read per year (p=0.001, r=0.24), and hours reading mammograms per week (p=0.04, r=0.19). The ROC method indicated that determinants of performance were confined to years reading mammograms (p=0.02, r=0.2) and number of mammograms read per year (p=0.04, r=0.23).
Conclusion: This work demonstrates the practical impact on study conclusions when different methodologies are used. The location sensitivity and statistical power of the JAFROC approach suggest that its findings should be prioritized.
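The Spearman correlations reported above are Pearson correlations computed on ranks. The sketch below illustrates that calculation under stated assumptions: the reader data are invented for illustration, and the helper names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical rather than taken from the study. Tied values receive the average of the ranks they span.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical readers: years reading mammograms vs. a figure of merit.
years = [2, 5, 10, 15, 20]
figure_of_merit = [0.60, 0.58, 0.70, 0.75, 0.72]
rho = spearman_rho(years, figure_of_merit)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of either variable, which suits a comparison of JAFROC figures of merit with ROC Az values.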
One parameter contaminated binormal model (CBM) for analysis of difficult-to-fit ROC data
Introduction. Perception experiments collecting rating method ROC data sometimes result in operating points at only relatively high specificities for some treatment-reader combinations. In the extreme, no operating points are internal to the feasible space of many parametric models (i.e. for all points, FP = 0). Dorfman & Berbaum [1] developed a contaminated binormal model (CBM) to account for ROC data that have few false-positive reports even though many healthy subjects are sampled. Unfortunately, CBM can give very different ROC curve shapes for similar ROC points, and when there are no internal operating points, the ROC curve shape will often differ substantially from that obtained when there are internal operating points. Materials and Methods. We eliminate the CBM limiting case by adding a small constant to each cell of the rating data matrix [2,3] and by setting μ, the difference between the visible signal and noise distributions, to the same high value for all conditions [1]. Results. We illustrate the resulting ROC curves using an example dataset from Schartz et al. [4]. All observed ROC points become internal. The fitted ROC curves are similar to those of the limiting CBM and empirical ROC, but all curves using the same μ have the same shape and never cross. ROC accuracy parameters such as area, partial area, and sensitivity at any fixed specificity correspond perfectly. Conclusions. Constraining the CBM to a fixed large μ provides a more effective way to apply it to difficult-to-fit data.
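To make the one-parameter idea concrete, here is a minimal sketch of the usual CBM parameterization, written under assumptions rather than taken from the paper: noise ratings are standard normal, and a fraction α of signal cases is "visible" and shifted by μ while the rest are distributed like noise. The value μ = 4.0 and the function names are hypothetical. With μ fixed, each curve is indexed by α alone, and a larger α gives a curve that lies on or above the other at every threshold.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cbm_operating_point(zeta, mu, alpha):
    """Return (FPF, TPF) at decision threshold zeta under the CBM.

    Noise cases ~ N(0, 1). Signal cases are a mixture: a fraction alpha
    is visible, ~ N(mu, 1); the remainder looks like noise, ~ N(0, 1).
    """
    fpf = phi(-zeta)
    tpf = (1.0 - alpha) * phi(-zeta) + alpha * phi(mu - zeta)
    return fpf, tpf


def cbm_auc(mu, alpha):
    """Area under the CBM ROC curve."""
    return 0.5 * (1.0 - alpha) + alpha * phi(mu / math.sqrt(2.0))


MU_FIXED = 4.0  # hypothetical "high" value shared by all conditions

for alpha in (0.3, 0.6):
    fpf, tpf = cbm_operating_point(1.5, MU_FIXED, alpha)
    print(f"alpha={alpha}: FPF={fpf:.3f}, TPF={tpf:.3f}, "
          f"AUC={cbm_auc(MU_FIXED, alpha):.3f}")
```

In this form the gap between two curves at a common threshold is (α₂ − α₁)·[Φ(μ − ζ) − Φ(−ζ)], which is non-negative for μ > 0, consistent with the statement that curves sharing the same μ never cross.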
Statistical properties of a utility measure of observer performance compared to area under the ROC curve
The receiver operating characteristic (ROC) curve has become a common tool for evaluating diagnostic imaging technologies, and the primary endpoint of such evaluations is the area under the curve (AUC), which integrates sensitivity over the entire false positive range. An alternative figure of merit for ROC studies is expected utility (EU), which focuses on the relevant region of the ROC curve as defined by disease prevalence and the relative utility of the task. However, if this measure is to be used, it must also have desirable statistical properties to keep the burden of observer performance studies as low as possible. Here, we evaluate effect size and variability for EU and AUC. We use two observer performance studies recently submitted to the FDA to compare the EU and AUC endpoints. The studies were conducted using the multi-reader multi-case methodology in which all readers score all cases in all modalities. ROC curves from the studies were used to generate both the AUC and EU values for each reader and modality. The EU measure was computed assuming an iso-utility slope of 1.03. We find mean effect sizes, the reader-averaged difference between modalities, to be roughly 2.0 times as big for EU as for AUC. The standard deviation across readers is roughly 1.4 times as large, suggesting better statistical properties for the EU endpoint. In a simple power analysis of paired comparison across readers, the utility measure required 36% fewer readers on average to achieve 80% statistical power compared to AUC.
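The contrast between the two endpoints can be illustrated with a short sketch. It assumes one common formulation of expected utility, EU = max over operating points of (TPF − β·FPF), where β is the iso-utility slope; the abstract does not spell out its exact definition, so this form, the ROC points, and the function names are assumptions for illustration only. The AUC is computed with the trapezoidal rule.

```python
def trapezoid_auc(points):
    """Trapezoidal area under ROC points given as (FPF, TPF) pairs."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def expected_utility(points, slope):
    """Assumed form: best value of TPF - slope * FPF over the points."""
    return max(tpf - slope * fpf for fpf, tpf in points)


# Hypothetical empirical ROC points, including the trivial corners.
roc_points = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (1.0, 1.0)]
ISO_UTILITY_SLOPE = 1.03  # value quoted in the abstract

auc_value = trapezoid_auc(roc_points)
eu_value = expected_utility(roc_points, ISO_UTILITY_SLOPE)
print(f"AUC = {auc_value:.3f}, EU = {eu_value:.3f}")
```

The AUC weights the whole false-positive range equally, whereas this EU form depends only on the operating point where the iso-utility line of slope β touches the curve, which is why the two measures can respond differently to the same change in modality.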
The equivalence of a human observer and an ideal observer in binary diagnostic tasks
The Ideal Observer (IO) is “ideal” for given data populations. In the image perception process, as the raw images are degraded by factors such as display and eye optics, there is an equivalent IO (EIO). The EIO uses the statistical information that exits the perception/cognitive degradations as the data. We assume a human observer who received sufficient training, e.g., radiologists, and hypothesize that such a human observer can be modeled as if he is an EIO. To measure the likelihood ratio (LR) distributions of an EIO, we formalize experimental design principles that encourage rationality based on von Neumann and Morgenstern’s (vNM) axioms. We present examples to show that many observer study design refinements, although motivated by empirical principles explicitly, implicitly encourage rationality. Our hypothesis is supported by a recent review paper on ROC curve convexity by Pesce, Metz, and Berbaum. We also provide additional evidence based on a collection of observer studies in medical imaging. EIO theory shows that the “sub-optimal” performance of a human observer can be mathematically formalized in the form of an IO, and measured through rationality encouragement.
A nonparametric approach for statistical comparison of results from alternative forced choice experiments
Frédéric Noo, Adam Wunderlich, Dominic Heuscher, et al.
Task-based image quality assessment is a valuable methodology for development, optimization and evaluation of new image formation processes in x-ray computed tomography (CT), as well as in other imaging modalities. A simple way to perform such an assessment is through the use of two (or more) alternative forced choice (AFC) experiments. In this paper, we are interested in drawing statistical inference from outcomes of multiple AFC experiments that are obtained using multiple readers as well as multiple cases. We present a non-parametric covariance estimator for this problem. Then, we illustrate its usefulness with a practical example involving x-ray CT simulations. The task for this example is classification between presence or absence of one lesion with unknown location within a given object. This task is used for comparison of three standard image reconstruction algorithms in x-ray CT using four human observers.
Model Observers
Two complementary model observers to evaluate reconstructions of simulated micro-calcifications in digital breast tomosynthesis
Koen Michielsen, Federica Zanca, Nicholas Marshall, et al.
New imaging modalities need to be properly evaluated before being introduced in clinical practice. The gold standard is to perform clinical trials or dedicated clinical performance related observer experiments with experienced readers. Unfortunately this is not feasible during development or optimization of new reconstruction algorithms due to their many degrees of freedom. Our goal is to design a set of model observers to evaluate the performance of newly developed reconstruction methods on the assessment of micro-calcifications in digital breast tomosynthesis. In order to do so, the model observers need to evaluate both detection and classification of micro-calcifications. A channelized Hotelling observer was created for the detection task and a Hotelling observer working on an extracted feature vector was implemented for the classification task. These observers were evaluated on their ability to predict the results of human observers. Results from a previous observer study were used as reference to compare performance between human and model observers. This study evaluated detection of small micro-calcifications (100-200 µm) by a free search task in a power law filtered noise background and classification of two types of larger micro-calcifications (200-600 µm) in the same background. Scores from the free search study were evaluated using the weighted JAFROC method and the classification scores were analyzed using the DBM MRMC method. The same analysis methods were applied to the model observer scores. Results of the detection model observer were related linearly with the human observer results with a correlation coefficient of 0.962. The correlation coefficient for the classification task was 0.959 with a power law non-linear regression.
Integration of spatio-temporal contrast sensitivity with a multi-slice channelized Hotelling observer
Ali N. Avanaki, Kathryn S. Espig, Cedric Marchessoux, et al.
Barten’s model of the spatio-temporal contrast sensitivity function of the human visual system is embedded in a multi-slice channelized Hotelling observer. This is done by 3D filtering of the stack of images with the spatio-temporal contrast sensitivity function and feeding the result (i.e., the perceived image stack) to the multi-slice channelized Hotelling observer. The proposed procedure for incorporating the spatio-temporal contrast sensitivity function is generic in the sense that it can be used with observers other than the multi-slice channelized Hotelling observer. Detection performance of the new observer in digital breast tomosynthesis is measured at a variety of browsing speeds, at two spatial sampling rates, using computer simulations. Our results show a peak in detection performance at intermediate browsing speeds. We compare our results to those of a human observer study reported earlier (I. Diaz et al. SPIE MI 2011). The effects of display luminance, contrast and spatial sampling rate, with and without considering foveal vision, are also studied. Reported simulations are conducted with real digital breast tomosynthesis image stacks, as well as stacks from an anthropomorphic software breast phantom (P. Bakic et al. Med Phys. 2011). Lesion cases are simulated by inserting single micro-calcifications or masses. Limitations of our methods and ways to improve them are discussed.
Exact confidence intervals for channelized Hotelling observer performance
Adam Wunderlich, Frederic Noo, Marta Heilbrun
Task-based assessments of image quality constitute a rigorous, principled approach to the evaluation of imaging system performance. To conduct such assessments, it has been recognized that mathematical model observers are very useful, particularly for purposes of imaging system development and optimization. One type of model observer that has been widely applied in the medical imaging community is the channelized Hotelling observer (CHO). In the present work, we address the need for reliable confidence interval estimators of CHO performance. Specifically, we observe that a procedure proposed by Reiser for interval estimation of the Mahalanobis distance can be applied to obtain confidence intervals for CHO performance. In addition, we find that these intervals are well-defined with theoretically-exact coverage probabilities, which is a new result not proved by Reiser. The confidence intervals are tested with Monte Carlo simulation and demonstrated with an example comparing x-ray CT reconstruction strategies.
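The quantity for which such intervals are built can be sketched as follows. This is only a plug-in point calculation under assumptions, not the interval procedure described above: it assumes two channels, Gaussian channel outputs with a common covariance matrix, and hypothetical numerical values and function names. The squared Hotelling SNR is the squared Mahalanobis distance between the class means, SNR² = Δᵀ S⁻¹ Δ, and under these assumptions AUC = Φ(SNR/√2).

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hotelling_template_2ch(delta, cov):
    """Solve S w = delta for a 2x2 covariance S (explicit inverse)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return [(d * delta[0] - b * delta[1]) / det,
            (-c * delta[0] + a * delta[1]) / det]


def cho_snr_squared(delta, cov):
    """Squared Mahalanobis distance between the two class means."""
    w = hotelling_template_2ch(delta, cov)
    return delta[0] * w[0] + delta[1] * w[1]


# Hypothetical channel statistics (illustrative values only).
mean_difference = [1.0, 0.5]           # signal-present minus signal-absent
covariance = [[1.0, 0.2], [0.2, 0.5]]  # common channel covariance

snr2 = cho_snr_squared(mean_difference, covariance)
snr = math.sqrt(snr2)
auc_gaussian = phi(snr / math.sqrt(2.0))
print(f"SNR^2 = {snr2:.4f}, SNR = {snr:.4f}, AUC = {auc_gaussian:.4f}")
```

In practice the mean difference and covariance are estimated from a finite set of images, so the plug-in SNR is itself a random quantity, which is what motivates interval estimates of the kind the abstract describes.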
Objectively measuring signal detectability, contrast, blur and noise in medical images using channelized joint observers
To improve imaging systems and image processing techniques, objective image quality assessment is essential. Model observers that adopt a task-based quality assessment strategy by estimating signal detectability measures have been shown to be quite successful to this end. At the same time, costly and time-consuming human observer experiments can be avoided. However, optimizing images in terms of signal detectability alone still allows a lot of freedom in terms of the imaging parameters. More specifically, fixing the signal detectability defines a manifold in the imaging parameter space on which different “possible” solutions reside. In this article, we present measures that can be used to distinguish these possible solutions from each other, in terms of image quality factors such as signal blur, noise and signal contrast. Our approach is based on an extended channelized joint observer (CJO) that simultaneously estimates the signal amplitude, scale and detectability. As an application, we use this technique to design k-space trajectories for MRI acquisition. Our technique allows different spiral trajectories to be compared in terms of blur, noise and contrast, even when the signal detectability is estimated to be equal.
Model mismatch and the ideal observer in SPECT
Michael Ghaly, Jonathan M. Links, Yong Du, et al.
SPECT acquisition parameters can be optimized using the Ideal Observer (IO) applied to projections. The IO implicitly has perfect knowledge of the image formation process, and thus its performance reflects the best achievable with perfect compensation. However, SPECT images are often reconstructed with imperfect or no compensation. This mismatch between the reconstruction and IO can give rise to suboptimal performance when human observers interpret SPECT images. In this study, we investigated the importance of including ‘model mismatch’ (MM) in the IO in the context of myocardial perfusion SPECT for a signal known exactly/background known statistically (SKE/BKS) task. We optimized the energy window using the IO with and without MM. We evaluated IO performance when the observer had (1) a perfect model, (2) no model, or (3) an approximate model of scatter. We used an anthropomorphic model observer (AO) as a benchmark for human observer performance and compared optimal energy windows. IO performance was relatively insensitive to energy window settings for case 1. Performance for case 2 was significantly worse than for case 1, and the optimal window width was 13-15%. For case 3, performance was similar to case 1. Performance for the AO was similar to cases 2 and 3 when scatter compensation was or was not, respectively, included in the reconstruction. Incorporating MM into the IO is a new approach for improving agreement of the IO with human observers. This allows optimization of acquisition and instrumentation parameters in the presence of non-ideal compensation methods more efficiently than reconstructed-image-domain AOs.
Tests of a 3D visual-search model observer for SPECT
Observer studies with single 2D images can bias assessments of diagnostic technologies, as physicians usually have access to an entire image volume presented as multiple slices in multiple views. Previously, we introduced a scanning model observer for detection-localization tasks with multislice-multiview (or volumetric) display, but this observer did not compare well against human-observer data. The current work continues our investigation with tests of a 3D visual-search (VS) model observer. The VS framework amounts to an initial holistic search that identifies suspicious locations for analysis by a statistical observer. Our VS model uses a scanning observer for the analysis. The VS model was evaluated against the scanning and human observers in a localization ROC study of mass detection in SPECT lung imaging. The study compared two iterative reconstruction strategies that applied different combinations of corrections for attenuation, scatter, and distance-dependent system resolution. In our earlier work, the scanning and human observers ranked the strategies in opposite order of performance. The ranking from the VS observer matched that of the humans.
Tech Assessment
Identification of depth information with stereoscopic mammography using different display methods
Takamitsu Morikawa, Yoshie Kodera
Stereoscopy in radiography was widely used in the late 80’s because it could capture complex structures in the human body, proving beneficial for diagnosis and screening. To observe images stereoscopically, radiologists usually had to train their eyes to perceive the stereoscopic effect. With the development of three-dimensional (3D) monitors and their use in the medical field, however, naked-eye viewing alone is no longer required. The question then arises as to whether depth information is recognized differently with conventional methods than with a 3D monitor. We constructed a phantom and evaluated the difference in the capacity to identify depth information between the two methods. The phantom consists of acrylic steps with 3 mm diameter acrylic pillars on the top and bottom of each step. Seven observers viewed images of the phantom stereoscopically using the two display methods and were asked to judge the direction of the pillar that was on the top. We compared the judged directions with the direction of the actual pillar on the top and calculated the percentage of correct answers (PCA). The PCA obtained using the 3D monitor method was about 5% higher than that obtained using the naked-eye method. This indicates that people can view images stereoscopically more precisely with the 3D monitor method than with conventional methods such as crossed or parallel eye viewing. We were able to estimate the difference in capacity to identify depth information between the two display methods.
Assessment of visual-spatial skills in medical context tasks when using monoscopic and stereoscopic visualization
Marisol Martinez Escobar, Bethany Juhnke, Kenneth Hisley, et al.
The dramatic rise of digital medical imaging has allowed medical personnel to see inside their patients as never before. Many software products are now available to view these data in various 2D and 3D formats. This also raises many basic research questions on spatial perception for humans viewing these images. The work presented here attempts to answer the question: How would adding the stereopsis depth cue affect relative position tasks in a medical context? By designing and conducting a study to isolate the differences between monoscopic 3D and stereoscopic 3D displays in a relative position task, the following hypothesis was tested: stereoscopic 3D displays are beneficial over monoscopic 3D displays for relative position judgment tasks in a medical visualization setting. The results show that the stereoscopic condition yielded higher scores than the monoscopic condition, but the differences were not always statistically significant.
Effect of image processing version on detection of non-calcification cancers in 2D digital mammography imaging
L. M. Warren, J. Cooke, R. M. Given-Wilson, et al.
Image processing (IP) is the last step in the digital mammography imaging chain before interpretation by a radiologist. Each manufacturer has their own IP algorithm(s) and the appearance of an image after IP can vary greatly depending upon the algorithm and version used. It is unclear whether these differences can affect cancer detection. This work investigates the effect of IP on the detection of non-calcification cancers by expert observers. Digital mammography images for 190 patients were collected from two screening sites using Hologic amorphous selenium detectors. Eighty of these cases contained non-calcification cancers. The images were processed using three versions of IP from Hologic – default (full enhancement), low contrast (intermediate enhancement) and pseudo screen-film (no enhancement). Seven experienced observers inspected the images and marked the location of regions suspected to be non-calcification cancers assigning a score for likelihood of malignancy. This data was analysed using JAFROC analysis. The observers also scored the clinical interpretation of the entire case using the BSBR classification scale. This was analysed using ROC analysis. The breast density in the region surrounding each cancer and the number of times each cancer was detected were calculated. IP did not have a significant effect on the radiologists’ judgment of the likelihood of malignancy of individual lesions or their clinical interpretation of the entire case. No correlation was found between number of times each cancer was detected and the density of breast tissue surrounding that cancer.
Can technical characteristics predict clinical performance in PET/CT imaging? A correlation study for thyroid cancer diagnosis
Maria Kallergi, Dimitrios Menychtas, Alexandros Georgakopoulos, et al.
The purpose of this study was to determine whether image characteristics could be used to predict the outcome of ROC studies in PET/CT imaging. Patients suspected of having recurrent thyroid cancer underwent a standard whole body (WB) examination and an additional high-resolution head-and-neck (HN) F18-FDG PET/CT scan. The value of the latter was determined with an ROC study, the results of which showed that the WB+HN combination was better than WB alone for thyroid cancer detection and diagnosis. Following the ROC experiment, the WB and HN images of confirmed benign or malignant thyroid disease were analyzed and first and second order textural features were determined. Features included minimum, mean, and maximum intensity, as well as contrast in regions of interest encircling the thyroid lesions. Lesion size and standardized uptake values (SUV) were also determined. Bivariate analysis was applied to determine relationships between WB and HN features and between observer ROC responses and the various feature values. The two sets showed significant associations in the values of SUV, contrast, and lesion size. They were completely different when the intensities were considered; no relationship was found between the WB minimum, maximum, and mean ROI values and their HN counterparts. SUV and contrast were the strongest predictors of ROC performance on PET/CT examinations of thyroid cancer. The high resolution HN images seem to enhance these relationships but without a single dramatic effect as was projected from the ROC results. A combination of features from both WB and HN datasets may possibly be a more robust predictor of ROC performance.
Enhancing reproducibility of ultrasonic measurements by new users
The operator's perception influences ultrasound image acquisition and processing. Lower costs are attracting new users to medical ultrasound. Anticipating an increase in this trend, we conducted a study to quantify the variability in ultrasonic measurements made by novice users and identify methods to reduce it. We designed a protocol with four presets and trained four new users to scan and manually measure the head circumference of a fetal phantom with an ultrasound scanner. In the first phase, the users followed this protocol in seven distinct sessions. They then received feedback on the quality of the scans from an expert. In the second phase, two of the users repeated the entire protocol aided by visual cues provided to them during scanning. We performed off-line measurements on all the images using a fully automated algorithm capable of measuring the head circumference from fetal phantom images. The ground truth (198.1±1.6 mm) was based on sixteen scans and measurements made by an expert. Our analysis shows that: (1) the inter-observer variability of manual measurements was 5.5 mm, whereas the inter-observer variability of automated measurements was only 0.6 mm in the first phase; (2) consistency of image appearance improved and mean manual measurements were 4-5 mm closer to the ground truth in the second phase; and (3) automated measurements were more precise, more accurate, and less sensitive to different presets than manual measurements in both phases. Our results show that visual aids and automation can bring more reproducibility to ultrasonic measurements made by new users.
Observer Performance: Breast
Test set readings predict clinical performance to a limited extent: preliminary findings
BaoLin P. Soh, Warwick M. Lee, Peter L. Kench, et al.
Aim: To investigate the level of agreement between test sets and actual clinical reading
Background: The performance of screen readers in detecting breast cancer is being assessed in some countries by using mammographic test sets. However, previous studies have provided little evidence that performance assessed by test sets strongly correlates with performance in clinical reading.
Methods: Five clinicians from BreastScreen New South Wales participated in this study. Each clinician was asked to read 200 de-identified mammographic examinations gathered from their own case history within the BreastScreen NSW Digital Imaging Library. All test sets were designed with specific proportions of true positive, true negative, false positive and false negative examinations from the previous actual clinical reads of each reader. A prior mammogram examination for comparison (when available) was also provided for each case.
Results: Preliminary analyses have shown that there is a moderate level of agreement (Kappa 0.42−0.56, p < 0.001) between laboratory test sets and actual clinical reading. In addition, a mean increase of 38% in sensitivity in the laboratory test sets as compared to their actual clinical readings was demonstrated. Specificity is similar between the laboratory test sets and actual clinical reading.
Conclusion: This study demonstrated a moderate level of agreement between actual clinical reading and test set reading, which suggests that test sets have a role in reflecting clinical performance.
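The agreement statistic reported above can be illustrated with a minimal sketch of Cohen's kappa for two binary ratings of the same cases. The abstract does not describe how the authors computed kappa, so the function name `cohens_kappa` and the 2x2 counts below are hypothetical and serve only to show the arithmetic.

```python
def cohens_kappa(both_pos, test_only_pos, clinic_only_pos, both_neg):
    """Cohen's kappa for a 2x2 agreement table (test set vs clinical read)."""
    n = both_pos + test_only_pos + clinic_only_pos + both_neg
    if n == 0:
        raise ValueError("table is empty")
    # Observed agreement: proportion of cases given the same decision twice
    p_observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal recall rates of each reading
    test_pos = (both_pos + test_only_pos) / n
    clinic_pos = (both_pos + clinic_only_pos) / n
    p_chance = test_pos * clinic_pos + (1 - test_pos) * (1 - clinic_pos)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical counts for one reader over 200 cases (not data from the study)
kappa = cohens_kappa(both_pos=40, test_only_pos=10, clinic_only_pos=20, both_neg=130)
print(round(kappa, 3))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it can be moderate even when most decisions coincide.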
A comparison of image interpretation times in full field digital mammography and digital breast tomosynthesis
Susan Astley, Sophie Connor, Yit Lim, et al.
Digital Breast Tomosynthesis (DBT) provides three-dimensional images of the breast that enable radiologists to discern whether densities are due to overlapping structures or lesions. To aid assessment of the cost-effectiveness of DBT for screening, we have compared the time taken to interpret DBT images and the corresponding two-dimensional Full Field Digital Mammography (FFDM) images. Four Consultant Radiologists experienced in reading FFDM images (4 years 8 months to 8 years) with training in DBT interpretation but more limited experience (137-407 cases in the past 6 months) were timed reading between 24 and 32 two-view FFDM and DBT cases. The images were of women recalled from screening for further assessment and women under surveillance because of a family history of breast cancer. FFDM images were read before DBT, according to local practice. The median time for readers to interpret FFDM images was 17.0 seconds, with an interquartile range of 12.3-23.6 seconds. For DBT, the median time was 66.0 seconds, and the interquartile range was 51.1-80.5 seconds. The difference was statistically significant (p<0.001). Reading times were significantly longer in family history clinics (p<0.01). Although it took approximately four times as long to interpret DBT as FFDM images, the cases were more complex than would be expected for routine screening, and with higher mammographic density. The readers were relatively inexperienced in DBT interpretation and may increase their speed over time. The difference in times between clinics may be due to increased throughput at assessment, or decreased density.
Same task, same observers, different values: the problem with visual assessment of breast density
Jamie C. Sergeant, Lani Walshaw, Mary Wilson, et al.
The proportion of radio-opaque fibroglandular tissue in a mammographic image of the breast is a strong and modifiable risk factor for breast cancer. Subjective, area-based estimates made by expert observers provide a simple and efficient way of measuring breast density within a screening programme, but the degree of variability may render the reliable identification of women at increased risk impossible. This study examines the repeatability of visual assessment of percent breast density by expert observers. Five consultant radiologists and two breast physicians, all with at least two years’ experience in mammographic density assessment, were presented with 100 digital mammogram cases for which they had estimated density at least 12 months previously. Estimates of percent density were made for each mammographic view and recorded on a printed visual analogue scale. The level of agreement between the two sets of estimates was assessed graphically using Bland-Altman plots. All but one observer had a mean difference of less than 6 percentage points, while the largest mean difference was 14.66 percentage points. The narrowest 95% limits of agreement for the differences were -11.15 to 17.35 and the widest were -13.95 to 40.43. Coefficients of repeatability ranged from 14.40 to 38.60. Although visual assessment of breast density has been shown to be strongly associated with cancer risk, the lack of agreement shown here between repeat assessments of the same images by the same observers questions the reliability of using visual assessment to identify women at high risk or to detect moderate changes in breast density over time.
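The Bland-Altman quantities quoted in this abstract (mean difference, 95% limits of agreement, coefficient of repeatability) can be sketched as follows. This is a minimal illustration, assuming the common definitions of limits as the mean difference plus or minus 1.96 standard deviations and the coefficient of repeatability as 1.96 standard deviations of the differences; the abstract does not state which definitions the authors used, and the density values below are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return (bias, lower limit, upper limit, coefficient of repeatability)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)            # mean difference between the two readings
    sd = stdev(diffs)             # sample standard deviation of the differences
    lower = bias - 1.96 * sd      # lower 95% limit of agreement
    upper = bias + 1.96 * sd      # upper 95% limit of agreement
    repeatability = 1.96 * sd     # assumed definition, see lead-in
    return bias, lower, upper, repeatability

# Hypothetical percent-density estimates for five cases, read twice
first_read = [10.0, 25.0, 40.0, 55.0, 70.0]
second_read = [12.0, 20.0, 45.0, 50.0, 78.0]
bias, lower, upper, cr = bland_altman(first_read, second_read)
print(bias, lower, upper, cr)
```

Wide limits relative to the differences that matter clinically indicate poor repeatability, even when the mean difference is close to zero.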
The impact of mammographic density and lesion location on detection
Dana Al Mousa, Elaine Ryan, Warwick Lee, et al.
The aim of this study is to examine the impact of breast density and lesion location on detection. A set of 55 mammographic images (23 abnormal images with 26 lesions and 32 normal images) were examined by 22 expert radiologists. The images were classified by an expert radiologist according to the Synoptic Breast Imaging Report of the National Breast Cancer Centre (NBCC) as having low mammographic density (D1: <25% glandular; D2: 25-50% glandular) or high density (D3: 51-75% glandular; D4: >75% glandular). The observers freely examined the images and located any malignancy using a 5-point confidence scale. Performance was defined using the following metrics: sensitivity, location sensitivity, specificity, receiver operating characteristic (ROC Az) curves and jackknife free-response receiver operator characteristics (JAFROC) figures of merit. Significant increases in sensitivity (p=0.0174) and ROC (p=0.0001) values were noted for the higher density compared with lower density images according to NBCC classification. No differences were seen in radiologists’ performance between lesions within or outside the fibroglandular region. In conclusion, analysis of our data suggests that radiologists scored higher using traditional metrics in higher mammographic density images without any improvement in lesion localisation. Lesion location, whether within or outside the fibroglandular region, appeared to have no impact on detection abilities, suggesting that if a masking effect is present the impact is minimal. Eye-tracking analyses are ongoing.
Does routine breast screening practice over-ride display quality in reporting enriched test sets?
The performance of a group of 16 American (US) breast screening radiologists in interpreting cases from a recent PERFORMS self-assessment case set, carefully selected to exclude small calcifications and read on sub-mammographic resolution displays, has previously been reported in comparison with a British (UK) group of radiologists using mammographic displays. It was found that the UK group performed better, detecting more cancers, with the US participants correctly recalling fewer cases. These results were interpreted as due to differences in the displays employed by each group as well as to routine screening differences between the two countries. This current study extended that work with 11 of these experienced US breast screening radiologists further interpreting 20 new PERFORMS mammographic cases using a suitable mammographic clinical workstation. The PERFORMS cases were selected so as to show a range of normal, benign and abnormal appearances. Data from these radiologists were compared to their earlier performance on different PERFORMS cases and sub-clinical displays. Their data were also compared to recent data of 11 UK radiologists reading the same cases, again on clinical workstations, as well as to all UK screeners. Despite the use of equivalent clinical monitors, the data indicate differences between the UK and US groups in recall decisions that are not just a function of the countries’ screening approaches. Lower detection of abnormal cases by the US group was found here and reasons for this are explored.
Difficulty of mammographic cases in the context of resident training: preliminary experimental data
We are currently developing an intelligent data-driven educational system for mammography. Since our system attempts to predict which cases will be difficult for the trainees, it is important to better understand the concept of case difficulty. While the concept of difficulty is central to our efforts on adaptive education, its importance extends to radiology education in general as well as to image perception research. In this study, we tested some hypotheses related to difficulty. Specifically, we performed a preliminary reader study to evaluate the relationship between the error rate (an objective measure of difficulty), the individual assessment of case difficulty by a resident, and an expert’s assessment of case difficulty (two subjective measures of difficulty). Furthermore, we investigated the relationship between the individual and expert’s assessments of difficulty and the time that the residents took to interpret the case. The time taken by a resident to interpret a case related well to the individual assessment of difficulty, but its relationship with the expert’s assessment of difficulty was weaker. The analysis of the difficulty assessments showed that an increase in the individual assessment of difficulty made by a resident relates well to an increase in his/her false positive errors but not to an increase in false negative errors. Interestingly, the expert’s assessment of difficulty was related to false negative errors in the trainees but not to false positive errors. These results offer additional guidance in our efforts to construct an adaptive education system as well as provide insight into important aspects of radiology education in general.
Observer Performance: General
Analysis of individual variability and habituation in stereoscopic radiography
In our previous research on stereoscopic images for medical use, we reported that observers found it easier to identify target objects in stereoscopic images than in two-dimensional images; however, we found that mental and visual fatigue levels were equivalent when viewing the stereoscopic and the two-dimensional images. We also reported that a number of users dislike the sensations accompanying stereoscopic vision. Hence, in this research we studied personal variation in stereoscopic visibility and the effect of training on stereoscopic visibility. Simulated images, in which prepared calcifications were arranged at parallactic angles between ±2° and ±15° at object heights from 40 to 80 mm, were displayed on a stereoscopic 3D display. Seven observers were selected to judge the achievement of stereoscopic vision (stereopsis) and their visibility was determined. The observers were asked to point the stereoscopic cursor of the 3D mammography viewer at the simulated calcifications and the accuracy rates were determined. Subsequently, re-examination was implemented after 3D visual training for 15 to 20 minutes per day for two weeks, and the visibility and accuracy rates were measured again. We found individual differences in the parallactic angles at which stereopsis was realized. Moreover, the parallactic angles of stereopsis widened through training and the average visibility improved from 69% to 84% as the result of training. Furthermore, the average accuracy rates improved from 53% to 60%, and the accuracy of depth commands improved. This suggests that observers who are weak in stereoscopic vision can be trained to be better at stereoscopic viewing.
Impact of bone suppression imaging on the detection of lung nodules in chest radiographs: analysis of multiple reading sessions
S. Schalekamp, B. van Ginneken, C. M. Schaefer-Prokop, et al.
Observer studies are frequently performed to test new modalities. Correct study design is important to generate reliable results. The two most frequently used observer study designs are the sequential and the independent reading design. We investigated the effect of different observer study designs on reader performance results and statistical power. The study included multiple assessments of chest radiographs (CXR) with bone suppression images (BSI) for the detection of lung nodules. In a fully crossed study design, 8 observers first assessed radiographs without and with BSI sequentially. Secondly, they scored radiographs independently, having BSI available from the beginning. Five months later, the same readers scored the same cases again in an independent reading session, completing the three scorings for CXRs with BSI. Observer performance was compared using multi reader multi case (MRMC) receiver operating characteristics (ROC). To estimate reader variance, Dorfman, Berbaum, Metz (DBM) variance component estimates were calculated. No significant difference between the sequential and the independent reading sessions could be found (p=0.51; p=0.61). Both reading designs showed increased performance with BSI, with a significant increase for the sequential and the independent reading session after five months (p=0.002; p=0.007). Total observer variance between sequential and independent reading design remained the same. A strong increase of uncorrelated components was found in the independent reading sessions, masking the ability to demonstrate differences in observer performance across modalities. In conclusion, results of the sequential and the independent study design did not show a significant difference. The independent study design had less power compared to the sequential study design due to a strong increase of uncorrelated variance components.
A preliminary comparison of different methods for observer performance estimation
Francesc Massanes, Jovan G. Brankov
In medical imaging, image quality is assessed by the degree to which a human observer can correctly perform a given diagnostic task. Therefore, image quality is typically quantified by using performance measurements from decision/detection theory such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). In this paper we compare five different AUC estimation techniques, widely used in the literature, including parametric and non-parametric methods. We compared each method by equivalence hypothesis testing using a model observer as well as data sets from a previously published human observer study. The main conclusions of this work are: 1) if a small number of images are scored, one cannot tell apart different AUC estimation methods due to large variability in AUC estimates, regardless of whether image scores are reported on a continuous or quantized scale; 2) if the number of scored images is large and image scores are reported on a continuous scale, all tested AUC estimation methods are statistically equivalent.
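As an illustration of what a non-parametric AUC estimate looks like, the sketch below computes the Mann-Whitney (empirical) AUC from observer scores. The abstract does not name the five estimators that were compared, so this is only one widely used example and not necessarily one of the authors' methods; the function name and the scores are hypothetical.

```python
def auc_mann_whitney(signal_scores, noise_scores):
    """Empirical AUC: probability that a signal-present image outscores
    a signal-absent image, counting ties as one half."""
    if not signal_scores or not noise_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s in signal_scores:
        for n in noise_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5   # tied scores contribute half a win
    return wins / (len(signal_scores) * len(noise_scores))

# Hypothetical ratings on a quantized 5-point scale
signal = [3, 4, 5, 5]
noise = [1, 2, 3, 4]
auc = auc_mann_whitney(signal, noise)
print(auc)
```

With only a handful of scored images, the estimate moves in coarse steps of 1/(number of pairs), which gives some intuition for the large variability the abstract reports at small sample sizes.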
The variation of radiologists' performance over the course of a reading session
Markus C. Elze, Sian Taylor-Phillips, Claudia Mello-Thoms, et al.
The radiologist’s task of reviewing many cases successively is highly repetitive and requires a high level of concentration. Fatigue effects have, for example, been shown in studies comparing performance at different times of day. However, little is known about changes in performance during an individual reading session. During a session reading an enriched case set, performance may be affected by both fatigue (i.e. decreasing performance) and training (i.e. increasing performance) effects. In this paper, we reanalyze 3 datasets from 4 studies for changes in radiologist performance during a reading session. Studies feature 8-20 radiologists reading and assessing 27-60 cases in single, uninterrupted sessions. As the studies were not designed for this analysis, study setups range from bone fractures to mammograms and randomization varies between studies. Thus, they are analyzed separately using mixed-effects models. There is some indication that, as time goes on, specificity increases (shown with p<0.05 for 2 out of 3 datasets, no significant difference for the other) while sensitivity may also increase (p<0.05 for 1 out of 3 datasets). The difficulty of ‘normal’ (healthy / non-malignant) and ‘abnormal’ (unhealthy / malignant) cases differs (p<0.05 for 3 out of 3 datasets) and the reader’s experience may also be relevant (p<0.05 for 1 out of 3 datasets). These results suggest that careful planning of breaks and session length may help optimize reader performance. Note that the overall results are still inconclusive and a targeted study to investigate fatigue and training effects within a reading session is recommended.
Investigating the feasibility of using partial least squares as a method of extracting salient information for the evaluation of digital breast tomosynthesis
Digital breast tomosynthesis (DBT) has shown promise for improving the detection of breast cancer, but it has not yet been fully optimized due to a large space of system parameters to explore. A task-based statistical approach [1] is a rigorous method for evaluating and optimizing this promising imaging technique with the use of optimal observers such as the Hotelling observer (HO). However, the high data dimensionality found in DBT has been the bottleneck for the use of a task-based approach in DBT evaluation. To reduce data dimensionality while extracting salient information for performing a given task, efficient channels have to be used for the HO. In the past few years, 2D Laguerre-Gauss (LG) channels, which are a complete basis for stationary backgrounds and rotationally symmetric signals, have been utilized for DBT evaluation [2, 3]. But since background and signal statistics from DBT data are neither stationary nor rotationally symmetric, LG channels may not be efficient in providing reliable performance trends as a function of system parameters. Recently, partial least squares (PLS) has been shown to generate efficient channels for the Hotelling observer in detection tasks involving random backgrounds and signals [4]. In this study, we investigate the use of PLS as a method for extracting salient information from DBT in order to better evaluate such systems.
Quantitative anatomical labeling of the anterior abdominal wall
Ventral hernias (VHs) are abnormal openings in the anterior abdominal wall that are common side effects of surgical intervention. Repair of VHs is the most commonly performed procedure by general surgeons worldwide, but VH repair outcomes are not particularly encouraging (with recurrence rates up to 43%). A variety of open and laparoscopic techniques are available for hernia repair, and the specific technique used is ultimately driven by surgeon preference and experience. Despite routine acquisition of computed tomography (CT) for VH patients, little quantitative information is available on which to guide selection of a particular approach and/or optimize patient-specific treatment. From anecdotal interviews, the success of VH repair procedures correlates with hernia size, location, and involvement of secondary structures. Herein, we propose an image labeling protocol to segment the anterior abdominal area to provide a geometric basis with which to derive biomarkers and evaluate treatment efficacy. Based on routine clinical CT data, we are able to identify inner and outer surfaces of the abdominal walls and the herniated volume. This is the first formal presentation of a protocol to quantify these structures on abdominal CT. The intra- and inter-rater reproducibilities of this protocol are evaluated on 4 patients with suspected VH (3 patients were ultimately diagnosed with VH while 1 was not). Mean surface distances of less than 2 mm were achieved for all structures.
Observer performance in semi-automated microbleed detection
Hugo J. Kuijf, Manon Brundel, Jeroen de Bresser, et al.
Cerebral microbleeds are small bleedings in the human brain, detectable with MRI. Microbleeds are associated with vascular disease and dementia. The number of studies involving microbleed detection is increasing rapidly. Visual rating is the current standard for detection, but it is a time-consuming process, especially on high-resolution 7.0 T MR images, has limited reproducibility, and is highly observer dependent. Recently, multiple techniques have been published for the semi-automated detection of microbleeds, attempting to overcome these problems. In the present study, a 7.0 T dual-echo gradient echo MR image was acquired in 18 participants with microbleeds from the SMART study. Two experienced observers identified 54 microbleeds in these participants, using a validated visual rating scale. The radial symmetry transform (RST) can be used for semi-automated detection of microbleeds in 7.0 T MR images. In the present study, the results of the RST were assessed by two observers and 47 microbleeds were identified: 35 true positives and 12 extra positives (microbleeds that were missed during visual rating). Hence, after scoring, a total of 66 microbleeds could be identified in the 18 participants. The use of the RST increased the average sensitivity of observers from 59% to 69%. More importantly, inter-observer agreement (ICC and Dice's coefficient) increased from 0.85 and 0.64 to 0.98 and 0.96, respectively. Furthermore, the required rating time was reduced from 30 to 2 minutes per participant. By fine-tuning the RST, sensitivities up to 90% can be achieved, at the cost of extra false positives.
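Dice's coefficient, one of the two inter-observer agreement measures quoted above, can be sketched for two observers' sets of detections. The abstract does not describe how detections were matched between observers, so this sketch assumes each microbleed already has a shared identifier; the function name and the identifiers are hypothetical.

```python
def dice_coefficient(observer_a, observer_b):
    """Dice overlap between two sets of detected lesion identifiers."""
    a, b = set(observer_a), set(observer_b)
    if not a and not b:
        return 1.0   # both observers found nothing: treated as full agreement
    # Twice the number of shared detections over the total number of detections
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical lesion identifiers marked by each observer
marks_a = {"mb1", "mb2", "mb3", "mb4"}
marks_b = {"mb3", "mb4", "mb5"}
agreement = dice_coefficient(marks_a, marks_b)
print(round(agreement, 3))
```

A value of 1 means both observers marked exactly the same lesions, and 0 means they share none.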
Poster Session
An investigation of the relationship between ambient lighting and image manipulation behavior
Lee Shun Ming, Rachel J. Toomey, John T. Ryan, et al.
Purpose: This study examines the relationship between ambient lighting level and image manipulation. Method: Academic radiographers (n=10), with experience in observer performance studies, each assessed 70 postero-anterior projection radiographs of the wrist / scaphoid in both low (12.5 lux) and high (150 lux) ambient lighting. Half of the images featured one or more acute fractures and the remainder did not. Observers were encouraged to window the images to a level they felt was appropriate, and were requested to rate their confidence that an acute fracture was present, marking the locations of any suspected acute fractures on the image. The images were displayed on a secondary-class monitor using Ziltron software, which recorded the adjustments to brightness and contrast made for each image. The images were presented in different orders for each lighting level to reduce potential memory effects. Results: Student’s t-tests were applied to compare the mean brightness and contrast adjustments made to the images in each ambient lighting level. Tests were carried out to include all images, only positive cases, and only cases where observers elected to change the brightness and/or contrast. No statistically significant differences were noted except when images for which no brightness/contrast adjustments were made were excluded. In that case, mean brightness levels were slightly higher in the high ambient light level (p=0.049). Conclusion: No convincing difference was found in adjustments of brightness and contrast between high and low ambient lighting levels, although further research is warranted.
The effect of viewing distance on observer performance in skeletal radiographs
M. L. Butler, J. Lowe, R. J. Toomey, et al.
A number of different viewing distances are recommended by international agencies, but none with specific reference to radiologist performance. The purpose of this study was to ascertain the extent to which radiologists' performance is affected by viewing distance in softcopy skeletal reporting. Eighty dorsi-palmar (DP) wrist radiographs, of which half feature 1 or more fractures, were viewed by seven observers at 2 viewing distances, 30 cm and 70 cm. Observers rated the images as normal or not on a scale of 1 to 5 and could mark multiple locations on the images when they visualised a fracture. Viewing distance was measured from the centre of the face plate to the outer canthus of the eye. The DBM MRMC analysis showed no statistically significant differences between the area under the curve for the two distances (p = 0.482). The JAFROC analysis, however, demonstrated a statistically significantly higher area under the curve with the 30 cm viewing distance than with the 70 cm distance (p = 0.035). This suggests that while observers were able to make decisions about whether an image contained a fracture or not equally well at both viewing distances, they may have been less reliable in terms of fracture localisation or detection of multiple fractures. The impact of viewing distance warrants further attention from both clinical and scientific perspectives.
Breast screening: understanding case difficulty and the nature of errors
In the UK all screeners undertake the PERFORMS scheme where they read annual sets of challenging cases. During this assessment, they give each case a confidence rating on whether it should be recalled. If they decide to recall a case, they also indicate the center of any key mammographic features on a display of the relevant mammographic case view. Expert radiological opinion defines what the key abnormalities (targets) are in any case. Data can then be analyzed using ROC and JAFROC approaches, and particularly for the latter, assessing whether a user has correctly located a feature or not is important. Using image pixel information alone it is possible to delineate correct localization of an abnormality from an incorrect location by defining an area of interest. To explore such location information in more detail, data from the last year of the PERFORMS scheme were reanalyzed and the location responses for each of the 675 participants on 120 screening cases examined. Additionally, expert radiological opinions had been garnered for various reasons, including accurately delineating any abnormalities. An algorithmic approach is developed which assesses whether users’ indications should be included as correct abnormality identification or not, based on the feedback location information of all participants’ indicated locations and the relative position of an indicated location to the abnormality. This approach is proposed to be superior to simple pixel distance approaches which measure a fixed distance from the centre of a target to the user’s indicated location. The approach adds to the experimenter’s repertoire of tools when examining user errors and case difficulty in medical imaging research.
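The simple pixel-distance criterion that this abstract uses as its point of comparison can be sketched as below: a mark counts as a correct localization if it falls within a fixed radius of the target centre. This illustrates only the baseline, not the authors' algorithmic approach, which the abstract does not specify in detail; the function name, coordinates, and radius are hypothetical.

```python
from math import hypot

def is_correct_localization(mark, target_centre, radius_px):
    """Fixed-distance rule: the mark is a hit if it lies within
    radius_px pixels of the target centre (inclusive)."""
    dx = mark[0] - target_centre[0]
    dy = mark[1] - target_centre[1]
    return hypot(dx, dy) <= radius_px

# Hypothetical target centre and two reader marks, in pixel coordinates
target = (100, 100)
near_mark = (103, 104)   # 5 pixels from the centre
far_mark = (110, 100)    # 10 pixels from the centre
near_hit = is_correct_localization(near_mark, target, radius_px=5)
far_hit = is_correct_localization(far_mark, target, radius_px=5)
print(near_hit, far_hit)
```

A single fixed radius ignores the size and shape of the abnormality, which is one plausible reason for preferring an approach that also draws on where all participants placed their marks.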
Immersive virtual reality for visualization of abdominal CT
Qiufeng Lin, Zhoubing Xu, Bo Li, et al.
Immersive virtual environments use a stereoscopic head-mounted display and data glove to create high fidelity virtual experiences in which users can interact with three-dimensional models and perceive relationships at their true scale. This stands in stark contrast to traditional PACS-based infrastructure in which images are viewed as stacks of two-dimensional slices, or, at best, disembodied renderings. Although there has been substantial innovation in immersive virtual environments for entertainment and consumer media, these technologies have not been widely applied in clinical applications. Here, we consider potential applications of immersive virtual environments for ventral hernia patients with abdominal computed tomography imaging data. Nearly a half million ventral hernias occur in the United States each year, and hernia repair is the most commonly performed general surgery operation worldwide. A significant problem in these conditions is communicating the urgency, degree of severity, and impact of a hernia (and potential repair) on patient quality of life. Hernias are defined by ruptures in the abdominal wall (i.e., the absence of healthy tissues) rather than a growth (e.g., cancer); therefore, understanding a hernia necessitates understanding the entire abdomen. Our environment allows surgeons and patients to view body scans at scale and interact with these virtual models using a data glove. This visualization and interaction allows users to perceive the relationship between physical structures and medical imaging data. The system provides close integration of PACS-based CT data with immersive virtual environments and creates opportunities to study and optimize interfaces for patient communication, operative planning, and medical education.
Altered resting-state functional connectivity in post-traumatic stress disorder: a perfusion MRI study
Baojuan Li, Jian Liu, Yang Liu, et al.
The majority of studies on posttraumatic stress disorder (PTSD) so far have focused on delineating patterns of activation during cognitive processes. Recently, a growing number of studies have begun to investigate functional connectivity in PTSD subjects using BOLD-fMRI. Functional connectivity analysis has been demonstrated to be a powerful approach for identifying biomarkers of different brain diseases. This study aimed to detect resting-state functional connectivity abnormalities in patients with PTSD using arterial spin labeling (ASL) fMRI. As a completely non-invasive technique, ASL allows quantitative estimates of cerebral blood flow (CBF). Compared with BOLD-fMRI, ASL fMRI has several advantages, including reduced low-frequency signal drift and superior functional localization. In the current study, ASL images were collected from 10 survivors of a mining disaster with recent-onset PTSD and 10 survivors without PTSD. Decreased regional CBF in the right middle temporal gyrus, lingual gyrus, and postcentral gyrus was detected in the PTSD patients. Seed-based resting-state functional connectivity analysis was performed using an area in the right middle temporal gyrus as the region of interest. Compared with the non-PTSD group, the PTSD subjects demonstrated increased functional connectivity between the right middle temporal gyrus and both the right superior temporal gyrus and the left middle temporal gyrus. Decreased functional connectivity between the right middle temporal gyrus and both the right postcentral gyrus and the right superior parietal lobule was also found in the PTSD patients. This is the first study to investigate resting-state functional connectivity in PTSD using ASL images. The results may provide new insight into the neural substrates of PTSD.
Does image reduction affect the diagnostic accuracy of digital mammograms?
Yumi Takane, Yusuke Kawasumi, Tsunemitsu Horie, et al.
We aimed to evaluate the influence of image reduction using a bi-cubic interpolation method on the accuracy of detection of clustered microcalcifications (MCLs) and masses on digital mammograms. Digital mammograms (n=194) of 97 subjects were selected retrospectively, comprising 47 patients with clustered MCLs or masses and 52 controls. Images were acquired in the craniocaudal view by phase-contrast mammography (PCM). Original PCM images comprised 25-μm pixels. The reduced images converted from the originals by bi-cubic interpolation were of 50-μm pixel size. Five observers independently interpreted all images, and rated their confidence concerning the presence of lesions on a continuous 0-100 scale. Receiver-operating characteristic (ROC) analysis was performed using the jackknife method and LABMRMC program. Differences in areas under the curve (AUC) values based on 95% confidence intervals were evaluated. The average AUC values for detection of masses were 0.8435 and 0.8646 for the original and reduced images, respectively. The difference between the average AUC values was not statistically significant (p=0.5855). Average AUC values for clustered MCLs detection were 0.9273 and 0.9574 for the original and reduced images, respectively. This difference was not statistically significant (p=0.1949). Detection of masses and clustered MCLs on digital mammograms was unaffected by bi-cubic interpolation image reduction.
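For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of a nonparametric (Mann-Whitney) AUC estimate for one reader's confidence ratings on a continuous 0-100 scale. It is not the jackknife multi-reader analysis performed with LABMRMC in the study; the function name and the sample ratings are hypothetical.

```python
def empirical_auc(negative_ratings, positive_ratings):
    """Probability that a randomly chosen lesion case is rated higher
    than a randomly chosen control, counting ties as one half."""
    wins = 0.0
    for p in positive_ratings:
        for n in negative_ratings:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_ratings) * len(negative_ratings))

# Hypothetical 0-100 confidence ratings for three controls and three lesion cases.
controls = [10, 20, 30]
lesions = [25, 40, 50]
print(empirical_auc(controls, lesions))
```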
Deviance statistics in model fit and selection in ROC studies
A general non-linear regression model-based Bayesian inference approach is used in our ROC (Receiver Operating Characteristics) study. In sampling the posterior distribution, two prior models, continuous Gaussian and discrete categorical, are used for the scale parameter. Deviance statistics and the Deviance information criterion (DIC) are adopted to judge the Goodness-of-Fit (GOF) of each model and to critique the two models. Model fit and model selection focus on the adequacy of models. Judging model adequacy is essentially measuring the agreement between model and observations. Deviance statistics and DIC provide overall measures of model fit and selection. To investigate model fit at each category of observations, we find that the cumulative, exponential contributions from individual observations to Deviance statistics are good estimates of the FPF (false positive fraction) and TPF (true positive fraction) on which the ROC curve is based. This finding further leads to a new measure of model fit, called the FPF-TPF distance, which is a Euclidean distance defined on FPF-TPF space. It combines both local and global fitting. Deviance statistics and the FPF-TPF distance are shown to be consistent and in good agreement. Theoretical derivation and numerical simulations for this new method for model fit and model selection in ROC data analysis are included. Keywords: General non-linear regression model, Bayesian Inference, Markov Chain Monte Carlo (MCMC) method, Goodness-of-Fit (GOF), Model selection, Deviance statistics, Deviance information criterion (DIC), Continuous conjugate prior, Discrete categorical prior.
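As background for the deviance-based measures named above, the standard definitions of the deviance and the DIC are given below. The notation (data y, parameters θ, posterior mean θ̄) is assumed here for illustration and is not taken from the paper; the paper's FPF-TPF distance is not reproduced because the abstract does not define it.

```latex
\begin{align}
D(\theta) &= -2 \log p(y \mid \theta) \\
\bar{D} &= \mathbb{E}_{\theta \mid y}\left[ D(\theta) \right] \\
p_D &= \bar{D} - D(\bar{\theta}) \\
\mathrm{DIC} &= \bar{D} + p_D = D(\bar{\theta}) + 2\,p_D
\end{align}
```

Here the posterior mean deviance measures fit, the effective number of parameters p_D penalizes complexity, and the model with the smaller DIC is preferred.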
Visibility of single spiculations in digital breast tomosynthesis
Pontus Timberg, Magnus Dustler, Daniel Förnvik, et al.
Purpose: To investigate the visibility of single spiculations in digital breast tomosynthesis (DBT). Method: Simulated spheres (6 mm diameter) with single spiculations were added to projection images acquired on a DBT system (MAMMOMAT Inspiration, Siemens). The spiculations had a cylindrical shape and were randomly, diagonally aligned (at four different positions: ± π/4 or ± 3π/4) at a plane parallel to the detector. They were assumed to consist of a fibroglandular tissue composition. The length of the spiculations was 5 mm while the diameter varied (0.12 – 0.28 mm). Reconstructed central slices of the lesion, separated by insertion in fatty or dense breasts (100 images in each), were used in 4-alternative forced choice (4AFC) human observer experiments. Three different reconstructions were used: filtered back projection (FBP) with 1 mm thick slices and a statistical artifact reduction reconstruction (SAR) method generating 1 and 2 mm thick slices. Five readers participated and their task was to locate the spiculation in randomly presented images from the whole image set (4 diameters × 100 images). The percent correct (PC) decision was determined in both fat and dense tissue for all spiculation diameters and reconstructions. Results: At a PC level of 95% the required diameter was about 0.17 – 0.22 mm in dense tissue, and 0.18 – 0.26 mm in fatty tissue (depending upon reconstruction). Conclusions: SAR was found to be a promising alternative to FBP. The visibility of single spiculations was determined. The required diameter depends on both tissue composition and reconstruction.
Assessment of image quality in orthopaedic radiography with digital detectors: a visual grading analysis
Robin Decoster, Harrie Mol, Renaat van den Broeck, et al.
Background: The introduction of digital detectors in radiology was predicted to bring a dose reduction. Owing to their dynamic range, radiographs of sufficient quality can be produced with a lower detector air kerma (DAK). However, this reduction was not observed. Some authors indicate a creep towards higher DAK, explained by a better appreciation of the radiographs due to a higher contrast-to-noise ratio. Methodology: To investigate the relation between the DAK and the appreciation of image quality by radiologists, 172 anterior-posterior (AP) radiographs of the knee and 152 radiographs of the pelvis were collected in 19 radiology centres. A Visual Grading Analysis (VGA) with a five-point scale was used to judge the image quality of seven different anatomic structures. The mid-point of the scale (3) was equated with diagnostic image quality. Six experienced radiologists scored both datasets, in a controlled environment, with ViewDex®. Every observer received instructions and a training dataset. Moreover, twenty radiographs were repeated to determine intra-observer variability. Results: The intra-observer variability was not significant (p>0.05) for both datasets. The knee AP obtained a VGAS score of 3.91 and the pelvis AP a VGAS score of 3.71. In both cases, the inter-observer correlation was high and significant. The correlation between the VGAS and the DAK (0.41μGy – 6.18μGy) was not significant in either case, and other analyses based on technical parameters likewise showed no significant correlation. Conclusion: The VGA revealed an image quality higher than diagnostically necessary. Based on the DAK, overexposure is suspected. The relation between DAK and appreciation has to be investigated further in detail.
Model-based Bayesian inference for ROC data analysis
This paper presents a study of model-based Bayesian inference applied to Receiver Operating Characteristics (ROC) data. The model is a simple version of the general non-linear regression model. Unlike the Dorfman model, it uses a probit link function with a binary (zero-one) covariate to express binormal distributions in a single formula. The model also includes a scale parameter. Bayesian inference is implemented by the Markov Chain Monte Carlo (MCMC) method, carried out with Bayesian analysis Using Gibbs Sampling (BUGS). In contrast to classical statistical theory, the Bayesian approach considers model parameters as random variables characterized by prior distributions. With a substantial number of simulated samples generated by the sampling algorithm, the posterior distributions of the parameters, as well as the parameters themselves, can be accurately estimated. MCMC-based BUGS adopts the Adaptive Rejection Sampling (ARS) protocol, which requires that the probability density function (pdf) from which samples are drawn be log-concave with respect to the targeted parameters. Our study corrects a common misconception and proves that the pdf of this regression model is log-concave with respect to its scale parameter. Therefore, the ARS requirement is satisfied, and a Gaussian prior, which is conjugate and possesses many analytic and computational advantages, is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for the MCMC method are carried out with the CODA package. Models and methods using a continuous Gaussian prior and a discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using the MCMC method.
Development of digital rectangular phantoms for quality controls of medical primary monitors in RIS-PACS systems
A. Mattacchioni, M. Cristianini, A. Lo Bosco
The purpose of this paper is to design digital rectangular phantoms, Di.Recta Multipurpose Phantoms (Di.Recta MP), for quality control of primary high-resolution medical monitors. The first approach to monitor quality evaluation is represented by the AAPM tests using multipurpose TG18-QC phantoms. The TG18-QC patterns are available in two sizes, 1024x1024 and 2048x2048, and the use of these phantoms requires a correct monitor setup. The study demonstrates that this type of phantom is suitable for CRT monitors with adequate setting procedures. In the second step, LCD monitors are analysed. Different types of primary monitors are included, in a range between 2 and 5 Mp. The difference between the resolution of the monitors and that of the phantoms does not allow a complete analysis of the entire system simply by moving the phantoms to different positions. The analysis of images in the peripheral regions of medical monitors cannot be neglected, especially because of the possible legal implications. A simpler analysis of these areas can be done by using rectangular phantoms in place of square ones. Furthermore, because of the different technology, different analysis patches are also necessary for these types of monitors. We therefore propose digital rectangular phantoms, Di.Recta MP, compatible with the spatial resolution of most commercial monitors. These phantoms are designed to simulate typical radiological conditions and to determine the presence of significant defects using appropriate patches such as luminance, contrast, and noise patterns. Finally, a preliminary study of dedicated Di.Recta MP for LED monitors is proposed.
Prediction of near-term breast cancer risk using a Bayesian belief network
Bin Zheng, Pandiyarajan Ramalingam, Harishwaran Hariharan, et al.
Accurately predicting near-term breast cancer risk is an important prerequisite for establishing an optimal personalized breast cancer screening paradigm. In previous studies, we investigated and tested the feasibility of developing a unique near-term breast cancer risk prediction model based on a new risk factor associated with bilateral mammographic density asymmetry between the left and right breasts of a woman using a single feature. In this study we developed a multi-feature based Bayesian belief network (BBN) that combines bilateral mammographic density asymmetry with three other popular risk factors, namely (1) age, (2) family history, and (3) average breast density, to further increase the discriminatory power of our cancer risk model. A dataset involving “prior” negative mammography examinations of 348 women was used in the study. Among these women, 174 had breast cancer detected and verified in the next sequential screening examinations, and 174 remained negative (cancer-free). A BBN was applied to predict the risk of each woman having cancer detected six to 18 months later following the negative screening mammography. The prediction results were compared with those using single features. The prediction accuracy was significantly increased when using the BBN. The area under the ROC curve increased from an AUC=0.70 to 0.84 (p<0.01), while the positive predictive value (PPV) and negative predictive value (NPV) also increased from a PPV=0.61 to 0.78 and an NPV=0.65 to 0.75, respectively. This study demonstrates that a multi-feature based BBN can more accurately predict the near-term breast cancer risk than with a single feature.
A new assessment method for image fusion quality
Image fusion quality assessment plays a critically important role in the field of medical imaging. Many assessment methods have been proposed to evaluate image fusion quality, including mutual information (MI), root mean square error (RMSE), and the universal image quality index (UIQI). These methods do not reflect human visual inspection effectively. To address this problem, we propose a novel image fusion assessment method that combines the nonsubsampled contourlet transform (NSCT) with regional mutual information. In the proposed method, the source medical images are first decomposed into different levels by the NSCT. Then the maximum NSCT coefficients of the decomposed directional images at each level are obtained to compute the regional mutual information (RMI). Finally, the multi-channel RMI is computed as the weighted sum of the RMI values obtained at the various levels of the NSCT. The advantage of the proposed method is that the NSCT represents image information in multiple directions and at multiple scales and therefore conforms to the multi-channel characteristic of the human visual system, leading to its outstanding image assessment performance. Experimental results using CT and MRI images demonstrate that the proposed method outperforms assessment methods such as the MI-based and UIQI-based measures in evaluating image fusion quality and that it provides results consistent with human visual assessment.
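The baseline mutual information (MI) measure mentioned above can be sketched as a joint-histogram estimate. This is only the plain global MI, not the NSCT-based regional measure proposed in the paper; the function name, the bin count, and the 0-256 intensity range are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(a, b, bins=16, lo=0.0, hi=256.0):
    """Mutual information in bits between two equally sized, flattened
    images, estimated from a joint histogram with `bins` bins per axis."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    width = (hi - lo) / bins

    def quantize(v):
        # Map an intensity to a bin index, clamping to the valid range.
        return min(bins - 1, max(0, int((v - lo) / width)))

    pairs = [(quantize(x), quantize(y)) for x, y in zip(a, b)]
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(i for i, _ in pairs)
    marg_b = Counter(j for _, j in pairs)
    # Sum p(i, j) * log2(p(i, j) / (p(i) * p(j))) over occupied bins.
    return sum(c / n * log2(c * n / (marg_a[i] * marg_b[j]))
               for (i, j), c in joint.items())

ramp = list(range(256))
print(mutual_information(ramp, ramp))         # an image shares all its information with itself
print(mutual_information(ramp, [100] * 256))  # a constant image shares none
```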
Application of a computed tomography based cystic fibrosis scoring system to chest tomosynthesis
Christina Söderman, Åse Johnsson, Jenny Vikgren, et al.
In the monitoring of progression of lung disease in patients with cystic fibrosis (CF), recurrent computed tomography (CT) examinations are often used. The relatively new imaging technique chest tomosynthesis (CTS) may be an interesting alternative in the follow-up of these patients due to its visualization of the chest in slices at radiation doses and costs significantly lower than is the case with CT. A first step towards introducing CTS imaging in the diagnostics of CF patients is to establish a scoring system appropriate for evaluating the severity of CF pulmonary disease based on findings in CTS images. Previously, several such CF scoring systems based on CT imaging have been published. The purpose of the present study was to develop a CF scoring system for CTS, by starting from an existing scoring system dedicated to CT images and making the modifications regarded as necessary to make it appropriate for use with CTS images. In order to determine any necessary changes, three thoracic radiologists independently used a scoring system dedicated to CT on both CT and CTS images from CF patients. The results of the scoring were jointly evaluated by all the observers, which led to suggestions for changes to the scoring system. Suggested modifications include excluding the scoring of air trapping and scoring the findings in quadrants of the image instead of in each lung lobe.
An initial investigation of radiologist eye movements in vascular imaging
R. J. Toomey, S. Hodgins, M. E. Evanoff, et al.
Eye tracking has been used by many researchers to try to shed light on the perceptual processes involved in medical image perception. Despite a large volume of data having been published regarding radiologist viewing patterns for static images, and more recently for stacked imaging modalities, little has been produced concerning angiographic images, which commonly have substantially different characteristics. A study was performed in which 7 expert radiologists viewed a range of digital subtraction angiograms of the peripheral vascular system. Initial results are presented. The observers were free to control the rate at which they viewed the images. Eye position data was recorded for each participant using Tobii TX300 eyetrackers. Analysis was performed in Tobii Studio software and included qualitative analysis of gaze pattern and analysis of metrics including first and total fixation duration etc. for areas of clinical interest. Early results indicate that experts briefly fixate on lesions but do not dwell in the area, rather continuing to inspect the more distal vascular segments before returning. Some individual variation was noted. Further research is required and ongoing.
The value of the craniocaudal mammographic view in breast cancer detection: a preliminary study
Phuong Dung Trieu, Patrick C. Brennan, Warwick Lee, et al.
Our research aims to assess the value of the single cranio-caudal view mammogram in the detection of breast cancer. 129 radiologists were asked to report 60 two-view mammograms of the left and right breasts and 55 radiologists assessed a set of 55 single cranio-caudal views. Participants were asked to search for the presence of any breast lesions and provide confidence scores for their decisions. Results showed that two-view mammograms were more effective in detecting malignant nodules than single cranio-caudal view in terms of sensitivity, localized-sensitivity, ROC and JAFROC. The single cranio-caudal view had a higher specificity as compared to two-view mammography.
Assessment of methods to extract the mid-sagittal plane from brain MR images
Hugo J. Kuijf, Alexander Leemans, Max A. Viergever, et al.
Automatic detection of the mid-sagittal plane, separating both hemispheres of the brain, is useful in various applications. Several methods have been developed in the past years, applying different techniques to estimate the position of the mid-sagittal plane. These methods can be classified into three distinct classes: feature-based, global symmetry based, and local symmetry based methods. Feature-based methods use the shape or intensity of the interhemispheric fissure to extract the mid-sagittal plane. Global symmetry based methods reflect the entire image with respect to the sagittal axes and perform a rigid registration. Local symmetry based methods try to optimize a symmetry measure in a small band covering the interhemispheric fissure. From each class, one leading method has been implemented. The methods have been evaluated on the same datasets to allow a fair comparison. Manual delineations were made by two experienced human observers. The results show that the examined methods perform similarly to human observers. No significant differences were found between errors (defined as the angle and volume between planes) made by the methods and the inter-observer differences. Feature-based and local symmetry based methods have a low computation time of 1.8 and 0.5 seconds, respectively. The global symmetry based method has a higher computation time of 33.6 seconds, caused by the full 3D rigid registration. The largest errors, both by the methods and observers, are made in participants with cerebral atrophy. These participants have a widened interhemispheric fissure, allowing many plane orientations and positions to result in a valid division of the hemispheres.
A study of the feasibility of using slabbing to reduce tomosynthesis review time
Magnus Dustler, Martin Andersson, Daniel Förnvik, et al.
This study aimed to investigate whether decreasing the number of slices in breast tomosynthesis (BT) image volumes reduces reading time. BT slices were combined into so-called slabs, by reconstructing thin slices and merging them into thicker slabs. Sets of slabs were created from 35 clinical BT volumes with malignant or benign findings and from 50 BT volumes drawn from screening sets (without any prior review). The image sets were reviewed in two separate sessions while the review time was recorded. A total of five experienced radiologists were employed for the image review. Additionally, a VGA study was performed to compare slabbed images with the originals in order to ensure that the image quality was not significantly degraded. One set of 27 pathological cases (13 masses and 14 microcalcification clusters) and one of 22 subtle lesions that had been missed on digital mammography but detected on BT were presented to an experienced radiologist and 2 medical physicists, who rated the quality of the slabbed versions relative to the originals. The study found no significant degradation in image quality when using 2 mm slabs instead of 1 mm slices. There was no significant decrease in reading time on clinical cases (P = .133), but on screening images there was a significant decrease of 7.7 ± 9.6 s from an average level of 32.2 ± 14.5 s (P < .0001). This suggests that increasing slab thickness can reduce the time radiologists spend studying normal images by 20%.
Comparing the Microsoft Kinect to a traditional mouse for adjusting the viewed tissue densities of three-dimensional anatomical structures
Bethany Juhnke, Monica Berron, Adriana Philip, et al.
Advancements in medical image visualization in recent years have enabled three-dimensional (3D) medical images to be volume-rendered from magnetic resonance imaging (MRI) and computed tomography (CT) scans. Medical data is crucial for patient diagnosis and medical education, and analyzing these three-dimensional models rather than two-dimensional (2D) slices would enable more efficient analysis by surgeons and physicians, especially non-radiologists. An interaction device that is intuitive, robust, and easily learned is necessary to integrate 3D modeling software into the medical community. The keyboard and mouse configuration does not readily manipulate 3D models because these traditional interface devices function within two degrees of freedom, not the six degrees of freedom presented in three dimensions. Using a familiar, commercial-off-the-shelf (COTS) device for interaction would minimize training time and enable maximum usability with 3D medical images. Multiple techniques are available to manipulate 3D medical images and provide doctors more innovative ways of visualizing patient data. One such example is windowing. Windowing is used to adjust the viewed tissue density of digital medical data. A software platform available at the Virtual Reality Applications Center (VRAC), named Isis, was used to visualize and interact with the 3D representations of medical data. In this paper, we present the methodology and results of a user study that examined the usability of windowing 3D medical imaging using a Kinect™ device compared to a traditional mouse.
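Windowing, as described above, maps a chosen range of stored values (for example CT numbers) onto the display range so that only tissues within that range show contrast. The following is a minimal sketch of a simple linear window; it is not the Isis implementation and not the exact DICOM windowing formula, and the function and parameter names are hypothetical.

```python
def apply_window(value, center, width, out_max=255):
    """Map a stored value to a display level using a linear window.
    Values below the window are shown black, values above it white."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = center - width / 2
    high = center + width / 2
    if value <= low:
        return 0
    if value >= high:
        return out_max
    return round((value - low) / (high - low) * out_max)

# Hypothetical soft-tissue window (center 40, width 400) applied to
# values typical of air, water, and dense bone in a CT scan.
for stored_value in (-1000, 0, 1000):
    print(stored_value, apply_window(stored_value, center=40, width=400))
```

Changing `center` and `width` interactively, whether with a mouse or a gesture device, is what shifts the range of tissue densities that is visible.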
Potential method for relieving fatigue in radiologists
Radiologists moved to an environment in which they read digital images off of computer displays instead of film images off of light boxes. With film and light boxes they typically would read for about an hour (seated) and then take a break while the film librarian changed the images on the view box. With digital viewing off computers they tend to sit all day in front of the computers with far fewer breaks and opportunity to stand up and move around than with film. There is concern that this lack of activity not only contributes to back, shoulder and neck discomfort, but also contributes to increased fatigue which has been shown to impact reader performance. The goal of this study was to determine if there are differences in physiologic vital signs (heart rate and blood pressure) of radiologists as a function of whether they read images while seated or while standing up. Five subjects had their blood pressure and heart rate measured while seated and while standing reading cases in the normal clinical setting. For all three measures there was a statistically significant (p < 0.0001) difference between seated and standing measures with seated being lower than standing. The higher heart rate and blood pressure with standing suggests that the radiologists are more active in this position and thus potentially more attentive than while seated. This study will be followed up to determine impact on diagnostic performance of standing vs seated as well as subjective ratings of wakefulness and mood.
Availability of color calibration for consistent color display in medical images and optimization of reference brightness for clinical use
Daiki Iwai, Haruka Suganami, Minoru Hosoba, et al.
Color image consistency has not yet been achieved, apart from the Digital Imaging and Communications in Medicine (DICOM) Supplement 100, which addresses implementation of a color reproduction pipeline and device-independent color spaces. As a result, most healthcare enterprises cannot check monitor degradation routinely. To ensure color consistency in medical color imaging, monitor color calibration should be introduced. Using a simple color calibration device, the chromaticity of colors including typical colors (Red, Green, Blue, Green and White) is measured as a device-independent profile connection space value, u’v’, before and after calibration. In addition, clinical color images are displayed and visual differences are observed. In color calibration, the monitor brightness level has to be set to a rather low value of 80 cd/m² according to the sRGB standard. Because the maximum brightness of most color monitors currently available for medical use is much higher than 80 cd/m², the 80 cd/m² level does not seem appropriate for calibration. We therefore propose that a new brightness standard be introduced while maintaining the color representation in clinical use. To evaluate the effect of brightness on chromaticity experimentally, the brightness level of two monitors is changed from 80 to 270 cd/m² and the chromaticity values are compared at each brightness level. As a result, there are no significant differences in the chromaticity diagram when the brightness level is changed. In conclusion, chromaticity is close to the theoretical value after color calibration. Moreover, chromaticity does not shift when brightness is changed. The results indicate that the optimized reference brightness level for clinical use could be set at a high brightness in current monitors.
Analysis of detectability loss through fan-beam x-ray computed tomography reconstruction
Adrian A. Sanchez, Emil Y. Sidky, Xiaochuan Pan
We consider detection of a small signal in fan-beam x-ray computed tomography (CT). In order to characterize the loss of intrinsic signal detectability from the projection data (sinogram) domain to the reconstructed image, we analyze the Hotelling observer SNR in each domain. Further, we characterize the loss of Hotelling observer SNR through decomposition into two components: loss of signal detectability which arises due to unequal variance in the noise of separate detector elements and loss of detectability arising from the fact that some noiseless signals have components which lie in the nullspace of a given reconstruction operator. The proposed methodology is investigated for the back-projection filtration (BPF) algorithm developed by our group [2].
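For orientation, the Hotelling observer SNR in the two domains can be written in its standard form. The notation is assumed here and is not taken from the paper: Δḡ is the mean signal difference in the sinogram, K_g the sinogram noise covariance, A a linear reconstruction operator, and the superscript + a pseudo-inverse.

```latex
\begin{align}
\mathrm{SNR}_g^2 &= \Delta\bar{g}^{\top} K_g^{-1}\, \Delta\bar{g} \\
\mathrm{SNR}_y^2 &= (A\,\Delta\bar{g})^{\top} \left( A K_g A^{\top} \right)^{+} (A\,\Delta\bar{g}) \;\le\; \mathrm{SNR}_g^2
\end{align}
```

Equality holds when A is invertible; any signal component lying in the nullspace of A is removed by the reconstruction and can no longer contribute to the image-domain SNR.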
Study of the radiation dose reduction capability of a CT reconstruction algorithm: LCD performance assessment using mathematical model observers
Patient radiation dose has become a major concern for Computed Tomography (CT) imaging in clinical practice. Various hardware and algorithm solutions have been designed to reduce dose. Among them, iterative reconstruction (IR) has been widely expected to be an effective dose reduction approach for CT. However, there is no clear understanding of the exact amount of dose saving an IR approach can offer for various clinical applications. Quantitative image quality assessment should be task-based. This work applied mathematical model observers to study the detectability performance of CT scan data reconstructed using an advanced IR approach as well as the conventional filtered back-projection (FBP) approach. The purpose of this work is to establish a practical and robust approach for evaluating CT IR detectability image quality and to assess the dose saving capability of the IR method under study. Low contrast (LC) objects embedded in head-size and body-size phantoms were imaged multiple times at different dose levels. Independent signal-present and signal-absent pairs were generated for model observer training and testing. Receiver Operating Characteristic (ROC) curves for the location-known-exactly task and location ROC (LROC) curves for the location-unknown task, as well as the corresponding area under the curve (AUC) values, were calculated. Results showed that an approximately threefold dose reduction was achieved using the IR method under study.
Cardiovascular CTA applications: patient-specific contrast formulae
C. Saade, R. Bourne, M. Wilkinson, et al.
Clear visualisation of the vertebral arteries is of substantial clinical importance, yet optimisation of contrast administration has not been developed in tandem with recent technological developments in computed tomography (CT). The current work, involving 202 patients, compares the value of a tailored contrast regimen based on patient dynamics and a craniocaudal scan acquisition with the routine contrast protocol with a caudocranial scan. Attenuation characteristics within 20 arteries were calculated and diagnostic efficacy measured using DBM receiver operating characteristic (ROC) methods. The results demonstrated that the tailored regimen resulted in significantly higher attenuation values (p<0.01) and ROC Az values (p=0.002), along with better inter-observer agreement, compared with the routine protocol; contrast volume was reduced by almost 50%. The data demonstrate that patient-specific strategies can result in significant diagnostic benefit.
A novel phantom system facilitating better descriptors of density within mammographic images
Yanpeng Li, Patrick C. Brennan, Carolyn Nickson, et al.
High mammographic density is a risk factor for breast cancer. As it is impossible to measure the actual weight or volume of fibroglandular tissue evident within a mammogram, it is hard to know the correlation between measured mammographic density and the actual fibroglandular tissue volume. The aim of this study is to develop a phantom that represents glandular tissue within an adipose tissue structure so that correlations between image feature descriptors and the synthesised glandular structure can be accurately quantified. In this phantom study, ten different weights of fine steel wool were put into gelatine to simulate breast structure. Image feature descriptors were investigated for both the whole phantom image and the simulated density. Descriptors included actual area and percentage area of density, mean pixel intensity for the whole image and for the dense area, standard deviation of mean intensity, and integrated pixel density, which is the product of area and mean intensity. The results show a high correlation between steel-wool weight and percentage density measured on images (r = 0.8421), and between steel-wool weight and the integrated pixel density of the dense area (r = 0.8760). The correlation is also significant for the standard deviation of mean intensity for the whole phantom (r = 0.8043). This phantom study may help identify more accurate descriptors of mammographic density, thus facilitating better assessments of fibroglandular tissue appearances.
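The descriptors listed in this abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not say how the dense area was segmented, so the fixed intensity `threshold`, the function name `density_descriptors`, and the dictionary keys are assumptions.

```python
import numpy as np

def density_descriptors(image, threshold):
    """Compute simple density descriptors for a grayscale image.

    Assumption: pixels brighter than `threshold` count as dense area;
    the study's actual segmentation method is not given in the abstract.
    """
    image = np.asarray(image, dtype=float)
    dense = image > threshold
    dense_area = int(dense.sum())                      # area in pixels
    percent_area = 100.0 * dense_area / image.size     # percentage area
    mean_dense = float(image[dense].mean()) if dense_area else 0.0
    return {
        "dense_area": dense_area,
        "percent_area": percent_area,
        "mean_intensity_whole": float(image.mean()),
        "std_intensity_whole": float(image.std()),
        "mean_intensity_dense": mean_dense,
        # Integrated pixel density: product of area and mean intensity.
        "integrated_density": dense_area * mean_dense,
    }

# Tiny made-up image, for illustration only.
d = density_descriptors([[0, 0], [10, 20]], threshold=5)
```

Given one descriptor value per phantom, a correlation with steel-wool weight could then be obtained with, for example, `np.corrcoef(weights, values)[0, 1]`.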
Physical evaluation of color and monochrome medical displays using an imaging colorimeter
Hans Roehrig, Xiliang Gu, Jiahua Fan
This paper presents an approach to the physical evaluation of color and monochrome medical-grade displays using an imaging colorimeter. The purpose of this study was to examine the influence of medical display type (monochrome or color, at the same maximum luminance setting) on diagnostic performance. The focus was on measurements of physical characteristics, including spatial resolution and noise performance, which we believed could affect clinical performance. Specifically, the Modulation Transfer Function (MTF) and Noise Power Spectrum (NPS) were evaluated and compared at different digital driving levels (DDL) between two EIZO displays.