Proceedings Volume 10136

Medical Imaging 2017: Image Perception, Observer Performance, and Technology Assessment

Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 31 May 2017
Contents: 13 Sessions, 60 Papers, 27 Presentations
Conference: SPIE Medical Imaging 2017
Volume Number: 10136

Table of Contents

  • Front Matter: Volume 10136
  • Keynote & Perception in Breast Imaging
  • Radiologists' Performance
  • Observer Performance in Mammography
  • Technology Assessment - Methodology
  • Model Observers
  • Technology Assessment - Applications
  • Joint Session with 10132 and 10136: Task-based Assessment in CT
  • Posters: Model Observers
  • Posters: Image Perception
  • Posters: Technology Assessment - General
  • Posters: Technology Assessment in Volume Imaging
  • Posters: Image Perception and Technology Assessment in Breast Imaging
Front Matter: Volume 10136
Front Matter: Volume 10136
This PDF file contains the front matter associated with SPIE Proceedings Volume 10136, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
Keynote & Perception in Breast Imaging
The effect of prevalence of disease on performance of residents and fellows during training for interpreting DBT in a test-train-test observer study
Christiane M. Hakim, Lauren Chang Sen, Andrew Degnan, et al.
There are no data on the effect of disease prevalence during training for interpreting digital breast tomosynthesis (DBT) based screening examinations on the performance of residents and fellows. We assessed the performance of six residents (four after one breast imaging rotation and two after two rotations) and two fellows in breast imaging when interpreting DBT screening examinations in a multi-case, mode-balanced, test-train-test retrospective reader study (127 training and 160 testing cases). Half were trained with feedback of verified truth after reviewing each case with low prevalence of disease (13/127) and half with high prevalence (52/128). The pre- and post-training dataset was the same. Performance measures were compared (sensitivity, specificity and AUC). Readers trained with the low prevalence set decreased the overall recall rate of non-cancer cases (FPF from 0.21 to 0.13, p<0.001), and of cases with known malignancies (TPF from 0.70 to 0.61, p=0.004, due primarily to one clearly outlier reader). Readers trained with the high prevalence set increased the overall recall rate of non-cancer cases, albeit not statistically significantly (FPF from 0.16 to 0.18, p=0.07), and showed a borderline significant increase in the recall of cancer cases (TPF from 0.61 to 0.66, p=0.04). Fellows who had completed six months of specialty training in each group had no significant changes in sensitivity, specificity, or AUC after training (smallest p>0.07). Both residents with two rotations' experience had significant changes in sensitivity and specificity (highest p<0.028), but not in AUC. Early training with low disease prevalence of “what not to recall” should be included during training.
The implementation of an AR (augmented reality) approach to support mammographic interpretation training: an initial feasibility study
Appropriate feedback plays an important role in optimising mammographic interpretation training whilst also ensuring good interpretation performance. The traditional keyboard, mouse and workstation technical approach has a critical limitation in providing supplementary image-related information and providing complex feedback in real time. Augmented Reality (AR) provides a possible superior approach in this situation, as feedback can be provided directly overlaying the displayed mammographic images so making a generic approach which can also be vendor neutral. In this study, radiological feedback was dynamically remapped virtually into the real world, using perspective transformation, in order to provide a richer user experience in mammographic interpretation training. This is an initial attempt of an AR approach to dynamically superimpose pre-defined feedback information of a DICOM image on top of a radiologist’s view, whilst the radiologist is examining images on a clinical workstation. The study demonstrates the feasibility of the approach, although there are limitations on interactive operations which are due to the hardware used. The results of this fully functional approach provide appropriate feedback/image correspondence in a simulated mammographic interpretation environment. Thus, it is argued that employing AR is a feasible way to provide rich feedback in the delivery of mammographic interpretation training.
Radiologists' Performance
Does fatigue have any impact on satisfaction of search?
Elizabeth A. Krupinski, Kevin Schartz, Robert Caldwell, et al.
Previous studies have demonstrated that fatigue impacts diagnostic accuracy, especially for those in training. We continued this line of investigation to determine if fatigue has any impact on a common source of errors: satisfaction of search (SOS). SOS requires subjects to participate in 2 sessions (SOS and non-SOS) and so does fatigue (fatigued and not fatigued), so we ran subjects in only the fatigued condition and used a previous non-fatigued study as the comparison. Sixty-four chest computed radiographs, half demonstrating various “test” abnormalities, were read twice by 20 radiologists, once with and once without the addition of a simulated pulmonary nodule. Receiver-operating characteristic detection accuracy and decision thresholds were analyzed to study the effects of adding the nodule on detecting the test abnormalities. Adding nodules did not influence detection accuracy (ROC AUC SOS = 0.667; non-SOS = 0.679), but did induce a reluctance to report them. Adding nodules did not affect inspection time, so the reluctance to report was not associated with reduced search. Fatigue did not appear to exacerbate the SOS effect. A second study with fractures revealed the same shift in performance but did reduce viewing times when fatigued. The results of these two studies suggest that the impact of fatigue on SOS is more complicated than expected and thus may require more investigation to fully understand its impact in the clinic.
A model based on temporal dynamics of fixations for distinguishing expert radiologists' scanpaths
Ziba Gandomkar, Kevin Tay, Patrick C. Brennan, et al.
This study investigated a model which distinguishes expert radiologists from less experienced radiologists based on features describing spatio-temporal dynamics of their eye movement during interpretation of digital mammograms. Eye movements of four expert and four less experienced radiologists were recorded during interpretation of 120 two-view digital mammograms of which 59 had biopsy proven cancers. For each scanpath, a two-dimensional recurrence plot, which represents the radiologist’s refixation pattern, was generated. From each plot, six features indicating the spatio-temporal dynamics of fixations were extracted. The first feature measured the percentage of recurrent fixations; the second indicated the percentage of recurrent fixations which were fixated later in several consecutive fixations; the third measured the percentage of recurrent fixations that form a repeated sequence of fixations and the fourth assessed whether the recurrent fixations were occurring sequentially close together. The number of switches between the two mammographic views was also measured, as was the average number of consecutive fixations in each view before switching. These six features along with total time on case and average fixation duration were fed into a support vector machine whose performance was evaluated using 10-fold cross validation. The model achieved a sensitivity of 86.3% and a specificity of 85.2% for distinguishing experts’ scanpaths. The obtained result suggests that spatio-temporal dynamics of eye movements can characterize expertise level and have potential applications for monitoring the development of expertise among radiologists as a result of different training regimes and continuing education schemes.
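As a rough illustration of the first feature described above (the percentage of recurrent fixations), the sketch below builds a recurrence matrix from a list of fixation coordinates and computes a recurrence rate. The abstract does not give the exact definition used, so the distance threshold `radius`, the function names, and the normalisation by the number of fixation pairs are all assumptions made for this example.

```python
import math


def recurrence_matrix(fixations, radius):
    """Return an N x N 0/1 matrix; entry (i, j) is 1 when fixations i and j
    (i != j) lie within `radius` of each other. `radius` is an assumed
    threshold, e.g. in pixels."""
    n = len(fixations)
    return [
        [
            1 if i != j and math.dist(fixations[i], fixations[j]) <= radius else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def recurrence_rate(fixations, radius):
    """Percentage of fixation pairs (i < j) that are recurrent.
    This normalisation is one common choice, not necessarily the paper's."""
    n = len(fixations)
    if n < 2:
        return 0.0
    matrix = recurrence_matrix(fixations, radius)
    recurrent_pairs = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    return 100.0 * 2 * recurrent_pairs / (n * (n - 1))


# Hypothetical scanpath: the third fixation returns to the first location.
scanpath = [(0, 0), (100, 100), (1, 1), (200, 200)]
rate = recurrence_rate(scanpath, radius=5)
```

Features such as this one would then be combined with the view-switching and timing measures before classification; the classifier itself is not sketched here.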
On the impact of local image texture parameters on search and localization in digital breast imaging
William H. Nisbett, Amareswararao Kavuri, Nathaniel R. Fredette, et al.
Understanding factors that influence search and localization of signals in tomographic breast imaging can allow for the development of efficient system design and image displays. Several acquisition, reconstruction and display parameters are known to influence signal (mass or microcalcification) detection. In this abstract we examine variation in relevant image texture features with respect to digital breast tomosynthesis (DBT) acquisition parameters. We shall relate the impact of these changes in detection via correlations against results obtained from human observer localization ROC (LROC) studies. Our methods included calculation and analysis of these texture features at randomly sampled ROIs in select image sets.
Visual assessment of breast density using Visual Analogue Scales: observer variability, reader attributes and reading time
Teri Ang, Elaine F. Harkness, Anthony J. Maxwell, et al.
Breast density is a strong risk factor for breast cancer and has potential use in breast cancer risk prediction, with subjective methods of density assessment providing a strong relationship with the development of breast cancer. This study aims to assess intra- and inter-observer variability in visual density assessment recorded on Visual Analogue Scales (VAS) among trained readers, and examine whether reader age, gender and experience are associated with assessed density. Eleven readers estimated the breast density of 120 mammograms on two occasions 3 years apart using VAS. Intra- and inter-observer agreement was assessed with Intraclass Correlation Coefficient (ICC) and variation between readers visualised on Bland-Altman plots. The mean scores of all mammograms per reader were used to analyse the effect of reader attributes on assessed density. Excellent intra-observer agreement (ICC>0.80) was found in the majority of the readers. All but one reader had a mean difference of <10 percentage points from the first to the second reading. Inter-observer agreement was excellent for consistency (ICC 0.82) and substantial for absolute agreement (ICC 0.69). However, the 95% limits of agreement for pairwise differences were -6.8 to 15.7 at the narrowest and 0.8 to 62.3 at the widest. No significant association was found between assessed density and reader age, experience or gender, or with reading time. Overall, the readers were consistent in their scores, although some large variations were observed. Reader evaluation and targeted training may alleviate this problem.
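The 95% limits of agreement quoted above are conventionally computed as the mean of the paired differences plus or minus 1.96 standard deviations. The sketch below shows that conventional calculation for two sets of VAS scores; the function name and the example scores are hypothetical, and the study's exact procedure may have differed.

```python
import statistics


def bland_altman_limits(first, second, z=1.96):
    """Return (bias, lower, upper): the mean paired difference and the
    conventional 95% limits of agreement (bias +/- z * SD of differences)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long score lists with >= 2 entries")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical VAS density scores (percentage points) from two readers.
reader_a = [10, 20, 30, 40]
reader_b = [12, 18, 33, 41]
bias, lower, upper = bland_altman_limits(reader_a, reader_b)
```

Wide limits for a reader pair, as reported for some pairs in the abstract, indicate that individual scores can differ substantially even when the intraclass correlation is high.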
Development of discordant likelihood metrics for use in evaluation of 'double reviewer' performance in blinded independent central review (BICR) RECIST imaging studies
Purpose: Perform a retrospective review of a number of studies (n=20) for the purpose of proposing basic likelihood metrics for evaluation of discordance between two reviewers performing RECIST (Response Evaluation Criteria in Solid Tumors) assessments in a Blinded Independent Central Review (BICR).

Methods: Retrospective data analysis using R programming scripts to determine discordance subsets of interest and analyze these datasets for both time point discordance and case discordance.

Results: We present a basic time point discordant ratio and a cases discordance ratio based on a range of aggregated time points per case for RECIST datasets.

Conclusions: We propose basic ratios that might be useful to improve reviewer performance monitoring models.
Impact of different study populations on reader behavior and performance metrics: initial results
Brandon D. Gallas, Etta Pisano, Elodia Cole, et al.
The FDA recently completed a study on design methodologies surrounding the Validation of Imaging Premarket Evaluation and Regulation called VIPER. VIPER consisted of five large reader sub-studies to compare the impact of different study populations on reader behavior as seen by sensitivity, specificity, and AUC, the area under the ROC curve (receiver operating characteristic curve). The study investigated different prevalence levels and two kinds of sampling of non-cancer patients: a screening population and a challenge population. The VIPER study compared full-field digital mammography (FFDM) to screen-film mammography (SFM) for women with heterogeneously dense or extremely dense breasts. All cases and corresponding images were sampled from Digital Mammographic Imaging Screening Trial (DMIST) archives. There were 20 readers (American Board Certified radiologists) for each sub-study, and instead of every reader reading every case (fully-crossed study), readers and cases were split into groups to reduce reader workload and the total number of observations (split-plot study). For data collection, readers first decided whether or not they would recall a patient. Following that decision, they provided an ROC score for how close or far that patient was from the recall decision threshold. Performance results for FFDM show that as prevalence increases to 50%, there is a moderate increase in sensitivity and decrease in specificity, whereas AUC is mainly flat. Regarding precision, the statistical efficiency (ratio of variances) of sensitivity and specificity relative to AUC is 0.66 at best and decreases with prevalence. Analyses comparing modalities and the study populations (screening vs. challenge) are still ongoing.
Observer Performance in Mammography
Does time of day influence cancer detection and recall rates in mammography?
Chris Stinton, David Jenkinson, Victor Adekanmbi, et al.
Background: The interpretation of screening mammograms is influenced by factors such as reader experience and their annual interpretative volume. There is some evidence that time of day can also have an effect, with better diagnostic accuracy for readings conducted early in the day. This is not a consistent finding, however. The aim of our study is to provide further evidence on whether there is an effect of time of day on recall- and breast cancer detection rates. Method: We analysed breast screening data from 222,577 women from the Midlands of England. Data were split into three eight hour periods: 0900-1700, 1700-0100, 0100-0900. Differences in recall- and cancer detection rates were analysed using multilevel logistic regression models. Results: Recall rates were lowest for mammograms read between the 1700-0100 time period. Cancer detection rates were lowest during the 0100-0900 time period. Conclusions: Our findings suggest that there are fluctuations in recall- and cancer detection rates over the course of the day.
Changing behavior and accuracy with time on task in mammography screening
Sian Taylor-Phillips, David Jenkinson, Chris Stinton, et al.
Background: The vigilance decrement and prevalence effect both describe changes to speed and accuracy with time on task. Whilst there is much laboratory based research on these effects, little is known about whether they occur in real world mammography practice. Methods: The Changing Case Order to Optimise Patterns of Performance in Screening (CO-OPS) trial randomised 37,724 batches containing 1.2 million women attending breast screening to intervention or control (222,208 from the Midlands of England). In the control arm the batch was examined in the same order by both readers, in the intervention arm it was examined in a different order by both readers. Time taken, recall decision by both readers, and cancers detected were recorded for each case, and used to examine patterns of performance with time on task. Results: 49,575 women were recalled and 10,484 had cancer detected. Median time taken to examine each case was 35 seconds (out of cases where time taken was 10 minutes or less). The intervention did not affect overall cancer detection rates or recall rates. A more detailed analysis of the Midlands data indicates cancer detection rate did not change when reading up to 60 cases in a batch, but recall rate reduced. Time taken per case reduced with time on task, from a median 41 seconds when examining the second case in the batch to 28.5 seconds examining the 60th case. Conclusion: Reader behavior and performance systematically changes with time on task in breast screening.
How quickly do breast screeners learn their skills?
Hossein Nevisi, Leng Dong, Yan Chen, et al.
The UK’s Breast Screening Programme is 27 years old and many experienced breast radiologists are now retiring, coupled with an influx of new screening personnel. It is important to the ongoing Programme that new mammography readers are quickly up to the skill level of experienced readers. This raises the question of how quickly the necessary cancer detection skills are learnt. All breast screening radiologists in the UK read educational training sets of challenging FFDM images (the PERFORMS® scheme) yearly to maintain and improve their performance in real life screening. Data were examined from the PERFORMS® annual scheme for 54 new screeners, 55 screeners who have been screening for one year and also for more experienced screeners (597 screeners). Not surprisingly, significant differences in cancer detection rate were found between new readers and both of the other groups. Additionally, the performance of 48 new readers who have now been screening for about a year and have taken part twice in the PERFORMS® scheme was further examined, where again a significant difference in cancer detection was found. These data imply that cancer detection skills are learnt quickly in the first year of screening. Information was also examined concerning the volume of cases participants read and other factors.
The role of extra-foveal processing in 3D imaging
The field of medical image quality has relied on the assumption that metrics of image quality for simple visual detection tasks are a reliable proxy for the more clinically realistic visual search tasks. Rank order of signal detectability across conditions often generalizes from detection to search tasks. Here, we argue that search in 3D images represents a paradigm shift in medical imaging: radiologists typically cannot exhaustively scrutinize all regions of interest with the high-acuity fovea, requiring detection of signals with extra-foveal areas (visual periphery) of the human retina. We hypothesize that extra-foveal processing can alter the detectability of certain types of signals in medical images with important implications for search in 3D medical images. We compare visual search of two different types of signals in 2D vs. 3D images. We show that a small microcalcification-like signal is more highly detectable than a larger mass-like signal in 2D search, but its detectability largely decreases (relative to the larger signal) in the 3D search task. Utilizing measurements of observer detectability as a function of retinal eccentricity and observer eye fixations we can predict the pattern of results in the 2D and 3D search studies. Our findings: 1) suggest that observer performance findings with 2D search might not always generalize to 3D search; 2) motivate the development of a new family of model observers that take into account the inhomogeneous visual processing across the retina (foveated model observers).
Applying a social network analysis (SNA) approach to understanding radiologists' performance in reading mammograms
Seyedamir Tavakoli Taba, Liaquat Hossain, Robert Heard, et al.
Rationale and objectives: Observer performance has been widely studied through examining the characteristics of individuals. Applying a systems perspective, while understanding of the system’s output, requires a study of the interactions between observers. This research explains a mixed methods approach to applying a social network analysis (SNA), together with a more traditional approach of examining personal/individual characteristics in understanding observer performance in mammography. Materials and Methods: Using social networks theories and measures in order to understand observer performance, we designed a social networks survey instrument for collecting personal and network data about observers involved in mammography performance studies. We present the results of a study by our group where 31 Australian breast radiologists originally reviewed 60 mammographic cases (comprising 20 abnormal and 40 normal cases) and then completed an online questionnaire about their social networks and personal characteristics. A jackknife free response operating characteristic (JAFROC) method was used to measure performance of radiologists. JAFROC was tested against various personal and network measures to verify the theoretical model. Results: The results from this study suggest a strong association between social networks and observer performance for Australian radiologists. Network factors accounted for 48% of variance in observer performance, in comparison to 15.5% for the personal characteristics for this study group. Conclusion: This study suggests a strong new direction for research into improving observer performance. Future studies in observer performance should consider social networks’ influence as part of their research paradigm, with equal or greater vigour than traditional constructs of personal characteristics.
Technology Assessment - Methodology
Comparison of two classifiers when the data sets are imbalanced: the power of the area under the precision-recall curve as the figure of merit versus the area under the ROC curve
Berkman Sahiner, Weijie Chen, Aria Pezeshk, et al.
In many two-class problems in automated classification and information retrieval, the classes are imbalanced, and the separation between the positive and negative classes is large. The precision-recall (PR) curve has been suggested as an alternative to the receiver operating characteristic (ROC) curve to characterize the performance of automated systems when the classes are imbalanced, and the area under the precision-recall curve (AUCPR) has been suggested as an alternative performance measure to the area under the ROC curve (AUCROC). AUCPR and AUCROC are distinct measures of performance, even though the relationship between the precision-recall and ROC curves is well-known. In this study, we compared the statistical power of the AUCPR to that of the AUCROC. Our results indicate that the AUCPR can offer a small statistical advantage when the prevalence is low and the separation between the positive and negative classes is large. When the data set is more balanced or the separation between the classes is low or moderate, AUCROC has slightly higher power.
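To make the two figures of merit concrete, the sketch below computes an empirical AUCROC (as the Mann-Whitney statistic) and an empirical AUCPR (estimated here by average precision) from classifier scores. Average precision is only one of several AUCPR estimators and is an assumption of this example; the abstract does not say which estimator the authors used. Function names and scores are hypothetical, and the average-precision function assumes untied scores.

```python
def auc_roc(pos_scores, neg_scores):
    """Empirical AUCROC: probability that a positive case outscores a
    negative case, counting ties as one half (Mann-Whitney statistic)."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(pos_scores, neg_scores):
    """Average precision: mean of the precision values observed at the rank
    of each positive case when cases are sorted by decreasing score."""
    ranked = sorted(
        [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores],
        key=lambda item: -item[0],
    )
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(pos_scores)


# Hypothetical imbalanced data set: 2 positives, 4 negatives.
positives = [0.9, 0.6]
negatives = [0.8, 0.3, 0.2, 0.1]
roc_area = auc_roc(positives, negatives)          # 7 of 8 pairs ordered correctly
pr_area = average_precision(positives, negatives)  # (1/1 + 2/3) / 2
```

Unlike AUCROC, the precision terms depend on how many negatives are present, which is why the two measures can rank classifiers differently at low prevalence.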
Re-use of pilot data and interim analysis of pivotal data in MRMC studies: a simulation study
Novel medical imaging devices are often evaluated with multi-reader multi-case (MRMC) studies in which radiologists read images of patient cases for a specified clinical task (e.g., cancer detection). A pilot study is often used to measure the effect size and variance parameters that are necessary for sizing a pivotal study (including sizing readers, non-diseased and diseased cases). Due to the practical difficulty of collecting patient cases or recruiting clinical readers, some investigators attempt to include the pilot data as part of their pivotal study. In other situations, some investigators attempt to perform an interim analysis of their pivotal study data based upon which the sample sizes may be re-estimated. Re-use of the pilot data or interim analyses of the pivotal data may inflate the type I error of the pivotal study. In this work, we use the Roe and Metz model to simulate MRMC data under the null hypothesis (i.e., two devices have equal diagnostic performance) and investigate the type I error rate for several practical designs involving re-use of pilot data or interim analysis of pivotal data. Our preliminary simulation results indicate that, under the simulation conditions we investigated, the inflation of type I error is none or only marginal for some design strategies (e.g., re-use of patient data without re-using readers, and size re-estimation without using the effect-size estimated in the interim analysis). Upon further verifications, these are potentially useful design methods in that they may help make a study less burdensome and have a better chance to succeed without substantial loss of the statistical rigor.
Exploring the potential of analysing visual search behaviour data using FROC (free-response receiver operating characteristic) method: an initial study
Leng Dong, Yan Chen, Sarah Dias, et al.
Visual search techniques and FROC analysis have been widely used in radiology to understand medical image perceptual behaviour and diagnostic performance. The potential of exploiting the advantages of both methodologies is of great interest to medical researchers. In this study, eye tracking data of eight dental practitioners was investigated. The visual search measures and their analyses are considered here. Each participant interpreted 20 dental radiographs which were chosen by an expert dental radiologist. Various eye movement measurements were obtained based on image area of interest (AOI) information. FROC analysis was then carried out by using these eye movement measurements as a direct input source. The performance of FROC methods using different input parameters was tested. The results showed that there were significant differences in FROC measures, based on eye movement data, between groups with different experience levels. Namely, the area under the curve (AUC) score was higher for the experienced group for the measurements of fixation and dwell time. Also, positive correlations were found between AUC scores from FROC analysis based on eye movement data and from rating-based FROC analysis. FROC analysis using eye movement measurements as input variables can act as a potential performance indicator to deliver assessment in medical imaging interpretation and assess training procedures. Visual search data analyses lead to new ways of combining eye movement data and FROC methods to provide an alternative dimension to assess performance and visual search behaviour in the area of medical imaging perceptual tasks.
A comparison of methods to evaluate gray scale response of tomosynthesis systems using a software breast phantom
Digital breast tomosynthesis (DBT) has been shown to be an effective imaging tool for breast cancer diagnosis as it provides three-dimensional images of the breast with minimal tissue overlap. The quality of the reconstructed image depends on many factors that can be assessed using uniform or realistic phantoms. In this paper, we created four models of phantoms using an anthropomorphic software breast phantom and compared four methods to evaluate the gray scale response in terms of the contrast, noise and detectability of adipose and glandular tissues binarized according to phantom ground truth. For each method, circular regions of interest (ROIs) were selected with various sizes, quantity and positions inside a square area in the phantom. We also estimated the percent density of the simulated breast and the capability of distinguishing both tissues by receiver operating characteristic (ROC) analysis. Results show a sensitivity of the methods to the ROI size, placement and to the slices considered.
Regression without truth with Markov chain Monte-Carlo
Hennadii Madan, Franjo Pernuš, Boštjan Likar, et al.
Regression without truth (RWT) is a statistical technique for estimating error model parameters of each method in a group of methods used for measurement of a certain quantity. A very attractive aspect of RWT is that it does not rely on a reference method or "gold standard" data, which is otherwise difficult to obtain. RWT was used for a reference-free performance comparison of several methods for measuring left ventricular ejection fraction (EF), i.e. a percentage of blood leaving the ventricle each time the heart contracts, and has since been applied for various other quantitative imaging biomarkers (QIBs). Herein, we show how Markov chain Monte-Carlo (MCMC), a computational technique for drawing samples from a statistical distribution with probability density function known only up to a normalizing coefficient, can be used to augment RWT to gain a number of important benefits compared to the original approach based on iterative optimization. For instance, the proposed MCMC-based RWT enables the estimation of joint posterior distribution of the parameters of the error model, straightforward quantification of uncertainty of the estimates, and estimation of the true value of the measurand and corresponding credible intervals (CIs); it does not require finite support for the prior distribution of the measurand, and generally has much improved robustness against convergence to non-global maxima. The proposed approach is validated using synthetic data that emulate the EF data for 45 patients measured with 8 different methods. The obtained results show that 90% CI of the corresponding parameter estimates contain the true values of all error model parameters and the measurand. A potential real-world application is to take measurements of a certain QIB with several different methods and then use the proposed framework to compute the estimates of the true values and their uncertainty, vital information for diagnosis based on QIB.
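The key property of MCMC mentioned above, sampling from a density known only up to a normalizing coefficient, can be illustrated with a generic random-walk Metropolis sampler. This is a textbook sketch, not the authors' RWT implementation: the one-dimensional target, the function name `metropolis`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.
    `log_density` returns the log of the target density up to an additive
    constant, so the normalizing coefficient is never needed."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for iteration in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old); only the ratio matters.
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            x, log_p = proposal, log_p_new
        if iteration >= burn_in:
            samples.append(x)
    return samples


# Hypothetical target: unnormalized Gaussian with mean 3 and unit variance.
draws = metropolis(lambda x: -0.5 * (x - 3.0) ** 2, start=0.0, n_samples=20000)
posterior_mean = sum(draws) / len(draws)
```

In an RWT setting the sampled quantity would be the full vector of error model parameters and true values rather than a single number, and credible intervals would be read off as quantiles of the retained draws.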
No-gold-standard evaluation of image-acquisition methods using patient data
Several new and improved modalities, scanners, and protocols, together referred to as image-acquisition methods (IAMs), are being developed to provide reliable quantitative imaging. Objective evaluation of these IAMs on the clinically relevant quantitative tasks is highly desirable. Such evaluation is most reliable and clinically decisive when performed with patient data, but that requires the availability of a gold standard, which is often rare. While no-gold-standard (NGS) techniques have been developed to clinically evaluate quantitative imaging methods, these techniques require that each of the patients be scanned using all the IAMs, which is expensive, time consuming, and could lead to increased radiation dose. A more clinically practical scenario is where different sets of patients are scanned using different IAMs. We have developed an NGS technique that uses patient data where different patient sets are imaged using different IAMs to compare the different IAMs. The technique posits a linear relationship, characterized by a slope, bias, and noise standard-deviation term, between the true and measured quantitative values. Under the assumption that the true quantitative values have been sampled from a unimodal distribution, a maximum-likelihood procedure was developed that estimates these linear relationship parameters for the different IAMs. Figures of merit can be estimated using these linear relationship parameters to evaluate the IAMs on the basis of accuracy, precision, and overall reliability. The proposed technique has several potential applications such as in protocol optimization, quantifying difference in system performance, and system harmonization using patient data.
Model Observers
Signal template generation from acquired mammographic images for the non-prewhitening model observer with eye-filter
Christiana Balta, Ramona W. Bouwman, Ioannis Sechopoulos, et al.
Model observers (MOs) are being investigated for image quality assessment in full-field digital mammography (FFDM). Signal templates for the non-prewhitening MO with eye filter (NPWE) were formed using acquired FFDM images. A signal template was generated from acquired images by averaging multiple exposures resulting in a low noise signal template. Noise elimination while preserving the signal was investigated and a methodology which results in a noise-free template is proposed. In order to deal with signal location uncertainty, template shifting was implemented. The procedure to generate the template was evaluated on images of an anthropomorphic breast phantom containing microcalcification-related signals. Optimal reduction of the background noise was achieved without changing the signal. Based on a validation study in simulated images, the difference (bias) in MO performance from the ground truth signal was calculated and found to be <1%. As template generation is a building stone of the entire image quality assessment framework, the proposed method to construct templates from acquired images facilitates the use of the NPWE MO in acquired images.
Real space channelization for generic DBT system image quality evaluation with channelized Hotelling observer
Dimitar Petrov, Lesley Cockmartin, Nicholas Marshall, et al.
Digital breast tomosynthesis (DBT) is a relatively new 3D mammography technique that promises better detection of low contrast masses than conventional 2D mammography. The parameter space for DBT is large, however, and finding an optimal balance between dose and image quality remains challenging. Given the large number of conditions and images required in optimization studies, the use of human observers (HO) is time consuming and certainly not feasible for the tuning of all degrees of freedom. Our goal was to develop a model observer (MO) that could predict human detectability for clinically relevant details embedded within a newly developed structured phantom for DBT applications. DBT series were acquired on GE SenoClaire 3D, Giotto Class, Fujifilm AMULET Innovality and Philips MicroDose systems at different dose levels; Siemens Inspiration DBT acquisitions were reconstructed with different algorithms, while a larger set of DBT series was acquired on a Hologic Dimensions system for first reproducibility testing. A channelized Hotelling observer (CHO) with Gabor channels was developed. The parameters of the Gabor channels were tuned on all systems at standard scanning conditions and the candidate that produced the best fit for all systems was chosen. After tuning, the MO was applied to all systems and conditions. Linear regression lines between MO and HO scores were calculated, giving correlation coefficients between 0.87 and 0.99 for all tested conditions.
An observer model for quantifying panning artifacts in digital pathology
Ali R. N. Avanaki, Kathryn S. Espig, Albert Xthona, et al.
Typically, pathologists pan from one region of a slide to another, choosing areas of interest for closer inspection. Due to finite frame rate and imperfect zero-order hold reconstruction (i.e., the non-zero time to reach the target brightness after a change in pixel drive), panning in whole slide images (WSI) causes visual artifacts. It is important to study the impact of such artifacts since research suggests that 49% of navigation is conducted in low-power/overview with digital pathology (Molin et al., Histopathology 2015). In this paper, we explain what types of medical information may be harmed by panning artifacts, propose a method to simulate panning artifacts, and design an observer model to predict the impact of panning artifacts on typical human observers’ performance in basic diagnostically relevant visual tasks. The proposed observer model is based on derivation of perceived object border maps from luminance and chrominance information and may be tuned to account for visual acuity of the human observer to be modeled. Our results suggest that increasing the contrast (e.g., using a wide gamut display) with a slow response panel may not mitigate the panning artifacts, which mostly affect visual tasks involving spatial discrimination of objects (e.g., normal vs abnormal structure, cell type and spatial relationships between them, and low-power nuclear morphology), and that the panning artifacts worsen with increasing panning speed. The proposed methods may be used as building blocks in an automatic WSI quality assessment framework.
Foveated model observers to predict human performance in 3D images
We evaluate whether 3D search requires model observers that take into account peripheral human visual processing (foveated models) to predict human observer performance. We show that two different 3D tasks, free search and location-known detection, influence the relative human visual detectability of two signals of different sizes in synthetic backgrounds mimicking the noise found in 3D digital breast tomosynthesis. One of the signals resembled a microcalcification (a small and bright sphere), while the other one was designed to look like a mass (a larger Gaussian blob). We evaluated current standard model observers (Hotelling; Channelized Hotelling; non-prewhitening matched filter with eye filter, NPWE; and non-prewhitening matched filter model, NPW) and showed that they incorrectly predict the relative detectability of the two signals in 3D search. We propose a new model observer (3D Foveated Channelized Hotelling Observer) that incorporates the properties of the visual system over a large visual field (fovea and periphery). We show that the foveated model observer can accurately predict the rank order of detectability of the signals in 3D images for each task. Together, these results motivate the use of a new generation of foveated model observers for predicting image quality for search tasks in 3D imaging modalities such as digital breast tomosynthesis or computed tomography.
Evaluation of CNN as anthropomorphic model observer
Francesc Massanes, Jovan G. Brankov
Model observers (MO) are widely used in medical imaging to act as surrogates of human observers in task-based image quality evaluation, frequently towards optimization of reconstruction algorithms. In this paper, we explore the use of convolutional neural networks (CNN) to be used as MO. We will compare CNN MO to alternative MO currently being proposed and used such as the relevance vector machine based MO and channelized Hotelling observer (CHO). As the success of the CNN, and other deep learning approaches, is rooted in large data sets availability, which is rarely the case in medical imaging systems task-performance evaluation, we will evaluate CNN performance on both large and small training data sets.
Technology Assessment - Applications
Interpretation of the rainbow color scale for quantitative medical imaging: perceptually linear color calibration (CSDF) versus DICOM GSDF
Frédérique Chesterman, Hannah Manssens, Céline Morel, et al.
Medical displays for primary diagnosis are calibrated to the DICOM GSDF1 but there is no accepted standard today that describes how display systems for medical modalities involving color should be calibrated. Recently the Color Standard Display Function3,4 (CSDF), a calibration using the CIEDE2000 color difference metric to make a display as perceptually linear as possible, has been proposed. In this work we present the results of a first observer study set up to investigate the interpretation accuracy of a rainbow color scale when a medical display is calibrated to CSDF versus DICOM GSDF and a second observer study set up to investigate the detectability of color differences when a medical display is calibrated to CSDF, DICOM GSDF and sRGB. The results of the first study indicate that the error when interpreting a rainbow color scale is lower for CSDF than for DICOM GSDF with statistically significant difference (Mann-Whitney U test) for eight out of twelve observers. The results correspond to what is expected based on CIEDE2000 color differences between consecutive colors along the rainbow color scale for both calibrations. The results of the second study indicate a statistically significant improvement in detecting color differences when a display is calibrated to CSDF compared to DICOM GSDF and a (non-significant) trend indicating improved detection for CSDF compared to sRGB. To our knowledge this is the first work that shows the added value of a perceptual color calibration method (CSDF) in interpreting medical color images using the rainbow color scale. Improved interpretation of the rainbow color scale may be beneficial in the area of quantitative medical imaging (e.g. PET SUV, quantitative MRI and CT and Doppler US), where a medical specialist needs to interpret quantitative medical data based on a color scale and/or detect subtle color differences and where improved interpretation accuracy and improved detection of color differences may contribute to a better diagnosis. Our results indicate that for diagnostic applications involving both grayscale and color images, CSDF should be chosen over DICOM GSDF and sRGB as it assures excellent detection for color images and at the same time maintains DICOM GSDF for grayscale images.
Low contrast detection in abdominal CT: comparing single-slice and multi-slice tasks
Image quality assessment is crucial for the optimization of computed tomography (CT) protocols. Human and mathematical model observers are increasingly used for the detection of low contrast signals in abdominal CT, but are frequently limited to the use of a single image slice. Another limitation is that most of them only consider the detection of a signal embedded in a uniform background phantom. The purpose of this paper was to test if human observer performance is significantly different in CT images read in single or multiple slice modes and if these differences are the same for anatomical and uniform clinical images. We investigated detection performance and scrolling trends of human observers of a simulated liver lesion embedded in anatomical and uniform CT backgrounds. Results show that observers do not benefit significantly from the additional information provided in multi-slice reading mode. Regarding the background, performance is moderately higher for uniform than for anatomical images. Our results suggest that for low contrast detection in abdominal CT, the use of multi-slice model observers would probably only add a marginal benefit. On the other hand, the quality of a CT image is more accurately estimated with clinical anatomical backgrounds.
Effects of increased compression with an ultrasound transducer on the conspicuity of breast lesions in a phantom
Katy Szczepura, Tahreem Faqir, David Manning
Ultrasound imaging of the breast is highly operator dependent. The amount of pressure applied with the transducer has a direct impact on the lesion visibility in breast ultrasound. The conspicuity index is a quantitative measure of lesion visibility, taking into account more parameters than standard measures that impact on lesion detection. This study assessed the conspicuity of lesions within a breast phantom using increased transducer compression in breast ultrasound.

Methods

A phantom was constructed of gelatine to represent adipose tissue, steel wool to represent glandular tissue and blood vessels, and silicone spheres to represent lesions. This meant that the lesions were also compressible, but less so than the surrounding tissue. The phantom was imaged under increasing transducer compression. The conspicuity index was measured using the Conspicuity Index Software. The distance between the transducer surface and lesion surface was measured as an indication of increased compression.

Results

When moderate compression (17mm) was applied, the conspicuity index increased resulting in better visualisation of the silicone lesions. However, with increased compression the conspicuity index decreased.

New work to be presented

The conspicuity index has never been demonstrated in ultrasound imaging before. This is preliminary phantom work to demonstrate the impact of increased transducer compression on quantitative lesion visibility assessment.

Conclusion

The compression applied should be considered for optimum visualisation, as excessive pressure decreases conspicuity. However, further work needs to be conducted in order to consider other factors, such as density of the breast and lesion location, for a better understanding of the effect of compression on the visualisation of the lesion. A human study is planned.
Assessment of automatic exposure control performance in digital mammography using a no-reference anisotropic quality index
Bruno Barufaldi, Lucas R. Borges, Predrag R. Bakic, et al.
Automatic exposure control (AEC) is used in mammography to obtain acceptable radiation dose and adequate image quality regardless of breast thickness and composition. Although there are physics methods for assessing the AEC, it is not clear whether mammography systems operate with optimal dose and image quality in clinical practice. In this work, we propose the use of a normalized anisotropic quality index (NAQI), validated in previous studies, to evaluate the quality of mammograms acquired using AEC. The authors used a clinical dataset that consists of 561 patients and 1,046 mammograms (craniocaudal breast views). The results show that image quality is often maintained, even at various radiation levels (mean NAQI = 0.14 ± 0.02). However, a more careful analysis of NAQI reveals that the average image quality decreases as breast thickness increases. The NAQI is reduced by 32% on average, when the breast thickness increases from 31 to 71 mm. NAQI also decreases with lower breast density. The variation in breast parenchyma alone cannot fully account for the decrease of NAQI with thickness. Examination of images shows that images of large, fatty breasts are often inadequately processed. This work shows that NAQI can be applied in clinical mammograms to assess mammographic image quality, and highlights the limitations of the automatic exposure control for some images.
Digital breast tomosynthesis for detecting multifocal and multicentric breast cancer: influence of acquisition geometry on model observer performance in breast phantom images
Multifocal and multicentric breast cancer (MFMC), i.e., the presence of two or more tumor foci within the same breast, has an immense clinical impact on treatment planning and survival outcomes. Detecting multiple breast tumors is challenging as MFMC breast cancer is relatively uncommon, and human observers do not know the number or locations of tumors a priori. Digital breast tomosynthesis (DBT), in which an x-ray beam sweeps over a limited angular range across the breast, has the potential to improve the detection of multiple tumors.1, 2 However, prior efforts to optimize DBT image quality only considered unifocal breast cancers (e.g.,3-9), so the recommended geometries may not necessarily yield images that are informative for the task of detecting MFMC. Hence, the goal of this study is to employ a 3D multi-lesion (ml) channelized-Hotelling observer (CHO) to identify optimal DBT acquisition geometries for MFMC. Digital breast phantoms and simulated DBT scanners of different geometries (e.g., wide or narrow arc scans, different number of projections in each scan) were used to generate image data for the simulation study. Multiple 3D synthetic lesions were inserted into different breast regions to simulate MF cases and MC cases. 3D partial least squares (PLS) channels and 3D Laguerre-Gauss (LG) channels were estimated to capture discriminant information and correlations among signals in locally varying anatomical backgrounds, enabling the model observer to make both image-level and location-specific detection decisions. The 3D ml-CHO with PLS channels outperformed that with LG channels in this study. The simulated MF cases and MC cases were not equally difficult for the ml-CHO to detect across the different simulated DBT geometries considered in this analysis. Also, the results suggest that the optimal design of DBT may vary as the task of clinical interest changes, e.g., a geometry that is better for finding at least one lesion may be worse for counting the number of lesions.
Lesion detectability in stereoscopically viewed digital breast tomosynthesis projection images: a model observer study with anthropomorphic computational breast phantoms
Stereoscopic views of 3D breast imaging data may better reveal the 3D structures of breasts, and potentially improve the detection of breast lesions. The imaging geometry of digital breast tomosynthesis (DBT) lends itself naturally to stereo viewing because a stereo pair can be easily formed by two projection images with a reasonable separation angle for perceiving depth. This simulation study attempts to mimic breast lesion detection on stereo viewing of a sequence of stereo pairs of DBT projection images. 3D anthropomorphic computational breast phantoms were scanned by a simulated DBT system, and spherical signals were inserted into different breast regions to imitate the presence of breast lesions. The regions of interest (ROI) had different local anatomical structures and consequently different background statistics. The projection images were combined into a sequence of stereo pairs, and then presented to a stereo matching model observer for determining lesion presence. The signal-to-noise ratio (SNR) was used as the figure of merit in evaluation, and the SNR from the stack of reconstructed slices was considered as the benchmark. We have shown that: 1) incorporating local anatomical backgrounds may improve lesion detectability relative to ignoring location-dependent image characteristics. The SNR was lower for the ROIs with the higher local power-law-noise coefficient β. 2) Lesion detectability may be inferior on stereo viewing of projection images relative to conventional viewing of reconstructed slices, but further studies are needed to confirm this observation.
Joint Session with 10132 and 10136: Task-based Assessment in CT
Development of local complexity metrics to quantify the effect of anatomical noise on detectability of lung nodules in chest CT imaging
Justin Solomon, Geoffrey Rubin, Taylor Smith, et al.
The purpose of this study was to develop metrics of local anatomical complexity and compare them with detectability of lung nodules in CT. Data were drawn retrospectively from a published perception experiment in which detectability was assessed in cases enriched with virtual nodules (13 radiologists x 157 total nodules = 2041 responses). A local anatomical complexity metric called the distractor index was developed, defined as the Gaussian weighted proportion (i.e., average) of distracting local voxels (50 voxels in-plane, 5 slices). A distracting voxel was classified by thresholding image data that had been selectively filtered to enhance nodule-like features. The distractor index was measured for each nodule location in the nodule-free images. The local pixel standard deviation (STD) was also measured for each nodule. Other confounding factors of search fraction (proportion of lung voxels to total voxels in the given slice) and peripheral distance (defined as the 3D distance of the nodule from the trachea bifurcation) were measured. A generalized linear mixed-effects statistical model (no interaction terms, probit link function, random reader term) was fit to the data to determine the influence of each metric on detectability. In order of decreasing effect size: distractor index, STD, and search fraction all significantly affected detectability (P < 0.001). Distance to the trachea did not have a significant effect (P > 0.05). These data demonstrate that local lung complexity degrades detection of lung nodules and the distractor index could serve as a good surrogate metric to quantify anatomical complexity.
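The "Gaussian weighted proportion" of distracting voxels described in this abstract can be illustrated with a minimal 2D sketch. The function name, the 2D simplification, and the `sigma` parameter are assumptions made for illustration; the abstract does not give the Gaussian width or the filtering and thresholding steps that produce the binary mask.

```python
import math


def distractor_index(mask, center, sigma):
    """Gaussian-weighted proportion of distracting pixels around `center`.

    mask   : 2D list of 0/1 flags (1 = pixel classified as distracting)
    center : (row, col) of the location being scored
    sigma  : Gaussian width in pixels (hypothetical parameter)
    """
    r0, c0 = center
    weighted_hits = 0.0
    total_weight = 0.0
    for r, row in enumerate(mask):
        for c, flag in enumerate(row):
            # Weight falls off with squared distance from the scored location.
            w = math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2.0 * sigma ** 2))
            weighted_hits += w * flag
            total_weight += w
    # Normalizing by the total weight keeps the index between 0 and 1.
    return weighted_hits / total_weight
```

With this definition, a distracting pixel close to the scored location raises the index more than the same pixel far away, which matches the idea of a local complexity measure.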
Task-based image quality assessment in radiation therapy: initial characterization and demonstration with CT simulation images
Steven R. Dolly, Mark A. Anastasio, Lifeng Yu, et al.
In current radiation therapy practice, image quality is still assessed subjectively or by utilizing physically-based metrics. Recently, a methodology for objective task-based image quality (IQ) assessment in radiation therapy was proposed by Barrett et al.1 In this work, we present a comprehensive implementation and evaluation of this new IQ assessment methodology. A modular simulation framework was designed to perform an automated, computer-simulated end-to-end radiation therapy treatment. A fully simulated framework was created that utilizes new learning-based stochastic object models (SOM) to obtain known organ boundaries, generates a set of images directly from the numerical phantoms created with the SOM, and automates the image segmentation and treatment planning steps of a radiation therapy workflow. By use of this computational framework, therapeutic operating characteristic (TOC) curves can be computed and the area under the TOC curve (AUTOC) can be employed as a figure-of-merit to guide optimization of different components of the treatment planning process. The developed computational framework is employed to optimize X-ray CT pre-treatment imaging. We demonstrate that use of the radiation therapy-based IQ measures leads to different imaging parameters than those obtained by use of physical-based measures.
Task-based data-acquisition optimization for sparse image reconstruction systems
Conventional wisdom dictates that imaging hardware should be optimized by use of an ideal observer (IO) that exploits full statistical knowledge of the class of objects to be imaged, without consideration of the reconstruction method to be employed. However, accurate and tractable models of the complete object statistics are often difficult to determine in practice. Moreover, in imaging systems that employ compressive sensing concepts, imaging hardware and (sparse) image reconstruction are innately coupled technologies. We have previously proposed a sparsity-driven ideal observer (SDIO) that can be employed to optimize hardware by use of a stochastic object model that describes object sparsity. The SDIO and sparse reconstruction method can therefore be "matched" in the sense that they both utilize the same statistical information regarding the class of objects to be imaged. To efficiently compute SDIO performance, the posterior distribution is estimated by use of computational tools developed recently for variational Bayesian inference. Subsequently, the SDIO test statistic can be computed semi-analytically. The advantages of employing the SDIO instead of a Hotelling observer are systematically demonstrated in case studies in which magnetic resonance imaging (MRI) data acquisition schemes are optimized for signal detection tasks.
Posters: Model Observers
The disagreement between the ideal observer and human observers in hardware and software imaging system optimization: theoretical explanations and evidence
Xin He
The ideal observer is widely used in imaging system optimization. One practical question remains open: do the ideal and human observers have the same preference in system optimization and evaluation? Based on the ideal observer’s mathematical properties proposed by Barrett et al. and the empirical properties of human observers investigated by Myers et al., I attempt to pursue the general rules regarding the applicability of the ideal observer in system optimization. Particularly, in software optimization, the ideal observer pursues data conservation while humans pursue data presentation or perception. In hardware optimization, the ideal observer pursues a system with the maximum total information, while humans pursue a system with the maximum selected (e.g., certain frequency bands) information. These different objectives may result in different system optimizations between the human and the ideal observers. Thus, an ideal observer optimized system is not necessarily optimal for humans. I cite empirical evidence in search and detection tasks, in hardware and software evaluation, in X-ray CT, pinhole imaging, as well as emission computed tomography to corroborate the claims. (Disclaimer: the views expressed in this work do not necessarily represent those of the FDA.)
Improving the ideal and human observer consistency: a demonstration of principles
Xin He
In addition to being rigorous and realistic, the usefulness of the ideal observer computational tools may also depend on whether they serve the empirical purpose for which they are created, e.g. to identify desirable imaging systems to be used by human observers. In SPIE 10136-35, I have shown that the ideal and the human observers do not necessarily prefer the same system as the optimal or better one due to their different objectives in both hardware and software optimization. In this work, I attempt to identify a necessary but insufficient condition under which the human and the ideal observer may rank systems consistently. If corroborated, such a condition allows a numerical test on the ideal/human consistency without routine human observer studies. I reproduced data from Abbey et al. JOSA 2001 to verify the proposed condition (i.e., not a rigorous falsification study, due to the lack of specificity in the proposed conjecture; a roadmap for more falsifiable conditions is proposed). Via this work, I would like to emphasize the reality of practical decision making in addition to the realism in mathematical modeling. (Disclaimer: the views expressed in this work do not necessarily represent those of the FDA.)
Visual-search models for location-known detection tasks
Lesion-detection studies that analyze a fixed target position are generally considered predictive of studies involving lesion search, but the extent of the correlation often goes untested. The purpose of this work was to develop a visual-search (VS) model observer for location-known tasks that, coupled with previous work on localization tasks, would allow efficient same-observer assessments of how search and other task variations can alter study outcomes. The model observer featured adjustable parameters to control the search radius around the fixed lesion location and the minimum separation between suspicious locations. Comparisons were made against human observers, a channelized Hotelling observer and a nonprewhitening observer with eye filter in a two-alternative forced-choice study with simulated lumpy background images containing stationary anatomical and quantum noise. These images modeled single-pinhole nuclear medicine scans with different pinhole sizes. When the VS observer’s search radius was optimized with training images, close agreement was obtained with human-observer results. Some performance differences between the humans could be explained by varying the model observer’s separation parameter. The range of optimal pinhole sizes identified by the VS observer was in agreement with the range determined with the channelized Hotelling observer.
Visual-search model observer for assessing mass detection in CT
Our aim is to devise model observers (MOs) to evaluate acquisition protocols in medical imaging. To optimize protocols for human observers, an MO must reliably interpret images containing quantum and anatomical noise under aliasing conditions. In this study of sampling parameters for simulated lung CT, the lesion-detection performance of human observers was compared with that of visual-search (VS) observers, a channelized nonprewhitening (CNPW) observer, and a channelized Hotelling (CH) observer. Scans of a mathematical torso phantom modeled single-slice parallel-hole CT with varying numbers of detector pixels and angular projections. Circular lung lesions had a fixed radius. Two-dimensional FBP reconstructions were performed. A localization ROC study was conducted with the VS, CNPW and human observers, while the CH observer was applied in a location-known ROC study. Changing the sampling parameters had negligible effect on the CNPW and CH observers, whereas several VS observers demonstrated a sensitivity to sampling artifacts that was in agreement with how the humans performed.
Posters: Image Perception
Hologram stability evaluation for Microsoft HoloLens
Reid Vassallo, Adam Rankin, Elvis C. S. Chen, et al.
Augmented reality (AR) has an increasing presence in the world of image-guided interventions which is amplified by the availability of consumer-grade head-mounted display (HMD) technology. The Microsoft® HoloLens™ optical passthrough device is at the forefront of consumer technology, as it is the first un-tethered head mounted computer (HMC). It shows promise of effectiveness in guiding clinical interventions; however, its accuracy and stability must still be evaluated for the clinical environment. We have developed an evaluative protocol for the HoloLens™ using an optical measurement device to digitize the perceived pose of the rendered hologram. This evaluates the ability of the HoloLens™ to maintain the hologram in its intended pose. The stability is measured when actions are performed that may cause a shift in the hologram’s pose due to errors in its simultaneous localization and mapping. An emphasis is placed on actions that are more likely to be performed in a clinical setting. This will be used to determine the most applicable use cases for this technology in the future and how to minimize errors when in use. Our results show promise of this device’s potential for intraoperative clinical use. Further analysis must be performed to evaluate other potential sources of hologram disruption.
Perceived image quality for autostereoscopic holograms in healthcare training
Brian Goldiez, Julian Abich IV, Austin Carter, et al.
The current state of dynamic light field holography requires further empirical investigation to ultimately advance this developing technology. This paper describes a user-centered design approach for gaining insight into the features most important to clinical personnel using emerging dynamic holographic displays. The approach describes the generation of a high quality holographic model of a simulated traumatic amputation above the knee using 3D scanning. Using that model, a set of static holographic prints will be created varying in color or monochrome, contrast ratio, and polygon density. Leveraging methods from image quality research, the goal for this paper is to describe an experimental approach wherein participants are asked to provide feedback regarding the elements previously mentioned in order to guide the ongoing evolution of holographic displays.
On analyzing free-response data on location level
Andriy I. Bandos, Nancy A. Obuchowski
Free-response ROC (FROC) data are typically collected when the primary question of interest is focused on the proportions of correct detection-localization of known targets and the frequencies of false positive responses, which can be multiple per subject (image). These studies are particularly relevant for CAD and related applications. The fundamental tool of the location-level FROC analysis is the FROC curve. Although there are many methods of FROC analysis, as we describe in this work, some of the standard and popular approaches, while important, are not suitable for analyzing specifically the location-level FROC performance as summarized by the FROC curve. Analysis of the FROC curve, on the other hand, might not be straightforward. Recently we developed an approach for the location-level analysis of the FROC data using the well-known tools for clustered ROC analysis. In the current work, based on previously developed concepts, and using specific examples, we demonstrate the key reasons why specifically location-level FROC performance cannot be fully addressed by the common approaches as well as illustrate the proposed solution. Specifically, we consider the two most salient FROC approaches, namely JAFROC and the area under the exponentially transformed FROC curve (AFE) and show that clearly superior FROC curves can have lower values for these indices. We describe the specific features that make these approaches inconsistent with FROC curves. This work illustrates some caveats for using the common approaches for location-level FROC analysis and provides guidelines for the appropriate assessment or comparison of FROC systems.
Posters: Technology Assessment - General
Colorimetric calibration of wound photography with off-the-shelf devices
Digital cameras are now often used for photographic documentation in the medical sciences. However, the color reproducibility of the same objects suffers under different illumination and lighting conditions. This variation in color representation is problematic when the images are used for segmentation and measurements based on color thresholds. In this paper, motivated by photographic follow-up of chronic wounds, we assess the impact of (i) gamma correction, (ii) white balancing, (iii) background unification, and (iv) reference card-based color correction. Automatic gamma correction and white balancing are applied to support the calibration procedure, where gamma correction is a nonlinear color transform. For unevenly illuminated images, non-uniform illumination correction is applied. In the last step, we apply colorimetric calibration using a reference color card of 24 patches with known colors. A lattice detection algorithm is used for locating the card. The least squares algorithm is applied for affine color calibration in the RGB model. We have tested the algorithm on images with seven different types of illumination, with and without flash, using three different off-the-shelf cameras including smartphones. We analyzed the spread of the resulting color values of a selected color patch before and after applying the calibration. Additionally, we checked the individual contribution of the different steps of the whole calibration process. Using all steps, we were able to achieve a maximum of 81% reduction in the standard deviation of color patch values in the resulting images compared to the original images. This supports manual as well as automatic quantitative wound assessments with off-the-shelf devices.
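The final calibration step, a least-squares affine fit in RGB against a reference card, can be sketched as follows. This is a minimal illustration assuming NumPy is available; the function names, the synthetic 24-patch data, and the simulated camera distortion are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def fit_affine_color_map(measured, reference):
    """Least-squares affine map from measured RGB to reference RGB.

    measured, reference: arrays of shape (n_patches, 3).
    Returns a (4, 3) matrix M such that [r, g, b, 1] @ M approximates the reference.
    """
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # A trailing column of ones lets the fit include a per-channel offset.
    design = np.hstack([measured, np.ones((measured.shape[0], 1))])
    M, *_ = np.linalg.lstsq(design, reference, rcond=None)
    return M


def apply_affine_color_map(pixels, M):
    """Apply the fitted map to an array whose last axis holds RGB values."""
    pixels = np.asarray(pixels, dtype=float)
    flat = pixels.reshape(-1, 3)
    design = np.hstack([flat, np.ones((flat.shape[0], 1))])
    return (design @ M).reshape(pixels.shape)


# Synthetic stand-in for a 24-patch card: known colors and a made-up distortion.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 255.0, size=(24, 3))
distortion = np.array([[0.90, 0.05, 0.00],
                       [0.10, 0.80, 0.05],
                       [0.00, 0.10, 1.10]])
offset = np.array([5.0, -3.0, 8.0])
measured = reference @ distortion + offset

M = fit_affine_color_map(measured, reference)
corrected = apply_affine_color_map(measured, M)
print(np.abs(corrected - reference).max())
```

With real photographs the fit is not exact, so the residual on the card patches gives a rough indication of how well an affine model describes the camera and lighting.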
Color image enhancement of medical images using alpha-rooting and zonal alpha-rooting methods on 2D QDFT
The 2-D quaternion discrete Fourier transform (2-D QDFT) is the Fourier transform applied to color images when the color images are considered in the quaternion space. Quaternion numbers are four-dimensional hyper-complex numbers. The quaternion representation of a color image allows us to see the color of the image as a single unit. In the quaternion approach to color image enhancement, each color is seen as a vector. This permits us to see the merging effect of the color due to the combination of the primary colors. Color images have traditionally been processed by applying the respective algorithm to each channel separately and then composing the color image from the processed channels. In this article, the alpha-rooting and zonal alpha-rooting methods are used with the 2-D QDFT. In the alpha-rooting method, the alpha-root of the transformed frequency values of the 2-D QDFT is determined before taking the inverse transform. In the zonal alpha-rooting method, the frequency spectrum of the 2-D QDFT is divided into different zones and the alpha-rooting is applied with different alpha values for different zones. The choice of alpha values is optimized with a genetic algorithm. The visual perception of 3-D medical images is increased by changing the reference gray line.
Do quantitative metrics derived from standard fluoroscopy phantoms used for quality control assess vendor-specific advancements in interventional fluoroscopy systems?
Mark A. Forsberg, David J. Eschelman, Prakruti Talreja, et al.
Approximately 9 million fluoroscopically-guided interventional procedures are performed annually in the USA. Recent technological advancements for interventional fluoroscopy systems have focused on vendor-specific real-time image and signal processing. Hence, the purpose of this study was to evaluate whether quantitative metrics derived from standard image quality phantoms, routinely used for quality control, are able to distinguish vendor-specific processing features for interventional fluoroscopy systems. Six standard image quality phantoms were used to measure contrast-to-noise ratio (CNR), full-width-at-half-maximum (for determining edge blurring) and modulation transfer function, to analyze contrast detail characteristics, and to assess digital subtraction angiography (DSA) performance of six flat-panel detector based interventional fluoroscopy systems from Philips (with and without ClarityIQ) and Siemens. Phantom data were acquired at different dose modes and field-of-view settings. Fluoroscopy loops and digital subtraction acquisitions were saved (duration 3 seconds; repeated 3 times). Images were analyzed off-line using ImageJ. CNR measurements showed no differences between systems, whereas the contrast-detail analysis and edge blurring characterization showed relatively low performance of Philips Clarity systems compared to Siemens and Philips non-Clarity systems. Conversely, the modulation transfer function showed that the limiting spatial resolution was higher for the Philips systems relative to the Siemens suite. However, with the DSA phantom the performance of Siemens and Philips Clarity systems was similar. In conclusion, depending on the image quality phantom used for comparing different systems, the results may differ; therefore, quantitative metrics derived from standard fluoroscopy phantoms lack the discriminatory ability to assess vendor-specific advancements in interventional fluoroscopy systems.
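For readers unfamiliar with the CNR metric, the sketch below uses one common definition: the absolute difference between the mean signal and mean background, divided by the background standard deviation. The abstract does not state which definition the authors used, and the pixel values here are invented for illustration.

```python
from statistics import mean, stdev


def contrast_to_noise_ratio(signal_pixels, background_pixels):
    """CNR as |mean(signal) - mean(background)| / sample std of the background."""
    contrast = abs(mean(signal_pixels) - mean(background_pixels))
    return contrast / stdev(background_pixels)


# Hypothetical pixel values from a target ROI and a nearby background ROI.
signal_roi = [110, 112, 108, 110]
background_roi = [100, 102, 98, 100]
cnr = contrast_to_noise_ratio(signal_roi, background_roi)
print(round(cnr, 2))
```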
Posters: Technology Assessment in Volume Imaging
Impact of tube current modulation on lesion conspicuity index in hi-resolution chest computed tomography
Katy Szczepura, David Tomkinson, David Manning
Tube current modulation is a method employed in CT in an attempt to optimize radiation dose to the patient. The acceptable noise (noise index) can be varied, based on the level of optimization required; higher accepted noise reduces the patient dose. Recent research [1] suggests that measuring the conspicuity index (C.I.) of focal lesions within an image is more reflective of a clinical reader's ability to perceive focal lesions than traditional physical measures such as contrast-to-noise ratio (CNR) and signal-to-noise ratio (SNR). Software has been developed and validated to calculate the C.I. in DICOM images. The aim of this work is to assess the impact of tube current modulation on conspicuity index and CTDIvol, to indicate the benefits and limitations of tube current modulation on lesion detectability. Method: An anthropomorphic chest phantom (“Lungman”) was used with inserted lesions of varying size and HU (see table below). A range of Hounsfield units and sizes was used to represent the variation found in lesion Hounsfield units. This meant some lesions had negative Hounsfield unit values.
Hounsfield Unit inaccuracy in computed tomography lesion size and density, diagnostic quality vs attenuation correction
In computed tomography the Hounsfield Units (HU) are used as an indicator of the tissue type based on the linear attenuation coefficients of the tissue. HU accuracy is essential when this metric is used in any form to support diagnosis. In hybrid imaging, such as SPECT/CT and PET/CT, the information is used for attenuation correction (AC) of the emission images. This work investigates the HU accuracy of nodules of known size and HU, comparing diagnostic quality (DQ) images with images used for AC.
Characterization of a CT unit for the detection of low contrast structures
Major technological advances in CT enable the acquisition of high quality images while minimizing patient exposure. The goal of this study was to objectively compare two generations of iterative reconstruction (IR) algorithms for the detection of low contrast structures. An abdominal phantom (QRM, Germany) containing 8, 6 and 5 mm diameter spheres (with a nominal contrast of 20 HU) was scanned using our standard clinical noise index settings on a GE “Discovery 750 HD” CT. Two additional rings (2.5 and 5 cm) were added to the phantom. Images were reconstructed using FBP, ASIR-50%, and VEO (full statistical Model Based Iterative Reconstruction, MBIR). The reconstructed slice thickness was 2.5 mm, except for VEO reconstructions, for which it was 0.625 mm. NPS was calculated to highlight the potential noise reduction of each IR algorithm. To assess low contrast detectability (LCD), a Channelized Hotelling Observer (CHO) with 10 DDoG channels was used with the area under the curve (AUC) as a figure of merit. Sphere contrast was also measured. ASIR-50% allowed a noise reduction by a factor of two when compared to FBP, without an improvement of the LCD. VEO allowed an additional noise reduction at a thinner slice thickness compared to ASIR-50%, together with a major improvement of the LCD, especially for the large-sized phantom and small lesions. Contrast decreased by up to 10% as the phantom size increased for FBP and ASIR-50% and remained constant with VEO. VEO is particularly interesting for LCD when dealing with large patients and small lesion sizes and when the detection task is difficult.
Investigation on location dependent detectability in cone beam CT images with uniform and anatomical backgrounds
We investigate location dependent lesion detectability of cone beam computed tomography images for different background types (i.e., uniform and anatomical), image planes (i.e., transverse and longitudinal) and slice thicknesses. Anatomical backgrounds are generated using a power law spectrum of breast anatomy, 1/f^3. A spherical object with a 5 mm diameter is used as a signal. CT projection data are acquired by the forward projection of uniform and anatomical backgrounds with and without the signal. Then, projection data are reconstructed using the FDK algorithm. Detectability is evaluated by a channelized Hotelling observer with dense difference-of-Gaussian channels. For uniform background, off-centered images yield higher detectability than iso-centered images for the transverse plane, while for the longitudinal plane, the detectability of iso-centered and off-centered images is similar. For anatomical background, off-centered images yield higher detectability for the transverse plane, while iso-centered images yield higher detectability for the longitudinal plane, when the slice thickness is smaller than 1.9 mm. The optimal slice thickness is 3.8 mm for all tasks, and the transverse plane at the off-center (iso-center and off-center) produces the highest detectability for uniform (anatomical) background.
A phantom design and assessment of lesion detectability in PET imaging
The early detection of abnormal regions with increased tracer uptake in positron emission tomography (PET) is a key driver of imaging system design and optimization as well as choice of imaging protocols. Detectability, however, remains difficult to assess due to the need for realistic objects mimicking the clinical scene, multiple lesion-present and lesion-absent images and multiple observers. Fillable phantoms, with tradeoffs between complexity and utility, provide a means to quantitatively test and compare imaging systems under truth-known conditions. These phantoms, however, often focus on quantification rather than detectability. This work presents extensions to a novel phantom design and analysis techniques to evaluate detectability in the context of realistic, non-piecewise constant backgrounds. The design consists of a phantom filled with small solid plastic balls and a radionuclide solution to mimic heterogeneous background uptake. A set of 3D-printed regular dodecahedral ‘features’ were included at user-defined locations within the phantom to create ‘holes’ within the matrix of chaotically-packed balls. These features fill at approximately 3:1 contrast to the lumpy background. A series of signal-known-present (SP) and signal-known-absent (SA) sub-images were generated and used as input for observer studies. This design was imaged in a head-like 20 cm diameter, 20 cm long cylinder and in a body-like 36 cm wide by 21 cm tall by 40 cm long tank. A series of model observer detectability indices were compared across scan conditions (count levels, number of scan replicates), PET image reconstruction methods (with/without TOF and PSF) and between PET/CT scanner system designs using the same phantom imaged on multiple systems. The detectability index was further compared to the noise-equivalent count (NEC) level to characterize the relationship between NEC and observer SNR.
Fractal dimension metric for quantifying noise texture of computed tomography images
P. Khobragade, Jiahua Fan, Franco Rupcich, et al.
This study investigated a fractal dimension algorithm for noise texture quantification in CT images. Quantifying noise in CT images is important for assessing image quality. Noise is typically quantified by calculating noise standard deviation and noise power spectrum (NPS). Different reconstruction kernels and iterative reconstruction approaches affect both the noise magnitude and noise texture. The shape of the NPS can be used as a noise texture descriptor. However, the NPS requires numerous images for calculation and is a vector quantity. This study proposes the metric of fractal dimension to quantify noise texture, because fractal dimension is a single scalar metric calculated from a small number of images. Fractal dimension measures the complexity of a pattern. In this study, the ACR CT phantom was scanned and images were reconstructed using filtered back-projection with three reconstruction kernels: bone, soft and standard. Regions of interest were extracted from the uniform section of the phantom for NPS and fractal dimension calculation. The results demonstrated a mean fractal dimension of 1.86 for soft kernel, 1.92 for standard kernel, and 2.16 for bone kernel. Increasing fractal dimension corresponded to a shift in the NPS towards higher spatial frequencies and a grainier noise appearance. A stable fractal dimension was calculated from two ROIs, compared to the more than 250 ROIs used for NPS calculation. The scalar fractal dimension metric may be a useful noise texture descriptor for evaluating or optimizing reconstruction algorithms.
Effects of window width and window level adjustment on detection tasks in computed tomography images
P. Khobragade, Jiahua Fan, Franco Rupcich, et al.
This study experimentally evaluated the effect of window width (WW) and window level (WL) on the task based detectability index (d’). Window-level transformation is frequently performed on CT images to improve visualization in the clinical arena. Numerous model observers and metrics have been used to assess CT image quality. However, objective assessment is typically performed on the reconstructed CT image without considering the WW and WL settings used by the reader. In this study, the ACR CT phantom was scanned at 120 kV and 90 mAs and images were reconstructed using filtered backprojection. The bone and acrylic contrast objects from module one of the ACR phantom were selected for calculating the effect of the WW/WL on the detectability index (d’). The d’ for each object at 90 mAs was calculated for a range of WW and WL values. The results demonstrated that the d’ values were affected by the WW and WL settings. For example, a WL setting of 20 HU and window width of 150 HU resulted in a 35% decrease in d’ compared to that of the untransformed image. A WL of 20 and WW of 1400 resulted in a 33% decrease in the d’ value for the high contrast object. The investigated WW and WL settings did not improve d’ for any of the investigated objects when reconstructed with the standard kernel. The results suggest that d’ is affected by the WW/WL settings. However, because d’ does not model the contrast adaptation of the human visual system, it may not represent the change in perceived image quality with window-level transformation.
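The window-level transformation referred to above is conventionally a linear map from a band of HU values onto the display range, with clipping outside the band. The sketch below shows that conventional form; the function name, the 0 to 255 output range, and the sample HU values are assumptions for illustration, and the d’ calculation itself is not reproduced.

```python
def apply_window(hu, level, width, out_max=255.0):
    """Map a CT value in HU to a display value using window level and width.

    Values below level - width/2 map to 0, values above level + width/2 map
    to out_max, and values in between are scaled linearly.
    """
    lower = level - width / 2.0
    scaled = (hu - lower) / width
    return out_max * min(1.0, max(0.0, scaled))


# The two window settings quoted in the abstract, applied to a few sample HU values.
for level, width in [(20, 150), (20, 1400)]:
    displayed = [apply_window(hu, level, width) for hu in (-100, 20, 120, 955)]
    print(level, width, [round(v, 1) for v in displayed])
```

With the narrow window, every sample value above 95 HU saturates at the maximum display value, so differences between high-attenuation objects and their surroundings can be compressed or lost after the transformation.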
An interactive, stereoscopic virtual environment for medical imaging visualization, simulation and training
Evan Krueger, Erik Messier, Cristian A. Linte, et al.
Recent advances in medical image acquisition allow for the reconstruction of anatomies with 3D, 4D, and 5D renderings. Nevertheless, standard anatomical and medical data visualization still relies heavily on the use of traditional 2D didactic tools (i.e., textbooks and slides), which restrict the presentation of image data to a 2D slice format. While these approaches have their merits beyond being cost effective and easy to disseminate, anatomy is inherently three-dimensional. By using 2D visualizations to illustrate more complex morphologies, important interactions between structures can be missed. In practice, such as in the planning and execution of surgical interventions, professionals require intricate knowledge of anatomical complexities, which can be more clearly communicated and understood through intuitive interaction with 3D volumetric datasets, such as those extracted from high-resolution CT or MRI scans. Open source, high quality, 3D medical imaging datasets are freely available, and with the emerging popularity of 3D display technologies, affordable and consistent 3D anatomical visualizations can be created. In this study we describe the design, implementation, and evaluation of one such interactive, stereoscopic visualization paradigm for human anatomy extracted from 3D medical images. A stereoscopic display was created by projecting the scene onto the lab floor using sequential frame stereo projection and viewed through active shutter glasses. By incorporating a PhaseSpace motion tracking system, a single viewer can navigate an augmented reality environment and directly manipulate virtual objects in 3D. While this paradigm is sufficiently versatile to enable a wide variety of applications in need of 3D visualization, we designed our study to work as an interactive game, which allows users to explore the anatomy of various organs and systems. 
In this study we describe the design, implementation, and evaluation of an interactive and stereoscopic visualization platform for exploring and understanding human anatomy. This system can present medical imaging data in three dimensions and allows for direct physical interaction and manipulation by the viewer. This should provide numerous benefits over traditional, 2D display and interaction modalities, and in our analysis, we aim to quantify and qualify users' visual and motor interactions with the virtual environment when employing this interactive display as a 3D didactic tool.
A new method to quantify fiber orientation similarity in registered volumes
James Dormer, Rong Jiang, Mary B. Wagner, et al.
Differences in fiber orientations between registered image volumes can be difficult to quantify. Angular errors between diffusion tensor imaging (DTI) volumes are often a combination of image registration errors and fluctuations of diffusion values that are used to determine the fiber orientations. In order to properly quantify the similarity between two images containing fiber orientation information, both displacement and angular fluctuation should be considered. We present a method to quantify fiber orientation similarity between registered images by allowing small pixel displacements in conjunction with minor angle differences. Adjustments to the allowed pixel displacement and degree of angle difference can help identify the major factor contributing to the error of fiber angles. The proposed method can provide a new metric for the evaluation of the fiber orientation difference.
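One possible reading of this idea, offered only as a sketch, is to count a pixel as matched when some pixel within a small displacement in the other image has an orientation within a small angular tolerance. The function names, the 2-D grid layout, the tolerance parameters, and the example angles below are all hypothetical; the abstract does not specify the authors' actual metric.

```python
def angle_difference(a_deg, b_deg):
    """Smallest difference between two orientations, treating theta and theta + 180 as equal."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def orientation_similarity(ref, moved, max_shift=1, max_angle=10.0):
    """Fraction of pixels in ref with a close-enough orientation nearby in moved.

    ref, moved: 2-D lists of fiber angles in degrees, same shape.
    max_shift: allowed pixel displacement in each direction.
    max_angle: allowed orientation difference in degrees.
    """
    rows, cols = len(ref), len(ref[0])
    matched = 0
    for r in range(rows):
        for c in range(cols):
            found = False
            for dr in range(-max_shift, max_shift + 1):
                for dc in range(-max_shift, max_shift + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        if angle_difference(ref[r][c], moved[rr][cc]) <= max_angle:
                            found = True
            matched += found
    return matched / (rows * cols)


# Hypothetical angle maps: the second is roughly the first shifted by one pixel.
ref = [[0, 0, 90], [0, 90, 90]]
moved = [[0, 0, 0], [0, 0, 90]]
strict = orientation_similarity(ref, moved, max_shift=0, max_angle=10.0)
tolerant = orientation_similarity(ref, moved, max_shift=1, max_angle=10.0)
print(strict, tolerant)
```

Comparing the score with and without the displacement allowance gives a rough sense of whether misregistration or angular fluctuation dominates the disagreement.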
Posters: Image Perception and Technology Assessment in Breast Imaging
Automatic breast tissue density estimation scheme in digital mammography images
Renan C. Menechelli, Ana Luisa V. Pacheco, Homero Schiabel
Cases of breast cancer have increased substantially each year. However, radiologists' readings are subject to subjectivity and interpretation failures, which may affect the final diagnosis in this examination. High-density features in breast tissue are important factors related to these failures. Thus, among many functions, some CADx (Computer-Aided Diagnosis) schemes classify breasts according to the predominant density. To aid in such a procedure, this work describes automated software for classifying breast tissue density and reporting statistical information on its percentage change, through analysis of subregions (ROIs) of the whole mammography image. Once the breast is segmented, the image is divided into regions from which texture features are extracted. A multilayer perceptron (MLP) artificial neural network was then used to categorize the ROIs. Experienced radiologists had previously determined the density classification of the ROIs, which served as the reference for evaluating the software. In tests, its average accuracy was 88.7% in ROI classification and 83.25% in the classification of whole-breast density into the 4 BI-RADS density classes, taking into account a set of 400 images. Furthermore, when considering only a simplified two-class division (high and low densities), the classifier accuracy reached 93.5%, with AUC = 0.95.
Characterization of breast density in Vietnam and its association with demographic, reproductive and lifestyle factors
This study aims to investigate patterns of breast density among women in Vietnam and their association with demographic, reproductive and lifestyle features. Mammographic densities of 1,651 women were collected from the two largest breast cancer screening and treatment centers in Ha Noi and Ho Chi Minh City. Putative factors associated with breast density were obtained from self-administered questionnaires which considered demographic, reproductive and lifestyle elements and were provided by women who attended mammography examinations. Results show that a large proportion of Vietnamese women (78.4%) had a high breast density. With multivariable logistic regression, significant associations with high breast density were evident for women who were less than 55 years old (OR=3.0), had a BMI less than 23 (OR=2.2), were pre-menopausal (OR=2.9), had fewer than three children (OR=1.7), and were less than 32 years old when having their last child (OR=1.8). Participants who consumed more than two vegetable servings per day also had an increased risk of higher density (OR=2.6). The findings suggest some unique features regarding mammographic density amongst Vietnamese compared with westernized women.
The development and testing of a unique and flexible training module for residents and fellows using digital breast tomosynthesis (DBT)
Christiane M. Hakim, John Drescher, Jill L. King, et al.
The transition from FFDM to digital breast tomosynthesis (DBT) necessitates new approaches for training radiology residents and fellows that highlight depiction differences between the same abnormalities on the two modalities. We developed a unique, flexible training module that enables training with complete feedback, as well as testing performance before and after use of this training module. Currently, 219 examinations, with priors and other relevant information, are included. Using a special interface to the Secure View workstation (Hologic), we developed a management program that displays each case in a randomized manner and in a sequential mode (i.e. FFDM first followed by FFDM+DBT) and allows the reader to rate the case followed by viewing the images side by side with results of the full imaging based history (reporting) by the screening interpreter, the diagnostic workup interpreter (when applicable), and the actual pathology (biopsy and/or surgical). This approach allows the reader to review their correct and/or incorrect interpretation at each step of the management decision making. The module also has sets of pre- and post-training cases, allowing for a test-train-test study to be performed, if so desired. Two observer studies using 18 radiologists, residents, and fellows have been performed using this module, to date. The training module was assembled, tested, and implemented. We found it to be extremely flexible and useful in training. After completing two observer performance studies, the module was installed in our clinical facility and is currently being used to train residents and fellows at their own pace. All users found this module to be useful and extremely informative.
Developing a visual sensitive image features based CAD scheme to assist classification of mammographic masses
Computer-aided diagnosis (CAD) schemes of mammograms have been previously developed and tested. However, because they use “black-box” approaches with a large number of complicated features, radiologists have low confidence in accepting or considering CAD-cued results. To help solve this issue, this study aims to develop and evaluate a new CAD scheme that uses visually sensitive image features to classify between malignant and benign mammographic masses. A dataset of 301 masses detected on both craniocaudal (CC) and mediolateral oblique (MLO) view images was retrospectively assembled. Among them, 152 were malignant and 149 were benign. An iterative region-growing algorithm was applied to the special Gaussian-kernel filtered images to segment mass regions. A total of 13 image features were computed to mimic 5 categories of visually sensitive features that are commonly used by radiologists in classifying suspicious mammographic masses, namely mass size, shape factor, contrast, homogeneity and spiculation. We then selected one optimal feature in each of the 5 feature categories by using a Student's t-test, and applied two logistic regression classifiers using either CC or MLO view images to distinguish between malignant and benign masses. Last, a fusion method combining the two classification scores was applied and tested. By applying a 10-fold cross-validation method, the area under the receiver operating characteristic curve was 0.806±0.025. This study demonstrated a new approach to developing a CAD scheme based on 5 visually sensitive image features. Combined with a “visual-aid” interface, CAD results are much more easily explainable to the observers and may increase their confidence in considering CAD-cued results.
Can BI-RADS features on mammography be used as a surrogate for expensive genomic testing in breast cancer patients?
Michael R. Harowicz, Jeffrey R. Marks, P. Kelly Marcom, et al.
Medical oncologists increasingly rely on expensive genomic analysis to stratify patients for different treatment. The genomic markers are able to divide patients into groups that behave differently in terms of tumor presentation, likelihood of metastatic spread, and response to chemotherapy and radiation therapy. In recent years there has been a rapid increase in the number of genomic tests available, like the Oncotype DX test, which provides the risk of cancer recurrence for a subset of patients. Radiogenomics, a new field that investigates the relationship between imaging phenotypes and genomic characteristics, may offer a less expensive and less invasive imaging surrogate for molecular subtype and Oncotype DX recurrence score (ODRS). This retrospective study analyzes the relationship between Breast Imaging-Reporting and Data System (BI-RADS) features, as assessed by radiologists on mammograms, and both molecular subtype and ODRS. We used data from patients with BI-RADS features (shape or margin) and a genomic feature (subtype or ODRS) for the following cohorts: shape vs. subtype (n=69), margin vs. subtype (n=78), shape vs. ODRS (n=20), and margin vs. ODRS (n=18). The association between features was assessed using Fisher’s exact test. Our results show that shape assessed by radiologists according to the BI-RADS lexicon is associated with molecular subtype (p=0.0171), while the BI-RADS features of shape and margin were not significantly associated with ODRS (p=0.7839 and p=0.6047, respectively).
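To make the test concrete, the sketch below implements the two-sided Fisher's exact test for a 2x2 table using only the standard library. This is a simplification: the study's contingency tables presumably have more than two categories per feature, and the counts used here are a made-up textbook-style example, not data from the paper.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical counts: feature present/absent (rows) by group A/B (columns).
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```

An exact test of this kind is a natural choice for small cohorts such as n=18 or n=20, where the large-sample approximation behind a chi-square test may be unreliable.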
Statistical aspects of radiogenomics: can radiogenomics models be used to aid prediction of outcomes in cancer patients?
Radiogenomics is a new direction in cancer research that aims at identifying the relationship between tumor genomics and its appearance in imaging (i.e. its radiophenotype). Recent years brought multiple radiogenomic discoveries in brain, breast, lung, and other cancers. With the development of this new field, we believe that it is important to investigate in which settings radiogenomics could be useful, in order to better direct research effort. One general application of radiogenomics is to generate imaging-based models for the prediction of outcomes by modeling the relationship between imaging and genomics and the relationship between genomics and outcomes. We believe that this is an important potential application of radiogenomics, as it could advance imaging-based precision medicine. We present a preliminary simulation study evaluating whether such an approach results in improved models. We investigate different settings in terms of the strength of the radiogenomic relationship, the prognostic power of the imaging and genomic descriptors, and the availability and quality of data. Our experiments indicated that the following parameters have an impact on the usefulness of the radiogenomic approach: the predictive power of genomic features and imaging features, the strength of the radiogenomic relationship, and the number and follow-up time of the genomic data. Overall, we found that the radiogenomic approach is beneficial in some situations, but only when the radiogenomic relationship is strong and a low number of imaging cases with outcomes data is available.
Exploring a new bilateral focal density asymmetry based image marker to predict breast cancer risk
Faranak Aghaei, Seyedehnafiseh Mirniaharikandehei, Alan B. Hollingsworth, et al.
Although breast density has been widely considered an important breast cancer risk factor, it is not very effective in predicting the risk of developing breast cancer in the short term or of harboring cancer in mammograms. Building on our recent studies on short-term breast cancer risk stratification models based on bilateral mammographic density asymmetry, in this study we explored a new quantitative image marker based on bilateral focal density asymmetry to predict the risk of harboring cancers in mammograms. For this purpose, we assembled a testing dataset involving 100 positive and 100 negative cases. In each positive case, no solid masses are visible on the mammograms. We developed a computer-aided detection (CAD) scheme to automatically detect focal dense regions depicted on two bilateral mammograms of the left and right breasts. CAD selects the one focal dense region with the maximum size on each image and computes its asymmetrical ratio. We used this focal density asymmetry as a new imaging marker to divide the testing cases into two groups of higher and lower focal density asymmetry. The first group included 70 cases, of which 62.9% are positive, while the second group included 130 cases, of which 43.1% are positive. The odds ratio is 2.24. This preliminary study thus supported the feasibility of applying a new focal density asymmetry based imaging marker to predict the risk of having mammography-occult cancers. The goal is to help radiologists detect early subtle cancers more effectively and accurately using mammography and/or other adjunctive imaging modalities in the future.
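The reported odds ratio can be checked from the group sizes and percentages given in the abstract. The sketch below recovers the case counts by rounding, which is an assumption on our part (the abstract gives percentages, not counts), and the function name is illustrative.

```python
def odds_ratio(pos_a, neg_a, pos_b, neg_b):
    """Odds of being positive in group A divided by the odds in group B."""
    return (pos_a * neg_b) / (neg_a * pos_b)


# Group sizes and positive fractions as stated in the abstract.
high_total, low_total = 70, 130
high_pos = round(high_total * 0.629)  # positives in the higher-asymmetry group
low_pos = round(low_total * 0.431)    # positives in the lower-asymmetry group

ratio = odds_ratio(high_pos, high_total - high_pos, low_pos, low_total - low_pos)
print(high_pos, low_pos, round(ratio, 2))
```

The recovered counts (44 and 56 positives) sum to the 100 positive cases stated in the abstract, and the resulting ratio rounds to the reported 2.24.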