Proceedings Volume 10575

Medical Imaging 2018: Computer-Aided Diagnosis


Volume Details

Date Published: 23 April 2018
Contents: 24 Sessions, 137 Papers, 64 Presentations
Conference: SPIE Medical Imaging 2018
Volume Number: 10575

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 10575
  • Lung I and Liver
  • Radiomics
  • Brain I
  • Musculoskeletal and Skin
  • Breast I
  • Cardiac, Vessels, and Novel Applications
  • Keynote and Eye
  • Colon and Prostate
  • Head and Neck
  • Lung II
  • Quantitative
  • Brain II
  • Other Organs
  • Breast II
  • Poster Session: Abdominal and Gastrointestinal
  • Poster Session: Brain and Head
  • Poster Session: Breast
  • Poster Session: Cardiac
  • Poster Session: Eye
  • Poster Session: Lung
  • Poster Session: Musculoskeletal and Skin
  • Poster Session: Quantitative
  • Poster Session: Radiomics
Front Matter: Volume 10575
Front Matter: Volume 10575
This PDF file contains the front matter associated with SPIE Proceedings Volume 10575, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Lung I and Liver
Dense volumetric detection and segmentation of mediastinal lymph nodes in chest CT images
Hirohisa Oda, Holger R. Roth, Kanwal K. Bhatia, et al.
We propose a novel mediastinal lymph node detection and segmentation method for chest CT volumes based on fully convolutional networks (FCNs). Most lymph node detection methods are based on filters for blob-like structures, which are not specific to lymph nodes. The 3D U-Net is a recent example of state-of-the-art 3D FCNs. The 3D U-Net can be trained to learn the appearance of lymph nodes and to output lymph node likelihood maps for input CT volumes. However, it is prone to oversegmentation of each lymph node due to the strong data imbalance between lymph nodes and the remaining parts of the CT volumes. To moderate the size imbalance between the target classes, we train the 3D U-Net using not only lymph node annotations but also other anatomical structures (lungs, airways, aortic arches, and pulmonary arteries) that can be extracted robustly in an automated fashion. We applied the proposed method to 45 contrast-enhanced chest CT volumes. Experimental results showed that 95.5% of lymph nodes were detected with 16.3 false positives per CT volume. The segmentation results showed that the proposed method prevents oversegmentation, achieving an average Dice score of 52.3 ± 23.1%, compared to 49.2 ± 23.8% for the baseline method.
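A minimal sketch (not the authors' code) of the multi-class idea described above: auxiliary anatomical structures become extra output channels and a soft Dice loss is averaged over the foreground classes, so lymph nodes are less dominated by background voxels. The PyTorch framework and the convention that class 0 is background are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """logits: (B, C, D, H, W) raw network outputs; target: (B, D, H, W) integer labels."""
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    target_1hot = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).float()
    dims = (0, 2, 3, 4)
    intersection = torch.sum(probs * target_1hot, dims)
    cardinality = torch.sum(probs + target_1hot, dims)
    dice_per_class = (2.0 * intersection + eps) / (cardinality + eps)
    # Average over foreground classes only (class 0 assumed to be background).
    return 1.0 - dice_per_class[1:].mean()
```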
Early detection of lung cancer recurrence after stereotactic ablative radiation therapy: radiomics system design
Salma Dammak, David Palma, Sarah Mattonen, et al.
Stereotactic ablative radiotherapy (SABR) is the standard treatment recommendation for Stage I non-small cell lung cancer (NSCLC) patients who are inoperable or who refuse surgery. This option is well tolerated by even unfit patients and has a low recurrence risk post-treatment. However, SABR induces changes in the lung parenchyma that can appear similar to those of recurrence, and the two are not easily distinguished even by an expert physician at an early follow-up time point. We hypothesized that a radiomics signature derived from standard-of-care computed tomography (CT) imaging can detect cancer recurrence within six months of SABR treatment. This study reports on the design phase of our work, with external validation planned in future work. In this study, we performed cross-validation experiments with four feature selection approaches and seven classifiers on an 81-patient data set. We extracted 104 radiomics features from the consolidative and the peri-consolidative regions on the follow-up CT scans. The best results were achieved using the sum of estimated Mahalanobis distances (Maha) for supervised forward feature selection and a trainable automatic radial basis support vector classifier (RBSVC). This system produced an area under the receiver operating characteristic curve (AUC) of 0.84, an error rate of 16.4%, a false negative rate of 12.7%, and a false positive rate of 20.0% for leave-one-patient-out cross-validation. This suggests that once validated on an external data set, radiomics could reliably detect post-SABR recurrence and form the basis of a tool assisting physicians in making salvage treatment decisions.
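A minimal sketch of the leave-one-patient-out evaluation described above, assuming a radiomics feature matrix X (one row per patient) and binary recurrence labels y; the Mahalanobis-distance forward feature selection used in the study is omitted and would be nested inside the loop.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def loo_auc(X, y):
    scores = np.zeros(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Refit scaling and the RBF SVM on every left-in fold to avoid leakage.
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
        clf.fit(X[train_idx], y[train_idx])
        scores[test_idx] = clf.predict_proba(X[test_idx])[:, 1]
    return roc_auc_score(y, scores)
```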
Pneumothorax detection in chest radiographs using convolutional neural networks
Aviel Blumenfeld, Eli Konen, Hayit Greenspan
This study presents a computer-assisted diagnosis system for the detection of pneumothorax (PTX) in chest radiographs based on a convolutional neural network (CNN) for pixel classification. Using a pixel classification approach allows utilization of the texture information in the local environment of each pixel while training a CNN model on millions of training patches extracted from a relatively small dataset. The proposed system uses a pre-processing step of lung field segmentation to overcome the large variability in the input images coming from a variety of imaging sources and protocols. Using CNN classification, suspected pixel candidates are extracted within each lung segment. A post-processing step follows to remove non-physiological suspected regions and noisy connected components. The overall percentage of suspected PTX area was used as a robust global decision for the presence of PTX in each lung. The system was trained on a set of 117 chest x-ray images with ground truth segmentations of the PTX regions. The system was tested on a set of 86 images and reached a diagnostic performance of AUC = 0.95. Overall, these preliminary results are promising and indicate the growing ability of CAD-based systems to detect findings in medical imaging with clinical-level accuracy.
Boosting CNN performance for lung texture classification using connected filtering
Infiltrative lung diseases comprise a large group of irreversible lung disorders requiring regular follow-up with CT imaging. Quantifying the evolution of the patient status calls for automated classification tools for lung texture. This paper presents an original image pre-processing framework based on locally connected filtering applied in a multiresolution setting, which improves the learning process and boosts the performance of CNNs for lung texture classification. By removing the dense vascular network from the images used by the CNN for lung classification, locally connected filters provide better discrimination between different lung patterns and help regularize the classification output. The approach was tested in a preliminary evaluation on a 10-patient database of various lung pathologies, showing an increase of 10% in true positive rate (on average over all cases) with respect to the state-of-the-art cascade of CNNs for this task.
Automatic liver volume segmentation and fibrosis classification
Evgeny Bal, Eyal Klang, Michal Amitai, et al.
In this work, we present an automatic method for liver segmentation and fibrosis classification in liver computed-tomography (CT) portal phase scans. The input is a full abdomen CT scan with an unknown number of slices, and the output is a liver volume segmentation mask and a fibrosis grade. A multi-stage analysis scheme is applied to each scan, including volume segmentation, texture feature extraction, and SVM-based classification. The data contain portal phase CT examinations from 80 patients, acquired with different scanners. Each examination has a matching FibroScan grade. The dataset was subdivided into two groups: the first group contains healthy cases and mild fibrosis, and the second group contains moderate fibrosis, severe fibrosis, and cirrhosis. Using our automated algorithm, we achieved an average Dice index of 0.93 ± 0.05 for segmentation, and a sensitivity of 0.92 and specificity of 0.81 for classification. To the best of our knowledge, this is the first end-to-end automatic framework for liver fibrosis classification; an approach that, once validated, can have great potential value in the clinic.
Radiomics
Association of high proliferation marker Ki-67 expression with DCE-MR imaging features of breast: a large-scale evaluation
Ashirbani Saha, Michael R. Harowicz, Lars J. Grimm, et al.
One of the methods widely used to measure the proliferative activity of cells in breast cancer patients is the immunohistochemical (IHC) measurement of the percentage of cells stained for nuclear antigen Ki-67. Use of Ki-67 expression as a prognostic marker is still under investigation. However, numerous clinical studies have reported an association between a high Ki-67 and overall survival (OS) and disease-free survival (DFS). To offer a non-invasive alternative for determining Ki-67 expression, researchers have recently attempted to study the association of Ki-67 expression with magnetic resonance (MR) imaging features of breast cancer in small cohorts (<30). Here, we present a large-scale evaluation of the relationship between imaging features and Ki-67 score: (a) we used a set of 450 invasive breast cancer patients; (b) we extracted a set of 529 imaging features of shape and enhancement from the breast, tumor, and fibroglandular tissue of the patients; (c) we used a subset of patients as the training set to select features and trained a multivariate logistic regression model to predict high versus low Ki-67 values; and (d) we validated the performance of the trained model on an independent test set using the area under the receiver operating characteristic (ROC) curve (AUC) of the predicted values. Our model was able to predict high versus low Ki-67 in the test set with an AUC of 0.67 (95% CI: 0.58-0.75, p<1.1e-04). Thus, a moderate strength of association between Ki-67 values and MR-extracted imaging features was demonstrated in our experiments.
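A minimal sketch of steps (c)-(d), assuming the 529 extracted features are available as a matrix X with binary high/low Ki-67 labels y; the L1 penalty here is only a stand-in for the study's separate feature-selection step, which is not specified in the abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def train_and_evaluate(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    model = make_pipeline(StandardScaler(),
                          LogisticRegression(penalty="l1", solver="liblinear"))
    model.fit(X_tr, y_tr)
    # AUC of predicted probabilities on the held-out test set.
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```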
Detecting mammographically occult cancer in women with dense breasts using Radon Cumulative Distribution Transform: a preliminary analysis
We propose using novel imaging biomarkers for detecting mammographically-occult (MO) cancer in women with dense breast tissue. MO cancer indicates visually occluded, or very subtle, cancer that radiologists fail to recognize as a sign of cancer. We used the Radon Cumulative Distribution Transform (RCDT) as a novel image transformation to project the difference between left and right mammograms into a space that increases the detectability of occult cancer. We used a dataset of 617 screening full-field digital mammograms (FFDMs) of 238 women with dense breast tissue. Among the 238 women, 173 were normal with 2 - 4 consecutive screening mammograms, 552 normal mammograms in total, and the remaining 65 women had an MO cancer with a negative screening mammogram. We used Principal Component Analysis (PCA) to find representative patterns in normal mammograms in the RCDT space. We projected all mammograms to the space constructed by the first 30 eigenvectors of the RCDT of normal cases. Under 10-fold cross-validation, we conducted quantitative feature analysis to classify normal mammograms and mammograms with MO cancer. We used receiver operating characteristic (ROC) analysis to evaluate the classifier's output using the area under the ROC curve (AUC) as the figure of merit. Four eigenvectors were selected via a feature selection method. The mean and standard deviation of the AUC of the trained classifier on the test set were 0.74 and 0.08, respectively. In conclusion, we utilized imaging biomarkers that highlight differences between left and right mammograms to detect MO cancer using a novel image transformation.
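A minimal sketch of the eigen-analysis step, assuming each case has already been mapped to a flattened RCDT representation of the left-right difference (one row per exam); the reconstruction residual returned here is an illustrative candidate feature, not necessarily one used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_to_normal_subspace(X_normal, X_all, n_components=30):
    pca = PCA(n_components=n_components)
    pca.fit(X_normal)                 # eigenvectors learned from normal cases only
    coeffs = pca.transform(X_all)     # coordinates in the normal-case subspace
    residual = np.linalg.norm(X_all - pca.inverse_transform(coeffs), axis=1)
    return coeffs, residual           # residual: distance from the "normal" subspace
```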
Deriving stable multi-parametric MRI radiomic signatures in the presence of inter-scanner variations: survival prediction of glioblastoma via imaging pattern analysis and machine learning techniques
Saima Rathore, Spyridon Bakas, Hamed Akbari, et al.
There is mounting evidence that assessment of multi-parametric magnetic resonance imaging (mpMRI) profiles can noninvasively predict survival in many cancers, including glioblastoma. The clinical adoption of mpMRI as a prognostic biomarker, however, depends on its applicability in a multicenter setting, which is hampered by inter-scanner variations. This concept has not been addressed in existing studies. We developed a comprehensive set of within-patient normalized tumor features such as intensity profile, shape, volume, and tumor location, extracted from multicenter mpMRI of two large (npatients=353) cohorts, comprising the Hospital of the University of Pennsylvania (HUP, npatients=252, nscanners=3) and The Cancer Imaging Archive (TCIA, npatients=101, nscanners=8). Inter-scanner harmonization was conducted by normalizing the tumor intensity profile with that of the contralateral healthy tissue. The extracted features were integrated by support vector machines to derive survival predictors. The predictors' generalizability was evaluated within each cohort by two cross-validation configurations: i) pooled/scanner-agnostic, and ii) across scanners (training in multiple scanners and testing in one). The median survival in each configuration was used as a cut-off to divide patients into long- and short-survivors. Accuracy (ACC) for predicting long- versus short-survivors was ACCpooled=79.06% and ACCacross=73.55% in the HUP dataset, and ACCpooled=84.7% and ACCacross=74.76% in the TCIA dataset. The hazard ratio (95% confidence interval) was 3.87 (2.87-5.20, P<0.001) and 6.65 (3.57-12.36, P<0.001) for the HUP and TCIA datasets, respectively. Our findings suggest that adequate data normalization coupled with machine learning classification allows robust prediction of survival estimates on mpMRI acquired by multiple scanners.
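A minimal sketch of the inter-scanner harmonization step; z-scoring against the contralateral healthy tissue is one plausible reading of the normalization described above, and the mask names are assumptions.

```python
import numpy as np

def normalize_to_contralateral(img, tumor_mask, healthy_mask):
    """img: one mpMRI channel; masks: boolean arrays of the same shape."""
    mu = img[healthy_mask].mean()
    sigma = img[healthy_mask].std() + 1e-8
    normalized = (img - mu) / sigma      # z-score w.r.t. contralateral healthy tissue
    return normalized[tumor_mask]        # scanner-robust tumor intensity profile
```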
Robustness of radiomic breast features of benign lesions and luminal A cancers across MR magnet strengths
Heather M. Whitney, Karen Drukker, Alexandra Edwards, et al.
Radiomics features extracted from breast lesion images have shown potential in diagnosis and prognosis of breast cancer. As clinical institutions transition from 1.5 T to 3.0 T magnetic resonance imaging (MRI), it is helpful to identify robust features across these field strengths. In this study, dynamic contrast-enhanced MR images were acquired retrospectively under IRB/HIPAA compliance, yielding 738 cases: 241 and 124 benign lesions imaged at 1.5 T and 3.0 T and 231 and 142 luminal A cancers imaged at 1.5 T and 3.0 T, respectively. Lesions were segmented using a fuzzy C-means method. Extracted radiomic values for each group of lesions by cancer status and field strength of acquisition were compared using a Kolmogorov-Smirnov test for the null hypothesis that the two groups being compared came from the same distribution, with p-values corrected for multiple comparisons by the Holm-Bonferroni method. Two shape features, one texture feature, and three enhancement variance kinetics features were found to be potentially robust. All potentially robust features had areas under the receiver operating characteristic curve (AUC) statistically greater than 0.5 in the task of distinguishing between lesion types (range of means 0.57-0.78). The significant difference in voxel size between field strengths of acquisition limits the ability to affirm more features as robust or not robust according to field strength alone, and inhomogeneities in static field strength and radiofrequency field could also have affected the assessment of kinetic curve features as robust or not. Vendor-specific image scaling could have also been a factor. These findings will contribute to the development of radiomic signatures that use features identified as robust across field strength.
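A minimal sketch of the robustness test described above, assuming per-lesion feature values grouped by field strength; it uses scipy's two-sample Kolmogorov-Smirnov test with Holm correction from statsmodels.

```python
from scipy.stats import ks_2samp
from statsmodels.stats.multitest import multipletests

def robustness_pvalues(features_15T, features_30T):
    """Both arguments: dicts mapping feature name -> 1D array of lesion values."""
    names = list(features_15T)
    pvals = [ks_2samp(features_15T[n], features_30T[n]).pvalue for n in names]
    reject, p_corrected, _, _ = multipletests(pvals, method="holm")
    # Features whose null hypothesis is NOT rejected are candidate robust features.
    return {n: (p, not r) for n, p, r in zip(names, p_corrected, reject)}
```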
Empirical evaluation of cross-site reproducibility in radiomic features for characterizing prostate MRI
The recent advent of radiomics has enabled the development of prognostic and predictive tools which use routine imaging, but a key question that still remains is how reproducible these features may be across multiple sites and scanners. This is especially relevant in the context of MRI data, where signal intensity values lack tissue-specific, quantitative meaning and depend on acquisition parameters (magnetic field strength, image resolution, type of receiver coil). In this paper we present the first empirical study of the reproducibility of 5 different radiomic feature families in a multi-site setting; specifically, for characterizing prostate MRI appearance. Our cohort comprised 147 patient T2w MRI datasets from 4 different sites, all of which were first pre-processed to correct for acquisition-related artifacts such as bias field, differing voxel resolutions, and intensity drift (non-standardness). 406 3D voxel-wise radiomic features were extracted and evaluated in a cross-site setting to determine how reproducible they were within a relatively homogeneous non-tumor tissue region, using 2 different measures of reproducibility: Multivariate Coefficient of Variation and Instability Score. Our results demonstrated that Haralick features were the most reproducible between all 4 sites. By comparison, Laws features were among the least reproducible between sites, as well as performing highly variably across their entire parameter space. Similarly, the Gabor feature family demonstrated good cross-site reproducibility, but for certain parameter combinations alone. These trends indicate that despite extensive pre-processing, only a subset of radiomic features and associated parameters may be reproducible enough for use within radiomics-based machine learning classifier schemes.
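A simplified, univariate stand-in for the cross-site reproducibility measures named above (the paper's multivariate coefficient of variation and instability score are not reproduced here): the coefficient of variation of a feature's per-site means within the non-tumor reference region.

```python
import numpy as np

def cross_site_cv(feature_values_by_site):
    """feature_values_by_site: list of 1D arrays, one per site (non-tumor ROI values)."""
    site_means = np.array([np.mean(v) for v in feature_values_by_site])
    # Low values indicate the feature's typical level is stable across sites.
    return np.std(site_means) / (np.abs(np.mean(site_means)) + 1e-12)
```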
A deep learning classifier for prediction of pathological complete response to neoadjuvant chemotherapy from baseline breast DCE-MRI
Neoadjuvant chemotherapy (NAC) is routinely used to treat breast tumors before surgery to reduce tumor size and improve outcome. However, no current clinical or imaging metrics can effectively predict before treatment which NAC recipients will achieve pathological complete response (pCR), the absence of residual invasive disease in the breast or lymph nodes following surgical resection. In this work, we developed and applied a convolutional neural network (CNN) to predict pCR from pre-treatment dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) scans on a per-voxel basis. In this study, DCE-MRI data for a total of 166 breast cancer patients from the ISPY1 Clinical Trial were split into a training set of 133 patients and a testing set of 33 patients. A CNN consisting of 6 convolutional blocks was trained over 30 epochs. The pre-contrast and post-contrast DCE-MRI phases were considered in isolation and conjunction. A CNN utilizing a combination of both pre- and post-contrast images best distinguished responders, with an AUC of 0.77; 82% of the patients in the testing set were correctly classified based on their treatment response. Within the testing set, the CNN was able to produce probability heatmaps that visualized tumor regions that most strongly predicted therapeutic response. Multivariate analysis with prognostic clinical variables (age, largest diameter, hormone receptor and HER2 status), revealed that the network was an independent predictor of response (p=0.05), and that the inclusion of HER2 status could further improve capability to predict response (AUC = 0.85, accuracy = 85%).
Brain I
Deep learning and texture-based semantic label fusion for brain tumor segmentation
Brain tumor segmentation is a fundamental step in surgical treatment and therapy. Many hand-crafted and learning-based methods have been proposed for automatic brain tumor segmentation from MRI. Studies have shown that these approaches have their inherent advantages and limitations. This work proposes a semantic label fusion algorithm that combines two representative state-of-the-art segmentation approaches, a texture-based hand-crafted method and a deep learning-based method, to obtain robust tumor segmentation. We evaluate the proposed method using the publicly available BRATS 2017 brain tumor segmentation challenge dataset. The results show that the proposed method offers improved segmentation by alleviating the inherent weaknesses of each: extensive false positives in the texture-based method and the false tumor tissue classification problem in the deep learning method. Furthermore, we investigate the effect of the patient's gender on segmentation performance using a subset of the validation dataset. Note that the substantial improvement in brain tumor segmentation performance achieved in this work recently enabled our group to secure first place in the overall patient survival prediction task at the BRATS 2017 challenge.
Quantifying the association between white matter integrity changes and subconcussive head impact exposure from a single season of youth and high school football using 3D convolutional neural networks
Behrouz Saghafi, Gowtham Murugesan, Elizabeth Davenport, et al.
The effect of subconcussive head impact exposure during contact sports, including American football, on brain health is poorly understood, particularly in young and adolescent players, who may be more vulnerable to brain injury during periods of rapid brain maturation. This study aims to quantify the association between cumulative effects of head impact exposure from a single season of football on white matter (WM) integrity as measured with diffusion MRI. The study targets football players aged 9-18 years. All players were imaged pre- and post-season with structural MRI and diffusion tensor MRI (DTI). Fractional Anisotropy (FA) maps, shown to be closely correlated with WM integrity, were computed for each subject, co-registered, and subtracted to compute the change in FA per subject. Biomechanical metrics were collected at every practice and game using helmet-mounted accelerometers. Each head impact was converted into a risk of concussion, and the concussion-risk-weighted cumulative exposure (RWE) was computed for each player for the season. Athletes with high and low RWE were selected for a two-category classification task. This task was addressed by developing a 3D Convolutional Neural Network (CNN) to automatically classify players into high and low impact exposure groups from the change in FA maps. Using the proposed model, high classification performance was achieved, including an area under the ROC curve of 85.71% and an F1 score of 83.33%. This work adds to the growing body of evidence for the presence of detectable neuroimaging brain changes in white matter integrity from a single season of contact sports play, even in the absence of a clinically diagnosed concussion.
Single season changes in resting state network power and the connectivity between regions distinguish head impact exposure level in high school and youth football players
Gowtham Murugesan, Behrouz Saghafi, Elizabeth Davenport, et al.
The effect of repetitive sub-concussive head impact exposure in contact sports like American football on brain health is poorly understood, especially in the understudied populations of youth and high school players. These players, aged 9-18 years, may be particularly susceptible to impact exposure as their brains are undergoing rapid maturation. This study helps fill the void by quantifying the association between head impact exposure and functional connectivity, an important aspect of brain health measurable via resting-state fMRI (rs-fMRI). The contributions of this paper are threefold. First, the data from two separate studies (youth and high school) are combined to form a high-powered analysis with 60 players. These players experience head accelerations within overlapping impact-exposure ranges, making their combination particularly appropriate. Second, multiple features are extracted from rs-fMRI and tested for their association with impact exposure. One type of feature is the power spectral density decomposition of intrinsic, spatially distributed networks extracted via independent components analysis (ICA). Another feature type is the functional connectivity between brain regions often associated with mild traumatic brain injury (mTBI). Third, multiple supervised machine learning algorithms are evaluated for their stability and predictive accuracy in a low-bias, nested cross-validation modeling framework. Each classifier predicts whether a player sustained low or high levels of head impact exposure. The nested cross-validation reveals similarly high classification performance across the feature types, and the Support Vector, Extremely randomized trees, and Gradboost classifiers achieve F1-scores of up to 75%.
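A minimal sketch of a nested cross-validation loop, assuming rs-fMRI-derived feature vectors X and binary high/low exposure labels y; the SVM and hyperparameter grid are illustrative, not the full set of classifiers compared in the paper.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def nested_cv_f1(X, y):
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # tunes hyperparameters
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # estimates generalization
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
                        cv=inner, scoring="f1")
    return cross_val_score(grid, X, y, cv=outer, scoring="f1").mean()
```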
Gaussian processes with optimal kernel construction for neuro-degenerative clinical onset prediction
Liane S. Canas, Benjamin Yvernault, David M. Cash, et al.
Gaussian Processes (GP) are a powerful tool to capture the complex time-variations of a dataset. In the context of medical imaging analysis, they allow robust modelling even in the case of highly uncertain or incomplete datasets. Predictions from GP are dependent on the covariance kernel function selected to explain the data variance. To overcome this limitation, we propose a framework to identify the optimal covariance kernel function to model the data. The optimal kernel is defined as a composition of base kernel functions used to identify correlation patterns between data points. Our approach includes a modified version of the Compositional Kernel Learning (CKL) algorithm, in which we score the kernel families using a new energy function that depends on both the Bayesian Information Criterion (BIC) and the explained variance score. We applied the proposed framework to model the progression of neurodegenerative diseases over time, in particular the progression of autosomal dominantly-inherited Alzheimer's disease, and use it to predict the time to clinical onset for subjects carrying the genetic mutation.
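A minimal sketch of scoring candidate kernel compositions for a GP regressor, assuming scalar time points t and a scalar marker y; the energy here uses only the BIC term, whereas the modified CKL described above also folds in the explained-variance score.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic, WhiteKernel

def bic_score(kernel, t, y):
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)
    lml = gp.log_marginal_likelihood(gp.kernel_.theta)  # fitted log marginal likelihood
    k = len(gp.kernel_.theta)                           # number of kernel hyperparameters
    return -2.0 * lml + k * np.log(len(y))              # lower is better

def select_kernel(t, y):
    # Candidate compositions built from a small set of base kernels.
    candidates = [RBF() + WhiteKernel(),
                  RationalQuadratic() + WhiteKernel(),
                  RBF() * RationalQuadratic() + WhiteKernel()]
    return min(candidates, key=lambda k: bic_score(k, t, y))
```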
MRI textures as outcome predictor for Gamma Knife radiosurgery on vestibular schwannoma
P. P. J. H. Langenhuizen, M. J. W. Legters, S. Zinger, et al.
Vestibular schwannomas (VS) are benign brain tumors that can be treated with high-precision focused radiation with the Gamma Knife in order to stop tumor growth. Outcome prediction of Gamma Knife radiosurgery (GKRS) treatment can help in determining whether GKRS will be effective on an individual patient basis. However, at present, prognostic factors of tumor control after GKRS for VS are largely unknown, and only clinical factors, such as size of the tumor at treatment and pre-treatment growth rate of the tumor, have been considered thus far. This research aims at outcome prediction of GKRS by means of quantitative texture feature analysis on conventional MRI scans. We compute first-order statistics and features based on gray-level co-occurrence (GLCM) and run-length matrices (RLM), and employ support vector machines and decision trees for classification. In a clinical dataset, consisting of 20 tumors showing treatment failure and 20 tumors exhibiting treatment success, we have discovered that the second-order statistical metrics distilled from GLCM and RLM are suitable for describing texture, but are slightly outperformed by simple first-order statistics, like mean, standard deviation and median. The obtained prediction accuracy is about 85%, but a final choice of the best feature can only be made after performing more extensive analyses on larger datasets. In any case, this work provides suitable texture measures for successful prediction of GKRS treatment outcome for VS.
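A minimal sketch of the first-order and GLCM texture measures, assuming a 2D tumor ROI quantized to 8 bits and skimage >= 0.19 function names; the run-length matrix features would require a separate implementation.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(roi_uint8):
    """roi_uint8: 2D uint8 array covering the tumor region."""
    feats = {"mean": roi_uint8.mean(),
             "std": roi_uint8.std(),
             "median": np.median(roi_uint8)}          # simple first-order statistics
    glcm = graycomatrix(roi_uint8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    for prop in ("contrast", "homogeneity", "energy", "correlation"):
        feats[f"glcm_{prop}"] = graycoprops(glcm, prop).mean()  # averaged over angles
    return feats
```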
ADMultiImg: a novel missing modality transfer learning based CAD system for diagnosis of MCI due to AD using incomplete multi-modality imaging data
Xiaonan Liu, Kewei Chen, Teresa Wu, et al.
Alzheimer's Disease (AD) is the most common cause of dementia and currently has no cure. Treatments targeting early stages of AD such as Mild Cognitive Impairment (MCI) may be most effective at decelerating AD, thus attracting increasing attention. However, MCI has substantial heterogeneity in that it can be caused by various underlying conditions, not only AD. To detect MCI due to AD, NIA-AA published updated consensus criteria in 2011, in which the use of multi-modality images was highlighted as one of the most promising methods. It is of great interest to develop a CAD system based on automatic, quantitative analysis of multi-modality images and machine learning algorithms to help physicians more adequately diagnose MCI due to AD. The challenge, however, is that multi-modality images are not universally available for many patients due to cost, access, safety, and lack of consent. We developed a novel Missing Modality Transfer Learning (MMTL) algorithm capable of utilizing whatever imaging modalities are available for an MCI patient to diagnose the patient's likelihood of MCI due to AD. Furthermore, we integrated MMTL with radiomics steps including image processing, feature extraction, and feature screening, and a post-processing step for uncertainty quantification (UQ), and developed a CAD system called "ADMultiImg" to assist clinical diagnosis of MCI due to AD using multi-modality images together with patient demographic and genetic information. Tested on ADNI data, our system can generate a diagnosis with high accuracy even for patients with only partially available image modalities (AUC=0.94), and therefore may have broad clinical utility.
Longitudinal connectome-based predictive modeling for REM sleep behavior disorder from structural brain connectivity
Luca Giancardo, Timothy M. Ellmore, Jessika Suescun, et al.
Methods to identify neuroplasticity patterns in human brains are of the utmost importance in understanding and potentially treating neurodegenerative diseases. Parkinson disease (PD) research will greatly benefit and advance from the discovery of biomarkers to quantify brain changes in the early stages of the disease, a prodromal period when subjects show no obvious clinical symptoms. Diffusion tensor imaging (DTI) allows for an in-vivo estimation of the structural connectome inside the brain and may serve to quantify the degenerative process before the appearance of clinical symptoms. In this work, we introduce a novel strategy to compute longitudinal structural connectomes in the context of a whole-brain data-driven pipeline. In these initial tests, we show that our predictive models are able to distinguish controls from asymptomatic subjects at high risk of developing PD (REM sleep behavior disorder, RBD) with an area under the receiver operating characteristic curve of 0.90 (p<0.001), using a longitudinal dataset of 46 subjects from the Parkinson's Progression Markers Initiative. By analyzing the brain connections most relevant for the predictive ability of the best performing model, we find connections that are biologically relevant to the disease.
Musculoskeletal and Skin
Automated synovium segmentation in Doppler ultrasound images for rheumatoid arthritis assessment
Pak-Hei Yeung, York-Kiat Tan, Shuoyu Xu
We need better clinical tools to improve monitoring of synovitis, synovial inflammation in the joints, in rheumatoid arthritis (RA) assessment. Given its economical, safe and fast characteristics, ultrasound (US) especially Doppler ultrasound is frequently used. However, manual scoring of synovitis in US images is subjective and prone to observer variations. In this study, we propose a new and robust method for automated synovium segmentation in the commonly affected joints, i.e. metacarpophalangeal (MCP) and metatarsophalangeal (MTP) joints, which would facilitate automation in quantitative RA assessment. The bone contour in the US image is firstly detected based on a modified dynamic programming method, incorporating angular information for detecting curved bone surface and using image fuzzification to identify missing bone structure. K-means clustering is then performed to initialize potential synovium areas by utilizing the identified bone contour as boundary reference. After excluding invalid candidate regions, the final segmented synovium is identified by reconnecting remaining candidate regions using level set evolution. 15 MCP and 15 MTP US images were analyzed in this study. For each image, segmentations by our proposed method as well as two sets of annotations performed by an experienced clinician at different time-points were acquired. Dice’s coefficient is 0.77±0.12 between the two sets of annotations. Similar Dice’s coefficients are achieved between automated segmentation and either the first set of annotations (0.76±0.12) or the second set of annotations (0.75±0.11), with no significant difference (P = 0.77). These results verify that the accuracy of segmentation by our proposed method and by clinician is comparable. Therefore, reliable synovium identification can be made by our proposed method.
Automatic pedicles detection using convolutional neural network in a 3D spine reconstruction from biplanar radiographs
The 3D analysis of spine deformities (scoliosis) has high potential for clinical diagnosis and treatment. In a biplanar radiographs context, a 3D analysis requires a 3D reconstruction from a pair of 2D X-rays. Whether fully automatic, semi-automatic, or manual, this task is complex because of noise, structure superimposition, and partial information due to the limited number of projections. Because they are involved in the axial vertebral rotation (AVR), a fundamental clinical parameter for scoliosis diagnosis, pedicles are important landmarks for 3D spine modeling and pre-operative planning. In this paper, we focus on the extension of a fully-automatic 3D spine reconstruction method where the Vertebral Body Centers (VBCs) are automatically detected using a Convolutional Neural Network (CNN) and then regularized using a Statistical Shape Model (SSM) framework. In this global process, pedicles are inferred statistically during the SSM regularization. Our contribution is to add a CNN-based regression model for pedicle detection, allowing better pedicle localization and improving the estimation of clinical parameters (e.g., AVR, Cobb angle). Having 476 datasets including healthy patients and Adolescent Idiopathic Scoliosis (AIS) cases with different scoliosis grades (Cobb angles up to 116°), we used 380 for training, 48 for testing and 48 for validation. Adding the local CNN-based pedicle detection decreases the mean absolute error of the AVR by 10%. The 3D mean Euclidean distance error between detected pedicles and ground truth decreases by 17% and the maximum error by 19%. Moreover, a general improvement is observed in the 3D spine reconstruction and is reflected in lower errors on the Cobb angle estimation.
Fully automated bone mineral density assessment from low-dose chest CT
Shuang Liu, Jessica Gonzalez, Javier Zulueta, et al.
A fully automated system is presented for bone mineral density (BMD) assessment from low-dose chest CT (LDCT). BMD assessment is central in the diagnosis and follow-up therapy monitoring of osteoporosis, which is characterized by low bone density and is estimated to affect 12.3 million people in the US aged 50 years or older, creating tremendous social and economic burdens. BMD assessment from DXA scans (BMDDXA) is currently the most widely used and gold standard technique for the diagnosis of osteoporosis and bone fracture risk estimation. With the recent large-scale implementation of annual lung cancer screening using LDCT, great potential emerges for concurrent opportunistic osteoporosis screening. In the presented BMDCT assessment system, each vertebral body is first segmented and labeled with its anatomical name. Various 3D regions of interest (ROIs) inside the vertebral body are then explored for BMDCT measurements at different vertebral levels. The system was validated using 76 pairs of DXA and LDCT scans of the same subject. Average BMDDXA of L1-L4 was used as the reference standard. Statistically significant (p-value < 0.001) strong correlation is obtained between BMDDXA and BMDCT at all vertebral levels (T1 - L2). A Pearson correlation of 0.857 was achieved between BMDDXA and average BMDCT of T9-T11 by using a 3D ROI taking into account both trabecular and cortical bone tissue. These encouraging results demonstrate the feasibility of fully automated quantitative BMD assessment and the potential of opportunistic osteoporosis screening with concurrent lung cancer screening using LDCT.
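A minimal sketch of the validation step, assuming paired per-subject arrays of the DXA reference (average BMD of L1-L4) and a CT-derived BMD measurement (e.g., the mean over a T9-T11 ROI).

```python
import numpy as np
from scipy.stats import pearsonr

def correlate_bmd(bmd_dxa, bmd_ct):
    """Both arguments: 1D arrays with one value per subject, in matching order."""
    r, p = pearsonr(np.asarray(bmd_dxa), np.asarray(bmd_ct))
    return r, p   # the paper reports r = 0.857 for the T9-T11 3D ROI
```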
Segmentation of skin lesions in chronic graft versus host disease photographs with fully convolutional networks
Jianing Wang, Fuyao Chen, Laura E. Dellalana, et al.
Chronic graft-versus-host disease (cGVHD) is a frequent and potentially life-threatening complication of allogeneic hematopoietic stem cell transplantation (HCT) and commonly affects the skin, resulting in distressing patient morbidity. The percentage of involved body surface area (BSA) is commonly used for diagnosing and scoring the severity of cGVHD. However, the segmentation of the involved BSA from patient whole body serial photography is challenging because (1) it is difficult to design traditional segmentation methods that rely on hand-crafted features, as the appearance of cGVHD lesions can be drastically different from patient to patient; and (2) to the best of our knowledge, there is currently no publicly available labelled image set of cGVHD skin for training deep networks to segment the involved BSA. In this preliminary study we create a small labelled image set of skin cGVHD, and we explore the possibility of using a fully convolutional neural network (FCN) to segment the skin lesions in the images. We use a commercial stereoscopic Vectra H1 camera (Canfield Scientific) to acquire ~400 3D photographs of 17 cGVHD patients aged between 22 and 72. A rotational data augmentation process is then applied, which rotates the 3D photos through 10 predefined angles, producing one 2D projection image at each position. This results in ~4000 2D images that constitute our cGVHD image set. An FCN model is trained and tested using our images. We show that our method achieves encouraging results for segmenting cGVHD skin lesions in photographic images.
Computer-aided detection of basal cell carcinoma through blood content analysis in dermoscopy images
Pegah Kharazmi, Sunil Kalia, Harvey Lui, et al.
Basal cell carcinoma (BCC) is the most common type of skin cancer, which is highly damaging to the skin in its advanced stages and imposes a large cost on the healthcare system. However, most types of BCC are easily curable if detected at an early stage. Due to limited access to dermatologists and expert physicians, non-invasive computer-aided diagnosis is a viable option for skin cancer screening. A clinical biomarker of cancerous tumors is increased vascularization and excess blood flow. In this paper, we present a computer-aided technique to differentiate cancerous skin tumors from benign lesions based on vascular characteristics of the lesions. The dermoscopy image of the lesion is first decomposed using independent component analysis of the RGB channels to derive melanin and hemoglobin maps. A novel set of clinically inspired features and ratiometric measurements are then extracted from each map to characterize the vascular properties and blood content of the lesion. The feature set is then fed into a random forest classifier. Over a dataset of 664 skin lesions, the proposed method achieved an area under the ROC curve of 0.832 in 10-fold cross-validation for differentiating basal cell carcinomas from benign lesions.
Breast I
Improving performance of breast cancer risk prediction using a new CAD-based region segmentation scheme
Morteza Heidari, Abolfazl Zargari Khuzani, Gopichandh Danala, et al.
The objective of this study is to develop and test a new computer-aided detection (CAD) scheme with improved region of interest (ROI) segmentation combined with an image feature extraction framework to improve performance in predicting short-term breast cancer risk. A dataset involving 570 sets of "prior" negative mammography screening cases was retrospectively assembled. In the next sequential "current" screening, 285 cases were positive and 285 cases remained negative. A CAD scheme was applied to all 570 "prior" negative images to stratify cases into high- and low-risk groups for having cancer detected in the "current" screening. First, a new ROI segmentation algorithm was used to automatically remove non-informative areas of the mammograms. Second, from the matched bilateral craniocaudal view images, a set of 43 image features related to frequency characteristics of the ROIs was initially computed from the discrete cosine transform and the spatial domain of the images. Third, a support vector machine based machine learning classifier was applied to the selected optimal image features to build a CAD-based risk prediction model. The classifier was trained using a leave-one-case-out cross-validation method. Applying this improved CAD scheme to the testing dataset yielded an area under the ROC curve of AUC = 0.70±0.04, significantly higher than extracting features directly from the dataset without the improved ROI segmentation step (AUC = 0.63±0.04). This study demonstrated that the proposed approach could improve accuracy in predicting short-term breast cancer risk, which may play an important role in eventually helping to establish an optimal personalized breast cancer paradigm.
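A minimal sketch of the frequency-domain feature computation, assuming a segmented 2D ROI array; the banded DCT energies below are an illustrative reduction, not the exact 43 features used in the study.

```python
import numpy as np
from scipy.fft import dctn

def dct_band_energies(roi, n_bands=6):
    """Sum DCT coefficient magnitudes over diagonal frequency bands of the ROI."""
    coeffs = np.abs(dctn(roi.astype(float), norm="ortho"))
    rows, cols = np.indices(coeffs.shape)
    diag = rows + cols                                   # low to high spatial frequency
    band = np.clip(diag * n_bands // (diag.max() + 1), 0, n_bands - 1)
    return np.array([coeffs[band == b].sum() for b in range(n_bands)])
```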
Cross-domain and multi-task transfer learning of deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis
We propose a cross-domain, multi-task transfer learning framework to transfer knowledge learned from non-medical images by a deep convolutional neural network (DCNN) to a medical image recognition task, while improving generalization through multi-task learning of auxiliary tasks. A first-stage cross-domain transfer learning was initiated from an ImageNet-trained DCNN to a mammography-trained DCNN. 19,632 regions of interest (ROIs) from 2,454 mass lesions were collected from two imaging modalities: digitized screen-film mammography (SFM) and full-field digital mammography (DM), and split into training and test sets. In the multi-task transfer learning, the DCNN learned the mass classification task simultaneously from the training sets of SFM and DM. The best transfer network for mammography was selected from three transfer networks with different numbers of convolutional layers frozen. The performance of single-task and multi-task transfer learning on an independent SFM test set in terms of the area under the receiver operating characteristic curve (AUC) was 0.78±0.02 and 0.82±0.02, respectively. In the second-stage cross-domain transfer learning, a set of 12,680 ROIs from 317 mass lesions on DBT were split into validation and independent test sets. We first studied the data requirements for the first-stage mammography-trained DCNN by varying the mammography training data from 1% to 100% and evaluated its learning on the DBT validation set in inference mode. We found that the entire available mammography set provided the best generalization. The DBT validation set was then used to train only the last four fully connected layers, resulting in an AUC of 0.90±0.04 on the independent DBT test set.
Improving classification with forced labeling of other related classes: application to prediction of upstaged ductal carcinoma in situ using mammographic features
Rui Hou, Bibo Shi, Lars J. Grimm, et al.
Predicting whether ductal carcinoma in situ (DCIS) identified at core biopsy contains occult invasive disease is an important task, since these "upstaged" cases will affect further treatment planning. Therefore, a prediction model that better classifies pure DCIS and upstaged DCIS can help avoid overtreatment and overdiagnosis. In this work, we propose to improve this classification performance with the aid of two other related classes: Atypical Ductal Hyperplasia (ADH) and Invasive Ductal Carcinoma (IDC). Our data set contains mammograms for 230 cases. Specifically, 66 of them are ADH cases; 99 of them are biopsy-proven DCIS cases, of which 25 were found to contain invasive disease at the time of definitive surgery. The remaining 65 cases were diagnosed with IDC at core biopsy. Our hypothesis is that knowledge can be transferred from training with the easier and more readily available cases of benign but suspicious ADH versus IDC that is already apparent at initial biopsy. Thus, embedding both ADH and IDC cases in the classifier will improve the performance of distinguishing upstaged DCIS from pure DCIS. We extracted 113 mammographic features based on a radiologist's annotation of clusters. Our method then added both ADH and IDC cases during training, where ADH were "force labeled", or treated by the classifier, as pure DCIS (negative) cases, and IDC were labeled as upstaged DCIS (positive) cases. A logistic regression classifier was built based on the designed training dataset to predict whether biopsy-proven DCIS cases contain invasive cancer. The performance was assessed by repeated 5-fold cross-validation and receiver operating characteristic (ROC) curve analysis. Prediction performance when training only on the DCIS dataset had an average AUC of 0.607 (95% CI: 0.479-0.721). By adding both ADH and IDC cases for training, we improved the performance to 0.691 (95% CI: 0.581-0.801).
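A minimal sketch of the forced-labeling idea, assuming mammographic feature matrices for the biopsy-proven DCIS cases (with upstaging labels), the ADH cases, and the IDC cases; evaluation would still be carried out on the DCIS cases alone.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_with_forced_labels(X_dcis, y_dcis, X_adh, X_idc):
    X_train = np.vstack([X_dcis, X_adh, X_idc])
    y_train = np.concatenate([y_dcis,
                              np.zeros(len(X_adh)),   # ADH forced to the "pure DCIS" class
                              np.ones(len(X_idc))])   # IDC forced to the "upstaged" class
    return LogisticRegression(max_iter=1000).fit(X_train, y_train)
```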
Deep learning in breast cancer risk assessment: evaluation of fine-tuned convolutional neural networks on a clinical dataset of FFDMs
We evaluated the potential of deep learning in the assessment of breast cancer risk using convolutional neural networks (CNNs) fine-tuned on full-field digital mammographic (FFDM) images. This study included 456 clinical FFDM cases from two high-risk datasets: BRCA1/2 gene-mutation carriers (53 cases) and unilateral cancer patients (75 cases), and a low-risk dataset as the control group (328 cases). All FFDM images (12-bit quantization and 100-micron pixel size) were acquired with a GE Senographe 2000D system and were retrospectively collected under an IRB-approved, HIPAA-compliant protocol. Regions of interest of 256x256 pixels were selected from the central breast region behind the nipple in the craniocaudal projection. VGG19 pre-trained on the ImageNet dataset was used to classify the images either as high-risk or as low-risk subjects. The last fully-connected layer of pre-trained VGG19 was fine-tuned on FFDM images for breast cancer risk assessment. Performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) in the task of distinguishing between high-risk and low-risk subjects. AUC values of 0.84 (SE=0.05) and 0.72 (SE=0.06) were obtained in the task of distinguishing between the BRCA1/2 gene-mutation carriers and low-risk women and between unilateral cancer patients and low-risk women, respectively. Deep learning with CNNs appears to be able to extract parenchymal characteristics directly from FFDMs that are relevant to the task of distinguishing between cancer risk populations, and therefore has potential to aid clinicians in assessing mammographic parenchymal patterns for cancer risk assessment.
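A minimal sketch (PyTorch/torchvision assumed; the abstract does not specify a framework) of freezing an ImageNet-pretrained VGG19 and fine-tuning only its last fully connected layer for the two-class risk task; replicating the single-channel ROI to three channels is left to the data loader.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_finetune_vgg19(num_classes=2):
    net = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    for p in net.parameters():
        p.requires_grad = False                       # freeze all pre-trained weights
    net.classifier[6] = nn.Linear(4096, num_classes)  # replace and train the last FC layer only
    optimizer = torch.optim.Adam(net.classifier[6].parameters(), lr=1e-4)
    return net, optimizer
```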
Transfer learning with convolutional neural networks for lesion classification on clinical breast tomosynthesis
With growing adoption of digital breast tomosynthesis (DBT) in breast cancer screening protocols, it is important to compare the performance of computer-aided diagnosis (CAD) in the diagnosis of breast lesions on DBT images compared to conventional full-field digital mammography (FFDM). In this study, we retrospectively collected FFDM and DBT images of 78 lesions from 76 patients, each containing lesions that were biopsy-proven as either malignant or benign. A square region of interest (ROI) was placed to fully cover the lesion on each FFDM, DBT synthesized 2D images, and DBT key slice images in the cranial-caudal (CC) and mediolateral-oblique (MLO) views. Features were extracted on each ROI using a pre-trained convolutional neural network (CNN). These features were then input to a support vector machine (SVM) classifier, and area under the ROC curve (AUC) was used as the figure of merit. We found that in both the CC view and MLO view, the synthesized 2D image performed best (AUC = 0.814, AUC = 0.881 respectively) in the task of lesion characterization. Small database size was a key limitation in this study, and could lead to overfitting in the application of the SVM classifier. In future work, we plan to expand this dataset and to explore more robust deep learning methodology such as fine-tuning.
Breast tumor segmentation in DCE-MRI using fully convolutional networks with an application in radiogenomics
Jun Zhang, Ashirbani Saha, Zhe Zhu, et al.
Breast tumor segmentation based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) remains an active and challenging problem. Previous studies often rely on manual annotation of tumor regions, which is not only time-consuming but also error-prone. Recent studies have shown high promise of deep learning-based methods in various segmentation problems. However, these methods are usually faced with the challenge of a limited number (e.g., tens or hundreds) of medical images for training, leading to sub-optimal segmentation performance. Also, previous methods cannot efficiently deal with prevalent class-imbalance problems in tumor segmentation, where the number of voxels in tumor regions is much lower than that in the background area. To address these issues, in this study, we propose a mask-guided hierarchical learning (MHL) framework for breast tumor segmentation via fully convolutional networks (FCN). Our strategy is to first decompose the original difficult problem into several sub-problems and then solve these relatively simpler sub-problems in a hierarchical manner. To precisely identify locations of tumors that underwent a biopsy, we further propose an FCN model to detect two landmarks defined on the nipples. Finally, based on both segmentation probability maps and our identified landmarks, we propose to select biopsied tumors from all detected tumors via a tumor selection strategy using the pathology location. We validate our MHL method using data for 272 patients, and achieve a mean Dice similarity coefficient (DSC) of 0.72 in breast tumor segmentation. Finally, in a radiogenomic analysis, we show that previously developed image features achieve comparable performance for identifying the luminal A subtype when applied to the automatic segmentation and to a semi-manual segmentation, demonstrating high promise for fully automated radiogenomic analysis in breast cancer.
Cardiac, Vessels, and Novel Applications
A quality score for coronary artery tree extraction results
Coronary artery trees (CATs) are often extracted to aid the fully automatic analysis of coronary artery disease on coronary computed tomography angiography (CCTA) images. Automatically extracted CATs often miss some arteries or include wrong extractions which require manual corrections before performing successive steps. For analyzing a large number of datasets, a manual quality check of the extraction results is time-consuming. This paper presents a method to automatically calculate quality scores for extracted CATs in terms of clinical significance of the extracted arteries and the completeness of the extracted CAT. Both right dominant (RD) and left dominant (LD) anatomical statistical models are generated and exploited in developing the quality score. To automatically determine which model should be used, a dominance type detection method is also designed. Experiments are performed on the automatically extracted and manually refined CATs from 42 datasets to evaluate the proposed quality score. In 39 (92.9%) cases, the proposed method is able to measure the quality of the manually refined CATs with higher scores than the automatically extracted CATs. In a 100-point scale system, the average scores for automatically and manually refined CATs are 82.0 (±15.8) and 88.9 (±5.4) respectively. The proposed quality score will assist the automatic processing of the CAT extractions for large cohorts which contain both RD and LD cases. To the best of our knowledge, this is the first time that a general quality score for an extracted CAT is presented.
Aortic root segmentation in 4D transesophageal echocardiography
Shubham Chechani, Rahul Suresh, Kedar A. Patwardhan
The Aortic Valve (AV) is an important anatomical structure which lies on the left side of the human heart. The AV regulates the flow of oxygenated blood from the Left Ventricle (LV) to the rest of the body through the aorta. Pathologies associated with the AV manifest themselves in structural and functional abnormalities of the valve. Clinical management of pathologies often requires repair, reconstruction or even replacement of the valve through surgical intervention. Assessment of these pathologies, as well as determination of the specific intervention procedure, requires quantitative evaluation of the valvular anatomy. 4D (3D + t) Transesophageal Echocardiography (TEE) is a widely used imaging technique that clinicians use for quantitative assessment of cardiac structures. However, manual quantification of 3D structures is complex, time consuming and suffers from inter-observer variability. Towards this goal, we present a semi-automated approach for segmentation of the aortic root (AR) structure. Our approach requires user-initialized landmarks in two reference frames to provide AR segmentation for the full cardiac cycle. We use 'coarse-to-fine' B-spline Explicit Active Surface (BEAS) for AR segmentation and a Masked Normalized Cross Correlation (NCC) method for AR tracking. Our method results in an average localization error of approximately 0.51 mm in comparison with ground-truth annotations performed by clinical experts on 10 real patient cases (139 3D volumes).
Automated assessment of aortic and main pulmonary arterial diameters using model-based blood vessel segmentation for predicting chronic thromboembolic pulmonary hypertension in low-dose CT lung screening
Chronic thromboembolic pulmonary hypertension (CTEPH) is characterized by obstruction of the pulmonary vasculature by residual organized thrombi. A morphological abnormality inside the mediastinum of CTEPH patients is enlargement of the pulmonary artery. This paper presents an automated assessment of aortic and main pulmonary arterial diameters for predicting CTEPH in low-dose CT lung screening. The distinctive feature of our method is to segment the aorta and main pulmonary artery using both a prior probability and the vascular direction, which were estimated from the mediastinal vascular region using principal curvatures of a four-dimensional hypersurface. The method was applied to two datasets, 64 low-dose CT scans from lung cancer screening and 19 normal-dose CT scans of CTEPH patients, after a training phase with 121 low-dose CT scans. This paper demonstrates the effectiveness of our method for predicting CTEPH in low-dose CT screening.
Computer aided diagnosis system for automated label-free early detection of oral epithelial cancer and dysplasia based on autofluorescence lifetime imaging endoscopy (Conference Presentation)
Javier A. Jo, Shuna Cheng, Rodrigo Cuenca, et al.
Despite the fact that the oral cavity is easily accessible, only ~30% of oral cancers are diagnosed at an early stage, which is the main factor attributed to the low 5-year survival rate (63%) of oral cancer patients. Several screening tools for oral cancer are commercially available; however, none of them have been demonstrated to have sufficient sensitivity and specificity for early detection of oral cancer and dysplasia. We hypothesized that an array of biochemical and metabolic biomarkers for oral cancer and dysplasia can be quantified by endogenous fluorescence lifetime imaging (FLIM), thus enabling levels of sensitivity and specificity adequate for early detection of oral cancer and dysplasia. Our group has recently developed multispectral FLIM endoscopes to image the oral cavity with unprecedented imaging speed (>2 fps). We have also performed an in vivo pilot study, in which endogenous multispectral FLIM images were acquired from clinically suspicious oral lesions of 52 patients undergoing tissue biopsy. The results from this pilot study indicated that mild dysplasia and early-stage oral cancer could be distinguished from benign lesions using a computer-aided diagnosis (CAD) system developed based on biochemical and metabolic biomarkers that could be quantified from endogenous multispectral FLIM images. The diagnostic performance of this novel FLIM clinical tool was estimated using a cross-validation approach, showing levels of sensitivity and specificity >80%, and an area under the receiver operating characteristic curve (ROC AUC) >0.9. Future efforts are focused on developing cost-effective FLIM endoscopes and validating this novel clinical tool in prospective multi-center clinical studies.
Identification and functional characterization of HIV-associated neurocognitive disorders with large-scale Granger causality analysis on resting-state functional MRI
Udaysankar Chockanathan, Adora M. DSouza, Anas Z. Abidin, et al.
Resting-state functional MRI (rs-fMRI), coupled with advanced multivariate time-series analysis methods such as Granger causality, is a promising tool for the development of novel functional connectivity biomarkers of neurologic and psychiatric disease. Recently large-scale Granger causality (lsGC) has been proposed as an alternative to conventional Granger causality (cGC) that extends the scope of robust Granger causal analyses to high-dimensional systems such as the human brain. In this study, lsGC and cGC were comparatively evaluated on their ability to capture neurologic damage associated with HIV-associated neurocognitive disorders (HAND). Functional brain network models were constructed from rs-fMRI data collected from a cohort of HIV+ and HIV− subjects. Graph theoretic properties of the resulting networks were then used to train a support vector machine (SVM) model to predict clinically relevant parameters, such as HIV status and neuropsychometric (NP) scores. For the HIV+/− classification task, lsGC, which yielded a peak area under the receiver operating characteristic curve (AUC) of 0.83, significantly outperformed cGC, which yielded a peak AUC of 0.61, at all parameter settings tested. For the NP score regression task, lsGC, with a minimum mean squared error (MSE) of 0.75, significantly outperformed cGC, with a minimum MSE of 0.84 (p < 0.001, one-tailed paired t-test). These results show that, at optimal parameter settings, lsGC is better able to capture functional brain connectivity correlates of HAND than cGC. However, given the substantial variation in the performance of the two methods at different parameter settings, particularly for the regression task, improved parameter selection criteria are necessary and constitute an area for future research.
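A minimal sketch of the final classification stage in Python, assuming the graph-theoretic network features have already been computed; the feature matrix and labels below are random placeholders, not study data:

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X = np.random.default_rng(0).normal(size=(40, 20))   # placeholder graph-theoretic features
y = np.array([0, 1] * 20)                             # placeholder HIV-/HIV+ labels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
prob = cross_val_predict(clf, X, y, cv=StratifiedKFold(5), method="predict_proba")[:, 1]
print("cross-validated AUC:", roc_auc_score(y, prob))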
Keynote and Eye
Automated segmentation of geographic atrophy using deep convolutional neural networks
Zhihong Hu, Ziyuan Wang, SriniVas R. Sadda
Geographic atrophy (GA) is an end-stage manifestation of advanced age-related macular degeneration (AMD), the leading cause of blindness and visual impairment in developed nations. Techniques to rapidly and precisely detect and quantify GA are of critical importance in advancing the understanding of its pathogenesis. In this study, we develop an automated supervised classification system using deep convolutional neural networks (CNNs) for segmenting GA in fundus autofluorescence (FAF) images. More specifically, to enhance the contrast of GA relative to the background, we apply contrast limited adaptive histogram equalization. Blood vessels may cause GA segmentation errors because their intensity levels are similar to those of GA. A tensor-voting technique is performed to identify the blood vessels, and a vessel inpainting technique is applied to suppress the GA segmentation errors caused by the blood vessels. To handle the large variation of GA lesion sizes, three deep CNNs with three different input image patch sizes are applied. Fifty randomly chosen FAF images were obtained from fifty subjects with GA. The algorithm-defined GA regions are compared with manual delineation by a certified grader. A two-fold cross-validation is applied to evaluate the algorithm performance. The mean segmentation accuracy, true positive rate (i.e. sensitivity), true negative rate (i.e. specificity), positive predictive value, false discovery rate, and overlap ratio between the algorithm- and manually-defined GA regions are 0.97 ± 0.02, 0.89 ± 0.08, 0.98 ± 0.02, 0.87 ± 0.12, 0.13 ± 0.12, and 0.79 ± 0.12 respectively, demonstrating a high level of agreement.
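The contrast-enhancement step can be illustrated with scikit-image's CLAHE routine; the FAF image below is a random placeholder, and the clip limit is an assumed value rather than the authors' setting:

import numpy as np
from skimage import exposure

faf = np.random.rand(512, 512)                                  # placeholder FAF image in [0, 1]
faf_clahe = exposure.equalize_adapthist(faf, clip_limit=0.02)   # CLAHE preprocessing step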
An affordable and easy-to-use diagnostic method for keratoconus detection using a smartphone
Behnam Askarian, Fatemehsadat Tabei, Amin Askarian, et al.
Recently, smartphones have been used for disease diagnosis and healthcare. In this paper, we propose a novel, affordable diagnostic method for detecting keratoconus using a smartphone. Keratoconus is usually detected in clinics with ophthalmic devices, which are large, expensive, not portable, and need to be operated by trained technicians. In contrast, our proposed smartphone-based eye disease detection method is small, affordable, portable, and can be operated by patients in a convenient way. The results show that the proposed method detects severe, advanced, and moderate keratoconus with accuracies of 93%, 86%, and 67%, respectively. Given its convenience and these accuracies, the proposed method is expected to enable detection of keratoconus at an earlier stage in an affordable way.
Colon and Prostate
Detection of protruding lesion in wireless capsule endoscopy videos of small intestine
Wireless capsule endoscopy (WCE) is a revolutionary technology with important clinical benefits. However, the huge volume of image data places a heavy burden on physicians who must locate and diagnose lesion images. In this paper, a novel and efficient approach is proposed to help clinicians detect protruding lesion images in the small intestine. First, since there are many possible disturbances in WCE video frames, such as air bubbles, which make efficient feature extraction difficult, a color-saliency region detection (CSD) method is developed for extracting the potentially salient region of interest (SROI). Second, a novel color-channel modelling of the local binary pattern operator (CCLBP) is proposed to describe WCE images, which combines grayscale and color angle. The CCLBP feature is more robust to variation of illumination and more discriminative for classification. Moreover, a support vector machine (SVM) classifier with the CCLBP feature is utilized to detect protruding lesion images. Experimental results on real WCE images demonstrate that the proposed method has higher accuracy in protruding lesion detection than several state-of-the-art methods.
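The exact CCLBP descriptor is not reproduced here; as a simplified stand-in, the sketch below concatenates per-channel uniform LBP histograms and feeds them to an SVM, using placeholder frames and labels:

import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_hist(channel, P=8, R=1):
    # Histogram of uniform LBP codes for a single image channel (P + 2 uniform bins).
    codes = local_binary_pattern(channel, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist

def color_lbp_descriptor(rgb):
    # Concatenate per-channel LBP histograms (a simplified stand-in for CCLBP).
    return np.concatenate([lbp_hist(rgb[..., c]) for c in range(3)])

# Hypothetical usage with placeholder WCE frames and labels.
frames = np.random.rand(10, 64, 64, 3)
labels = np.array([0, 1] * 5)
X = np.stack([color_lbp_descriptor(f) for f in frames])
clf = SVC(kernel="rbf").fit(X, labels)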
Radiomics analysis of DWI data to identify the rectal cancer patients qualified for local excision after neoadjuvant chemoradiotherapy
Zhenchao Tang, Zhenyu Liu, Xiaoyan Zhang, et al.
Locally advanced rectal cancer (LARC) patients are routinely treated with neoadjuvant chemoradiotherapy (CRT) first and receive total excision afterwards. However, LARC patients may regress to the T1N0M0/T0N0M0 stage after CRT, which would make them eligible for local excision. Accurate pathological TNM stage, though, can only be obtained by pathological examination after surgery. We aimed to conduct a radiomics analysis of diffusion-weighted imaging (DWI) data to identify patients in the T1N0M0/T0N0M0 stages before surgery, in the hope of providing clinical surgical decision support. 223 routinely treated LARC patients from Beijing Cancer Hospital were enrolled in the current study. DWI data and clinical characteristics were collected after CRT. According to the pathological TNM stage, patients in the T1N0M0 and T0N0M0 stages were labelled as 1 and the other patients as 0. The first 123 patients in chronological order were used as the training set, and the remaining patients as the validation set. 563 image features extracted from the DWI data, together with clinical characteristics, were used as features. A two-sample t-test was conducted to pre-select the top 50% most discriminating features. A least absolute shrinkage and selection operator (Lasso)-logistic regression model was used to further select features and construct the classification model. Based on the 14 selected image features, an area under the receiver operating characteristic (ROC) curve (AUC) of 0.8781 and a classification accuracy (ACC) of 0.8432 were achieved in the training set. In the validation set, an AUC of 0.8707 and an ACC of 0.84 were observed.
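A minimal sketch of this feature selection and modeling pipeline in scikit-learn, with placeholder random data in place of the DWI radiomic features and pathology labels (the regularization strength is an assumed value):

import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(123, 563))     # placeholder DWI radiomic + clinical features
y = np.arange(123) % 2              # placeholder labels: 1 = T0N0M0/T1N0M0, 0 = other

# Two-sample t-test keeps the top 50% most discriminating features.
_, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
keep = np.argsort(p)[: X.shape[1] // 2]

# L1-penalized (lasso) logistic regression further selects features and classifies.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X[:, keep], y)
print("training AUC:", roc_auc_score(y, model.decision_function(X[:, keep])))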
Development of a computer aided diagnosis model for prostate cancer classification on multi-parametric MRI
R. Alfano, D. Soetemans, G. S. Bauman, et al.
Multi-parametric MRI (mp-MRI) is becoming a standard in contemporary prostate cancer screening and diagnosis, and has been shown to aid physicians in cancer detection. It offers many advantages over traditional systematic biopsy, which has been shown to have very high clinical false-negative rates of up to 23% at all stages of the disease. However beneficial, mp-MRI is relatively complex to interpret and suffers from inter-observer variability in lesion localization and grading. Computer-aided diagnosis (CAD) systems have been developed as a solution because they can perform deterministic quantitative image analysis. We measured the accuracy of such a system validated using accurately co-registered whole-mount digitized histology. We trained a logistic linear classifier (LOGLC), support vector machine (SVC), k-nearest neighbour (KNN) and random forest classifier (RFC) in a four-part ROI-based experiment against: 1) cancer vs. non-cancer, 2) high-grade (Gleason score ≥4+3) vs. low-grade cancer (Gleason score <4+3), 3) high-grade vs. other tissue components and 4) high-grade vs. benign tissue, selecting the classifier with the highest AUC using 1-10 features from forward feature selection. The CAD model was able to classify malignant vs. benign tissue and detect high-grade cancer with high accuracy. Once fully validated, this work will form the basis for a tool that enhances the radiologist’s ability to detect malignancies, potentially improving biopsy guidance, treatment selection, and focal therapy for prostate cancer patients, maximizing the potential for cure and increasing quality of life.
Cascade classification of endocytoscopic images of colorectal lesions for automated pathological diagnosis
This paper presents a new classification method for endocytoscopic images. Endocytoscopy is a new type of endoscopy that enables both conventional endoscopic observation and ultra-magnified observation at the cell level. These ultra-magnified views (endocytoscopic images) make it possible to perform pathological diagnosis from endoscopic views of polyps alone during colonoscopy. However, endocytoscopic image diagnosis requires considerable experience from physicians. An automated pathological diagnosis system is required to prevent the overlooking of neoplastic lesions in endocytoscopy. For this purpose, we propose a new automated endocytoscopic image classification method that distinguishes neoplastic from non-neoplastic endocytoscopic images. The method consists of two classification steps. In the first step, we classify an input image with a support vector machine. We forward the image to the second step if the confidence of the first classification is low. In the second step, we classify the forwarded image with a convolutional neural network. We reject the input image if the confidence of the second classification is also low. We experimentally evaluate the classification performance of the proposed method. In this experiment, we use about 16,000 and 4,000 colorectal endocytoscopic images as training and test data, respectively. The results show that the proposed method achieves a high sensitivity of 93.4% with a small rejection rate of 9.3%, even on difficult test data.
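The two-step decision logic can be sketched as follows, assuming a trained SVM exposing decision_function and a trained CNN wrapper exposing predict_proba, both hypothetical objects, with assumed confidence margins:

def cascade_classify(x, svm, cnn, svm_margin=1.0, cnn_margin=0.4):
    # Step 1: accept the SVM decision only when its confidence is high enough.
    s = svm.decision_function([x])[0]
    if abs(s) >= svm_margin:
        return int(s > 0)
    # Step 2: forward low-confidence images to the CNN.
    p = cnn.predict_proba([x])[0, 1]
    if abs(p - 0.5) >= cnn_margin:
        return int(p > 0.5)
    # Reject when both classifiers are unsure (deferred to the physician).
    return None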
A new fractional order derivative based active contour model for colon wall segmentation
Segmentation of the colon wall plays an important role in advancing computed tomographic colonography (CTC) toward a screening modality. Due to the low contrast of CT attenuation around the colon wall, accurate segmentation of the boundaries of both the inner and outer wall is very challenging. In this paper, based on the geodesic active contour model, we develop a new model for colon wall segmentation. First, tagged materials in CTC images are automatically removed via a partial volume (PV) based electronic colon cleansing (ECC) strategy. We then present a new fractional order derivative based active contour model to segment the volumetric colon wall from the cleansed CTC images. In this model, the region-based Chan-Vese model is incorporated as an energy term so that not only edge/gradient information but also region/volume information is taken into account in the segmentation process. Furthermore, a fractional order derivative energy term is also developed in the new model to preserve low-frequency information and improve the noise immunity of the segmentation model. The proposed colon wall segmentation approach was validated on 16 patient CTC scans. Experimental results indicate that the present scheme is very promising for automatically segmenting the colon wall, thus facilitating computer-aided detection of initial colonic polyp candidates via CTC.
Detection of colorectal masses in CT colonography: application of deep residual networks for differentiating masses from normal colon anatomy
Even though the clinical consequences of a missed colorectal cancer far outweigh those of a missed polyp, there has been little work on computer-aided detection (CADe) for colorectal masses in CT colonography (CTC). One of the problems is that it is not clear how to manually design mathematical image-based features that could be used to differentiate effectively between masses and certain types of normal colon anatomy such as ileocecal valves (ICVs). Deep learning has demonstrated ability to automatically determine effective discriminating features in many image-based problems. Recently, residual networks (ResNets) were developed to address the practical problems of constructing deep network architectures for optimizing the performance of deep learning. In this pilot study, we compared the classification performance of a conventional 2D-convolutional ResNet (2D-ResNet) with that of a volumetric 3D-convolutional ResNet (3D-ResNet) in differentiating masses from normal colon anatomy in CTC. For the development and evaluation of the ResNets, 695 volumetric images of biopsy-proven colorectal masses, ICVs, haustral folds, and rectal tubes were sampled from 196 clinical CTC cases and divided randomly into independent training, validation, and test datasets. The training set was expanded by use of volumetric data augmentation. Our preliminary results on the 140 test samples indicate that it is feasible to train a deep volumetric 3D-ResNet for performing effective image-based discriminations in CTC. The 3D-ResNet slightly outperformed the 2D-ResNet in the discrimination of masses and normal colon anatomy, but the statistical difference between their very high classification accuracies was not significant. The highest classification accuracy was obtained by combining the mass-likelihood estimates of the 2D- and 3D-ResNets, which enabled correct classification of all of the masses.
Head and Neck
Comparison of different deep learning approaches for parotid gland segmentation from CT images
Annika Hänsch, Michael Schwier, Tobias Gass, et al.
The segmentation of target structures and organs at risk is a crucial and very time-consuming step in radiotherapy planning. Good automatic methods can significantly reduce the time clinicians have to spend on this task. Due to its variability in shape and often low contrast to surrounding structures, segmentation of the parotid gland is especially challenging. Motivated by the recent success of deep learning, we study different deep learning approaches for parotid gland segmentation. Particularly, we compare 2D, 2D ensemble and 3D U-Net approaches and find that the 2D U-Net ensemble yields the best results with a mean Dice score of 0.817 on our test data. The ensemble approach reduces false positives without the need for an automatic region of interest detection. We also apply our trained 2D U-Net ensemble to segment the test data of the 2015 MICCAI head and neck auto-segmentation challenge. With a mean Dice score of 0.861, our classifier exceeds the highest mean score in the challenge. This shows that the method generalizes well onto data from independent sites. Since appropriate reference annotations are essential for training but often difficult and expensive to obtain, it is important to know how many samples are needed to properly train a neural network. We evaluate the classifier performance after training with differently sized training sets (50–450) and find that 250 cases (without using extensive data augmentation) are sufficient to obtain good results with the 2D ensemble. Adding more samples does not significantly improve the Dice score of the segmentations.
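For reference, the Dice score used above and a simple probability-averaging ensemble can be written as follows; the probability maps and reference annotation are random placeholders:

import numpy as np

def dice(a, b):
    # Dice similarity coefficient between two binary segmentation masks.
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

probs = np.random.rand(5, 128, 128)             # 5 hypothetical 2D U-Net output maps
ensemble_mask = probs.mean(axis=0) > 0.5        # average probabilities, then threshold
reference = np.random.rand(128, 128) > 0.5      # placeholder reference annotation
print("Dice:", round(dice(ensemble_mask, reference), 3))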
Detection of eardrum abnormalities using ensemble deep learning approaches
Caglar Senaras, Aaron C. Moberly M.D., Theodoros Teknos M.D., et al.
In this study, we proposed an approach to report the condition of the eardrum as “normal” or “abnormal” by ensembling two different deep learning architectures. In the first network (Network 1), we applied transfer learning to the Inception V3 network by using 409 labeled samples. As a second network (Network 2), we designed a convolutional neural network that takes advantage of auto-encoders by using an additional 673 unlabeled eardrum samples. The individual classification accuracies of Network 1 and Network 2 were calculated as 84.4% (± 12.1%) and 82.6% (± 11.3%), respectively. Only 32% of the errors of the two networks were the same, making it possible to combine the two approaches to achieve better classification accuracy. The proposed ensemble method allows us to achieve robust classification because it has high accuracy (84.4%) with the lowest standard deviation (± 10.3%).
A convolutional neural network for intracranial hemorrhage detection in non-contrast CT
The assessment of the presence of intracranial hemorrhage is a crucial step in the work-up of patients requiring emergency care. Fast and accurate detection of intracranial hemorrhage can aid treating physicians by not only expediting and guiding diagnosis, but also supporting choices for secondary imaging, treatment and intervention. However, the automatic detection of intracranial hemorrhage is complicated by the variation in appearance on non-contrast CT images as a result of differences in etiology and location. We propose a method using a convolutional neural network (CNN) for the automatic detection of intracranial hemorrhage. The method is trained on a dataset comprised of cerebral CT studies for which the presence of hemorrhage has been labeled for each axial slice. A separate test dataset of 20 images is used for quantitative evaluation and shows a sensitivity of 0.87, specificity of 0.97 and accuracy of 0.95. The average processing time for a single three-dimensional (3D) CT volume was 2.7 seconds. The proposed method is capable of fast and automated detection of intracranial hemorrhages in non-contrast CT without being limited to a specific subtype of pathology.
Deep 3D convolution neural network for CT brain hemorrhage classification
Kamal Jnawali, Mohammad R. Arbabshirani, Navalgund Rao, et al.
Intracranial hemorrhage is a critical condition with a high mortality rate that is typically diagnosed based on head computed tomography (CT) images. Deep learning algorithms, in particular convolutional neural networks (CNNs), are becoming the methodology of choice in medical image analysis for a variety of applications such as computer-aided diagnosis and segmentation. In this study, we propose a fully automated deep learning framework which learns to detect brain hemorrhage based on cross-sectional CT images. The dataset for this work consists of 40,367 3D head CT studies (over 1.5 million 2D images) acquired retrospectively over a decade from multiple radiology facilities at Geisinger Health System. The proposed algorithm first extracts features using a 3D CNN and then detects brain hemorrhage using a logistic function as the last layer of the network. Finally, we created an ensemble of three different 3D CNN architectures to improve the classification accuracy. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve of the ensemble of the three architectures was 0.87. These results are very promising considering the fact that the head CT studies were not controlled for slice thickness, scanner type, study protocol or any other settings. Moreover, the proposed algorithm reliably detected various types of hemorrhage within the skull. This work is one of the first applications of 3D CNNs trained on a large dataset of cross-sectional medical images for the detection of a critical radiological condition.
Cerebral microbleed detection in traumatic brain injury patients using 3D convolutional neural networks
K. Standvoss, T. Crijns, L. Goerke, et al.
The number and location of cerebral microbleeds (CMBs) in patients with traumatic brain injury (TBI) is important to determine the severity of trauma and may hold prognostic value for patient outcome. However, manual assessment is subjective and time-consuming due to the resemblance of CMBs to blood vessels, the possible presence of imaging artifacts, and the typical heterogeneity of trauma imaging data. In this work, we present a computer aided detection system based on 3D convolutional neural networks for detecting CMBs in 3D susceptibility weighted images. Network architectures with varying depth were evaluated. Data augmentation techniques were employed to improve the networks’ generalization ability and selective sampling was implemented to handle class imbalance. The predictions of the models were clustered using a connected component analysis. The system was trained on ten annotated scans and evaluated on an independent test set of eight scans. Despite this limited data set, the system reached a sensitivity of 0.87 at 16.75 false positives per scan (2.5 false positives per CMB), outperforming related work on CMB detection in TBI patients.
Lung II
Comparing deep learning models for population screening using chest radiography
R. Sivaramakrishnan, Sameer Antani, Sema Candemir, et al.
According to the World Health Organization (WHO), tuberculosis (TB) remains the most deadly infectious disease in the world. In a 2015 global annual TB report, 1.5 million TB related deaths were reported. The conditions worsened in 2016 with 1.7 million reported deaths and more than 10 million people infected with the disease. Analysis of frontal chest X-rays (CXR) is one of the most popular methods for initial TB screening, however, the method is impacted by the lack of experts for screening chest radiographs. Computer-aided diagnosis (CADx) tools have gained significance because they reduce the human burden in screening and diagnosis, particularly in countries that lack substantial radiology services. State-of-the-art CADx software typically is based on machine learning (ML) approaches that use hand-engineered features, demanding expertise in analyzing the input variances and accounting for the changes in size, background, angle, and position of the region of interest (ROI) on the underlying medical imagery. More automatic Deep Learning (DL) tools have demonstrated promising results in a wide range of ML applications. Convolutional Neural Networks (CNN), a class of DL models, have gained research prominence in image classification, detection, and localization tasks because they are highly scalable and deliver superior results with end-to-end feature extraction and classification. In this study, we evaluated the performance of CNN based DL models for population screening using frontal CXRs. The results demonstrate that pre-trained CNNs are a promising feature extracting tool for medical imagery including the automated diagnosis of TB from chest radiographs but emphasize the importance of large data sets for the most accurate classification.
Deep-learning derived features for lung nodule classification with limited datasets
P. Thammasorn, W. Wu, L. A. Pierce, et al.
Only a few percent of the indeterminate nodules found in lung CT images are cancer. However, enabling earlier diagnosis is important to avoid invasive procedures or long-term surveillance for benign nodules. We are evaluating a classification framework using radiomics features derived with a machine learning approach from a small data set of indeterminate CT lung nodule images. We used a retrospective analysis of 194 cases with pulmonary nodules in CT images with or without contrast enhancement from lung cancer screening clinics. The nodules were contoured by a radiologist and texture features of the lesion were calculated. In addition, semantic features describing shape were categorized. We also explored a Multiband network, a feature derivation path that uses a modified convolutional neural network (CNN) with a Triplet Network. This was trained to create discriminative feature representations useful for variable-sized nodule classification. The diagnostic accuracy was evaluated for multiple machine learning algorithms using texture, shape, and CNN features. In the CT contrast-enhanced group, the texture or semantic shape features yielded an overall diagnostic accuracy of 80%. Use of a standard deep learning network in the framework for feature derivation yielded features that substantially underperformed compared to texture and/or semantic features. However, the proposed Multiband approach to feature derivation produced results similar in diagnostic accuracy to the texture and semantic features. While the Multiband feature derivation approach did not outperform the texture and/or semantic features, its equivalent performance indicates promise for future improvements to increase diagnostic accuracy. Importantly, the Multiband approach adapts readily to different lesion sizes without interpolation, and performed well with a relatively small amount of training data.
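The Triplet Network component relies on a margin-based loss over embedding vectors; a minimal NumPy version is sketched below (the actual training would be done in a CNN framework, and the embeddings here are placeholders):

import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    # Pull same-class embeddings together, push different-class embeddings apart
    # by at least `margin`, averaged over the batch.
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Hypothetical embeddings for a batch of nodule patches.
a, p, n = (np.random.rand(8, 32) for _ in range(3))
print(triplet_margin_loss(a, p, n))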
Prognostic importance of pleural attachment status measured by pretreatment CT images in patients with stage IA lung adenocarcinoma: measurement of the ratio of the interface between nodule and neighboring pleura to nodule surface area
Y. Kawata, N. Niki, M. Kusumoto, et al.
Screening for lung cancer with low-dose computed tomography (CT) has led to increased recognition of small lung cancers and is expected to increase the rate of detection of early-stage lung cancer. Major concerns in the implementation of CT screening of large populations include determining the appropriate management of pulmonary nodules found on a scan. The identification of patients with early-stage lung cancer who have a higher risk for relapse and who require more aggressive surveillance has been a target of intense investigation. This study was performed to investigate whether image features of internal intensity, in combination with surrounding structure characteristics, are associated with an increased risk of relapse in patients with stage IA lung adenocarcinoma. We focused on pleural attachment status, which is one of the morphological characteristics associated with prognosis in three-dimensional thoracic CT images.
Differentiating invasive and pre-invasive lung cancer by quantitative analysis of histopathologic images
Chuan Zhou, Hongliu Sun, Heang-Ping Chan, et al.
We are developing an automated radiopathomics method for diagnosis of lung nodule subtypes. In this study, we investigated the feasibility of using quantitative methods to analyze the tumor nuclei and cytoplasm in pathologic whole-slide images for the classification of pathologic subtypes of invasive nodules and pre-invasive nodules. We developed a multiscale blob detection method with watershed transform (MBD-WT) to segment the tumor cells. Pathomic features were extracted to characterize the size, morphology, sharpness, and gray level variation in each segmented nucleus and the heterogeneity patterns of tumor nuclei and cytoplasm. With permission of the National Lung Screening Trial (NLST) project, a data set containing 90 digital haematoxylin and eosin (HE) whole-slide images from 48 cases was used in this study. The 48 cases contain 77 regions of invasive subtypes and 43 regions of pre-invasive subtypes outlined by a pathologist on the HE images, using the pathological tumor region description provided by NLST as reference. A logistic regression model (LRM) was built using leave-one-case-out resampling and receiver operating characteristic (ROC) analysis for classification of invasive and pre-invasive subtypes. With 11 selected features, the LRM achieved a test area under the ROC curve (AUC) value of 0.91±0.03. The results demonstrated that the pathologic invasiveness of lung adenocarcinomas could be categorized with high accuracy using pathomics analysis.
Using YOLO based deep learning network for real time detection and localization of lung nodules from low dose CT scans
Sindhu Ramachandran S., Jose George, Shibon Skaria, et al.
Lung cancer is the leading cause of cancer-related deaths in the world. The survival rate can be improved if the presence of lung nodules is detected early. This has also led to more focus being given to computer-aided detection (CAD) and diagnosis of lung nodules. The arbitrariness of shape, size and texture of lung nodules is a challenge to be faced when developing these detection systems. In the proposed work we use convolutional neural networks to learn the features for nodule detection, replacing the traditional method of handcrafting features like geometric shape or texture. Our network uses the DetectNet architecture based on YOLO (You Only Look Once) to detect nodules in CT scans of the lung. In this architecture, object detection is treated as a regression problem with a single convolutional network simultaneously predicting multiple bounding boxes and class probabilities for those boxes. By performing training using chest CT scans from the Lung Image Database Consortium (LIDC), NVIDIA DIGITS and the Caffe deep learning framework, we show that nodule detection using this single neural network can result in reasonably low false positive rates with high sensitivity and precision.
Automated volumetric lung segmentation of thoracic CT images using fully convolutional neural network
Mohammadreza Negahdar, David Beymer, Tanveer Syeda-Mahmood
Deep Learning models such as Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in 2D medical image analysis. In clinical practice, however, most acquired and analyzed medical data are formed of 3D volumes. In this paper, we present a fast and efficient 3D lung segmentation method based on V-net, a purely volumetric fully convolutional network. Our model is trained on chest CT images through volume-to-volume learning, which alleviates the overfitting problem caused by the limited number of annotated training data. Adopting a pre-processing step and training an objective function based on the Dice coefficient addresses the imbalance between the number of lung voxels and background voxels. We have extended the V-net model with batch normalization for training, which enables a higher learning rate and accelerates the training of the model. To address the inadequacy of training data and obtain better robustness, we augment the data by applying random linear and non-linear transformations. Experimental results on two challenging medical image data sets show that our proposed method achieves competitive results at a much faster speed.
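The Dice-based objective used to counter the lung/background voxel imbalance can be written generically as follows (NumPy for clarity; in practice it would be expressed in the training framework, and the arrays below are placeholders):

import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    # pred: per-voxel foreground probabilities, target: binary lung mask.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.random.rand(32, 64, 64)                     # placeholder network output
target = (np.random.rand(32, 64, 64) > 0.5).astype(float)
print(soft_dice_loss(pred, target))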
Quantitative
Variations in algorithm implementation among quantitative texture analysis software packages
Joseph J. Foy, Prerana Mitta, Lauren R. Nowosatka, et al.
Open-source texture analysis software allows for the advancement of radiomics research. Variations in texture features, however, result from discrepancies in algorithm implementation. Anatomically matched regions of interest (ROIs) that captured normal breast parenchyma were placed in the magnetic resonance images (MRI) of 20 patients at two time points. Six first-order features and six gray-level co-occurrence matrix (GLCM) features were calculated for each ROI using four texture analysis packages. Features were extracted using package-specific default GLCM parameters and using GLCM parameters modified to yield the greatest consistency among packages. Relative change in the value of each feature between time points was calculated for each ROI. Distributions of relative feature value differences were compared across packages. Absolute agreement among feature values was quantified by the intra-class correlation coefficient. Among first-order features, significant differences were found for max, range, and mean, and only kurtosis showed poor agreement. All six second-order features showed significant differences using package-specific default GLCM parameters, and five second-order features showed poor agreement; with modified GLCM parameters, no significant differences among second-order features were found, and all second-order features showed poor agreement. While relative texture change discrepancies existed across packages, these differences were not significant when consistent parameters were used.
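The dependence of second-order features on GLCM parameters can be seen directly in scikit-image (function names as in recent releases; older versions spell them greycomatrix/greycoprops); the ROI and parameter choices below are placeholders:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

roi = (np.random.rand(64, 64) * 255).astype(np.uint8)   # placeholder breast-parenchyma ROI

# Feature values change with the chosen distances, angles and number of gray levels,
# which is exactly where package defaults tend to differ.
glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)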
Towards quantitative imaging: stability of fully automated nodule segmentation across varied dose levels and reconstruction parameters in a low-dose CT screening patient cohort
Quantitative imaging in lung cancer CT seeks to characterize nodules through quantitative features, usually from a region of interest delineating the nodule. The segmentation, however, can vary depending on segmentation approach and image quality, which can affect the extracted feature values. In this study, we utilize a fully-automated nodule segmentation method – to avoid reader-influenced inconsistencies – to explore the effects of varied dose levels and reconstruction parameters on segmentation.

Raw projection CT data from a low-dose screening patient cohort (N=59) were reconstructed at multiple dose levels (100%, 50%, 25%, 10%), two slice thicknesses (1.0mm, 0.6mm), and a medium kernel. Fully-automated nodule detection and segmentation were then applied, from which 12 nodules were selected. The Dice similarity coefficient (DSC) was used to assess the similarity of the segmentation ROIs of the same nodule across different reconstruction and dose conditions.

Nodules at 1.0mm slice thickness and dose levels of 25% and 50% resulted in DSC values greater than 0.85 when compared to 100% dose, with lower dose leading to a lower average and wider spread of DSC values. At 0.6mm, the increased bias and wider spread of DSC values from lowering dose were more pronounced. The effects of dose reduction on DSC for CAD-segmented nodules were similar in magnitude to reducing the slice thickness from 1.0mm to 0.6mm. In conclusion, variation of dose and slice thickness can result in very different segmentations because of noise and image quality. However, there is some stability in segmentation overlap: even at 1.0mm, an image at 25% of the low-dose scan's dose level still results in segmentations similar to those seen in a full-dose scan.
Anomaly detection for medical images based on a one-class classification
Detecting an anomaly such as a malignant tumor or a nodule from medical images including mammogram, CT or PET images is still an ongoing research problem drawing a lot of attention with applications in medical diagnosis. A conventional way to address this is to learn a discriminative model using training datasets of negative and positive samples. The learned model can be used to classify a testing sample into a positive or negative class. However, in medical applications, the high unbalance between negative and positive samples poses a difficulty for learning algorithms, as they will be biased towards the majority group, i.e., the negative one. To address this imbalanced data issue as well as leverage the huge amount of negative samples, i.e., normal medical images, we propose to learn an unsupervised model to characterize the negative class. To make the learned model more flexible and extendable for medical images of different scales, we have designed an autoencoder based on a deep neural network to characterize the negative patches decomposed from large medical images. A testing image is decomposed into patches and then fed into the learned autoencoder to reconstruct these patches themselves. The reconstruction error of one patch is used to classify this patch into a binary class, i.e., a positive or a negative one, leading to a one-class classifier. The positive patches highlight the suspicious areas containing anomalies in a large medical image. The proposed method has been tested on InBreast dataset and achieves an AUC of 0.84.

The main contribution of our work can be summarized as follows. 1) The proposed one-class learning requires only data from one class, i.e., the negative data; 2) The patch-based learning makes the proposed method scalable to images of different sizes and helps avoid the large scale problem for medical images; 3) The training of the proposed deep convolutional neural network (DCNN) based auto-encoder is fast and stable.
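A minimal sketch of the patch-level scoring, with a trivial stand-in for the trained autoencoder; the reconstruct function, patches, labels, and threshold below are all hypothetical placeholders:

import numpy as np
from sklearn.metrics import roc_auc_score

def anomaly_scores(patches, reconstruct):
    # Per-patch mean squared reconstruction error; high error marks a suspicious patch.
    recon = reconstruct(patches)
    return ((patches - recon) ** 2).mean(axis=(1, 2))

reconstruct = lambda p: p * 0.9                      # placeholder for a trained autoencoder
patches = np.random.rand(100, 32, 32)                # placeholder patches from one image
labels = np.random.randint(0, 2, size=100)           # hypothetical patch-level labels
scores = anomaly_scores(patches, reconstruct)
flags = scores > np.percentile(scores, 95)           # one-class decision by thresholding
print("AUC:", roc_auc_score(labels, scores))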
Quantitative CT analysis for the preoperative prediction of pathologic grade in pancreatic neuroendocrine tumors
Pancreatic neuroendocrine tumors (PanNETs) account for approximately 5% of all pancreatic tumors, affecting one individual per million each year.1 PanNETs are difficult to treat due to biological variability from benign to highly malignant, indolent to very aggressive. The World Health Organization classifies PanNETs into three categories based on cell proliferative rate, usually detected using the Ki67 index and cell morphology: low-grade (G1), intermediate-grade (G2) and high-grade (G3) tumors. Knowledge of grade prior to treatment would select patients for optimal therapy: G1/G2 tumors respond well to somatostatin analogs and targeted or cytotoxic drugs whereas G3 tumors would be targeted with platinum or alkylating agents.2, 3 Grade assessment is based on the pathologic examination of the surgical specimen, biopsy or fine-needle aspiration; however, heterogeneity in the proliferative index can lead to sampling errors.4 Based on studies relating qualitatively assessed shape and enhancement characteristics on CT imaging to tumor grade in PanNET,5 we propose objective classification of PanNET grade with quantitative analysis of CT images. Fifty-five patients were included in our retrospective analysis. A pathologist graded the tumors. Texture and shape-based features were extracted from CT. Random forest and naive Bayes classifiers were compared for the classification of G1/G2 and G3 PanNETs. The best area under the receiver operating characteristic curve (AUC) of 0.74 and accuracy of 71.64% was achieved with texture features. The shape-based features achieved an AUC of 0.70 and accuracy of 78.73%.
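A minimal sketch of the classifier comparison in scikit-learn, with placeholder data in place of the CT texture and shape features and pathologic grades:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score

X = np.random.default_rng(0).normal(size=(55, 30))   # placeholder texture/shape features
y = np.array([0] * 40 + [1] * 15)                    # placeholder grades: 0 = G1/G2, 1 = G3

for name, clf in [("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("naive Bayes", GaussianNB())]:
    prob = cross_val_predict(clf, X, y, cv=StratifiedKFold(5), method="predict_proba")[:, 1]
    print(name, "AUC:", round(roc_auc_score(y, prob), 2))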
Quantitative image feature variability amongst CT scanners with a controlled scan protocol
Rachel B. Ger, Shouhao Zhou, Pai-Chun Melinda Chi, et al.
Radiomics studies often analyze patient computed tomography (CT) images acquired from different CT scanners. This may result in differences in imaging parameters, e.g. different manufacturers, different acquisition protocols, etc. However, quantifiable differences in radiomics features can occur based on acquisition parameters. A controlled protocol may allow for minimization of these effects, thus allowing for larger patient cohorts from many different CT scanners. In order to test radiomics feature variability across different CT scanners a radiomics phantom was developed with six different cartridges encased in high density polystyrene. A harmonized protocol was developed to control for tube voltage, tube current, scan type, pitch, CTDIvol, convolution kernel, display field of view, and slice thickness across different manufacturers. The radiomics phantom was imaged on 18 scanners using the control protocol. A linear mixed effects model was created to assess the impact of inter-scanner variability with decomposition of feature variation between scanners and cartridge materials. The inter-scanner variability was compared to the residual variability (the unexplained variability) and to the inter-patient variability using two different patient cohorts. The patient cohorts consisted of 20 non-small cell lung cancer (NSCLC) and 30 head and neck squamous cell carcinoma (HNSCC) patients. The inter-scanner standard deviation was at least half of the residual standard deviation for 36 of 49 quantitative image features. The ratio of inter-scanner to patient coefficient of variation was above 0.2 for 22 and 28 of the 49 features for NSCLC and HNSCC patients, respectively. Inter-scanner variability was a significant factor compared to patient variation in this small study for many of the features. Further analysis with a larger cohort will allow more thorough analysis with additional variables in the model to truly isolate the interscanner difference.
Brain II
A primitive study on unsupervised anomaly detection with an autoencoder in emergency head CT volumes
Daisuke Sato M.D., Shouhei Hanaoka M.D., Yukihiro Nomura, et al.
Purpose: The target disorders of emergency head CT are wide-ranging. Therefore, people working in an emergency department desire a computer-aided detection system for general disorders. In this study, we propose an unsupervised anomaly detection method for emergency head CT using an autoencoder and evaluate its anomaly detection performance. Methods: We used a 3D convolutional autoencoder (3D-CAE), which contains 11 layers in the convolution block and 6 layers in the deconvolution block. In the training phase, we trained the 3D-CAE using 10,000 3D patches extracted from 50 normal cases. In the test phase, we calculated abnormality scores for each voxel in 38 emergency head CT volumes (22 abnormal cases and 16 normal cases) and evaluated the likelihood of lesion existence. Results: Our method achieved a sensitivity of 68% and a specificity of 88%, with an area under the receiver operating characteristic curve of 0.87. This shows that the method has moderate accuracy in distinguishing normal CT cases from abnormal ones. Conclusion: Our method has potential for anomaly detection in emergency head CT.
Artery and vein segmentation of the cerebral vasculature in 4D CT using a 3D fully convolutional neural network
Midas Meijs, Rashindra Manniesing
Segmentation of the arteries and veins of the cerebral vasculature is important for improved visualization and for the detection of vascular-related pathologies, including arteriovenous malformations. We propose a 3D fully convolutional neural network (CNN) using a time-to-signal image as input, with the distance to the center of gravity of the brain integrated as a spatial feature in the final layers of the CNN. The method was trained and validated on 6 and tested on 4 4D CT patient imaging datasets. The reference standard was acquired by manual annotation by an experienced observer. Quantitative evaluation showed a mean Dice similarity coefficient of 0.94 ± 0.03 and 0.97 ± 0.01, and a mean absolute volume difference of 4.36 ± 5.47 % and 1.79 ± 2.26 % for artery and vein respectively, with an overall accuracy of 0.96 ± 0.02. The average calculation time per volume on the test set was approximately one minute. Our method shows promising results and enables fast and accurate segmentation of arteries and veins in full 4D CT imaging data.
Detection of brain tumor margins using optical coherence tomography
Ronald M. Juarez-Chambi, Carmen Kut, Jesus Rico-Jimenez, et al.
In brain cancer surgery, it is critical to achieve extensive resection without compromising adjacent healthy, non-cancerous regions. Various technological advances have made major contributions to imaging, including intraoperative magnetic resonance imaging (MRI) and computed tomography (CT). However, these technologies have pros and cons in providing quantitative, real-time and three-dimensional (3D) continuous guidance in brain cancer detection. Optical Coherence Tomography (OCT) is a non-invasive, label-free, cost-effective technique capable of imaging tissue in three dimensions and in real time. The purpose of this study is to reliably and efficiently discriminate between non-cancer and cancer-infiltrated brain regions using OCT images. To this end, we use a mathematical model for quantitative evaluation known as the Blind End-Member and Abundances Extraction (BEAE) method. BEAE is a constrained optimization technique which extracts spatial information from volumetric OCT images. Using this method together with logistic regression as a classifier, we are able to discriminate between cancerous and non-cancerous tissues for automatic brain tumor margin detection. Using this technique, we are able to achieve excellent performance on an extensive cross-validation of the training dataset (sensitivity 92.91% and specificity 98.15%) and again on an independent, blinded validation dataset (sensitivity 92.91% and specificity 86.36%). In summary, BEAE is well suited to differentiating brain tissue, which could support surgical guidance for tissue resection.
Classification of brain MRI with big data and deep 3D convolutional neural networks
Viktor Wegmayr, Sai Aitharaju, Joachim Buhmann
Our ever-aging society faces the growing problem of neurodegenerative diseases, in particular dementia. Magnetic Resonance Imaging provides a unique tool for non-invasive investigation of these brain diseases. However, it is extremely difficult for neurologists to identify complex disease patterns from large amounts of three-dimensional images. In contrast, machine learning excels at automatic pattern recognition from large amounts of data. In particular, deep learning has achieved impressive results in image classification. Unfortunately, its application to medical image classification remains difficult. We consider two reasons for this difficulty: First, volumetric medical image data is considerably scarcer than natural images. Second, the complexity of 3D medical images is much higher compared to common 2D images. To address the problem of small data set size, we assemble the largest dataset ever used for training a deep 3D convolutional neural network to classify brain images as healthy (HC), mild cognitive impairment (MCI) or Alzheimer's disease (AD). We use more than 20,000 images from subjects of these three classes, which is almost 9x the size of the previously largest data set. The problem of high dimensionality is addressed by using a deep 3D convolutional neural network, which is state-of-the-art in large-scale image classification. We exploit its ability to process the images directly, only with standard preprocessing, but without the need for elaborate feature engineering. Compared to other work, our workflow is considerably simpler, which increases clinical applicability. Accuracy is measured on the ADNI+AIBL data sets and the independent CADDementia benchmark.
Saliency U-Net: A regional saliency map-driven hybrid deep learning network for anomaly segmentation
Alex Karargyros, Tanveer Syeda-Mahmood
Deep learning networks are gaining popularity in many medical image analysis tasks due to their generalized ability to automatically extract relevant features from raw images. However, this can make the learning problem unnecessarily hard, requiring network architectures of high complexity. In the case of anomaly detection, in particular, there is often sufficient regional difference between the anomaly and the surrounding parenchyma that could be easily highlighted through bottom-up saliency operators. In this paper we propose a new hybrid deep learning network that uses a combination of the raw image and such regional maps to learn anomalies more accurately with simpler network architectures. Specifically, we modify a deep learning network called U-Net, using both the raw and pre-segmented images as input to produce joint encoding (contraction) and decoding (expansion) paths in the U-Net. We present results of successfully delineating subdural and epidural hematomas in brain CT imaging and liver hemangioma in abdominal CT images using such a network.
Radiation-free quantification of head malformations in craniosynostosis patients from 3D photography
Liyun Tu, Antonio R. Porras, Albert Oh, et al.
The evaluation of cranial malformations plays an essential role both in the early diagnosis and in the decision to perform surgical treatment for craniosynostosis. In clinical practice, both cranial shape and suture fusion are evaluated using CT images, which involve the use of harmful radiation on children. Three-dimensional (3D) photography offers noninvasive, radiation-free, and anesthetic-free evaluation of craniofacial morphology. The aim of this study is to develop an automated framework to objectively quantify cranial malformations in patients with craniosynostosis from 3D photography. We propose a new method that automatically extracts the cranial shape by identifying a set of landmarks from a 3D photograph. Specifically, it registers the 3D photograph of a patient to a reference template in which the position of the landmarks is known. Then, the method finds the closest cranial shape to that of the patient from a normative statistical shape multi-atlas built from 3D photographs of healthy cases, and uses it to quantify objectively cranial malformations. We calculated the cranial malformations on 17 craniosynostosis patients and we compared them with the malformations of the normative population used to build the multi-atlas. The average malformations of the craniosynostosis cases were 2.68 ± 0.75 mm, which is significantly higher (p<0.001) than the average malformations of 1.70 ± 0.41 mm obtained from the normative cases. Our approach can support the quantitative assessment of surgical procedures for cranial vault reconstruction without exposing pediatric patients to harmful radiation.
Other Organs
Bladder cancer treatment response assessment in CT urography using two-channel deep-learning network
We are developing a CAD system for bladder cancer treatment response assessment in CT. We trained a 2-channel deep-learning convolutional neural network (2Ch-DCNN) to identify responders (T0 disease) and non-responders to chemotherapy. The 87 lesions from 82 cases generated 18,600 training paired ROIs that were extracted from segmented bladder lesions in the pre- and post-treatment CT scans and partitioned for 2-fold cross validation. The paired ROIs were input to the two parallel channels of the 2Ch-DCNN. We compared the 2Ch-DCNN with our hybrid pre-post-treatment ROI DCNN method and with the assessments by 2 experienced abdominal radiologists. The radiologists estimated the likelihood of stage T0 after viewing each pre-post-treatment CT pair. Receiver operating characteristic analysis was performed, and the area under the curve (AUC) and the partial AUC at sensitivity <90% (AUC0.9) were compared. The test AUCs were 0.76±0.07 and 0.75±0.07 for the 2 partitions, respectively, for the 2Ch-DCNN, and were 0.75±0.08 and 0.75±0.07 for the hybrid ROI method. The AUCs for Radiologist 1 were 0.67±0.09 and 0.75±0.07 for the 2 partitions, respectively, and were 0.79±0.07 and 0.70±0.09 for Radiologist 2. For the 2Ch-DCNN, the AUC0.9s were 0.43 and 0.39 for the 2 partitions, respectively, and were 0.19 and 0.28 for the hybrid ROI method. For Radiologist 1, the AUC0.9s were 0.14 and 0.34 for partitions 1 and 2, respectively, and were 0.33 and 0.23 for Radiologist 2. Our study demonstrated the feasibility of using a 2Ch-DCNN for the estimation of bladder cancer treatment response in CT.
Automated detection and segmentation of follicles in 3D ultrasound for assisted reproduction
Follicle quantification refers to the computation of the number and size of follicles in 3D ultrasound volumes of the ovary. This is one of the key factors in determining hormonal dosage during female infertility treatments. In this paper, we propose an automated algorithm to detect and segment follicles in 3D ultrasound volumes of the ovary for quantification. In a first-of-its-kind attempt, we employ noise-robust phase symmetry feature maps as a likelihood function to perform mean-shift based follicle center detection. A max-flow algorithm is used for segmentation, and a gray-weighted distance transform is employed for post-processing the results. We have obtained state-of-the-art results with a true positive detection rate of >90% on 26 3D volumes with 323 follicles.
Comparison of machine learned approaches for thyroid nodule characterization from shear wave elastography images
Carina Pereira, Manjiri Dighe M.D., Adam M. Alessio
Various Computer Aided Diagnosis (CAD) systems have been developed that characterize thyroid nodules using the features extracted from the B-mode ultrasound images and Shear Wave Elastography images (SWE). These features, however, are not perfect predictors of malignancy. In other domains, deep learning techniques such as Convolutional Neural Networks (CNNs) have outperformed conventional feature extraction based machine learning approaches. In general, fully trained CNNs require substantial volumes of data, motivating several efforts to use transfer learning with pre-trained CNNs. In this context, we sought to compare the performance of conventional feature extraction, fully trained CNNs, and transfer learning based, pre-trained CNNs for the detection of thyroid malignancy from ultrasound images. We compared these approaches applied to a data set of 964 B-mode and SWE images from 165 patients. The data were divided into 80% training/validation and 20% testing data. The highest accuracies achieved on the testing data for the conventional feature extraction, fully trained CNN, and pre-trained CNN were 0.80, 0.75, and 0.83 respectively. In this application, classification using a pre-trained network yielded the best performance, potentially due to the relatively limited sample size and sub-optimal architecture for the fully trained CNN.
Bladder cancer treatment response assessment with radiomic, clinical, and radiologist semantic features
Marshall N. Gordon, Kenny H. Cha, Lubomir M. Hadjiiski, et al.
We are developing a decision support system for assisting clinicians in the assessment of response to neoadjuvant chemotherapy for bladder cancer. Accurate treatment response assessment is crucial for identifying responders and improving quality of life for non-responders. An objective machine learning decision support system may help reduce variability and inaccuracy in treatment response assessment. We developed a predictive model to assess the likelihood that a patient will respond based on image and clinical features. With IRB approval, we retrospectively collected a data set of pre- and post-treatment CT scans along with clinical information from surgical pathology from 98 patients. A linear discriminant analysis (LDA) classifier was used to predict the likelihood that a patient would respond to treatment based on radiomic features extracted from CT urography (CTU), a radiologist’s semantic feature, and a clinical feature extracted from surgical and pathology reports. The classification accuracy was evaluated using the area under the ROC curve (AUC) with leave-one-case-out cross validation. The classification accuracy was compared for systems based on radiomic features, the clinical feature, and the radiologist’s semantic feature. For the system based on only radiomic features, the AUC was 0.75. With the addition of clinical information from examination under anesthesia (EUA), the AUC improved to 0.78. Our study demonstrated the potential of designing a decision support system to assist in treatment response assessment. The combination of clinical features, radiologist semantic features and CTU radiomic features improved the performance of the classifier and the accuracy of treatment response assessment.
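A minimal sketch of an LDA classifier with leave-one-case-out cross-validation in scikit-learn, using placeholder features and labels rather than the study data:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

X = np.random.default_rng(0).normal(size=(98, 10))   # placeholder radiomic/semantic/clinical features
y = np.array([0, 1] * 49)                            # placeholder responder (1) / non-responder (0) labels

lda = LinearDiscriminantAnalysis()
scores = cross_val_predict(lda, X, y, cv=LeaveOneOut(), method="decision_function")
print("leave-one-case-out AUC:", roc_auc_score(y, scores))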
Automatic detection of kidney in 3D pediatric ultrasound images using deep neural networks
Pooneh R. Tabrizi, Awais Mansoor, Elijah Biggs, et al.
Ultrasound (US) imaging is the routine and safe diagnostic modality for detecting pediatric urology problems, such as hydronephrosis in the kidney. Hydronephrosis is the swelling of one or both kidneys because of the build-up of urine. Early detection of hydronephrosis can lead to a substantial improvement in kidney health outcomes. Generally, US imaging is a challenging modality for the evaluation of pediatric kidneys with different shape, size, and texture characteristics. The aim of this study is to present an automatic detection method to help kidney analysis in pediatric 3DUS images. The method localizes the kidney based on its minimum-volume oriented bounding box using deep neural networks. Separate deep neural networks are trained to estimate the kidney position, orientation, and scale, making the method computationally efficient by avoiding full parameter training. The performance of the method was evaluated using a dataset of 45 kidneys (18 normal and 27 diseased kidneys diagnosed with hydronephrosis) through the leave-one-out cross validation method. Quantitative results show the proposed detection method could extract the kidney position, orientation, and scale ratio with root mean square values of 1.3 ± 0.9 mm, 6.34 ± 4.32 degrees, and 1.73 ± 0.04, respectively. This method could be helpful in automating kidney segmentation for routine clinical evaluation.
Breast II
Generalization error analysis: deep convolutional neural network in mammography
We conducted a study to gain understanding of the generalizability of deep convolutional neural networks (DCNNs), given their inherent capability to memorize data. We empirically examined a specific DCNN trained for classification of masses on mammograms. Using a data set of 2,454 lesions from 2,242 mammographic views, a DCNN was trained to classify masses into malignant and benign classes using transfer learning from ImageNet LSVRC-2010. We performed experiments with varying amounts of label corruption and types of pixel randomization to analyze the generalization error of the DCNN. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) with N-fold cross validation. Comparisons were made between the convergence times, the inference AUCs for both the training set and the test set of the original image patches without corruption, and the root-mean-squared difference (RMSD) in the layer weights of the DCNN trained with different amounts and methods of corruption. Our experiments revealed trends indicating that the DCNN overfitted by memorizing corrupted data. More importantly, this study improved our understanding of DCNN weight updates when learning new patterns or new labels. Although we used a specific classification task with ImageNet as an example, similar methods may be useful for analysis of DCNN learning processes, especially those that employ transfer learning for medical image analysis, where sample size is limited and overfitting risk is high.
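The label-corruption protocol described above can be sketched as follows; the corruption fractions and the placeholder training call are assumptions for illustration only.

    # Hedged sketch of the label-corruption experiment: flip a chosen fraction of
    # training labels at random, retrain, and compare train/test AUC to probe memorization.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def corrupt_labels(y, fraction, rng=np.random.default_rng(0)):
        y = y.copy()
        idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
        y[idx] = 1 - y[idx]              # flip malignant <-> benign for the chosen samples
        return y

    # for frac in (0.0, 0.1, 0.5, 1.0):                          # illustrative fractions
    #     y_corrupt = corrupt_labels(y_train, frac)
    #     model = train_dcnn(x_train, y_corrupt)                 # hypothetical training call
    #     auc_train = roc_auc_score(y_corrupt, model.predict(x_train))
    #     auc_test  = roc_auc_score(y_test,  model.predict(x_test))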
Compression of deep convolutional neural network for computer-aided diagnosis of masses in digital breast tomosynthesis
Deep-learning models are highly parameterized, causing difficulty in inference and transfer learning. We propose a layered pathway evolution method to compress a deep convolutional neural network (DCNN) for classification of masses in DBT while maintaining the classification accuracy. Two-stage transfer learning was used to adapt the ImageNet-trained DCNN to mammography and then to DBT. In the first stage, transfer learning from the ImageNet-trained DCNN was performed using mammography data. In the second stage, the mammography-trained DCNN was trained on the DBT data using feature extraction from the fully connected layer, recursive feature elimination, and random forest classification. The layered pathway evolution encapsulates the stages from feature extraction to classification to compress the DCNN. A genetic algorithm was used in an iterative approach with tournament selection driven by count-preserving crossover and mutation to identify the necessary nodes in each convolution layer while eliminating the redundant nodes. The DCNN was reduced by 99% in the number of parameters and 95% in mathematical operations in the convolutional layers. The lesion-based area under the receiver operating characteristic curve on an independent DBT test set was 0.88±0.05 for the original network and 0.90±0.04 for the compressed network; the difference did not reach statistical significance. We demonstrated a DCNN compression approach without additional fine-tuning or loss of performance for classification of masses in DBT. The approach can be extended to other DCNNs and transfer learning tasks. An ensemble of these smaller and focused DCNNs has the potential to be used in multi-target transfer learning.
ICADx: interpretable computer aided diagnosis of breast masses
In this study, a novel computer-aided diagnosis (CADx) framework is devised to investigate interpretability in classifying breast masses. Recently, deep learning technology has been successfully applied to medical image analysis, including CADx. Existing deep learning based CADx approaches, however, have a limitation in explaining the diagnostic decision. In real clinical practice, clinical decisions should be made with reasonable explanation, so current deep learning approaches in CADx are limited for real world deployment. In this paper, we investigate interpretability in CADx with the proposed interpretable CADx (ICADx) framework. The proposed framework is devised with a generative adversarial network, which consists of an interpretable diagnosis network and a synthetic lesion generative network, to learn the relationship between malignancy and a standardized description (BI-RADS). The lesion generative network and the interpretable diagnosis network compete in adversarial learning so that the two networks are improved. The effectiveness of the proposed method was validated on a public mammogram database. Experimental results showed that the proposed ICADx framework could provide interpretations of a mass as well as mass classification. This was mainly attributed to the fact that the proposed method was effectively trained to find the relationship between malignancy and interpretations via the adversarial learning. These results imply that the proposed ICADx framework could be a promising approach to developing CADx systems.
Do pre-trained deep learning models improve computer-aided classification of digital mammograms?
Sarah S. Aboutalib, Aly A. Mohamed, Margarita L. Zuley, et al.
Digital mammography screening is an important exam for the early detection of breast cancer and reduction in mortality. False positives leading to high recall rates, however, result in unnecessary negative consequences for patients and health care systems. To better aid radiologists, computer-aided tools can be utilized to improve the distinction between image classes and thus potentially reduce false recalls. The emergence of deep learning has shown promising results in the area of biomedical imaging data analysis. This study aimed to investigate deep learning and transfer learning methods that can improve digital mammography classification performance. In particular, we evaluated the effect of pre-training deep learning models with other imaging datasets in order to boost classification performance on a digital mammography dataset. Two types of datasets were used for pre-training: (1) a digitized film mammography dataset, and (2) a very large non-medical imaging dataset. By using either of these datasets to pre-train the network initially, and then fine-tuning with the digital mammography dataset, we found an increase in overall classification performance in comparison to a model without pre-training, with the very large non-medical dataset yielding the largest improvement in classification accuracy.
Fully automated gynecomastia quantification from low-dose chest CT
Shuang Liu, Emily B. Sonnenblick, Lea Azour, et al.
Gynecomastia is characterized by the enlargement of male breasts, a common and sometimes distressing condition found in over half of adult men over the age of 44. Although the majority of gynecomastia is physiologic or idiopathic, its occurrence may also be associated with an extensive variety of underlying systemic diseases or drug toxicity. With the recent large-scale implementation of annual lung cancer screening using low-dose chest CT (LDCT), gynecomastia is believed to be a frequent incidental finding on LDCT. A fully automated system for gynecomastia quantification from LDCT is presented in this paper. The whole breast region is first segmented using an anatomy-orientated approach based on the propagation of pectoral muscle fronts in the vertical direction. The subareolar region is then localized, and the fibroglandular tissue within it is measured for the assessment of gynecomastia. The presented system was validated using 454 breast regions from non-contrast LDCT scans of 227 adult men. The ground truth was established by an experienced radiologist who classified each breast into one of five categorical scores. The automated measurements achieved promising performance for gynecomastia diagnosis, with an AUC of 0.86 for the ROC curve and a statistically significant Spearman correlation of r=0.70 (p < 0.001) with the reference categorical grades. These encouraging results demonstrate the feasibility of fully automated gynecomastia quantification from LDCT, which may aid the early detection as well as the treatment of both gynecomastia and any underlying medical problems that cause it.
Breast mass detection in mammography and tomosynthesis via fully convolutional network-based heatmap regression
Jun Zhang, Elizabeth Hope Cain, Ashirbani Saha, et al.
Breast mass detection in mammography and digital breast tomosynthesis (DBT) is an essential step in computerized breast cancer analysis. Deep learning-based methods incorporate feature extraction and model learning into a unified framework and have achieved impressive performance in various medical applications (e.g., disease diagnosis, tumor detection, and landmark detection). However, these methods require large-scale, accurately annotated data, and it is challenging to obtain precise annotations of breast masses. To address this issue, we propose a fully convolutional network (FCN) based heatmap regression method for breast mass detection, using only weakly annotated mass regions in mammography images. Specifically, we first generate heatmaps of masses based on human-annotated rough mass regions. We then develop an FCN model for end-to-end heatmap regression with an F-score loss function, where the mammography images are regarded as the input and heatmaps for breast masses are used as the output. Finally, the probability map of mass locations can be estimated with the trained model. Experimental results on a mammography dataset with 439 subjects demonstrate the effectiveness of our method. Furthermore, we evaluate whether we can use mammography data to improve detection models for DBT, since mammography shares similar structure with tomosynthesis. We propose a transfer learning strategy that fine-tunes the FCN model learned from mammography images. We test this approach on a small tomosynthesis dataset with only 40 subjects and show an improvement in detection performance compared to training the model from scratch.
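One plausible way to turn a weakly annotated rough mass region into a regression target, as the heatmap-generation step above suggests, is a Gaussian blob centered on the annotation; the Gaussian form and its width are assumptions, since the paper only states that heatmaps are generated from rough regions.

    # Hedged sketch: building a smooth target heatmap from a rough mass annotation.
    import numpy as np

    def mass_heatmap(shape, center_yx, radius):
        """Gaussian blob centered on the annotated mass; sigma tied to the rough radius."""
        yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
        d2 = (yy - center_yx[0]) ** 2 + (xx - center_yx[1]) ** 2
        sigma = radius / 2.0
        return np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)

    # heatmap = mass_heatmap(image.shape, (row, col), rough_radius)   # target for the FCN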
Poster Session: Abdominal and Gastrointestinal
Detecting PHG frames in wireless capsule endoscopy video by integrating rough global dominate-color with fine local texture features
Xiaoqi Liu, Chengliang Wang Sr., Jianying Bai Sr., et al.
Portal hypertensive gastropathy (PHG) is common in gastrointestinal (GI) diseases, and a severe stage of PHG (S-PHG) is a source of active gastrointestinal bleeding. Generally, the diagnosis of PHG is made visually during endoscopic examination; compared with traditional endoscopy, wireless capsule endoscopy (WCE), which is noninvasive and painless, has become a prevalent tool for visual observation of PHG. However, accurate assessment of WCE images with PHG is a difficult task for physicians due to faint contrast and confusing variations in background gastric mucosal tissue. Therefore, this paper proposes a comprehensive methodology to automatically detect S-PHG images in WCE video to help physicians accurately diagnose S-PHG. Firstly, a rough dominate-color-tone extraction approach is proposed to better describe the global color distribution of the gastric mucosa. Secondly, a hybrid two-layer texture acquisition model is designed by integrating a co-occurrence matrix into local binary patterns to depict the complex and unique local variations of the gastric mucosal microstructure. Finally, features of mucosal color and microstructure texture are merged and fed into a linear support vector machine to accomplish this automatic classification task. Experiments were performed on an annotated data set including 1,050 S-PHG and 1,370 normal images collected from 36 real patients of different nationalities, ages, and genders. In comparison with three traditional texture extraction methods, our method performs best in the detection of S-PHG images in WCE video: the maximum accuracy, sensitivity, and specificity reach 0.90, 0.92, and 0.92, respectively.
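A minimal sketch of the texture side of such a pipeline, combining local binary patterns with gray-level co-occurrence statistics and a linear SVM, is given below; the specific LBP/GLCM parameters are illustrative and not the authors' settings.

    # Hedged sketch: LBP + GLCM texture descriptors fed to a linear SVM.
    import numpy as np
    from skimage.feature import local_binary_pattern, graycomatrix, graycoprops
    from sklearn.svm import LinearSVC

    def texture_features(gray_u8):
        lbp = local_binary_pattern(gray_u8, P=8, R=1, method="uniform")
        lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        glcm = graycomatrix(gray_u8, distances=[1], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        glcm_stats = [graycoprops(glcm, p).mean() for p in
                      ("contrast", "homogeneity", "energy", "correlation")]
        return np.concatenate([lbp_hist, glcm_stats])

    # Hypothetical usage on labeled WCE frames:
    # X = np.stack([texture_features(img) for img in wce_frames]); clf = LinearSVC().fit(X, y)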
Automatic blood vessel based-liver segmentation using the portal phase abdominal CT
Liver segmentation is the basis for computer-based planning of hepatic surgical interventions, and automatic segmentation of the liver is of high importance in the diagnosis and analysis of hepatic diseases and in surgery planning. Blood vessel (BV) information has been shown to be effective for liver segmentation. In our previous work, we developed a semi-automatic method that segments the liver from portal phase abdominal CT images in two stages. The first stage was interactive segmentation of abdominal blood vessels (ABVs) and their subsequent classification into hepatic (HBVs) and non-hepatic (non-HBVs). This stage had five interactions: a selective threshold for bone segmentation, selection of two seed points for kidney segmentation, selection of the inferior vena cava (IVC) entrance for starting ABV segmentation, identification of the portal vein (PV) entrance to the liver, and identification of the IVC exit for classifying HBVs from other ABVs (non-HBVs). The second stage is automatic segmentation of the liver based on the segmented ABVs as described in [4]. Towards full automation of our method, we developed a method [5] that segments ABVs automatically, addressing the first three interactions. In this paper, we propose full automation of classifying ABVs into HBVs and non-HBVs and, consequently, full automation of the liver segmentation we proposed in [4]. Results illustrate that the method is effective at segmenting the liver from portal phase abdominal CT images.
Deep convolutional neural network for the classification of hepatocellular carcinoma and intrahepatic cholangiocarcinoma
Abhishek Midya, Jayasree Chakraborty, Linda M. Pak M.D., et al.
Liver cancer is the second leading cause of cancer-related death worldwide [1]. Hepatocellular carcinoma (HCC) is the most common primary liver cancer, accounting for approximately 80% of cases. Intrahepatic cholangiocarcinoma (ICC) is a rare liver cancer arising in patients with the same risk factors as HCC, but treatment options and prognosis differ. The diagnosis of HCC is based primarily on imaging, but distinguishing between HCC and ICC is challenging due to common radiographic features [2-4]. The aim of the present study is to classify HCC and ICC in portal venous phase CT. 107 patients with resected ICC and 116 patients with resected HCC were included in our analysis. We developed a deep neural network by modifying a pre-trained Inception network, retraining its final layers. The proposed method achieved a best accuracy and area under the receiver operating characteristic curve of 69.70% and 0.72, respectively, on the test data.
Computer-aided detection of bladder wall thickening in CT urography (CTU)
We are developing a computer-aided detection system for bladder cancer in CT urography (CTU). Bladder wall thickening is a manifestation of bladder cancer, and its detection is more challenging than the detection of bladder masses. We first segmented the inner and outer bladder walls using our method that combines a deep-learning convolutional neural network with level sets. The non-contrast-enhanced region was separated from the contrast-enhanced region with a maximum-intensity-projection-based method. The non-contrast region was smoothed, and gray-level thresholding was applied to the contrast and non-contrast regions separately to extract the bladder wall and potential lesions. The bladder wall was transformed into a straightened thickness profile, which was analyzed to identify wall thickening candidates. Volume-based features of the wall thickening candidates were analyzed with linear discriminant analysis (LDA) to differentiate bladder wall thickenings from false positives. A data set of 112 patients, 87 with wall thickening and 25 with normal bladders, was collected retrospectively with IRB approval and split into independent training and test sets. Of the 57 training cases, 44 had bladder wall thickening and 13 were normal. Of the 55 test cases, 43 had wall thickening and 12 were normal. The LDA classifier was trained with the training set and evaluated with the test set. FROC analysis showed that the system achieved sensitivities of 93.2% and 88.4% for the training and test sets, respectively, at 0.5 FPs/case.
Histogram-based adaptive gray level scaling for texture feature classification of colorectal polyps
Marc Pomeroy, Hongbing Lu, Perry J. Pickhardt, et al.
Texture features have played an ever-increasing role in computer-aided detection (CADe) and diagnosis (CADx) methods since their inception. Texture features are often used for false positive reduction in CADe packages, especially for detecting colorectal polyps and distinguishing them from falsely tagged residual stool and healthy colon wall folds. While texture features have shown great success there, their performance for CADx has lagged behind, primarily because the features are more similar among different polyp types. In this paper, we present an adaptive gray level scaling and compare it to the conventional equal spacing of gray level bins. We use a dataset taken from computed tomography colonography patients, with 392 polyp regions of interest (ROIs) identified and diagnoses confirmed through pathology. Using the histogram information from the entire ROI dataset, we generate the gray level bins such that each bin contains roughly the same number of voxels. Each image ROI is then scaled down to two different numbers of gray levels, using both an equal spacing of Hounsfield units for each bin and our adaptive method. We compute a set of texture features from the scaled images, including 30 gray level co-occurrence matrix (GLCM) features and 11 gray level run length matrix (GLRLM) features. Using a random forest classifier to distinguish between hyperplastic polyps and all others (adenomas and adenocarcinomas), we find that the adaptive gray level scaling can improve performance based on the area under the receiver operating characteristic curve by up to 4.6%.
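The two binning schemes compared above can be sketched as follows; the number of gray levels is an illustrative assumption.

    # Hedged sketch: equal-width Hounsfield-unit bins versus adaptive bins that hold
    # roughly equal voxel counts, computed from the histogram of the whole ROI dataset.
    import numpy as np

    def equal_width_bins(voxels, n_levels=32):
        edges = np.linspace(voxels.min(), voxels.max(), n_levels + 1)
        return np.digitize(voxels, edges[1:-1])               # labels 0..n_levels-1

    def adaptive_bins(all_dataset_voxels, voxels, n_levels=32):
        # quantile edges computed from the entire ROI dataset, applied to one ROI
        edges = np.quantile(all_dataset_voxels, np.linspace(0, 1, n_levels + 1))
        return np.digitize(voxels, edges[1:-1])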
Bladder cancer staging in CT urography: effect of stage labels on statistical modeling of a decision support system
In bladder cancer, stage T2 is an important threshold in the decision to administer neoadjuvant chemotherapy. Our long-term goal is to develop a quantitative computerized decision support system (CDSS-S) to aid clinicians in accurate staging. In this study, we examined the effect of the stage labels of the training samples on modeling such a system. We used a data set of 84 bladder cancers imaged with CT urography (CTU). At clinical staging prior to treatment, 43 lesions were staged as below stage T2 and 41 were stage T2 or above. After cystectomy and pathological staging, which is considered the gold standard, 10 of the lesions were upstaged to stage T2 or above. After correcting the stage labels, 33 lesions were below stage T2 and 51 were stage T2 or above. For the CDSS-S, the lesions were segmented using our AI-CALS method and radiomic features were extracted. We trained a linear discriminant analysis (LDA) classifier with leave-one-case-out cross validation to distinguish between bladder lesions of stage T2 or above and those below stage T2. The CDSS-S was trained and tested with the corrected post-cystectomy labels; as a comparison, the CDSS-S was also trained with the understaged pre-treatment labels and tested on lesions with corrected labels. The test AUC for the CDSS-S trained with corrected labels was 0.89 ± 0.04. For the CDSS-S trained with understaged pre-treatment labels and tested on the lesions with corrected labels, the test AUC was 0.86 ± 0.04. The likelihood of stage T2 or above for 9 of the 10 understaged lesions was correctly increased by the CDSS-S trained with corrected labels. The CDSS-S is sensitive to the accuracy of stage labeling. The CDSS-S trained with correct labels shows promise in prediction of bladder cancer stage.
Performance evaluation of 2D and 3D deep learning approaches for automatic segmentation of multiple organs on CT images
Xiangrong Zhou, Kazuma Yamada, Takuya Kojima, et al.
The purpose of this study is to evaluate and compare the performance of modern deep learning techniques for automatically recognizing and segmenting multiple organ regions in 3D CT images. CT image segmentation is one of the important tasks in medical image analysis and is still very challenging. Deep learning approaches have demonstrated capability for scene recognition and semantic segmentation on natural images and have been used to address segmentation problems in medical images. Although several works have shown promising results for CT image segmentation using deep learning approaches, there has been no comprehensive evaluation of deep learning segmentation performance across multiple organs and different portions of CT scans. In this paper, we evaluated and compared the segmentation performance of two deep learning approaches that used 2D and 3D deep convolutional neural networks (CNNs), with and without a pre-processing step. A conventional approach representing the state of the art in CT image segmentation without deep learning was also used for comparison. A dataset of 240 CT images covering different portions of human bodies was used for performance evaluation. Up to 17 types of organ regions in each CT scan were segmented automatically and compared to human annotations using the intersection over union (IU) as the criterion. The experimental results demonstrated mean IUs of 79% and 67%, averaged over the 17 organ types, for the 3D and 2D deep CNNs, respectively. All the deep learning results showed better accuracy and robustness than the conventional segmentation method based on probabilistic atlas and graph-cut methods. The effectiveness and usefulness of deep learning approaches were demonstrated for solving the multiple organ segmentation problem on 3D CT images.
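The evaluation criterion, intersection over union computed per organ label, can be sketched as below.

    # Hedged sketch: per-organ intersection-over-union (IU) between an automatic
    # segmentation and the human annotation; the reported IU is the mean over organs.
    import numpy as np

    def per_organ_iou(pred, truth, organ_labels):
        ious = {}
        for label in organ_labels:
            p, t = (pred == label), (truth == label)
            union = np.logical_or(p, t).sum()
            if union > 0:
                ious[label] = np.logical_and(p, t).sum() / union
        return ious          # averaging over the (up to) 17 organ labels gives the summary IU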
Urinary bladder cancer T-staging from T2-weighted MR images using an optimal biomarker approach
Magnetic resonance imaging (MRI) is often used in clinical practice to stage patients with bladder cancer to help plan treatment. However, qualitative assessment of MR images is prone to inaccuracies, adversely affecting patient outcomes. In this paper, T2-weighted MR image-based quantitative features were extracted from the bladder wall in 65 patients with bladder cancer to classify them into two primary tumor (T) stage groups: group 1, T stage < T2, with the primary tumor locally confined to the bladder, and group 2, T stage ≥ T2, with the primary tumor locally extending beyond the bladder. The bladder was divided into 8 sectors in the axial plane, where each sector has a corresponding reference standard T stage based on expert radiology qualitative MR image review and histopathologic results. The performance of the classification for correct assignment of T stage grouping was then evaluated at both the patient level and the sector level. Each bladder sector was divided into 3 shells (inner, middle, and outer), and 15,834 features, including intensity features and texture features from local binary patterns and the gray-level co-occurrence matrix, were extracted from the 3 shells of each sector. An optimal feature set was selected from all features using an optimal biomarker approach. Nine optimal biomarker features were derived based on texture properties from the middle shell, with areas under the ROC curve (AUC) of 0.813 and 0.806 at the sector and patient level, respectively.
Poster Session: Brain and Head
Automated volumetry of temporal horn of lateral ventricle for detection of Alzheimer's disease in CT scan
The rapid increase in the incidence of Alzheimer's disease (AD) has become a critical issue in low and middle income countries. MR imaging is generally suitable for the diagnosis of AD in clinical settings, while CT is uncommonly used for this purpose due to its low contrast between brain tissues. However, in those countries, CT, which is less costly and more readily available, would be a desirable tool for the diagnosis of AD. On CT, enlargement of the temporal horn of the lateral ventricle (THLV) is one of the few findings useful for the diagnosis of AD. In this paper, we present an automated volumetry of the THLV with segmentation based on Bayes' rule on CT images. In our method, first, all CT data sets are normalized to an atlas by using linear affine transformation and non-linear warping techniques. Next, a probability map of the THLV is constructed in the normalized data. Then, THLV regions are extracted based on Bayes' rule. Finally, the volume of the THLV is evaluated. This scheme was applied to CT scans from 20 AD patients and 20 controls to evaluate the performance of the method for detecting AD. The estimated THLV volume was markedly increased in the AD group compared with the controls (P < .0001), and the area under the receiver operating characteristic curve (AUC) was 0.921. Therefore, this computerized method may have the potential to accurately detect AD on CT images.
Exploring DeepMedic for the purpose of segmenting white matter hyperintensity lesions
Fiona Lippert, Bastian Cheng, Amir Golsari, et al.
DeepMedic, an open source software library based on a multi-channel multi-resolution 3D convolutional neural network, has recently been made publicly available for brain lesion segmentation. It has already been shown that segmentation tasks on MRI data of patients with traumatic brain injuries, brain tumors, and ischemic stroke lesions can be performed very well. In this paper we describe how it can be used efficiently for detecting and segmenting white matter hyperintensity lesions, and we examine whether it can be applied to single-channel routine 2D FLAIR data. For evaluation, we annotated 197 datasets with different numbers and sizes of white matter hyperintensity lesions. Our experiments have shown that substantial segmentation quality can be achieved. Compared to the original parametrization of the DeepMedic neural network, training times can be drastically reduced by adjusting the corresponding training parameters, while the Dice coefficients remain nearly unchanged. This enables a whole training process to be performed within a single day on an NVIDIA GeForce GTX 580 graphics board, which makes this library also very interesting for research on low-end GPU hardware.
Evaluation of a deep learning architecture for MR imaging prediction of ATRX in glioma patients
Panagiotis Korfiatis, Timothy L. Kline, Bradley J. Erickson
Predicting mutation/loss of the alpha-thalassemia/mental retardation syndrome X-linked (ATRX) gene from MR imaging is of high importance since it is a predictor of response and prognosis in brain tumors. In this study, we compare a deep neural network approach based on a residual deep neural network (ResNet) architecture with a classical machine learning approach, and evaluate their ability to predict ATRX mutation status without the need for a distinct tumor segmentation step. We found that the ResNet50 (50 layers) architecture, pre-trained on ImageNet data, was the best performing model, achieving an f1 score of 0.91 on the test set of 35 cases (classification of a slice as no tumor, ATRX mutated, or ATRX non-mutated). The SVM classifier achieved 0.63 for classifying the FLAIR signal abnormality regions of the test patients based on their mutation status. We report a method that alleviates the need for extensive preprocessing and acts as a proof of concept that deep neural network architectures can be used to predict molecular biomarkers from routine medical images.
Measurement of hard tissue density based on image density of intraoral radiograph
Akitoshi Katsumata, Tatsumasa Fukui, Shinji Shimoda, et al.
We developed a computer program, DentalSCOPE, to measure the bone mineral density (BMD) of alveolar bone from intraoral radiographs. Measurement of alveolar bone mineral density may be useful to identify patients at risk of developing medication-related osteonecrosis of the jaw (MRONJ), because osteoporosis medications significantly affect the mineral density of alveolar bone. The BMD of alveolar bone was compared between dual-energy X-ray absorptiometry (DEXA) and the DentalSCOPE program, and a high correlation coefficient was found between the DentalSCOPE and DEXA measurements.
Classifying magnetic resonance image modalities with convolutional neural networks
Samuel Remedios, Dzung L. Pham, John A. Butman, et al.
Magnetic resonance (MR) imaging allows the acquisition of images with different contrast properties depending on the acquisition protocol and the magnetic properties of tissues. Many MR brain image processing techniques, such as tissue segmentation, require multiple MR contrasts as inputs, and each contrast is treated differently. Thus, it is advantageous to automate the identification of image contrasts for various purposes, such as facilitating image processing pipelines and managing and maintaining large databases via content-based image retrieval (CBIR). Most automated CBIR techniques follow a two-step process: extracting features from the data and classifying the image based on these features. We present a novel 3D deep convolutional neural network (CNN) based method for MR image contrast classification. The proposed CNN automatically identifies the MR contrast of an input brain image volume. Specifically, we explored three classification problems: (1) identify T1-weighted (T1-w), T2-weighted (T2-w), and fluid-attenuated inversion recovery (FLAIR) contrasts; (2) identify pre- vs. post-contrast T1; (3) identify pre- vs. post-contrast FLAIR. A total of 3418 image volumes acquired from multiple sites and multiple scanners were used. To evaluate each task, the proposed model was trained on 2137 images and tested on the remaining 1281 images. Results showed that image volumes were correctly classified with 97.57% accuracy.
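A compact 3D CNN of the kind described above might look like the following sketch; the depth, channel counts, and input volume size are assumptions, as the abstract does not specify the architecture.

    # Hedged sketch of a small 3D CNN for contrast classification (e.g., T1-w/T2-w/FLAIR).
    import torch.nn as nn

    class ContrastNet3D(nn.Module):
        def __init__(self, n_classes=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.AdaptiveAvgPool3d(1),
            )
            self.classifier = nn.Linear(16, n_classes)

        def forward(self, x):                      # x: (N, 1, D, H, W) image volumes
            return self.classifier(self.features(x).flatten(1))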
Poster Session: Breast
Pectoral muscle segmentation in breast tomosynthesis with deep learning
Alejandro Rodriguez-Ruiz, Jonas Teuwen, Kaman Chung, et al.
Digital breast tomosynthesis (DBT) has superior detection performance to mammography (DM) for population-based breast cancer screening, but the higher number of images that must be reviewed poses a challenge for its implementation. This may be ameliorated by creating a two-dimensional synthetic mammographic image (SM) from the DBT volume, containing the most relevant information. When creating an SM, it is of utmost importance to have an accurate lesion localization detection algorithm, while segmenting fibroglandular tissue could also be beneficial. These tasks encounter an extra challenge when working with images in the medio-lateral oblique view, due to the presence of the pectoral muscle, which has similar radiographic density. In this work, we present an automatic pectoral muscle segmentation model based on a u-net deep learning architecture, trained with 136 DBT images acquired with a single system (different BI-RADS densities and pathological findings). The model was tested on 36 DBT images from that same system, resulting in a Dice similarity coefficient (DSC) of 0.977 (0.967-0.984). In addition, the model was tested on 125 images from two different systems and three different modalities (DBT, SM, DM), obtaining DSCs between 0.947 and 0.970, a range determined visually to provide adequate segmentations. For reference, a resident radiologist independently annotated a mix of 25 cases, obtaining a DSC of 0.971. The results suggest the possibility of using this model for inter-manufacturer DBT, DM, and SM tasks that benefit from the segmentation of the pectoral muscle, such as SM generation, computer-aided detection systems, or patient dosimetry algorithms.
Computer-aided classification of breast masses using contrast-enhanced digital mammograms
Gopichandh Danala, Faranak Aghaei, Morteza Heidari, et al.
By taking advantage of both mammography and breast MRI, contrast-enhanced digital mammography (CEDM) has emerged as a promising new imaging modality to improve the efficacy of breast cancer screening and diagnosis. The primary objective of this study is to develop and evaluate a new computer-aided detection and diagnosis (CAD) scheme for CEDM images to classify between malignant and benign breast masses. A CEDM dataset consisting of 111 patients (33 benign and 78 malignant) was retrospectively assembled. Each case includes two types of images, namely low-energy (LE) and dual-energy subtracted (DES) images. First, the CAD scheme applied a hybrid segmentation method to automatically segment masses depicted on LE and DES images separately. Optimal segmentation results from DES images were also mapped to LE images and vice versa. Next, a set of 109 quantitative image features related to mass shape and density heterogeneity was initially computed. Last, four multilayer perceptron-based machine learning classifiers, integrated with a correlation-based feature subset evaluator and a leave-one-case-out cross-validation method, were built to classify mass regions depicted on LE and DES images, respectively. Initially, when the CAD scheme was applied to the original segmentation of DES and LE images, the areas under the ROC curves were 0.7585±0.0526 and 0.7534±0.0470, respectively. After optimal segmentation mapping from DES to LE images, the AUC value of the CAD scheme significantly increased to 0.8477±0.0376 (p<0.01). Since DES images eliminate the overlapping effect of dense breast tissue on lesions, segmentation accuracy was significantly improved compared to regular mammograms. The study demonstrated that computer-aided classification of breast masses using CEDM images yielded higher performance.
Applying a new unequally weighted feature fusion method to improve CAD performance of classifying breast lesions
Abolfazl Zargari Khuzani, Gopichandh Danala, Morteza Heidari, et al.
High recall rates are a major challenge in mammography screening. Thus, developing a computer-aided diagnosis (CAD) scheme to classify between malignant and benign breast lesions can play an important role in improving the efficacy of mammography screening. The objective of this study is to develop and test a unique image feature fusion framework to improve performance in classifying suspicious mass-like breast lesions depicted on mammograms. The image dataset consists of 302 suspicious masses detected on both craniocaudal and mediolateral-oblique view images, of which 151 were malignant and 151 were benign. The study consists of the following three image processing and feature analysis steps. First, an adaptive region growing segmentation algorithm was used to automatically segment mass regions. Second, a set of 70 image features related to spatial and frequency characteristics of mass regions was initially computed. Third, a generalized linear regression model (GLM) based machine learning classifier combined with a bat optimization algorithm was used to optimally fuse the selected image features based on a predefined assessment performance index. The area under the ROC curve (AUC) was used as the performance assessment index. Applying the CAD scheme to the testing dataset, the AUC was 0.75±0.04, which was significantly higher than using a single best feature (AUC=0.69±0.05) or the classifier with equally weighted features (AUC=0.73±0.05). This study demonstrated that, compared to the conventional equal-weighted approach, an unequal-weighted feature fusion approach has the potential to significantly improve accuracy in classifying between malignant and benign breast masses.
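The idea of unequal feature weighting can be sketched as follows, with a generic derivative-free optimizer standing in for the bat optimization algorithm used in the study; the AUC-based objective mirrors the predefined performance index, and the logistic-regression classifier is an illustrative stand-in for the GLM.

    # Hedged sketch: learn per-feature weights that maximize AUC of a GLM-style classifier.
    import numpy as np
    from scipy.optimize import minimize
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    def neg_auc(weights, X, y):
        clf = LogisticRegression(max_iter=1000).fit(X * weights, y)
        return -roc_auc_score(y, clf.predict_proba(X * weights)[:, 1])

    def fit_feature_weights(X, y):
        w0 = np.ones(X.shape[1])                          # start from equal weighting
        res = minimize(neg_auc, w0, args=(X, y), method="Nelder-Mead",
                       options={"maxiter": 200})          # generic optimizer, not the bat algorithm
        return res.x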
Recurrent neural networks for breast lesion classification based on DCE-MRIs
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays a significant role in breast cancer screening, cancer staging, and monitoring response to therapy. Recently, deep learning methods have been rapidly incorporated into image-based breast cancer diagnosis and prognosis. However, most current deep learning methods make clinical decisions based on 2-dimensional (2D) or 3D images and are not well suited for temporal image data. In this study, we develop a deep learning methodology that enables integration of the clinically valuable temporal components of DCE-MRIs into deep learning-based lesion classification. Our work is performed on a database of 703 DCE-MRI cases for the task of distinguishing benign and malignant lesions, and uses the area under the ROC curve (AUC) as the performance metric for that task. We train a recurrent neural network, specifically a long short-term memory network (LSTM), on sequences of image features extracted from the dynamic MRI series. These features are extracted with VGGNet, a convolutional neural network pre-trained on a large dataset of natural images (ImageNet). The features are obtained from various levels of the network to capture low-, mid-, and high-level information about the lesion. Compared to a classification method that takes as input only images at a single time point (yielding an AUC of 0.81 (se = 0.04)), our LSTM method improves lesion classification with an AUC of 0.85 (se = 0.03).
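A minimal sketch of the temporal model described above, an LSTM consuming one pre-extracted CNN feature vector per DCE-MRI time point, is shown below; the feature dimensionality and hidden size are illustrative assumptions.

    # Hedged sketch: LSTM over per-timepoint CNN features, producing a malignancy score.
    import torch
    import torch.nn as nn

    class LesionLSTM(nn.Module):
        def __init__(self, feat_dim=512, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):              # x: (N, T, feat_dim), T = number of time points
            _, (h_n, _) = self.lstm(x)     # h_n: (1, N, hidden), last hidden state
            return torch.sigmoid(self.head(h_n[-1]))   # malignancy probability per lesion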
Multi-resolution analysis using integrated microscopic configuration with local patterns for benign-malignant mass classification
Rinku Rabidas, Abhishek Midya, Jayasree Chakraborty, et al.
In this paper, a Curvelet-based local attribute, the Curvelet-local configuration pattern (C-LCP), is introduced for the characterization of mammographic masses as benign or malignant. Among different anomalies such as microcalcification, bilateral asymmetry, architectural distortion, and masses, mass lesions are targeted because their variation in shape, size, and margin makes diagnosis a challenging task. The multi-resolution property of the Curvelet transform, which is efficient for classification, is exploited, and local information is extracted from the coefficients of each subband using the local configuration pattern (LCP). The microscopic measures in concatenation with the local textural information provide more discriminating capability than either alone. The measures embody the magnitude information along with the pixel-wise relationships among neighboring pixels. The performance analysis is conducted with 200 mammograms from the DDSM database containing 100 mass cases each of benign and malignant. The optimal set of features is acquired via a stepwise logistic regression method, and the classification is carried out with Fisher linear discriminant analysis. The best area under the receiver operating characteristic curve and accuracy achieved with the proposed method are 0.95 and 87.55%, which are further compared with some state-of-the-art competing methods.
Association between mammogram density and background parenchymal enhancement of breast MRI
Breast density has been widely considered an important risk factor for breast cancer. The purpose of this study is to examine the association between mammographic density and background parenchymal enhancement (BPE) of breast MRI. A dataset of breast MR images was acquired from 65 high-risk women. Based on mammographic density (BI-RADS) results, the dataset was divided into two groups of low and high breast density cases. The low-density group has 15 cases with mammographic density BI-RADS 1 and 2, while the high-density group includes 50 cases rated by radiologists as mammographic density BI-RADS 3 and 4. A computer-aided detection (CAD) scheme was applied to segment and register breast regions depicted on sequential images of the breast MRI scans. The CAD scheme computed 20 global BPE features from the two breast regions together, from the left and right breast regions separately, and from the bilateral difference between left and right breast regions. A feature selection method, namely the CFS method, was applied to remove the most redundant features and select optimal features from the initial feature pool. Then, a logistic regression classifier was built using the optimal features to predict mammographic density from the BPE features. Using a leave-one-case-out validation method, the classifier yielded an accuracy of 82% and an area under the ROC curve of AUC=0.81±0.09. A box-plot based analysis also shows a negative association between mammographic density and BPE features in the MRI images. This study demonstrated a negative association between mammographic density and BPE of breast MRI images.
Reduction of false-positives in a CAD scheme for automated detection of architectural distortion in digital mammography
Helder C. R. de Oliveira, Arianna Mencattini, Paola Casti, et al.
This paper proposes a method to reduce the number of false positives (FPs) in a computer-aided detection (CAD) scheme for automated detection of architectural distortion (AD) in digital mammography. AD is a subtle contraction of breast parenchyma that may represent an early sign of breast cancer. Due to its subtlety and variability, AD is more difficult to detect than microcalcifications and masses, and it is commonly found in retrospective evaluations of false-negative mammograms. Several computer-based systems have been proposed for automated detection of AD in breast images. The usual approach is to automatically detect possible sites of AD in a mammographic image (segmentation step) and then use a classifier to eliminate the false positives and identify the suspicious regions (classification step). This paper focuses on the optimization of the segmentation step to reduce the number of FPs passed as input to the classifier. The proposal is to use statistical measurements to score the segmented regions and then apply a threshold to select a small number of regions to be submitted to the classification step, improving the detection performance of the CAD scheme. We evaluated 12 image features for scoring and selecting suspicious regions in 74 clinical full-field digital mammography (FFDM) images. All images in this dataset contained at least one region with AD previously marked by an expert radiologist. The results showed that the proposed method can reduce the false positives of the segmentation step of the CAD scheme from 43.4 FPs per image to 34.5 FPs per image, without increasing the number of false negatives.
A fully automatic microcalcification detection approach based on deep convolution neural network
Guanxiong Cai, Yanhui Guo, Yaqin Zhang, et al.
Breast cancer is one of the most common cancers and has high morbidity and mortality worldwide, posing a serious threat to human health. The emergence of microcalcifications (MCs) is an important signal of early breast cancer. However, it is still challenging and time consuming for radiologists to identify tiny and subtle individual MCs in mammograms. This study proposed a novel computer-aided MC detection algorithm for full field digital mammograms (FFDMs) using a deep convolution neural network (DCNN). Firstly, an MC candidate detection system was used to obtain potential MC candidates. Then a DCNN was trained using a novel adaptive learning strategy, the neutrosophic reinforcement sample learning (NRSL) strategy, to speed up the learning process. The trained DCNN served to recognize true MCs. After classification by the DCNN, a density-based regional clustering method was applied to form MC clusters. The accuracy of the DCNN with our proposed NRSL strategy converges faster and reaches higher values than the traditional DCNN at the same epochs, obtaining an accuracy of 99.87% on the training set, 95.12% on the validation set, and 93.68% on the testing set at epoch 40. For cluster-based MC cluster detection evaluation, a sensitivity of 90% was achieved at 0.13 false positives (FPs) per image. The obtained results demonstrate that the designed DCNN plays a significant role in MC detection after prior training.
Learning better deep features for the prediction of occult invasive disease in ductal carcinoma in situ through transfer learning
Purpose: To determine whether domain transfer learning can improve the performance of deep features extracted from digital mammograms using a pre-trained deep convolutional neural network (CNN) in the prediction of occult invasive disease for patients with ductal carcinoma in situ (DCIS) on core needle biopsy.

Method: In this study, we collected digital mammography magnification views for 140 patients with DCIS at biopsy, 35 of which were subsequently upstaged to invasive cancer. We utilized a deep CNN model that was pre-trained on two natural image data sets (ImageNet and DTD) and one mammographic data set (INbreast) as the feature extractor, hypothesizing that these data sets are increasingly more similar to our target task and will lead to better representations of deep features to describe DCIS lesions. Through a statistical pooling strategy, three sets of deep features were extracted using the CNNs at different levels of convolutional layers from the lesion areas. A logistic regression classifier was then trained to predict which tumors contain occult invasive disease. The generalization performance was assessed and compared using repeated random sub-sampling validation and receiver operating characteristic (ROC) curve analysis.

Result: The best performance of deep features was obtained from the CNN model pre-trained on INbreast, and the proposed classifier using this set of deep features was able to achieve a median classification performance of ROC-AUC equal to 0.75, which is significantly better (p <= 0.05) than the performance of deep features extracted using the ImageNet data set (ROC-AUC = 0.68).

Conclusion: Transfer learning is helpful for learning a better representation of deep features, and improves the prediction of occult invasive disease in DCIS.
Deformable image registration as a tool to improve survival prediction after neoadjuvant chemotherapy for breast cancer: results from the ACRIN 6657/I-SPY-1 trial
Nariman Jahani, Eric Cohen, Meng-Kang Hsieh, et al.
We examined the ability of DCE-MRI longitudinal features to give early prediction of recurrence-free survival (RFS) in women undergoing neoadjuvant chemotherapy for breast cancer, in a retrospective analysis of 106 women from the I-SPY 1 cohort. These features were based on the voxel-wise changes seen in registered images taken before treatment and after the first round of chemotherapy. We computed the transformation field using a robust deformable image registration technique to match breast images from these two visits. Using the deformation field, parametric response maps (PRMs), a voxel-based feature analysis of longitudinal changes in images between visits, were computed for maps of four kinetic features (signal enhancement ratio, peak enhancement, and wash-in/wash-out slopes). A two-level discrete wavelet transform was applied to these PRMs to extract heterogeneity information about tumor change between visits. To estimate survival, a Cox proportional hazards model was applied with the C statistic as the measure of success in predicting RFS. The best PRM feature (as determined by the C statistic in univariable analysis) was determined for each of the four kinetic features. The baseline model, incorporating functional tumor volume, age, race, and hormone response status, had a C statistic of 0.70 in predicting RFS. The model augmented with the four PRM features had a C statistic of 0.76. Thus, our results suggest that adding information on the texture of voxel-level changes in tumor kinetic response between registered images of the first and second visits could improve early RFS prediction in breast cancer after neoadjuvant chemotherapy.
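A parametric response map of the kind used here can be sketched as follows; the classification threshold is an illustrative assumption, not the value used in the study.

    # Hedged sketch: voxel-wise labeling of kinetic change between two registered visits.
    import numpy as np

    def parametric_response_map(visit1, visit2, threshold=0.1):
        """visit1/visit2: registered voxel-wise kinetic maps (e.g., SER) of equal shape."""
        delta = visit2 - visit1
        prm = np.zeros(delta.shape, dtype=np.int8)
        prm[delta > threshold] = 1        # kinetic value increased after the first cycle
        prm[delta < -threshold] = -1      # kinetic value decreased
        return prm                        # downstream analysis summarizes the +1/-1/0 pattern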
Expert identification of visual primitives used by CNNs during mammogram classification
Jimmy Wu, Diondra Peck, Scott Hsieh, et al.
This work interprets the internal representations of deep neural networks trained for classification of diseased tissue in 2D mammograms. We propose an expert-in-the-loop interpretation method to label the behavior of internal units in convolutional neural networks (CNNs). Expert radiologists identify that the visual patterns detected by the units are correlated with meaningful medical phenomena such as mass tissue and calcified vessels. We demonstrate that several trained CNN models are able to produce explanatory descriptions to support the final classification decisions. We view this as an important first step toward interpreting the internal representations of medical classification CNNs and explaining their predictions.
Similarity estimation for reference image retrieval in mammograms using convolutional neural network
Chisako Muramatsu, Shunichi Higuchi, Takako Morita, et al.
Periodic breast cancer screening with mammography is considered effective in decreasing breast cancer mortality. For screening programs to be successful, an intelligent image analysis system may support radiologists' efficient image interpretation. In our previous studies, we have investigated image retrieval schemes for diagnostic reference of breast lesions on mammograms and ultrasound images. Using a machine learning method, reliable similarity measures that agree with radiologists' similarity judgments were determined, and relevant images could be retrieved. However, our previous method includes a feature extraction step in which hand-crafted features were determined based on manual outlines of the masses. Obtaining manual outlines of masses is not practical in clinical practice, and such data would be operator-dependent. In this study, we investigated a similarity estimation scheme using a convolutional neural network (CNN) to skip this procedure and determine data-driven similarity scores. When the CNN was used as a feature extractor, with the extracted features employed to determine similarity measures via a conventional 3-layered neural network, the determined similarity measures correlated well with the subjective ratings, and the precision of retrieving diagnostically relevant images was comparable with that of the conventional method using hand-crafted features. When the CNN was used to determine the similarity measure directly, the result was also comparable. By optimizing the network parameters, results may be further improved. The proposed method has potential usefulness in determining similarity measures without precise lesion outlines for retrieval of similar mass images on mammograms.
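The retrieval step can be illustrated with a simple cosine-similarity ranking over CNN feature vectors; note that the actual scheme learns its similarity measure with a small neural network, so the cosine measure below is only a stand-in for illustration.

    # Hedged sketch: rank reference masses by similarity of their CNN feature vectors.
    import numpy as np

    def retrieve_similar(query_feat, reference_feats, k=5):
        q = query_feat / np.linalg.norm(query_feat)
        refs = reference_feats / np.linalg.norm(reference_feats, axis=1, keepdims=True)
        scores = refs @ q                         # cosine similarity to each reference mass
        top = np.argsort(scores)[::-1][:k]
        return top, scores[top]                   # indices and similarities of the best matches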
Convolutional encoder-decoder for breast mass segmentation in digital breast tomosynthesis
Jun Zhang, Sujata V. Ghate, Lars J. Grimm, et al.
Digital breast tomosynthesis (DBT) is a relatively new modality for breast imaging that can provide detailed assessment of dense tissue within the breast. In the domains of cancer diagnosis, radiogenomics, and resident education, it is important to accurately segment breast masses. However, breast mass segmentation is a very challenging task, since mass regions have low contrast relative to their neighboring tissues. Notably, the task may become more difficult for cases assigned the BI-RADS 0 category, since this category includes many lesions that are of low conspicuity and locations that were deemed to be overlapping normal tissue upon further imaging and were not sent to biopsy. Segmentation of such lesions is of particular importance in the domain of reader performance analysis and education. In this paper, we propose a novel deep learning-based method for segmentation of BI-RADS 0 lesions in DBT. The key components of our framework are an encoding path for local-to-global feature extraction and a decoding path to expand the feature maps back to the image resolution. To address the issue of limited training data, in the training stage we propose to sample patches not only in mass regions but also in non-mass regions. We utilize a Dice-like loss function in the proposed network to alleviate the class-imbalance problem. The preliminary results on 40 subjects show the promise of our method. In addition to quantitative evaluation, we present a visualization of the results that demonstrates both the performance of the algorithm and the difficulty of the task at hand.
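A Dice-like loss of the kind mentioned above, used to counter the imbalance between the small mass region and the surrounding tissue, can be sketched in a few lines; the smoothing constant is an assumption.

    # Hedged sketch: a Dice-like loss over sigmoid probabilities and binary masks.
    import torch

    def dice_like_loss(pred, target, eps=1.0):
        """pred: sigmoid probabilities, target: binary mask, both (N, 1, H, W)."""
        inter = (pred * target).sum(dim=(1, 2, 3))
        denom = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = (2.0 * inter + eps) / (denom + eps)
        return 1.0 - dice.mean()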
Deep learning-based features of breast MRI for prediction of occult invasive disease following a diagnosis of ductal carcinoma in situ: preliminary data
Zhe Zhu, Michael Harowicz D.D.S., Jun Zhang, et al.
Approximately 25% of patients with ductal carcinoma in situ (DCIS) diagnosed from core needle biopsy are subsequently upstaged to invasive cancer at surgical excision. Identifying patients with occult invasive disease is important as it changes treatment and precludes enrollment in active surveillance for DCIS. In this study, we investigated upstaging of DCIS to invasive disease using deep features. While deep neural networks require large amounts of training data, the available data to predict DCIS upstaging is sparse and thus directly training a neural network is unlikely to be successful. In this work, a pre-trained neural network is used as a feature extractor and a support vector machine (SVM) is trained on the extracted features. We used the dynamic contrast-enhanced (DCE) MRIs of patients at our institution from January 1, 2000, through March 23, 2014 who underwent MRI following a diagnosis of DCIS. Among the 131 DCIS patients, there were 35 patients who were upstaged to invasive cancer. Area under the ROC curve within the 10-fold cross-validation scheme was used for validation of our predictive model. The use of deep features was able to achieve an AUC of 0.68 (95% CI: 0.56-0.78) to predict occult invasive disease. This preliminary work demonstrates the promise of deep features to predict surgical upstaging following a diagnosis of DCIS.
Breast cancer molecular subtype classification using deep features: preliminary results
Zhe Zhu, Ehab Albadawy, Ashirbani Saha, et al.
Radiogenomics is a field of investigation that attempts to examine the relationship between imaging characteristics of cancerous lesions and their genomic composition. This could offer a noninvasive alternative to establishing genomic characteristics of tumors and aid cancer treatment planning. While deep learning has shown its superiority in many detection and classification tasks, breast cancer radiogenomic data suffer from a very limited number of training examples, which renders directly training a neural network for this problem, without pretraining, a very difficult task. In this study, we investigated an alternative deep learning approach, referred to as the deep features or off-the-shelf network approach, to classify breast cancer molecular subtypes using breast dynamic contrast enhanced MRIs. We used the feature maps of different convolutional layers and fully connected layers as features and trained support vector machines on these features for prediction. For the feature maps with multiple channels, max-pooling was performed along each channel. We focused on distinguishing the Luminal A subtype from other subtypes. To evaluate the models, 10-fold cross-validation was performed and the final AUC was obtained by averaging the performance of all the folds. The highest average AUC obtained was 0.64 (95% CI: 0.57-0.71), using the feature maps of the last fully connected layer. This indicates the promise of using this approach to predict breast cancer molecular subtypes. Since the best performance appears in the last fully connected layer, it also implies that breast cancer molecular subtypes may relate to high-level image features.
Poster Session: Cardiac
Voxel-based plaque classification in coronary intravascular optical coherence tomography images using decision trees
Chaitanya Kolluru, David Prabhu, Yazan Gharaibeh, et al.
Intravascular optical coherence tomography (IVOCT) is a high contrast, 3D microscopic imaging technique that can be used to assess atherosclerosis and guide stent interventions. Despite its advantages, IVOCT image interpretation is challenging and time consuming, with over 500 image frames generated in a single pullback volume. We have developed a method to classify voxel plaque types in IVOCT images using machine learning. To train and test the classifier, we used our unique database of labeled cadaver vessel IVOCT images accurately registered to gold standard cryo-images. This database currently contains 300 images and is growing. Each voxel is labeled as fibrotic, lipid-rich, calcified, or other. Optical attenuation, intensity, and texture features were extracted for each voxel and were used to build a decision tree classifier for multi-class classification. Five-fold cross-validation across images gave accuracies of 96% ± 0.01%, 90% ± 0.02%, and 90% ± 0.01% for the fibrotic, lipid-rich, and calcified classes, respectively. To rectify the performance degradation seen on left-out vessel specimens as opposed to left-out images, we are adding data and reducing features to limit overfitting. Following spatial noise cleaning, important vascular regions were unambiguous in display. We developed displays that enable physicians to make rapid determination of calcified and lipid regions. This will inform treatment decisions such as the need for devices (e.g., atherectomy or scoring balloon in the case of calcifications) or extended stent lengths to ensure coverage of lipid regions prone to injury at the edge of a stent.
Myocardial scar segmentation from magnetic resonance images using convolutional neural network
Accurate segmentation of myocardial fibrosis or scar may provide important advancements for the prediction and management of malignant ventricular arrhythmias in patients with cardiovascular disease. In this paper, we propose a semi-automated method for segmentation of myocardial scar from late gadolinium enhancement magnetic resonance images (LGE-MRI) using a convolutional neural network (CNN). In contrast to image intensity-based methods, CNN-based algorithms have the potential to improve the accuracy of scar segmentation through the creation of high-level features from a combination of convolutional, detection, and pooling layers. Our developed algorithm was trained using 2,336,703 image patches extracted from 420 slices of five 3D LGE-MR datasets, then validated on 2,204,178 patches from a testing dataset of seven 3D LGE-MR images comprising 624 slices, all obtained from patients with chronic myocardial infarction. For evaluation of the algorithm, we compared the algorithm-generated segmentations to manual delineations by experts. Our CNN-based method achieved an average Dice similarity coefficient (DSC), precision, and recall of 94.50 ± 3.62%, 96.08 ± 3.10%, and 93.96 ± 3.75%, respectively. Compared to several intensity threshold-based methods for scar segmentation, the results of our developed method have greater agreement with manual expert segmentation.
Convolutional neural networks for the detection of diseased hearts using CT images and left atrium patches
James D. Dormer, Martin Halicek, Ling Ma, et al.
Cardiovascular disease is a leading cause of death in the United States. The identification of cardiac diseases on conventional three-dimensional (3D) CT can have many clinical applications. An automated method that can distinguish between healthy and diseased hearts could improve diagnostic speed and accuracy when the only modality available is conventional 3D CT. In this work, we proposed and implemented convolutional neural networks (CNNs) to identify diseased hearts on CT images. Six patients with healthy hearts and six with previous cardiovascular disease events received chest CT. After the left atrium of each heart was segmented, 2D and 3D patches were created. A subset of the patches was then used to train separate convolutional neural networks using leave-one-out cross-validation of patient pairs. The results of the two neural networks were compared, with 3D patches producing the higher testing accuracy. The full list of 3D patches from the left atrium was then classified using the optimal 3D CNN model, and receiver operating characteristic (ROC) curves were produced. The final average area under the curve (AUC) from the ROC curves was 0.840 ± 0.065 and the average accuracy was 78.9% ± 5.9%. This demonstrates that the CNN-based method is capable of distinguishing healthy hearts from those with previous cardiovascular disease.
Poster Session: Eye
Lesion detection in ultra-wide field retinal images for diabetic retinopathy diagnosis
Anastasia Levenkova, Arcot Sowmya, Michael Kalloniatis, et al.
Diabetic retinopathy (DR) leads to irreversible vision loss. Diagnosis and staging of DR are usually based on the presence, number, location, and type of retinal lesions. Ultra-wide field (UWF) digital scanning laser technology provides an opportunity for computer-aided DR lesion detection. High-resolution UWF images (3078×2702 pixels) may allow detection of more clinically relevant retinopathy than conventional retinal images, as UWF imaging covers a 200° retinal area versus 45° for conventional cameras. Current approaches to DR diagnosis that analyze 7-field Early Treatment Diabetic Retinopathy Study (ETDRS) retinal images provide results similar to UWF imaging. However, in 40% of cases more retinopathy was found outside the 7-field ETDRS fields by UWF, and in 10% of cases retinopathy was reclassified as more severe, because UWF images capture both the central retina and more peripheral regions. We propose an algorithm for automatic detection and classification of DR lesions such as cotton wool spots, exudates, microaneurysms, and haemorrhages in UWF images. The algorithm uses a convolutional neural network (CNN) as a feature extractor and classifies the feature vectors extracted from colour-composite UWF images using a support vector machine (SVM). The main contribution is the detection of four types of DR lesions in the peripheral retina for diagnostic purposes. The evaluation dataset contains 146 UWF images. The proposed method for detection of DR lesion subtypes in UWF images, using two scenarios for transfer learning, achieved AUC ≈ 80%. Data were split at the patient level to validate the proposed algorithm.
Poster Session: Lung
3D GGO candidate extraction in lung CT images using multilevel thresholding on supervoxels
Shan Huang, Xiabi Liu, Guanghui Han, et al.
Earlier detection of ground glass opacity (GGO) is of great importance since GGOs are more likely to be malignant than solid nodules. However, the detection of GGOs is a difficult task in lung cancer screening. This paper proposes a novel GGO candidate extraction method, which performs multilevel thresholding on supervoxels in 3D lung CT images. Firstly, we segment the lung parenchyma based on the Otsu algorithm. Secondly, voxels that are adjacent in 3D discrete space and share similar gray levels are clustered into supervoxels; this procedure enhances GGOs and reduces computational complexity. Thirdly, the Hessian matrix is used to emphasize focal GGO candidates. Lastly, an improved adaptive multilevel thresholding method is applied to the segmented clusters to extract GGO candidates. The proposed method was evaluated on a set of 19 lung CT scans containing 166 GGO lesions from the Lung CT Imaging Signs (LISS) database. The experimental results show that our proposed GGO candidate extraction method is effective, with a sensitivity of 100% and 26.3 false positives per scan (665 GGO candidates: 499 non-GGO regions and 166 GGO regions). It can handle both focal GGOs and diffuse GGOs.
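An illustrative sketch of multilevel (multi-Otsu) thresholding applied to supervoxel mean intensities, loosely following the candidate-extraction step described above. The supervoxel clustering is approximated here with SLIC from a recent scikit-image (the `mask` and `channel_axis` arguments are assumed available); the paper's own clustering, threshold levels, and candidate rule may differ.

```python
import numpy as np
from skimage.filters import threshold_multiotsu
from skimage.segmentation import slic

def extract_candidates(lung_volume, lung_mask, n_supervoxels=2000, classes=3):
    # Cluster spatially adjacent voxels with similar gray levels into supervoxels.
    labels = slic(lung_volume, n_segments=n_supervoxels, mask=lung_mask,
                  channel_axis=None, compactness=0.1, start_label=1)
    ids = np.unique(labels)
    ids = ids[ids > 0]                                   # 0 is outside the lung mask
    # Mean intensity per supervoxel.
    means = np.array([lung_volume[labels == i].mean() for i in ids])
    # Multilevel thresholding on the supervoxel means; keep the brightest class
    # as GGO candidates (denser than aerated lung, less dense than solid tissue).
    thresholds = threshold_multiotsu(means, classes=classes)
    candidate_ids = ids[means >= thresholds[-1]]
    return np.isin(labels, candidate_ids)
```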
Opacity annotation of diffuse lung diseases using deep convolutional neural network with multi-channel information
Shingo Mabu, Shoji Kido M.D., Noriaki Hashimoto, et al.
This research proposes a multi-channel deep convolutional neural network (DCNN) for computer-aided diagnosis (CAD) that classifies normal and abnormal opacities of diffuse lung diseases in computed tomography (CT) images. Because CT images are gray scale, a DCNN usually uses one channel for the input image data. In contrast, this research uses a multi-channel DCNN in which each channel corresponds to either the original raw image or an image transformed by a preprocessing technique. The information obtained from raw images alone is limited, and previous research has suggested that preprocessing of images contributes to improving classification accuracy; thus, the combination of original and preprocessed images is expected to yield higher accuracy. The proposed method performs region-of-interest (ROI)-based opacity annotation. We used lung CT images taken at Yamaguchi University Hospital, Japan, divided into 32 × 32 ROI images. The ROIs contain six kinds of opacities: consolidation, ground-glass opacity (GGO), emphysema, honeycombing, nodular, and normal. The aim of the proposed method is to classify each ROI into one of the six opacities (classes). The DCNN structure is based on the VGG network, which secured first and second places in ImageNet ILSVRC-2014. The experimental results showed that the classification accuracy of the proposed method was better than that of the conventional single-channel method, with a significant difference between them.
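A minimal sketch of the multi-channel input idea: each channel of the DCNN input holds either the raw ROI or a preprocessed version of it. The specific filters chosen here (histogram equalization and edge magnitude) are illustrative stand-ins, not necessarily the preprocessing used in the paper.

```python
import numpy as np
from skimage import exposure, filters

def make_multichannel_roi(roi):
    """Stack a 32x32 ROI with two preprocessed versions into a (32, 32, 3) input."""
    raw = roi.astype("float32")
    equalized = exposure.equalize_hist(raw)   # contrast-enhanced channel
    edges = filters.sobel(raw)                # edge-magnitude channel
    channels = [raw, equalized, edges]
    # Normalize each channel independently before stacking.
    channels = [(c - c.mean()) / (c.std() + 1e-8) for c in channels]
    return np.stack(channels, axis=-1)
```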
Localization of lung fields in HRCT images using a deep convolution neural network
Abhishek Kumar, Sunita Agarwala, Ashis Kumar Dhara, et al.
Lung field segmentation is a prerequisite step for the development of a computer-aided diagnosis system for interstitial lung diseases observed in chest HRCT images. Conventional methods of lung field segmentation rely on a large gray-value contrast between the lung fields and surrounding tissues, and they fail on lung HRCT images with dense and diffuse pathology. Efficient preprocessing could improve the accuracy of segmentation of pathological lung fields in HRCT images. In this paper, a convolutional neural network is used for localization of lung fields in HRCT images. The proposed method provides an optimal bounding box enclosing the lung fields irrespective of the presence of diffuse pathology. The performance of the proposed algorithm is validated on 330 lung HRCT images obtained from the MedGift database, using ZF and VGG networks. The model achieves a mean average precision of 0.94 with the ZF net and a slightly better mean average precision of 0.95 with the VGG net.
Deep neural network convolution (NNC) for three-class classification of diffuse lung disease opacities in high-resolution CT (HRCT): consolidation, ground-glass opacity (GGO), and normal opacity
Noriaki Hashimoto, Kenji Suzuki, Junchi Liu, et al.
Consolidation and ground-glass opacity (GGO) are two major types of opacities associated with diffuse lung diseases. Accurate detection and classification of such opacities are crucially important in the diagnosis of lung diseases, but the process is subjective and suffers from interobserver variability. Our study purpose was to develop a deep neural network convolution (NNC) system for distinguishing among consolidation, GGO, and normal lung tissue in high-resolution CT (HRCT). We developed an ensemble of two deep NNC models, each of which was composed of neural network regression (NNR) with an input layer, a convolution layer, a fully connected hidden layer, and a fully connected output layer followed by a thresholding layer. The output layer of each NNC provided a map of the likelihood of the corresponding lung opacity of interest. The two NNC models in the ensemble were connected by a class-selection layer. We trained our NNC ensemble with pairs of input 2D axial slices and “teaching” probability maps for the corresponding lung opacity, which were obtained by combining three radiologists’ annotations. We randomly selected 10 and 40 slices from HRCT scans of 172 patients for each class as the training and test sets, respectively. Our NNC ensemble achieved areas under the receiver-operating-characteristic (ROC) curve (AUC) of 0.981 and 0.958 for distinguishing consolidation and GGO, respectively, from normal opacity, yielding a classification accuracy of 93.3% among the 3 classes. Thus, our deep-NNC-based system for classifying diffuse lung diseases achieved high accuracies for classification of consolidation, GGO, and normal opacity.
A deep-learning based automatic pulmonary nodule detection system
Yiyuan Zhao, Liang Zhao, Zhennan Yan, et al.
Lung cancer is the deadliest cancer worldwide. Early detection of lung cancer is a promising way to lower the risk of dying. Accurate pulmonary nodule detection in computed tomography (CT) images is crucial for early diagnosis of lung cancer. The development of computer-aided detection (CAD) systems for pulmonary nodules contributes to making CT analysis more accurate and more efficient. Recent studies from other groups have focused on lung cancer diagnosis CAD systems that detect medium to large nodules. However, to fully investigate the relevance between nodule features and cancer diagnosis, a CAD system that is capable of detecting nodules of all sizes is needed. In this paper, we present a deep-learning based automatic all-size pulmonary nodule detection system built by cascading two artificial neural networks. We first use a U-Net-like 3D network to generate nodule candidates from CT images. Then, we use another 3D neural network to refine the locations of the nodule candidates generated by the previous subsystem. With the second subsystem, we bring the nodule candidates closer to the centers of the ground-truth nodule locations. We evaluate our system on a public CT dataset provided by the Lung Nodule Analysis (LUNA) 2016 grand challenge. The performance on the testing dataset shows that our system achieves 90% sensitivity with an average of 4 false positives per scan. This indicates that our system can be an aid for automatic nodule detection, which is beneficial for lung cancer diagnosis.
An evaluation of consensus techniques for diagnostic interpretation
Learning diagnostic labels from image content has been the standard in computer-aided diagnosis. Most computer-aided diagnosis systems use low-level image features extracted directly from image content to train and test machine learning classifiers for diagnostic label prediction. When the ground truth for the diagnostic labels is not available, a reference truth is generated from the experts’ diagnostic interpretations of the image/region of interest. More specifically, when the label is uncertain, e.g. when multiple experts label an image and their interpretations differ, techniques to handle the label variability are necessary. In this paper, we compare three consensus techniques that are typically used to encode the variability in the experts’ labeling of the medical data: mean, median and mode, and their effects on simple classifiers that can handle deterministic labels (decision trees) and probabilistic vectors of labels (belief decision trees). Given that the NIH/NCI Lung Image Database Consortium (LIDC) data provides interpretations for lung nodules by up to four radiologists, we leverage the LIDC data to evaluate and compare these consensus approaches when creating computer-aided diagnosis systems for lung nodules. First, low-level image features of nodules are extracted and paired with their radiologists’ semantic ratings (1 = most likely benign, ..., 5 = most likely malignant); second, machine learning multi-class classifiers that handle deterministic labels (decision trees) and probabilistic vectors of labels (belief decision trees) are built to predict the lung nodules’ semantic ratings. We show that the mean-based consensus generates the most robust classifier overall when compared to the median- and mode-based consensus. Lastly, the results of this study show that, when building CAD systems with uncertain diagnostic interpretation, it is important to evaluate different strategies for encoding and predicting the diagnostic label.
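A small sketch of the three consensus rules compared above, applied to the per-nodule malignancy ratings (1-5) given by up to four LIDC radiologists; rounding the mean/median to an integer label is an assumption for illustration.

```python
import numpy as np

def consensus_label(ratings, method="mean"):
    """Collapse multiple radiologist ratings into a single reference label."""
    ratings = np.asarray(ratings, dtype=float)
    if method == "mean":
        return int(round(ratings.mean()))
    if method == "median":
        return int(round(np.median(ratings)))
    if method == "mode":
        vals, counts = np.unique(ratings, return_counts=True)
        return int(vals[np.argmax(counts)])      # most frequent rating
    raise ValueError(f"unknown consensus method: {method}")

# Example: a nodule rated 3, 4, 4, 5 by four radiologists
print([consensus_label([3, 4, 4, 5], m) for m in ("mean", "median", "mode")])  # [4, 4, 4]
```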
Lung nodule detection from CT scans using 3D convolutional neural networks without candidate selection
Natalia M. Jenuwine, Sunny N. Mahesh, Jacob D. Furst, et al.
Early detection of lung nodules from CT scans is key to improving lung cancer treatment, but poses a significant challenge for radiologists due to the high throughput required of them. Computer-Aided Detection (CADe) systems aim to automatically detect these nodules with computer algorithms, thus improving diagnosis. These systems typically use a candidate selection step, which identifies all objects that resemble nodules, followed by a machine learning classifier that separates true nodules from false positives. We create a CADe system that uses a 3D convolutional neural network (CNN) to detect nodules in CT scans without a candidate selection step. Using data from the LIDC database, we train a 3D CNN to analyze subvolumes from anywhere within a CT scan and output the probability that each subvolume contains a nodule. Once trained, we apply our CNN to detect nodules in entire scans by systematically dividing each scan into overlapping subvolumes, which we input into the CNN to obtain the corresponding probabilities. By enabling our network to process an entire scan, we expect to streamline the detection process while maintaining its effectiveness. Our results imply that, with continued training using an iterative training scheme, the one-step approach has the potential to be highly effective.
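A minimal sketch of the candidate-free scan traversal described above: the scan is tiled into overlapping subvolumes, each scored by a trained 3D CNN. The subvolume size and stride are illustrative, and `model` is assumed to be any trained network exposing a `predict()` that returns one nodule probability per subvolume.

```python
import numpy as np

def scan_subvolumes(ct_volume, model, size=(32, 32, 32), stride=(16, 16, 16)):
    """Yield (z, y, x, probability) for every overlapping subvolume in the scan."""
    Z, Y, X = ct_volume.shape
    dz, dy, dx = size
    results = []
    for z in range(0, Z - dz + 1, stride[0]):
        for y in range(0, Y - dy + 1, stride[1]):
            for x in range(0, X - dx + 1, stride[2]):
                sub = ct_volume[z:z + dz, y:y + dy, x:x + dx]
                # add batch and channel axes expected by a typical 3D CNN
                prob = float(model.predict(sub[np.newaxis, ..., np.newaxis], verbose=0))
                results.append((z, y, x, prob))
    return results
```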
An automatically generated texture-based atlas of the lungs
Yashin Dicente Cid, Oula Puonti, Alexandra Platon, et al.
Many pulmonary diseases can be characterized by visual abnormalities on lung CT scans. Some diseases manifest similar defects but require completely different treatments, as is the case for Pulmonary Hypertension (PH) and Pulmonary Embolism (PE): both present hypo- and hyper-perfused regions, but with different distributions across the lung, and require different treatment protocols. Finding these distributions by visual inspection is not trivial even for trained radiologists, who currently use invasive catheterization to diagnose PH. A Computer-Aided Diagnosis (CAD) tool that could facilitate the non-invasive diagnosis of these diseases can benefit both radiologists and patients. Most of the visual differences in the parenchyma can be characterized using texture descriptors. Current CAD systems often use texture information, but the texture is either computed in a patch-based fashion or based on an anatomical division of the lung. The difficulty of precisely finding these divisions in abnormal lungs calls for new tools to obtain meaningful divisions of the lungs. In this paper we present a method for unsupervised segmentation of lung CT scans into subregions that are similar in terms of texture and spatial proximity. To this end, we combine a previously validated Riesz-wavelet texture descriptor with a well-known superpixel segmentation approach that we extend to 3D. We demonstrate the feasibility and accuracy of our approach on a simulated texture dataset, and show preliminary results for CT scans of the lung comparing subjects suffering from either PH or PE. The resulting texture-based atlas of individual lungs can potentially help physicians in diagnosis or be used for studying common texture distributions related to other diseases.
A coarse-to-fine approach for pericardial effusion localization and segmentation in chest CT scans
Jiamin Liu, Karthik Chellamuthu, Le Lu, et al.
Pericardial effusion on CT scans demonstrates very high shape and volume variability and very low contrast to adjacent structures. This inhibits traditional automated segmentation methods from achieving high accuracy. Deep neural networks have been widely used for image segmentation in CT scans. In this work, we present a two-stage method for pericardial effusion localization and segmentation. In the first step, we localize the pericardial area within the entire CT volume, providing a reliable bounding box for the more refined segmentation step. A coarse-scaled holistically-nested convolutional network (HNN) model is trained on the entire CT volume. The resulting HNN per-pixel probability maps are then thresholded to produce a bounding box covering the pericardial area. In the second step, a fine-scaled HNN model is trained only on the bounding box region for effusion segmentation, to reduce background distraction. Quantitative evaluation is performed on a dataset of 25 patient CT scans (1,206 images) with pericardial effusion. The segmentation accuracy of our two-stage method, measured by Dice Similarity Coefficient (DSC), is 75.59 ± 12.04%, which is significantly better than the segmentation accuracy (62.74 ± 15.20%) of using only the coarse-scaled HNN model.
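A minimal sketch of the coarse-to-fine idea: threshold the coarse per-pixel probability map to obtain a pericardial bounding box, then run the fine model only inside that box. `coarse_model` and `fine_model` are assumed to be trained networks that return per-pixel probability maps (and accept variable-size fully convolutional inputs); the 0.5 threshold and the margin are illustrative values, not those reported in the paper.

```python
import numpy as np

def coarse_to_fine_segment(ct_slice, coarse_model, fine_model, thr=0.5, margin=10):
    prob = coarse_model.predict(ct_slice[np.newaxis, ..., np.newaxis], verbose=0)[0, ..., 0]
    ys, xs = np.where(prob > thr)
    if ys.size == 0:
        return np.zeros_like(ct_slice, dtype=bool)        # nothing detected on this slice
    # bounding box of the thresholded coarse map, padded by a small margin
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, ct_slice.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, ct_slice.shape[1])
    crop = ct_slice[y0:y1, x0:x1]
    fine_prob = fine_model.predict(crop[np.newaxis, ..., np.newaxis], verbose=0)[0, ..., 0]
    mask = np.zeros_like(ct_slice, dtype=bool)
    mask[y0:y1, x0:x1] = fine_prob > thr
    return mask
```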
Poster Session: Musculoskeletal and Skin
Automated quasi-3D spine curvature quantification and classification
Rupal Khilari, Juris Puchin, Kazunori Okada
Scoliosis is a highly prevalent spine deformity that has traditionally been diagnosed through measurement of the Cobb angle on radiographs. More recent technology, such as the commercial EOS imaging system, although more accurate, still requires manual intervention for selecting the extreme vertebrae forming the Cobb angle. This results in a high degree of inter- and intra-observer error in determining the extent of spine deformity. Our primary focus is to eliminate the need for manual intervention by robustly quantifying the curvature of the spine in three dimensions, making it consistent across multiple observers. Given the vertebral centroids, the proposed Vertebrae Sequence Angle (VSA) estimation and segmentation algorithm finds the largest angle between consecutive pairs of centroids within multiple inflection points on the curve. To exploit existing clinical diagnostic standards, the algorithm uses a quasi-3-dimensional approach considering the curvature in the coronal and sagittal projection planes of the spine. Experiments were performed with manually annotated ground-truth classifications of publicly available, centroid-annotated CT spine datasets, and the results were compared with those obtained from manual Cobb and centroid angle estimation methods. Using the VSA, we then automatically classify the occurrence and severity of spine curvature based on Lenke’s classification for idiopathic scoliosis. The results appear promising, with a scoliotic angle lying within ±9° of the Cobb and centroid angles, and vertebra positions differing by at most one position. Our system also achieved perfect classification of scoliotic versus healthy spines on our dataset of six cases.
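A small sketch of measuring the angle between consecutive vertebral centroid segments in a projection plane, in the spirit of the Vertebrae Sequence Angle described above; the inflection-point search and severity grading are omitted, and the toy centroids are illustrative only.

```python
import numpy as np

def segment_angles(centroids_2d):
    """Angles (degrees) between consecutive centroid-to-centroid segments."""
    c = np.asarray(centroids_2d, dtype=float)
    vectors = np.diff(c, axis=0)                       # segment i -> i+1
    angles = []
    for v1, v2 in zip(vectors[:-1], vectors[1:]):
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return np.array(angles)

# Coronal projection: keep the in-plane coordinates of the 3D centroids.
coronal = [(0, 0), (2, 10), (5, 20), (6, 30), (5, 40)]   # toy centroids
print(segment_angles(coronal).round(1))
```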
Computer aided detection system for Osteoporosis using low dose thoracic 3D CT images
About 13 million people in Japan have osteoporosis, and it is one of the major health problems of the aging society. Early detection and treatment are necessary to prevent osteoporosis. Multi-slice CT technology has improved three-dimensional (3D) image analysis, offering higher resolution and shorter scan times. 3D image analysis of the thoracic vertebrae can be used to support the diagnosis of osteoporosis, and the same analysis can be used for lung cancer detection at the same time. We developed a method that analyzes vertebral shape and the CT values of spongy bone for the detection of osteoporosis. Combined osteoporosis and lung cancer screening showed a high extraction rate with the thoracic vertebral evaluation of CT images. In addition, we created standard patterns of CT values per thoracic vertebra for male age groups using 298 low-dose datasets.
Automatic detection of anatomical regions in frontal x-ray images: comparing convolutional neural networks to random forest
R. Olory Agomma, C. Vázquez, T. Cresson, et al.
Most algorithms to detect and identify anatomical structures in medical images require either initialization close to the target structure, prior knowledge that the structure is present in the image, or training on a homogeneous database (e.g. all full body or all lower limbs). Detecting these structures when there is no guarantee that the structure is present in the image, or when the image database is heterogeneous (mixed configurations), is a challenge for automatic algorithms. In this work, we compared two state-of-the-art machine learning techniques in order to determine which is more appropriate for predicting target locations based on image patches. Knowing the positions of thirteen landmark points labelled by an expert on EOS frontal radiographs, we learn the displacement between salient points detected in the image and these thirteen landmarks. The learning step is carried out with two machine learning approaches: a Convolutional Neural Network (CNN) and a Random Forest (RF). The automatic detection of the thirteen landmark points in a new image is then obtained by averaging the positions of each landmark estimated from all the salient points in the new image. We obtain average prediction errors (mean ± standard deviation, in mm) of 29 ± 18 for the CNN and 30 ± 21 for the RF over the thirteen landmark points, indicating the approximate location of anatomical regions. On the other hand, the learning time is 9 days for the CNN versus 80 minutes for the RF. We provide a comparison of the results between the two machine learning approaches.
Poster Session: Quantitative
Applying a CAD-generated imaging marker to assess short-term breast cancer risk
Although it remains controversial whether using computer-aided detection (CAD) helps improve radiologists’ performance in reading and interpreting mammograms, due to higher false-positive detection rates, the objective of this study is to investigate and test a new hypothesis that CAD-generated false-positives, in particular the bilateral summation of false-positives, constitute a potential imaging marker associated with short-term breast cancer risk. An image dataset involving negative screening mammograms acquired from 1,044 women was retrospectively assembled. Each case involves 4 images: the craniocaudal (CC) and mediolateral oblique (MLO) views of the left and right breasts. In the next subsequent mammography screening, 402 cases were positive for cancer and 642 remained negative. A CAD scheme was applied to process all “prior” negative mammograms, and several features were extracted from it, including the detection seeds, the total number of false-positive regions, the average detection score, and the sum of detection scores in the CC and MLO view images. The features computed from the two bilateral images of the left and right breasts from either the CC or MLO view were then combined. In order to predict the likelihood of each testing case being positive in the next subsequent screening, two logistic regression models were trained and tested using a leave-one-case-out cross-validation method. Data analysis demonstrated a maximum prediction accuracy with an area under the ROC curve of AUC = 0.65 ± 0.017 and a maximum adjusted odds ratio of 4.49 with a 95% confidence interval of [2.95, 6.83]. The results also illustrated an increasing trend in the adjusted odds ratio and risk prediction scores (p < 0.01). Thus, the study showed that CAD-generated false-positives might provide a new quantitative imaging marker to help assess short-term breast cancer risk.
Asymmetry quantification from reflectance images of orthotic patients using structural similarity metrics
Marc-Antoine Boucher, Nicolas Watts, Frederic Gremillet, et al.
Pathologies like plantar fasciitis, a common soft-tissue disorder of the foot, are frequently associated with older age, high BMI, and little exercise. As with other pathologies of the foot, knee, or hip, foot orthoses can help the patient’s posture, and recent techniques allow the creation of personalized foot orthoses based on a 3D foot model fitted with high accuracy to the foot surface. In order to assess the efficacy of personalized orthoses on the patient’s pose and balance, depth images with reflectance camera filters are acquired to evaluate the patient's posture before and after the use of the orthoses. Images are analysed by clinicians to assess region asymmetry and posture changes. However, this remains a subjective evaluation, and a quantifiable measurement is required to follow patient progression. In this paper, we present a novel tool to assess and quantify the asymmetry of body regions using a color-based structural similarity metric calculated from paired regions. This provides a quantitative measure to evaluate the effect of the personalized orthoses on the patient. A user-friendly interface allows the user to select an area of the body and automatically generate a symmetry axis, along with a measure of asymmetry based on reflectance variations of the skin. The tool was validated on 30 patients, demonstrating an 83% agreement rate compared to clinical observations.
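A minimal sketch of quantifying left/right asymmetry with a structural similarity metric: the right half of the selected region is mirrored about the symmetry axis and compared to the left half. scikit-image's grayscale SSIM is used here as a stand-in for the color-based structural similarity described above, so the numbers it produces are only illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity

def asymmetry_score(region, axis_col):
    """Return 1 - SSIM between the left half and the mirrored right half of a 2D region."""
    left = region[:, :axis_col]
    right = np.fliplr(region[:, axis_col:])            # mirror about the symmetry axis
    width = min(left.shape[1], right.shape[1])          # crop both halves to a common width
    left, right = left[:, -width:], right[:, -width:]   # keep the columns nearest the axis
    ssim = structural_similarity(left.astype("float32"), right.astype("float32"),
                                 data_range=float(region.max() - region.min()))
    return 1.0 - ssim                                    # 0 = perfectly symmetric
```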
Quantitative characterization of liver tumor radiodensity in CT images: a phantom study between two scanners
Benjamin Paul Berman, Qin Li, Sarah McKenney, et al.
Quantitative assessment of tumor radiodensity is important for the clinical evaluation of contrast enhancement and treatment response, as well as for the extraction of texture-related features for image analysis or radiomics. Radiodensity estimation, measured in Hounsfield Units (HU) in CT images, can be affected by patient factors such as tumor size, and by system factors such as acquisition and reconstruction protocols. In this project, we quantified the measurability of liver tumor HU using a 3D-printed phantom, imaged with two CT systems: a Siemens Somatom Force and a GE Lightspeed VCT. The phantom was printed by dithering two materials to create spherical tumors (10, 14 mm) with uniform densities (90, 95, 100, 105 HU). Image datasets were acquired at 120 kVp, including 15 repeats using two matching exposures across the CT systems, and reconstructed using comparable algorithms. The radiodensity of each tumor was measured using an automated matched-filter method. We assessed the performance of each protocol using the area under the ROC curve (AUC) as the metric for distinguishing between tumors with different radiodensities. The AUC ranged from 0.8 to 1.0 and was affected by tumor size, radiodensity, and scanner; the lowest AUC values corresponded to low-dose measurements of 10 mm tumors with less than 5 HU difference. The two scanners exhibited similar performance (AUC > 0.9) for large lesions with contrast above 7 HU, though differences were observed for the smallest and lowest-contrast tumors. These results show that HU estimation should be carefully examined, considering that uncertainty in tumor radiodensity may propagate to the quantification of other characteristics, such as size and texture.
Deep convolutional neural network for mammographic density segmentation
Jun Wei, Songfeng Li, Heang-Ping Chan, et al.
Breast density is one of the most significant breast cancer risk factors. In this study, we proposed a supervised deep learning approach for automated estimation of percentage density (PD) on digital mammography (DM). A deep convolutional neural network (DCNN) was trained to estimate a probability map of breast density (PMD). PD was calculated as the ratio of the dense area to the breast area, based on the probability of each pixel belonging to the dense or fatty region at a decision threshold of 0.5. The DCNN estimate was compared to a feature-based statistical learning approach, in which gray-level, texture, and morphological features were extracted from each ROI and the least absolute shrinkage and selection operator (LASSO) was used to select and combine the useful features to generate the PMD. The reference PD of each image was provided by two experienced MQSA radiologists. With IRB approval, we retrospectively collected 347 DMs from patient files at our institution. The 10-fold cross-validation results showed a strong correlation (r = 0.96) between the DCNN estimation and interactive segmentation by radiologists, whereas the feature-based statistical learning approach had a correlation of r = 0.78 with the radiologists’ segmentation. The difference between the segmentations by the DCNN and by the radiologists was significantly smaller than that between the feature-based learning approach and the radiologists (p < 0.0001, two-tailed paired t-test). This study demonstrated that the DCNN approach has the potential to replace radiologists’ interactive thresholding in PD estimation on DMs.
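A minimal sketch of turning a per-pixel density probability map into a percentage-density value, following the 0.5 decision threshold described above; `probability_map` and `breast_mask` are assumed to be given as NumPy arrays.

```python
import numpy as np

def percent_density(probability_map, breast_mask, threshold=0.5):
    """PD = dense area / breast area, with dense pixels defined by the probability threshold."""
    dense = (probability_map >= threshold) & breast_mask
    return 100.0 * dense.sum() / breast_mask.sum()
```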
Quantitative assessment for pneumoconiosis severity diagnosis using 3D CT images
Koki Hino, Mikio Matsuhiro, Hidenobu Suzuki, et al.
Pneumoconiosis is an occupational respiratory illness caused by inhaling dust into the lungs. In Japan, 240,000 participants are screened for pneumoconiosis every year. Radiography is used worldwide for staging the severity of pneumoconiosis. CT imaging is useful for differentiating cases that meet the requirements for industrial accident approval because it can detect small lesions better than radiography. In this paper, we extracted lung nodules from 3D pneumoconiosis CT images by two manual processes and one automatic process, and created a database of pneumoconiosis CT images. We used the database to analyze, compare, and evaluate visual diagnostic results from radiographs and quantitative assessments (number, size, and volume) of lung nodules. This method was applied to twenty pneumoconiosis patients. Initial results showed that the proposed method can assess pneumoconiosis severity quantitatively. This study demonstrates the method's effectiveness for diagnosis and prognosis of pneumoconiosis in CT screening.
Quantitative analysis of adipose tissue on chest CT to predict primary graft dysfunction in lung transplant recipients: a novel optimal biomarker approach
In this study, patients who underwent lung transplantation are categorized into two groups of successful (positive) or failed (negative) transplantations according to primary graft dysfunction (PGD), i.e., acute lung injury within 72 hours of lung transplantation. Obesity or being underweight is associated with an increased risk of PGD. Adipose quantification and characterization via computed tomography (CT) imaging is an evolving topic of interest. However, very little research on PGD prediction using adipose quantity or characteristics derived from medical images has been performed.

The aim of this study is to explore image-based features of thoracic adipose tissue on pre-operative chest CT to distinguish between the above two groups of patients. 140 unenhanced chest CT images from three lung transplant centers (Columbia, Penn, and Duke) are included in this study; 124 patients are in the successful group and 16 in the failure group. Chest CT slices at the T7 and T8 vertebral levels are captured to represent the thoracic fat burden using a standardized anatomic space (SAS) approach. Fat (subcutaneous adipose tissue (SAT) / visceral adipose tissue (VAT)) intensity and texture properties (1,142 in total) are collected for each patient, and an optimal feature set is then selected to maximize feature independence and separation between the two groups. Leave-one-out and leave-ten-out cross-validation strategies are adopted to test the prediction ability based on the selected features, all of which came from VAT texture properties. Accuracy of prediction (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC) of 0.87/0.97, 0.87/0.97, 0.88/1.00, and 0.88/0.99, respectively, are achieved by the method. The optimal feature set includes only 5 features (all from VAT), which might suggest that thoracic VAT plays a more important role than SAT in predicting PGD in lung transplant recipients.
Impact of deep learning on the normalization of reconstruction kernel effects in imaging biomarker quantification: a pilot study in CT emphysema
Differing reconstruction kernels are known to strongly affect the variability of imaging biomarkers and thus remain a barrier to translating computer-aided quantification techniques into clinical practice. This study presents a deep learning application for CT kernel conversion, which converts a CT image reconstructed with a sharp kernel to one reconstructed with a standard kernel, and evaluates its impact on reducing the variability of a pulmonary imaging biomarker, the emphysema index (EI). Forty low-dose chest CT exams obtained with 120 kVp, 40 mAs, 1 mm thickness, and 2 reconstruction kernels (B30f, B50f) were selected from the low-dose lung cancer screening database of our institution. A fully convolutional network was implemented with the Keras deep learning library. The model consisted of symmetric layers to capture the context and fine-structure characteristics of CT images from the standard and sharp reconstruction kernels. Pairs of the full-resolution CT data set were fed to the input and output nodes to train the convolutional network to learn the appropriate filter kernels for converting CT images from the sharp kernel to the standard kernel, with the criterion of minimizing the mean squared error between the input and target images. EIs (RA950 and Perc15) were measured with a software package (ImagePrism Pulmo, Seoul, South Korea) and compared for the B50f, B30f, and converted B50f data sets. The effect of kernel conversion was evaluated with the mean and standard deviation of pair-wise differences in EI. The population mean of RA950 was 27.65 ± 7.28% for the B50f data set, 10.82 ± 6.71% for the B30f data set, and 8.87 ± 6.20% for the converted B50f data set. The mean of pair-wise absolute differences in RA950 between B30f and B50f was reduced from 16.83% to 1.95% using kernel conversion. Our study demonstrates the feasibility of applying the deep learning technique to CT kernel conversion and reducing the kernel-induced variability of EI quantification. The deep learning model has the potential to improve the reliability of imaging biomarkers, especially in evaluating longitudinal changes of EI even when patient CT scans were performed with different kernels.
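A minimal Keras sketch of the kernel-conversion idea: a small fully convolutional network trained with a mean-squared-error loss to map a sharp-kernel (B50f) slice to its standard-kernel (B30f) counterpart. The layer counts and filter sizes are illustrative, not the architecture used in the study.

```python
from tensorflow.keras import layers, models

def build_kernel_converter(input_shape=(512, 512, 1)):
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 3, padding="same")(x)         # converted (standard-kernel-like) slice
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")           # MSE between converted and target slices
    return model

# Training pairs: sharp-kernel slices as input, matching standard-kernel slices as target.
# model = build_kernel_converter()
# model.fit(b50f_slices, b30f_slices, batch_size=4, epochs=50)
```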
Lung parenchymal analysis on dynamic MRI in thoracic insufficiency syndrome to assess changes following surgical intervention
General surgeons, orthopedists, and pulmonologists individually treat patients with thoracic insufficiency syndrome (TIS). The benefits of growth-sparing procedures such as Vertical Expandable Prosthetic Titanium Rib (VEPTR) insertion for treating patients with TIS have been demonstrated. However, at present there is no objective assessment metric to examine different thoracic structural components individually as to their roles in the syndrome, in contributing to dynamics and function, and in influencing treatment outcome. Using thoracic dynamic MRI (dMRI), we have been developing a methodology to overcome this problem. In this paper, we extend this methodology from our previous structural analysis approaches to examining lung tissue properties. We process the T2-weighted dMRI images through a series of steps involving 4D image construction of the acquired dMRI images, intensity non-uniformity correction and standardization of the 4D image, lung segmentation, and estimation of the parameters describing lung tissue intensity distributions in the 4D image. Based on pre- and post-operative dMRI data sets from 25 TIS patients (predominantly neuromuscular and congenital conditions), we demonstrate how lung tissue can be characterized by the estimated distribution parameters. Our results show that standardized T2-weighted image intensity values decrease from the pre- to post-operative condition, likely reflecting improved lung aeration post-operatively. In both pre- and post-operative conditions, the intensity values decrease also from end-expiration to end-inspiration, supporting the basic premise of our approach.
Applying a new mammographic imaging marker to predict breast cancer risk
Faranak Aghaei, Gopichandh Danala, Alan B. Hollingsworth, et al.
Identifying and developing new mammographic imaging markers to assist in the prediction of breast cancer risk has attracted extensive research interest recently. Although mammographic density is considered an important breast cancer risk factor, its discriminatory power is lower for predicting short-term breast cancer risk, which is a prerequisite for establishing a more effective personalized breast cancer screening paradigm. In this study, we presented a new interactive computer-aided detection (CAD) scheme to generate a new quantitative mammographic imaging marker, based on bilateral mammographic tissue density asymmetry, to predict the risk of cancer detection in the next subsequent mammography screening. An image database involving 1,397 women was retrospectively assembled and tested. Each woman had two digital mammography screenings, namely the “current” and “prior” screenings, with a time interval from 365 to 600 days. All “prior” images were originally interpreted as negative. Based on the “current” screenings, the cases were divided into 3 groups: 402 positive, 643 negative, and 352 biopsy-proven benign cases. There was no significant difference in BIRADS-based mammographic density ratings between the 3 case groups (p < 0.6). When applying the CAD-generated imaging marker or risk model to classify between the 402 positive and 643 negative cases using the “prior” negative mammograms, the area under the ROC curve was 0.70 ± 0.02 and the adjusted odds ratios showed an increasing trend from 1.0 to 8.13 for predicting the risk of cancer detection in the “current” screening. The study demonstrated that this new imaging marker has the potential to yield significantly higher discriminatory power for predicting short-term breast cancer risk.
Poster Session: Radiomics
Quantitative CT based radiomics as predictor of resectability of pancreatic adenocarcinoma
In current clinical practice, the resectability of pancreatic ductal adenocarcinoma (PDA) is determined subjectively by a physician, which is an error-prone procedure. In this paper, we present a method for automated determination of the resectability of PDA from a routine abdominal CT, to reduce such decision errors. The tumor features are extracted from a group of patients with both hypo- and iso-attenuating tumors, of which 29 were resectable and 21 were not. The tumor contours were supplied by a medical expert. We present an approach that uses intensity, shape, and texture features to determine tumor resectability. The best classification results are obtained with a fine Gaussian SVM and the L0 feature selection algorithm. Compared to expert predictions made on the same dataset, our method achieves better classification results. We obtain significantly better results in correctly predicting non-resectability (+17%) compared to an expert, which is essential for patient treatment (negative predictive value). Moreover, our predictions of resectability exceed expert predictions by approximately 3% (positive predictive value).
Stability of deep features across CT scanners and field of view using a physical phantom
Rahul Paul, Muhammad Shafiq-ul-Hassan, Eduardo G. Moros, et al.
Radiomics is the process of analyzing radiological images by extracting quantitative features for the monitoring and diagnosis of various cancers. Analyzing images acquired from different medical centers is confounded by the many choices of acquisition and reconstruction parameters and by differences among device manufacturers. Consequently, scanning the same patient or phantom using various acquisition/reconstruction parameters, as well as different scanners, may result in different feature values. To further evaluate this issue, CT images from a physical radiomics phantom were used in this study. Recent studies showed that some quantitative features were dependent on voxel size and that this dependency could be reduced or removed by an appropriate normalization factor. Deep features extracted from a convolutional neural network may also provide additional features for image analysis. Using a transfer learning approach, we obtained deep features from three convolutional neural networks pre-trained on color camera images. We then examined the dependency of deep features on image pixel size. We found that some deep features were pixel-size dependent, and to remove this dependency we proposed two effective normalization approaches. For analyzing the effects of normalization, a threshold was used based on the calculated standard deviation and the average distance from a best-fit horizontal line across the features’ underlying pixel sizes, before and after normalization. The inter- and intra-scanner dependency of deep features was also evaluated.
Temporal assessment of radiomic features on clinical mammography in a high-risk population
Extraction of high-dimensional quantitative data from medical images has become necessary in disease risk assessment, diagnostics and prognostics. Radiomic workflows for mammography typically involve a single medical image for each patient although medical images may exist for multiple imaging exams, especially in screening protocols. Our study takes advantage of the availability of mammograms acquired over multiple years for the prediction of cancer onset. This study included 841 images from 328 patients who developed subsequent mammographic abnormalities, which were confirmed as either cancer (n=173) or non-cancer (n=155) through diagnostic core needle biopsy. Quantitative radiomic analysis was conducted on antecedent FFDMs acquired a year or more prior to diagnostic biopsy. Analysis was limited to the breast contralateral to that in which the abnormality arose. Novel metrics were used to identify robust radiomic features. The most robust features were evaluated in the task of predicting future malignancies on a subset of 72 subjects (23 cancer cases and 49 non-cancer controls) with mammograms over multiple years. Using linear discriminant analysis, the robust radiomic features were merged into predictive signatures by: (i) using features from only the most recent contralateral mammogram, (ii) change in feature values between mammograms, and (iii) ratio of feature values over time, yielding AUCs of 0.57 (SE=0.07), 0.63 (SE=0.06), and 0.66 (SE=0.06), respectively. The AUCs for temporal radiomics (ratio) statistically differed from chance, suggesting that changes in radiomics over time may be critical for risk assessment. Overall, we found that our two-stage process of robustness assessment followed by performance evaluation served well in our investigation on the role of temporal radiomics in risk assessment.
Deep radiomic prediction with clinical predictors of the survival in patients with rheumatoid arthritis-associated interstitial lung diseases
Radin A. Nasirudina, Janne J. Näppi, Chinatsu Watari M.D., et al.
We developed and evaluated the effect of our deep-learning-derived radiomic features, called deep radiomic features (DRFs), together with their combination with clinical predictors, on the prediction of the overall survival of patients with rheumatoid arthritis-associated interstitial lung disease (RA-ILD). We retrospectively identified 70 RA-ILD patients with thin-section lung CT and pulmonary function tests. An experienced observer delineated regions of interest (ROIs) from the lung regions on the CT images and labeled them as one of four ILD patterns (ground-glass opacity, reticulation, consolidation, and honeycombing) or a normal pattern. Small image patches centered at individual pixels in these ROIs were extracted and labeled with the class of the ROI to which the patch belonged. A deep convolutional neural network (DCNN), which consists of a series of convolutional layers for feature extraction and a series of fully connected layers, was trained and validated with 5-fold cross-validation for classifying the image patches into one of the above five patterns. A DRF vector for each patch was identified as the output of the last convolutional layer of the DCNN. Statistical moments of each element of the DRF vectors were computed to derive a DRF vector that characterizes the patient. The DRF vector was subjected to a Cox proportional hazards model with elastic-net penalty for predicting the survival of the patient. Evaluation was performed by use of bootstrapping with 2,000 replications, where the concordance index (C-index) was used as the comparative performance metric. Preliminary results on clinical predictors, DRFs, and combinations thereof showed (a) gender and age: C-index 64.8% [95% confidence interval (CI): 51.7, 77.9]; (b) gender, age, and physiology (GAP index): C-index 78.5% [CI: 70.50, 86.51], P < 0.0001 in comparison with (a); (c) DRFs: C-index 85.5% [CI: 73.4, 99.6], P < 0.0001 in comparison with (b); and (d) DRFs and GAP: C-index 91.0% [CI: 84.6, 97.2], P < 0.0001 in comparison with (c). Kaplan-Meier survival curves of patients stratified into low- and high-risk groups based on the DRFs showed a statistically significant (P < 0.0001) difference. The DRFs outperform the clinical predictors in predicting patient survival, and a combination of the DRFs and the GAP index outperforms either of these predictors alone. Our results indicate that the DRFs and their combination with clinical predictors provide an accurate prognostic biomarker for patients with RA-ILD.
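A minimal sketch of fitting an elastic-net-penalized Cox proportional hazards model to patient-level DRF summary vectors, in the spirit of the survival analysis described above. The lifelines library is used here for illustration; the file name, column names, and penalty strength are assumptions, and the bootstrap evaluation is omitted.

```python
import pandas as pd
from lifelines import CoxPHFitter

# df: one row per patient, with DRF summary statistics plus follow-up columns
# "time" (duration) and "event" (1 = death observed, 0 = censored).
df = pd.read_csv("ra_ild_drf_features.csv")              # hypothetical file

cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)            # elastic-net penalty
cph.fit(df, duration_col="time", event_col="event")
print("concordance index:", cph.concordance_index_)
```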
Radiomic biomarkers from PET/CT multi-modality fusion images for the prediction of immunotherapy response in advanced non-small cell lung cancer patients
Wei Mu, Jin Qi, Hong Lu, et al.
Purpose: To investigate the ability of complementary information provided by the fusion of PET/CT images to predict immunotherapy response in non-small cell lung cancer (NSCLC) patients. Materials and methods: We collected 64 patients diagnosed with primary NSCLC and treated with anti-PD-1 checkpoint blockade. Using the PET/CT images, fused images were created following multiple methodologies, resulting in up to 7 different images for the tumor region. Quantitative image features were extracted from the primary images (PET/CT) and the fused images, comprising 195 features from the primary images and 1,235 features from the fusion images. Three clinical characteristics were also analyzed. We then used support vector machine (SVM) classification models to identify discriminant features that predict immunotherapy response at baseline. Results: On the validation dataset, an SVM built with 87 fusion features and 13 primary PET/CT features had an accuracy and area under the ROC curve (AUROC) of 87.5% and 0.82, respectively, compared to 78.12% and 0.68 for a model built with 113 original PET/CT features. Conclusion: The fusion features show better ability to predict immunotherapy response than individual image features.
Classification of brain tumors using texture based analysis of T1-post contrast MR scans in a preclinical model
Tien T. Tang, Janice A. Zawaski, Kathleen N. Francis, et al.
Accurate diagnosis of tumor type is vital for effective treatment planning. Diagnosis relies heavily on tumor biopsies and other clinical factors. However, biopsies do not fully capture the tumor’s heterogeneity due to sampling bias and are only performed if the tumor is accessible. An alternative approach is to use features derived from routine diagnostic imaging such as magnetic resonance (MR) imaging. In this study we aim to establish the use of quantitative image features to classify brain tumors and extend the use of MR images beyond tumor detection and localization. To control for inter-scanner, acquisition, and reconstruction protocol variations, the established workflow was performed in a preclinical model. Using glioma (U87 and GL261) and medulloblastoma (Daoy) models, T1-weighted post-contrast scans were acquired at different time points post-implant. The central, middle, and peripheral tumor regions were analyzed using in-house software to extract 32 different image features consisting of first- and second-order features. The extracted features were used to construct a decision tree, which could predict tumor type with 10-fold cross-validation. Results from the final classification model demonstrated that the middle tumor region had the highest overall accuracy at 79%, while the AUC was over 90% for GL261 and U87 tumors. Our analysis further identified image features that were unique to certain tumor regions, although GL261 tumors were more homogeneous, with no significant differences between the central and peripheral tumor regions. In conclusion, our study shows that texture features derived from MR scans can be used to classify tumor type with high success rates. Furthermore, the algorithm we have developed can be implemented with any imaging dataset and may be applicable to multiple tumor types to determine diagnosis.
Radiomics for ultrafast dynamic contrast-enhanced breast MRI in the diagnosis of breast cancer: a pilot study
Karen Drukker, Rachel Anderson, Alexandra Edwards, et al.
Radiomics for dynamic contrast-enhanced (DCE) breast MRI has shown promise in the diagnosis of breast cancer as applied to conventional DCE-MRI protocols. Here, we investigate the potential of using such radiomic features in the diagnosis of breast cancer on ultrafast breast MRI, in which images are acquired every few seconds. The dataset consisted of 64 lesions (33 malignant and 31 benign) imaged with both ‘conventional’ and ultrafast DCE-MRI. After automated lesion segmentation in each image sequence, we calculated 38 radiomic features categorized as describing size, shape, margin, enhancement texture, kinetics, and enhancement variance kinetics. For each feature, we calculated the 95% confidence interval of the area under the ROC curve (AUC) to determine whether the performance of the feature in the task of distinguishing between malignant and benign lesions was better than random guessing. Subsequently, we assessed the performance of radiomic signatures in 10-fold cross-validation repeated 10 times, using a support vector machine with all features as input as well as with features by category. We found that many of the features remained useful (AUC > 0.5) for the ultrafast protocol, with the exception of some features, e.g., those designed for late-phase kinetics such as the washout rate. For ultrafast MRI, the radiomic enhancement-texture signature achieved the best performance, which was comparable to that of the kinetics signature for ‘conventional’ DCE-MRI, both achieving AUC values of 0.71. Radiomics developed for ‘conventional’ DCE-MRI show promise for translation to the ultrafast protocol, where enhancement texture appears to play a dominant role.
Reduction in training time of a deep learning model in detection of lesions in CT
Nazanin Makkinejad, Nima Tajbakhsh, Amin Zarshenas, et al.
Deep learning (DL) has emerged as a powerful tool for object detection and classification in medical images. Building a well-performing DL model, however, requires a huge number of images for training, and it takes days to train a DL model even on a cutting-edge high-performance computing platform. This study aimed at developing a method for selecting a “small” number of representative samples from a large collection of training samples to train a DL model that could be used to detect polyps in CT colonography (CTC), without compromising the classification performance. Our proposed method for representative sample selection (RSS) consists of a K-means clustering algorithm. For the performance evaluation, we applied the proposed method to select samples for the training of a massive-training artificial neural network based DL model, to be used for the classification of polyps and non-polyps in CTC. Our results show that the proposed method reduces the training time by a factor of 15, while maintaining classification performance equivalent to the model trained using the full training set. We compare the performance using the area under the receiver-operating-characteristic curve (AUC).
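A minimal sketch of representative sample selection with K-means: the training samples are clustered and the sample nearest each cluster centre is kept, shrinking the training set while preserving its coverage. The number of clusters is illustrative, and the selection rule (nearest-to-centre) is an assumption about how the clusters are turned into representatives.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(samples, n_representatives=1000, seed=0):
    """Return the indices of the samples closest to each K-means cluster centre."""
    km = KMeans(n_clusters=n_representatives, n_init=10, random_state=seed)
    labels = km.fit_predict(samples)
    chosen = []
    for k in range(n_representatives):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(samples[members] - km.cluster_centers_[k], axis=1)
        chosen.append(members[np.argmin(dists)])          # member nearest the centre
    return np.array(chosen)
```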
The effects of variations in parameters and algorithm choices on calculated radiomics feature values: initial investigations and comparisons to feature variability across CT image acquisition conditions
Translation of radiomics into clinical practice requires confidence in its interpretations. This may be obtained by understanding and overcoming the limitations of current radiomic approaches. Currently there is a lack of standardization in radiomic feature extraction. In this study we examined a few factors that are potential sources of inconsistency in characterizing lung nodules, such as 1) different choices of parameters and algorithms in feature calculation, 2) two CT image dose levels, and 3) different CT reconstruction algorithms (WFBP, denoised WFBP, and iterative). We investigated the effect of variation of these factors on entropy texture features of lung nodules. CT images of 19 lung nodules from our lung cancer screening program were identified by a CAD tool, and contours were provided. The radiomic features were extracted by calculating 36 GLCM-based and 4 histogram-based entropy features, in addition to 2 intensity-based features. A robustness index was calculated across different image acquisition parameters to illustrate the reproducibility of the features. Most GLCM-based and all histogram-based entropy features were robust across the two CT image dose levels. Denoising of images slightly improved the robustness of some entropy features at WFBP. Iterative reconstruction improved robustness in fewer cases and caused more variation in entropy feature values and their robustness. Across different choices of parameters and algorithms, texture features showed a wide range of variation, as much as 75% for individual nodules. The results indicate the need for harmonization of feature calculations and identification of optimal parameters and algorithms in a radiomics study.
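An illustrative sketch of one GLCM-based entropy computation and a simple reproducibility summary across two acquisition conditions, assuming a recent scikit-image (`graycomatrix`) and 8-bit nodule ROIs. The exact entropy variants and the robustness index used in the study are not reproduced here; this only shows the general mechanics.

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_entropy(roi_uint8, distance=1, angle=0.0):
    """Entropy of a normalized gray-level co-occurrence matrix for one ROI."""
    glcm = graycomatrix(roi_uint8, distances=[distance], angles=[angle],
                        levels=256, symmetric=True, normed=True)[:, :, 0, 0]
    p = glcm[glcm > 0]
    return float(-(p * np.log2(p)).sum())

def relative_difference(values_a, values_b):
    """Mean absolute relative difference of a feature between two conditions."""
    a, b = np.asarray(values_a, float), np.asarray(values_b, float)
    return float(np.mean(np.abs(a - b) / (0.5 * (np.abs(a) + np.abs(b)))))
```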