Proceedings Volume 10989

Big Data: Learning, Analytics, and Applications



Volume Details

Date Published: 26 July 2019
Contents: 7 Sessions, 22 Papers, 13 Presentations
Conference: SPIE Defense + Commercial Sensing 2019
Volume Number: 10989

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 10989
  • Big Data Analytics, Modeling, and Processing I
  • Machine Learning for Big Data
  • Big Data Analytics, Modeling, and Processing II
  • Signal Processing for Big Data
  • Tensor Methods for Signal Processing and Machine Learning
  • Poster Session
Front Matter: Volume 10989
Front Matter: Volume 10989
This PDF file contains the front matter associated with SPIE Proceedings Volume 10989, including the title page, copyright information, table of contents, and author and conference committee lists.
Big Data Analytics, Modeling, and Processing I
A big data inspired preprocessing scheme for bandwidth use optimization in smart cities applications using Raspberry Pi
The advancement of Internet of Things (IoT) technologies, such as low-cost embedded single-board computers that integrate sensors, communication hardware, and processing power in one unit, has given more traction to the concept of Smart Cities. Having cheaper processing power at their disposal, the sensing units are capable of gathering increasingly larger amounts of raw data locally, which must be processed before being usable. One concern for this scheme is the amount of infrastructure and network bandwidth needed to transfer the data from the acquisition location to a server, which may be miles away, for further processing. The bandwidth available to the sensor network, distributed through the city, is expanding at a lower rate than the size and bandwidth demand of the network it serves. Therefore, transferring the unprocessed data to a central server does not seem feasible unless major compromises are made in terms of data resolution and size. This paper proposes a local, big-data-inspired preprocessing scheme applied before the data is transferred to storage. Using this scheme can free up network bandwidth, exploit otherwise wasted local processing power, and offload processing from the central server, allowing it to serve a larger network without the need for more powerful hardware. By making efficient use of the network infrastructure, smart city applications become more affordable and scalable.
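As an illustrative sketch (not from the paper), the kind of edge-side preprocessing advocated here can be as simple as reducing each window of raw samples to a compact summary before transmission; the function name and summary fields below are hypothetical:

```python
import numpy as np

def summarize_window(samples, decimate=10):
    """Reduce a window of raw sensor samples to a compact summary
    before transmission (hypothetical edge-side preprocessing)."""
    samples = np.asarray(samples, dtype=float)
    return {
        "mean": float(samples.mean()),
        "max": float(samples.max()),
        "min": float(samples.min()),
        "decimated": samples[::decimate].tolist(),  # coarse trend only
    }

raw = np.sin(np.linspace(0, 6.28, 1000))  # 1000 raw samples on the node
s = summarize_window(raw)
# the transmitted payload is roughly 10x smaller than the raw window
```

In this toy version the node sends only summary statistics and a decimated trend, trading resolution for bandwidth exactly as the abstract describes.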
Edge-to-fog computing for color-assisted moving object detection
Future Internet-of-Things (IoT) will feature ubiquitous and pervasive vision sensors that generate enormous amounts of streaming video. The ability to analyze the big video data in a timely manner is essential to delay-sensitive applications, such as autonomous vehicles and body-worn cameras for police forces. Due to the limited computing power and storage capacity of local devices, the fog computing paradigm has been developed in recent years to process big sensor data closer to the end users, avoiding the transmission delay and huge uplink bandwidth requirements of cloud-based data analysis. In this work, we propose an edge-to-fog computing framework for object detection from surveillance videos. Videos are captured locally at an edge device and sent to fog nodes for color-assisted L1-subspace background modeling. The results are then sent back to the edge device for data fusion and final object detection. Experimental studies demonstrate that the proposed color-assisted background modeling offers more diversity than pure luminance-based background modeling and hence achieves higher object detection accuracy. Meanwhile, the proposed edge-to-fog paradigm leverages the computing resources of multiple platforms.
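A minimal sketch of why color helps: below, the paper's L1-subspace background model is replaced by a simple per-channel temporal median (purely an illustrative stand-in), and a synthetic object is given nearly the same luminance as the background so that only the color channels reveal it:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic video: 20 RGB frames of a static background plus a moving object
frames = np.tile(np.array([0.2, 0.5, 0.3]), (20, 16, 16, 1))
frames += 0.01 * rng.standard_normal(frames.shape)
for t in range(20):
    # object color differs in hue but has nearly the background's luminance
    frames[t, 5:8, t % 16, :] = [0.5, 0.2, 0.35]

# background model: per-pixel, per-channel temporal median (illustrative)
background = np.median(frames, axis=0)

# color-assisted detection: foreground if ANY channel deviates
color_mask = (np.abs(frames[0] - background) > 0.15).any(axis=-1)

# luminance-only detection misses objects that differ mainly in hue
lum_mask = np.abs(frames[0].mean(-1) - background.mean(-1)) > 0.15
```

Here the color-channel test flags the object pixels while the luminance-only test does not, mirroring the diversity argument in the abstract.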
Sparsity-based collaborative sensing in a scalable wireless network
In this paper, we propose a collaborative sensing scheme for source localization and imaging in an unmanned aerial vehicle (UAV) network. A two-stage image formation approach, which combines the robust adaptive beamforming technique and a sparsity-based reconstruction strategy, is proposed to achieve accurate multi-source localization. To minimize the communication traffic in the UAV network, each UAV node transmits only a coarse-resolution image, in lieu of the large volume of raw sampled data. The proposed method maintains robustness in the presence of model mismatch while providing a high-resolution image.
Near-field source localization with an underwater inflatable sonar co-prime array
Tongdi Zhou, Fauzia Ahmad, Bing Ouyang, et al.
We present a near-field source localization method for use with a recently proposed inflatable array-based sonar system. The sensing system, known as the Underwater Inflatable Co-prime Sonar Array, can be easily deployed from an unmanned underwater vehicle (UUV), as it can be stored as a folded package with compact stowed dimensions. The package can be detached from the UUV and morph into its final structural form at the destination. The co-prime sparse array configuration offers the capability to resolve a much higher number of sources as compared to a conventional uniform half-wavelength-spaced array. Using real-data measurements in a water tank with a co-prime array prototype, we demonstrate the capability of the proposed system to localize wideband acoustic sources.
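The benefit of the co-prime geometry can be illustrated with a toy calculation (the pair M=3, N=5 and the subarray definition below are illustrative assumptions, not the paper's prototype):

```python
import numpy as np

M, N = 3, 5  # coprime pair
# co-prime geometry: one subarray at multiples of M, another at multiples of N
positions = sorted({M * n for n in range(N)} | {N * m for m in range(M)})

# the set of pairwise sensor separations (difference coarray) governs
# how many sources the sparse array can resolve
lags = sorted({abs(p - q) for p in positions for q in positions})

print(len(positions), len(lags))  # 7 physical sensors, 11 distinct lags
```

Here 7 physical sensors yield 11 distinct coarray lags, whereas a 7-element uniform half-wavelength array yields only 7 (lags 0 through 6), which is the intuition behind resolving more sources with fewer elements.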
RF sensing for continuous monitoring of human activities for home consumer applications
Moeness G. Amin, Arun Ravisankar, Ronny G. Guendel
Radar for indoor monitoring is an emerging area of research and development, covering and supporting different health and wellbeing applications of smart homes, assisted living, and medical diagnosis. We report on a successful RF sensing system for home monitoring applications. The system recognizes Activities of Daily Living (ADL) and detects unique motion characteristics, using data processing and training algorithms. We also examine the challenges of continuously monitoring various human activities, which can be categorized into translation motions (active mode) and in-place motions (resting mode). We use the range-map, offered by a range-Doppler radar, to obtain the transition time between these two categories, characterized by changing and constant range values, respectively. This is achieved using the Radon transform, which identifies straight lines of different slopes in the range-map image. Over the in-place motion time intervals, where activities have insignificant or negligible range swath, a power threshold on the radar return micro-Doppler signatures is employed to define the time-spans of the individual activities. Finding both the transition times and the time-spans of the different motions leads to improved classifications, as it avoids decisions rendered over time windows covering mixed activities.
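The power-thresholding step for in-place motions can be sketched as follows (a hypothetical minimal version; the threshold value and power trace are made up for illustration):

```python
import numpy as np

def activity_spans(power, threshold):
    """Return (start, end) index pairs where the micro-Doppler return
    power stays above the threshold (one span per in-place activity)."""
    active = power > threshold
    edges = np.diff(active.astype(int))      # rising/falling edges of the mask
    starts = list(np.where(edges == 1)[0] + 1)
    ends = list(np.where(edges == -1)[0] + 1)
    if active[0]:
        starts.insert(0, 0)
    if active[-1]:
        ends.append(len(power))
    return list(zip(starts, ends))

power = np.array([0.1, 0.1, 0.9, 1.1, 0.8, 0.1, 0.1, 1.2, 1.0, 0.2])
spans = activity_spans(power, threshold=0.5)
# → [(2, 5), (7, 9)]: two separate activity time-spans
```

Classifying within each returned span, rather than over fixed windows, avoids the mixed-activity decisions the abstract warns about.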
Machine Learning for Big Data
A compressed sensing approach to hyperspectral classification
C. J. Della Porta, Bernard Lampe, Adam Bekit, et al.
Although hyperspectral technology has continued to improve over the years, its use is often still limited by size, weight, and power (SWaP) constraints. One of the more taxing requirements is the need to sample a large number of very fine spectral bands. The prohibitively large size of hyperspectral data creates challenges in both archival and processing. Compressive sensing is an enabling technology for reducing the overall processing and SWaP requirements. This paper explores the viability of performing classification on sparsely sampled hyperspectral data without the need for sparse reconstruction. In particular, a spatial-spectral classifier based on a Support Vector Machine (SVM) and edge-preserving filters (EPFs) is applied directly in the compressed domain. The well-known Restricted Isometry Property (RIP) and a random spectral sampling strategy are used to analytically evaluate the error between the compressed classifier and the full-band classifier. The mathematical analysis presented shows that the classification error can be expressed in terms of the Restricted Isometry Constant (RIC) and that it is indeed possible to achieve full classification performance in the compressed domain, given that sufficient sampling conditions are met. A set of experiments is performed to empirically demonstrate compressed classification. Images from both the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Reflective Optics System Imaging Spectrometer (ROSIS) are examined to draw inferences on the impact of scene complexity. The results presented clearly demonstrate the possibility of compressed classification and lead to several open research questions to be addressed in future work.
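A toy demonstration of classifying directly in the compressed domain (a nearest-class-mean rule stands in for the paper's SVM/EPF classifier, and the two synthetic "material" spectra are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
bands, m = 200, 40                # full spectral bands vs compressed samples

# two synthetic material classes with distinct spectral signatures
sig_a = np.sin(np.linspace(0, 4, bands))
sig_b = np.cos(np.linspace(0, 4, bands))
X = np.vstack([sig_a + 0.05 * rng.standard_normal((50, bands)),
               sig_b + 0.05 * rng.standard_normal((50, bands))])
y = np.array([0] * 50 + [1] * 50)

# random sensing matrix: satisfies the RIP with high probability,
# so pairwise distances are approximately preserved after projection
Phi = rng.standard_normal((m, bands)) / np.sqrt(m)
Z = X @ Phi.T                     # classify here, with no reconstruction

# nearest-class-mean classifier in the compressed domain
means = np.stack([Z[y == c].mean(0) for c in (0, 1)])
pred = np.argmin(np.linalg.norm(Z[:, None] - means, axis=-1), axis=1)
acc = (pred == y).mean()
```

Because the random projection approximately preserves class separation, the compressed classifier matches the full-band one on this easy synthetic problem, which is the intuition the paper makes rigorous via the RIC.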
Machine learning application to hydraulic fracturing
Giovanni Tamez, Danytza Castillo, Aaron Colmenero, et al.
Hydraulic fracturing has greatly impacted the oil and gas industry and is a large component of future oil production. Proper operation is dependent on supervisors monitoring data for signs of dangerous pressure spikes. Human error becomes a large factor in processing such a large amount of data in real time. The system proposed in this paper is able to read and interpret the data from a well to make accurate predictions on when potential pressure spikes will occur within the well, saving time and money on projects that are pushed to the limit.
Learning to rank faulty source files for dependent bug reports
Nasir Safdari, Hussein Alrubaye, Wajdi Aljedaani, et al.
With the rise of autonomous systems, the automation of fault detection and localization becomes critical to their reliability. An automated strategy that can provide a ranked list of faulty modules or files with respect to how likely they are to contain the root cause of the problem would help automate bug localization. Learning from the history of previously localized bugs in general, and extracting the dependencies between these bugs in particular, helps in building models to accurately localize any potentially detected bugs. In this study, we propose a novel fault localization solution based on a learning-to-rank strategy, using the history of previously localized bugs and their dependencies as features, to rank files in terms of their likelihood of being a root cause of a bug. The evaluation of our approach has shown its efficiency in localizing dependent bugs.
Fake news identification: a comparison of parts-of-speech and N-grams with neural networks
The rise of the internet has enabled fake news to reach larger audiences more quickly. As more people turn to social media for news, the accuracy of information on these platforms is especially important. To help enable classification of the accuracy of news articles at scale, machine learning models have been developed and trained to recognize fake articles. Previous linguistic work suggests part-of-speech and N-gram frequencies are often different between fake and real articles. To compare how these frequencies relate to the accuracy of the article, a dataset of 260 news articles, 130 fake and 130 real, was collected for training neural network classifiers. The first model relies solely on part-of-speech frequencies within the body of the text and consistently achieved 82% accuracy. As the proportion of the dataset used for training grew smaller, accuracy decreased, as expected. The true negative rate, however, remained high. Thus, some aspect of the fake articles was readily identifiable, even when the classifier was trained on a limited number of examples. The second model relies on the most commonly occurring N-gram frequencies. The neural nets were trained on N-grams of different length. Interestingly, the accuracy was near 61% for each N-gram size. This suggests some of the same information may be ascertainable across N-grams of different sizes.
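The N-gram frequency features can be sketched as follows (a hypothetical minimal version; in a real pipeline these relative-frequency vectors would be fed to the neural network classifier):

```python
from collections import Counter

def ngram_frequencies(text, n=2, vocab=None):
    """Relative frequencies of word n-grams: the feature vector that
    would be fed to a classifier (illustrative minimal version)."""
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1
    if vocab is None:
        vocab = sorted(counts)          # fix a feature ordering
    return [counts[g] / total for g in vocab], vocab

feats, vocab = ngram_frequencies("the quick fox jumps over the quick dog")
# "the quick" occurs twice out of 7 bigrams, so its feature is 2/7
```

Passing a shared `vocab` built from the training set keeps feature vectors aligned across articles; varying `n` reproduces the different N-gram sizes compared in the abstract.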
Big Data Analytics, Modeling, and Processing II
Big data analytics in medical imaging using deep learning
Big data has been one of the hottest topics of scientific discussion in recent years. In the early 2000s, an industry analyst described big data in terms of three Vs: Volume, Velocity, and Variability. With new technologies such as Hadoop, it is now feasible to store and use extremely large volumes of data that come in at an unprecedented velocity. The variability of this data can be large, as it can come in different formats such as text documents, voice or video, and financial transactions. Big data analytics has proven useful in various fields such as science, sports, advertising, health care, genomic sequence data, and medical imaging. This study presents a brief overview of big data analytics approaches in medical imaging, considering the importance of contemporary machine learning techniques such as deep learning.
Defense against adversarial spectroscopic attacks (Conference Presentation)
Raman spectra were perturbed such that an intentional misclassification was induced when using a dimension-reduction classifier such as linear discriminant analysis (LDA). These perturbations primarily target patterning of the noise within the spectra such that they are difficult to detect by visual inspection. Data-intensive decisions are increasingly important to mine the growing volume of information accessible by modern instrumentation. These decisions are conceptually performed by projecting measurements on high-dimensional manifolds to low-dimensional outcomes. This dimension reduction suppresses stochastic random noise to better inform the decision. However, non-stochastic patterning of the “noise” can induce intentional misclassification that is difficult to detect by visual inspection. Such digital attacks could result in intentional changes in decisions made by many routine automated classifiers. Preliminary results using Raman spectra showed that misclassification can be induced by picking a target classification and patterning the noise in the spectra such that, in a reduced dimensional space, the spectrum is moved towards the target classification. Development of approaches for optimizing the attacks serves as a prelude to the generation of robust classification strategies less susceptible to intentional attacks.
Unsupervised automatic target generation process via compressive sensing
Adam Bekit, Charles Della Porta, Bernard Lampe, et al.
Unsupervised target generation for hyperspectral imagery (HSI) has generated great interest in the hyperspectral community. However, most current unsupervised target generation algorithms must process large HSI data acquired using the traditional Nyquist-Shannon sampling theorem, resulting in data with high band-to-band correlation. As a consequence, these algorithms end up processing redundant information, raising the demand for large memory storage, processing time, and transmission bandwidth. In the past, some efforts have been dedicated to dealing with the redundant information via data reduction (DR) or data compression post-acquisition. However, to the best of our knowledge, this challenge has been addressed outside the context of Compressive Sensing (CS). This paper applies the CS data acquisition process at the sensor level so that the redundant information is removed at an early stage of the data processing chain. The main advantage of our approach is that it employs a random sensing process, and the concept of universality, to randomly sense the HSI bands and produce data containing the bare minimum of information. We take advantage of the CS Restricted Isometry Property (RIP), Restricted Conformal Property (RCP), and newly derived orthogonal subspace projection (OSP) properties to perform the automatic target generation process (ATGP) in the compressively sensed band domain (CSBD), instead of in the original data space (ODS), where the HSI data contains full spectral bands. Our experimental results show that, by working in the CSBD, we avoid processing redundant data while maintaining performance comparable with that obtained in the ODS.
Human activity monitoring using a compressive active sensing electro-optical sensor
Bing Ouyang, Dennis Estrada, Yanjun Li, et al.
Unobtrusive human movement monitoring is important in elderly care and assisted living to alert caregivers of potential accidents. The current state of the art relies on features extracted from a radar system. Conventional optical camera systems are generally not used in this application due to privacy concerns. As an alternative, we investigate a different kind of electro-optical sensor: the compressive active sensing electro-optical (CASEO) sensor. A CASEO system consists of an infrared (IR) spatial light modulator (SLM)-based illuminator and a single-element avalanche photodetector (APD)-based receiver. The scene information is encoded through a sequence of coded illumination patterns via the SLM illuminator, with the corresponding measurements captured by the APD. Such compressive measurements can be used in scene classification instead of the actual video frames. In this work, we explore the feasibility of using CASEO to identify different activities. A CASEO processing framework was developed, and simulations using the UIUC Human Activity Recognition test dataset were conducted to validate this framework.
Signal Processing for Big Data
Superresolution via bilinear fusion of multimodal imaging data
Pulak Sarangi, Piya Pal
Advancements in neural imaging now allow simultaneous acquisition of multiple imaging modalities. When these multimodal data are combined with advanced signal processing algorithms, they can provide a better understanding of brain dynamics, which was otherwise not possible. In this paper, we specifically demonstrate a new technique for the fusion of neural activity recorded with two-photon calcium imaging and electrocorticography (ECoG) recordings acquired using an electrode array. Calcium signals are usually acquired at a low sampling frequency and have a high spatial resolution, but suffer in temporal resolution due to blurring of the spiking signal. ECoG, on the other hand, has high temporal resolution and is acquired at a high sampling frequency, but can only detect aggregated neural population activity and suffers from poor spatial resolution. In this paper, we develop novel signal processing techniques using bilinear fusion and sparsity-aware reconstruction to overcome these drawbacks. The data from both modalities are represented by a bilinear model, which is inverted to infer the spiking activity using suitable prior assumptions such as sparsity or independence of the sources. Our approach leverages the complementary strengths of the two modalities (in terms of temporal and spatial resolution) along with powerful non-convex algorithms that harness the unique structure of the dataset.
Two dimensional sparse array design for wideband signals
Syed A. Hamza, Moeness G. Amin
Recently, thinned arrays and sensor position selection have attracted much attention, since they lead to reduced hardware cost and yield effective receiver designs. The data is fetched from systematically selected sensor locations to maximize the signal-to-interference-plus-noise ratio (SINR) in a dynamically changing environment. We ensure that the most pertinent data samples are collected, thereby avoiding data redundancies. In this paper, we examine two-dimensional sparse array design for wideband signal models. The data is processed jointly in the temporal and spatial domains. Reducing the number of spatial samples, through a sparse array configuration, decreases the overall data processing complexity. The performance of the proposed sparse array design, in terms of maximizing the SINR, is presented, and the effectiveness of the proposed algorithm is demonstrated.
Performance trade-off analysis of co-design of radar and phase modulated communication signals in frequency hopping MIMO systems
Indu Priya Eedara, Moeness G. Amin
We present a dual-function radar communication (DFRC) system in which phase-modulated information symbols are embedded in the multiple-input multiple-output (MIMO) frequency hopping (FH) radar sub-pulses in fast-time. Communications are considered the secondary function of the proposed DFRC system. We use differential phase shift keying (DPSK) and continuous phase modulation (CPM). These modulations preserve the continuous phase between the FH sub-pulses, unlike phase shift keying (PSK) modulated symbols. The performance of the FH/DPSK and FH/CPM systems is compared with that of the FH/PSK system in terms of range sidelobe levels (RSL) and power spectral density (PSD). It is shown that the proposed DFRC systems exhibit better spectral containment and lower spectral sidelobe levels. The communication data rates for the proposed system are derived and shown to assume relatively high values.
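A sketch of how differential encoding carries phase across sub-pulses (the symbol-to-increment mapping below is an illustrative assumption, not the paper's exact scheme):

```python
import numpy as np

def dpsk_phases(symbols, M=4):
    """Absolute carrier phase at the start of each sub-pulse when M-ary
    symbols are embedded as differential phase increments (DPSK sketch)."""
    increments = 2 * np.pi * np.asarray(symbols) / M
    return np.cumsum(increments)     # phase is carried over, never reset

# each FH sub-pulse begins at the phase where the previous one ended,
# so no abrupt phase jumps occur between sub-pulses (unlike plain PSK,
# which would set the absolute phase of each sub-pulse independently)
phases = dpsk_phases([1, 3, 0, 2], M=4)
```

The accumulated-phase property is what gives the better spectral containment reported in the abstract: abrupt phase discontinuities spread energy into spectral sidelobes.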
Clutter identification based on sparse recovery with dynamically changing dictionary sizes for cognitive radar
Yuansheng Zhu, Yijian Xiang, Satyabrata Sen, et al.
Existing radar algorithms assume stationary statistical characteristics for the environment/clutter. In practical scenarios, the statistical characteristics of the clutter can change dynamically depending on where the radar is operating. Non-stationarity in the statistical characteristics of the clutter may negatively affect radar performance. Cognitive radar, which can sense the changes in the clutter statistics, learn the new statistical characteristics, and adapt to these changes, has been proposed to overcome these shortcomings. We have recently developed techniques for detecting statistical changes and learning the new clutter distribution for cognitive radar. In this work, we extend the learning component. More specifically, in our previous work we developed a sparse-recovery-based clutter distribution identification method to learn the distribution of the new clutter characteristics after a detected change in the clutter statistics. In our method, we built a dictionary of clutter distributions and used this dictionary in orthogonal matching pursuit to perform sparse recovery of the clutter distribution, assuming that the dictionary includes the new distribution. In this work, we propose a hypothesis-testing-based approach to detect whether the new clutter distribution is included in the dictionary, and suggest a method to dynamically update the dictionary. We envision that the successful outcomes of this work will be of high relevance to the adaptive learning and cognitive augmentation of radar systems used in remotely piloted vehicles for surveillance and reconnaissance operations.
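A toy version of the dictionary-based identification and the out-of-dictionary check (the Rayleigh-shaped atoms, grid, and residual logic are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

x = np.linspace(0.01, 10, 200)

def rayleigh(s):
    """Rayleigh amplitude pdf sampled on the grid (a candidate clutter model)."""
    return (x / s**2) * np.exp(-x**2 / (2 * s**2))

# dictionary of normalized candidate clutter distributions
D = np.column_stack([rayleigh(s) for s in (0.5, 1.0, 2.0, 3.0)])
D /= np.linalg.norm(D, axis=0)

def match(y):
    """One-atom orthogonal matching pursuit: best atom index, residual norm."""
    idx = int(np.argmax(np.abs(D.T @ y)))
    coef = np.linalg.lstsq(D[:, [idx]], y, rcond=None)[0]
    return idx, float(np.linalg.norm(y - D[:, [idx]] @ coef))

# in-dictionary clutter: tiny residual, distribution correctly identified
y_in = rayleigh(1.0) / np.linalg.norm(rayleigh(1.0))
idx, r_in = match(y_in)

# out-of-dictionary clutter (uniform): large residual, which a hypothesis
# test would flag, triggering the proposed dictionary update
y_out = np.ones(200) / np.linalg.norm(np.ones(200))
_, r_out = match(y_out)
```

The gap between `r_in` and `r_out` is the quantity a residual-based hypothesis test would threshold to decide whether the dictionary must grow.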
Compressive hyperspectral image reconstruction with deep neural networks
In recent years, we have developed several architectures for compressive hyperspectral (HS) imagers. The compressive sensing (CS) design has allowed the reduction of the enormous acquisition effort associated with the huge dimensionality of HS data. Unfortunately, the reduced sensing effort offered by the CS approach comes at the expense of an increased post-sensing computational burden. Conventional CS reconstruction involves algorithms that solve an ℓ1 minimization problem. These algorithms are iterative and typically very computationally heavy. The computational burden is even more prominent when reconstructing 3D HS data, where each spectral image may be gigavoxel-sized. Motivated by this, we have investigated replacing the iterative CS reconstruction step with an appropriate deep neural network.
Tensor Methods for Signal Processing and Machine Learning
Separation of composite tensors with sparse Tucker representations
Ashley Prater-Bennette, Kenneth Theodore Carr
A common tool to process and interpret multimodal data is to represent the data in a sparse Tucker format, decomposed as a sparse core tensor and dictionary matrices for each modal dimension. In real-world applications one may be presented with a composition of several tensors, each with its own sparse Tucker representation and collection of dictionaries. The Tucker model and associated recovery algorithms struggle to accurately separate composite tensors in this situation, either having difficulty with the overcomplete dictionaries or not fully taking advantage of the special structure of the decomposition. To address these deficiencies, we introduce an overcomplete sparse Tucker model and an iterative algorithm to separate a composite sparse Tucker tensor. The method, which is based on soft-thresholding shrinkage techniques, is demonstrated to effectively separate overcomplete tensors and recover the sparse component tensors on real-world datasets, and to do so more accurately than other Tucker methods.
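The soft-thresholding shrinkage step at the heart of the algorithm can be sketched in a few lines (the core tensor values and threshold are made up for illustration):

```python
import numpy as np

def soft_threshold(T, tau):
    """Soft-thresholding shrinkage: the proximal step that keeps the
    estimated Tucker core tensor sparse during separation."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

# a small 2x2x2 "core tensor" with a few dominant entries plus clutter
core = np.array([[[2.0, -0.1], [0.05, -3.0]],
                 [[0.2, 0.0], [1.5, -0.3]]])
shrunk = soft_threshold(core, tau=0.25)
# small entries are zeroed, large entries shrink toward zero by tau
```

Applying this operator inside each iteration drives the recovered core tensors toward sparsity, which is what lets the composite components be pulled apart.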
Options for multimodal classification based on L1-Tucker decomposition
Dimitris G. Chachlakis, Mayur Dhanaraj, Ashley Prater-Bennette, et al.
Most commonly used classification algorithms process data in the form of vectors. At the same time, modern datasets often comprise multimodal measurements that are naturally modeled as multi-way arrays, also known as tensors. Processing multi-way data in their tensor form can enable enhanced inference and classification accuracy. Tucker decomposition is a standard method for tensor data processing, which however has demonstrated severe sensitivity to corrupted measurements due to its L2-norm formulation. In this work, we present a selection of classification methods that employ an L1-norm-based, corruption-resistant reformulation of Tucker (L1-Tucker). Our experimental studies on multiple real datasets corroborate the corruption-resistance and classification accuracy afforded by L1-Tucker.
Conformity evaluation and L1-norm principal-component analysis of tensor data
Multi-modal tensor data sets arise with increasing frequency in modern day scientific and engineering applications, for example in biomedical sciences and autonomous engineered systems. Over the past twenty years, tensor-domain data analysis has been attempted primarily in the context of standard (L2-norm) eigenvector decompositions across tensor domains. The algorithms are not joint-tensor-domain optimal and exhibit the familiar sensitivity to faulty/corrupted/missing measurements that characterizes all L2-norm principal-component analysis methods. In this work, we present a robustified method to evaluate the conformity of tensor data entries with respect to the whole accessible data set. Conformity evaluation is based on a continuously refined sequence of calculated L1-norm tensor subspaces. The theoretical developments are illustrated in the context of a multisensor localization application that indicates unprecedented estimation performance and resistance to intermittent disturbances. An electroencephalogram (EEG) data analysis experiment is also presented.
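For small problems the first L1-norm principal component can be computed exactly by a search over antipodal sign vectors, which makes the robustness to corrupted entries easy to see (toy data; the outlier scenario below is an illustrative assumption):

```python
import numpy as np
from itertools import product

def l1_pc(X):
    """First L1-norm principal component of a D x N matrix X, computed
    exactly by searching all antipodal sign vectors (small N only)."""
    best, best_val = None, -1.0
    for signs in product((-1.0, 1.0), repeat=X.shape[1]):
        v = X @ np.array(signs)
        if np.linalg.norm(v) > best_val:
            best_val, best = np.linalg.norm(v), v
    return best / np.linalg.norm(best)

# six clean samples along [1, 0] plus one gross outlier along [0, 1]
X = np.array([[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.0]])

u_l1 = l1_pc(X)                      # stays close to the clean direction
u_l2 = np.linalg.svd(X)[0][:, 0]     # L2 PC is captured by the outlier
```

The outlier contributes only linearly to the L1 objective but quadratically to the L2 one, so the SVD component locks onto it while the L1 component remains near the clean data direction: the sensitivity contrast the abstract describes.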
Poster Session
Security analysis and recommendations for AI/ML enabled automated cyber medical systems
J. Farroha
Artificial Intelligence (AI) and remote surgery go together in defining the future of intelligent medicine. The advancement of robotic surgery became possible by leveraging intelligent sensors, Machine Learning (ML), and reliable wireless connectivity, in addition to low-latency responses to surgical commands and automated responses to patient status, resulting in an accurate automated system. The trend is to develop cyber-medical systems through the integration of medical knowledge and engineering applications in order to create customizable intelligent healthcare procedures. Security plays a critical role as these systems navigate the dynamics of human-controlled connectivity versus autonomy, and remote commands versus automated responses to sensors. The systems need to “learn” from experience and from knowledge transferred from other systems while protecting against learning from false data. This paper addresses the cyber security risks introduced by adopting the emerging technologies and provides potential solutions based on best practices. The ML model used in cyber medicine should be as simple as possible while providing the required accuracy to produce the desired outcome, understanding that more complex models have a higher chance of suffering degradation effects from overfitting.