High-Performance Computing in Remote Sensing IV

Front Matter: Volume 9247

This PDF file contains the front matter associated with SPIE Proceedings Volume 9247, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.

Parallel random selection and projection for hyperspectral image analysis

Qian Du, Xiaochao Li

In this paper, we investigate the use of random selection (RS) and random projection (RP) for hyperspectral image analysis. Both are data-independent and computationally more efficient than other widely used dimensionality reduction methods. Both anomaly detection and target detection are considered. Because RS and RP are random, multiple runs are conducted and followed by decision fusion to ensure a stable output. Parallel implementations using graphics processing units (GPUs) and clusters are also investigated. The experimental results demonstrate that both RS and RP can provide better target detection performance after decision fusion, while the overall computing time can be greatly decreased with parallel implementations.
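
The idea of running several random projections and fusing the results can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian projection matrix, the RX-style (Mahalanobis distance) anomaly detector, score averaging as the fusion rule, and the names `rx_scores` and `fused_rp_rx` are all assumptions made for illustration.

```python
import numpy as np

def rx_scores(X):
    """Mahalanobis distance of each pixel (row of X) from the global mean."""
    Xc = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))
    return np.einsum("ij,jk,ik->i", Xc, cov_inv, Xc)

def fused_rp_rx(X, k, runs, seed=0):
    """Average anomaly scores over several independent random projections.

    X    : (n_pixels, n_bands) array of spectra
    k    : reduced dimensionality after projection (assumed parameter)
    runs : number of independent projections to fuse (assumed parameter)
    """
    rng = np.random.default_rng(seed)
    n_bands = X.shape[1]
    fused = np.zeros(X.shape[0])
    for _ in range(runs):
        # Data-independent projection: no statistics of X are used to build P.
        P = rng.standard_normal((n_bands, k)) / np.sqrt(k)
        fused += rx_scores(X @ P)
    return fused / runs

# Usage: synthetic background with one strongly anomalous pixel.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 50))
X[123] += 10.0
scores = fused_rp_rx(X, k=10, runs=5)
print(int(np.argmax(scores)))
```

Each run is independent of the others, which is what makes this kind of scheme easy to distribute across GPU threads or cluster nodes.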

FPGA-based architecture for hyperspectral endmember extraction

João Rosário, José M. P. Nascimento, Mário Véstias

Hyperspectral instruments have been incorporated in satellite missions, providing data of high spectral resolution of the Earth. These data can be used in remote sensing applications such as target detection, hazard prevention, and monitoring oil spills, among others. In most of these applications, one of the requirements of paramount importance is the ability to give a real-time or near real-time response. Recently, onboard processing systems have emerged to cope with the huge amount of data to transfer from the satellite to the ground station, thus avoiding delays between hyperspectral image acquisition and its interpretation. For this purpose, compact reconfigurable hardware modules, such as field programmable gate arrays (FPGAs), are widely used. This paper proposes a parallel FPGA-based architecture for endmember signature extraction. The method, based on Vertex Component Analysis (VCA), has several advantages: it is unsupervised, fully automatic, and works without a dimensionality reduction (DR) pre-processing step. The architecture has been designed for a low-cost Xilinx Zynq board with a Zynq-7020 SoC FPGA based on the Artix-7 FPGA programmable logic, and tested using real hyperspectral data sets collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the Cuprite mining district in Nevada. Experimental results indicate that the proposed implementation can achieve real-time processing while maintaining the method's accuracy, which indicates the potential of the proposed platform to implement high-performance, low-cost embedded systems, opening new perspectives for onboard hyperspectral image processing.

Intel Many Integrated Core (MIC) architecture optimization strategies for a memory-bound Weather Research and Forecasting (WRF) Goddard microphysics scheme

Jarno Mielikainen, Bormin Huang, Allen H.-L. Huang

The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. WRF is a widely used weather prediction system, and its development is a collaborative effort around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs rather than just kernels, as GPUs do. The MIC coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.

FPGA implementation of the hyperspectral Lossy Compression for Exomars (LCE) algorithm

Aday García, L. Santos, S. López, et al.

The increase of data rates and data volumes in present remote sensing payload instruments, together with the restrictions imposed by downlink connection requirements, makes data and image compression both a challenge and a necessity. This is especially true for hyperspectral images, in which reduction of both spatial and spectral redundancy is mandatory. Recently the Consultative Committee for Space Data Systems (CCSDS) published the Lossless Multispectral and Hyperspectral Image Compression recommendation (CCSDS 123), a prediction-based technique resulting from the consensus of its members. Although this standard offers a good trade-off between coding performance and computational complexity, the appearance of future hyperspectral and ultraspectral sensors with vast amounts of data demands further efforts from the scientific community to ensure optimal transmission to ground stations based on greater compression rates. Furthermore, hardware implementations with specific features to deal with solar radiation problems play an important role in achieving real-time applications. In this scenario, the Lossy Compression for Exomars (LCE) algorithm emerges as a good candidate to achieve these characteristics. Its good quality/compression ratio together with its low complexity facilitates implementation in hardware platforms such as FPGAs or ASICs. In this work the authors present the implementation of the LCE algorithm on an antifuse-based FPGA and the optimizations carried out to obtain the RTL description code using CatapultC, a High Level Synthesis (HLS) tool. Experimental results show an area occupancy of 75% in an RTAX2000 FPGA from Microsemi, with an operating frequency of 18 MHz. Additionally, the power budget obtained is presented, giving an idea of the suitability of the proposed algorithm implementation for onboard compression applications.

EM scattering from a 2D target above a 1D sea surface using GPU-based FDTD

Lixin Guo, Chungang Jia, Pengju Yang

This paper presents a graphics processing unit (GPU)-based finite difference time domain (FDTD) algorithm to investigate the electromagnetic (EM) scattering from a two-dimensional target above a one-dimensional rough sea surface with the Pierson–Moskowitz spectrum. The proposed method is validated by comparing the numerical results with those obtained through sequential FDTD execution on a central processing unit, as well as through the method of moments. Using compute unified device architecture (CUDA) technology, significant speed-up ratios are achieved. Furthermore, the speedup is improved by using shared memory and asynchronous transfer.
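
To show why FDTD maps well onto data-parallel hardware, here is a minimal one-dimensional sketch in normalized units. It is not the paper's solver: the 1D free-space grid, the Gaussian soft source, the perfectly conducting ends, and the name `fdtd_1d` with all its parameters are assumptions chosen to keep the example short. The point is that every cell is updated from its immediate neighbours only, so each update line is a single vectorized (and, on a GPU, per-thread) operation.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=50):
    """Leapfrog Yee updates for Ez and Hy on a 1D grid (normalized units)."""
    ez = np.zeros(n_cells)        # electric field at integer grid points
    hy = np.zeros(n_cells - 1)    # magnetic field at half grid points
    for t in range(n_steps):
        # Each H sample depends only on the two adjacent E samples.
        hy += courant * (ez[1:] - ez[:-1])
        # Interior E samples depend only on the two adjacent H samples;
        # ez[0] and ez[-1] stay zero (perfect electric conductor ends).
        ez[1:-1] += courant * (hy[1:] - hy[:-1])
        # Soft source: a Gaussian pulse added at one cell.
        ez[src] += np.exp(-((t - 30) / 10.0) ** 2)
    return ez

# Usage: run the default configuration and inspect the field.
ez = fdtd_1d()
print(ez.shape, bool(np.all(np.isfinite(ez))))
```

A Courant number of 0.5 is below the 1D stability limit of 1, so the fields stay bounded.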

Hybrid DWT-DCT and Hash function based digital image watermarking for copyright protection and content authentication of DubaiSat-2 images

Saeed Al-Mansoori, Alavi Kunhu

This paper presents a new technique for copyright protection and content authentication of satellite images. The novelty of the presented approach lies in designing a hybrid digital image watermarking scheme based on the Discrete Wavelet Transform (DWT), the Discrete Cosine Transform (DCT), and a hash function. In this study, two watermarks are embedded in the cover image: a robust copyright watermark and a fragile authentication watermark. The robust watermark is embedded in a hybrid wavelet and frequency domain by applying DWT and DCT, respectively. This two-step embedding allows the proposed approach to provide better imperceptibility in harmony with the human visual system and offers higher robustness against signal processing attacks. Subsequently, the fragile watermark is embedded in the spatial domain using the hash function approach. The proposed hybrid watermarking technique has been tested on DubaiSat-2 images with 1 meter resolution. The experimental results show that the proposed method is robust against JPEG compression, rotation, and resizing attacks. In addition, the multi-watermarked image has good transparency and the fragile watermark is sensitive to any tampering.
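
The fragile, hash-based watermark can be illustrated with a small sketch. The abstract does not specify the hash or the embedding rule, so everything here is an assumption: SHA-256 as the hash, the least significant bits of the first 256 pixels as the carrier, 8-bit grayscale input, and the helper names `embed_fragile` and `verify_fragile`. The robust DWT-DCT watermark is not covered.

```python
import hashlib
import numpy as np

def _hash_bits(image):
    """256 hash bits of the image, ignoring the LSBs that carry the mark."""
    flat = image.reshape(-1).copy()
    flat[:256] &= 0xFE                      # clear the carrier bits
    digest = hashlib.sha256(flat.tobytes()).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))

def embed_fragile(image):
    """Return a copy of a uint8 image with its own hash stored in 256 LSBs."""
    marked = image.copy()
    flat = marked.reshape(-1)               # view into the contiguous copy
    flat[:256] = (flat[:256] & 0xFE) | _hash_bits(image)
    return marked

def verify_fragile(image):
    """True if the stored LSBs still match the hash of the image content."""
    flat = image.reshape(-1)
    return bool(np.array_equal(flat[:256] & 1, _hash_bits(image)))

# Usage: mark a random 32x32 image, then flip one bit of one pixel.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
marked = embed_fragile(cover)
tampered = marked.copy()
tampered[10, 10] ^= 0x10
print(verify_fragile(marked), verify_fragile(tampered))
```

Because only least significant bits change, each marked pixel differs from the cover by at most one gray level, while any later change to the image content alters the hash and breaks verification.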

Building high-performance system for processing a daily large volume of Chinese satellite imagery

Huawu Deng, Shicun Huang, Qi Wang, et al.

The number of Earth observation satellites from China has increased dramatically in recent years, and those satellites are acquiring a large volume of imagery daily. As the main portal for image processing and distribution from those Chinese satellites, the China Centre for Resources Satellite Data and Application (CRESDA) has been working with PCI Geomatics during the last three years to solve two issues in this regard: processing the large volume of data (about 1,500 scenes or 1 TB per day) in a timely manner and generating geometrically accurate orthorectified products. After three years of research and development, a high-performance system has been built and successfully delivered. The system has a service-oriented architecture and can be deployed to a cluster of computers that may be configured with high-end computing power. The high performance is gained by, first, parallelizing image processing algorithms using high-performance graphics processing unit (GPU) cards and multiple cores from multiple CPUs, and, second, distributing processing tasks to a cluster of computing nodes. While achieving performance up to thirty times (and even more) faster than the traditional practice, a particular methodology was developed to improve the geometric accuracy of images acquired from Chinese satellites (including HJ-1 A/B, ZY-1-02C, ZY-3, GF-1, etc.). The methodology consists of fully automatic collection of dense ground control points (GCPs) from various resources and then application of those points to improve the photogrammetric model of the images. The delivered system is up and running at CRESDA for pre-operational production and is generating a good return on investment by eliminating a great amount of manual labor and increasing daily data throughput more than tenfold with fewer operators. Future work, such as development of more performance-optimized algorithms, robust image matching methods, and application workflows, is identified to improve the system in the coming years.

Implementation of the 5-layer thermal diffusion scheme in Weather Research and Forecasting model with Intel Many Integrated Core (MIC) architecture

Melin Huang, Bormin Huang, Allen H.-L. Huang

For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land-Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme, together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The efficient parallelization and vectorization features of the Intel Many Integrated Core (MIC) architecture allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the computing performance results for this scheme on the Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of the multi-threaded code on Xeon Phi 5110P by a factor of 2.1x. Furthermore, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of the multi-threaded code.

A composite algorithm for variable size object tracking for high performance FPGA-based on-board vision systems

Boris Alpatov, Semen Korepanov, Valery Strotov

An approach to tracking objects changing their size from several pixels to whole image size is suggested in this paper. A complex object tracking algorithm designed for a family of FPGA-based on-board vision systems is proposed. The results of experimental examination are given.

Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather Research and Forecasting (WRF) Purdue-Lin microphysics scheme

Jarno Mielikainen, Bormin Huang, Allen H.-L. Huang

The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow, and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for Xeon Phi. In particular, getting good performance required utilizing multiple cores, using the wide vector operations, and making efficient use of memory. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on the Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.

GPU efficient SAR image despeckling using mixed norms

Caner Özcan, Baha Şen, Fatih Nar

Speckle noise, which is inherent to Synthetic Aperture Radar (SAR) imaging, obstructs various image exploitation tasks such as edge detection, segmentation, change detection, and target recognition. Therefore, speckle reduction is generally used as a first step, which has to smooth out homogeneous regions while preserving edges and point scatterers. Traditional speckle reduction methods are fast and their memory consumption is insignificant. However, they are good at either smoothing homogeneous regions or preserving edges and point scatterers, but not both. State-of-the-art despeckling methods have been proposed to overcome this trade-off. However, they introduce another trade-off between denoising quality and resource consumption, whereby higher denoising quality requires higher computational load and/or memory consumption. In this paper, a local pixel-based total variation (TV) approach is proposed, which combines the l₂-norm and the l₁-norm in order to improve despeckling quality while keeping execution times reasonably short. A pixel-based approach allows an efficient computation model with relatively low memory consumption. Parallel implementations of pixel-based approaches are also more efficient compared to global TV approaches, which generally require the numerical solution of sparse linear systems. However, pixel-based approaches are frequently trapped in local minima, hence their despeckling quality is worse compared to global TV approaches. The proposed method, namely mixed norm despeckling (MND), combines the l₂-norm and the l₁-norm in order to improve despeckling performance by alleviating the local minima problem. All steps of MND are parallelized using OpenMP on the CPU and CUDA on the GPU. Speckle reduction performance, execution time, and memory consumption of the proposed method are shown using synthetic images and TerraSAR-X spot mode SAR images.

Acceleration of the partitioned predictive vector quantization lossless compression method with Intel MIC

Shih-Chieh Wei, Bormin Huang

The partitioned predictive vector quantization (PPVQ) algorithm is known for its high compression ratio for lossless compression of ultraspectral sounder data with high spatial and spectral resolutions. With the advent of multicore technologies, parallelization of several parts of the algorithm has been explored in previous work using a compute unified device architecture (CUDA) aided environment on the Graphics Processing Unit (GPU). Recently the Intel Many Integrated Core (MIC) architecture on a coprocessor has been introduced, which shows promise in handling more divergent workloads as needed in PPVQ. Therefore we explore the parallel performance of a MIC-aided implementation. With parallelization of the two most time-consuming modules in PPVQ, linear prediction and vector quantization, an AIRS granule can be compressed in less than 7.5 seconds, which is equivalent to a speedup of ~8.8x. The use of MIC for PPVQ compression is thus promising as a low-cost and effective compression solution for ultraspectral sounder data for ground rebroadcast use.

Fast motion detection in coded video streams for a large-scale remote video sensor system

Yong-Sung Kim, Gyu-Hee Park, Seung-Hwan Kim, et al.

A large number of remote video sensors are being deployed around the world to collect, store, and analyze real-world data. Since each remote video sensor produces a very large amount of data, the total volume of video data is extremely large and complex. Important events from a remote video sensor are closely related to motion in the video. In this paper, we present a fast motion detection method based on the number of bits used for encoding a video stream and on GOP-level motion detection. A low-complexity measurement of the number of bits is performed on the coded video sequence, and we then store and process the coded video stream only if the total number of bits is larger than a pre-defined threshold. We also use GOP-level motion detection to reduce processing overhead compared to the conventional motion vector-based approach, which processes every frame. Manipulating the number of bits is itself a much easier task than full reconstruction of each pixel of a video frame, and it can save storage cost because only coded video sequences containing motion are stored. The proposed method also reduces computational complexity compared to the manipulation of motion vectors per 4x4 macro block. To evaluate our method, we deployed a centralized single server connected to H.264-capable remote video sensors. Results on the video sequences showed that the proposed approach can process more video sequences than the conventional compressed domain approach.
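
The bit-count test described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the per-frame coded sizes are taken as already parsed from the stream, and the function name `gops_with_motion`, the fixed GOP length, and the sample numbers are hypothetical.

```python
def gops_with_motion(frame_bits, gop_size, threshold_bits):
    """Return indices of GOPs whose total coded size exceeds the threshold.

    frame_bits     : coded size of each frame in bits, in stream order
    gop_size       : number of frames per GOP (assumed constant)
    threshold_bits : pre-defined threshold on the total bits of one GOP
    """
    flagged = []
    for start in range(0, len(frame_bits), gop_size):
        total = sum(frame_bits[start:start + gop_size])
        if total > threshold_bits:
            flagged.append(start // gop_size)
    return flagged

# Usage: three GOPs of four frames each. Static scenes yield small
# predicted frames; the middle GOP has large ones, suggesting motion.
sizes = [8000, 500, 500, 500,
         8000, 3000, 3500, 2800,
         8000, 400, 600, 500]
print(gops_with_motion(sizes, gop_size=4, threshold_bits=12000))  # prints [1]
```

No frame is decoded here; only sizes are summed, which is why the check is cheap enough to run on many streams at a single server.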

Fast Computational Method of Beam Scattering from Sea Surface

Xiang Su, Zhensen Wu, Xiaoxiao Zhang

Microwave backscattering from the sea surface at low grazing angle (LGA) is important in predicting radar detection of targets on or near the surface, since the probability of a false alarm depends upon the observed signal-to-clutter ratio. However, the large area illuminated by the incident beam at LGA leads to a large number of unknowns when using numerical methods to calculate backscattering from the sea surface. In addition, backscattering from hundreds of random sea surface samples must be calculated to obtain the statistical properties of a random rough surface. The sparse matrix canonical grid (SMCG) method and compute unified device architecture (CUDA) libraries are used to accelerate the computation. By using these techniques, the Doppler spectral characteristics of electromagnetic scattering from the sea surface are effectively calculated.

A novel highly-parallel algorithm for linearly unmixing hyperspectral images

Raúl Guerra, Sebastián López, Gustavo M. Callico, et al.

Endmember extraction and abundance calculation are critical steps in the process of linearly unmixing a given hyperspectral image, for two main reasons. The first is the need to compute a set of accurate endmembers in order to obtain reliable abundance maps. The second is the huge number of operations involved in these time-consuming processes. This work proposes an algorithm that estimates the endmembers of a hyperspectral image under analysis and their abundances at the same time. The main advantages of this algorithm are its high degree of parallelization and the mathematical simplicity of the operations implemented. The algorithm estimates the endmembers as virtual pixels. In particular, the proposed algorithm applies the gradient descent method to iteratively refine the endmembers and the abundances, reducing the mean square error according to the linear unmixing model. Some mathematical restrictions must be added so that the method converges to a unique and realistic solution. Given the nature of the algorithm, these restrictions can be easily implemented. The results obtained with synthetic images demonstrate the good behavior of the proposed algorithm. Moreover, the results obtained with the well-known Cuprite dataset also corroborate the benefits of our proposal.
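
A minimal sketch of joint gradient descent on the linear unmixing model X ≈ A E is given below. It is not the authors' algorithm: the alternating update order, the step sizes, the use of non-negativity as the only restriction (no sum-to-one constraint), the random initialization, and the name `unmix_gd` are all assumptions for illustration.

```python
import numpy as np

def unmix_gd(X, p, iters=200, seed=0):
    """Alternating projected gradient descent for X ~= A @ E.

    X : (n_pixels, n_bands) image, A : (n_pixels, p) abundances,
    E : (p, n_bands) endmembers. Returns A, E and the MSE history.
    """
    rng = np.random.default_rng(seed)
    n, b = X.shape
    A = rng.random((n, p))
    E = rng.random((p, b))
    errors = []
    for _ in range(iters):
        R = A @ E - X                                   # residual
        errors.append(float(np.mean(R ** 2)))
        # Gradient of 0.5*||A E - X||^2 w.r.t. A is R E^T; step = 1/Lipschitz.
        step_a = 1.0 / (np.linalg.norm(E @ E.T, 2) + 1e-12)
        A = np.clip(A - step_a * (R @ E.T), 0.0, None)  # keep A non-negative
        R = A @ E - X
        # Gradient w.r.t. E is A^T R.
        step_e = 1.0 / (np.linalg.norm(A.T @ A, 2) + 1e-12)
        E = np.clip(E - step_e * (A.T @ R), 0.0, None)  # keep E non-negative
    errors.append(float(np.mean((A @ E - X) ** 2)))
    return A, E, errors

# Usage: synthetic image mixed from 3 random endmembers.
rng = np.random.default_rng(1)
E_true = rng.random((3, 40))
A_true = rng.dirichlet(np.ones(3), size=300)
X = A_true @ E_true
A, E, errors = unmix_gd(X, p=3)
print(errors[0] > errors[-1])
```

Every operation is a dense matrix product or an element-wise clip, which reflects the kind of mathematical simplicity and parallelism the abstract emphasizes.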

The backscattering characteristics and accelerated arithmetic for complex rough target in THz and laser bands

Yuan Mou, Zhensen Wu, Xing Guo

The backscattering characteristics of an arbitrarily shaped dielectric object with a rough surface, in both laser and THz bands, comprise coherent and incoherent scattering. When the radius of curvature at every point of the surface is much greater than the incident wavelength, and the wavelength in turn is longer than the surface height fluctuation and the RMS surface slope, the Kirchhoff approximation, the physical optics method, and stationary phase evaluation are used to derive the analytical expression for the coherent backscattering cross section of a rough dielectric object. Basically, the coherent cross section can be viewed as the combination of the RCS of the corresponding smooth, perfectly conducting object, the Fresnel reflection coefficient of the dielectric surface, and the characteristic function of the rough surface. Thus, the scattering expressions for a rough conducting object, a smooth dielectric object, and a rough dielectric object follow logically. Using the tangent plane approximation, the surface of the object is divided into a series of patches, and the incoherent component is then obtained by integrating over the illuminated area combined with the covering function. Based on the physical optics approximation and GPU parallel computing, the coherent scattering component of a smooth conducting object, the incoherent component of a rough object, and the corresponding backscattering cross section can be easily computed. In this paper, we numerically simulate the backscattering characteristics in laser and THz bands of a rough dielectric sphere and of other complex rough dielectric targets, and we also analyze the influence of the dielectric coefficient and roughness concentration on the backscattering cross section.

Application of Intel many integrated core (MIC) architecture to the Yonsei University planetary boundary layer scheme in Weather Research and Forecasting model

Melin Huang, Bormin Huang, Allen H.-L. Huang

The Weather Research and Forecasting (WRF) model provides operational services worldwide in many areas and is linked to our daily activity, in particular during severe weather events. The Yonsei University (YSU) scheme is one of the planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column, determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provides atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation of the YSU scheme, we employ the Intel Many Integrated Core (MIC) architecture, as it is a multiprocessor computer architecture that offers efficient parallelization and vectorization. Our results show that the MIC-based optimization improved the performance of the first version of the multi-threaded code on Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of the multi-threaded code.

Efficient parallel implementation of polarimetric synthetic aperture radar data processing

Sergio S. Martinez, Prashanth R. Marpu, Antonio J. Plaza

This work investigates the parallel implementation of a polarimetric synthetic aperture radar (POLSAR) data processing chain. Such processing can be computationally expensive when large data sets are processed. However, the processing steps can be largely implemented in a high performance computing (HPC) environment. In this work, we studied different aspects of the computations involved in processing POLSAR data and developed an efficient parallel scheme to achieve near real-time performance. The algorithm is implemented using the Message Passing Interface (MPI) framework in this work, but it can be easily adapted for other parallel architectures such as general purpose graphics processing units (GPGPUs).

Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

Jarno Mielikainen, Bormin Huang, Allen H.-L. Huang

The Weather Research and Forecasting (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we use the Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine, which is one of the most time-consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process are discussed. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 2.4x.

GPU-based Calculation of Scattering Characteristics of Space Target in the Visible Spectrum

YunHua Cao, Zhensen Wu, Lu Bai, et al.

The scattering characteristics of a space target in the visible spectrum, which can be used in target detection, target identification, and space docking, are calculated in this paper. An algorithm for computing the scattering characteristics of a space target is introduced. In the algorithm, the space target is divided into thousands of triangular facets. To obtain the scattering characteristics of the target, a calculation is needed for each facet. For each facet, the calculation is executed over the spectrum from 400 to 760 nanometers at intervals of 1 nanometer. Thousands of facets and hundreds of bands per facet result in a huge amount of computation, so the calculation is very time-consuming. Taking into account the high parallelism of the algorithm, Graphics Processing Units (GPUs) are used to accelerate it. The acceleration reaches a 300x speedup on a single Fermi-generation NVIDIA GTX 590 as compared to the single-threaded CPU version of the code on an Intel(R) Xeon(R) CPU E5-2620. A speedup of 412x can be reached when a Kepler-generation NVIDIA K20c is used.

GPU-based rectification of high-resolution remote sensing stereo images

Niko Lukač, Borut Žalik

One of the major challenges in topographic mapping and fast generation of digital terrain or surface models from stereo optical aerial or satellite imagery is the stereo rectification preprocessing step. The general case is the use of extrinsic and intrinsic parameters from each calibrated camera in order to establish epipolar geometry. Stereo rectification consists of geometric transformation and image sub-pixel resampling. Such a task is computationally demanding when dealing with high-resolution optical imagery. This presents an increasingly evident problem as remote sensing technologies are becoming more accurate, causing even higher computational demands. This paper proposes a novel method for fast rectification of stereo image pairs to epipolar geometry by using General Purpose computing on Graphics Processing Units (GPGPU). The method is capable of resampling large high-resolution imagery on-the-fly, due to efficient out-of-core processing. In the experiments, a runtime comparison was made between the proposed GPU-based method and multi-core CPU methods over a dataset consisting of 420 stereo aerial images, where the proposed method achieved a significant speedup.
