Proceedings Volume 8539

High-Performance Computing in Remote Sensing II

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 14 November 2012
Contents: 5 Sessions, 20 Papers, 0 Presentations
Conference: SPIE Remote Sensing 2012
Volume Number: 8539

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8539
  • High Performance Computing I
  • High Performance Computing II
  • High Performance Computing III
  • High Performance Computing IV
Front Matter: Volume 8539
Front Matter: Volume 8539
This PDF file contains the front matter associated with SPIE Proceedings Volume 8539, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.
High Performance Computing I
High-performance computing in image registration
Thanks to recent technological advances, a large variety of image data with varying geometric, radiometric and temporal resolution is at our disposal. In many applications the processing of such images requires high-performance computing techniques in order to deliver timely responses, e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphics Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging task of processing large amounts of geo-data. This article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and environmental monitoring. For the image alignment procedure, sets of corresponding feature points need to be extracted automatically in order to subsequently compute the geometric transformation that aligns the data. Feature extraction and matching are among the most computationally demanding operations in the processing chain; thus, a high degree of automation and speed is mandatory. The details of the implemented operations (named LARES), which exploit parallel architectures and the GPU, are presented. The innovative aspects of the implementation are (i) its effectiveness on a large variety of unorganized and complex datasets, (ii) its capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and discussed.
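The abstract singles out feature extraction and matching as the most computationally demanding steps. The sketch below (not the LARES implementation; descriptor dimensions and the ratio threshold are illustrative assumptions) shows brute-force descriptor matching with a ratio test, whose fully independent pairwise distances are exactly what GPU parallelization exploits.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.

    desc_a: (Na, D) descriptors from image A; desc_b: (Nb, D) from image B.
    Returns an (M, 2) array of matched index pairs.  Every entry of the
    Na x Nb distance matrix is independent, which is why this step maps
    well onto massively parallel hardware.
    """
    # squared Euclidean distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (np.sum(desc_a**2, axis=1)[:, None]
          + np.sum(desc_b**2, axis=1)[None, :]
          - 2.0 * desc_a @ desc_b.T)
    nn = np.argsort(d2, axis=1)[:, :2]                 # two nearest neighbours
    best = d2[np.arange(len(desc_a)), nn[:, 0]]
    second = d2[np.arange(len(desc_a)), nn[:, 1]]
    keep = best < (ratio**2) * second                  # ratio test on squared distances
    return np.column_stack([np.nonzero(keep)[0], nn[keep, 0]])

# Example with random descriptors (illustrative only)
rng = np.random.default_rng(0)
matches = match_descriptors(rng.normal(size=(500, 128)), rng.normal(size=(600, 128)))
```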
GPU-parallel performance of the community radiative transfer model (CRTM) with the optical depth in absorber space (ODAS)-based transmittance algorithm
An atmospheric radiative transfer model calculates the transfer of electromagnetic radiation through the Earth's atmosphere. The Community Radiative Transfer Model (CRTM) is a fast model for simulating the infrared (IR) and microwave (MW) radiances of a given state of the Earth's atmosphere and its surface. CRTM radiances have been used for satellite data assimilation in numerical weather prediction. The CRTM takes into account the radiance emission and absorption of various atmospheric gases as well as the emission and reflection of various surface types. Two different transmittance algorithms are currently available in the CRTM OPTRAN: Optical Depth in Absorber Space (ODAS) and Optical Depth in Pressure Space (ODPS). ODAS in the current CRTM allows two variable absorbers (water vapor and ozone). In this paper, we examine the feasibility of using graphics processing units (GPUs) to accelerate the CRTM with the ODAS transmittance model. Using commodity GPUs to accelerate the CRTM means that the hardware cost of adding high-performance accelerators to a computing configuration is significantly reduced. Our results show that GPUs can provide significant speedup over conventional processors for the 8461-channel IASI sounder. In particular, a single GPU on the dual-GPU NVIDIA GTX 590 card can provide a speedup of 339x for the single-precision version of the CRTM ODAS compared to its single-threaded Fortran counterpart running on an Intel i7 920 CPU.
Real-time progressive band processing of linear spectral unmixing
Band selection (BS) has advantages over data dimensionality reduction in satellite communication and data transmission in the sense that the spectral bands can be tuned by users at their discretion for data analysis while keeping data integrity. However, to materialize BS in such practical applications, several issues need to be addressed. One is how many bands are required for BS. Another is how to select appropriate bands. A third is how to take advantage of previously selected bands without re-implementing BS. Finally, and most importantly, is how to tune the selected bands in real time as the number of bands varies. This paper presents an application in spectral unmixing, progressive band selection for linear spectral unmixing, which addresses the above issues: data unmixing is carried out in a real-time, progressive fashion, with the data updated recursively band by band in the same way that data is processed by a Kalman filter.
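A hedged sketch of how band-by-band recursive unmixing can work in principle: a standard recursive least-squares update, analogous to a Kalman filter measurement update, refreshes an unconstrained abundance estimate as each new band arrives. This is a generic illustration, not necessarily the exact equations derived in the paper, and the initial covariance value is an assumption.

```python
import numpy as np

def progressive_unmix(signatures, pixel):
    """Unconstrained least-squares abundance estimate updated band by band.

    signatures: (L, p) endmember matrix (L bands, p endmembers)
    pixel:      (L,)  pixel spectrum
    Each new band contributes one scalar "measurement", so the abundance
    estimate and the p x p matrix P are refreshed recursively instead of
    re-solving the full least-squares problem.
    """
    L, p = signatures.shape
    a = np.zeros(p)                 # abundance estimate
    P = np.eye(p) * 1e6             # large initial covariance (weak prior, assumed)
    for l in range(L):
        m = signatures[l]           # row of the mixing matrix for band l
        Pm = P @ m
        k = Pm / (1.0 + m @ Pm)     # gain for this band
        a = a + k * (pixel[l] - m @ a)
        P = P - np.outer(k, Pm)
        yield a.copy()              # intermediate estimate after l+1 bands

# Example: three endmembers, 50 bands, noiseless mixture
rng = np.random.default_rng(1)
M = rng.random((50, 3))
true_a = np.array([0.5, 0.3, 0.2])
for est in progressive_unmix(M, M @ true_a):
    pass
print(est)   # approaches true_a as more bands are processed
```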
Fast Fourier Transform Co-processor (FFTC), towards embedded GFLOPs
Christopher Kuehl, Uwe Liebstueckel, Isaac Tejerina, et al.
Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co-Processor has been developed with the aim of offloading these transformations from general-purpose processors and therefore boosting the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment. In the frame of the ESA activity “Fast Fourier Transform DSP Co-processor (FFTC)” (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following: • Production of prototypes of a space-qualified version of the commercial PowerFFT chip, called FFTC, based on the PowerFFT IP. • Development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC, including the controller FPGA and SpaceWire interfaces, to verify the FFTC function and performance. The FFTC chip performs its calculations with floating-point precision. Stand-alone, it is capable of computing FFTs of up to 1K complex samples in length in only 10 μs. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4 Gbit/s. When connected to up to four EDAC-protected SDRAM memory banks, the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT-based processing tasks. A controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the controller FPGA are used to set up the data flow and generate the memory addresses. The paper gives an overview of the project, including the results of the validation of the FFTC ASIC prototypes.
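A quick back-of-the-envelope check of the quoted throughput, assuming the conventional 5·N·log2(N) flop count for an N-point complex FFT; the exact convention behind the stated 4.7 GFlops figure is not given in the abstract.

```python
# Rough equivalent-throughput estimate for a 1K-point complex FFT in 10 microseconds,
# using the conventional 5*N*log2(N) flop count (the vendor's convention may differ).
import math

N = 1024
flops_per_fft = 5 * N * math.log2(N)       # = 51,200
t = 10e-6                                  # seconds per FFT
print(flops_per_fft / t / 1e9, "GFLOPS")   # ~5.1, same order as the quoted 4.7 GFlops
```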
High Performance Computing II
Progressive hyperspectral imaging
Progressive hyperspectral imaging (PHSI) is a new concept that has not been explored in the past. It decomposes data processing into a number of stages and processes data stage by stage progressively, in the sense that the results obtained in previous stages are updated and improved by subsequent stages using both the previously processed information and new innovations information that was not available in earlier stages. This is quite different from sequential processing, which processes data samples one after another, with each data sample fully processed. Specifically, PHSI is believed to be the first concept to utilize available information in the best possible way to process massive data, by updating the processing with the new innovations information provided by the data sample currently being processed. As a result, PHSI can significantly reduce computing time and, in the meantime, can also be realized in real time in many application areas, such as hyperspectral data communication and transmission, where data can be communicated and transmitted progressively. Most importantly, PHSI allows users to screen preliminary results before deciding whether to complete the data processing.
GPU-based parallel implementation of 5-layer thermal diffusion scheme
Melin Huang, Jarno Mielikainen, Bormin Huang, et al.
The Weather Research and Forecasting (WRF) model is a numerical weather prediction and atmospheric simulation system with the dual purposes of forecasting and research. The WRF software infrastructure consists of several components, such as dynamic solvers and physics simulation modules. WRF includes several Land-Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface-layer scheme, the radiation scheme, and the microphysics/convective scheme, together with the land state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion scheme is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very well suited to massively parallel computation, as there are no interactions among horizontal grid points. More and more scientific applications have adopted graphics processing units (GPUs) to accelerate computing performance. This study demonstrates our massively parallel GPU computation efforts on the WRF 5-layer thermal diffusion scheme. Since this scheme is only an intermediate module of the entire WRF model, no I/O transfer is involved in the intermediate process. Without data transfer, this module achieves a speedup of 36x with one GPU and 108x with four GPUs as compared to a single-threaded CPU processor. With a CPU/GPU hybrid strategy, this module can accomplish an even higher speedup, ~114x with one GPU and ~240x with four GPUs. Meanwhile, we are seeking other approaches to further improve the speed.
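To illustrate why "no interactions among horizontal grid points" makes the scheme so easy to parallelize, here is a toy explicit diffusion step over five soil layers, vectorized across grid columns. This is not the MM5/WRF formulation or its energy budget; the diffusivity, layer spacing and boundary values are illustrative assumptions only.

```python
import numpy as np

def soil_diffusion_step(T, kappa, dz, dt, T_surface, T_bottom):
    """One explicit heat-diffusion step on a (5, n_points) soil-temperature array.

    Every horizontal grid point (column of T) is updated independently, which
    is the property that lets the WRF LSMs be parallelized so easily; on a
    GPU each column would simply map to one thread.
    """
    Tp = np.vstack([T_surface, T, T_bottom])           # pad with boundary values
    lap = (Tp[2:] - 2.0 * Tp[1:-1] + Tp[:-2]) / dz**2  # second derivative in depth
    return T + dt * kappa * lap

# 5 soil layers, 10,000 independent grid points (toy values)
n = 10_000
T = np.full((5, n), 285.0)
T_sfc = np.full(n, 300.0)        # stand-in for the surface forcing
T_bot = np.full(n, 285.0)        # deep-soil boundary
for _ in range(100):
    T = soil_diffusion_step(T, kappa=2e-7, dz=0.1, dt=60.0,
                            T_surface=T_sfc, T_bottom=T_bot)
```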
An efficient watermarking technique for satellite images using discrete cosine transform
Due to the significant progress in science and technology, the digital world has become an interesting topic for many studies. “Data security” is one of the main concerns related to the digital world, especially in the field of remote sensing. To deal with this matter, the concept of “watermarking” was introduced. The idea of digital image watermarking is to embed information within a signal (i.e. image, video, etc.) in such a way that it cannot be easily extracted by a third party. This provides copyright protection and authentication for the owner(s). The Emirates Institution for Advanced Science and Technology (EIAST), as an owner, provides satellite images captured by the DubaiSat-1 satellite to customers. The aim of this study is to implement a robust algorithm to hide the EIAST logo within any delivered image in order to increase data security and protect the ownership of DubaiSat-1 images. In addition, it is necessary to provide high-quality images to the end user; nevertheless, adding any information (a logo) to these images affects their quality. Therefore, the model is designed to keep the observable difference between the watermarked and original image to a minimum. Moreover, the watermark should be difficult to remove or alter without degradation of the host image. This study is based on the Discrete Cosine Transform (DCT) to provide excellent and highly robust protection against attacks such as noise addition, cropping, rotation and JPEG compression.
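A minimal sketch of block-DCT watermark embedding of the kind the abstract describes. The EIAST/DubaiSat-1 embedding parameters are not given, so the choice of coefficient, embedding strength and block size below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_watermark(image, bits, alpha=8.0, coeff=(3, 2)):
    """Embed one watermark bit per 8x8 block in a mid-frequency DCT coefficient.

    image: 2-D float array with sides divisible by 8
    bits:  iterable of 0/1 values, one per block (row-major block order)
    alpha: embedding strength; coeff: which DCT coefficient carries the bit.
    Both are illustrative choices, not the parameters used in the paper.
    """
    out = image.astype(float).copy()
    bits = iter(bits)
    for i in range(0, image.shape[0], 8):
        for j in range(0, image.shape[1], 8):
            block = dctn(out[i:i+8, j:j+8], norm='ortho')
            b = next(bits)
            block[coeff] += alpha if b else -alpha     # additive mid-frequency embedding
            out[i:i+8, j:j+8] = idctn(block, norm='ortho')
    return out

# Example: embed a 32x32-bit logo into a 256x256 image
rng = np.random.default_rng(2)
img = rng.integers(0, 255, size=(256, 256)).astype(float)
logo_bits = rng.integers(0, 2, size=(32 * 32))
watermarked = embed_watermark(img, logo_bits)
```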
GPU acceleration of simplex volume algorithm for hyperspectral endmember extraction
Haicheng Qu, Junping Zhang, Zhouhan Lin, et al.
The simplex volume algorithm (SVA) is an endmember extraction algorithm based on the geometrical properties of a simplex in the feature space of a hyperspectral image. By exploiting the relation between a simplex volume and the volume of its corresponding parallelohedron in the high-dimensional space, the algorithm extracts endmembers directly from the original hyperspectral image without the need for dimension reduction. It thus avoids the drawback of the N-FINDR algorithm, which requires the dimensionality of the data to be reduced to one less than the number of endmembers. In this paper, we take advantage of the large-scale parallelism of CUDA (Compute Unified Device Architecture) to accelerate the computation of the SVA on an NVIDIA GeForce 560 GPU. The time for computing a simplex volume increases with the number of endmembers. Experimental results show that the proposed GPU-based SVA achieves a significant 112.56x speedup for extracting 16 endmembers, as compared to its CPU-based single-threaded counterpart.
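The geometric relation the SVA exploits can be stated compactly: the volume of a simplex equals the volume of the parallelepiped spanned by its edge vectors divided by p!. The sketch below uses the Gram-determinant form, which works in the full band space and therefore needs no prior dimension reduction; it illustrates the volume computation only, not the SVA search strategy or the GPU kernel.

```python
import numpy as np
from math import factorial

def simplex_volume(vertices):
    """Volume of the simplex spanned by p+1 vertices living in L >= p dimensions.

    Uses the Gram-determinant form sqrt(det(E E^T)) / p!, so no prior
    dimensionality reduction of the data is needed -- the property the
    abstract credits to the SVA.
    """
    edges = vertices[1:] - vertices[0]              # (p, L) edge matrix
    p = edges.shape[0]
    gram = edges @ edges.T                          # p x p Gram matrix
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / factorial(p)

# Unit right triangle embedded in 3-D: area is still 0.5
print(simplex_volume(np.array([[0.0, 0.0, 0.0],
                               [1.0, 0.0, 0.0],
                               [0.0, 1.0, 0.0]])))
```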
Real-time causal processing of anomaly detection
Anomaly detection generally requires real-time processing to find targets on a timely basis. However, for an algorithm to qualify as real-time processing, it can only use data samples up to the sample currently being visited; no future data samples can be used. Such a property is generally called "causality", which has unfortunately received little attention in the past. Recently, a causal anomaly detector derived from the well-known RX detector, referred to as causal RXD (C-RXD), was developed for this purpose, where the sample covariance matrix K used in RXD is replaced by the sample correlation matrix R(n), which can be updated up to the data sample currently being visited, r_n. However, such a C-RXD is not a real-time processing algorithm, since the inverse of the matrix R(n), R^{-1}(n), is recalculated from all data samples up to r_n. In order to implement C-RXD in real time, R(n) must be handled in such a way that R^{-1}(n) can be updated using only the previously calculated R^{-1}(n-1) and the data sample currently being processed, r_n. This paper develops a real-time implementation of C-RXD, called the real-time causal anomaly detector (RT-C-RXD), which is derived from the concept of Kalman filtering via a causal update equation using only the innovations information provided by the pixel currently being processed, without re-processing previous pixels.
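One standard way to realize such a causal update is a Sherman-Morrison rank-1 refresh of R^{-1}(n) from R^{-1}(n-1) and the current pixel, which the sketch below implements; the paper derives its recursion from a Kalman-filter viewpoint, and this generic form (plus the small regularization used to start the recursion) is an assumption, not necessarily the authors' exact equations.

```python
import numpy as np

def causal_rx(pixels, eps=1e-3):
    """Causal RX anomaly scores, one per pixel, using only past and current samples.

    pixels: (N, L) array of pixel vectors in processing order.  With
    R(n) = ((n-1)/n) R(n-1) + (1/n) r_n r_n^T, the inverse is refreshed by the
    Sherman-Morrison identity, so no earlier pixel is ever revisited.
    """
    N, L = pixels.shape
    scores = np.empty(N)
    Rinv = None
    for n, r in enumerate(pixels.astype(float), start=1):
        if n == 1:
            # regularized first correlation matrix: R(1) = r r^T + eps*I (assumed start-up)
            Rinv = np.linalg.inv(np.outer(r, r) + eps * np.eye(L))
        else:
            B = Rinv * n / (n - 1)                         # inverse of ((n-1)/n) R(n-1)
            Br = B @ r
            Rinv = B - np.outer(Br, Br) / (n + r @ Br)     # Sherman-Morrison rank-1 update
        scores[n - 1] = r @ Rinv @ r                       # causal RX score for pixel n
    return scores

# Example: background pixels plus one injected anomaly
rng = np.random.default_rng(3)
data = rng.normal(size=(1000, 20))
data[700] += 8.0
print(np.argmax(causal_rx(data)))   # likely 700
```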
Further optimizations of the GPU-based pixel purity index algorithm for hyperspectral unmixing
Many algorithms have been proposed to automatically find spectral endmembers in hyperspectral data sets. Perhaps one of the most popular is the pixel purity index (PPI), available in the ENVI software from Exelis Visual Information Solutions. Although the algorithm has been widely used in the spectral unmixing community, it is highly time-consuming, as its precision only increases asymptotically. Due to its high computational complexity, the PPI algorithm has recently been implemented on several high-performance computing architectures, including commodity clusters, heterogeneous and distributed systems, field-programmable gate arrays (FPGAs) and graphics processing units (GPUs). In this paper, we present an improved GPU implementation of the PPI algorithm which provides real-time performance for the first time in the literature.
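For readers unfamiliar with PPI, the projection-and-count core is sketched below: random unit vectors ("skewers") are projected onto every pixel and the extreme pixels accumulate purity counts. Projections are independent across skewers and pixels, which is what the GPU implementations parallelize; the skewer count, stopping rule and other ENVI details are not reproduced here.

```python
import numpy as np

def ppi_counts(pixels, n_skewers=10_000, seed=0):
    """Pixel Purity Index counts.

    pixels: (N, L) hyperspectral pixels.  Each random unit vector ("skewer")
    is projected onto every pixel; the pixels giving the extreme projections
    have their purity count incremented.
    """
    rng = np.random.default_rng(seed)
    skewers = rng.normal(size=(n_skewers, pixels.shape[1]))
    skewers /= np.linalg.norm(skewers, axis=1, keepdims=True)
    proj = pixels @ skewers.T                      # (N, n_skewers) projection matrix
    counts = np.zeros(len(pixels), dtype=np.int64)
    np.add.at(counts, proj.argmax(axis=0), 1)      # most-positive pixel per skewer
    np.add.at(counts, proj.argmin(axis=0), 1)      # most-negative pixel per skewer
    return counts

# Pixels with the highest counts are candidate endmembers
```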
High Performance Computing III
Visualization tools for extremely high resolution DEM from the LRO and other orbiter satellites
J. Montgomery, John McDonald
Recent space missions have included laser altimetry instruments that provide precise, high-resolution global topographic data products. These products are critical for analyzing the geomorphological surface processes of planets and moons. Although highly valued, the high-resolution data are often overlooked by researchers due to the high level of IT sophistication necessary to use the data products, which can be as large as several hundred gigabytes. Researchers have developed software tools to assist in viewing and manipulating data products derived from altimetry data; however, current software tools require substantial off-line processing, provide only rudimentary visualization, or are not suited to viewing the new high-resolution data. We have adapted mVTK, a novel software visualization tool, to work with NASA's recently acquired Lunar Reconnaissance Orbiter data. mVTK is a software visualization package that dynamically creates cylindrical cartographic map projections from gridded high-resolution altimetry data in real time. The projections are interactive 2D shaded-relief, false-color maps that allow the user to make simple slope and distance measurements on the actual underlying high-resolution data. We have tested mVTK on several laser altimetry data sets, including binned gridded record data from NASA's Mars Global Surveyor and Lunar Reconnaissance Orbiter missions.
Modified full abundance-constrained spectral unmixing
The abundance fully constrained least squares (FCLS) method has been widely used for spectral unmixing. A modified FCLS (MFCLS) was previously proposed for the same purpose, deriving two iterative equations for solving fully abundance-constrained spectral unmixing problems. Unfortunately, its advantages have not been recognized. This paper conducts a comparative study and analysis of FCLS and MFCLS on custom-designed synthetic images and real images to demonstrate that, while both methods perform comparably in unmixing data, MFCLS edges out FCLS by requiring less computing time.
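For context, a hedged sketch of the classical FCLS construction: non-negativity is enforced with NNLS and the sum-to-one constraint is imposed approximately by appending a heavily weighted row of ones. The MFCLS iterative equations compared in the paper are not reproduced here, and the weight `delta` is an illustrative choice.

```python
import numpy as np
from scipy.optimize import nnls

def fcls(signatures, pixel, delta=1e3):
    """Fully constrained (non-negative, sum-to-one) abundance estimate.

    signatures: (L, p) endmember matrix; pixel: (L,) spectrum.
    The sum-to-one constraint is imposed approximately by appending a row of
    ones weighted by `delta` and solving the augmented problem with NNLS.
    """
    L, p = signatures.shape
    A = np.vstack([signatures, delta * np.ones((1, p))])
    b = np.concatenate([pixel, [delta]])
    a, _ = nnls(A, b)
    return a

# Example: recover abundances of a synthetic 3-endmember mixture
rng = np.random.default_rng(4)
M = rng.random((100, 3))
a_true = np.array([0.6, 0.3, 0.1])
print(fcls(M, M @ a_true))        # close to a_true
```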
Further GPU implementation of prediction-based lower triangular transform using a zero-order entropy coder for ultraspectral sounder data compression
Ultraspectral sounder data consist of two-dimensional pixels, each containing thousands of channels. In the retrieval of geophysical parameters, the sounder data are sensitive to noise; therefore, lossless compression is highly desirable for storing and transmitting the huge data volume. The prediction-based lower triangular transform (PLT) features the same de-correlation and coding-gain properties as the Karhunen-Loeve transform (KLT), but with a lower design and implementation cost. In previous work, we have shown that PLT has the perfect reconstruction property, which allows its direct use for lossless compression of sounder data. However, PLT is time-consuming when performing compression. To speed up the PLT encoding scheme, we recently exploited the parallel compute power of modern graphics processing units (GPUs) and implemented several important transform stages to compute the transform coefficients on the GPU. In this work, we further incorporate a GPU-based zero-order entropy coder for the last stage of compression. Experimental results show that our full implementation of the PLT encoding scheme on the GPU achieves a speedup of 88x compared to its original full implementation on the CPU.
Evolutionary techniques for sensor networks energy optimization in marine environmental monitoring
Francesco Grimaccia, Ron Johnstone, Marco Mussetta, et al.
The sustainable management of coastal and offshore ecosystems, such as coral reef environments, requires the collection of accurate data across various temporal and spatial scales. Accordingly, monitoring systems are seen as central tools for ecosystem-based environmental management, helping on one hand to accurately describe the water column and substrate biophysical properties, and on the other hand to correctly steer sustainability policies by providing timely and useful information to decision-makers. A robust and intelligent sensor network that can adjust and adapt to different and changing environmental or management demands would revolutionize our capacity to more accurately model, predict, and manage human impacts on our coastal, marine, and other similar environments. In this paper, advanced evolutionary techniques are applied to optimize the design of an innovative energy-harvesting device for marine applications. The authors implement an enhanced technique in order to exploit in the most effective way the uniqueness and peculiarities of two classical optimization approaches, Particle Swarm Optimization and Genetic Algorithms. Here, this hybrid procedure is applied to a power buoy designed for marine environmental monitoring applications in order to maximize the energy recovered from sea waves by selecting the optimal device configuration.
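A generic sketch of PSO/GA hybridization of the kind the abstract describes: a standard particle swarm update drives the search while genetic crossover and mutation regenerate the worst particles. The authors' enhanced hybridization and their power-buoy objective are not specified in the abstract, so the operators, coefficients and toy objective below are placeholders.

```python
import numpy as np

def hybrid_pso_ga(objective, dim, n_particles=40, iters=200, seed=0):
    """Minimal hybrid of Particle Swarm Optimization and Genetic-Algorithm operators.

    A standard PSO velocity/position update drives the swarm; each iteration,
    the worst quarter of the particles is replaced by crossover of randomly
    chosen personal bests plus Gaussian mutation.  Illustrative only.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    for _ in range(iters):
        g = pbest[pbest_f.argmin()]                                 # global best
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)   # PSO update
        x = x + v
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        # GA step: recombine good personal bests into the worst particles
        worst = np.argsort(f)[-n_particles // 4:]
        parents = pbest[rng.integers(0, n_particles, size=(len(worst), 2))]
        alpha = rng.random((len(worst), 1))
        x[worst] = (alpha * parents[:, 0] + (1 - alpha) * parents[:, 1]
                    + rng.normal(scale=0.05, size=(len(worst), dim)))  # crossover + mutation
    return pbest[pbest_f.argmin()], pbest_f.min()

# Example on a simple quadratic objective
best_x, best_f = hybrid_pso_ga(lambda p: np.sum(p**2), dim=5)
```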
Calculating the electromagnetic scattering of ocean surface by physical optics and CUDA
Research on the electromagnetic scattering of the ocean surface is significant for target recognition and signal separation technologies, and its applications are widely involved in remote sensing, radar imaging and early warning. In this paper a statistical wave model is introduced. Physical optics (PO), one of the high-frequency approximation methods, is used to calculate the backward scattering coefficients of a sea surface composed of a large number of triangular patches. PO, based on some reasonable approximations, needs less memory and execution time than numerical methods; however, it must determine whether each patch is exposed to the incident wave, which costs a lot of time. We take advantage of the massively parallel compute capability of an NVIDIA Fermi GTX 480 with the Compute Unified Device Architecture (CUDA) to test the patches and compute their scattering fields. Our parallel design includes pipelined multi-stream asynchronous transfers and parallel reduction with shared memory. Using these techniques, we achieved a 26-fold speedup on the NVIDIA GTX 480 GPU.
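A minimal sketch of the per-patch exposure test the abstract identifies as the expensive step: a triangle contributes only if its outward normal opposes the incident direction. The test is independent per patch, which is exactly what maps onto one CUDA thread per triangle; the vectorized numpy version below is an illustration, not the paper's kernel, and the full PO computation would also need a shadowing test against other patches, omitted here.

```python
import numpy as np

def illuminated_patches(normals, incident_dir):
    """Boolean mask of triangle patches facing the incident wave.

    normals:      (N, 3) outward unit normals of the sea-surface triangles
    incident_dir: (3,)   unit propagation direction of the incident wave
                         (pointing from the radar towards the surface)
    A patch is kept when its normal has a component opposing the incident
    direction; this per-patch test is independent across patches.
    """
    return normals @ (-np.asarray(incident_dir)) > 0.0

# Example: one million random patch normals, 30-degree incidence in the x-z plane
rng = np.random.default_rng(5)
n = rng.normal(size=(1_000_000, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
inc = np.array([np.sin(np.radians(30)), 0.0, -np.cos(np.radians(30))])
mask = illuminated_patches(n, inc)
print(mask.mean())       # fraction of front-facing patches
```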
High Performance Computing IV
A unified theory for target-specified virtual dimensionality of hyperspectral imagery
Virtual dimensionality (VD) is defined as the number of spectrally distinct signatures in hyperspectral data. Unfortunately, no specific definition of what "spectrally distinct signatures" are has been provided. As a result, the many techniques developed to estimate VD have produced various values for it. This paper develops a unified theory for the VD of hyperspectral imagery in which the value of VD is completely determined by the targets of interest, characterized by their spectral statistics. Using this newly developed theory, VD techniques can be categorized, according to the orders of statistics characterizing the targets, into 1st-order statistics-based, 2nd-order statistics-based and high-order statistics (HOS)-based methods. In order to determine the number of specific targets of interest, a binary composite hypothesis testing problem is formulated in which a target of interest is considered a desired signal under the alternative hypothesis, while the null hypothesis represents the absence of the target. With this interpretation, many VD estimation techniques can be unified under the same framework, for example the Harsanyi-Farrand-Chang (HFC) method and the maximum orthogonal complement algorithm (MOCA).
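As one concrete instance of the hypothesis-testing view, here is a hedged sketch of the HFC idea: corresponding eigenvalues of the sample correlation and covariance matrices are compared against a Neyman-Pearson threshold set by a chosen false-alarm probability. The asymptotic variance used for the threshold is a common approximation and may differ from the exact expression in the original HFC derivation.

```python
import numpy as np
from scipy.stats import norm

def hfc_vd(pixels, pf=1e-3):
    """HFC-style estimate of virtual dimensionality.

    pixels: (N, L) array of hyperspectral pixel vectors.  Corresponding
    eigenvalues of the sample correlation and covariance matrices are
    compared; a signal source is declared whenever their difference exceeds
    a Neyman-Pearson threshold set by the false-alarm probability `pf`.
    The variance formula below is an approximation, not the paper's exact one.
    """
    N, L = pixels.shape
    mu = pixels.mean(axis=0)
    R = pixels.T @ pixels / N              # sample correlation matrix (2nd moment)
    K = R - np.outer(mu, mu)               # sample covariance matrix
    lam_R = np.sort(np.linalg.eigvalsh(R))[::-1]
    lam_K = np.sort(np.linalg.eigvalsh(K))[::-1]
    sigma = np.sqrt(2.0 * (lam_R**2 + lam_K**2) / N)
    return int(np.sum(lam_R - lam_K > norm.isf(pf) * sigma))

# Usage on an (N_pixels, L_bands) matrix `data`:
#   vd = hfc_vd(data, pf=1e-3)
```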
On the acceleration of Eta Ferrier Cloud Microphysics Scheme in the Weather Research and Forecasting (WRF) model using a GPU
Melin Huang, Jarno Mielikainen, Bormin Huang, et al.
The Eta Ferrier cloud microphysics scheme is a sophisticated cloud microphysics module in the Weather Research and Forecasting (WRF) model. In this paper, we present the approach and the results of accelerating the Eta Ferrier microphysics scheme on NVIDIA Graphics Processing Units (GPUs). We discuss how our GPU implementation takes advantage of the parallelism in the Eta Ferrier scheme, leading to a highly efficient GPU acceleration. We implemented the Eta Ferrier microphysics scheme on an NVIDIA GTX 590 GPU. Our single-GPU implementation achieves an overall speedup of 37x as compared with a single-threaded CPU. Since the Eta Ferrier microphysics scheme is only an intermediate module of the entire WRF model, GPU I/O should not occur, i.e. its input data should already be available in GPU global memory from previous modules and its output data should reside in GPU global memory for later use by other modules. The speedup without the host-device data transfer time is 272x with respect to the serial version running on a 3.20 GHz Intel® Core™ i7 970 CPU.
Hyperspectral image feature extraction accelerated by GPU
HaiCheng Qu, Ye Zhang, Zhouhan Lin, et al.
The PCA (principal component analysis) algorithm is the most basic dimension-reduction method for high-dimensional data, and it plays a significant role in hyperspectral data compression, decorrelation, denoising and feature extraction. With the development of imaging technology, the number of spectral bands in a hyperspectral image keeps growing, and the data cube has become ever larger in recent years. As a consequence, dimension reduction is more and more time-consuming. Fortunately, GPU-based high-performance computing has opened up a novel approach for hyperspectral data processing. This paper concerns the two main processes in hyperspectral image feature extraction: (1) calculation of the transformation matrix; and (2) transformation in the spectral dimension. These two processes are computationally intensive and data-intensive, respectively. Through the introduction of GPU parallel computing technology, an algorithm combining a PCA transformation based on eigenvalue decomposition (EVD) and feature-matching identification is implemented, with the aim of exploring the characteristics of GPU parallel computing and the prospects of GPU application in hyperspectral image processing by analysing thread invocation and the speedup of the algorithm. Finally, experimental results show that the algorithm reaches a 12x speedup overall, with certain steps reaching higher speedups of up to 270x.
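A minimal sketch of the two processes named in the abstract: forming the transformation matrix by eigen-decomposition of the band covariance matrix, and projecting every pixel spectrum. The per-pixel projection in the second step is independent across pixels, which is the part that benefits most from GPU threads; this is a generic PCA-by-EVD illustration, not the paper's implementation.

```python
import numpy as np

def pca_evd(cube, k):
    """PCA of a hyperspectral cube via eigenvalue decomposition (EVD).

    cube: (rows, cols, bands) array; k: number of principal components kept.
    Step 1 (compute-intensive): build the bands x bands covariance matrix and
    eigendecompose it to get the transformation matrix.
    Step 2 (data-intensive):    project every pixel spectrum, an independent
    matrix-vector product per pixel.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)                # bands x bands covariance
    w, V = np.linalg.eigh(cov)                     # eigenvalues in ascending order
    T = V[:, ::-1][:, :k]                          # top-k eigenvectors as transform matrix
    return (Xc @ T).reshape(rows, cols, k)

# Example: reduce a synthetic 100x100x200 cube to 10 components
rng = np.random.default_rng(7)
cube = rng.normal(size=(100, 100, 200))
reduced = pca_evd(cube, k=10)
```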
High performance cluster system design for remote sensing data processing
Yuanli Shi, Wenming Shen, Wencheng Xiong, et al.
In recent years, cluster systems have played an increasingly important role in high-performance computing architectures; they are cost-effective and efficient parallel computing systems able to satisfy the specific computational requirements of the earth and space sciences communities. This paper presents a powerful cluster system built by the Satellite Environment Center, Ministry of Environmental Protection of China, designed to process massive remote sensing data from the HJ-1 satellites automatically every day. The architecture of this cluster system, including the hardware device layer, network layer, OS/FS layer, middleware layer and application layer, is described. To verify the performance of the cluster system, image registration was chosen as an experiment using one scene from the HJ-1 CCD sensor. The registration experiments show that it is an effective system for improving the efficiency of data processing, and that it can provide a rapid response in applications that demand one, such as wildland fire monitoring and tracking, oil spill monitoring, military target detection, etc. Further work will focus on the comprehensive parallel design and implementation of remote sensing data processing.