Proceedings Volume 9646

High-Performance Computing in Remote Sensing V


Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 13 November 2015
Contents: 9 Sessions, 29 Papers, 0 Presentations
Conference: SPIE Remote Sensing 2015
Volume Number: 9646

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9646
  • High Performance Computing I
  • High Performance Computing II
  • High Performance Computing III
  • High Performance Computing IV
  • High Performance Computing V
  • High Performance Computing VI
  • High Performance Computing VII
  • Poster Session
Front Matter: Volume 9646
Front Matter: Volume 9646
This PDF file contains the front matter associated with SPIE Proceedings Volume 9646, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
High Performance Computing I
The implementation of multiple objects tracking algorithm based on partition of bipartite graph in FPGA-based onboard vision systems
Boris Alpatov, Pavel Babayan, Valeriy Strotov
This paper describes the implementation of a multiple-target tracking algorithm in an FPGA-based vision system. The algorithm was designed to handle situations such as crossing object trajectories and the temporary occlusion of an object by other objects. Its source data is a list of parameters of the binary regions previously extracted from each frame of the sequence. The main idea of the algorithm is to represent the source data as a bipartite graph and split it into isolated elementary graphs corresponding to five situations: an object is moving or staying still, a new object is detected, an object is missed, a pair of objects is merged into one, and a region is divided. These graphs are used to form a new object list. The goal of this work was to implement the described algorithm in a small-sized onboard vision system based on a single Xilinx FPGA with a MicroBlaze soft processor block. In the proposed implementation, recursive procedures were replaced with table-based procedures. Experimental research shows that tracking performance on the described hardware increases by a factor of 5 to 9.
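To illustrate the idea of splitting the association problem into elementary bipartite subgraphs, the following is a minimal software sketch (not the authors' FPGA code): objects from the previous frame and regions from the current frame are linked by a simple distance gate, the resulting bipartite graph is split into connected components, and each component is classified into one of the five situations. The centroid-only region description and the gating distance are assumptions made for this example.

```python
# Illustrative sketch of bipartite-graph association for multi-object tracking.
import numpy as np

def link(prev_objects, curr_regions, gate=20.0):
    """prev_objects, curr_regions: (N, 2) arrays of region centroids."""
    P, C = len(prev_objects), len(curr_regions)
    # Edge (i, j) exists when a previous object and a current region are
    # closer than the gating distance.
    d = np.linalg.norm(prev_objects[:, None, :] - curr_regions[None, :, :], axis=2)
    adj = d < gate

    # Find connected components of the bipartite graph with a simple BFS.
    comp_p, comp_c, n_comp = [-1] * P, [-1] * C, 0
    for start in range(P):
        if comp_p[start] != -1:
            continue
        stack, comp_p[start] = [('p', start)], n_comp
        while stack:
            side, k = stack.pop()
            if side == 'p':
                for j in range(C):
                    if adj[k, j] and comp_c[j] == -1:
                        comp_c[j] = n_comp
                        stack.append(('c', j))
            else:
                for i in range(P):
                    if adj[i, k] and comp_p[i] == -1:
                        comp_p[i] = n_comp
                        stack.append(('p', i))
        n_comp += 1

    # Classify each elementary subgraph by how many previous objects and
    # current regions it contains (the five situations named in the abstract).
    events = []
    for c in range(n_comp):
        np_, nc = comp_p.count(c), comp_c.count(c)
        if np_ == 1 and nc == 1:
            events.append(('tracked', c))
        elif np_ == 1 and nc == 0:
            events.append(('missed', c))
        elif np_ == 1 and nc > 1:
            events.append(('split', c))
        elif np_ > 1 and nc == 1:
            events.append(('merged', c))
    # Regions not linked to any previous object are new detections.
    events += [('new', j) for j in range(C) if comp_c[j] == -1]
    return events

prev = np.array([[10.0, 10.0], [40.0, 40.0]])
curr = np.array([[12.0, 11.0], [80.0, 80.0]])
print(link(prev, curr))   # [('tracked', 0), ('missed', 1), ('new', 1)]
```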
Connected Component Labeling algorithm for very complex and high-resolution images on an FPGA platform
Kurt Schwenk, Felix Huber
Connected Component Labeling (CCL) is a basic algorithm in image processing and an essential step in nearly every application dealing with object detection. It groups together pixels belonging to the same connected component (e.g. object). Special architectures such as ASICs, FPGAs and GPUs have been utilised to achieve high data throughput, primarily for video processing. In this article, the FPGA implementation of a CCL method is presented, which was specially designed to process high resolution images with complex structure at high speed, generating a label mask. In general, CCL is a dynamic task and therefore not well suited for parallelisation, which is needed to achieve high processing speed with an FPGA. Facing this issue, most FPGA CCL implementations are restricted to low or medium resolution images (≤ 2048 × 2048 pixels) with lower complexity, where the fastest implementations do not create a label mask. Instead, they extract object features like size and position directly, which can be realized with high performance and perfectly suits the needs of many video applications. Since these restrictions are incompatible with the requirement to label high resolution images with highly complex structures, and with the need to generate a label mask, a new approach was required. The CCL method presented in this work is based on a two-pass CCL algorithm, which was modified with respect to low memory consumption and suitability for an FPGA implementation. Nevertheless, since not all parts of CCL can be parallelised, a stop-and-go high-performance pipeline processing CCL module was designed. The algorithm, the performance and the hardware requirements of a prototype implementation are presented. Furthermore, a clock-accurate runtime analysis is shown, which illustrates the dependency between processing speed and image complexity in detail. Finally, the performance of the FPGA implementation is compared with that of a software implementation on modern embedded platforms.
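For readers unfamiliar with the two-pass approach the paper builds on, here is a minimal software reference of classic two-pass CCL with union-find label merging (4-connectivity). It produces a label mask like the FPGA module described above, but it is a plain sequential sketch, not the hardware design.

```python
# Minimal two-pass connected component labeling (4-connectivity, union-find).
import numpy as np

def two_pass_ccl(binary):
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = [0]                      # union-find forest; index 0 = background
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: assign provisional labels and record equivalences.
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(next_label)
                labels[y, x] = next_label
                next_label += 1
            elif up and left:
                labels[y, x] = min(up, left)
                union(up, left)
            else:
                labels[y, x] = up or left

    # Second pass: resolve equivalences into final labels.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels

# Example: two separate blobs.
img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1]], dtype=bool)
print(two_pass_ccl(img))
```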
Dimensionality reduction and endmember extraction for hyperspectral imaging using an RVC-CAL library
D. Madroñal Quintín, R. Lazcano López, E. Juárez Martínez, et al.
Hyperspectral Imaging (HI) collects high-resolution spectral information consisting of hundreds of bands ranging from the infrared to the ultraviolet wavelengths. In the medical field, specifically in cancer tissue identification in the operating room, the potential of HI is huge. However, given the data volume of HI and the computational complexity and cost of identification algorithms, real-time processing is the key differential feature that brings value to surgeons. In order to achieve real-time implementations, the parallelism available in a specification needs to be explicitly highlighted. Data-flow programming languages, such as RVC-CAL, are able to accomplish this goal.

In this paper, an RVC-CAL library implementing dimensionality reduction and endmember extraction is presented. The results obtained show significant improvements over a state-of-the-art analysis tool: a speedup of 30% is achieved for the complete processing chain and, in particular, a speedup of 5% in the dimensionality reduction step. This dimensionality reduction takes ten of the thirteen seconds the whole system needs to analyze one of the images. In addition, the RVC-CAL library is an excellent tool for simplifying the implementation of HI algorithms. During the experimental tests, the library revealed possible bottlenecks in the HI processing chain and thereby helped improve system performance toward real-time constraints. Furthermore, the RVC-CAL library makes system performance testing possible.
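As a point of reference for the dimensionality-reduction step only, the sketch below shows a plain PCA of a hyperspectral cube in NumPy. The actual RVC-CAL actors may use a different estimator; the cube shape and the number of retained components are assumptions of this example.

```python
# Hedged sketch: PCA-style dimensionality reduction of a hyperspectral cube.
import numpy as np

def pca_reduce(cube, n_components=10):
    """cube: (rows, cols, bands) hyperspectral image."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(np.float64)
    X -= X.mean(axis=0)                       # center each band
    cov = (X.T @ X) / (X.shape[0] - 1)        # band covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]
    return (X @ eigvec[:, order]).reshape(rows, cols, n_components)

reduced = pca_reduce(np.random.rand(64, 64, 128), n_components=10)
print(reduced.shape)   # (64, 64, 10)
```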
Revisiting Intel Xeon Phi optimization of Thompson cloud microphysics scheme in Weather Research and Forecasting (WRF) model
The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation, as there are no interactions among horizontal grid points. Compared to earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect and supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. New optimizations for an updated Thompson scheme are discussed in this paper. The optimizations improved the performance of the original Thompson code on the Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.
High Performance Computing II
GPU-based ray tracing algorithm for high-speed propagation prediction in typical indoor environments
A fast 3-D ray tracing propagation prediction model based on a virtual source tree is presented in this paper, whose theoretical foundations are geometrical optics (GO) and the uniform theory of diffraction (UTD). For a typical single-room indoor scene, taking the geometrical and electromagnetic information into account, several acceleration techniques are adopted to raise the efficiency of the ray tracing algorithm. The simulation results indicate that the runtime of the ray tracing algorithm increases sharply when the number of objects in the single room becomes large. Therefore, GPU acceleration technology is used to solve that problem. Since GPUs are better suited to arithmetic computation than to complex control flow, tens of thousands of threads in CUDA programs can compute simultaneously to achieve massively parallel acceleration. Finally, a typical single room with several objects is simulated using the serial ray tracing algorithm and the parallel one, respectively. The results show that, compared with the serial algorithm, the GPU-based one achieves much higher efficiency.
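A geometry-only sketch of the "virtual source" idea behind such ray tracers is given below: a single specular reflection off a planar wall is found by mirroring the transmitter across the wall plane and intersecting the mirrored ray with the wall. The wall definition and antenna positions are made up for the example; the GO/UTD field computation and the CUDA parallelization are not shown.

```python
# Image-source (virtual source) construction for one specular reflection.
import numpy as np

def mirror_point(p, wall_point, wall_normal):
    n = wall_normal / np.linalg.norm(wall_normal)
    return p - 2.0 * np.dot(p - wall_point, n) * n

def reflection_point(tx, rx, wall_point, wall_normal):
    """Specular reflection point of the path tx -> wall -> rx (image method)."""
    n = wall_normal / np.linalg.norm(wall_normal)
    tx_img = mirror_point(tx, wall_point, n)       # virtual source
    d = rx - tx_img
    t = np.dot(wall_point - tx_img, n) / np.dot(d, n)
    return tx_img + t * d

tx = np.array([1.0, 2.0, 1.5])      # transmitter (made-up position)
rx = np.array([4.0, 3.0, 1.5])      # receiver (made-up position)
wall_p = np.array([0.0, 5.0, 0.0])  # a point on the wall plane y = 5
wall_n = np.array([0.0, 1.0, 0.0])  # wall normal
print(reflection_point(tx, rx, wall_p, wall_n))   # [2.8, 5.0, 1.5]
```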
GPU implementation of the simplex identification via split augmented Lagrangian
Hyperspectral imaging can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems of hyperspectral data analysis is the presence of mixed pixels, due to the low spatial resolution of such images. This means that several spectrally pure signatures (endmembers) are combined into the same mixed pixel. Linear spectral unmixing follows an unsupervised approach which aims at inferring pure spectral signatures and their material fractions at each pixel of the scene. The huge data volumes acquired by such sensors put stringent requirements on processing and unmixing methods.

This paper proposes an efficient implementation of an unsupervised linear unmixing method, the simplex identification via split augmented Lagrangian (SISAL), on GPUs using CUDA. The method finds the smallest simplex by solving a sequence of nonsmooth convex subproblems, using variable splitting to obtain a constrained formulation and then applying an augmented Lagrangian technique. The parallel implementation of SISAL presented in this work exploits the GPU architecture at a low level, using shared memory and coalesced memory accesses. The results presented indicate that the GPU implementation can significantly accelerate the method's execution on big datasets while maintaining the method's accuracy.
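The linear mixing model underlying this work states that each pixel x ≈ M·a, with abundances a ≥ 0 summing to one. The sketch below is not the SISAL solver itself (which estimates the endmember matrix M); it only illustrates the model by estimating abundances for known endmembers, with the sum-to-one constraint softly enforced through an augmented row whose weight `delta` is an assumption of this example.

```python
# Abundance estimation under the linear mixing model (known endmembers).
import numpy as np
from scipy.optimize import nnls

def abundances(M, x, delta=10.0):
    """M: (bands, p) endmember matrix, x: (bands,) pixel spectrum."""
    # Append a weighted row of ones to softly enforce sum-to-one.
    A = np.vstack([M, delta * np.ones((1, M.shape[1]))])
    b = np.concatenate([x, [delta]])
    a, _ = nnls(A, b)                 # nonnegative least squares
    return a

# Toy example: a pixel that is a 70/30 mix of two endmembers.
M = np.array([[0.9, 0.1],
              [0.5, 0.6],
              [0.2, 0.8]])
x = 0.7 * M[:, 0] + 0.3 * M[:, 1]
print(abundances(M, x))    # ~[0.7, 0.3]
```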
Embedded GPU implementation of anomaly detection for hyperspectral images
Anomaly detection is one of the most important techniques for remotely sensed hyperspectral data interpretation. Developing fast processing techniques for anomaly detection has received considerable attention in recent years, especially in analysis scenarios with real-time constraints. In this paper, we develop an embedded graphics processing unit (GPU) based parallel implementation of a streaming background statistics anomaly detection algorithm. The streaming background statistics method can simulate real-time anomaly detection, meaning that processing can be performed at the same time as the data are collected. The algorithm is implemented on the NVIDIA Jetson TK1 development kit. The experiments, conducted with real hyperspectral data, indicate the effectiveness of the proposed implementation. This work shows that the embedded GPU offers a promising solution for high-performance, low-power hyperspectral image applications.
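For context, the classic RX anomaly detector scores each pixel by its Mahalanobis distance to the background statistics. The paper's streaming variant updates the mean and covariance incrementally as image lines arrive; in the sketch below the statistics are simply computed over the whole cube for brevity.

```python
# Global RX anomaly detector (Mahalanobis distance to background statistics).
import numpy as np

def rx_detector(cube):
    """cube: (rows, cols, bands); returns an anomaly score per pixel."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(np.float64)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(bands))  # regularized inverse
    diff = X - mu
    scores = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return scores.reshape(rows, cols)

scores = rx_detector(np.random.rand(100, 100, 50))
print(scores.shape)   # (100, 100)
```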
RVC-CAL library for endmember and abundance estimation in hyperspectral image analysis
R. Lazcano López, D. Madroñal Quintín, E. Juárez Martínez, et al.
Hyperspectral imaging (HI) collects information from across the electromagnetic spectrum, covering a wide range of wavelengths. Although this technology was initially developed for remote sensing and earth observation, its multiple advantages - such as high spectral resolution - led to its application in other fields, such as cancer detection. However, this new field has specific requirements; for instance, it must meet strict timing specifications, since all the potential applications - like surgical guidance or in vivo tumor detection - imply real-time requisites. Achieving these timing requirements is a great challenge, as hyperspectral images generate extremely high volumes of data to process. Thus, new research lines are studying new processing techniques, and the most relevant ones are related to system parallelization.

In that line, this paper describes the construction of a new hyperspectral processing library for the RVC-CAL language, which is specifically designed for multimedia applications and allows multithreading compilation and system parallelization. This paper presents the development of the library functions required to implement two of the four stages of the hyperspectral imaging processing chain: endmember and abundance estimation. The results obtained show that the library achieves speedups of approximately 30% compared with an existing software tool for hyperspectral image analysis; concretely, the endmember estimation step reaches an average speedup of 27.6%, which saves almost 8 seconds of execution time. The results also show the existence of some bottlenecks, such as the communication interfaces among the different actors due to the volume of data to transfer. Finally, it is shown that the library considerably simplifies the implementation process. Thus, the experimental results show the potential of an RVC-CAL library for analyzing hyperspectral images in real time, as it provides enough resources to study system performance.
High Performance Computing III
Fault-tolerant NAND-flash memory module for next-generation scientific instruments
Tobias Lange, Holger Michel, Björn Fiethe, et al.
Remote sensing instruments on today's space missions deliver a large amount of data, which is typically evaluated on the ground. Especially for deep space missions, the telemetry downlink is very limited, which creates the need for scientific evaluation, and thereby a reduction of data volume, already on board the spacecraft. A demanding example is the Polarimetric and Helioseismic Imager (PHI) instrument on Solar Orbiter. To enable on-board offline processing for data reduction, the instrument has to be equipped with a high-capacity memory module. The module is based on non-volatile NAND flash technology, which requires more advanced operation than volatile DRAM. Unlike classical mass memories, the module is integrated into the instrument and allows readback of data for processing. The architecture and safe operation of this kind of memory module are described in this paper.
APES-based procedure for super-resolution SAR imagery with GPU parallel computing
Weiwei Jia, Xiaojian Xu, Guangyao Xu
The amplitude and phase estimation (APES) algorithm is widely used in modern spectral analysis. Compared with the conventional fast Fourier transform (FFT), APES yields lower sidelobes and narrower spectral peaks. However, in synthetic aperture radar (SAR) imaging of large scenes, it is difficult to apply APES directly to super-resolution radar image processing without parallel computation, due to its large amount of calculation. In this paper, a procedure is proposed to achieve target extraction and parallel computing of APES for super-resolution SAR imaging. Numerical experiments are carried out on a Tesla K40C with a 745 MHz GPU clock rate and 2880 CUDA cores. Results on SAR images with GPU parallel computing show that the parallel APES is remarkably more efficient than the CPU-based implementation at the same super-resolution.
FAPEC-based lossless and lossy hyperspectral data compression
Jordi Portell, Gabriel Artigues, Riccardo Iudica, et al.
Data compression is essential for remote sensing based on hyperspectral sensors owing to the increasing amount of data generated by modern instrumentation. CCSDS issued the 123.0 standard for lossless hyperspectral compression, and a new lossy hyperspectral compression recommendation is being prepared. We have developed multispectral and hyperspectral pre-processing stages for FAPEC, a data compression algorithm based on an entropy coder. We can select a prediction-based lossless stage that offers excellent results and speed. Alternatively, a DWT-based lossless and lossy stage can be selected, which offers excellent results while obviously requiring more compression time. Finally, a stage based on our HPA algorithm can also be selected, currently lossless only but with a lossy option in preparation. Here we present the overall design of these data compression systems and the results obtained on a variety of real data, including ratios, speed and quality.
Application of Intel Many Integrated Core (MIC) accelerators to the Pleim-Xiu land surface scheme
The land-surface model (LSM) is one of the physics processes in the Weather Research and Forecasting (WRF) model. The LSM combines atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes with internal information on the land's state variables and land-surface properties. The LSM provides heat and moisture fluxes over land points and sea-ice points. The Pleim-Xiu (PX) scheme is one such LSM. The PX LSM features three pathways for moisture fluxes: evapotranspiration, soil evaporation, and evaporation from wet canopies. To accelerate the computation of this scheme, we employ the Intel Xeon Phi Many Integrated Core (MIC) architecture, a many-core coprocessor designed for efficient parallelization and vectorization. Our results show that the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves the performance by 2.3x and 11.7x compared to the original code running, respectively, on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670.
High Performance Computing IV
GPU-based ray tracing algorithm for high-speed propagation prediction in multiroom indoor environments
A novel ray tracing algorithm for high-speed propagation prediction in multi-room indoor environments is proposed in this paper, whose theoretical foundations are geometrical optics (GO) and the uniform theory of diffraction (UTD). Taking the geometrical and electromagnetic information of the complex indoor scene into account, several acceleration techniques are adopted to raise the efficiency of the ray tracing algorithm. The simulation results indicate that the runtime of the ray tracing algorithm increases sharply when the number of objects in multi-room buildings becomes large. Therefore, GPU acceleration technology is used to solve that problem. Finally, a typical multi-room indoor environment with several objects in each room is simulated using the serial ray tracing algorithm and the parallel one, respectively. The results show that, compared with the serial algorithm, the GPU-based one achieves much higher efficiency.
Performance tuning Weather Research and Forecasting (WRF) Goddard longwave radiative transfer scheme on Intel Xeon Phi
The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system designed for dual use in forecasting and research. WRF offers multiple physics options that can be combined in any way. One of these physics options is radiance computation. The major source of energy for the earth's climate is solar radiation, so it is imperative to accurately model the horizontal and vertical distribution of the heating. The Goddard solar radiative transfer model includes the absorption due to water vapor, ozone, oxygen, carbon dioxide, clouds and aerosols. The model computes the interactions among absorption and scattering by clouds, aerosols, molecules and the surface. Finally, fluxes are integrated over the entire longwave spectrum. In this paper, we present our results of optimizing the Goddard longwave radiative transfer scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect and supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The optimizations improved the performance of the original Goddard longwave radiative transfer scheme on the Xeon Phi 7120P by a factor of 2.2x. Furthermore, the same optimizations improved the performance of the scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 2.1x compared to the original Goddard longwave radiative transfer code.
A novel hardware-friendly algorithm for hyperspectral linear unmixing
Raúl Guerra, Lucana Santos, Sebastián López, et al.
Linear unmixing of hyperspectral images has rapidly become one of the most widely used tools for analyzing the content of hyperspectral images captured by state-of-the-art remote hyperspectral sensors. The unmixing process consists of three sequential steps: dimensionality estimation, endmember extraction and abundance computation. Within this procedure, the first two steps are by far the most demanding from a computational point of view, since they involve a large number of matrix operations. Moreover, the complex nature of these operations seriously hinders the hardware implementation of these two unmixing steps, leading to non-optimized implementations that are not able to satisfy the strict delay constraints imposed by applications with real-time or near real-time requirements. This paper presents a new algorithm that is capable of estimating the number of endmembers and extracting them from a given hyperspectral image with at least the same accuracy as state-of-the-art approaches while demanding a much lower computational effort, independently of the characteristics of the image under analysis. In particular, the proposed algorithm is based on the concept of orthogonal projections and allows the estimation of the number of endmembers and their extraction to be performed simultaneously, using simple operations that can also be easily parallelized. In this sense, it is worth mentioning that our algorithm does not perform complex matrix operations, such as matrix inversion or the extraction of eigenvalues and eigenvectors, which eases its subsequent hardware implementation. The experimental results obtained with synthetic and real hyperspectral images demonstrate that the accuracy obtained with the proposed algorithm when estimating the number of endmembers and extracting them is similar to or better than that provided by well-known state-of-the-art algorithms, while the complexity of the overall process is significantly reduced.
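To give a feel for orthogonal-projection-based endmember extraction, the sketch below follows the classic ATGP-style procedure: repeatedly project all pixels onto the orthogonal complement of the already-selected endmembers and pick the pixel with the largest residual. Note that this reference sketch uses a pseudo-inverse for clarity, whereas the paper's algorithm specifically avoids such matrix operations and also estimates the number of endmembers, which is reduced here to a fixed count `p`.

```python
# ATGP-style orthogonal projection endmember extraction (reference sketch).
import numpy as np

def op_endmembers(cube, p):
    """cube: (rows, cols, bands); returns p endmember spectra (p, bands)."""
    X = cube.reshape(-1, cube.shape[2]).astype(np.float64)
    # Start from the pixel with the largest norm (brightest signature).
    idx = [int(np.argmax(np.sum(X**2, axis=1)))]
    for _ in range(p - 1):
        E = X[idx].T                                  # (bands, k) current endmembers
        # Projector onto the orthogonal complement of the endmember subspace.
        P_orth = np.eye(E.shape[0]) - E @ np.linalg.pinv(E)
        residual = X @ P_orth.T
        idx.append(int(np.argmax(np.sum(residual**2, axis=1))))
    return X[idx]

E = op_endmembers(np.random.rand(50, 50, 30), p=4)
print(E.shape)   # (4, 30)
```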
Parallel implementation of the multiple endmember spectral mixture analysis algorithm for hyperspectral unmixing
Sergio Bernabe, Francisco D. Igual, Guillermo Botella, et al.
In the last decade, the issue of endmember variability has received considerable attention, particularly when each pixel is modeled as a linear combination of endmembers or pure materials. As a result, several models and algorithms have been developed to consider the effect of endmember variability in spectral unmixing and possibly include multiple endmembers in the spectral unmixing stage. One of the most popular approaches for this purpose is the multiple endmember spectral mixture analysis (MESMA) algorithm. The procedure executed by MESMA can be summarized as follows: (i) first, a standard linear spectral unmixing (LSU) or fully constrained linear spectral unmixing (FCLSU) algorithm is run in an iterative fashion; (ii) then, different endmember combinations, randomly selected from a spectral library, are used to decompose each mixed pixel; (iii) finally, the model with the best fit, i.e., with the lowest root mean square error (RMSE) in the reconstruction of the original pixel, is adopted. However, this procedure can be computationally very expensive, because several endmember combinations need to be tested and several abundance estimation steps need to be conducted, which compromises the use of MESMA in applications under real-time constraints. In this paper we develop (for the first time in the literature) an efficient implementation of MESMA on different platforms using OpenCL, an open standard for parallel programming on heterogeneous systems. Our experiments have been conducted using a simulated data set and the clMAGMA mathematical library. Implementations of this kind, using the same descriptive language on different architectures, are very important in order to realistically assess the possibility of using heterogeneous platforms for efficient hyperspectral image processing in real remote sensing missions.
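The per-pixel search that makes MESMA expensive can be illustrated with a simplified sketch: try every combination of library endmembers of a given size, unmix by least squares, and keep the model with the lowest reconstruction RMSE. The real implementation adds abundance constraints and runs on OpenCL devices; the library contents and the two-endmember model size below are assumptions of this toy example.

```python
# Simplified MESMA-style model search for a single pixel.
import itertools
import numpy as np

def mesma_pixel(x, library, model_size=2):
    """x: (bands,) pixel; library: (n_spectra, bands) spectral library."""
    best = None
    for combo in itertools.combinations(range(library.shape[0]), model_size):
        M = library[list(combo)].T                    # (bands, model_size)
        a, *_ = np.linalg.lstsq(M, x, rcond=None)     # unconstrained abundances
        rmse = np.sqrt(np.mean((M @ a - x) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, combo, a)
    return best   # (rmse, selected endmembers, abundances)

library = np.random.rand(6, 20)
pixel = 0.5 * library[1] + 0.5 * library[4]
print(mesma_pixel(pixel, library)[1])   # expected: (1, 4)
```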
High Performance Computing V
Performance portability study of an automatic target detection and classification algorithm for hyperspectral image analysis using OpenCL
Sergio Bernabe, Francisco D. Igual, Guillermo Botella, et al.
Recent advances in heterogeneous high performance computing (HPC) have opened new avenues for demanding remote sensing applications. Perhaps one of the most popular algorithms in target detection and identification is the automatic target detection and classification algorithm (ATDCA), widely used in the hyperspectral image analysis community. Previous research has already investigated the mapping of ATDCA on graphics processing units (GPUs) and field programmable gate arrays (FPGAs), showing impressive speedup factors that allow its exploitation in time-critical scenarios. Based on these studies, our work explores the performance portability of a tuned OpenCL implementation across a range of processing devices including multicore processors, GPUs and other accelerators. This approach differs from previous papers, which focused on achieving the optimal performance on each platform. Here, we are more interested in the following issues: (1) evaluating whether a single code written in OpenCL allows us to achieve acceptable performance across all of them, and (2) assessing the gap between our portable OpenCL code and the hand-tuned versions previously investigated. Our study includes the analysis of different tuning techniques that expose data parallelism as well as enable an efficient exploitation of the complex memory hierarchies found in these new heterogeneous devices. Experiments have been conducted using hyperspectral data sets collected by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensors. To the best of our knowledge, this kind of analysis has not been previously conducted in the hyperspectral imaging processing literature, and in our opinion it is very important in order to realistically calibrate the possibility of using heterogeneous platforms for efficient hyperspectral image processing in real remote sensing missions.
Accelerating the prediction-based lower triangular transform for data compression using Intel MIC
With the same decorrelation and coding gain capabilities as the Karhunen-Loeve transform, the prediction-based lower triangular transform (PLT) can apply its perfect reconstruction property to lossless compression of ultraspectral sounder data. As the compression process requires computation of the covariance matrix, the LDU decomposition, and the transform kernel and coefficient matrices, it is beneficial to introduce parallel processing technology into the PLT implementation. In this work, the recent Intel Many Integrated Core (MIC) architecture is used, which can exploit up to 60 cores with 4 hardware threads per core. Both threading and vectorization techniques are explored for performance improvement. In our experiments, the total processing time of an AIRS granule achieves a speedup of ~4.6x. With the offload mode, the MIC architecture provides a convenient and efficient tool for parallelization of the PLT compression scheme.
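The mathematical core of the transform construction can be sketched as follows: factor the band covariance as C = L D Lᵀ with a unit lower-triangular L, and apply L⁻¹ to decorrelate the bands. The MIC threading and vectorization discussed in the paper are not shown, and the AIRS data layout is replaced by a random matrix in this example.

```python
# PLT-style decorrelation from an LDL' factorization of the covariance.
import numpy as np
from scipy.linalg import solve_triangular

def plt_decorrelate(X):
    """X: (n_samples, n_bands) zero-mean data; returns decorrelated samples."""
    C = np.cov(X, rowvar=False)
    G = np.linalg.cholesky(C)            # C = G G', G lower triangular
    d = np.diag(G)
    L = G / d                            # unit lower-triangular factor (C = L D L')
    # y = inv(L) x for every sample: solve L y = x.
    Y = solve_triangular(L, X.T, lower=True, unit_diagonal=True).T
    return Y

X = np.random.rand(1000, 8)
X -= X.mean(axis=0)
Y = plt_decorrelate(X)
print(np.round(np.cov(Y, rowvar=False), 3))   # approximately diagonal
```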
High Performance Computing VI
Optimizing the Betts-Miller-Janjic cumulus parameterization with Intel Many Integrated Core (MIC) architecture
Cumulus parameterization schemes are responsible for the sub-grid-scale effects of convective and/or shallow clouds, and are intended to represent vertical fluxes due to unresolved updrafts and downdrafts and compensating motion outside the clouds. Some schemes additionally provide cloud and precipitation field tendencies in the convective column, and momentum tendencies due to convective transport of momentum. All of the schemes provide the convective component of surface rainfall. Betts-Miller-Janjic (BMJ) is one scheme that fulfills these purposes in the Weather Research and Forecasting (WRF) model. The National Centers for Environmental Prediction (NCEP) has tried to optimize the BMJ scheme for operational application. As there are no interactions among horizontal grid points, this scheme is very suitable for parallel computation. The Intel Xeon Phi Many Integrated Core (MIC) architecture, with its efficient parallelization and vectorization capabilities, allows us to optimize the BMJ scheme. Compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves the performance by 2.4x and 17.0x, respectively.
Parallel hyperspectral compressive sensing method on GPU
Remote hyperspectral sensors collect large amounts of data per flight, usually with low spatial resolution. Since the bandwidth of the connection between the satellite/airborne platform and the ground station is limited, an onboard compression method is desirable to reduce the amount of data to be transmitted. This paper presents a parallel implementation of a compressive sensing method, called parallel hyperspectral coded aperture (P-HYCA), for graphics processing units (GPUs) using the compute unified device architecture (CUDA). This method takes into account two main properties of hyperspectral datasets, namely the high correlation existing among the spectral bands and the generally low number of endmembers needed to explain the data, which largely reduces the number of measurements necessary to correctly reconstruct the original data. Experimental results conducted using synthetic and real hyperspectral datasets on two different GPU architectures by NVIDIA, GeForce GTX 590 and GeForce GTX TITAN, reveal that the use of GPUs can provide real-time compressive sensing performance. The achieved speedup is up to 20 times when compared with the processing time of HYCA running on one core of an Intel i7-2600 CPU (3.4 GHz) with 16 GB of memory.
GPU-based parallel clustered differential pulse code modulation
Hyperspectral remote sensing technology is widely used in marine remote sensing, geological exploration, and atmospheric and environmental remote sensing. Owing to its rapid development, the resolution of hyperspectral images has increased greatly, and their data size is becoming larger. In order to reduce storage and transmission costs, lossless compression of hyperspectral images has become an important research topic. In recent years, a large number of algorithms have been proposed to reduce the redundancy between different spectra. Among them, the most classical and extensible algorithm is Clustered Differential Pulse Code Modulation (C-DPCM). The algorithm contains three parts: first, it clusters all spectral lines and trains linear predictors for each band; second, it uses these predictors to predict pixels and obtains the residual image by subtracting the predicted image from the original image; finally, it encodes the residual image. However, the process of calculating the predictors is time-consuming. In order to improve the processing speed, we propose a parallel C-DPCM based on CUDA (Compute Unified Device Architecture) with GPUs. Recently, general-purpose computing on GPUs has developed greatly, with GPU capacity improving rapidly through increases in the number of processing units and storage control units. CUDA is a parallel computing platform and programming model created by NVIDIA that gives developers direct access to the virtual instruction set and memory of the parallel computational elements in GPUs. Our core idea is to compute the predictors in parallel. By adopting global memory, shared memory and register memory, respectively, we obtain a decent speedup.
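A toy serial version of the C-DPCM idea is sketched below: cluster the spectra, fit one linear predictor per (cluster, band) that predicts a band from the previous bands, and keep the prediction residuals. The cluster count, predictor order and the use of SciPy's k-means are assumptions of this example; the paper's GPU kernels parallelize the predictor calculation, which dominates the cost.

```python
# Toy clustered DPCM: per-cluster, per-band linear predictors and residuals.
import numpy as np
from scipy.cluster.vq import kmeans2

def cdpcm_residuals(X, n_clusters=4, order=3):
    """X: (n_pixels, n_bands) spectra; returns (residual image, cluster labels)."""
    _, labels = kmeans2(X.astype(np.float64), n_clusters, minit='++')
    residual = X.astype(np.float64).copy()   # first `order` bands kept verbatim
    for c in range(n_clusters):
        S = X[labels == c].astype(np.float64)
        if len(S) == 0:
            continue
        for b in range(order, X.shape[1]):
            A = S[:, b - order:b]                       # previous `order` bands
            w, *_ = np.linalg.lstsq(A, S[:, b], rcond=None)   # train predictor
            pred = S[:, b - order:b] @ w
            residual[labels == c, b] = S[:, b] - pred
    return residual, labels

res, _ = cdpcm_residuals(np.random.rand(500, 16))
print(np.abs(res).mean())
```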
High Performance Computing VII
Differential evolution algorithm-based kernel parameter selection for Fukunaga-Koontz Transform subspaces construction
The performance of kernel-based techniques depends on the selection of kernel parameters, which is why suitable parameter selection is an important problem for many kernel-based techniques. This article presents a novel technique to learn the kernel parameters of the kernel Fukunaga-Koontz Transform (KFKT) based classifier. The proposed approach determines the appropriate values of the kernel parameters by optimizing an objective function constructed from the discrimination ability of KFKT. For this purpose we utilize the differential evolution algorithm (DEA). The new technique overcomes some disadvantages of the traditional cross-validation method, such as high time consumption, and it can be applied to any type of data. Experiments on target detection applications with hyperspectral images verify the effectiveness of the proposed method.
Accelerated ray tracing algorithm under urban macro cell
In this study, a ray tracing propagation prediction model based on creating a virtual source tree is used because of its high efficiency and reliable prediction accuracy. In addition, several acceleration techniques are adopted to improve the efficiency of ray-tracing-based prediction over large areas. However, when the ray tracing method is employed for coverage zone prediction, the runtime is linearly proportional to the total number of prediction points, leading to large and sometimes prohibitive computation time requirements in complex urban macrocell environments. In order to overcome this bottleneck, the compute unified device architecture (CUDA), which provides fine-grained data parallelism and thread parallelism, is used to accelerate the calculation. Taking full advantage of the tens of thousands of threads in a CUDA program, the coverage prediction problem is first decomposed by partitioning the image tree and the visible prediction points among different sources. Then, each thread calculates the electromagnetic field of one propagation path, and these results are collected. Comparing this parallel algorithm with the traditional sequential algorithm shows that the computational efficiency is clearly improved.
Poster Session
The scale effects of anisotropic land surface reflectance: an analysis with Landsat and MODIS imagery
Lu Bai, Xun Huang, Zhensen Wu, et al.
Scale is a key issue in the spatial and environmental sciences. In this paper, we retrieved the sun-view geometries of each slope element of the Badain Jaran Desert using ASTER GDEM data. Combined with 30 m resolution Landsat ETM+ data, the directional reflection characteristics of land surface reflectance were discussed. We then derived a simplified physical BRDF model, assuming that the main radiation source at the land surface is direct solar radiation, and calculated the land surface reflectance at spatial resolutions from 30 m to 500 m and 1000 m, which was compared with the results calculated from MODIS BRDF products for given sun-view geometries. Linear-regression analyses were performed to simulate and analyze the scale characteristics. Moreover, factors such as the solar zenith angle and azimuth angle are taken into account for the fitting precision. The results show that our model is more precise in the visible band than in the near-infrared and short-wave infrared bands, in which the sky diffuse radiation and adjacent terrain radiation cannot be ignored.
Semi-supervised classification tool for DubaiSat-2 multispectral imagery
This paper addresses a semi-supervised classification tool based on a pixel-based approach to multispectral satellite imagery. There are not many studies demonstrating such an algorithm for multispectral images, especially when the image consists of 4 bands (Red, Green, Blue and Near Infrared), as in DubaiSat-2 satellite images. The proposed approach utilizes both unsupervised and supervised classification schemes sequentially to identify four classes in the image, namely water bodies, vegetation, land (developed and undeveloped areas) and paved areas (i.e. roads). The unsupervised classification concept is applied to identify two classes, water bodies and vegetation, based on a well-known index that uses the distinct wavelengths of visible and near-infrared sunlight absorbed and reflected by plants; this index is called the Normalized Difference Vegetation Index (NDVI). Afterward, the supervised classification is performed by selecting homogeneous training samples for roads and land areas. Here, a precise selection of training samples plays a vital role in the classification accuracy. Post-classification is finally performed to enhance the classification accuracy, where the classified image is sieved, clumped and filtered before producing the final output. Overall, the supervised classification approach produced higher accuracy than the unsupervised method. This paper presents current preliminary research results which point out the effectiveness of the proposed technique.
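The unsupervised first stage described above amounts to computing NDVI from the Red and NIR bands and thresholding it. The sketch below illustrates this; the threshold values are illustrative assumptions, not the values used for DubaiSat-2.

```python
# NDVI-based separation of water and vegetation pixels.
import numpy as np

def ndvi_classes(red, nir, water_thr=0.0, veg_thr=0.4):
    """red, nir: float arrays of the same shape; returns (ndvi, class mask)."""
    ndvi = (nir - red) / (nir + red + 1e-9)   # Normalized Difference Vegetation Index
    classes = np.full(red.shape, 'other', dtype=object)
    classes[ndvi < water_thr] = 'water'       # water absorbs NIR strongly
    classes[ndvi > veg_thr] = 'vegetation'    # healthy vegetation reflects NIR
    return ndvi, classes

red = np.array([[0.30, 0.05], [0.20, 0.10]])
nir = np.array([[0.10, 0.45], [0.25, 0.12]])
print(ndvi_classes(red, nir)[1])   # [['water' 'vegetation'] ['other' 'other']]
```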
Parallel algorithm of real-time infrared image restoration based on total variation theory
Ran Zhu, Miao Li, Yunli Long, et al.
Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods remove the noise but penalize the gradients corresponding to edges too heavily. Image restoration techniques based on variational approaches can solve this over-smoothing problem thanks to their well-defined mathematical modeling of the restoration procedure. The total variation (TV) of the infrared image is introduced as an L1 regularization term added to the objective energy functional, which converts the restoration process into the optimization of a functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration with the TV-L1 model exploits the acquired remote sensing data sufficiently and preserves edge information caused by clouds. The numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can be easily parallelized. Therefore, a parallel implementation of the TV-L1 filter based on a multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. The massive computation over the image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. A quantitative analysis of the restored image quality compared to the input image is presented. Experimental results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance meets the requirements of real-time image processing.
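For reference, one standard form of the TV-L1 energy functional described above is shown below, where u is the restored image, f the observed image, and λ the weight balancing the total-variation regularization and the L1 data-fidelity term (this is the textbook formulation, not necessarily the exact discretization used in the paper).

```latex
E(u) \;=\; \int_{\Omega} \lvert \nabla u \rvert \, dx \;+\; \lambda \int_{\Omega} \lvert u - f \rvert \, dx
```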
Fault tolerance of SVM algorithm for hyperspectral image
Yabo Cui, Zhengwu Yuan, Yuanfeng Wu, et al.
One of the most important tasks in analyzing hyperspectral image data is the classification process [1]. In general, in order to enhance the classification accuracy, a data preprocessing step is usually adopted to remove the noise in the data before classification. But for time-sensitive applications, such as risk prevention and response, we hope that even if the data contains noise the classifier can still appear to execute correctly from the user's perspective. As the most popular classifier, the Support Vector Machine (SVM) has been widely used for hyperspectral image classification and has proved to be a very promising technique in supervised classification [2]. In this paper, two experiments are performed to demonstrate that for hyperspectral data with noise, if the noise in the data is within a certain range, the SVM algorithm is still able to execute correctly from the user's perspective.
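A minimal version of such a robustness experiment is sketched below: an SVM is trained on clean spectra and its accuracy is measured while increasing amounts of Gaussian noise are added to the test pixels. Synthetic two-class data replaces the real hyperspectral scene, and the noise levels are arbitrary assumptions.

```python
# SVM accuracy versus additive Gaussian noise on the test spectra.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Two synthetic "classes" of 50-band spectra with slightly shifted means.
X = np.vstack([rng.normal(0.4, 0.1, (300, 50)), rng.normal(0.5, 0.1, (300, 50))])
y = np.array([0] * 300 + [1] * 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='rbf', C=10.0).fit(X_tr, y_tr)
for sigma in [0.0, 0.02, 0.05, 0.1, 0.2]:
    noisy = X_te + rng.normal(0.0, sigma, X_te.shape)
    print(f"noise sigma={sigma:.2f}  accuracy={clf.score(noisy, y_te):.3f}")
```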
A lossless compression algorithm for aurora spectral data using online regression prediction
Lossless compression algorithms are used for the preservation of aurora images. For aurora image compression, linear prediction based methods have outstanding compression performance. However, this performance is limited by the prediction order, and the time complexity of linear prediction is relatively high. This paper describes how to resolve the conflict between high prediction order and low compression bit rate with an online linear regression with RLS (OLR-RLS) algorithm. Experimental results show that OLR-RLS achieves an average improvement of 7%-11.8% in compression gain and a 2.8x speedup in computation time over linear prediction.
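In the spirit of the OLR-RLS scheme above (assuming RLS here denotes recursive least squares), the sketch below shows an online linear predictor whose coefficients are refined with each new sample instead of re-solving the full regression; the residuals are what a lossless coder would then entropy code. The predictor order and forgetting factor are assumptions of this example.

```python
# Online linear prediction updated with recursive least squares (RLS).
import numpy as np

class RLSPredictor:
    def __init__(self, order, lam=0.999, delta=1e3):
        self.w = np.zeros(order)               # predictor coefficients
        self.P = delta * np.eye(order)         # inverse correlation matrix
        self.lam = lam                         # forgetting factor

    def update(self, x, d):
        """x: (order,) previous samples, d: current sample to predict."""
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)           # gain vector
        err = d - self.w @ x                   # a-priori prediction error
        self.w += k * err
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err                             # residual to be entropy coded

# Predict each sample of a smooth signal from its previous 4 samples.
signal = np.sin(np.linspace(0, 8 * np.pi, 2000)) + 0.01 * np.random.randn(2000)
rls, residuals = RLSPredictor(order=4), []
for n in range(4, len(signal)):
    residuals.append(rls.update(signal[n - 4:n], signal[n]))
print(np.abs(signal[4:]).mean(), np.abs(residuals).mean())   # residuals are much smaller
```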