Proceedings Volume 9139

Real-Time Image and Video Processing 2014

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 19 May 2014
Contents: 5 Sessions, 24 Papers, 0 Presentations
Conference: SPIE Photonics Europe 2014
Volume Number: 9139

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions:
  • Front Matter: Volume 9139
  • Real-Time Image and Video Processing I
  • Real-Time Image and Video Processing II
  • Real-Time Image and Video Processing III
  • Poster Session
Front Matter: Volume 9139
This PDF file contains the front matter associated with SPIE Proceedings Volume 9139, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.
Real-Time Image and Video Processing I
Deriving video content type from HEVC bitstream semantics
As network service providers seek to improve customer satisfaction and retention levels, they are increasingly moving from traditional quality of service (QoS) driven delivery models to customer-centred quality of experience (QoE) delivery models. QoS models consider only metrics derived from the network; QoE models, however, also consider metrics derived from within the video sequence itself. Various spatial and temporal characteristics of a video sequence have been proposed, both individually and in combination, to derive methods of classifying video content either on a continuous scale or as a set of discrete classes. QoE models can be divided into three broad categories: full-reference, reduced-reference and no-reference models. Due to the need to have the original video available at the client for comparison, full-reference metrics are of limited practical value in adaptive real-time video applications. Reduced-reference metrics often require metadata to be transmitted with the bitstream, while no-reference metrics typically operate in the decompressed domain at the client side and require significant processing to extract spatial and temporal features. This paper proposes a heuristic, no-reference approach to video content classification which is specific to HEVC-encoded bitstreams. The HEVC encoder already makes use of spatial characteristics to determine the partitioning of coding units and temporal characteristics to determine the splitting of prediction units. We derive a function which approximates the spatio-temporal characteristics of the video sequence by using the weighted averages of the depth at which the coding unit quadtree is split and of the prediction-mode decisions made by the encoder, to estimate spatial and temporal characteristics respectively. Since the video content type of a sequence is determined using high-level information parsed from the video stream, spatio-temporal characteristics are identified without the need for full decoding and can be used in a timely manner to aid decision making in QoE-oriented adaptive real-time streaming.
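As a rough illustration of this heuristic (the paper does not give its exact weighting function, so the area-weighting and the input values below are assumptions), spatial and temporal activity for a frame could be estimated from parsed coding-unit depths and prediction-mode flags like this:

```python
import numpy as np

def spatio_temporal_features(cu_depths, cu_sizes, intra_flags):
    """Area-weighted averages of CU quadtree depth (spatial proxy) and
    intra/inter mode decisions (temporal proxy) for one frame."""
    w = np.asarray(cu_sizes, dtype=float) ** 2      # CU area in pixels (assumed weight)
    depth = np.asarray(cu_depths, dtype=float)      # split depth of each coding unit
    intra = np.asarray(intra_flags, dtype=float)    # 1 = intra-coded, 0 = inter/skip
    spatial = np.average(depth, weights=w)          # deeper splits -> more spatial detail
    temporal = np.average(intra, weights=w)         # more intra -> less predictable motion
    return spatial, temporal

# Hypothetical values parsed from one frame of an HEVC bitstream:
s, t = spatio_temporal_features(cu_depths=[0, 1, 2, 3, 1],
                                cu_sizes=[64, 32, 16, 8, 32],
                                intra_flags=[0, 0, 1, 1, 0])
print(f"spatial activity {s:.2f}, temporal activity {t:.2f}")
```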
Early video smoke detection system to improve fire protection in rolling stocks
Sergio Saponara, Luca Pilato, Luca Fanucci
This paper presents a video system, operating in the visible spectrum range, for early smoke detection in passenger trains. The main idea is to integrate standard smoke sensors with the results of smoke-detection processing that exploits the video surveillance cameras already available on board the train. To this aim, a novel video processing flow is proposed that exploits the temporal, spatial and chromatic characteristics of the reference scenario. The proposed algorithm has been successfully verified with several video sets and its implementation complexity has been fully characterized.
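A minimal sketch of how temporal and chromatic cues of this kind can be combined (the specific cues and thresholds are illustrative assumptions, not the authors' processing flow):

```python
import numpy as np

def smoke_candidate_mask(prev_rgb, curr_rgb, diff_thresh=10.0, sat_thresh=0.15):
    """Flag pixels that both change over time (temporal cue) and look
    greyish/low-saturation (chromatic cue), as smoke tends to."""
    prev = prev_rgb.astype(np.float32)
    curr = curr_rgb.astype(np.float32)
    changing = np.abs(curr - prev).mean(axis=2) > diff_thresh
    mx, mn = curr.max(axis=2), curr.min(axis=2)
    saturation = (mx - mn) / (mx + 1e-6)
    greyish = saturation < sat_thresh
    return changing & greyish       # candidate smoke regions for further checks
```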
New boundary effect free algorithm for fast and accurate image arbitrary scaling and rotation
A new fast DCT-based algorithm for accurate image arbitrary scaling and rotation is described. The algorithm is free from the boundary effects characteristic of FFT-based algorithms and ensures interpolation with no interpolation errors. The algorithm is compared with other available algorithms in terms of interpolation accuracy, computational complexity and suitability for real-time applications.
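One well-known flavour of DCT-domain resampling is zero-padding of the DCT coefficients, which avoids the periodic-boundary (wrap-around) artefacts of FFT zero-padding; the paper's algorithm handles arbitrary scaling and rotation, but a minimal integer-zoom sketch conveys the idea:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_zoom(image, new_shape):
    """Enlarge a greyscale image by zero-padding its DCT coefficients;
    the DCT's implicit symmetric extension avoids boundary artefacts."""
    h, w = image.shape
    H, W = new_shape
    coeffs = dctn(image.astype(float), norm='ortho')
    padded = np.zeros((H, W))
    padded[:h, :w] = coeffs                        # original spectrum in the low band
    zoomed = idctn(padded, norm='ortho')
    return zoomed * np.sqrt((H * W) / (h * w))     # restore amplitude after padding

big = dct_zoom(np.random.rand(64, 64), (160, 160))
print(big.shape)    # (160, 160)
```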
Comparative analysis of video processing and 3D rendering for cloud video games using different virtualization technologies
Adedayo Bada, Jose M. Alcaraz-Calero, Qi Wang, et al.
This paper describes a comprehensive empirical performance evaluation of 3D video processing employing a physical/virtual architecture implemented in a cloud environment. Different virtualization technologies, virtual video cards and various 3D benchmark tools have been utilized in order to analyse the optimal performance in the context of 3D online gaming applications. This study highlights 3D video rendering performance under each type of hypervisor, along with other factors including network I/O, disk I/O and memory usage. Comparisons of these factors under well-known virtual display technologies such as VNC, Spice and Virtual 3D adaptors reveal the strengths and weaknesses of the various hypervisors with respect to 3D video rendering and streaming.
Real-Time Image and Video Processing II
Refocusing from a plenoptic camera within seconds on a mobile phone
Óscar Gómez-Cárdenes, José G. Marichal-Hernández, Fernando L. Rosa, et al.
Refocusing a plenoptic image by digital means after the exposure has been thoroughly studied in recent years, but few efforts have been made in the direction of real-time implementation in a constrained environment such as that provided by current mobile phones and tablets. In this work we address the aforementioned challenge, demonstrating that a complete focal stack, comprising 31 refocused planes from a (256×16)² plenoptic image, can be computed within seconds on a current SoC mobile phone platform. The selection of an appropriate algorithm is the key to success. In a previous work we developed an algorithm, the fast approximate 4D:3D discrete Radon transform, that performs this task with linear time complexity where others obtain quadratic or linearithmic time complexity. Moreover, that algorithm does not require complex-number transforms, trigonometric calculus, or even multiplications or floating-point numbers. Our algorithm has been ported to a multi-core ARM chip on an off-the-shelf tablet running Android. A careful implementation exploiting parallelism at several levels has been necessary. The final implementation takes advantage of multi-threading in native code and NEON SIMD instructions. As a result, our current implementation completes the refocusing task within seconds for a 16-megapixel image, much faster than previous attempts running on powerful PC platforms or dedicated hardware. The times consumed by the different stages of the digital refocusing are given and the strategies used to achieve this result are discussed. Timing results are given for a variety of environments within the Android ecosystem, from the weaker/cheaper SoCs to the top of the line for 2013.
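For orientation only, here is the classical shift-and-add refocusing baseline that such a focal-stack computation replaces; this is explicitly not the authors' fast 4D:3D discrete Radon transform, which avoids the per-plane cost shown here:

```python
import numpy as np

def refocus_shift_and_add(light_field, alpha):
    """Classical refocusing: shift each (u, v) sub-aperture view in
    proportion to its offset from the array centre, then average. Cost is
    O(U*V*H*W) per refocused plane."""
    U, V, H, W = light_field.shape
    acc = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            du = int(round(alpha * (u - U // 2)))    # integer shifts keep the sketch simple
            dv = int(round(alpha * (v - V // 2)))
            acc += np.roll(light_field[u, v], (du, dv), axis=(0, 1))
    return acc / (U * V)

lf = np.random.rand(16, 16, 64, 64)           # toy (u, v, y, x) light field
plane = refocus_shift_and_add(lf, alpha=0.5)  # one plane of the focal stack
```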
A low-cost embedded platform for car's surrounding vision system
S. Saponara, G. Fontanelli, L. Fanucci, et al.
The design and implementation of a flexible and low-cost embedded system for a car's surrounding vision is presented. The target of the proposed multi-camera vision system is to provide the driver with a better view of the objects that surround the vehicle during maneuvering. Fish-eye lenses are used to achieve a larger field of view (FOV) but, on the other hand, introduce radial distortion of the images projected onto the sensors. With low-cost cameras there can also be alignment issues. Since these complications are noticeable and dangerous, a real-time algorithm for their correction and for merging the video from the four cameras into a single view is presented.
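A minimal sketch of the radial-distortion correction step using OpenCV's standard camera model (the intrinsics and distortion coefficients below are hypothetical and would come from a per-camera calibration; strongly distorting fish-eye lenses may need the cv2.fisheye model instead):

```python
import cv2
import numpy as np

# Hypothetical intrinsics and radial distortion coefficients; in practice
# they come from a one-off per-camera calibration (e.g. cv2.calibrateCamera).
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.30, 0.09, 0.0, 0.0])     # k1, k2, p1, p2

frame = np.zeros((480, 640, 3), np.uint8)    # stand-in for one camera frame
undistorted = cv2.undistort(frame, K, dist)  # remaps pixels onto the ideal grid
```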
Feature selection gait-based gender classification under different circumstances
This paper proposes a gender classification method based on human gait features and investigates two variations in addition to the normal gait sequence: clothing (wearing coats) and carrying a bag. The feature vectors in the proposed system are constructed after applying the wavelet transform, and three different feature sets are proposed. The first is the spatio-temporal distance set, which deals with the distances between different parts of the human body (such as the feet, knees, hands, height and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation wavelet coefficients of the human body respectively; to extract these two sets, the human body is divided into upper and lower parts based on the golden ratio proportion. A statistical method is adopted for constructing the feature vector from the above sets, and the dimension of the constructed vector is reduced using the Fisher score as a feature-selection method to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
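The Fisher-score selection followed by k-NN classification can be sketched as follows (the data here are random stand-ins for the wavelet-based gait features):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter of the feature
    means over the pooled within-class variance."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))        # stand-in for wavelet gait features
y = rng.integers(0, 2, size=120)      # toy gender labels
top = np.argsort(fisher_score(X[:100], y[:100]))[::-1][:10]   # 10 best features
clf = KNeighborsClassifier(n_neighbors=3).fit(X[:100][:, top], y[:100])
print(clf.score(X[100:][:, top], y[100:]))
```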
Gait recognition based on Kinect sensor
This paper presents gait recognition based on the human skeleton and the trajectories of joint points captured by the Microsoft Kinect sensor. Two sets of dynamic features are extracted during one gait cycle: the first is the Horizontal Distance Features (HDF), based on the distances between the ankles, knees, hands and shoulders; the second is the Vertical Distance Features (VDF), which capture significant gait information from the heights above the ground of the hands, shoulders and ankles during one gait cycle. Extracting these two feature sets with a traditional camera is difficult and inaccurate, so the Kinect sensor is used here to obtain precise measurements. The two feature sets are tested separately and then fused into one feature vector. A database was created in-house to perform our experiments, consisting of sixteen males and four females; for each individual, 10 videos were recorded, each containing on average two gait cycles. The Kinect sensor is used to extract all the skeleton points, from which the feature vectors mentioned above are built. k-Nearest Neighbor is used as the classification method with the Cityblock distance function. Experimentally, the proposed method achieves a 56% recognition rate using HDF alone, while VDF alone provides 83.5% recognition accuracy; when the HDF and VDF are fused into one feature vector, the recognition rate increases to 92%. The experimental results show that our method provides significant results compared with existing methods.
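A toy sketch of the two feature sets and the Cityblock-distance matching (the joint names, the choice of joint pairs and the vertical-axis convention are illustrative assumptions):

```python
import numpy as np
from scipy.spatial.distance import cdist

def hdf_vdf(joints):
    """Per-frame gait features: HDF from horizontal distances between
    joint pairs, VDF from joint heights above the ground. `joints` maps
    name -> (x, y, z), with y taken as the vertical axis."""
    hdf = [abs(joints['ankle_l'][0] - joints['ankle_r'][0]),
           abs(joints['knee_l'][0] - joints['knee_r'][0]),
           abs(joints['hand_l'][0] - joints['hand_r'][0])]
    vdf = [joints['hand_r'][1], joints['shoulder_r'][1], joints['ankle_r'][1]]
    return np.array(hdf + vdf)        # fused HDF + VDF feature vector

def nearest_subject(probe, gallery, labels):
    """1-nearest-neighbour matching with the Cityblock (L1) distance."""
    d = cdist(probe[None, :], gallery, metric='cityblock')
    return labels[int(d.argmin())]
```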
Real-Time Image and Video Processing III
Comparison of two real-time hand gesture recognition systems involving stereo cameras, depth camera, and inertial sensor
This paper presents a comparison of two real-time hand gesture recognition systems. One system utilizes a binocular stereo camera set-up while the other system utilizes a combination of a depth camera and an inertial sensor. The latter system is a dual-modality system as it utilizes two different types of sensors. These systems have been previously developed in the Signal and Image Processing Laboratory at the University of Texas at Dallas and the details of the algorithms deployed in these systems are reported in previous papers. In this paper, a comparison is carried out between these two real-time systems in order to examine which system performs better for the same set of hand gestures under realistic conditions.
3D filtering technique in presence of additive noise in color videos implemented on DSP
Volodymyr I. Ponomaryov, Hector Montenegro-Monroy, Alfredo Palacios
A filtering method for color videos contaminated by additive noise is presented. The proposed framework employs three filtering stages: spatial similarity filtering, neighboring-frame denoising, and spatial post-processing smoothing. The difference from other state-of-the-art filtering methods is that this approach, based on fuzzy logic, analyses basic and related gradient values between neighboring pixels within a 7×7 sliding window in the vicinity of a central pixel in each of the RGB channels. Then, the similarity measures between the analogous pixels in the color bands are taken into account during the denoising. Next, two neighboring video frames are analyzed together, estimating local motions between the frames using a block-matching procedure. In the final stage, the edges and smoothed areas are processed differently in the current frame during the post-processing filtering. Numerous simulation results confirm that this 3D fuzzy filter performs better than other state-of-the-art methods, such as 3D-LLMMSE, WMVCE, RFMDAF, FDARTF G, VBM3D and NLM, in terms of objective criteria (PSNR, MAE, NCD and SSIM) as well as subjective perception via the human visual system on different color videos. An efficiency analysis of the designed filter and the other mentioned filters has been performed on the TMS320DM642 and TMS320DM648 DSPs by Texas Instruments through MATLAB and the Simulink module, showing that the novel 3D fuzzy filter can be used in real-time processing applications.
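As a much-simplified illustration of the first, spatial-similarity stage only (single channel, absolute differences standing in for the paper's gradient values, and a Gaussian membership function as an assumed fuzzy rule):

```python
import numpy as np

def fuzzy_spatial_filter(channel, win=7, sigma=20.0):
    """Inside each win x win window, neighbours receive a fuzzy membership
    that decays with their absolute difference from the central pixel; the
    output is the membership-weighted mean."""
    pad = win // 2
    padded = np.pad(channel.astype(float), pad, mode='reflect')
    out = np.empty_like(channel, dtype=float)
    for i in range(channel.shape[0]):
        for j in range(channel.shape[1]):
            window = padded[i:i + win, j:j + win]
            diff = window - padded[i + pad, j + pad]
            mu = np.exp(-(diff / sigma) ** 2)     # membership in (0, 1]
            out[i, j] = np.sum(mu * window) / np.sum(mu)
    return out
```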
Parallel multithread computing for spectroscopic analysis in optical coherence tomography
Spectroscopic Optical Coherence Tomography (SOCT) is an extension of Optical Coherence Tomography (OCT). It allows gathering spectroscopic information from individual scattering points inside the sample and is based on time-frequency analysis of interferometric signals. Such analysis requires calculating hundreds of Fourier transforms while performing a single A-scan, and further processing of the acquired spectroscopic information is needed, which significantly increases the required computation time. In recent years, the application of graphics processing units (GPUs) has been proposed to reduce computation time in OCT by using parallel computing algorithms. GPU technology can also be used to speed up signal processing in SOCT. However, the parallel algorithms used in classical OCT need to be revised because of the different character of the analyzed data. Classical OCT requires processing of long, independent interferometric signals for obtaining subsequent A-scans; SOCT, by contrast, requires processing of multiple shorter signals, which differ only in a small fraction of their samples. We have developed new algorithms for parallel signal processing in SOCT, implemented with NVIDIA CUDA (Compute Unified Device Architecture). We present details of the algorithms and performance tests for analyzing data from an in-house SD-OCT system, and give a brief discussion of the usefulness of the developed algorithms. The presented algorithms might be useful for researchers working on OCT, as they reduce computation time and are a step toward real-time signal processing of SOCT data.
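The key structural point, that many short overlapping windows can be transformed as one batch, can be shown in NumPy; a CUDA implementation would hand the same (n_windows, win) matrix to a batched cuFFT plan (this is a conceptual sketch, not the authors' kernels):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def batched_stfts(fringe, win=64, hop=4):
    """Gather all overlapping analysis windows of one interferometric
    signal into a single (n_windows, win) matrix and transform them in
    one batched FFT call."""
    windows = sliding_window_view(fringe, win)[::hop]   # views, no copy
    windows = windows * np.hanning(win)                 # taper against leakage
    return np.fft.rfft(windows, axis=1)                 # one FFT per row

spectra = batched_stfts(np.random.randn(2048))
print(spectra.shape)    # (n_windows, win // 2 + 1)
```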
Simultaneous edge sensing compression and encryption for real-time video transmission
Video compression and encryption have become essential parts of multimedia applications, and of video conferencing in particular. Applying both techniques simultaneously is challenging when both size and quality are important. In this paper we suggest using the wavelet transform so that the low-frequency coefficients are handled for compression while encryption is undertaken on the wavelet high-frequency coefficients. Applying both methods simultaneously is not new; here we suggest a way to improve the security level of the encryption with better computational performance in both encryption and compression. Both the encryption and the compression in this paper are based on edge extraction from the wavelet high-frequency sub-bands. Although some research performs edge detection in the spatial domain, the number of edges produced in the wavelet domain can vary dynamically, which in turn affects the compression ratio. Moreover, this kind of edge detection in the wavelet domain adds a further level of selective encryption.
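A toy sketch of the idea (not the authors' scheme): detect "edge" coefficients by magnitude in the high-frequency sub-bands and scramble only those, with an ordinary PRNG standing in for a proper cipher keystream:

```python
import numpy as np
import pywt

def selective_encrypt(image, key=12345, thresh=20.0):
    """Scramble only the large-magnitude ('edge') high-frequency DWT
    coefficients; the approximation band is left intact for compression.
    The PRNG keystream and coefficient rounding make this a lossy toy,
    not a real cipher."""
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), 'haar')
    rng = np.random.default_rng(key)
    bands = []
    for band in (cH, cV, cD):
        edges = np.abs(band) > thresh                 # edge coefficients only
        mask = rng.integers(0, 256, band.shape)
        out = band.copy()
        q = np.round(band[edges]).astype(np.int64)
        out[edges] = (q ^ mask[edges]).astype(float)  # keyed XOR scrambling
        bands.append(out)
    return pywt.idwt2((cA, tuple(bands)), 'haar')
```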
Poster Session
Automatic 2D to 3D conversion implemented for real-time applications
Volodymyr Ponomaryov, Eduardo Ramos-Diaz, Victor Gonzalez Huitron
Different hardware implementations of an automatic 2D-to-3D video color conversion employing a 2D video sequence are presented. The analyzed framework includes joint processing of neighboring frames using the following blocks: CIE L*a*b* color space conversion, wavelet transform, edge detection using the HF wavelet sub-bands (HL, LH and HH), color segmentation via k-means on the a*b* color plane, up-sampling, disparity map (DM) estimation, adaptive post-filtering, and finally the anaglyph 3D scene generation. During edge detection, the Donoho threshold is computed, each sub-band is binarized according to the chosen threshold, and finally the thresholded image is formed. DM estimation is performed in the following manner: in the left stereo image (or frame), a window with varying size is used according to the information obtained from the binarized sub-band image, distinguishing different texture areas in the LL sub-band image. The stereo matching is performed between the two (left and right) LL sub-band images using processing with different window sizes. An up-sampling procedure is employed in order to obtain the enhanced DM. The adaptive post-processing procedure is based on a median filter and k-means segmentation in the a*b* color plane. The SSIM and QBP criteria are applied in order to compare the performance of the proposed framework against other disparity map computation techniques. The designed technique has been implemented on the DSP TMS320DM648, in MATLAB's Simulink module on a Windows 7 PC, and on a graphics card (NVIDIA Quadro K2000), demonstrating that the proposed approach can be applied in real-time processing mode.
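The final anaglyph-generation block, in its simplest red/cyan form, reduces to a channel swap between the synthesised left and right views:

```python
import numpy as np

def make_anaglyph(left_rgb, right_rgb):
    """Red channel from the left view, green and blue from the right;
    viewable with red/cyan glasses. Assumes R, G, B channel order."""
    anaglyph = right_rgb.copy()
    anaglyph[..., 0] = left_rgb[..., 0]
    return anaglyph

left = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
right = np.roll(left, 8, axis=1)          # toy horizontal disparity
scene_3d = make_anaglyph(left, right)
```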
Image resolution enhancement using edge extraction and sparse representation in wavelet domain for real-time application
Volodymyr I. Ponomaryov, Herminio Chavez-Roman, Victor Gonzalez-Huitron
The paper presents the design and hardware implementation of a novel framework for image resolution enhancement in the wavelet domain. The principal idea consists of an edge-preservation procedure and mutual interpolation between the input low-resolution (LR) image and the HF sub-band images obtained via the Discrete Wavelet Transform (DWT). The LR image is used in a sparse representation for the resolution-enhancement process, employing 1-D interpolation along a set of angular directions; the new samples are then computed by estimating the missing ones, and the remaining pixels are obtained via Lanczos interpolation. To preserve more edge information, additional edge extraction in the HF sub-bands is performed on the DWT decomposition of the input image. The difference between the LL sub-band image and the LR input image is used to correct the HF components, generating a significantly sharper reconstructed image. All sub-band images are then used to generate the new HR image by applying the inverse DWT (IDWT). Additionally, the framework employs a denoising procedure based on Non-Local Means for the input LR image. An efficiency analysis of the designed filter and other state-of-the-art filters has been performed on the DSP TMS320DM648 by Texas Instruments through MATLAB's Simulink module and on a video card (NVIDIA Quadro K2000), showing that the novel SR procedure can be used in real-time processing applications. Experimental results have confirmed that the implemented framework outperforms existing SR algorithms in terms of objective criteria (PSNR, MAE and SSIM) as well as subjective perception, demonstrating better image resolution.
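A bare-bones sketch of the DWT super-resolution skeleton described above, without the sparse directional interpolation, edge-extraction or NLM denoising stages (the factor of 2 below offsets the orthonormal Haar scaling of the approximation band and is an implementation assumption):

```python
import numpy as np
import cv2
import pywt

def dwt_super_resolve(lr):
    """Interpolate the high-frequency sub-bands to the input size, use the
    LR image itself as the approximation band, and invert the DWT for a
    2x enlargement."""
    lr = lr.astype(np.float32)
    _, (cH, cV, cD) = pywt.dwt2(lr, 'haar')
    size = (lr.shape[1], lr.shape[0])       # cv2.resize wants (width, height)
    up = tuple(cv2.resize(b, size, interpolation=cv2.INTER_LANCZOS4)
               for b in (cH, cV, cD))
    return pywt.idwt2((lr * 2.0, up), 'haar')

hr = dwt_super_resolve(np.random.rand(120, 160).astype(np.float32))
print(hr.shape)    # (240, 320)
```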
Software architecture as a freedom for 3D content providers and users along with independency on purposes and used devices
Razia Sultana, Andreas Christ, Patrick Meyrueis
The improvements in the hardware and software of communication devices have made it possible to run Virtual Reality (VR) and Augmented Reality (AR) applications on them. Nowadays, it is possible to overlay synthetic information on real images, or even to play 3D online games on smartphones and other mobile devices. Hence the use of 3D data for business, and especially for education purposes, is ubiquitous. Because mobile phones are always at hand and always ready to use, they are considered the most promising communication devices. The total number of mobile phone users is increasing all over the world every day, which makes mobile phones the most suitable devices for reaching a huge number of end clients, whether for education or for business purposes. There are different standards, protocols and specifications for establishing communication among different devices, but no initiative has been taken so far to ensure that the data sent through this communication process will be understood and usable by the destination device. Since not all devices can deal with every kind of 3D data format, and it is not realistic to keep a different version of the same data for compatibility with each destination device, a prevalent solution is necessary. The architecture proposed in this paper provides device- and purpose-independent 3D data visibility, anytime and anywhere, to the right person in a suitable format. No solution is without limitations: the architecture has been implemented in a prototype for experimental validation, which also shows the difference between theory and practice.
Comparative study of internet cloud and cloudlet over wireless mesh networks for real-time applications
Kashif A. Khan, Qi Wang, Chunbo Luo, et al.
Mobile cloud computing is gaining worldwide momentum as providers such as Amazon and Google offer ubiquitous on-demand cloud services to mobile users at low capital cost. However, Internet-centric clouds introduce wide area network (WAN) delays that are often intolerable for real-time applications such as video streaming. One promising approach to addressing this challenge is to deploy decentralized mini-cloud facilities, known as cloudlets, to enable localized cloud services. When supported by local wireless connectivity, a wireless cloudlet is expected to offer low-cost and high-performance cloud services for its users. In this work, we implement a realistic framework that comprises both a popular Internet cloud (Amazon Cloud) and a real-world cloudlet (based on Ubuntu Enterprise Cloud (UEC)) for mobile cloud users in a wireless mesh network. We focus on real-time video streaming over the HTTP standard and implement a typical application. We further perform a comprehensive comparative analysis and empirical evaluation of the application's performance when it is delivered over the Internet cloud and the cloudlet respectively. The study quantifies the influence of the two different cloud networking architectures on supporting real-time video streaming. We also enable movement of the users in the wireless mesh network and investigate the effect of user mobility on mobile cloud computing over the cloudlet and the Amazon cloud respectively. Our experimental results demonstrate the advantages of the cloudlet paradigm over its Internet cloud counterpart in supporting the quality of service of real-time applications.
Empirical evaluation of H.265/HEVC-based dynamic adaptive video streaming over HTTP (HEVC-DASH)
Real-time HTTP streaming has gained global popularity for delivering video content over the Internet. In particular, the recent MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard enables on-demand, live, and adaptive Internet streaming in response to network bandwidth fluctuations. Meanwhile, the emerging new-generation video coding standard, H.265/HEVC (High Efficiency Video Coding), promises to reduce the bandwidth requirement by 50% at the same video quality compared with the current H.264/AVC standard. However, little existing work has addressed the integration of the DASH and HEVC standards, let alone empirical performance evaluation of such systems. This paper presents an experimental HEVC-DASH system: a pull-based adaptive streaming solution that delivers HEVC-coded video content through conventional HTTP servers, where the client switches to its desired quality, resolution or bitrate based on the available network bandwidth. Previous studies of DASH have focused on H.264/AVC, whereas we present an empirical evaluation of the HEVC-DASH system by implementing a real-world test bed, which consists of an Apache HTTP server with GPAC, an MP4Client (GPAC) with an OpenHEVC-based DASH client, and a NETEM box in the middle emulating different network conditions. We investigate and analyze the performance of HEVC-DASH by exploring the impact of various network conditions such as packet loss, bandwidth and delay on video quality. Furthermore, we compare the Intra and Random Access profiles of HEVC coding with the Intra profile of H.264/AVC when the correspondingly encoded video is streamed with DASH. Finally, we explore the correlation among the quality metrics and network conditions, and empirically establish under which conditions the different codecs can provide satisfactory performance.
Network-aware scalable video monitoring system for emergency situations with operator-managed fidelity control
Tawfik Al Hadhrami, James M. Nightingale, Qi Wang, et al.
In emergency situations, the ability to remotely monitor unfolding events using high-quality video feeds will significantly improve the incident commander's understanding of the situation and thereby aid effective decision making. This paper presents a novel, adaptive video monitoring system for emergency situations where the normal communications network infrastructure has been severely impaired or is no longer operational. The proposed scheme, operating over a rapidly deployable wireless mesh network, supports real-time video feeds between first responders, forward operating bases and primary command and control centers. Video feeds captured on portable devices carried by first responders and by static visual sensors are encoded in H.264/SVC, the scalable extension to H.264/AVC, allowing efficient, standards-based temporal, spatial, and quality scalability of the video. A three-tier video delivery system is proposed, which balances the need to avoid overuse of mesh nodes with the operational requirements of the emergency management team. In the first tier, the video feeds are delivered at a low spatial and temporal resolution employing only the base layer of the H.264/SVC video stream. Routing in this mode is designed to employ all nodes across the entire mesh network. In the second tier, whenever operational considerations require that commanders or operators focus on a particular video feed, a 'fidelity control' mechanism at the monitoring station sends control messages to the routing and scheduling agents in the mesh network, which increase the quality of the received picture using SNR scalability while conserving bandwidth by maintaining a low frame rate. In this mode, routing decisions are based on reliable packet delivery, with the most reliable routes being used to deliver the base and lower enhancement layers; as fidelity is increased and more scalable layers are transmitted, they are assigned to routes in descending order of reliability. The third tier of video delivery transmits a high-quality video stream including all available scalable layers using the most reliable routes through the mesh network, ensuring the highest possible video quality. The proposed scheme is implemented in a proven simulator, and the performance of the proposed system is numerically evaluated through extensive simulations. We further present an in-depth analysis of the proposed solutions and potential approaches towards supporting high-quality visual communications in such a demanding context.
Scene-based nonuniformity correction using multiframe registration and iteration method
Jianle Ren, Qian Chen, Weixian Qian, et al.
In this paper, an improved scene-based nonuniformity correction (NC) algorithm for infrared focal plane arrays (IRFPAs) using multiframe registration and an iteration method is proposed. The method estimates the global translation between several adjacent frames and iterates over them; the mean square error between any two properly registered images is then minimized to obtain the nonuniformity correction parameters. The method comprises three main steps. First, we assume that brightness along the motion trajectory is constant and that the detector response is linear, modeling the nonuniformity of each detector with a gain and a bias. Second, several adjacent frames are used to compute the relative motion of any two adjacent frames: by the Fourier shift theorem, their relative translation can be obtained by calculating their normalized cross-power spectrum. Choosing K adjacent frames gives a total of K*(K-1)/2 iterations. Then the mean square error function is defined as the difference between the two registered adjacent corrected frames, and it is minimized using the least-mean-square algorithm. Making full use of the correlation between adjacent frames, together with the iteration strategy between them, yields fast and reliable fixed-pattern noise reduction with few ghosting artifacts. We define the algorithm and present a number of experimental results to demonstrate the efficacy of the proposed method in comparison to several previously published methods. The performance of the proposed method is thoroughly evaluated on clean infrared image sequences with synthetic nonuniformity and on real infrared imagery.
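The registration step, estimating a global shift from the normalized cross-power spectrum, can be written directly from the Fourier shift theorem:

```python
import numpy as np

def phase_correlation_shift(f1, f2):
    """Estimate the global translation of f1 relative to f2: the inverse
    FFT of the normalized cross-power spectrum peaks at the shift."""
    F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
    cross = F1 * np.conj(F2)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12))
    dy, dx = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    if dy > f1.shape[0] // 2:    # map wrap-around peaks to signed shifts
        dy -= f1.shape[0]
    if dx > f1.shape[1] // 2:
        dx -= f1.shape[1]
    return dy, dx

a = np.random.rand(128, 128)
b = np.roll(a, (5, -3), axis=(0, 1))
print(phase_correlation_shift(b, a))    # ~ (5, -3)
```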
Adaptive skin segmentation via feature-based face detection
Michael J. Taylor, Tim Morris
Variations in illumination can have significant effects on the apparent colour of skin, which can be damaging to the efficacy of any colour-based segmentation approach. We attempt to overcome this issue by presenting a new adaptive approach, capable of generating skin colour models at run-time. Our approach adopts a Viola-Jones feature-based face detector, in a moderate-recall, high-precision configuration, to sample faces within an image, with an emphasis on avoiding potentially detrimental false positives. From these samples, we extract a set of pixels that are likely to be from skin regions, filter them according to their relative luma values in an attempt to eliminate typical non-skin facial features (eyes, mouths, nostrils, etc.), and hence establish a set of pixels that we can be confident represent skin. Using this representative set, we train a unimodal Gaussian function to model the skin colour in the given image in the normalised rg colour space, a combination of modelling approach and colour space that benefits us in a number of ways. A generated function can subsequently be applied to every pixel in the given image to determine the probability that any given pixel represents skin; segmentation of the skin, therefore, can be as simple as applying a binary threshold to the calculated probabilities. In this paper, we touch upon a number of existing approaches, describe the methods behind our new system, present the results of its application to arbitrary images of people with detectable faces, which we have found to be extremely encouraging, and investigate its potential to be used as part of real-time systems.
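A compact sketch of the colour-modelling stage: fit a unimodal Gaussian in normalised rg space from face-sampled pixels, then evaluate it per pixel (the sample data below are random stand-ins, and the luma filtering is assumed to have happened upstream):

```python
import numpy as np

def fit_rg_gaussian(skin_pixels_rgb):
    """Fit the unimodal Gaussian skin model in normalised rg space."""
    px = skin_pixels_rgb.astype(float)
    rg = (px / (px.sum(axis=1, keepdims=True) + 1e-6))[:, :2]   # b = 1 - r - g
    return rg.mean(axis=0), np.cov(rg, rowvar=False)

def skin_probability(image_rgb, mean, cov):
    """Evaluate the model at every pixel; threshold the map to segment."""
    px = image_rgb.reshape(-1, 3).astype(float)
    rg = (px / (px.sum(axis=1, keepdims=True) + 1e-6))[:, :2] - mean
    inv = np.linalg.inv(cov + 1e-9 * np.eye(2))
    md2 = np.einsum('ij,jk,ik->i', rg, inv, rg)   # squared Mahalanobis distance
    return np.exp(-0.5 * md2).reshape(image_rgb.shape[:2])

face_px = np.random.randint(80, 256, (500, 3))    # stand-in for face-sampled pixels
mean, cov = fit_rg_gaussian(face_px)
img = np.random.randint(0, 256, (120, 160, 3))
skin_mask = skin_probability(img, mean, cov) > 0.5
```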
The wide area retrievals of temperature in life space from multi-data set fusion
The heat wave is one of the phenomena stemming from the abnormal climate caused by climate change. Occurring with increasing strength and frequency worldwide, it threatens health-vulnerable groups in urban and suburban areas. To reduce the damage from heat waves, the current research attempts to perform data assimilation between high-resolution images and ground observation data based on mid-infrared satellite imagery. We use an integrated approach involving the compilation of both spatial and non-spatial data from government agencies and institutions and the application of spatial and temporal analyses to remote sensing data. Near-real-time temperature retrievals of selected areas are performed and analyzed using thermal data from COMS, Landsat, and in-situ measurements, and the computational complexity and storage requirements are discussed. Seven major land-use categories are frequently used in Korea: built-up areas, roads, agriculture (greenhouses, paddy fields, and dry fields), construction sites, vegetation (forests), wasteland, and water bodies. Four land uses were selected as the areas most strongly affected by heat waves according to the survey of the National Emergency Management Agency. In future work, we will estimate the precise wide-area temperature of life space and promote the application of the heat/health watch/warning system.
Moving targets tracking on a mobile infrared platform and its real-time application on GPU
Chen Peng, Qian Chen, Wei-xian Qian, et al.
In vision from a mobile infrared platform, the background scene keeps changing because of the moving camera, which results in more background clutter and noise. Under this condition, it is difficult to track the target using brightness or gradient information alone. In this paper, we present a new tracking algorithm that segments the moving target according to the difference in velocity field between the target and the background. In the proposed algorithm, the phase correlation method is introduced to transform two adjacent frames into one coordinate frame so that the background remains relatively static. Then the Horn-Schunck optical flow is calculated in the tracking window to estimate the velocity field of the target. Finally, we introduce a particle filter to estimate the location of the target, which gives more robust performance by optimizing the transition probability of particles with optical flow features. Our algorithm can be organized in a parallel processing mode suited to the GPU (graphics processing unit). Through parallel computing, our algorithm is efficient enough to work in real time. Experimental results show that the proposed algorithm can track an infrared target in real time on a moving platform.
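A serial reference sketch of the Horn-Schunck stage (the part the paper parallelises on the GPU; each pixel's update is independent, which is what makes the GPU mapping natural), using the standard derivative and averaging kernels:

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(im1, im2, alpha=1.0, iters=100):
    """Classical Horn-Schunck: iterate flow updates from image derivatives
    under a global smoothness constraint."""
    im1, im2 = im1.astype(float), im2.astype(float)
    kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) * 0.25
    ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) * 0.25
    kt = np.full((2, 2), 0.25)
    Ix = convolve(im1, kx) + convolve(im2, kx)   # spatial derivatives
    Iy = convolve(im1, ky) + convolve(im2, ky)
    It = convolve(im2, kt) - convolve(im1, kt)   # temporal derivative
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]]) / 12.0
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    for _ in range(iters):
        ub, vb = convolve(u, avg), convolve(v, avg)   # neighbourhood means
        common = (Ix * ub + Iy * vb + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u, v = ub - Ix * common, vb - Iy * common
    return u, v
```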
An automatic lesion detection using dynamic image enhancement and constrained clustering
Jean M. Vianney Kinani, Alberto J. Rosales-Silva, Francisco J. Gallegos-Funes, et al.
In this work, we present a fast and robust method for lesions detection, primarily, a non-linear image enhancement is performed on T1 weighted magnetic resonance (MR) images in order to facilitate an effective segmentation that enables the lesion detection. First a dynamic system that performs the intensity transformation through the Modified sigmoid function contrast stretching is established, then, the enhanced image is used to classify different brain structures including the lesion using constrained fuzzy clustering, and finally, the lesion contour is outlined through the level set evolution. Through experiments, validation of the algorithm was carried out using both clinical and synthetic brain lesion datasets and an 84%–93% overlap performance of the proposed algorithm was obtained with an emphasis on robustness with respect to different lesion types.