Proceedings Volume 7871

Real-Time Image and Video Processing 2011

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 1 February 2011
Contents: 6 Sessions, 25 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2011
Volume Number: 7871

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7871
  • Real-Time Algorithms/Systems I
  • Real-Time Implementation/Hardware
  • Real-Time Video
  • Real-Time Algorithms/Systems II
  • Interactive Paper Session
Front Matter: Volume 7871
This PDF file contains the front matter associated with SPIE Proceedings Volume 7871, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Real-Time Algorithms/Systems I
Towards real-time image quality assessment
Bobby Geary, Christos Grecos
We introduce a real-time implementation and evaluation of a new fast, accurate, full-reference image quality metric. The popular general image quality metric known as the Structural Similarity Index Metric (SSIM) has been shown to be effective, efficient and useful, finding many practical and theoretical applications. Recently the authors have proposed an enhanced version of the SSIM algorithm known as the Rotated Gaussian Discrimination Metric (RGDM). This approach uses a Gaussian-like discrimination function to evaluate local contrast and luminance. RGDM was inspired by an exploration of local statistical parameter variations in relation to variation of Mean Opinion Score (MOS) for a range of particular distortion types. In this paper we outline the salient features of the derivation of RGDM and show how analysis of the local statistics of each distortion type necessitates varying the discrimination function width. Results on the LIVE image database show tight banding of the RGDM metric value when plotted against mean opinion score, indicating the usefulness of this metric. We then explore a number of strategies for algorithmic speed-up, including the application of Integral Images for patch-based computation optimisation, cost reduction for the evaluation of the discrimination function, and general loop unrolling. We also employ fast Single Instruction Multiple Data (SIMD) intrinsics and explore data-parallel decomposition on a multi-core Intel processor.
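One of the speed-up strategies named in the abstract, patch statistics via Integral Images, can be sketched as follows. This is a minimal NumPy illustration under our own naming, not the authors' implementation:

    import numpy as np

    def integral_image(img):
        # Summed-area table with a zero row/column prepended for easy indexing.
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
        ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
        return ii

    def patch_means(img, k):
        # Mean of every k-by-k patch in O(1) per patch, regardless of k.
        ii = integral_image(img)
        sums = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
        return sums / (k * k)

The same table applied to the squared input yields patch variances, which is what makes window-based metrics such as SSIM amenable to this optimisation.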
2000 fps real-time target tracking vision system based on color histogram
Idaku Ishii, Tetsuro Tatebe, Qingyi Gu, et al.
In this study, we develop a high-speed color-histogram-based tracking system that can be applied to 512x512 pixel images at 2000 fps using a hardware implementation of an improved CamShift algorithm. In the improved algorithm, the size, position, and orientation of an object to be tracked can be extracted using only a hardware implementation of color conversion and the moment feature calculation of 16 binary images quantized by color bins on a high-speed vision platform. We demonstrate its effectiveness by presenting several target tracking results at 2000 fps with color-patterned objects moving rapidly against complicated backgrounds.
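The moment feature calculation the abstract refers to reduces, for each color-bin binary image, to a handful of sums. A minimal NumPy sketch of the idea (our own, not the hardware implementation) is:

    import numpy as np

    def object_pose(mask):
        # Centroid and principal-axis orientation of a binary mask from its
        # zeroth-, first- and second-order image moments.
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return None
        cx, cy = xs.mean(), ys.mean()                    # m10/m00, m01/m00
        mu20 = ((xs - cx) ** 2).mean()
        mu02 = ((ys - cy) ** 2).mean()
        mu11 = ((xs - cx) * (ys - cy)).mean()
        theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)  # orientation angle
        return (cx, cy), theta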
Real-time iris tracking with a smart camera
This paper presents a real-time iris detection procedure for gray intensity images. Typical applications for iris detection utilize template- and feature-based methods. These methods are generally time and memory intensive and are not practical for all real-time embedded realizations. Here we propose a simple, time-efficient algorithm with high detection and low error rates, implemented in a smart camera. The system used for this research involves a National Instruments smart camera with the LabVIEW Real-Time Module. First, the images are analyzed to determine the region of interest (the face). The iris location is determined by applying a convolution-based algorithm to the edge image and then using the Hough Transform. Operating on the edge image keeps the algorithm less complex and less computationally expensive, resulting in an efficient analysis method. The extracted iris location information is stored in the camera's image buffer and used to model one specific eye pattern. The iris location thus determined is used as a reference to reduce the search region for the iris in subsequent images. The iris detection algorithm has been applied at different frame rates. The results demonstrate that the speed of this algorithm allows tracking of the iris when the eyes or the subject move in front of the camera at reasonable speeds and with limited occlusions.
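As an illustration of the edge-plus-Hough stage: the paper's system is built in LabVIEW on an NI smart camera, so the OpenCV call below is merely an equivalent sketch with guessed parameters, not the authors' code:

    import cv2

    def find_iris(gray, roi):
        # Search for the iris as the strongest circle inside the face ROI.
        x, y, w, h = roi
        patch = cv2.GaussianBlur(gray[y:y + h, x:x + w], (5, 5), 1.5)
        circles = cv2.HoughCircles(patch, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                                   param1=100, param2=30,
                                   minRadius=5, maxRadius=40)
        if circles is None:
            return None
        cx, cy, r = circles[0][0]       # best-ranked candidate
        return int(cx) + x, int(cy) + y, int(r)

Restricting the search to a window around the previous detection, as the abstract describes, amounts to shrinking roi from frame to frame.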
Optimization of image processing algorithms on mobile platforms
Pramod Poudel, Mukul Shirvaikar
This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, netbooks and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of an asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND flash and 256 MB of SDRAM. The basic image correlation algorithm is chosen for benchmarking, as it finds widespread application in template matching tasks such as face recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core, which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images, and performance results are presented measuring the speedup obtained due to the dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
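The DSP's advantage on DFT workloads comes from computing correlation in the frequency domain. A NumPy sketch of that idea (not the OMAP/DSP code itself) is:

    import numpy as np

    def fft_correlate(image, template):
        # Cross-correlation via the DFT: multiply spectra, invert, take peak.
        F = np.fft.rfft2(image)
        T = np.fft.rfft2(template, s=image.shape)      # zero-pad the template
        corr = np.fft.irfft2(F * np.conj(T), s=image.shape)
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        return peak, corr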
Real-Time Implementation/Hardware
Scalable software architecture for on-line multi-camera video processing
Massimo Camplani, Luis Salgado
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. As a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions, such as the number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
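A minimal sketch of the PU idea, with the acquisition and processing phases decoupled by a bounded queue; class and method names are ours, not the paper's, and a real Central Unit would supervise a set of such PUs:

    import queue, threading

    class ProcessingUnit(threading.Thread):
        # One PU per camera: an acquisition thread feeds a bounded buffer,
        # the PU thread consumes it. camera.grab() is a hypothetical interface.
        def __init__(self, camera, process, depth=4):
            super().__init__(daemon=True)
            self.camera, self.process = camera, process
            self.frames = queue.Queue(maxsize=depth)
            threading.Thread(target=self._acquire, daemon=True).start()

        def _acquire(self):
            while True:
                frame = self.camera.grab()
                try:
                    self.frames.put_nowait(frame)   # drop frames if processing lags
                except queue.Full:
                    pass

        def run(self):
            while True:
                self.process(self.frames.get())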
Real-time implementation of logo detection on open source BeagleBoard
M. George, N. Kehtarnavaz, L. Estevez
This paper presents the real-time implementation of our previously developed logo detection and tracking algorithm on the open source BeagleBoard mobile platform. This platform has an OMAP processor that incorporates an ARM Cortex processor. The algorithm combines Scale Invariant Feature Transform (SIFT) with k-means clustering, online color calibration and moment invariants to robustly detect and track logos in video. Various optimization steps that are carried out to allow the real-time execution of the algorithm on BeagleBoard are discussed. The results obtained are compared to the PC real-time implementation results.
Low complexity orientation detection algorithm for real-time implementation
Vikram V. Appia, Rajesh Narasimha
In this paper we describe a low-complexity image orientation detection algorithm which can be implemented in real time on embedded devices such as low-cost digital cameras, mobile phone cameras and video surveillance cameras. Providing orientation information to tamper detection algorithms in surveillance cameras, color enhancement algorithms and various scene classifiers can help improve their performance. Various image orientation detection algorithms have been developed in the last few years for image management systems, as a post-processing tool. However, these techniques use certain high-level features and object classification to detect the orientation, so they are not suitable for implementation on a capture device in real time. Our algorithm uses low-level features such as texture, lines and source of illumination to detect orientation. We implemented the algorithm on a mobile phone camera device with a 180 MHz ARM926 processor. The orientation detection takes 10 ms per frame, which makes it suitable for use in image capture as well as video mode. It can be used efficiently in parallel with the other processes in the imaging pipeline of the device. On hardware, the algorithm achieved an accuracy of 92% with a rejection rate of 4% and a false detection rate of 8% on outdoor images.
Real-time topological image smoothing on shared memory parallel machines
Smoothing filters are a method of choice for image preprocessing and pattern recognition. We present a new concurrent method for smoothing 2D objects in binary images. The proposed method provides parallel computation while preserving topology by using homotopic transformations. We introduce an adapted parallelization strategy called split, distribute and merge (SDM), which allows efficient parallelization of a large class of topological operators including, mainly, smoothing, skeletonization, and watershed algorithms. To achieve a good speedup, particular attention was paid to task scheduling: work during the smoothing process is distributed over a variable number of threads. Tests on a 512x512 2D binary image, using a shared-memory parallel machine (SMPM) with 8 CPU cores (2× Xeon E5405 running at 2 GHz), showed a speedup of 5.2, achieving a throughput of 32 images per second.
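The SDM strategy can be sketched as follows for a row-wise split. This toy version (names ours) ignores the synchronisation that homotopic, topology-preserving operators require at band borders, which is the hard part the paper addresses:

    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    def sdm_smooth(img, smooth_band, n_threads=8, overlap=1):
        # Split rows into bands (with a small halo so border pixels see their
        # neighbourhood), distribute the bands to threads, merge the results.
        bands = np.array_split(np.arange(img.shape[0]), n_threads)
        def work(rows):
            lo = max(rows[0] - overlap, 0)
            hi = min(rows[-1] + 1 + overlap, img.shape[0])
            out = smooth_band(img[lo:hi])           # user-supplied operator
            return rows, out[rows[0] - lo:rows[-1] + 1 - lo]
        result = np.empty_like(img)
        with ThreadPoolExecutor(n_threads) as ex:
            for rows, band in ex.map(work, bands):
                result[rows[0]:rows[-1] + 1] = band
        return result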
Multithreaded real-time 3D image processing software architecture and implementation
Vikas Ramachandra, Kalin Atanassov, Milivoje Aleksic, et al.
Recently, 3D displays and videos have generated a lot of interest in the consumer electronics industry. To make 3D capture and playback popular and practical, a user-friendly playback interface is desirable. Towards this end, we built a real-time software 3D video player. The 3D video player displays user-captured 3D videos, provides various 3D-specific image processing functions and ensures a pleasant viewing experience. Moreover, the player enables user interactivity by providing digital zoom and pan functionality. This real-time 3D player was implemented on the GPU using CUDA and OpenGL. Stereo images are first read by the player from a fast drive and rectified. Further processing of the images determines the optimal convergence point in the 3D scene to reduce eye strain. The rationale for this convergence point selection takes into account scene depth and display geometry. The first step in this processing chain is identifying keypoints by detecting vertical edges within the left image. Regions surrounding reliable keypoints are then located in the right image through block matching. The differences in position between corresponding regions in the left and right images are then used to calculate disparity, and the extrema of the disparity histogram give the scene disparity range. The left and right images are shifted based upon the calculated range in order to place the desired region of the 3D scene at convergence. All the above computations are performed on one CPU thread which calls CUDA functions. Image upsampling and shifting are performed in response to user zoom and pan. The player also includes a CPU display thread, which uses OpenGL rendering (quad buffers); this thread also gathers user input for digital zoom and pan and sends it to the processing thread.
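The convergence step reduces to choosing a horizontal shift from the disparity histogram. A hedged sketch of that step follows; the percentile-based extrema and the symmetric split of the shift are our simplifications, not the paper's exact rule:

    import numpy as np

    def convergence_shift(disparities, lo_pct=5, hi_pct=95):
        # Robust extrema of the disparity histogram; centre the scene's
        # disparity range about zero by shifting the left/right images.
        d_min = np.percentile(disparities, lo_pct)
        d_max = np.percentile(disparities, hi_pct)
        return -0.5 * (d_min + d_max)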
Real-Time Video
Real-time video streaming using H.264 scalable video coding (SVC) in multihomed mobile networks: a testbed approach
Users of the next-generation wireless paradigm known as multihomed mobile networks expect satisfactory quality of service (QoS) when accessing streamed multimedia content. The recent H.264 Scalable Video Coding (SVC) extension to the Advanced Video Coding standard (AVC) offers the facility to adapt real-time video streams in response to the dynamic conditions of the multiple network paths encountered in multihomed wireless mobile networks. Nevertheless, pre-existing streaming algorithms were mainly proposed for AVC delivery over multipath wired networks and were evaluated by software simulation. This paper introduces a practical, hardware-based testbed upon which we implement and evaluate real-time H.264 SVC streaming algorithms in a realistic multihomed wireless mobile network environment. We propose an optimised streaming algorithm with several technical contributions. Firstly, we extended the AVC packet prioritisation schemes to reflect the three-dimensional granularity of SVC. Secondly, we designed a mechanism for evaluating the effects of different streamer 'read-ahead window' sizes on real-time performance. Thirdly, we took account of the previously unconsidered path-switching and mobile network tunnelling overheads encountered in real-world deployments. Finally, we implemented a path condition monitoring and reporting scheme to facilitate intelligent path switching. The proposed system has been experimentally shown to offer a significant improvement in PSNR of the received stream compared with representative existing algorithms.
A new bitstream structure for parallel CAVLC decoding
Yun Gu Lee, Kyung Hwan Cho
A context adaptive variable length coding (CAVLC) decoder does not know the exact start position of the k-th syntax element in a bitstream until it finishes parsing the (k-1)-th syntax element. This makes parallel CAVLC decoding difficult, and predicting the exact start position of a syntax element prior to parsing its predecessor significantly increases implementation cost. In this paper, we propose a new bitstream structure allowing concurrent access to multiple syntax elements for parallel CAVLC decoding. The method divides a bitstream into segments of N kinds, each M bits in size, and puts syntax elements into the segments based on a proposed rule. A CAVLC decoder can then simultaneously access N segments to read N syntax elements from a single bitstream and decode them in parallel. This technique increases the speed of CAVLC decoding by up to N times. Since the method merely rearranges the generated bitstream, it does not affect coding efficiency. Experimental results show that the proposed algorithm significantly increases decoding speed.
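The segment layout can be illustrated with a toy bit-string model; the actual element placement rule and segment access are defined in the paper, and everything below is our simplification:

    def interleave(streams, M):
        # Pack N per-kind bitstreams into one bitstream of M-bit segments,
        # cycling through the kinds so a decoder can fetch N syntax elements
        # from N consecutive segments at once. Bits are '0'/'1' characters.
        N = len(streams)
        pos, out = [0] * N, []
        while any(p < len(s) for p, s in zip(pos, streams)):
            for i in range(N):
                out.append(streams[i][pos[i]:pos[i] + M].ljust(M, '0'))
                pos[i] += M
        return ''.join(out)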
3D video sequence reconstruction algorithms implemented on a DSP
A novel approach for 3D image and video reconstruction is proposed and implemented. It is based on wavelet atomic functions (WAF), which have demonstrated better approximation properties than classical wavelets in various processing problems. Disparity maps are formed using WAF and then employed to present 3D visualization using color anaglyphs. Additionally, compression via the Pth law is performed to improve the disparity map quality. Other approaches, such as optical flow and a stereo matching algorithm, are also implemented for comparison. Numerous simulation results have justified the efficiency of the novel framework. The implementation of the proposed algorithm on the Texas Instruments TMS320DM642 DSP demonstrates the feasibility of real-time 3D reconstruction of images and video sequences.
Real-time patch sweeping for high-quality depth estimation in 3D video conferencing applications
Wolfgang Waizenegger, Ingo Feldmann, Oliver Schreer
In future 3D videoconferencing systems, depth estimation is required to support autostereoscopic displays and, even more importantly, to provide eye contact. Real-time 3D video processing is currently possible, but within some limits. Since traditional CPU-centred sub-pixel disparity estimation is computationally expensive, the depth resolution of fast stereo approaches is directly linked to pixel quantization and the selected stereo baseline. In this work we present a novel, highly parallelizable algorithm that is capable of dealing with arbitrary depth resolutions while avoiding texture interpolation related runtime penalties through a GPU-centred design. The cornerstone of our patch sweeping approach is the fusion of space sweeping and patch-based 3D estimation techniques. Especially for narrow-baseline multi-camera configurations, as commonly used in 3D videoconferencing systems (e.g. [1]), it preserves the strengths of both techniques while avoiding their shortcomings. Moreover, we provide a sophisticated parameterization and quantization scheme that gives our algorithm very good scalability in terms of computation time and depth estimation quality. Furthermore, we present an optimized CUDA implementation for a multi-GPU setup in a cluster environment. For each GPU, it performs three pairwise high-quality depth estimations for a trifocal narrow-baseline camera configuration on a 256x256 image block in real time.
Real-Time Algorithms/Systems II
Real-time scene change detection assisted with camera 3A: auto exposure, auto white balance, and auto focus
Liang Liang, Bob Hung, Ying Noyes, et al.
Many scene change detection techniques have been developed for scene cuts, fade-ins and fade-outs by analyzing video encoder input signals. For real-time scene change detection, sensor input signals provide first-hand information. In this paper, a novel scene change detection technique is described that analyzes camcorder front-end sensor input signals with our proposed algorithms based on camera 3A (auto exposure, auto white balance and auto focus). The camera-3A-based scene change detection algorithm can detect scene changes in a timely manner and therefore fits real-time scene change detection applications well. Experimental results show that this algorithm can detect scene changes with good accuracy. The proposed algorithm is computationally efficient and easy to implement.
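A hedged sketch of the underlying idea, thresholding frame-to-frame changes in 3A statistics; the field names and thresholds below are illustrative assumptions, not the paper's algorithm:

    def scene_changed(prev, curr, th_ae=0.25, th_awb=0.10, th_af=0.30):
        # Flag a scene change when the exposure level, white-balance gains or
        # the focus measure jump between consecutive frames.
        d_ae = abs(curr["luma"] - prev["luma"]) / max(prev["luma"], 1e-6)
        d_awb = max(abs(curr["r_gain"] - prev["r_gain"]),
                    abs(curr["b_gain"] - prev["b_gain"]))
        d_af = abs(curr["focus"] - prev["focus"]) / max(prev["focus"], 1e-6)
        return d_ae > th_ae or d_awb > th_awb or d_af > th_af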
Fast approximate 4D:3D discrete Radon transform, from light field to focal stack with O(N^4) sums
José G. Marichal-Hernández, Jonas P. Lüke, Fernando L. Rosa, et al.
In this work we develop a new algorithm that extends the two-dimensional fast digital Radon transform of Götz and Druckmüller (1996) to digitally simulate the refocusing of a 4D light field into a 3D volume of photographic planes, as previously done by Ren Ng et al. (2005), but with the minimum number of operations. The new algorithm requires no multiplications, just sums, and its computational complexity is O(N^4) to achieve a volume consisting of 2N photographic planes focused at different depths from an N^4 plenoptic image. This reduced complexity allows the acquisition and processing of a plenoptic sequence for estimating 3D shape at video rate. Examples are given of implementations on GPU and CPU platforms. Finally, a modified version of the algorithm that deals with domains whose sizes are not powers of two is proposed.
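For reference, the naive per-plane shift-and-add that the fast transform accelerates looks as follows (a NumPy sketch of the standard baseline, not the paper's recursion; the paper's partial-sum scheme computes the whole 2N-plane stack in O(N^4) additions instead of repeating this for each plane):

    import numpy as np

    def refocus_plane(lightfield, alpha):
        # Naive refocusing of a light field L[u, v, y, x]: integer-shift each
        # sub-aperture view proportionally to its (u, v) offset and sum.
        U, V, H, W = lightfield.shape
        out = np.zeros((H, W))
        for u in range(U):
            for v in range(V):
                dy = int(round(alpha * (u - U // 2)))
                dx = int(round(alpha * (v - V // 2)))
                out += np.roll(lightfield[u, v], (dy, dx), axis=(0, 1))
        return out / (U * V)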
A cross-based filter for fast edge-preserving smoothing
Ke Zhang, Jiangbo Lu, Gauthier Lafruit, et al.
In this paper, we present a local adaptive filter for fast edge-preserving smoothing, the so-called cross-based filter. The filter is built on upright crosses and captures local image structures adaptively. The cross-based filter resembles the classic bilateral filter with the support weight binarized and a spatial connectivity constraint imposed. For edge-preserving smoothing, our cross-based filter is capable of reaching performance similar to the bilateral filter while being dozens of times faster. The proposed filter can be applied in near-constant time using the integral images technique. In addition, the cross-based filter is highly parallel and suitable for parallel computing platforms, e.g. GPUs. The strength of the proposed filter is illustrated in several applications, namely denoising and image abstraction.
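The adaptive support is defined by four arms per pixel. A direct, unoptimised sketch of one arm computation, with our own names and a fixed intensity threshold, follows:

    import numpy as np

    def left_arms(gray, tau=15, L=17):
        # Length of the left arm at each pixel: extend while the intensity
        # difference to the anchor stays below tau, up to a maximum span L.
        H, W = gray.shape
        arms = np.zeros((H, W), dtype=np.int32)
        for y in range(H):
            for x in range(W):
                n = 0
                while (n < L and x - n - 1 >= 0 and
                       abs(int(gray[y, x - n - 1]) - int(gray[y, x])) < tau):
                    n += 1
                arms[y, x] = n
        return arms

The right, up and down arms are computed the same way; the near-constant-time aggregation then comes from 1D integral-image passes over the resulting cross-shaped supports.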
Human action recognition in a wide and complex environment
In this paper, a linear discriminant analysis (LDA) based classifier employed in a tree structure is presented to recognize human actions in a wide and complex environment. In particular, the proposed classifier is based on a supervised learning process and achieves the required classification in a multi-step process. This multi-step process is performed simply by adopting a tree structure which is built during the training phase. Hence, there is no need for a priori information, such as the number of hidden neurons or hidden layers required by a multilayer neural network based classifier, nor for an exhaustive search as used in training algorithms for decision trees. A skeleton-based strategy is adopted to extract the features from a given video sequence representing any human action. A Pan-Tilt-Zoom (PTZ) camera is used to monitor the wide and complex test environment. A background mosaic image is built offline and used to compute the background images in real time. A background subtraction strategy has been adopted to detect the objects in the various frames and extract their corresponding silhouettes. A skeleton-based process is used to extract the attributes of a feature vector corresponding to a human action. Finally, the proposed framework is tested on various indoor and outdoor scenarios, and encouraging results are achieved in terms of classification accuracy.
Interactive Paper Session
Efficient object tracking in WAAS data streams
Trevor R. H. Clarke, Roxanne Canosa
Wide area airborne surveillance (WAAS) systems are a new class of remote sensing imagers which have many military and civilian applications. These systems are characterized by long loiter times (extended imaging time over fixed target areas) and large footprint target areas. These characteristics complicate moving object detection and tracking due to the large image size and high number of moving objects. This research evaluates existing object detection and tracking algorithms with WAAS data and provides enhancements to the processing chain which decrease processing time and maintain or increase tracking accuracy. Decreases in processing time are needed to perform real-time or near real-time tracking either on the WAAS sensor platform or in ground station processing centers. Increased tracking accuracy benefits real-time users and forensic (off-line) users.
How fast can one numerically reconstruct digitally recorded holograms?
Results of a comparative study of the computational complexity of different algorithms for numerical reconstruction of electronically recorded holograms are presented and discussed. The following algorithms were compared in terms of the number of operations: different types of Fourier and convolution algorithms, and a new universal DCT-based algorithm. Based on the comparison results, the feasibility of real-time implementation of numerical reconstruction of holograms is evaluated.
Tracking flow of leukocytes in blood for drug analysis
Arslan Basharat, Wesley Turner, Gillian Stephens, et al.
Modern microscopy techniques allow imaging of circulating blood components under vascular flow conditions. The resulting video sequences provide unique insights into the behavior of blood cells within the vasculature; they can be used to monitor and quantitate the recruitment of inflammatory cells at sites of vascular injury/inflammation and potentially serve as a pharmacodynamic biomarker, helping to screen new therapies and individualize doses and combinations of drugs. However, manual analysis of these video sequences is intractable, requiring hours per 400-second video clip. In this paper, we present an automated technique to analyze, in real time, the behavior and recruitment of human leukocytes in whole blood under physiological conditions of shear through a simple multi-channel fluorescence microscope. This technique detects and tracks the recruitment of leukocytes to a bioactive surface coated on a flow chamber. Rolling cells (cells which partially bind to the bioactive matrix) are detected, counted, and have their velocity measured and graphed. The challenges here include high cell density, appearance similarity, and a low (1 Hz) frame rate. Our approach performs frame-differencing-based motion segmentation, track initialization and online tracking of individual leukocytes.
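The detection front end the abstract describes can be sketched with OpenCV primitives; the parameters below are illustrative, and the paper's pipeline adds track initialization and online tracking on top:

    import cv2

    def detect_moving_cells(prev, curr, thresh=15, min_area=20):
        # Frame differencing + connected components on 8-bit grayscale frames.
        diff = cv2.absdiff(curr, prev)
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
        return [tuple(centroids[i]) for i in range(1, n)
                if stats[i, cv2.CC_STAT_AREA] >= min_area]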
Phase correlation based adaptive mode decision for the H.264/AVC
Abdelrahman Abdelazim, Stephen James Mein, Martin Roy Varley, et al.
The H.264 video coding standard achieves high compression performance and image quality at the expense of increased encoding complexity, due to its very refined Motion Estimation (ME) and mode decision processes. This paper focuses on decreasing the complexity of the mode selection process by applying a novel fast mode decision algorithm. Firstly, the phase correlation between a macroblock and its prediction obtained from the previously encoded adjacent block is analysed, and relationships are established between the correlation value, the object size and the best-fit motion vector. From this, a novel fast mode decision and motion estimation technique has been developed, utilising frequency-domain ME as a preprocessing step in order to accurately predict the best mode and the search range. We measure the correlation between a macroblock and the corresponding prediction; based on the result, we select the best mode or limit the mode selection process to a subset of modes. Moreover, the correlation result is also used to select an appropriate search range for the ME stage. Experimental results show that the proposed algorithm significantly reduces the motion estimation time whilst maintaining similar rate-distortion performance, when compared with both the H.264/AVC Joint Model (JM) reference software and recently reported work.
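The underlying measurement is standard phase correlation; a NumPy sketch follows, while the mapping from the peak to mode subsets and search ranges is the paper's contribution and is not shown here:

    import numpy as np

    def phase_correlation(a, b):
        # Normalised cross-power spectrum: a single sharp peak suggests pure
        # translation; a spread response suggests more complex motion.
        A, B = np.fft.fft2(a), np.fft.fft2(b)
        R = A * np.conj(B)
        R /= np.abs(R) + 1e-12
        surface = np.real(np.fft.ifft2(R))
        peak = np.unravel_index(np.argmax(surface), surface.shape)
        return peak, surface.max()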
Fast multilayered prediction algorithm for group of pictures in H.264/SVC
Abdelrahman Abdelazim, Stephen James Mein, Martin Roy Varley, et al.
The objective of scalable video coding is to enable the generation of a unique bitstream that can adapt to various bitrates, transmission channels and display capabilities. Scalability is categorised as temporal, spatial, and quality scalability. To improve encoding efficiency, the SVC scheme incorporates inter-layer prediction mechanisms, which increase the complexity of the overall encoding. In this paper, several conditional probabilities are established relating motion estimation characteristics to the mode distribution at different layers of H.264/SVC. An evaluation of these probabilities is used to structure a low-complexity prediction algorithm for Groups of Pictures (GOPs) in H.264/SVC, reducing computational complexity whilst maintaining similar performance. When compared to the JSVM software, this algorithm achieves a significant reduction in encoding time, with a negligible average PSNR loss and bit-rate increase in temporal, spatial and SNR scalability. Experiments are conducted to provide a comparison between our method and a recently developed fast mode selection algorithm; these demonstrate that our method achieves appreciable time savings for scalable spatial and scalable quality video coding, while maintaining similar PSNR and bit rate.
X-Eye: a novel wearable vision system
Yuan-Kai Wang, Ching-Tang Fan, Shao-Ang Chen, et al.
This paper proposes a smart portable device, named the X-Eye, which provides a gesture interface with a small size but a large display for the application of photo capture and management. The wearable vision system is implemented with embedded systems and achieves real-time performance. The hardware of the system includes an asymmetric dual-core processor with an ARM core and a DSP core. The display device is a pico projector which has a small volume but can project a large screen. A triple-buffering mechanism is designed for efficient memory management, and software functions are partitioned and pipelined for effective parallel execution. Gesture recognition is achieved first by a color classification based on the expectation-maximization algorithm and a Gaussian mixture model (GMM). To improve the performance of the GMM, we devise a LUT (Look-Up Table) technique. Fingertips are then extracted, and geometrical features of the fingertip shape are matched to recognize the user's gesture commands. To verify the accuracy of the gesture recognition module, experiments are conducted in eight scenes with 400 test videos, including the challenges of colorful backgrounds, low illumination, and flickering. The whole system, including gesture recognition, runs at a frame rate of 22.9 fps, and experimental results give a 99% recognition rate. These results demonstrate that this small-size, large-screen wearable system provides an effective gesture interface with real-time performance.
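The LUT device for speeding up the GMM can be illustrated as follows: classify every quantised RGB triple once offline, so per-pixel classification becomes a single table lookup. This is our sketch, and gmm_skin_prob is a hypothetical stand-in for the trained model:

    import numpy as np

    def build_lut(gmm_skin_prob, bits=5):
        # Decision for every quantised RGB triple; (2^bits)^3 entries.
        n, step = 1 << bits, 256 >> bits
        axis = np.arange(n) * step + step // 2         # bin centres
        grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"),
                        axis=-1).reshape(-1, 3)
        return (gmm_skin_prob(grid) > 0.5).reshape(n, n, n)

    def classify(img, lut, bits=5):
        q = img >> (8 - bits)                          # 8-bit RGB input
        return lut[q[..., 0], q[..., 1], q[..., 2]]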
Real-time vehicle matching for multi-camera tunnel surveillance
Vedran Jelača, Jorge Oswaldo Niño Castañeda, Andrés Frías-Velázquez, et al.
Tracking multiple vehicles with multiple cameras is a challenging problem of great importance in tunnel surveillance. One of the main challenges is accurate vehicle matching across cameras with non-overlapping fields of view. Since systems dedicated to this task can contain hundreds of cameras which each observe dozens of vehicles, computational efficiency is essential for real-time performance. In this paper, we propose a low-complexity yet highly accurate method for vehicle matching using vehicle signatures composed of Radon-transform-like projection profiles of the vehicle image. The proposed signatures can be calculated by a simple scan-line algorithm, by the camera software itself, and transmitted to the central server or to the other cameras in a smart camera environment. The amount of data is drastically reduced compared with the whole image, which relaxes the data link capacity requirements. Experiments on real vehicle images, extracted from video sequences recorded in a tunnel by two distant security cameras, validate our approach.
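A minimal sketch of such projection-profile signatures and a matching cost; the normalisation and resampling choices here are ours, not the paper's:

    import numpy as np

    def signature(vehicle_img, length=64):
        # Row and column projection profiles, resampled to a fixed length;
        # computable by a single scan-line pass over the image.
        img = vehicle_img.astype(np.float64)
        sig = []
        for profile in (img.mean(axis=1), img.mean(axis=0)):
            x = np.linspace(0, 1, len(profile))
            resampled = np.interp(np.linspace(0, 1, length), x, profile)
            sig.append(resampled / (resampled.sum() + 1e-12))
        return np.concatenate(sig)

    def match_cost(sig_a, sig_b):
        return np.abs(sig_a - sig_b).sum()             # lower is better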