Proceedings Volume 7724

Real-Time Image and Video Processing 2010


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 4 May 2010
Contents: 6 Sessions, 22 Papers, 0 Presentations
Conference: SPIE Photonics Europe 2010
Volume Number: 7724

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7724
  • Real-time Algorithms I
  • Real-time Hardware I
  • Real-time Algorithms II
  • Real-time Hardware II
  • Poster Session
Front Matter: Volume 7724
This PDF file contains the front matter associated with SPIE Proceedings Volume 7724, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Real-time Algorithms I
Real-time robust estimation of vanishing points through nonlinear optimization
Vanishing points are elements of great interest in the computer vision field, since they are the main source of information about the geometry of the scene and the projection process associated with the camera. They have been studied and applied for decades in plane rectification, 3D reconstruction, and especially auto-calibration tasks. Nevertheless, the literature lacks accurate online solutions for multiple vanishing point estimation. Most strategies focus on accuracy, using computationally demanding iterative procedures. We propose a novel strategy for multiple vanishing point estimation that finds a trade-off between accuracy and efficiency and is able to operate in real time on video sequences. This strategy takes advantage of the temporal coherence between images of the sequence to reduce the computational load of the processing algorithms, while an optimization process keeps accuracy high. The key element of the approach is a robust scheme based on the MLESAC algorithm, which is used in a manner similar to the EM algorithm. This scheme ensures robust and accurate estimates, since we combine MLESAC with a novel error function based on the angular error between the vanishing point and the image features. To increase the speed of the MLESAC algorithm, the selection of minimal sample sets is replaced by a random sampling step that takes temporal information into account to provide better initializations. In addition, for the sake of flexibility, the proposed error function has been designed to work with either gradient pixels or line segments as image features, which broadens the range of applications in which our approach can be used according to the type of information available. The results show a system that delivers accurate real-time estimates of multiple vanishing points for online processing, tested on moving-camera video sequences of structured scenarios, both indoors and outdoors, such as rooms, corridors, facades and roads.
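The abstract does not give the exact form of the angular residual, but one common choice consistent with the description is the angle between a segment's direction and the direction from its midpoint to the vanishing-point hypothesis. A minimal sketch (the function name and the midpoint convention are assumptions, not the authors' exact formulation):

```python
import numpy as np

def angular_error(segment, vp):
    """Angle between a line segment's direction and the direction from the
    segment midpoint to a vanishing-point hypothesis; a residual of this
    kind can drive the MLESAC inlier/outlier scoring described above."""
    p1, p2 = np.asarray(segment, dtype=float)
    d = (p2 - p1) / np.linalg.norm(p2 - p1)          # segment direction
    m = 0.5 * (p1 + p2)                              # segment midpoint
    v = (vp - m) / np.linalg.norm(vp - m)            # midpoint-to-VP direction
    return np.arccos(np.clip(abs(d @ v), 0.0, 1.0))  # |cos| ignores orientation
```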
Super-resolution algorithms based on atomic wavelet functions in real-time processing of video sequences
Francisco Gomeztagle-Sepulveda, Victor Kravchenko, Volodymyr I. Ponomaryov
In the real world there is a variety of applications for high-resolution (HR) images in remote sensing, video frame freezing, medicine, robot vision, military information acquisition, etc. Because of the high cost and physical limitations of acquisition hardware, low-resolution (LR) images are frequently used instead. Super-resolution (SR) restoration has therefore emerged as a solution that forms one or a set of HR images from a sequence of LR images. The proposed SR framework takes into account spatial and spectral wavelet-domain pixel information to reconstruct video and textures of different nature, presenting good performance in terms of objective criteria (PSNR, MAE, NCD) and subjective visual perception, employing wavelets based on atomic functions (WAF). Statistical simulations have demonstrated the effectiveness of the novel approach. Real-time digital processing has been implemented on a Texas Instruments TMS320DM642 DSP, demonstrating the effectiveness of SR image reconstruction in real-time processing mode on video sequences of different nature, pixel resolution and motion behavior.
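The WAF family used in the paper is not available in common toolkits, but the general wavelet-domain SR idea can be illustrated with classical wavelet zero-padding: the LR frame is treated as the approximation subband of the unknown HR image and the missing detail subbands are set to zero before inverse transformation. A sketch using PyWavelets, with an orthonormal Daubechies wavelet standing in for the atomic-function wavelets:

```python
import numpy as np
import pywt  # PyWavelets; 'db2' stands in for the paper's atomic-function wavelets

def wavelet_zoom2x(lr):
    """Classical wavelet zero-padding upscaling: embed the LR frame as the
    level-1 approximation subband and invert the 2-D DWT, approximately
    doubling the resolution. The factor 2 compensates the 2-D DWT
    normalisation of orthonormal filters."""
    zeros = np.zeros_like(lr, dtype=float)
    return pywt.idwt2((2.0 * lr.astype(float), (zeros, zeros, zeros)), 'db2')
```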
Real-time multi-barcode reader for industrial applications
The advances in automated production processes have resulted in the need for detecting, reading and decoding 2D Data Matrix barcodes at very high speeds. This requires the correct combination of high-speed optical devices capable of capturing high-quality images with computer vision algorithms that can read and decode the barcodes accurately. Such barcode readers should also be capable of resolving fundamental imaging challenges arising from blurred barcode edges, reflections from possible polyethylene wrapping, poor and/or non-uniform illumination, fluctuations of focus, and rotation and scale changes. Addressing the above challenges, in this paper we propose the design and implementation of a high-speed multi-barcode reader and provide test results from an industrial trial. To the authors' knowledge, such a comprehensive system has not been proposed and fully investigated in the existing literature. To reduce the reflections caused by the polyethylene wrapping used in typical packaging, polarising filters have been used. The images captured using this optical system still include imperfections and variations due to scale, rotation, illumination, etc. We use a number of novel image enhancement algorithms optimised for 2D Data Matrix barcodes: image de-blurring, contrast-point and self-shadow removal using an affine-transform-based approach, and non-uniform illumination correction. The enhanced images are subsequently used for barcode detection and recognition. We provide experimental results from a factory trial of the multi-barcode reader and evaluate the performance of each optical unit and computer vision algorithm used. The results indicate an overall accuracy of 99.6% in barcode recognition at typical speeds of industrial conveyor systems.
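The enhancement chain itself is the paper's contribution, but the illumination-correction and deblurring steps it names can be approximated with standard operations. A hedged sketch (the pylibdmtx package and all parameter values are assumptions, not the authors' implementation):

```python
import cv2
from pylibdmtx.pylibdmtx import decode  # third-party Data Matrix decoder (assumption)

def read_datamatrix(frame_bgr):
    """Rough stand-in for the paper's pipeline: flat-field the image to
    correct non-uniform illumination, sharpen it, then decode."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    background = cv2.GaussianBlur(gray, (0, 0), sigmaX=25)   # estimate slow illumination
    flat = cv2.divide(gray, background, scale=128)           # flat-field correction
    blur = cv2.GaussianBlur(flat, (0, 0), sigmaX=2)
    sharp = cv2.addWeighted(flat, 1.5, blur, -0.5, 0)        # unsharp masking
    return decode(sharp)
```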
Fast algorithms for computing image local statistics in windows of arbitrary shape and weights
Computing image local statistics is required in many image processing applications, such as local adaptive image restoration, enhancement, segmentation, and target location and tracking, to name a few. These computations must be carried out in a sliding window of a certain shape with certain weights. Generally, this is a time-consuming operation with per-pixel computational complexity of the order of the window size, which hampers real-time applications. To accelerate the computations, recursive algorithms are used. However, such algorithms are available only for windows of specific forms, such as rectangles and octagons with uniform weights. We present a general framework for fast parallel and recursive computation of image local statistics in sliding windows of almost arbitrary shape and weights, with per-pixel computational complexity of substantially lower order than the window size. As an illustration of this framework, we describe methods for computing image local moments such as the local mean and variance; image local histograms and local order statistics (in particular minimum, maximum and median); image local ranks; and image local DFT, DCT and DcST spectra in polygon-shaped windows as well as in windows with non-uniform weights, such as sine-lobe, Hann, Hamming and Blackman windows.
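For the special case of a uniform rectangular window, the claim of per-pixel complexity below the window size is easy to demonstrate with summed-area tables, which give O(1)-per-pixel local mean and variance. This is a textbook illustration, not the paper's general-window method:

```python
import numpy as np

def local_mean_var(img, h, w):
    """Local mean and variance over the h-by-w window ending at each pixel,
    via integral images of the values and of their squares: four additions
    per pixel regardless of window size (zero-padded at the borders)."""
    f = img.astype(np.float64)
    pad = np.pad(f, ((h, 0), (w, 0)))              # leading zero rows/cols
    S1 = pad.cumsum(0).cumsum(1)                   # integral image
    S2 = (pad ** 2).cumsum(0).cumsum(1)            # integral image of squares
    box = lambda S: S[h:, w:] - S[:-h, w:] - S[h:, :-w] + S[:-h, :-w]
    mean = box(S1) / (h * w)
    var = box(S2) / (h * w) - mean ** 2
    return mean, var
```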
Real-time Hardware I
Real-time 3D light field transmission
Tibor Balogh, Péter Tamás Kovács
Although capturing and displaying stereo 3D content is now commonplace, the capture, transmission and display of information-rich light-field video content are much more challenging, resulting in at least an order of magnitude increase in complexity even in the simplest cases. We present an end-to-end system capable of capturing and displaying high-quality light-field video content in real time on various HoloVizio light-field displays, providing very high 3D image quality and continuous motion parallax. The system is compact in terms of the number of computers, and provides superior image quality, resolution and frame rate compared to other published systems. To generate light-field content, we built a camera system with a large number of cameras in an evenly spaced linear arrangement and connected them to PCs. The capture PC was directly connected through a single gigabit Ethernet connection to the demonstration 3D display, supported by a PC computation cluster. Dense light-field display requires massively parallel reordering and filtering of the original camera images, for which we utilize both CPU and GPU threads. On the GPU we perform the light-field conversion and reordering, filtering, and YUV-to-RGB conversion, using OpenGL 3.0 shaders and 2D texture arrays to have easy access to the individual camera images. A network-based synchronization scheme is used to present the final rendered images.
Programming Cell/BE and GPUs systems for real-time video encoding
Svetislav Momcilovic, Leonel Sousa
In this work, scalable parallelization methods for real-time H.264/AVC video encoding on multi-core platforms, namely recent Graphics Processing Units (GPUs) and the Cell Broadband Engine (Cell/BE), are proposed. By applying Amdahl's law, the most demanding parts of the video coder were identified, and Single Program Multiple Data and Single Instruction Multiple Data approaches are adopted to achieve real-time processing. In particular, motion estimation and in-loop deblocking filtering were offloaded for parallel execution on either GPUs or Cell/BE Synergistic Processor Elements (SPEs). The limits and advantages of these two architectures when dealing with typical video coding problems, such as data dependencies and large input data, are demonstrated. We propose techniques to minimize the impact of branch divergence and branch misprediction, data misalignment, conflicts and non-coalesced memory accesses. Moreover, data dependencies and memory size restrictions are taken into account in order to minimize synchronization and communication overheads and to achieve optimal workload balance over the available cores. Data-reuse techniques are applied extensively to reduce communication overhead and achieve the maximum processing speedup. Experimental results show that real-time H.264/AVC encoding is achieved on both systems, computing 30 frames per second at a resolution of 720×576 pixels, when full-pixel motion estimation is applied over 5 reference frames and a 32×32 search area. When quarter-pixel motion estimation is adopted, real-time video coding is obtained on the GPU for larger search areas and on the Cell/BE for smaller search areas.
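At full-pixel accuracy, the offloaded motion-estimation kernel is an SAD minimisation over a search window. The sketch below shows the scalar reference computation (block size and search radius are illustrative); every candidate is independent, which is what makes the search a natural fit for GPU threads or SPEs:

```python
import numpy as np

def full_search(cur, ref, bx, by, B=16, radius=16):
    """Full-pixel full-search motion estimation for the BxB block at (bx, by):
    minimise the sum of absolute differences over a (2*radius+1)^2 area."""
    block = cur[by:by+B, bx:bx+B].astype(np.int32)
    best, mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + B > ref.shape[0] or x + B > ref.shape[1]:
                continue                              # candidate outside the frame
            sad = int(np.abs(block - ref[y:y+B, x:x+B].astype(np.int32)).sum())
            if best is None or sad < best:
                best, mv = sad, (dx, dy)
    return mv, best
```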
Two-dimensional systolic-array architecture for pixel-level vision tasks
Julien A. Vijverberg, Peter H. N. de With
This paper presents ongoing work on the design of a two-dimensional (2D) systolic array for image processing. The component is designed to operate on a multi-processor system-on-chip. In contrast to other 2D systolic-array architectures and many other hardware accelerators, we investigate the applicability of executing multiple tasks in a time-interleaved fashion on the Systolic Array (SA). This leads to lower external memory bandwidth and better load balancing of the tasks over the different processing tiles. To enable the interleaving of tasks, we add a shadow-state register for fast task switching. To reduce the number of accesses to external memory, we propose to share the communication assist between consecutive tasks. A preliminary, non-functional version of the SA has been synthesized for an XV4S25 FPGA device and yields a maximum clock frequency of 150 MHz, requiring 1,447 slices and 5 memory blocks. Mapping tasks from video content-analysis applications in the literature onto the SA reduces execution times by 1-2 orders of magnitude compared to software implementations. We conclude that the choice of an SA architecture is useful, but that a scaled version of the SA with less logic, fewer processing and pipeline stages, and hence a lower clock frequency would be sufficient for a video-analysis system-on-chip.
Near real-time endmember extraction from remotely sensed hyperspectral data using NVidia GPUs
Sergio Sánchez, Gabriel Martín, Abel Paz, et al.
One of the most important techniques for hyperspectral data exploitation is spectral unmixing, which aims at characterizing mixed pixels. When the spatial resolution of the sensor is not fine enough to separate different spectral constituents, these can jointly occupy a single pixel and the resulting spectral measurement is a composite of the individual pure spectra. The N-FINDR algorithm is one of the most widely used and successfully applied methods for automatically determining endmembers (pure spectral signatures) in hyperspectral image data without using a priori information. The identification of such pure signatures is highly beneficial in order to 'unmix' the hyperspectral scene, i.e. to perform sub-pixel analysis by estimating the fractional abundance of endmembers in mixed pixels collected by a hyperspectral imaging spectrometer. The N-FINDR algorithm attempts to automatically find the simplex of maximum volume that can be inscribed within the hyperspectral data set. Due to the intrinsic complexity of remotely sensed scenes and their ever-increasing spatial and spectral resolution, the efficiency of the endmember search conducted by N-FINDR depends not only on the size and dimensionality of the scene, but also on its complexity (directly related to the number of endmembers). In this paper, we develop a new parallel version of N-FINDR which is shown to scale better as the dimensionality and complexity of the hyperspectral scene to be processed increase. The parallel algorithm has been implemented on two different parallel systems, in which two different types of commodity graphics processing units (GPUs) from NVidia™ are used to assist the CPU as co-processors. Commodity GPU computing is an exciting development for remote sensing applications, since these systems offer the possibility of (onboard) high-performance computing at very low cost. Our experimental results, obtained in the framework of a mineral mapping application using hyperspectral data collected by the NASA Jet Propulsion Laboratory's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS), reveal that the proposed parallel implementation compares favorably with the original version of N-FINDR not only in terms of computation time, but also in terms of the accuracy of the solutions it provides. The real-time processing capabilities of our GPU-based N-FINDR algorithms and of other GPU algorithms for endmember extraction are also discussed.
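The core of N-FINDR is a swap search that grows the simplex volume. A compact serial reference, after the usual reduction of the data to p-1 dimensions, looks roughly like the sketch below; its volume evaluation over every pixel-endmember pair per sweep is exactly the cost that motivates the GPU port:

```python
import numpy as np

def nfindr(pixels, p, sweeps=3, seed=0):
    """Serial N-FINDR sketch. pixels: (N, p-1) array after dimensionality
    reduction to p-1 components; returns indices of p endmember candidates.
    Simplex volume is proportional to |det| of the bordered endmember matrix."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pixels), size=p, replace=False)
    vol = lambda I: abs(np.linalg.det(np.vstack([np.ones(p), pixels[I].T])))
    for _ in range(sweeps):
        for j in range(p):                      # try replacing each endmember...
            for i in range(len(pixels)):        # ...with every pixel in the scene
                trial = idx.copy(); trial[j] = i
                if vol(trial) > vol(idx):
                    idx = trial                 # keep swaps that grow the simplex
    return idx
```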
Real-time Algorithms II
Real-time face and gesture analysis for human-robot interaction
Frank Wallhoff, Tobias Rehrl, Christoph Mayer, et al.
Human communication relies on a large number of different mechanisms, such as spoken language, facial expressions and gestures. Facial expressions and gestures are among the main nonverbal communication mechanisms and pass large amounts of information between human dialog partners. Therefore, to allow for intuitive human-machine interaction, real-time capable processing and recognition of facial expressions and of hand and head gestures are of great importance. We present a system that tackles these challenges. The input features for the dynamic head gestures and facial expressions are obtained from a sophisticated three-dimensional model, which is fitted to the user in a real-time capable manner. Using this model, different kinds of information are extracted from the image data and then handed over to a real-time capable data-transferring framework, the so-called Real-Time DataBase (RTDB). In addition to the head- and face-related features, low-level image features of the human hand (optical flow, Hu moments) are stored in the RTDB for the evaluation of hand gestures. In general, the input of a single camera is sufficient for the parallel evaluation of the different gestures and facial expressions. The real-time recognition of the dynamic hand and head gestures is performed with different Hidden Markov Models, which have proven to be a fast and real-time capable classification method. For the facial expressions, classical decision trees or more sophisticated support vector machines are used for classification. The classification results are again handed over to the RTDB, where other processes (such as a Dialog Management Unit) can easily access them without blocking effects. In addition, an adjustable amount of history can be stored by the RTDB buffer unit.
Real-time logo detection and tracking in video
M. George, N. Kehtarnavaz, M. Rahman, et al.
This paper presents a real-time implementation of a logo detection and tracking algorithm for video. The motivation for this work stems from smartphone applications that require the detection of logos in real time; for example, one application detects company logos so that customers can easily receive special offers. The algorithm uses a hybrid approach: it initially runs the Scale Invariant Feature Transform (SIFT) algorithm on the first frame to obtain the logo location, and then uses an online color calibration within the SIFT-detected area to detect and track the logo in subsequent frames in a time-efficient manner. The results obtained indicate that this hybrid approach achieves robust logo detection and tracking in real time.
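The abstract does not spell out the online color calibration step; one plausible stand-in that matches the description (SIFT once, then cheap color-based tracking) is histogram backprojection with mean-shift, sketched here with OpenCV (the ratio-test threshold and histogram parameters are assumptions):

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()  # OpenCV >= 4.4

def detect_logo(frame, logo, min_matches=10):
    """SIFT matching on the first frame to localise the logo region."""
    k1, d1 = sift.detectAndCompute(logo, None)
    k2, d2 = sift.detectAndCompute(frame, None)
    pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
    if len(good) < min_matches:
        return None
    pts = np.float32([k2[m.trainIdx].pt for m in good])
    return cv2.boundingRect(pts)                     # (x, y, w, h)

def track_logo(frame, box, hue_hist):
    """Cheap per-frame tracking: backproject a hue histogram built from the
    SIFT-detected region, then shift the box with mean-shift."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    prob = cv2.calcBackProject([hsv], [0], hue_hist, [0, 180], 1)
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    _, box = cv2.meanShift(prob, box, crit)
    return box
```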
Laser based method for real-time three-dimensional monitoring of chest wall movement
Matija Jezeršek, Klemen Povšič, Eva Topole, et al.
A novel method for monitoring the entire three-dimensional shape of the chest wall in real time is presented. The system is based on the multiple-line laser triangulation principle. The laser projector generates a light pattern of 33 equally inclined light planes directed toward the measured surface. The camera records the illuminated surface from a different viewpoint, so the light pattern is distorted by the shape of the surface. The acquired images are transferred to a personal computer, where contour detection, three-dimensional surface reconstruction, shape analysis and display are performed in real time. Surface displacements are calculated by subtracting the current measured surface from a reference one. Differences are displayed with a color palette, where blue represents inward (negative) and red represents outward (positive) movement. The accuracy of the calibrated apparatus is ±0.5 mm, calculated as the standard deviation between points of the measured and nominal reference surfaces. The measuring range is approximately 400×600×500 mm in width, height and depth. The intention of this study was to evaluate the system's ability to distinguish between different breathing patterns and to verify the accuracy of measured chest wall deformation volumes during breathing. The results demonstrate that the presented 3-D measuring system has great potential as a diagnostic and training tool for monitoring breathing patterns. We believe that exact graphical communication with the patient is much simpler and easier to understand than verbal and/or numerical communication.
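The per-pixel depth recovery behind laser triangulation reduces to intersecting a back-projected camera ray with a calibrated laser plane. A minimal sketch (the function name and the plane parameterisation n·X + d = 0 are assumptions):

```python
import numpy as np

def laser_point(u, v, K_inv, plane_n, plane_d):
    """Back-project pixel (u, v) of a detected laser-line point into a viewing
    ray (camera centre at the origin) and intersect it with the laser plane
    n.X + d = 0; returns the 3-D surface point for that pixel."""
    ray = K_inv @ np.array([u, v, 1.0])     # viewing-ray direction
    t = -plane_d / (plane_n @ ray)          # ray/plane intersection parameter
    return t * ray
```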
Real-time speaker identification for video conferencing
Automatic speaker identification in a videoconferencing environment allows conference attendees to focus their attention on the conference rather than having to identify manually which channel is active and who may be the speaker within that channel. In this work we present a real-time, audio-coupled, video-based approach to this problem, with the focus on the video analysis side. The system is driven by the need to detect a talking person using computer vision algorithms. The initial stage consists of a face detector, which is followed by a lip-localization algorithm that segments the lip region. A novel approach for lip movement detection based on image registration using the Coherent Point Drift (CPD) algorithm, a technique for rigid and non-rigid registration of point sets, is proposed. We provide experimental results analysing the performance of the algorithm when used to monitor real-life videoconferencing data.
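One way to turn CPD registration into a lip-movement cue is to explain away the rigid part of the motion and score the remainder. A sketch using the third-party pycpd package (the package choice, the ordered-contour assumption and the scoring are all assumptions, not the authors' exact method):

```python
import numpy as np
from pycpd import RigidRegistration  # third-party CPD implementation (assumption)

def lip_motion_score(prev_lip, cur_lip):
    """Rigidly register the previous lip contour onto the current one with
    CPD; residual displacement that rigid head motion cannot explain is
    attributed to lip articulation. Assumes equally sampled, ordered points."""
    reg = RigidRegistration(X=cur_lip, Y=prev_lip)
    registered, _ = reg.register()
    return float(np.mean(np.linalg.norm(registered - cur_lip, axis=1)))
```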
Real-time structured light patterns coding with subperfect submaps
Xavier Maurice, Pierre Graebling, Christophe Doignon
Coded structured light is a technique that allows 3-D reconstruction of poorly textured or untextured scene areas. The codes, uniquely associated with the visual primitives of the projected pattern, make it possible to solve the correspondence problem using local information only, with robustness against perturbations such as high curvature, occlusions, regions out of the field of view, and defocus. Real-time 3-D reconstruction is possible with pseudo-random arrays, where the encoding is done in a single pattern using spatial neighbourhoods. Ensuring a higher Hamming distance between all the codewords in use allows more mislabeled primitives to be corrected and thus makes the patterns globally more robust. Up to now, the proposed coding schemes enforced the Hamming distance between all primitives of the pattern, which was generated offline beforehand, producing Perfect SubMaps (PSM). But knowing the epipolar geometry of the projector-camera system, one can enforce the Hamming distance only between primitives that project along nearby epipolar lines, because only these can produce correspondence ambiguities during decoding. Since in such a coding scheme the Hamming distance has to be checked only within subsets of the pattern primitives, the patterns are globally far less constrained and can therefore be generated at video framerate. We call this new pattern coding SubPerfect SubMaps (SPSM).
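The SPSM relaxation amounts to verifying the Hamming-distance constraint only inside epipolar neighbourhoods rather than over all codeword pairs. A brute-force sketch of that check (the band width of one neighbouring line is an assumption):

```python
import numpy as np

def spsm_constraint_ok(codes, epiline, dmin):
    """codes: (N, L) symbol codewords; epiline: (N,) epipolar-line index of
    each primitive. Verify the minimum Hamming distance only between
    primitives on nearby epipolar lines (the SPSM relaxation), not globally."""
    for band in np.unique(epiline):
        C = codes[np.abs(epiline - band) <= 1]      # primitives on nearby lines
        for i in range(len(C)):
            for j in range(i + 1, len(C)):
                if np.count_nonzero(C[i] != C[j]) < dmin:
                    return False
    return True
```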
Real-time Hardware II
Real-time preview for layered depth video in 3D-TV
A. Frick, B. Bartczak, R. Koch
In this paper we propose a hardware capture system for the generation of layered depth video content, together with the real-time feedback mechanisms required to ensure optimal data acquisition for 3D-TV productions. The capture system consists of five color cameras and two time-of-flight cameras. The time-of-flight cameras allow direct depth measurements and thus help to overcome the difficulties of classical stereo matching. However, they suffer from low resolution and must be combined with stereo to achieve acceptable results in a post-production process. Real-time previews are hence necessary, so that a dynamic adaptation of the scene as well as of the capture parameters becomes possible and good data quality can be assured.
Comparative analysis of local binocular and trinocular depth estimation approaches
Sergey Smirnov, Atanas P. Gotchev, Miska Hannuksela
In this paper, we present a comparative analysis of local trinocular and binocular depth estimation techniques. Local techniques are chosen because of their higher computational efficiency compared to global approaches. Our aim is to quantify the benefits of a third camera with respect to performance and computational burden. We have adopted the color-weighted local-window approach to stereo matching, where pixels within a local spatial window around the pixel being processed are weighted according to their color similarity to the central pixel, ensuring better adaptivity to local structures. The window size thus becomes the main parameter that influences quality and determines execution time. Extensive experiments on a large set of data have been carried out to compare the trinocular and binocular settings in terms of quality of the estimated depth and execution time. Both natural and artificial scenes have been tested, and a set of quality measures has been used to support the comparisons, with the MPEG Depth Estimation Reference Software as a reference benchmark. Results show that above a certain window size the trinocular setting generally outperforms the binocular one, providing higher quality in less computational time. While the comparisons were done for 'pure' depth estimation, we also ran post-processing on the depth estimates in order to analyze their potential for further improvement.
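The color-weighted window cost at the heart of both settings can be written down in a few lines; this sketch follows the usual adaptive-support-weight formulation (the exponential weight and its gamma parameter are assumptions):

```python
import numpy as np

def weighted_cost(left, right, x, y, d, r=7, gamma=10.0):
    """Color-weighted window cost for disparity d at pixel (x, y): window
    pixels whose color differs from the centre contribute less, so the
    window adapts to local structure. Images are (H, W, 3); borders and
    the valid disparity range are assumed to be handled by the caller."""
    win_l = left[y-r:y+r+1, x-r:x+r+1].astype(np.float64)
    win_r = right[y-r:y+r+1, x-d-r:x-d+r+1].astype(np.float64)
    centre = left[y, x].astype(np.float64)
    w = np.exp(-np.linalg.norm(win_l - centre, axis=2) / gamma)  # color weights
    sad = np.abs(win_l - win_r).sum(axis=2)                      # raw per-pixel cost
    return float((w * sad).sum() / w.sum())
```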
Poster Session
Two novel motion-based algorithms for surveillance video analysis on embedded platforms
Julien A. Vijverberg, Marijn J. H. Loomans, Cornelis J. Koeleman, et al.
This paper proposes two novel motion-vector-based techniques for target detection and target tracking in surveillance videos. The algorithms are designed to operate on a resource-constrained device, such as a surveillance camera, and to reuse the motion vectors generated by the video encoder. The first algorithm, for target detection, uses motion vectors to construct a temporally consistent motion mask, which is combined with a simple background segmentation technique to obtain a segmentation mask. The second algorithm aims at multi-target tracking and uses motion vectors to assign blocks to targets based on five features; the weights of these features are adapted according to the interaction between targets. The two algorithms are combined in one complete analysis application. The detection performance of this application has been evaluated on the i-LIDS sterile-zone dataset, achieving an F1-score of 0.40-0.69. The multi-target tracking performance has been evaluated on the CAVIAR dataset, achieving an MOTP of around 9.7 and an MOTA of 0.17-0.25; on a selection of targets in videos from other datasets, the achieved MOTP and MOTA are 8.8-10.5 and 0.32-0.49, respectively. The execution time on a PC-based platform is 36 ms, including the 20 ms for generating the motion vectors, which are also required by the video encoder.
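The consistent motion mask of the first algorithm can be approximated by temporally accumulating evidence from the encoder's per-block motion vectors. A sketch (the magnitude threshold and decay constant are assumptions):

```python
import numpy as np

def update_motion_mask(mv_field, history, mag_thresh=1.0, decay=0.9):
    """mv_field: (rows, cols, 2) per-block motion vectors reused from the
    encoder. Blocks that move consistently over time accumulate evidence in
    'history'; isolated noisy vectors decay away, yielding a stable mask."""
    moving = np.linalg.norm(mv_field, axis=2) > mag_thresh
    history = decay * history + (1.0 - decay) * moving
    return history > 0.5, history
```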
Dim point target detection against bright background
Yao Zhang, Qiheng Zhang, Zhiyong Xu, et al.
For target detection within a large-field cluttered background at long distance, several difficulties make the problem challenging: low contrast between target and background, the small number of pixels occupied by the target, illumination non-uniformity caused by lens vignetting, and system noise. The existing approaches to dim target detection can be roughly divided into two categories: detection before tracking (DBT) and tracking before detection (TBD). DBT-based schemes have been widely used in practical applications due to their simplicity, but they often require a relatively high signal-to-noise ratio (SNR). In contrast, TBD-based methods can provide impressive detection results even at very low SNR; unfortunately, their large memory requirements and high computational load keep them out of real-time tasks. In this paper, we propose a new method for dim target detection that combines the computational efficiency of the DBT-based scheme with the detection capability of the TBD-based one. Our method first predicts the local background, and then employs energy accumulation and median filtering to remove background clutter. The dim target is finally located by double-window filtering together with an improved high-order correlation, which speeds up convergence. The proposed method is implemented on a hardware platform and performs well in field experiments.
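The background-prediction and clutter-removal front end can be illustrated with a median-filter background model followed by an adaptive threshold on the residual; this is a generic DBT-style sketch, not the authors' double-window/high-order-correlation stage (all window sizes are assumptions):

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def detect_dim_candidates(frame, k=3.0):
    """Predict the local background with a median filter, subtract it to
    suppress clutter, then threshold the residual against its own local
    mean and standard deviation."""
    f = frame.astype(np.float64)
    residual = f - median_filter(f, size=9)                       # background removal
    mu = uniform_filter(residual, size=31)                        # local residual mean
    var = np.maximum(uniform_filter(residual ** 2, size=31) - mu ** 2, 0.0)
    return residual > mu + k * np.sqrt(var)                       # candidate mask
```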
Real-time indoor positioning using range imaging sensors
Tobias K. Kohoutek, Rainer Mautz, Andreas Donaubauer
This paper presents a novel indoor positioning method that is currently under development at ETH Zurich. The method relies on a digital spatio-semantic interior building model (CityGML) and a range imaging sensor. In contrast to common indoor positioning approaches, the procedure presented here does not require local physical reference infrastructure, such as WLAN hotspots or reference markers.
Object tracking using multiple camera video streams
Two synchronized cameras are utilized to obtain independent video streams and detect moving objects from two different viewing angles, with the video frames directly correlated in time. Moving objects in image frames from the two cameras are identified and tagged for tracking. One advantage of such a system is in overcoming occlusions: an object that is only partially visible in one camera may be fully visible in another. Object registration is achieved by determining the location of common features of the moving object across simultaneous frames, and perspective differences are adjusted. Combining information from the images of multiple cameras increases the robustness of the tracking process. Motion tracking is achieved by detecting the changes caused by the objects' movement across frames in time, both in each stream and in the combined video information; the path of each object is determined heuristically. Detection accuracy depends on the speed of the object as well as on variations in the direction of motion. Fast cameras increase accuracy but limit the speed and complexity of the algorithm that can be used. Such an imaging system has applications in traffic analysis, surveillance and security, as well as object modeling from multi-view images. The system can easily be expanded by increasing the number of cameras such that the scenes of at least two nearby cameras overlap; an object can then be tracked continuously over long distances or across multiple cameras, which is applicable, for example, in wireless sensor networks for surveillance or navigation.
Time budget evaluation for image-based reconstruction of sewer shafts
Sandro Esquivel, Reinhard Koch, Heino Rehse
In this paper we propose a robust real-time image- and sensor-based approach for automatic 3D model acquisition of sewer shafts from survey videos captured by a downward-looking fisheye-lens camera while it is lowered into the shaft. Our approach is based on Structure from Motion adapted to the constrained motion and scene, and involves shape recognition techniques in order to recover the geometry of the scene appropriately. We perform a time-budget evaluation of the components of an existing off-line application based on previous work, and design a real-time application that can be used during on-site inspection. The methods of our approach are modified so that they can be executed on the GPU. Expensive bundle adjustment is avoided by applying a simple and fast geometric correction to the computed reconstruction, which is capable of handling inaccuracies in the intrinsic camera calibration parameters.
Fast DCT-based image convolution algorithms and application to image resampling and hologram reconstruction
Convolution and correlation are basic image processing operations with numerous applications ranging from image restoration to target detection to image resampling and geometrical transformation. In real-time applications, the crucial issue is processing speed, which demands algorithms with the lowest possible computational complexity. Fast image convolution and correlation with large convolution kernels are traditionally carried out in the Discrete Fourier Transform domain using Fast Fourier Transform algorithms. However, standard DFT-based convolution implements cyclic rather than linear convolution and therefore suffers from heavy boundary effects. We introduce a fast DCT-based convolution algorithm which is virtually free of the boundary effects of cyclic convolution. We show that this algorithm has the same or even lower computational complexity than the DFT-based algorithm and demonstrate its advantages in application examples of arbitrary image translation and scaling with perfect discrete sinc-interpolation, and of scaled image reconstruction from holograms digitally recorded in the near and far diffraction zones. In geometrical resampling, scaling by an arbitrary factor is implemented using the DFT-domain scaling algorithm and DCT-based convolution. In scaled hologram reconstruction in the far diffraction zone, the Fourier reconstruction method with simultaneous scaling is implemented using DCT-based convolution. In scaled hologram reconstruction in the near diffraction zone, the convolutional reconstruction algorithm is implemented with DCT-based convolution.
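For symmetric kernels, DCT-domain convolution is equivalent to linear convolution of the symmetrically extended image, which is what removes the cyclic wrap-around. The equivalence can be demonstrated with explicit symmetric padding; the padding route shown here is a didactic stand-in for the direct DCT-domain computation:

```python
import numpy as np
from scipy.signal import fftconvolve

def boundary_free_convolve(img, kernel):
    """Convolve with symmetric boundary extension: pad the image by mirror
    reflection, convolve, and crop. For symmetric kernels this matches the
    DCT-domain result and avoids cyclic (DFT) boundary artifacts."""
    ph, pw = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode='symmetric')
    full = fftconvolve(padded, kernel, mode='same')
    return full[ph:ph + img.shape[0], pw:pw + img.shape[1]]
```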