Proceedings Volume 8437

Real-Time Image and Video Processing 2012

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 23 May 2012
Contents: 5 Sessions, 29 Papers, 0 Presentations
Conference: SPIE Photonics Europe 2012
Volume Number: 8437

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8437
  • Real-Time Algorithms
  • Real-Time Hardware
  • Real-Time Implementation
  • Poster Session
Front Matter: Volume 8437
This PDF file contains the front matter associated with SPIE Proceedings Volume 8437, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Real-Time Algorithms
GePaRDT: a framework for massively parallel processing of dataflow graphs
Alexander Schöch, Carlo Bach, Andreas Ettemeyer, et al.
The trend towards computers with multiple processing units continues with no end in sight. Modern consumer computers come with 2-6 processing units, yet programming methods have been unable to keep up with this fast development. In this paper we present a framework that uses a dataflow model for parallel processing: the Generic Parallel Rapid Development Toolkit, GePaRDT. This intuitive programming model eases the concurrent usage of many processing units without specialized knowledge of parallel programming methods and their pitfalls.
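The abstract does not describe GePaRDT's API, but the dataflow idea it relies on can be sketched: processing nodes declare their inputs, and any node whose inputs are available is dispatched to a worker, so concurrency follows from the graph structure rather than from explicit locking in user code. The names and scheduling policy below are illustrative assumptions, not the toolkit's actual interface.

```python
# Illustrative sketch only: GePaRDT's actual API is not described in the abstract.
# Nodes declare the names of their inputs; any node whose inputs are ready is
# dispatched to a worker thread, so parallelism follows from the graph structure.
from concurrent.futures import ThreadPoolExecutor

class Node:
    def __init__(self, func, inputs):
        self.func, self.inputs = func, inputs   # inputs: names of upstream results

def run_graph(graph, sources, workers=4):
    """graph: {name: Node}, sources: {name: value}. Returns all computed results."""
    results = dict(sources)
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # every node whose inputs are already available can run concurrently
            ready = {n: node for n, node in pending.items()
                     if all(i in results for i in node.inputs)}
            futures = {n: pool.submit(node.func, *[results[i] for i in node.inputs])
                       for n, node in ready.items()}
            for n, fut in futures.items():
                results[n] = fut.result()
                del pending[n]
    return results

# Example: two independent filters run concurrently, then their outputs are merged.
graph = {
    "blur":  Node(lambda img: img, ["frame"]),
    "edges": Node(lambda img: img, ["frame"]),
    "merge": Node(lambda a, b: (a, b), ["blur", "edges"]),
}
out = run_graph(graph, {"frame": "raw-image"})
```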
Image segmentation in wavelet transform space implemented on DSP
A novel approach to the segmentation of images of different nature is presented, in which feature extraction is performed in the wavelet transform (WT) space before the segmentation process. In numerous simulation experiments with synthetic and dermoscopic images, the designed frameworks (W-FCM, W-CPSFCM and WK-Means) demonstrated, according to AUC analysis, better performance than other algorithms existing in the literature. The novel W-CPSFCM algorithm estimates the number of clusters automatically, without the intervention of a specialist. The implementation of the proposed segmentation algorithms on the Texas Instruments DSP TMS320DM642 demonstrates that real-time processing is possible for images of different nature.
A contourlet transform based algorithm for real-time video encoding
Stamos Katsigiannis, Georgios Papaioannou, Dimitris Maroulis
In recent years, real-time video communication over the internet has been widely utilized for applications like video conferencing. Streaming live video over heterogeneous IP networks, including wireless networks, requires video coding algorithms that can support various levels of quality in order to adapt to the network end-to-end bandwidth and transmitter/receiver resources. In this work, a scalable video coding and compression algorithm based on the Contourlet Transform is proposed. The algorithm allows for multiple levels of detail, without re-encoding the video frames, by just dropping the encoded information referring to higher resolution than needed. Compression is achieved by means of lossy and lossless methods, as well as variable bit rate encoding schemes. Furthermore, due to the transformation utilized, it does not suffer from blocking artifacts that occur with many widely adopted compression algorithms. Another highly advantageous characteristic of the algorithm is the suppression of noise induced by low-quality sensors usually encountered in web-cameras, due to the manipulation of the transform coefficients at the compression stage. The proposed algorithm is designed to introduce minimal coding delay, thus achieving real-time performance. Performance is enhanced by utilizing the vast computational capabilities of modern GPUs, providing satisfactory encoding and decoding times at relatively low cost. These characteristics make this method suitable for applications like video-conferencing that demand real-time performance, along with the highest visual quality possible for each user. Through the presented performance and quality evaluation of the algorithm, experimental results show that the proposed algorithm achieves better or comparable visual quality relative to other compression and encoding methods tested, while maintaining a satisfactory compression ratio. Especially at low bitrates, it provides more human-eye friendly images compared to algorithms utilizing block-based coding, like the MPEG family, as it introduces fuzziness and blurring instead of artificial block artifacts.
Capturing reading patterns through a real-time smart camera iris tracking system
Mehrube Mehrubeoglu, Evan Ortlieb, Lifford McLauchlan, et al.
A real-time iris detection and tracking algorithm has been implemented on a smart camera using LabVIEW graphical programming tools. The program detects the eye and finds the center of the iris, which is recorded and stored in Cartesian coordinates. In subsequent video frames, the location of the center of the iris corresponding to the previously detected eye is computed and recorded for a desired period of time, creating a list of coordinates representing the moving iris center location across image frames. We present an application for the developed smart camera iris tracking system that involves the assessment of reading patterns. The purpose of the study is to identify differences in the reading patterns of readers at various levels, to eventually determine successful reading strategies for improvement. The readers are positioned in front of a computer screen with a fixed camera directed at the reader's eyes. The readers are then asked to read two pieces of preselected content on the computer screen, one comprising a traditional newspaper text and the other a Web page. The iris path is captured and stored in real time. The reading patterns are examined by analyzing the path of the iris movement. In this paper, the iris tracking system and algorithms, the application of the system to real-time capture of reading patterns, and the representation of the 2D/3D iris track are presented with results and recommendations.
Video-based realtime IMU-camera calibration for robot navigation
Arne Petersen, Reinhard Koch
This paper introduces a new method for fast calibration of inertial measurement units (IMU) rigidly coupled with cameras. That is, the relative rotation and translation between the IMU and the camera are estimated, allowing for the transfer of IMU data to the camera's coordinate frame. Moreover, the IMU's nuisance parameters (biases and scales) and the horizontal alignment of the initial camera frame are determined. Since an iterated Kalman filter is used for estimation, information on the estimation's precision is also available. Such calibrations are crucial for IMU-aided visual robot navigation, i.e. SLAM, since wrong calibrations cause biases and drifts in the estimated position and orientation. As the estimation is performed in real time, the calibration can be done using a freehand movement and the estimated parameters can be validated just in time. This provides the opportunity to optimize the used trajectory online, increasing the quality and minimizing the time effort of calibration. Except for a marker pattern used for visual tracking, no additional hardware is required. As will be shown, the system is capable of estimating the calibration within a short period of time: depending on the requested precision, trajectories of 30 seconds to a few minutes are sufficient. This allows for calibrating the system at startup. In this way, deviations in the calibration due to transport and storage can be compensated. The estimation quality and consistency are evaluated in dependency of the traveled trajectories and the amount of IMU-camera displacement and rotation misalignment. It is analyzed how different types of visual markers, i.e. 2- and 3-dimensional patterns, affect the estimation. Moreover, the method is applied to mono and stereo vision systems, providing information on the applicability to robot systems. The algorithm is implemented using a modular software framework, such that it can be adapted to altered conditions easily.
Real-Time Hardware
GPU acceleration towards real-time image reconstruction in 3D tomographic diffractive microscopy
J. Bailleul, B. Simon, M. Debailleul, et al.
Phase microscopy techniques have regained interest because they allow the observation of unprepared specimens with excellent temporal resolution. Tomographic diffractive microscopy is an extension of holographic microscopy which permits 3D observations with a finer resolution than incoherent light microscopes. Specimens are imaged by a series of 2D holograms: their accumulation progressively fills the range of frequencies of the specimen in Fourier space. A 3D inverse FFT eventually provides a spatial image of the specimen. Consequently, acquisition followed by reconstruction is required to produce an image, which precludes real-time control of the observed specimen. The MIPS Laboratory has built a tomographic diffractive microscope with an unsurpassed 130 nm resolution but a low imaging speed: no less than one minute per acquisition. Afterwards, a high-end PC reconstructs the 3D image in 20 seconds. We now aim for an interactive system providing preview images during the acquisition for monitoring purposes. We first present a prototype implementing this solution on CPU: acquisition and reconstruction are tied in a producer-consumer scheme, sharing common data in CPU memory. Then we present a prototype dispatching some reconstruction tasks to the GPU in order to take advantage of SIMD parallelization for FFT and higher bandwidth for filtering operations. The CPU scheme takes 6 seconds for a 3D image update, while the GPU scheme can go down to 2 seconds or less depending on the GPU class. This opens opportunities for 4D imaging of living organisms or crystallization processes. We also consider the relevance of GPUs for 3D image interaction in our specific conditions.
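A minimal sketch of the producer-consumer coupling described above, assuming a toy frequency-space mapping: the acquisition thread deposits 2D holograms into a queue, and the reconstruction thread accumulates them in a shared Fourier-space volume in CPU memory, periodically running the 3D inverse FFT to refresh a preview. This is not the MIPS Laboratory code; a GPU version would move the FFT and filtering steps onto the device.

```python
# Sketch of the CPU producer-consumer scheme (not the MIPS Laboratory code).
import numpy as np, threading, queue

N = 64
fourier_volume = np.zeros((N, N, N), dtype=np.complex64)   # shared frequency support
holograms = queue.Queue()

def producer(n_frames=100):
    for _ in range(n_frames):
        holograms.put(np.random.randn(N, N).astype(np.complex64))  # stand-in for a hologram
    holograms.put(None)                                            # end-of-acquisition marker

def consumer(preview_every=20):
    count = 0
    while True:
        h = holograms.get()
        if h is None:
            break
        fourier_volume[:, :, count % N] += h      # toy frequency-space mapping
        count += 1
        if count % preview_every == 0:
            preview = np.abs(np.fft.ifftn(fourier_volume))  # 3D inverse FFT -> spatial preview
            print("preview after", count, "holograms, mean =", preview.mean())

t1, t2 = threading.Thread(target=producer), threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
```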
A flexible software architecture for scalable real-time image and video processing applications
Rubén Usamentiaga, Julio Molleda, Daniel F. García, et al.
Real-time image and video processing applications require skilled architects, and recent trends in the hardware platform make the design and implementation of these applications increasingly complex. Many frameworks and libraries have been proposed or commercialized to simplify the design and tuning of real-time image processing applications. However, they tend to lack flexibility because they are normally oriented towards particular types of applications, or they impose specific data processing models such as the pipeline. Other issues include large memory footprints, difficulty of reuse and inefficient execution on multicore processors. This paper presents a novel software architecture for real-time image and video processing applications which addresses these issues. The architecture is divided into three layers: the platform abstraction layer, the messaging layer, and the application layer. The platform abstraction layer provides a high-level application programming interface for the rest of the architecture. The messaging layer provides a message passing interface based on a dynamic publish/subscribe pattern. Topic-based filtering, in which messages are published to topics, is used to route the messages from the publishers to the subscribers interested in a particular type of message. The application layer provides a repository of reusable application modules designed for real-time image and video processing applications. These modules, which include acquisition, visualization, communication, user interface and data processing modules, take advantage of the power of other well-known libraries such as OpenCV, Intel IPP, or CUDA. Finally, we present different prototypes and applications to show the possibilities of the proposed architecture.
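The topic-based publish/subscribe routing of the messaging layer can be illustrated in a few lines; the class and method names below are assumptions for the sketch, not the architecture's actual API.

```python
# Sketch of topic-based publish/subscribe routing; names are illustrative only.
from collections import defaultdict

class MessageBus:
    def __init__(self):
        self._subscribers = defaultdict(list)        # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # messages are delivered only to subscribers of this topic
        for callback in self._subscribers[topic]:
            callback(message)

bus = MessageBus()
bus.subscribe("frames/raw", lambda img: print("processing module received a frame"))
bus.subscribe("frames/raw", lambda img: print("visualization module received a frame"))
bus.publish("frames/raw", object())   # routed to both subscribers
bus.publish("stats/fps", 25)          # no subscriber for this topic -> dropped
```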
Dense real-time stereo matching using memory efficient semi-global-matching variant based on FPGAs
Maximilian Buder
This paper presents a stereo image matching system that takes advantage of a global image matching method. The system is designed to provide depth information for mobile robotic applications. Typical tasks of the proposed system are to assist in obstacle avoidance, SLAM and path planning. Mobile robots impose strong requirements on size, energy consumption, reliability and output quality of the image matching subsystem. Currently available systems either rely on active sensors or on local stereo image matching algorithms. The former are only suitable in controlled environments, while the latter suffer from low-quality depth maps. Top-ranking quality results are only achieved by an iterative approach using global image matching and color segmentation techniques, which are computationally demanding and therefore difficult to execute in real time. Attempts have been made to reach real-time performance with global methods by simplifying the routines, but the resulting depth maps end up being merely comparable to those of local methods. The Semi-Global Matching algorithm proposed earlier offers both very good image matching results and relatively simple operations. A memory-efficient variant of the Semi-Global Matching algorithm is reviewed and adapted for an implementation based on reconfigurable hardware. The implementation is suitable for real-time execution in the field of robotics. It will be shown that the modified version of the efficient Semi-Global Matching method delivers results equivalent to the original algorithm on the Middlebury dataset. The system has proven to be capable of processing VGA-sized images with a disparity resolution of 64 pixels at 33 frames per second on low-cost to mid-range hardware. If the focus is shifted to a higher image resolution, 1024×1024 stereo frames may be processed with the same hardware at 10 fps, with the disparity resolution settings unchanged. A mobile system that covers preprocessing, matching and interfacing operations is also presented.
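For reference, a single aggregation path of the underlying Semi-Global Matching cost recursion (Hirschmüller's formulation) is sketched below; the paper's memory-efficient FPGA variant and its full set of path directions are not reproduced, and the penalties P1 and P2 are illustrative values.

```python
# One horizontal aggregation path of the SGM cost recursion (illustrative penalties).
import numpy as np

def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
    """cost: (H, W, D) pixelwise matching-cost volume. Returns the path cost L
    for the left-to-right direction only."""
    H, W, D = cost.shape
    L = np.zeros(cost.shape, dtype=np.float64)
    L[:, 0, :] = cost[:, 0, :]
    for x in range(1, W):
        prev = L[:, x - 1, :]                        # (H, D) costs at the previous pixel
        prev_min = prev.min(axis=1, keepdims=True)   # best previous cost per row
        same = prev                                  # keep the same disparity
        up   = np.concatenate([prev[:, 1:], np.full((H, 1), np.inf)], axis=1) + P1
        down = np.concatenate([np.full((H, 1), np.inf), prev[:, :-1]], axis=1) + P1
        jump = prev_min + P2                         # larger disparity change
        L[:, x, :] = (cost[:, x, :]
                      + np.minimum(np.minimum(same, up), np.minimum(down, jump))
                      - prev_min)
    return L

# Full SGM sums several such paths (typically 4 or 8 directions) and picks, per pixel,
# the disparity with the minimal total cost: disparity = L_total.argmin(axis=2).
```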
Real-Time Implementation
Real-time video breakup detection for multiple HD video streams on a single GPU
Jakub Rosner, Hannes Fassold, Martin Winter, et al.
An important task in film and video preservation is the quality assessment of the content to be archived or reused out of the archive. This task, if done manually, is a straining and time-consuming process, so it is highly desirable to automate it as far as possible. In this paper, we show how to port a previously proposed algorithm for the detection of severe analog and digital video distortions (termed "video breakup") efficiently to NVIDIA GPUs of the Fermi architecture with CUDA. By massively parallelizing the algorithm in order to make use of the hundreds of cores on a typical GPU, and by careful use of GPU features like atomic functions, texture memory and shared memory, we achieve a speedup of roughly 10-15 when comparing the GPU implementation with a highly optimized, multi-threaded CPU implementation. Thus our GPU algorithm is able to analyze nine Full HD (1920 × 1080) video streams or 40 standard definition (720 × 576) video streams in real time on a single inexpensive NVIDIA GeForce GTX 480 GPU. Additionally, we present the AV-Inspector application for video quality analysis, into which the video breakup algorithm has been integrated.
Complexity analysis of vision functions for implementation of wireless smart cameras using system taxonomy
Muhammad Imran, Khursheed Khursheed, Naeem Ahmad, et al.
There are a number of challenges caused by the large amount of data and the limited resources when implementing vision systems on wireless smart cameras using embedded platforms. Generally, the common challenges include limited memory, processing capability, power consumption in the case of battery-operated systems, and bandwidth. Research in this field usually focuses on the development of a specific solution for a particular problem. In order to implement vision systems on an embedded platform, designers must first investigate the resource requirements for a design; failure to do this may result in additional design time and costs to meet the specifications. There is a need for a tool which can predict the resource requirements for the development and comparison of vision solutions in wireless smart cameras. To accelerate the development of such a tool, we have used a system taxonomy, which shows that the majority of vision systems for wireless smart cameras share common functionality focused on object detection, analysis and recognition. In this paper, we have investigated the arithmetic complexity and memory requirements of vision functions by using the system taxonomy and have proposed an abstract complexity model. To demonstrate the use of this model, we have analysed a number of implemented systems and showed that the complexity model, together with the system taxonomy, can be used for the comparison and generalization of vision solutions. The study will assist researchers and designers in predicting the resource requirements for different classes of vision systems implemented on wireless smart cameras, in reduced time and with little effort. This in turn will make the comparison and generalization of solutions for wireless smart cameras simple.
Benchmarking real-time HEVC streaming
Work towards the standardisation of High Efficiency Video Coding (HEVC), the next-generation video coding scheme, is currently gaining pace. HEVC offers the prospect of a 50% improvement in compression over the current H.264 Advanced Video Coding standard (H.264/AVC). Thus far, work on HEVC has concentrated on improvements to coding efficiency and has not yet addressed transmission in networks other than to mandate byte-stream compliance with Annex B of H.264/AVC. For practical networked HEVC applications, a number of essential building blocks have yet to be defined. In this work, we design and prototype a real-time HEVC streaming system and empirically evaluate its performance; in particular, we consider the robustness of the current Test Model under Consideration (TMuC HM4.0) for HEVC to packet loss caused by a reduction in available bandwidth, both in terms of decoder resilience and degradation in perceptual video quality. A NAL unit packetisation and streaming framework for HEVC-encoded video streams is designed, implemented and empirically tested in a number of streaming environments including wired, wireless, single-path and multiple-path network scenarios. As a first step, the HEVC decoder's error resilience is tested under a comprehensive set of packet loss conditions and a simple error concealment method for HEVC is implemented. As with H.264-encoded streams, the size and distribution of NAL units within an HEVC stream and the nature of the NAL unit dependencies influence the packetisation and streaming strategies which may be employed for such streams. The relationships between HEVC encoding mode and the quality of the received video are shown under a wide range of bandwidth constraints. HEVC streaming is evaluated in both single-path and multipath network configuration scenarios. Through extensive experimentation, we establish a comprehensive set of benchmarks for HEVC streaming in loss-prone network environments. We show the visual quality reduction, in terms of PSNR, which results from a reduction in available bandwidth. To the best of our knowledge, this is the first time that such a fully functional streaming system for HEVC, together with benchmark evaluation results, has been reported. This study will open up more timely research opportunities in this cutting-edge area.
2000 fps multi-object tracking based on color histogram
Qingyi Gu, Takeshi Takaki, Idaku Ishii
In this study, we develop a real-time, color histogram-based tracking system for multiple color-patterned objects in a 512×512 image at 2000 fps. Our system can simultaneously extract the positions, areas, orientation angles, and color histograms of multiple objects in an image using the hardware implementation of a multi-object, color histogram extraction circuit module on a high-speed vision platform. It can both label multiple objects in an image consisting of connected components and calculate their moment features and 16-bin hue-based color histograms using cell-based labeling. We demonstrate the performance of our system by showing several experimental results: (1) tracking of multiple color-patterned objects on a plate rotating at 16 rps, and (2) tracking of human hand movement with two color-patterned drinking bottles.
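The per-object features named above (area, position and orientation from image moments, and a 16-bin hue histogram) can be computed as in the following sketch; the cell-based labeling hardware itself is not modeled, and the function assumes a non-empty object mask.

```python
# Per-object moment features and a normalized 16-bin hue histogram (mask must be non-empty).
import numpy as np

def object_features(hue, mask, bins=16):
    """hue: (H, W) hue channel in [0, 1); mask: boolean pixels of one labeled object."""
    ys, xs = np.nonzero(mask)
    area = xs.size
    cx, cy = xs.mean(), ys.mean()                       # centroid from 0th/1st moments
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)   # orientation from 2nd moments
    hist, _ = np.histogram(hue[mask], bins=bins, range=(0.0, 1.0))
    return area, (cx, cy), angle, hist / area           # normalized color histogram

# Objects can then be matched across frames by comparing their histograms,
# e.g. with histogram intersection, in addition to position continuity.
```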
Block matching noise reduction method for photographic images applied in Bayer RAW domain and optimized for real-time implementation
Image de-noising has been a well-studied problem in the field of digital image processing; however, a number of problems prevent state-of-the-art algorithms from finding their way into practical implementations. In our research we have addressed these issues with an implementation of a practical de-noising algorithm. In order of importance: firstly, we have designed a robust algorithm, tackling different kinds of noise in a very wide range of signal-to-noise ratios; secondly, we tried to achieve natural-looking processed images and to avoid unnatural-looking artifacts; thirdly, we have designed the algorithm to be suitable for implementation in commercial-grade FPGAs capable of processing full HD (1920×1080) video data in real time (60 frames per second). The main challenge for the use of noise reduction algorithms in photo and video applications is the compromise between the efficiency of the algorithm (amount of PSNR improvement), loss of detail, appearance of artifacts and the complexity of the algorithm (and consequently the cost of integration). In photo and video applications it is very important that the residual noise and artifacts produced by the noise reduction algorithm look natural and do not distract aesthetically. Our proposed algorithm does not produce the artificial-looking defects found in existing state-of-the-art algorithms. In our research, we propose a robust and fast non-local de-noising algorithm based on a Laplacian pyramid. The advantage of this approach is the ability to build noise reduction algorithms with a very large effective kernel: in our experiments, effective kernel sizes as large as 127×127 pixels were used in some cases, requiring only 4 scales. This size of kernel was required to perform noise reduction for images taken with a DSLR camera. Taking into account the achievable improvement in PSNR (on the level of the best known noise reduction techniques) and the low algorithmic complexity, enabling practical use in commercial photo and video applications, the results of our research can be very valuable.
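A minimal sketch of the pyramid-domain structure, assuming a naive downsampler and a soft-threshold stand-in for the paper's block-matching weights: denoising each Laplacian band with a small window yields a large effective kernel at full resolution, since a w-pixel window at scale s spans roughly w·2^s full-resolution pixels.

```python
# Pyramid-domain denoising sketch; the soft threshold stands in for block matching.
import numpy as np

def laplacian_pyramid(img, levels=4):
    bands, cur = [], img.astype(np.float64)
    for _ in range(levels):
        coarse = cur[::2, ::2]                                           # naive 2x downsample
        up = np.repeat(np.repeat(coarse, 2, 0), 2, 1)[:cur.shape[0], :cur.shape[1]]
        bands.append(cur - up)                                           # detail band
        cur = coarse
    return bands, cur

def denoise_band(band, strength):
    return np.sign(band) * np.maximum(np.abs(band) - strength, 0.0)      # soft threshold

def pyramid_denoise(img, levels=4, strength=2.0):
    bands, cur = laplacian_pyramid(img, levels)
    for i in range(levels - 1, -1, -1):                                  # coarse to fine
        up = np.repeat(np.repeat(cur, 2, 0), 2, 1)[:bands[i].shape[0], :bands[i].shape[1]]
        cur = up + denoise_band(bands[i], strength)
    return cur

noisy = np.random.rand(128, 128) * 255.0
clean = pyramid_denoise(noisy, levels=4, strength=2.0)
```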
Real-time lossy compression of hyperspectral images using iterative error analysis on graphics processing units
Sergio Sánchez, Antonio Plaza
Hyperspectral image compression is an important task in remotely sensed Earth observation, as the dimensionality of this kind of image data is ever increasing. This requires on-board compression in order to optimize the downlink connection when sending the data to Earth. A successful algorithm for lossy compression of remotely sensed hyperspectral data is the iterative error analysis (IEA) algorithm, which applies an iterative process that allows the amount of information loss and the compression ratio to be controlled by the number of iterations. This algorithm, which is based on spectral unmixing concepts, can be computationally expensive for hyperspectral images with high dimensionality. In this paper, we develop a new parallel implementation of the IEA algorithm for hyperspectral image compression on graphics processing units (GPUs). The proposed implementation is tested on several different GPUs from NVIDIA and is shown to exhibit real-time performance in the analysis of Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) data sets collected over different locations. The proposed algorithm and its parallel GPU implementation represent a significant advance towards real-time on-board (lossy) compression of hyperspectral data, where the quality of the compression can also be adjusted in real time.
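The IEA loop itself is simple to state: at each iteration the pixel worst represented by the current endmember set becomes a new endmember and all pixels are unmixed again, so the number of iterations controls both information loss and compression ratio. The sketch below uses unconstrained least-squares unmixing and illustrative sizes; the GPU-specific parallelization is not shown.

```python
# Sketch of the IEA loop with unconstrained least-squares unmixing; sizes are illustrative.
import numpy as np

def iea_compress(X, n_iter=8):
    """X: (num_pixels, num_bands) hyperspectral image as a matrix.
    Returns endmembers E and abundances A such that X ~= A @ E."""
    E = X.mean(axis=0, keepdims=True)                   # start from the mean spectrum
    for _ in range(n_iter):
        A = np.linalg.lstsq(E.T, X.T, rcond=None)[0].T  # unmix all pixels
        err = np.linalg.norm(X - A @ E, axis=1)         # per-pixel reconstruction error
        E = np.vstack([E, X[err.argmax()]])             # worst pixel becomes a new endmember
    A = np.linalg.lstsq(E.T, X.T, rcond=None)[0].T
    return E, A

X = np.random.rand(1000, 50)                            # 1000 pixels, 50 spectral bands
E, A = iea_compress(X, n_iter=8)
# Storing E (9 x 50) and A (1000 x 9) instead of X (1000 x 50) gives the lossy compression;
# more iterations lower the error at the expense of the compression ratio.
```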
Poster Session
MTF measurements on real time for performance analysis of electro-optical systems
Jose Augusto Stuchi, Elisa Signoreto Barbarini, Flavio Pascoal Vieira, et al.
The need for methods and tools that assist in determining the performance of optical systems is currently increasing. One of the most widely used methods to analyse optical systems is to measure the Modulation Transfer Function (MTF). The MTF represents a direct and quantitative verification of image quality. This paper presents the implementation of software to calculate the MTF of electro-optical systems. The software was used to calculate the MTF of a digital fundus camera, a thermal imager and an ophthalmologic surgery microscope. The MTF information aids the analysis of alignment and the measurement of optical quality, and also defines the limiting resolution of optical systems. The results obtained with the fundus camera and the thermal imager were compared with theoretical values. For the microscope, the results were compared with the MTF measured for a Zeiss microscope, which is the quality standard for ophthalmological microscopes.
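The abstract does not detail the software's computation, but one common way to obtain an MTF curve, from a measured edge profile, is sketched below: differentiate the edge spread function into the line spread function and take the normalized magnitude of its Fourier transform.

```python
# MTF from a measured edge profile: ESF -> LSF (derivative) -> |FFT|, normalized at DC.
import numpy as np

def mtf_from_edge(edge_profile, pixel_pitch_mm):
    """edge_profile: 1D intensity samples across a sharp edge."""
    lsf = np.diff(edge_profile.astype(np.float64))          # line spread function
    total = lsf.sum()
    if total != 0:
        lsf = lsf / total
    mtf = np.abs(np.fft.rfft(lsf))
    if mtf[0] != 0:
        mtf = mtf / mtf[0]                                   # 1.0 at zero spatial frequency
    freqs = np.fft.rfftfreq(lsf.size, d=pixel_pitch_mm)      # cycles per millimetre
    return freqs, mtf
```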
Real-time shrinkage studies in photopolymer films using holographic interferometry
Polymerisation-induced shrinkage is one of the main reasons why photopolymer materials are not more widely used for holographic applications. The aim of this study is to evaluate the shrinkage of an acrylamide photopolymer layer during holographic recording using holographic interferometry. Shrinkage in photopolymer layers can be measured by real-time capture of holographic interferograms during holographic recording. Interferograms were captured using a CMOS camera at regular intervals. The optical path length change, and hence the shrinkage, were determined from the captured fringe patterns. It was observed that the photopolymer layer shrinkage is of the order of 3.5%.
GPU-based real-time structured light 3D scanner at 500 fps
Hao Gao, Takeshi Takaki, Idaku Ishii
In this study, we develop a real-time, structured light 3D scanner that can output 3D video of 512×512 pixels at 500 fps using a GPU-based, high-speed vision system synchronized with a high-speed DLP projector. Our 3D scanner projects eight pairs of positive and negative image patterns with 8-bit gray code onto the measurement objects at 1000 fps. Synchronized with the high-speed vision platform, these images are simultaneously captured at 1000 fps and processed in real time for 3D image generation at 500 fps by introducing parallel pixel processing on an NVIDIA Tesla 1060 GPU board. Several experiments are performed on fast-moving 3D objects that undergo sudden shape deformation.
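Decoding the projected Gray-code patterns from the positive/negative image pairs can be sketched as follows; calibration and triangulation to actual depth are omitted, and the bit ordering is an assumption.

```python
# Gray-code decoding from positive/negative pattern pairs (most significant bit first).
import numpy as np

def decode_gray(pos_imgs, neg_imgs):
    """pos_imgs, neg_imgs: lists of 8 camera images (H, W), one pair per bit plane.
    Returns the projector column code observed at each camera pixel."""
    bits = [p.astype(np.int32) > n.astype(np.int32) for p, n in zip(pos_imgs, neg_imgs)]
    # Gray code -> binary: b[0] = g[0], b[i] = b[i-1] XOR g[i]
    binary = [bits[0]]
    for g in bits[1:]:
        binary.append(np.logical_xor(binary[-1], g))
    code = np.zeros(bits[0].shape, dtype=np.int32)
    for b in binary:
        code = (code << 1) | b.astype(np.int32)
    return code   # 0..255 projector column index; depth then follows by triangulation
```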
Invariant methods for real-time object recognition and image understanding
In this paper we discuss certain recently developed invariant geometric techniques that can be used for fast object recognition or fast image understanding. The results make use of techniques from algebraic geometry that allow one to relate the geometric invariants of a feature set in 3D to similar invariants in 2D or 1D. The methods apply equally well to optical images or radar images. In addition to the "object/image" equations relating these invariants, we also discuss certain invariant metrics and show why they provide a more natural and robust test for matching object features to image features. Additional aspects of the work as it applies to shape reconstruction and shape statistics will also be explored.
Selection of bi-level image compression method for reduction of communication energy in wireless visual sensor networks
Khursheed Khursheed, Muhammad Imran, Naeem Ahmad, et al.
A Wireless Visual Sensor Network (WVSN) is an emerging technology which combines an image sensor, an on-board computation unit, a communication component and an energy source. Compared to a traditional wireless sensor network, which operates on one-dimensional data such as temperature or pressure values, a WVSN operates on two-dimensional data (images), which requires higher processing power and communication bandwidth. Normally, WVSNs are deployed in areas where the installation of wired solutions is not feasible. The energy budget in these networks is limited to batteries because of the wireless nature of the application. Due to the limited availability of energy, the processing at the Visual Sensor Nodes (VSN) and the communication from the VSN to the server should consume as little energy as possible. Transmitting raw images wirelessly consumes a lot of energy and requires high communication bandwidth. Data compression methods reduce data efficiently and hence are effective in reducing the communication cost in a WVSN. In this paper, we have compared the compression efficiency and complexity of six well-known bi-level image compression methods. The focus is to determine the compression algorithms which can efficiently compress bi-level images and whose computational complexity is suitable for the computational platforms used in WVSNs. These results can be used as a road map for the selection of compression methods for different sets of constraints in a WVSN.
Movement detection using an order statistics algorithm
José Portillo-Portillo, Francisco J. Gallegos-Funes, Alberto J. Rosales-Silva, et al.
In this paper, we present a novel algorithm for motion detection in video sequences. The proposed algorithm is based on the use of the median of the absolute deviations from the median (MAD) as a measure of the statistical dispersion of pixels in a video sequence, providing the robustness needed to detect motion in a frame of a video sequence. By using the MAD, the proposed algorithm is able to detect small or large objects; the size of the detected objects depends on the size of the kernel used in the analysis of the video sequence. Experimental results in human motion detection are presented, showing that the proposed algorithm can be used in security applications.
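A minimal sketch of MAD-based motion detection over a temporal buffer of frames is given below; the buffer length, the 1.4826 Gaussian-consistency constant and the threshold k are illustrative choices, not the paper's exact parameters.

```python
# MAD-based motion mask over a temporal buffer of frames (illustrative parameters).
import numpy as np

def detect_motion(frame_buffer, current, k=3.0):
    """frame_buffer: (T, H, W) recent grayscale frames; current: (H, W) new frame."""
    med = np.median(frame_buffer, axis=0)
    mad = np.median(np.abs(frame_buffer - med), axis=0)   # per-pixel robust spread
    mad = np.maximum(mad, 1e-3)                           # avoid zero spread
    return np.abs(current - med) > k * 1.4826 * mad       # boolean motion mask
```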
Real-time FPGA implementation of recursive wavelet packet transform
Vanishree Gopalakrishna, Nasser Kehtarnavaz, Chandrasekhar Patlolla, et al.
To address the computational complexity of the wavelet packet transform of a moving window with a large amount of overlap between consecutive windows, a recursive computation approach was introduced previously [1]. In this work, this approach is extended to 2D, i.e. images. In addition, the FPGA implementation of the recursive approach for updating wavelet coefficients is performed by using the LabVIEW FPGA module. This programming approach is graphical and requires no knowledge of relatively involved hardware description languages. A number of optimization steps, including both filter and wavelet stage pipelining, are taken in order to achieve real-time throughput. It is shown that the recursive approach reduces the computational complexity significantly as compared to the non-recursive, or classical, computation of the wavelet packet transform. For example, the number of multiplications is reduced by a factor of 3 for a 3-stage 1D transform of moving windows containing 128 samples, and by a factor of 12 for a 3-stage 2D transform of moving window blocks of size 16×16 with 50% overlap.
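The reuse opportunity exploited by the recursive approach can be seen already in a toy example: for a single Haar filtering stage on a window sliding by two samples, every coefficient pair except the newest one is identical to a coefficient of the previous window. The paper's recursive wavelet packet update and its FPGA pipeline are, of course, more general than this sketch.

```python
# Toy reuse check for one Haar stage on a window sliding by two samples.
import numpy as np

def haar_stage(x):
    x = x.reshape(-1, 2)
    return (x[:, 0] + x[:, 1]) / np.sqrt(2), (x[:, 0] - x[:, 1]) / np.sqrt(2)

signal = np.random.rand(130)
a0, d0 = haar_stage(signal[0:128])        # previous 128-sample window
a1, d1 = haar_stage(signal[2:130])        # window advanced by 2 samples
# 63 of the 64 coefficient pairs are identical and need not be recomputed.
assert np.allclose(a0[1:], a1[:-1]) and np.allclose(d0[1:], d1[:-1])
```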
Real-time visual communication to aid disaster recovery in a multi-segment hybrid wireless networking system
Tawfik Al Hadhrami, Qi Wang, Christos Grecos
When natural disasters or other large-scale incidents occur, obtaining accurate and timely information on the developing situation is vital to effective disaster recovery operations. High-quality video streams and high-resolution images, if available in real time, would provide an invaluable source of current situation reports to the incident management team. Meanwhile, a disaster often causes significant damage to the communications infrastructure. Therefore, another essential requirement for disaster management is the ability to rapidly deploy a flexible incident area communication network. Such a network would facilitate the transmission of real-time video streams and still images from the disrupted area to remote command and control locations. In this paper, a comprehensive end-to-end video/image transmission system between an incident area and a remote control centre is proposed and implemented, and its performance is experimentally investigated. In this study a hybrid multi-segment communication network is designed that seamlessly integrates terrestrial wireless mesh networks (WMNs), distributed wireless visual sensor networks, an airborne platform with video camera balloons, and a Digital Video Broadcasting-Satellite (DVB-S) system. By carefully integrating all of these rapidly deployable, interworking and collaborative networking technologies, we can fully exploit the joint benefits provided by WMNs, visual sensor networks, balloon camera networks and DVB-S for real-time video streaming and image delivery in emergency situations among the disaster-hit area, the remote control centre and the rescue teams in the field. The whole proposed system is implemented in a proven simulator. Through extensive simulations, the real-time visual communication performance of this integrated system has been numerically evaluated, towards a more in-depth understanding of how to support high-quality visual communications in such a demanding context.
Real-time video streaming in mobile cloud over heterogeneous wireless networks
Recently, the concept of Mobile Cloud Computing (MCC) has been proposed to offload the resource requirements in computational capabilities, storage and security from mobile devices into the cloud. Internet video applications such as real-time streaming are expected to be ubiquitously deployed and supported over the cloud for mobile users, who typically encounter a range of wireless networks of diverse radio access technologies during their roaming. However, real-time video streaming for mobile cloud users across heterogeneous wireless networks presents multiple challenges. The network-layer quality of service (QoS) provision to support high-quality mobile video delivery in this demanding scenario remains an open research question, and this in turn affects the application-level visual quality and impedes mobile users' perceived quality of experience (QoE). In this paper, we devise a framework to support real-time video streaming in this new mobile video networking paradigm and evaluate the performance of the proposed framework empirically through a lab-based yet realistic testing platform. One particular issue we focus on is the effect of users' mobility on the QoS of video streaming over the cloud. We design and implement a hybrid platform comprising a test-bed and an emulator, on which our concepts of mobile cloud computing, video streaming and heterogeneous wireless networks are implemented and integrated to allow the testing of our framework. As representative heterogeneous wireless networks, the popular WLAN (Wi-Fi) and MAN (WiMAX) networks are incorporated in order to evaluate the effects of handovers between these different radio access technologies. The H.264/AVC (Advanced Video Coding) standard is employed for real-time video streaming from a server to mobile users (client nodes) in the networks. Mobility support is introduced to enable a continuous streaming experience for a mobile user across the heterogeneous wireless network. Real-time video stream packets are captured for analytical purposes on the mobile user node. Experimental results are obtained and analysed, and future work is identified towards further improvement of the current design and implementation. With this new mobile video networking concept and paradigm implemented and evaluated, the results and observations obtained from this study form the basis of a more in-depth, comprehensive understanding of the various challenges and opportunities in supporting high-quality real-time video streaming in mobile cloud over heterogeneous wireless networks.
Cost optimization of a sky surveillance visual sensor network
Naeem Ahmad, Khursheed Khursheed, Muhammad Imran, et al.
A Visual Sensor Network (VSN) is a network of spatially distributed cameras. The primary difference between a VSN and other types of sensor networks is the nature and volume of the information. A VSN generally consists of cameras, communication, storage and a central computer, where image data from multiple cameras are processed and fused. In this paper, we use optimization techniques to reduce the cost, as derived from a model of a VSN, of tracking large birds, such as the golden eagle, in the sky. The core idea is to divide a given monitoring range of altitudes into a number of sub-ranges. The sub-ranges of altitudes are monitored by individual VSNs (VSN1 monitors the lowest range, VSN2 the next higher, and so on), such that a minimum cost is used to monitor a given area. The VSNs may use similar or different types of cameras but different optical components, thus forming a heterogeneous network. We have calculated the cost required to cover a given area both by considering the altitude range as a single element and by dividing it into sub-ranges. Covering a given area and altitude range with a single VSN requires 694 camera nodes, whereas dividing this range into sub-ranges of altitudes requires only 88 nodes, an 87% reduction in cost.
Fast repurposing of high-resolution stereo video content for mobile use
Ali Karaoglu, Bong Ho Lee, Atanas Boev, et al.
3D video content is captured and created mainly in high resolution, targeting big cinema or home TV screens. For 3D mobile devices equipped with small-size auto-stereoscopic displays, such content has to be properly repurposed, preferably in real time. The repurposing requires not only spatial resizing but also properly maintaining the output stereo disparity, as it should deliver realistic, pleasant and harmless 3D perception. In this paper, we propose an approach to adapt the disparity range of the source video to the comfort disparity zone of the target display. To achieve this, we adapt the scale and the aspect ratio of the source video. We aim at maximizing the disparity range of the retargeted content within the comfort zone and minimizing the letterboxing of the cropped content. The proposed algorithm consists of five stages. First, we analyse the display profile, which characterises what 3D content can be comfortably observed on the target display. Then, we perform a fast disparity analysis of the input stereoscopic content. Instead of returning a dense disparity map, it returns an estimate of the disparity statistics (min, max, mean and variance) per frame. Additionally, we detect scene cuts, where sharp transitions in disparities occur. Based on the estimated input and desired output disparity ranges, we derive the optimal cropping parameters and the scale of the cropping window which would yield the targeted disparity range and minimize the area of cropped and letterboxed content. Once the rescaling and cropping parameters are known, we perform a resampling procedure using spline-based and perceptually optimized resampling (anti-aliasing) kernels, which also have a very efficient computational structure. Perceptual optimization is achieved by adjusting the cut-off frequency of the anti-aliasing filter to the throughput of the target display.
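A simplified sketch of the range-mapping step, under the assumption that spatially scaling a stereo pair by a factor s also scales its disparities by s: the scale is chosen so the measured input range fits the display's comfort zone, and a uniform horizontal shift centres it. The cropping/letterboxing trade-off and the perceptually optimized resampling filters are not modeled.

```python
# Range-mapping sketch: choose a spatial scale s and horizontal shift that fit the
# measured disparity range into the target display's comfort zone.
def retarget_scale(d_min_in, d_max_in, d_min_target, d_max_target):
    """All disparities expressed in pixels of the target display."""
    range_in = d_max_in - d_min_in
    range_target = d_max_target - d_min_target
    s = min(1.0, range_target / range_in) if range_in > 0 else 1.0
    shift = d_min_target - s * d_min_in        # uniform shift between the two views
    return s, shift

s, shift = retarget_scale(-20, 60, -10, 25)    # example numbers, not from the paper
# scaled range: [-20 * s + shift, 60 * s + shift] == [-10.0, 25.0]
```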
Multi-resolution model-based traffic sign detection and tracking
Javier Marinas, Luis Salgado, Massimo Camplani
In this paper we propose an innovative approach to tackle the problem of traffic sign detection using a computer vision algorithm, taking into account real-time operation constraints and establishing intelligent strategies to simplify the algorithm complexity as much as possible and to speed up the process. Firstly, a set of candidates is generated by a color segmentation stage, followed by a region analysis strategy, where the spatial characteristics of previously detected objects are taken into account. Finally, temporal coherence is introduced by means of a tracking scheme, performed using a Kalman filter for each potential candidate. Taking time constraints into consideration, efficiency is achieved in two ways: on the one hand, a multi-resolution strategy is adopted for segmentation, where global operations are applied only to low-resolution images, increasing the resolution to the maximum only when a potential road sign is being tracked. On the other hand, we take advantage of the expected spacing between traffic signs: the tracking of objects of interest allows the generation of inhibition areas, i.e. regions where no new traffic signs are expected to appear due to the existence of a sign in the neighborhood. The proposed solution has been tested with real sequences in both urban areas and highways, and proved to achieve higher computational efficiency, especially as a result of the multi-resolution approach.
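The per-candidate tracker can be illustrated with a minimal constant-velocity Kalman filter of the kind commonly used for tracking an image position; the paper's actual state vector and noise settings are not specified in the abstract, so the values below are assumptions.

```python
# Constant-velocity Kalman tracker for one candidate sign's image position.
import numpy as np

class SignTracker:
    def __init__(self, x0, y0, dt=1.0, q=1.0, r=4.0):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)      # state: [x, y, vx, vy]
        self.P = np.eye(4) * 100.0                               # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt     # constant-velocity model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * q                                   # process noise
        self.R = np.eye(2) * r                                   # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                        # predicted sign position

    def update(self, zx, zy):
        z = np.array([zx, zy], dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

# The predicted position can both gate the high-resolution search window and define
# an inhibition area where no new candidates are generated.
```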
Adaptive optics combined with computer post-processing for horizontal turbulent imaging
We describe an approach that offers almost real-time image enhancement through turbulent and wavy media. The approach consists of a combination of optimization-based adaptive optics with digital multi-frame post-processing. Applications in astronomical and terrestrial imaging, where the image features are initially unresolved due to loss of contrast, blur, vibrations and image wander, are illustrated by experimental results. New software from Flexible Optical BV is presented.
Real-time machine vision system using FPGA and soft-core processor
Abdul Waheed Malik, Benny Thörnberg, Xiaozhou Meng, et al.
This paper presents a machine vision system for real-time computation of the distance and angle of a camera from reference points in the environment. Image pre-processing, component labeling and feature extraction modules were modeled at Register Transfer (RT) level and synthesized for implementation on field programmable gate arrays (FPGA). The extracted image component features were sent from the hardware modules to a soft-core processor, MicroBlaze, for the computation of distance and angle. A CMOS imaging sensor operating at a clock frequency of 27 MHz was used in our experiments to produce a video stream at the rate of 75 frames per second. The image component labeling and feature extraction modules run in parallel with a total latency of 13 ms. The MicroBlaze was interfaced with the component labeling and feature extraction modules through a Fast Simplex Link (FSL). The latency for computing the distance and angle of the camera from the reference points was measured to be 2 ms on the MicroBlaze, running at a 100 MHz clock frequency. In this paper, we present the performance analysis, device utilization and power consumption of the designed system. The FPGA-based machine vision system that we propose has a high frame rate, low latency and a power consumption that is much lower than that of commercially available smart camera solutions.
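The abstract does not give the geometry used on the MicroBlaze, but a generic pinhole-camera sketch shows how distance and angle can be obtained from two reference points with a known real-world separation; the function names and numbers below are illustrative only.

```python
# Pinhole-geometry sketch: distance from the apparent separation of two reference
# points, bearing from the offset of their midpoint relative to the principal point.
import math

def distance_and_angle(u1, u2, real_separation_m, focal_px, cx):
    """u1, u2: pixel columns of the two reference points; cx: principal point column."""
    pixel_separation = abs(u2 - u1)
    distance = focal_px * real_separation_m / pixel_separation   # Z = f * B / d
    u_mid = 0.5 * (u1 + u2)
    angle = math.degrees(math.atan2(u_mid - cx, focal_px))       # bearing off the optical axis
    return distance, angle

print(distance_and_angle(500, 700, 0.30, 1000.0, 640.0))         # illustrative numbers
```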