Proceedings Volume 8301

Intelligent Robots and Computer Vision XXIX: Algorithms and Techniques

Juha Röning, David P. Casasent
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 28 December 2011
Contents: 10 Sessions, 40 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2012
Volume Number: 8301

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8301
  • Invited Papers on Intelligent Robotics
  • Stereo Vision and Applications
  • Novel People and Vehicle Tracking Approaches
  • UAVs and Aerial Applications
  • Robot Manipulation and Application
  • Vision Navigation and Activity Recognition
  • Visual Algorithms
  • Intelligent Ground Vehicle Competition
  • Interactive Paper Session
Front Matter: Volume 8301
Front Matter: Volume 8301
This PDF file contains the front matter associated with SPIE Proceedings Volume 8301, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Invited Papers on Intelligent Robotics
Software-based neural network assisted movement compensation for nanoresolution piezo actuators
Micro- and nanoresolution applications are an important part of functional-materials research, where imaging and observation of material interaction may go down to the molecular or even atomic level. Much of the nanometer-range movement of scanning and manipulation instruments is made possible by the use of piezoelectric actuation systems. This paper presents a software-based controller implementation utilizing neural networks for high-precision positioning of a piezoelectric actuator. The controller developed can be used for controlling nanopositioning piezo actuators when sufficiently accurate feedback information is available. Piezo actuators exhibit complex hysteresis dynamics that need to be taken into account when designing an accurate control system. For inverse modelling of the hysteresis-related phenomena, a static hysteresis operator and a newly developed dynamic creep operator are presented, to be used in conjunction with a feed-forward neural network. The controller utilizing the neural-network inverse hybrid model is implemented as a software component for the existing Scalable Modular Control (SMC) framework. Using the SMC framework and off-the-shelf components, a measurement and control system for the nanopositioning actuator is constructed and tested using two different capacitive sensors operating on the y- and z-axes of the actuator. Using the developed controller, piezo-actuator hysteresis phenomena were successfully reduced, making nanometer-range positioning of the actuator axes possible. The effect on control accuracy of using a lower-accuracy, noisier position sensor is also briefly discussed.
Traffic monitoring with distributed smart cameras
Oliver Sidla, Marcin Rosner, Michael Ulm, et al.
The observation and monitoring of traffic with smart vision systems for the purpose of improving traffic safety has great potential. Today the automated analysis of traffic situations is still in its infancy--the patterns of vehicle motion and pedestrian flow in an urban environment are too complex to be fully captured and interpreted by a vision system. In this work we present steps towards a visual monitoring system which is designed to detect potentially dangerous traffic situations around a pedestrian crossing at a street intersection. The camera system is specifically designed to detect incidents in which the interaction of pedestrians and vehicles might develop into safety-critical encounters. The proposed system has been field-tested at a real pedestrian crossing in the City of Vienna for the duration of one year. It consists of a cluster of 3 smart cameras, each of which is built from a very compact PC hardware system in a weatherproof housing. Two cameras run vehicle detection and tracking software, one camera runs a pedestrian detection and tracking module based on the HOG detection principle. All 3 cameras use sparse optical flow computation in a low-resolution video stream in order to estimate the motion path and speed of objects. Geometric calibration of the cameras allows us to estimate the real-world coordinates of detected objects and to link the cameras together into one common reference system. This work describes the foundation for all the different object detection modalities (pedestrians, vehicles), and explains the system setup, its design, and the evaluation results which we have achieved so far.
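As a rough illustration of the sparse optical-flow step described above, the following sketch computes Lucas-Kanade flow on a downscaled video stream with OpenCV. The file name, scale factor, and feature parameters are assumptions for illustration only, not the authors' configuration.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("crossing.avi")        # hypothetical input sequence
ok, prev = cap.read()
prev_small = cv2.resize(prev, None, fx=0.25, fy=0.25)   # low-resolution stream
prev_gray = cv2.cvtColor(prev_small, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(frame, None, fx=0.25, fy=0.25)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

    # Re-detect corners each frame for simplicity; a real system would track them persistently.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=5)
    if pts is not None:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        flow = (nxt - pts)[status.ravel() == 1].reshape(-1, 2)
        speeds = np.linalg.norm(flow, axis=1)  # pixels/frame; geometric calibration maps this to m/s
        # ... cluster flow vectors by position and direction to form object motion hypotheses
    prev_gray = gray
```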
The 19th Annual Intelligent Ground Vehicle Competition: student built autonomous ground vehicles
The Intelligent Ground Vehicle Competition (IGVC) is one of four unmanned-systems student competitions that were founded by the Association for Unmanned Vehicle Systems International (AUVSI). The IGVC is a multidisciplinary exercise in product realization that challenges college engineering student teams to integrate advanced control theory, machine vision, vehicular electronics and mobile platform fundamentals to design and build an unmanned system. Teams from around the world focus on developing a suite of dual-use technologies to equip ground vehicles of the future with intelligent driving capabilities. Over the past 19 years, the competition has challenged undergraduate, graduate and Ph.D. students with real-world applications in intelligent transportation systems, the military and manufacturing automation. To date, teams from almost 80 universities and colleges have participated. This paper describes some of the applications of the technologies required by this competition and discusses the educational benefits. The primary goal of the IGVC is to advance engineering education in intelligent vehicles and related technologies. The employment and professional networking opportunities created for students and industrial sponsors through a series of technical events over the four-day competition are highlighted. Finally, an assessment of the competition based on participation is presented.
Stereo Vision and Applications
Accurate dense 3D reconstruction of moving and still objects from dynamic stereo sequences based on temporal modified-RANSAC and feature-cut
Naotomo Tatematsu, Jun Ohya
This paper improves the authors' conventional method for reconstructing the 3D structure of moving and still objects that are tracked in the video and/or depth image sequences acquired by moving cameras and/or a range finder. The authors proposed a Temporal Modified-RANSAC (TMR) based method [1] that (1) can discriminate each moving object from the still background in color image and depth image sequences acquired by moving stereo cameras or a moving range finder, (2) can compute the stereo cameras' egomotion, (3) can compute the motion of each moving object, and (4) can reconstruct the 3D structure of each moving object and the background. However, the TMR-based method has two problems concerning the 3D reconstruction: inaccurate segmentation of each object's region and sparse 3D reconstructed points in each object's region. To solve these problems of our conventional method, this paper proposes a new 3D segmentation method that utilizes Graph-cut, which is frequently used for segmentation tasks. First, the proposed method tracks feature points in the color and depth image sequences so that 3D optical flows of the feature points in every N frames are obtained. Then, TMR classifies all the obtained 3D optical flows into regions (3D flow sets) for the background and each moving object; simultaneously, the rotation matrix and the translation vector for each 3D flow set are computed. Next, Graph-cut using an energy function that consists of color probability, structure probability and a priori probability is performed so that pixels in each frame are segmented into object regions and the background region. Finally, 3D point clouds are obtained from the segmentation result image and the depth image, and the point clouds are merged using the rotation and translation from the N-th frame prior to the current frame so that 3D models for the background and each moving object are constructed with dense 3D point data.
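The final merging step lends itself to a short sketch: point clouds extracted for one segment in successive frames are brought into a common reference frame by chaining the per-segment rotations and translations. This is a generic illustration under the assumption that each (R, t) maps one frame to the next, not the authors' exact implementation.

```python
import numpy as np

def merge_clouds(clouds, motions):
    """clouds: list of (M_i, 3) point arrays, oldest first, each in its own frame.
    motions: list of (R, t) pairs mapping frame k to frame k+1 for the same segment."""
    merged = [clouds[-1]]                         # newest cloud is the reference frame
    T = np.eye(4)
    for cloud, (R, t) in zip(reversed(clouds[:-1]), reversed(motions)):
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = T @ step                              # accumulate motion from frame k up to the reference
        pts_h = np.c_[cloud, np.ones(len(cloud))]
        merged.append((T @ pts_h.T).T[:, :3])
    return np.vstack(merged)                      # dense model of the segment in one frame
```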
Efficient hybrid monocular-stereo approach to on-board video-based traffic sign detection and tracking
Javier Marinas, Luis Salgado, Jon Arróspide, et al.
In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It involves a combination of monocular and stereo analysis strategies to increase the reliability of the detections such that it can boost the performance of any traffic sign recognition scheme. Firstly, an adaptive color and appearance based detection is applied at single camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion.
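The RANSAC plane-fitting step mentioned above can be sketched as follows: fit a plane to the sparse 3D points traced back from feature matches for a sign hypothesis, rejecting outliers. Thresholds and iteration counts here are illustrative choices, not the authors' values.

```python
import numpy as np

def ransac_plane(points, n_iter=200, dist_thresh=0.02, rng=np.random.default_rng(0)):
    """points: (N, 3) array of reconstructed 3D points for one traffic-sign hypothesis."""
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample, skip
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refine on the inlier set: plane through the centroid, normal = smallest singular vector
    inlier_pts = points[best_inliers]
    centroid = inlier_pts.mean(axis=0)
    _, _, vt = np.linalg.svd(inlier_pts - centroid)
    return centroid, vt[-1], best_inliers
```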
A general model and calibration method for spherical stereoscopic vision
Weijia Feng, Juha Röning, Juho Kannala, et al.
In geometric calibration of stereoscopic cameras the object is to determine a set of parameters which describe the mapping from 3D reference coordinates to 2D image coordinates, and indicate the geometric relationships between the cameras. While various methods for stereo cameras with ordinary lenses can be found from the literature, stereoscopic vision with extremely wide angle lenses has been much less discussed. Spherical stereoscopic vision is more and more convenient in computer vision applications. However, its use for 3D measurement purposes is limited by the lack of an accurate, general, and easy-to-use calibration procedure. Hence, we present a geometric model for spherical stereoscopic vision equipped by extremely wide angle lenses. Then, a corresponding generic mathematical model is built. Method for calibration the parameters of the mathematical model is proposed. This paper shows practical results from the calibration of two high quality panomorph lenses mounted on cameras with 2048x1536 resolutions. Here, the stereoscopic vision system is flexible, the position and orientation of the cameras can be adjusted randomly. The calibration results include interior orientation, exterior orientation and the geometric relationships between the two cameras. The achieved level of calibration accuracy is very satisfying.
An approach to stereo-point cloud registration using image homographies
Stephen D. Fox, Damian M. Lyons
A mobile robot equipped with a stereo camera can measure both the video image of a scene and the visual disparity in the scene. The disparity image can be used to generate a collection of points, each representing the location of a surface in the visual scene as a 3D point with respect to the location of the stereo camera: a point cloud. If the stereo camera is moving, e.g., mounted on a moving robot, aligning these scans becomes a difficult, and computationally expensive problem. Many finely tuned versions of the iterative closest point algorithm (ICP) have been used throughout robotics for registration of these sets of scans. However, ICP relies on theoretical convergence to the nearest local minimum of the dynamical system: there is no guarantee that ICP will accurately align the scans. In order to address two problems with ICP, convergence time and accuracy of convergence, we have developed an improvement by using salient keypoints from successive video images to calculate an affine transformation estimate of the camera location. This transformation, when applied to the target point cloud, provides ICP an initial guess to reduce the computational time required for point cloud registration and improve the quality of registration. We report ICP convergence times with and without image information for a set of stereo data point clouds to demonstrate the effectiveness of the approach.
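A minimal sketch of the idea of seeding ICP with an image-derived initial transform is given below: a generic point-to-point ICP loop that starts from an initial guess T0 (here an arbitrary 4x4 matrix standing in for the transform estimated from matched keypoints). This is not the authors' implementation, just an illustration of why a good initial guess shortens convergence.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, T0=np.eye(4), max_iter=30, tol=1e-6):
    """source, target: (N, 3) point clouds; T0: initial guess, e.g. from image keypoints."""
    src = (np.c_[source, np.ones(len(source))] @ T0.T)[:, :3]   # apply the initial guess
    tree = cKDTree(target)
    T, prev_err = T0.copy(), np.inf
    for _ in range(max_iter):
        dists, idx = tree.query(src)
        matched = target[idx]
        # best rigid transform between current source and matched target points (Kabsch/SVD)
        cs, ct = src.mean(0), matched.mean(0)
        H = (src - cs).T @ (matched - ct)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:             # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = ct - R @ cs
        src = src @ R.T + t
        step = np.eye(4); step[:3, :3], step[:3, 3] = R, t
        T = step @ T
        err = dists.mean()
        if abs(prev_err - err) < tol:        # converged
            break
        prev_err = err
    return T, err
```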
Hazardous sign detection for safety applications in traffic monitoring
Wanda Benesova, Michal Kottman, Oliver Sidla
The transportation of hazardous goods on public streets can pose severe safety threats in case of accidents. One solution to this problem is the automatic detection and registration of vehicles which are marked with dangerous-goods signs. We present a prototype system which can detect a trained set of signs in high-resolution images under real-world conditions. This paper compares two different methods for the detection: a bag of visual words (BoW) procedure and our approach using pairs of visual words with Hough voting. The results of an extended series of experiments are provided in this paper. The experiments show that the size of the visual vocabulary is crucial and can significantly affect the recognition success rate. Different codebook sizes have been evaluated for this detection task. The best result of the first method, BoW, was 67% successfully recognized hazardous signs, whereas the second method proposed in this paper - pairs of visual words with Hough voting - reached 94% correctly detected signs. The experiments are designed to verify the usability of the two proposed approaches in a real-world scenario.
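For orientation, the baseline bag-of-visual-words representation that the paper compares against can be sketched as below: local descriptors are clustered into a codebook and each image becomes a histogram of visual-word occurrences. Codebook size is the parameter the abstract identifies as crucial; the value 200 here is a placeholder, and this sketch does not cover the authors' pairs-of-words Hough-voting method.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_descriptors, k=200):
    """train_descriptors: (N, D) array of local descriptors (e.g. SIFT/SURF) from training images."""
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(train_descriptors)

def bow_histogram(descriptors, codebook):
    """Quantize an image's descriptors against the codebook and return an L1-normalised histogram."""
    words = codebook.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
    return hist / max(hist.sum(), 1)
```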
PRoViScout: a planetary scouting rover demonstrator
Gerhard Paar, Mark Woods, Christiane Gimkiewicz, et al.
Mobile systems exploring planetary surfaces in the future will require more autonomy than today. The EU FP7-SPACE Project PRoViScout (2010-2012) establishes the building blocks of such autonomous exploration systems in terms of robotic vision by a decision-based combination of navigation and scientific target selection, and integrates them into a framework ready for and exposed to field demonstration. The PRoViScout on-board system consists of mission management components such as an Executive, a Mars Mission On-Board Planner and Scheduler, a Science Assessment Module, and Navigation & Vision Processing modules. The platform hardware consists of the rover with the sensors and pointing devices. We report on the major building blocks and their functions & interfaces, with emphasis on the computer vision parts such as image acquisition (using a novel zoomed 3D time-of-flight & RGB camera), mapping from 3D-TOF data, panoramic image & stereo reconstruction, hazard and slope maps, visual odometry and the recognition of potentially scientifically interesting targets.
Novel People and Vehicle Tracking Approaches
Red-light traffic enforcement at railway crossings
The observation and monitoring of traffic with smart vision systems for the purpose of improving traffic safety has great potential. Embedded loop sensors can detect and count passing vehicles, radar can measure the speed and presence of vehicles, and embedded vision systems or stationary camera systems can count vehicles and estimate the state of traffic along the road. This work presents a vision system which is targeted at detecting and reporting incidents at unsecured railway crossings. These crossings, even when guarded by automated barriers, pose a threat to drivers day and night. Our system is designed to detect and record vehicles which pass over the railway crossing, by means of real-time motion analysis, after the red light has been activated. We implement sparse optical flow in conjunction with motion clustering in order to detect critical events. We describe some modifications of the original Lucas-Kanade optical flow method which make our implementation faster and more robust compared to the original concept. In addition, the results of our optical flow method are compared with a HOG-based vehicle detector which has been implemented and tested as an alternative methodology. The embedded system which is used for detection consists of a smart camera which observes one street lane as well as the red light at the crossing. The camera is triggered by an electrical signal from the railway; as soon as a vehicle moves over this line, image sequences are recorded and stored onboard the device.
Image projection clues for improved real-time vehicle tracking in tunnels
Vedran Jelaca, Jorge Oswaldo Niño Castaneda, Aleksandra Pizurica, et al.
Vehicle tracking is of great importance for tunnel safety. To detect incidents or disturbances in traffic flow it is necessary to reliably track vehicles in real-time. The tracking is a challenging task due to poor lighting conditions in tunnels and frequent light reflections from tunnel walls, the road and the vehicles themselves. In this paper we propose a multi-clue tracking approach combining foreground blobs, optical flow of Shi-Tomasi features and image projection profiles in a Kalman filter with a constant velocity model. The main novelty of our approach lies in using vertical and horizontal image projection profiles (so-called vehicle signatures) as additional measurements to overcome the problems of inconsistent foreground and optical flow clues in cases of severe lighting changes. These signatures consist of Radon-transform like projections along each image column and row. We compare the signatures from two successive video frames to align them and to correct the predicted vehicle position and size. We tested our approach on a real tunnel video sequence. The results show an improvement in the accuracy of the tracker and less target losses when image projection clues are used. Furthermore, calculation and comparison of image projections is computationally efficient so the tracker keeps real-time performance (25 fps, on a single 1.86 GHz processor).
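A constant-velocity Kalman filter of the kind described above can be sketched as follows, tracking a vehicle state such as (x, y, w, h); the projection-profile "signature" alignment would simply supply an additional measurement of the same form. The noise levels and time step are illustrative assumptions.

```python
import numpy as np

class CVKalman:
    """Constant-velocity Kalman filter over an n-dimensional measured state."""
    def __init__(self, x0, dt=0.04):                   # 25 fps -> dt = 40 ms
        n = len(x0)
        self.x = np.r_[x0, np.zeros(n)]                # state: position part + velocity part
        self.P = np.eye(2 * n)
        self.F = np.eye(2 * n); self.F[:n, n:] = dt * np.eye(n)
        self.H = np.hstack([np.eye(n), np.zeros((n, n))])
        self.Q = 1e-2 * np.eye(2 * n)                  # process noise (assumed)
        self.R = 1.0 * np.eye(n)                       # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.H @ self.x                         # predicted position/size

    def update(self, z):                               # z: measurement, e.g. (x, y, w, h)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(len(self.P)) - K @ self.H) @ self.P
```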
Decentralized tracking of humans using a camera network
Sebastian Gruenwedel, Vedran Jelaca, Jorge Oswaldo Niño-Castañeda, et al.
Real-time tracking of people has many applications in computer vision and typically requires multiple cameras; for instance for surveillance, domotics, elderly-care and video conferencing. However, this problem is very challenging because of the need to deal with frequent occlusions and environmental changes. Another challenge is to develop solutions which scale well with the size of the camera network. Such solutions need to carefully restrict overall communication in the network and often involve distributed processing. In this paper we present a distributed person tracker, addressing the aforementioned issues. Real-time processing is achieved by distributing tasks between the cameras and a fusion node. The latter fuses only high level data based on low-bandwidth input streams from the cameras. This is achieved by performing tracking first on the image plane of each camera followed by sending only metadata to a local fusion node. We designed the proposed system with respect to a low communication load and towards robustness of the system. We evaluate the performance of the tracker in meeting scenarios where persons are often occluded by other persons and/or furniture. We present experimental results which show that our tracking approach is accurate even in cases of severe occlusions in some of the views.
Real-time detection of traffic events using smart cameras
M. Macesic, V. Jelaca, J. O. Niño-Castaneda, et al.
With the rapid increase in the number of vehicles on roads it is necessary to maintain close monitoring of traffic. For this purpose many surveillance cameras are placed along roads and at crossroads, creating a huge communication load between the cameras and the monitoring center. Therefore, the data need to be processed on site and transferred to the monitoring centers in the form of metadata or as a set of selected images. For this purpose it is necessary to detect events of interest already on the camera side, which implies using smart cameras as visual sensors. In this paper we propose a method for tracking vehicles and analyzing vehicle trajectories to detect different traffic events. Kalman filtering is used for tracking, combining foreground and optical flow measurements. The obtained vehicle trajectories are used to detect different traffic events. Every new trajectory is compared with a collection of normal routes and clustered accordingly. If the observed trajectory differs from all normal routes by more than a predefined threshold, it is marked as abnormal and an alarm is raised. The system was developed and tested on the Texas Instruments OMAP platform. Testing was done at four different locations, two in the city and two on the open road.
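The trajectory-checking idea can be illustrated with a small sketch: resample a finished track, compare it to a set of "normal" routes, and flag it if it is farther than a threshold from all of them. The distance measure and threshold here are assumptions, not the authors' exact criterion.

```python
import numpy as np

def resample(track, n=32):
    """track: (M, 2) array of positions; resample to n points spaced evenly by arc length."""
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(track, axis=0), axis=1))]
    s = np.linspace(0, d[-1], n)
    return np.c_[np.interp(s, d, track[:, 0]), np.interp(s, d, track[:, 1])]

def is_abnormal(track, normal_routes, threshold=25.0):
    """Raise an alarm if the track is farther than `threshold` from every normal route."""
    t = resample(track)
    dists = [np.mean(np.linalg.norm(t - resample(r), axis=1)) for r in normal_routes]
    return min(dists) > threshold
```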
Vehicle tracking data for calibrating microscopic traffic simulation models
R. Schönauer, Y. Lipetski, H. Schrom-Feiertag
This paper applies object detection in a microscopic traffic model calibration process and analyses the outcome. To cover a large and versatile amount of real-world data for calibration and validation processes, this paper proposes semi-automated data acquisition by video analysis. This work concentrates mainly on the aspects of an automatic annotation tool applied to create trajectories of traffic participants over space and time. The acquired data are analyzed with a view towards calibrating vehicle models, which navigate over a road's surface and interact with the environment. The applied vehicle tracking algorithms for automated data extraction provide many trajectories not applicable for model calibration. Therefore, we applied an additional automated processing step to filter out faulty trajectories. With this process chain, the trajectory data can be extracted from videos automatically in a quality sufficient for the model calibration of speeds, lateral positioning and vehicle interactions in a mixed traffic environment.
UAVs and Aerial Applications
AR.Drone: security threat analysis and exemplary attack to track persons
Fred Samland, Jana Fruth, Mario Hildebrandt, et al.
In this article we illustrate an approach to a security threat analysis of the quadrocopter AR.Drone, a toy for augmented reality (AR) games. The technical properties of the drone can be misused for attacks, which may affect security and/or privacy. Our aim is to raise awareness of possible misuses and to motivate the implementation of improved security mechanisms for the quadrocopter. We focus primarily on obvious security vulnerabilities (e.g. communication over unencrypted WLAN, usage of UDP, live video streaming via unencrypted WLAN to the control device) of this quadrocopter. We could practically verify in three exemplary scenarios that these can be misused by unauthorized persons for several attacks: hijacking of the drone, eavesdropping on the AR.Drone's unprotected video streams, and the tracking of persons. Amongst other aspects, our current research focuses on the realization of the attack of tracking persons and objects with the drone. Besides the realization of attacks, we want to evaluate the potential of this particular drone for a "safe-landing" function, as well as potential security enhancements. Additionally, in the future we plan to investigate automatic tracking of persons or objects without the need for human interaction.
Detection of unknown targets from aerial camera and extraction of simple object fingerprints for the purpose of target reacquisition
An aerial multiple camera tracking paradigm needs to not only spot unknown targets and track them, but also needs to know how to handle target reacquisition as well as target handoff to other cameras in the operating theater. Here we discuss such a system which is designed to spot unknown targets, track them, segment the useful features and then create a signature fingerprint for the object so that it can be reacquired or handed off to another camera. The tracking system spots unknown objects by subtracting background motion from observed motion allowing it to find targets in motion, even if the camera platform itself is moving. The area of motion is then matched to segmented regions returned by the EDISON mean shift segmentation tool. Whole segments which have common motion and which are contiguous to each other are grouped into a master object. Once master objects are formed, we have a tight bound on which to extract features for the purpose of forming a fingerprint. This is done using color and simple entropy features. These can be placed into a myriad of different fingerprints. To keep data transmission and storage size low for camera handoff of targets, we try several different simple techniques. These include Histogram, Spatiogram and Single Gaussian Model. These are tested by simulating a very large number of target losses in six videos over an interval of 1000 frames each from the DARPA VIVID video set. Since the fingerprints are very simple, they are not expected to be valid for long periods of time. As such, we test the shelf life of fingerprints. This is how long a fingerprint is good for when stored away between target appearances. Shelf life gives us a second metric of goodness and tells us if a fingerprint method has better accuracy over longer periods. In videos which contain multiple vehicle occlusions and vehicles of highly similar appearance we obtain a reacquisition rate for automobiles of over 80% using the simple single Gaussian model compared with the null hypothesis of <20%. Additionally, the performance for fingerprints stays well above the null hypothesis for as much as 800 frames. Thus, a simple and highly compact single Gaussian model is useful for target reacquisition. Since the model is agnostic to view point and object size, it is expected to perform as well on a test of target handoff. Since some of the performance degradation is due to problems with the initial target acquisition and tracking, the simple Gaussian model may perform even better with an improved initial acquisition technique. Also, since the model makes no assumption about the object to be tracked, it should be possible to use it to fingerprint a multitude of objects, not just cars. Further accuracy may be obtained by creating manifolds of objects from multiple samples.
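A hedged sketch of the "single Gaussian model" fingerprint that the abstract reports performing best: the pixels of a segmented target are summarised by the mean and covariance of their per-pixel features, and two fingerprints are compared with a symmetric KL-style distance. The feature choice and distance here are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

def gaussian_fingerprint(pixels):
    """pixels: (N, C) array of per-pixel features (e.g. RGB) from the target's segments."""
    mu = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(pixels.shape[1])   # regularised covariance
    return mu, cov

def fingerprint_distance(fp_a, fp_b):
    """Symmetric Kullback-Leibler divergence between two Gaussian fingerprints."""
    (mu_a, ca), (mu_b, cb) = fp_a, fp_b
    ia, ib = np.linalg.inv(ca), np.linalg.inv(cb)
    d = mu_a - mu_b
    return 0.5 * (np.trace(ia @ cb + ib @ ca) + d @ (ia + ib) @ d - 2 * len(d))
```

Reacquisition then amounts to matching a newly detected object against stored fingerprints and accepting the closest one below some distance threshold.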
Super-resolution terrain map enhancement for navigation based on satellite imagery
Super-resolution techniques are commonly used to enhance images and video. These techniques have previously been applied to the enhancement of map data via enhancing aerial imagery. This paper proposes the use of super-resolution techniques for enhancing topographic data directly. Specifically, a database-driven super-resolution algorithm that is trained with domain-specific patterns is used to enhance topographic digital elevation model (DEM) data from the NASA/NGIA SRTM. This enhancement process is evaluated using an elevation-difference evaluation technique where downscaled and enhanced DEM data are compared to the original higher-resolution data. It is also evaluated with a threshold-based elevation-difference metric and visually. The benefits that it might have for flight path planning for a UAV application are discussed. The challenges of using super-resolution-style techniques on non-visual data are also reviewed.
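The elevation-difference evaluation described above can be sketched as follows: downscale a DEM, enhance it back to full resolution, and compare against the original, both as an RMS error and as the fraction of cells exceeding a threshold. The enhance() routine is a hypothetical stand-in for the database-driven super-resolution step, and the downscaling and threshold are illustrative.

```python
import numpy as np

def evaluate_dem(original, enhance, factor=2, threshold=5.0):
    """original: 2D array of elevations; enhance(low, factor) is a hypothetical SR routine."""
    low = original[::factor, ::factor]                     # crude downscaling for illustration
    restored = enhance(low, factor)                        # should return an array near original's shape
    ref = original[:restored.shape[0], :restored.shape[1]]
    diff = restored - ref
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    frac_over = float(np.mean(np.abs(diff) > threshold))   # threshold-based metric (e.g. metres)
    return rmse, frac_over
```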
Robot Manipulation and Application
3D positional control of magnetic levitation system using adaptive control: improvement of positioning control in horizontal plane
Toshimasa Nishino, Yasuhiro Fujitani, Norihiko Kato, et al.
The objective of this paper is to establish a technique that levitates and conveys a hand, a kind of micro-robot, by applying magnetic forces; the hand is assumed to have a function of holding and detaching objects. The equipment to be used in our experiments consists of four pole-pieces of electromagnets, and is expected to work as a 4-DOF drive unit within some restricted range of 3D space: three DOF correspond to 3D positional control and the remaining DOF to rotational oscillation damping control. Using the same equipment, Khamesee et al. manipulated the impressed voltages on the four electromagnets with a PID controller, using the feedback signal of the hand's 3D position, the controlled variable. However, in this system some problems remained: in the horizontal direction, when translating the hand out of a restricted region, positional control performance was suddenly degraded. The authors propose a method to apply adaptive control to the horizontal directional control. It is expected that the technique presented in this paper contributes not only to the improvement of the response characteristic but also to widening the applicable range of the horizontal directional control.
The magic glove: a gesture-based remote controller for intelligent mobile robots
Chaomin Luo, Yue Chen, Mohan Krishnan, et al.
This paper describes the design of a gesture-based Human Robot Interface (HRI) for an autonomous mobile robot entered in the 2010 Intelligent Ground Vehicle Competition (IGVC). While the robot is meant to operate autonomously in the various Challenges of the competition, an HRI is useful in moving the robot to the starting position and after run termination. In this paper, a user-friendly gesture-based embedded system called the Magic Glove is developed for remote control of a robot. The system consists of a microcontroller and sensors, is worn by the operator as a glove, and is capable of recognizing hand signals. These are then transmitted through wireless communication to the robot. The design of the Magic Glove included contributions on two fronts: hardware configuration and algorithm development. A triple-axis accelerometer used to detect hand orientation passes the information to a microcontroller, which interprets the corresponding vehicle control command. A Bluetooth device interfaced to the microcontroller then transmits the information to the vehicle, which acts accordingly. The user-friendly Magic Glove was successfully demonstrated first in a Player/Stage simulation environment. The gesture-based functionality was then also successfully verified on an actual robot and demonstrated to judges at the 2010 IGVC.
Way-point navigation for a skid-steer vehicle in unknown environments
Peiyi Chen, Arun Das, Prasenjit Mukherjee, et al.
Unmanned ground vehicles (UGVs) allow people to remotely access and perform tasks in dangerous or inconvenient locations more effectively. They have been successfully used for practical applications such as mine detection, sample retrieval, and exploration and mapping. One of the fundamental requirements for the autonomous operation of any vehicle is the capability to traverse its environment safely. To accomplish this, UGVs rely on the data from their on-board sensors to solve the problems of localization, mapping, path planning, and control. This paper proposes a combined mapping, path planning, and control solution that will allow a skid-steer UGV to navigate safely through unknown environments and reach a goal location. The mapping algorithm generates 2D maps of the traversable environment, the path planner uses these maps to find kinodynamically feasible paths to the goal, and the tracking controller ensures that the vehicle stays on the generated path during traversal. All of the algorithms are computationally efficient enough to run onboard the robot in real time, and the proposed solution has been experimentally verified on a custom-built skid-steer vehicle, allowing it to navigate to desired GPS waypoints through a variety of unknown environments.
Vision Navigation and Activity Recognition
Integrated field testing of planetary robotics vision processing: the PRoVisG campaign in Tenerife 2011
G. Paar, L. Waugh, D. P. Barnes, et al.
In order to maximize the use of a robotic probe during its limited lifetime, scientists have to be provided immediately with the best achievable visual quality of 3D data products. The EU FP7-SPACE Project PRoVisG (2008-2012) develops technology for the rapid processing and effective representation of visual data by improving ground processing facilities. In September 2011 PRoVisG held a field trials campaign in the Caldera of Tenerife to verify the implemented 3D vision processing mechanisms and to collect various sets of reference data in a representative environment. The campaign was strongly supported by the Astrium UK rover Bridget as a representative platform which allows simultaneous onboard mounting and powering of various vision sensors such as the Aberystwyth ExoMars PanCam Emulator (AUPE). The paper covers the preparation work for such a campaign and highlights the experiments, which include standard operations- and science-related components as well as data capture to verify specific processing functions. We give an overview of the captured data and the compiled and envisaged processing results, as well as a summary of the test sites, logistics and test assets utilized during the campaign.
Hierarchical loop detection for mobile outdoor robots
Dagmar Lang, Christian Winkens, Marcel Häselich, et al.
Loop closing is a fundamental part of 3D simultaneous localization and mapping (SLAM) that can greatly enhance the quality of long-term mapping. It is essential for the creation of globally consistent maps. Conceptually, loop closing is divided into detection and optimization. Recent approaches depend on a single sensor to recognize previously visited places in the loop detection stage. In this study, we combine data of multiple sensors such as GPS, vision, and laser range data to enhance detection results in repetitively changing environments that are not sufficiently explained by a single sensor. We present a fast and robust hierarchical loop detection algorithm for outdoor robots to achieve a reliable environment representation even if one or more sensors fail.
A novel margin-based linear embedding technique for visual object recognition
F. Dornaika, A. Assoum
Linear Dimensionality Reduction (LDR) techniques have become increasingly important in computer vision and pattern recognition since they permit a relatively simple mapping of data onto a lower-dimensional subspace, leading to simple and computationally efficient classification strategies. Recently, many linear discriminant methods have been developed in order to reduce the dimensionality of visual data and to enhance the discrimination between different groups or classes. Although many linear discriminant analysis methods have been proposed in the literature, they suffer from at least one of the following shortcomings: i) they require the setting of many parameters (e.g., the neighborhood sizes for homogeneous and heterogeneous samples), ii) they suffer from the Small Sample Size problem that often occurs when dealing with visual data sets for which the number of samples is less than the dimension of the sample, and iii) most of the traditional subspace learning methods have to determine the dimension of the projected space by either cross-validation or exhaustive search. In this paper, we propose a novel margin-based linear embedding method that exploits the nearest hit and the nearest miss samples only. Our proposed method tackles all the above shortcomings. It finds the projection directions such that the sum of local margins is maximized. Our proposed approach has been applied to the problem of appearance-based face recognition. Experimental results performed on four public face databases show that the proposed approach can give better generalization performance than the competing methods. The competing methods used for performance comparison were: Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Average Neighborhood Margin Maximization (ANMM), and the Maximally Collapsing Metric Learning algorithm (MCML). The proposed approach could also be applied to other categories of objects characterized by large variations in their appearance.
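The "local margin" quantity the method maximises can be illustrated as follows: for each sample, the distance to its nearest neighbour of a different class (nearest miss) minus the distance to its nearest neighbour of the same class (nearest hit), evaluated in the projected space. The projection W below is arbitrary; the paper's contribution is choosing W so that the sum of these margins is maximised. The sketch assumes every class has at least two samples.

```python
import numpy as np

def sum_of_local_margins(X, y, W):
    """X: (N, D) data, y: (N,) integer labels, W: (D, d) projection matrix."""
    Z = X @ W                                             # project data into the subspace
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    total = 0.0
    for i in range(len(Z)):
        same = y == y[i]
        same[i] = False
        hit = D[i][same].min()                            # nearest hit (same class)
        miss = D[i][y != y[i]].min()                      # nearest miss (different class)
        total += miss - hit
    return total
```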
Real-time two-level foreground detection and person-silhouette extraction enhanced by body-parts tracking
Rada Deeb, Elodie Desserée, Saida Bouakaz
In this paper we discuss foreground detection and human body silhouette extraction and tracking in monocular video systems designed for human motion analysis applications. Vision algorithms face many challenges when it comes to analyzing human activities in non-controlled environments. For instance, issues like illumination changes, shadows, camouflage and occlusions make the detection and the tracking of a moving person a hard task to accomplish. Hence, advanced solutions are required to analyze the content of video sequences. We propose a real-time, two-level foreground detection, enhanced by body-part tracking, designed to efficiently extract the person's silhouette and body parts for monocular video-based human motion analysis systems. We aim to find solutions for the different challenges of non-controlled environments. On the first level, we propose an enhanced Mixture of Gaussians, built on both chrominance-luminance and chrominance-only spaces, which handles global illumination changes. On the second level, we improve segmentation results in interesting areas by using statistical foreground models updated by high-level tracking of body parts. Each body part is represented with a set of templates characterized by a feature vector built in an initialization phase. Then, high-level tracking is done by finding blob-template correspondences via distance minimization in feature space. Correspondences are then used to update foreground models, and a graph cut algorithm, which minimizes a Markov random field energy function containing these models, is used to refine segmentation. We were able to extract a refined silhouette in the presence of light changes, noise and camouflage. Moreover, the tracking approach allowed us to infer information about the presence and the location of body parts even in the case of partial occlusion.
Activity recognition from video using layered approach
Charles A. McPherson, John M. Irvine, Mon Young, et al.
The adversary in current threat situations can no longer be identified by what they are, but by what they are doing. This has led to a large increase in the use of video surveillance systems for security and defense applications. With the quantity of video surveillance at the disposal of organizations responsible for protecting military and civilian lives come issues regarding the storage and screening of the data for events and activities of interest. Activity recognition from video for such applications seeks to develop automated screening of video based upon the recognition of activities of interest rather than merely the presence of specific persons or vehicle classes, as developed for the Cold War problem of "Find the T72 Tank". This paper explores numerous approaches to activity recognition, all of which examine heuristic, semantic, and syntactic methods based upon tokens derived from the video. The proposed architecture discussed herein uses a multi-level approach that divides the problem into three or more tiers of recognition, each employing different techniques according to their strengths at each tier, using heuristics, syntactic recognition, and HMMs of token strings to form higher-level interpretations.
Visual Algorithms
Method for fast detecting the intersection of a plane and a cube in an octree structure to find point sets within a convex region
K. Fujimoto, N. Kimura, T. Moriya
Performing efficient view frustum culling is a fundamental problem in computer graphics. In general, an octree is used for view frustum culling. The culling checks the intersection of each octree node (cube) against the planes of the view frustum. However, this involves many calculations. We propose a method for quickly detecting the intersection of a plane and a cube in an octree structure. When we check which children of an octree node intersect a plane, we compare the coordinates of the corners of the node with the plane. Using the octree, we calculate the vertices of a child node from the vertices of its parent node. To find points within a convex region, a visibility test is performed by an AND operation over the results for three or more planes. In experiments, we tested the problem of searching for the points visible to a camera. The method was two times faster than the conventional method, which detects a visible octree node by using the inner product of the plane and each corner of the node.
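A generic corner-based plane/cube test in the spirit of the approach is sketched below (it is not the authors' exact procedure): an axis-aligned cube intersects the plane n·x + d = 0 exactly when its corners do not all lie on the same side, and only the two corners extremal along the plane normal need to be checked.

```python
import numpy as np

def plane_intersects_cube(normal, d, cube_min, cube_max):
    """Plane n.x + d = 0 vs. axis-aligned cube given by its min/max corners."""
    normal, cube_min, cube_max = map(np.asarray, (normal, cube_min, cube_max))
    # "positive" corner: farthest along the normal; "negative" corner: closest
    pos = np.where(normal >= 0, cube_max, cube_min)
    neg = np.where(normal >= 0, cube_min, cube_max)
    return (normal @ pos + d) >= 0 and (normal @ neg + d) <= 0

# The corners of each child cube follow from the parent's corners and centre,
# so the same test descends the octree with little extra arithmetic.
```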
Lucas-Kanade image registration using camera parameters
Sunghyun Cho, Hojin Cho, Yu-Wing Tai, et al.
The Lucas-Kanade algorithm and its variants have been successfully used for numerous works in computer vision, which include image registration as a component in the process. In this paper, we propose a Lucas-Kanade based image registration method using camera parameters. We decompose a homography into camera intrinsic and extrinsic parameters, and assume that the intrinsic parameters are given, e.g., from the EXIF information of a photograph. We then estimate only the extrinsic parameters for image registration, considering two types of camera motions, 3D rotations and full 3D motions with translations and rotations. As the known information about the camera is fully utilized, the proposed method can perform image registration more reliably. In addition, as the number of extrinsic parameters is smaller than the number of homography elements, our method runs faster than the Lucas-Kanade based registration method that estimates a homography itself.
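A worked sketch of the pure-rotation case mentioned above: with known intrinsics K, a camera rotation R induces the homography H = K R K^(-1), so only three rotation parameters need to be estimated instead of eight homography elements. The values of K and the rotation angles below are illustrative.

```python
import numpy as np

def rotation_homography(K, rx, ry, rz):
    """Homography induced by a pure camera rotation (angles in radians, z-y-x composition)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    return K @ R @ np.linalg.inv(K)

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # e.g. derived from EXIF focal length
H = rotation_homography(K, 0.01, -0.02, 0.005)
```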
Object tracking with adaptive HOG detector and adaptive Rao-Blackwellised particle filter
Stefano Rosa, Marco Paleari, Paolo Ariano, et al.
Scenarios for a manned mission to the Moon or Mars call for astronaut teams to be accompanied by semi-autonomous robots. A prerequisite for human-robot interaction is the capability of successfully tracking humans and objects in the environment. In this paper we present a system for real-time visual object tracking in 2D images for mobile robotic systems. The proposed algorithm is able to specialize to individual objects and to adapt to substantial changes in illumination and object appearance during tracking. The algorithm is composed of two main blocks: a detector based on Histogram of Oriented Gradients (HOG) descriptors and linear Support Vector Machines (SVM), and a tracker which is implemented as an adaptive Rao-Blackwellised particle filter (RBPF). The SVM is re-trained online on new samples taken from previous predicted positions. We use the effective sample size to decide when the classifier needs to be re-trained. Position hypotheses for the tracked object are the result of a clustering procedure applied to the set of particles. The algorithm has been tested on challenging video sequences presenting strong changes in object appearance, illumination, and occlusion. Experimental tests show that the presented method is able to achieve near real-time performance with a precision of about 7 pixels on standard video sequences of dimensions 320 × 240.
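The effective-sample-size criterion mentioned above is easy to sketch: from the normalised particle weights, N_eff = 1 / sum(w_i^2), and when it drops below some fraction of the particle count the filter resamples and (in the paper's scheme) the detector is retrained on new samples. The 0.5 fraction below is an assumption, not the authors' value.

```python
import numpy as np

def effective_sample_size(weights):
    """weights: unnormalised particle weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def needs_retraining(weights, frac=0.5):
    """Trigger resampling/retraining when the effective sample size degenerates."""
    return effective_sample_size(weights) < frac * len(weights)
```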
A modular real-time vision system for humanoid robots
Alina L. Trifan, António J. R. Neves, Nuno Lau, et al.
Robotic vision is nowadays one of the most challenging branches of robotics. In the case of a humanoid robot, a robust vision system has to provide an accurate representation of the surrounding world and to cope with all the constraints imposed by the hardware architecture and the locomotion of the robot. Usually humanoid robots have low computational capabilities that limit the complexity of the developed algorithms. Moreover, their vision system should perform in real time, therefore a compromise between complexity and processing time has to be found. This paper presents a reliable implementation of a modular vision system for a humanoid robot to be used in color-coded environments. From image acquisition, to camera calibration and object detection, the system that we propose integrates all the functionalities needed for a humanoid robot to accurately perform given tasks in color-coded environments. The main contributions of this paper are the implementation details that allow the use of the vision system in real time, even with low processing capabilities, the innovative self-calibration algorithm for the most important parameters of the camera, and its modularity, which allows its use with different robotic platforms. Experimental results have been obtained with a NAO robot produced by Aldebaran, which is currently the robotic platform used in the RoboCup Standard Platform League, as well as with a humanoid built using the Bioloid Expert Kit from Robotis. As practical examples, our vision system can be efficiently used in real time for the detection of the objects of interest for a soccer-playing robot (ball, field lines and goals) as well as for navigating through a maze with the help of color-coded clues. In the worst-case scenario, all the objects of interest in a soccer game, using a NAO robot with a single-core 500 MHz processor, are detected in less than 30 ms. Our vision system also includes an algorithm for self-calibration of the camera parameters as well as two support applications that can run on an external computer for color calibration and debugging purposes. These applications are built based on a typical client-server model, in which the main vision pipe runs as a server, allowing clients to connect and remotely monitor its performance without interfering with its efficiency. The experimental results prove the efficiency of our approach both in terms of accuracy and processing time. Despite having been developed for the NAO robot, the modular design of the proposed vision system allows it to be easily integrated into other humanoid robots with a minimum number of changes, mostly in the acquisition module.
Intelligent Ground Vehicle Competition
Radial polar histogram: obstacle avoidance and path planning for robotic cognition and motion control
Po-Jen Wang, Nicholas R. Keyawa, Craig Euler
In order to achieve highly accurate motion control and path planning for a mobile robot, an obstacle avoidance algorithm that provided a desired instantaneous turning radius and velocity was generated. This type of obstacle avoidance algorithm, which has been implemented in California State University Northridge's Intelligent Ground Vehicle (IGV), is known as the Radial Polar Histogram (RPH). The RPH algorithm utilizes raw data in the form of a polar histogram that is read from a Laser Range Finder (LRF) and a camera. A desired open block is determined from the raw data utilizing a navigational heading and an elliptical approximation. The left-most and right-most radii are determined from the calculated edges of the open block and provide the range of possible radial paths the IGV can travel through. In addition, the calculated obstacle edge positions allow the IGV to recognize complex obstacle arrangements and to slow down accordingly. A radial path optimization function calculates the best radial path between the left-most and right-most radii and is sent to motion control for speed determination. Overall, the RPH algorithm allows the IGV to autonomously travel at average speeds of 3 mph while avoiding all obstacles, with a processing time of approximately 10 ms.
Optimizing a mobile robot control system using GPU acceleration
Nat Tuck, Michael McGuinness, Fred Martin
This paper describes our attempt to optimize a robot control program for the Intelligent Ground Vehicle Competition (IGVC) by running computationally intensive portions of the system on a commodity graphics processing unit (GPU). The IGVC Autonomous Challenge requires a control program that performs a number of different computationally intensive tasks ranging from computer vision to path planning. For the 2011 competition our Robot Operating System (ROS) based control system would not run comfortably on the multicore CPU on our custom robot platform. The process of profiling the ROS control program and selecting appropriate modules for porting to run on a GPU is described. A GPU-targeting compiler, Bacon, is used to speed up development and help optimize the ported modules. The impact of the ported modules on overall performance is discussed. We conclude that GPU optimization can free a significant amount of CPU resources with minimal effort for expensive user-written code, but that replacing heavily-optimized library functions is more difficult, and a much less efficient use of time.
Design and realization of an intelligent ground vehicle with modular payloads
Mehmet A. Akmanalp, Ryan M. Doherty, Jeffrey Gorges, et al.
In June 2011, Worcester Polytechnic Institute's (WPI) unmanned ground vehicle Prometheus participated in the 8th Annual Robotic Lawnmower and 19th Annual Intelligent Ground Vehicle Competitions back-to-back. This paper details the two-year design and development cycle for WPI's intelligent ground vehicle, Prometheus. The on-board intelligence algorithms include lane detection, obstacle avoidance, path planning, world representation and waypoint navigation. The authors present experimental results and discuss practical implementations of the intelligence algorithms used on the robot.
Navigating a path delineated by colored flags: an approach for a 2011 IGVC requirement
Alex Szmatula, Matt Parrish, Mohan Krishnan, et al.
In this work, we address a situation presented as a new requirement for the Autonomous Challenge portion of the 2011 Intelligent Ground Vehicle Competition (IGVC). This new requirement is to navigate between red and green colored flags placed within the normal white painted lane lines. The regular vision algorithms had to be enhanced to reliably identify and localize the colored flags, while the navigation algorithms had to be modified to satisfy the constraints placed on the robot while transiting through the flag region. The challenge in finding a solution was the size of the flags, the possibility of losing them against the background, as well as their movement in the wind. The attendant possibility of false positives and negatives also needed to be addressed to increase the reliability of detection. Preliminary tests on the robot have produced positive results.
Navigating with VFH: a strategy to avoid traps
Chaomin Luo, Mohan Krishnan, Mark Paulik, et al.
The IGVC Navigation Challenge course configuration has evolved in complexity to a point where use of a simple reactive local navigation algorithm presents problems in course completion. A commonly used local navigation algorithm, the Vector Field Histogram (VFH), is relatively fast and thus suitable when computational capabilities on a robot are limited. One of the attendant disadvantages of this algorithm is that a robot can get trapped when attempting to get past a concave obstacle structure. The Navigation Challenge course now has several such structures, including some that partially surround waypoints. Elaborate heuristics are needed to make VFH viable in such a situation and their tuning is arduous. An alternate approach that avoids the use of heuristics is to combine a dynamic path planning algorithm with VFH. In this work, the D*Lite path planning algorithm is used to provide VFH with intermediate goals, which the latter then uses as stepping stones to its final destination. Results from simulation studies as well as field deployment are used to illustrate the benefits of using the local navigator in conjunction with a path planner.
Interactive Paper Session
Measurement of noises and modulation transfer function of cameras used in optical-digital correlators
Nikolay N. Evtikhiev, Sergey N. Starikov, Pavel A. Cheryomkhin, et al.
Hybrid optical-digital systems based on diffractive correlators are being actively developed. To correctly estimate the application capabilities of cameras of different types in optical-digital correlation systems, knowledge of the modulation transfer function (MTF) and of the light-dependent temporal and spatial noise is required. A method for measurement of the 2D MTF is presented. The method is based on the random target method, but instead of a random target a specially created target with a flat power spectrum is used. This makes it possible to measure the MTF without averaging 1D Fourier spectra over rows or columns, as in the random target method, and to obtain all values of the 2D MTF instead of just two orthogonal cross-sections. A simple method for measuring the dependence of camera temporal noise on the light signal value by shooting a single scene is described. Measurement results for the light and dark spatial and temporal noise of cameras are presented. A procedure for obtaining a camera's light spatial noise portrait (an array of PRNU values for all photosensor pixels) is presented. Results of measurements of the MTF and of temporal and spatial noise for a consumer photo camera, a machine vision camera and a video surveillance camera are presented.
A phase space approach for detection and removal of rain in video
Varun Santhaseelan, Vijayan K. Asari
Nowadays, the widespread use of computer vision algorithms in surveillance systems and autonomous robots has increased the demand for video enhancement algorithms. In this paper, we propose an algorithm based on phase congruency features to detect and remove rain and thus improve the quality of video. We make use of the following characteristics of rain streaks in video in order to detect them: (1) rain streaks do not occlude the scene at all instances, (2) all the rain streaks in a frame are oriented in a single direction, and (3) presence of rain streak at a particular pixel causes a positive change in intensity. Combining all these properties we are able to detect rain streaks in a particular frame using phase congruency features. The pixels in a frame which are identified as rain streaks are then replaced using the pixel information of its spatial and temporal neighbors which are not affected by rain. When this method is used in conjunction with phase correlation, we are able to remove rain of medium density from videos even when complex camera movement is involved.
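A greatly simplified sketch of the removal step follows: pixels flagged as rain are replaced using their unaffected temporal neighbours. Here a crude positive-intensity-change test stands in for the paper's phase-congruency detector, and the threshold is an assumption.

```python
import numpy as np

def derain_frame(prev, cur, nxt, thresh=10):
    """prev, cur, nxt: consecutive grayscale frames as uint8 arrays of equal shape."""
    prev, cur, nxt = (f.astype(np.int16) for f in (prev, cur, nxt))
    # rain streaks cause a positive intensity change relative to both temporal neighbours
    rain = (cur - prev > thresh) & (cur - nxt > thresh)
    out = cur.copy()
    out[rain] = np.minimum(prev, nxt)[rain]   # fall back to the darker (rain-free) neighbour
    return out.astype(np.uint8)
```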
Intelligence algorithms for autonomous navigation in a ground vehicle
Steve Petkovsek, Rahul Shakya, Young Ho Shin, et al.
This paper will discuss the approach to autonomous navigation used by "Q," an unmanned ground vehicle designed by the Trinity College Robot Study Team to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2011 competition, Q's intelligence was upgraded in several different areas, resulting in a more robust decision-making process and a more reliable system. In 2010-2011, the software of Q was modified to operate in a modular parallel manner, with all subtasks (including motor control, data acquisition from sensors, image processing, and intelligence) running simultaneously in separate software processes using the National Instruments (NI) LabVIEW programming language. This eliminated processor bottlenecks and increased flexibility in the software architecture. Though overall throughput was increased, the long runtime of the image processing process (150 ms) reduced the precision of Q's real-time decisions. Q had slow reaction times to obstacles detected only by its cameras, such as white lines, and was limited to slow speeds on the course. To address this issue, the image processing software was simplified and also pipelined to increase the image processing throughput and minimize the robot's reaction times. The vision software was also modified to detect differences in the texture of the ground, so that specific surfaces (such as ramps and sand pits) could be identified. While previous iterations of Q failed to detect white lines that were not on a grassy surface, this new software allowed Q to dynamically alter its image processing state so that appropriate thresholds could be applied to detect white lines in changing conditions. In order to maintain an acceptable target heading, a path history algorithm was used to deal with local obstacle fields and GPS waypoints were added to provide a global target heading. These modifications resulted in Q placing 5th in the autonomous challenge and 4th in the navigation challenge at IGVC.
Hierarchical multi-level image mosaicing for autonomous navigation of UAV
Sangho Park, Debabrata Ghosh, Naima Kaabouch, et al.
A novel algorithm for hierarchical multi-level image mosaicing for autonomous navigation of UAV is proposed. The main contribution of the proposed system is the blocking of the error accumulation propagated along the frames, by incrementally building a long-duration mosaic on the fly which is hierarchically composed of short-duration mosaics. The proposed algorithm fulfills the real-time processing requirements in autonomous navigation as follows. 1) Causality: the current output of the mosaicing system depends only on the current and/or previous input frames, contrary to existing offline mosaic algorithms that depend on future input frames as well. 2) Learnability: the algorithm autonomously analyzes/learns the scene characteristics. 3) Adaptability: the system automatically adapts itself to the scene change and chooses the proper methods for feature selection (i.e., the fast but unreliable LKT vs. the slow but robust SIFT). The evaluation of our algorithm with the extensive field test data involving several thousand airborne images shows the significant improvement in processing time, robustness and accuracy of the proposed algorithm.
A fluidic lens with reduced optical aberration
Jei-Yin Yiu, Robert Batchko, Sam Robinson, et al.
We present an electrically-actuated adaptive fluidic lens having a 10-mm aperture, 4-diopter range and center-thickness less than 1 mm. The lens employs dual deflectable glass membranes encasing an optical fluid. A piezoelectric ring bender actuator draws less than 1 mW and is built into the 25-mm-diameter lens housing. The adaptive lens demonstrates resolution comparable to commercial precision glass singlet lenses of similar format over a wide range of field angles and focal powers. Focal power vs. voltage, resolution, modulation transfer function (MTF), life testing and dynamic response are examined and show that the lens is suitable for numerous adaptive lens applications demanding high optical quality.