Proceedings Volume 6006

Intelligent Robots and Computer Vision XXIII: Algorithms, Techniques, and Active Vision

David P. Casasent, Ernest L. Hall, Juha Röning
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 23 October 2005
Contents: 9 Sessions, 40 Papers, 0 Presentations
Conference: Optics East 2005
Volume Number: 6006

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Intelligent Robots and Computer Vision I: Invited Session
  • Intelligent Robots and Computer Vision II: Invited Session
  • Novel Applications and Technologies for Advanced Computer Vision
  • Tracking and Multi-View Image Processing for Robots
  • Mobile Robot Competitions and Autonomous Navigation
  • Autonomous Navigation, Obstacle Avoidance, and Path Planning
  • Color Processing, Calibration, and Imaging
  • Image Processing and Applications for Robots and Computer Vision
  • Applications of Autonomous Robots
Intelligent Robots and Computer Vision I: Invited Session
Face recognition with illumination and pose variations using MINACE filters
David Casasent, Rohit Patnaik
This paper presents the status of our present CMU face recognition work. We first present a face recognition system that functions in the presence of illumination variations. We then present initial results when pose variations are also considered. A separate minimum noise and correlation energy (MINACE) filter is synthesized for each person. Our concern is face identification and impostor (non-database face) rejection. Most prior face identification work did not address impostor rejection. We also present results for face verification with impostor rejection. The MINACE parameter c trades off distortion-tolerance (recognition) versus discrimination (impostor rejection) performance. We use an automated filter-synthesis algorithm to select c and to synthesize the MINACE filter for each person using a training set of images of that person and a validation set of a few faces of other persons; this synthesis ensures both good recognition and impostor rejection performance. No impostor data is present in the training or validation sets. The peak-to-correlation energy ratio (PCE) metric is used as the match score in both the filter-synthesis and test stages, and we show that it is better than using the correlation peak value. We use circular correlations in filter synthesis and in tests, since such filters require one-fourth the storage space and similarly fewer on-line correlation calculations compared to linear correlation filters. All training set images are registered (aligned) using the coordinates of several facial landmarks to remove scale variations and tilt bias. We also discuss the proper handling of pose variations by either pose estimation or by transforming the test input to all reference poses. Our face recognition system is evaluated using images from the CMU Pose, Illumination, and Expression (PIE) database. The same set of MINACE filters and impostor faces is used to evaluate the performance of the face identification and verification systems.
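For readers unfamiliar with the PCE match score, the sketch below shows how it can be computed with a circular (FFT-based) correlation. This is a minimal illustration, not the authors' code: the filter is assumed to be supplied as a 2D frequency response of the same size as the test image, and the normalization shown (squared peak over total correlation-plane energy) is one common variant of PCE.

```python
import numpy as np

def pce_match_score(test_img, filter_freq):
    """Peak-to-correlation-energy (PCE) score via circular correlation.

    `filter_freq` is assumed to be the MINACE filter's 2D frequency
    response, the same shape as `test_img` (an assumption of this sketch).
    """
    G = np.fft.fft2(test_img)
    corr = np.fft.ifft2(G * np.conj(filter_freq))  # circular correlation plane
    plane = np.abs(corr) ** 2
    return plane.max() / plane.sum()  # squared peak over plane energy
```

A test face would then be accepted as person k when its PCE against person k's filter exceeds a threshold, and rejected as an impostor when no filter's threshold is exceeded.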
Intelligent robots that adapt, learn, and predict
The purpose of this paper is to describe the concept and architecture for an intelligent robot system that can adapt, learn, and predict the future. This evolutionary approach to the design of intelligent robots is the result of several years of study on the design of intelligent machines that could adapt using computer vision or other sensory inputs, learn using artificial neural networks or genetic algorithms, exhibit semiotic closure with a creative controller, and perceive present situations by interpretation of visual and voice commands. This information processing would then permit the robot to predict the future and plan its actions accordingly. In this paper we show that the capability to adapt and learn naturally leads to the ability to predict the future state of the environment, which is just another form of semiotic closure. That is, predicting a future state without knowledge of the future is similar to making a present action without knowledge of the present state. The theory will be illustrated by considering the situation of guiding a mobile robot through an unstructured environment for a rescue operation. The significance of this work is in providing a greater understanding of the applications of learning to mobile robots.
The evolution of space robotics: new directions in human-robotic exploration
Through the Vision for Space Exploration (hereafter the Vision) recently announced by the US President, NASA has received the charter and challenge of progressively staged human-robotic (H/R) exploration of the Solar System. This exploration encompasses the autonomous robotic precursors that will pave the way for safe and durable H/R presence at the Moon, Mars, and beyond; it also entails the development of an orbital space infrastructure to support incremental, cost-effective deployment of both human and robotic assets deep into space. We give an overview of these anticipated mission operations and functions in the context of the future robotic and telerobotic capabilities needed to support them.
Intelligent Robots and Computer Vision II: Invited Session
Retinex software or diffractive-optical correlator hardware: the basis of human color vision?
Edwin Land, based on photometric data, tried to explain how human color vision determines the perceived hues of colored Mondrian patches through a retinex (retina + cortex) model that calculates scaled integrated reflectances, relating illuminants and 'energies at the eye' and including calculation over the whole image. An alternative, purely optical model is illustrated: the diffractive-optical correlator hardware in the aperture and image space of the human eye, which relates 'local' data to 'global' data in color vision. Based on Edwin Land's experimental data, it is shown how the perceived hues result from diffractive-optical transformations and cross-correlations between object space and reciprocal space (RGB space), and from matrix multiplications and divisions in vector space. Optical pre-processing means that our eyes do not see what is (physically) real, but what they have optically calculated. This same diffractive-optical mechanism has also led to an explanation of the phenomenon of paradoxically colored shadows, briefly revisited in the introduction.
Embedded object concept with a telepresence robot system
This paper presents the Embedded Object Concept (EOC) and a telepresence robot system which serves as a test case for the EOC. The EOC applies common object-oriented methods from software to combined Lego-like software-hardware entities. These entities represent objects in object-oriented design methods, and they are the building blocks of embedded systems. The goal of the EOC is to make the design of embedded systems faster and easier. This concept enables people without comprehensive knowledge of electronics design to create new embedded systems, and for experts it shortens the design time of new embedded systems. We present the current status of the EOC, including two generations of embedded objects named Atomi objects. The first generation of Atomi objects has been tested with different applications and found to be functional, but not optimal. The second generation aims to correct the issues found with the first generation, and it is being tested in a relatively complex test case. The test case is a telepresence robot consisting of a two-wheeled, human-height robot and its computer counterpart. The robot has been constructed using incremental device development, which is made possible by the architecture of the EOC. The robot provides video and audio exchange capability and a controlling and balancing system for driving on two wheels. The robot is built in two versions, the first consisting of a PDA device and Atomi objects, and the second consisting of only Atomi objects. The robot is currently incomplete, but for the most part it has been successfully tested.
People detection in crowded scenes using active contour models and texture analysis
Oliver Sidla, Jean P. Andreu, Lucas Paletta, et al.
A vision system designed to detect people in complex backgrounds is presented. The purpose of the proposed algorithms is to allow the identification and tracking of single persons under difficult conditions - in crowded places, under partial occlusion and in low resolution images. In order to detect people reliably, we combine different information channels from video streams. Most emphasis for the initialization of trajectories and the subsequent pedestrian recognition is placed on the detection of the head-shoulder contour. In the first step a simple and fast shape model selects promising candidates, then a local active shape model is matched against the gradients found in the image with the help of a cost function. Texture analysis in the form of co-occurrence features ensures that shape candidates form coherent trajectories over time. In order to reduce the amount of false positives and to become more robust, a pattern analysis step based on Eigenimage analysis is presented. The cues which form the basis of pedestrian detection are integrated into a tracking algorithm which uses the shape information for initial pedestrian detection and verification, propagates positions into new frames using local motion and matches pedestrians with the help of texture information.
Novel Applications and Technologies for Advanced Computer Vision
Image retrieval based on local grey-level invariants
Eva Bordeaux, Neelima Shrikhande
During the past decades, the enormous growth of image archives has significantly increased the demand for research efforts aimed at efficiently finding specific images within large databases. This paper investigates matching of images of buildings, architectural designs, blueprints, and sketches. Their geometrical constraints lead to the proposed approach: the use of local grey-level invariants based on internal contours of the object. The problem involves three key phases: object recognition in image data, matching two images, and searching the database of images. The emphasis of this paper is on object recognition based on internal contours of image data. In her master's thesis, M.M. Kulkarni described a technique for image retrieval by contour analysis applied to the external contours of an object in image data. This is used to define the category of a building (tower, dome, flat, etc.). Integration of these results with local grey-level invariant analysis creates a more robust image retrieval system. Thus, the best match result is the intersection of the results of contour analysis and grey-level invariant analysis. Experiments conducted on a database of architectural buildings have shown robustness with respect to image rotation, translation, small viewpoint variations, partial visibility, and extraneous features. The recognition rate is above 99% for a variety of tested images taken under different conditions.
Visual surveillance coverage: strategies and metrics
Jacob Everist, T. Nathan Mundhenk, Chris Landauer, et al.
Many sensor systems such as security cameras and satellite photography are faced with the problem of where they should point their sensors at any given time. With directional control of the sensor, the amount of space available to cover far exceeds the field-of-view of the sensor. Given a task domain and a set of constraints, we seek coverage strategies that achieve effective area coverage of the environment. We develop metrics that measure the quality of the strategies and give a basis for comparison. In addition, we explore what it means for an area to be "covered" and how that is affected by the domain, the sensor constraints, and the algorithms. We built a testbed in which we implement and run various sensor coverage strategies and take measurements on their performance. We modeled the domain of a camera mounted on pan and tilt servos with appropriate constraints and time delays on movement. Next, we built several coverage strategies for selecting where the camera should look at any given time based on concepts such as force-mass systems, scripted movements, and the time since an area was last viewed. Finally, we describe several metrics with which we can compare the effectiveness of different coverage strategies. These metrics are based on such things as how well the whole space is covered, how relevant the covered areas are to the domain, how much time is spent acquiring data, how much time is wasted while moving the servos, and how well the strategies detect new objects moving through space.
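One of the simplest strategies the abstract mentions, looking where the camera has not looked for the longest time, can be sketched in a few lines. The grid model, field-of-view size, and scoring below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def next_view(last_seen, now, fov_cells=3):
    """Pick the pan/tilt cell whose neighborhood was viewed least recently.

    `last_seen` is a 2D grid over pan/tilt space holding the time each
    cell was last inside the camera's field of view (a modeling assumption).
    """
    staleness = (now - last_seen).astype(float)
    # Average staleness over each candidate field of view (a box filter),
    # so the chosen view covers the stalest *region*, not a single cell.
    score = uniform_filter(staleness, size=fov_cells)
    return np.unravel_index(np.argmax(score), score.shape)
```

A coverage metric in the same spirit would be the maximum (or mean) staleness over the grid at any instant; lower is better.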
Task-selection and task-merging for directional and vision-based sensors
Jacob Everist, T. Nathan Mundhenk, Chris Landauer, et al.
Many vision research projects involve a sensor or camera doing one thing and doing it well. Fewer research projects have been done involving a sensor trying to satisfy simultaneous and conflicting tasks. Satisfying a task involves pointing the sensor in the direction demanded by the task. We seek ways to mitigate and select between competing tasks and also, if possible, merge the tasks together to be simultaneously achieved by the sensor. This would make a simple pan-tilt camera a very powerful instrument. These two approaches are task-selection and task-merging respectively. We built a simple testbed to implement our task-selection and task-merging schemes. We use a digital camera as our sensor attached to pan and tilt servos capable of pointing the sensor in different directions. We use three different types of tasks for our research: target tracking, surveillance coverage, and initiative. Target tracking is the task of following a target with a known set of features. Surveillance coverage is the task of ensuring that all areas of the space are routinely scanned by the sensor. Initiative is the task of focusing on new things of potential interest should they appear in the course of other activities. Given these heterogeneous task descriptions, we achieve task-selection by assigning priority functions to each task and letting the camera select among the tasks to service. To achieve task-merging, we introduce a concept called "task maps" that represent the regions of space the tasks wish to attend with the sensor. We then merge the task maps and select a region to attend that will satisfy multiple tasks at the same time if possible.
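A minimal sketch of the "task map" idea described above: each task supplies a map of how strongly it wants the sensor pointed at each region, the maps are merged, and the camera attends the region that best serves several tasks at once. The weighting scheme here is an assumption for illustration:

```python
import numpy as np

def merge_task_maps(task_maps, priorities):
    """Merge per-task attention maps (one 2D array per task, same shape)
    into a single map, weighted by each task's current priority."""
    merged = np.zeros_like(task_maps[0], dtype=float)
    for weight, tmap in zip(priorities, task_maps):
        merged += weight * tmap
    # Point the sensor at the cell that satisfies the most (weighted) demand.
    return np.unravel_index(np.argmax(merged), merged.shape)
```

Task-selection falls out as the special case where one task's priority dominates and the merged map reduces to that task's own map.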
Terrestrial experiments for the motion estimation of a large space debris object using image data
An algorithm is developed for estimating the motion (relative attitude and relative position) of large pieces of space debris, such as failed satellites. This algorithm is designed to be used by a debris removal system which would perform various operations on space debris, such as observation, investigation, capture, repair, refueling, and de-orbiting. The algorithm uses a combination of stereo vision and 3D model matching, applying the ICP (Iterative Closest Point) algorithm, and uses time series of images to increase the reliability of relative attitude and position estimates. To evaluate the algorithm, a visual simulator is prepared to simulate the on-orbit optical environment in terrestrial experiments, and the motion of a miniature satellite model is estimated using images obtained from this simulator.
Tracking and Multi-View Image Processing for Robots
Two and three view geometry based on noisy data: an experimental evaluation
It is well known that, based on known multi-view geometry, and given a single point in one image, its corresponding point in a second image can be determined up to a one-dimensional ambiguity; and that, given a pair of corresponding points in two images, their corresponding point in a third image can be uniquely determined. These relationships have been widely used in the computer vision community for applications such as correspondence, stereo, and motion analysis. However, in the real world, images are noisy. How to apply the exact mathematical relationships of multi-view geometry to noisy data, and which of the various numerical algorithms available do so stably and accurately, is an active topic of research. In this paper, some major methods currently available for the computation of two- and three-view geometries for both calibrated and uncalibrated cameras are analysed, a novel method of calculating the trifocal tensor for the calibrated camera is deduced, and a quantitative evaluation of the influence of noise at different levels, for the different methods of computing two- and three-view geometries, is performed through experiments on synthetic data. Based on the experimental results, several novel algorithms are introduced which improve the performance of searching for correspondences in real images across two or three views.
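The two relationships in the first sentence can be made concrete with fundamental matrices. The sketch below is a standard textbook construction, shown only to illustrate the geometry the paper evaluates under noise (it is not the authors' trifocal-tensor method):

```python
import numpy as np

def epipolar_line(F12, x1):
    """Given the fundamental matrix F12 (mapping points of image 1 to
    epipolar lines of image 2) and a homogeneous point x1 in image 1,
    return the line l2 = F12 @ x1: the one-dimensional ambiguity."""
    return F12 @ x1

def transfer_to_third_view(F31, F32, x1, x2):
    """Given corresponding points x1, x2 in images 1 and 2, the point in
    image 3 lies on both epipolar lines l3 = F31 @ x1 and l3' = F32 @ x2,
    i.e. at their intersection (their cross product in homogeneous
    coordinates)."""
    x3 = np.cross(F31 @ x1, F32 @ x2)
    if abs(x3[2]) < 1e-12:
        raise ValueError("degenerate configuration: epipolar lines coincide")
    return x3 / x3[2]
```

With noisy points the two lines intersect only approximately, which is exactly the degradation the paper quantifies across methods and noise levels.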
Foreground object segmentation from binocular stereo video
Kevin Law, Stan Sclaroff
Moving cameras are needed for a wide range of applications in robotics, vehicle systems, surveillance, etc. However, many foreground object segmentation methods reported in the literature are unsuitable for such settings; these methods assume that the camera is fixed and the background changes slowly, and are inadequate for segmenting objects in video if there is significant motion of the camera or background. To address this shortcoming, a new method for segmenting foreground objects is proposed that utilizes binocular video. The method is demonstrated in the application of tracking and segmenting people in video who are approximately facing the binocular camera rig. Given a stereo image pair, the system first tries to find faces. Starting at each face, the region containing the person is grown by merging regions from an over-segmented color image. The disparity map is used to guide this merging process. The system has been implemented on a consumer-grade PC, and tested on video sequences of people indoors obtained from a moving camera rig. As can be expected, the proposed method works well in situations where other foreground-background segmentation methods typically fail. We believe that this superior performance is partly due to the use of object detection to guide region merging in disparity/color foreground segmentation, and partly due to the use of disparity information available with a binocular rig, in contrast with most previous methods that assumed monocular sequences.
Distributed biologically-based real-time tracking in the absence of prior target information
T. Nathan Mundhenk, Jacob Everist, Chris Landauer, et al.
We are developing a distributed system for the tracking of people and objects in complex scenes and environments using biologically based algorithms. An important component of such a system is its ability to track targets from multiple cameras at multiple viewpoints. As such, our system must be able to extract and analyze the features of targets in a manner that is sufficiently invariant to viewpoint, so that the cameras can share information about targets for purposes such as tracking. Since biological organisms are able to describe targets to one another from very different visual perspectives, it is hoped that by discovering the mechanisms by which they understand objects, such abilities can be imparted to a system of distributed agents with many camera viewpoints. Our current methodology draws from work on saliency and center-surround competition among visual components that allows for real-time location of targets without the need for prior information about a target's visual features. For instance, gestalt principles of color opponency, continuity, and motion form a basis for locating targets in a logical manner. From this, targets can be located and tracked relatively reliably for short periods. Features can then be extracted from salient targets, allowing a signature to be stored which describes the basic visual features of a target. This signature can then be used to share target information with other cameras at other viewpoints, or may be used to create the prior information needed for other types of trackers. Here we discuss such a system which, without the need for prior target feature information, extracts salient features from a scene, binds them, and uses the bound features as a set for understanding trackable objects.
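The center-surround competition mentioned above has a compact classical form: a feature channel is compared against a blurred version of itself, so regions that differ from their neighborhood pop out. The following is a minimal, generic sketch (the Gaussian scales and choice of channels are assumptions, not the authors' parameters):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(channel, sigma_center=2.0, sigma_surround=8.0):
    """Rectified difference between a fine (center) and a coarse
    (surround) Gaussian blur of one feature channel."""
    c = gaussian_filter(channel.astype(float), sigma_center)
    s = gaussian_filter(channel.astype(float), sigma_surround)
    return np.abs(c - s)

def saliency_map(r, g, b):
    """Toy saliency map: intensity plus a red-green opponency channel,
    echoing the gestalt color-opponency cue named in the abstract."""
    intensity = (r.astype(float) + g + b) / 3.0
    red_green = r.astype(float) - g
    sal = center_surround(intensity) + center_surround(red_green)
    return sal / (sal.max() + 1e-9)
```

Peaks of such a map give candidate targets without any prior target model; the features at those peaks can then be bound into the shareable signature the abstract describes.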
Intersection-point determination in Virtual Keeper using machine vision
Janne Koljonen, Olli Kanniainen, Jarmo T. Alander
Virtual Keeper is a goalkeeper simulator. It estimates the trajectory of a ball thrown by a player using machine vision and animates a goalkeeper with a video projector. The first version of Virtual Keeper used only one camera. In this paper, a new version that uses two gray-scale cameras for trajectory estimation is proposed. In addition, a color camera and a microphone are used to determine the intersection-point of the ball trajectory and the goal line, in order to enable feedback and online calibration of the machine vision with neural networks, which in turn allows varying external parameters of the cameras and the video projector. The color camera takes images of the goal and determines the positions of the goalposts with pattern matching. After the gray-scale cameras have observed the ball and estimated its trajectory, the sound processing block is triggered. When the ball hits the screen, the noise pattern is recognized with a neural network, whose input consists of temporal and spectral features. The sound processing block in turn triggers the color camera image processing block. The color of the ball differs from the colors of the background and goalkeeper to make the segmentation problem easier. The ball is recognized with fuzzy color based segmentation and fuzzy pattern matching.
Active shape model-based real-time tracking of deformable objects
Sangjin Kim, Daehee Kim, Jeongho Shin, et al.
Tracking non-rigid objects such as people in video sequences is a daunting task due to computational complexity and unpredictable environments. The analysis and interpretation of video sequences containing moving, deformable objects has been an active research area spanning video tracking, computer vision, and pattern recognition. In this paper we propose a robust, model-based, real-time system to cope with background clutter and occlusion. The proposed algorithm consists of the following four steps: (i) localization of an object-of-interest by analyzing four directional motions, (ii) region tracking of the moving region detected by the motion detector, (iii) update of training sets using the Smart Snake Algorithm (SSA) without preprocessing, and (iv) active shape model-based tracking on the region information. The major contribution of this work lies in the integration of a complete system, covering everything from image processing to tracking algorithms. The approach of combining multiple algorithms succeeds in overcoming fundamental limitations of tracking and at the same time achieves a real-time implementation. Experimental results show that the proposed algorithm can track people in various environments in real time. The proposed system has potential uses in the areas of surveillance, shape analysis, and model-based coding, to name a few.
Person identification for mobile robot using audio-visual modality
Young-Ouk Kim, Sehoon Chin, Jihoon Lee, et al.
Recently, we have experienced significant advancement in intelligent service robots. The remarkable features of an intelligent robot include tracking and identification of persons using biometric features. Human-robot interaction is very important because it is one of the final goals of an intelligent service robot. Much research concentrates on two fields: one is self-navigation of a mobile robot, and the other is human-robot interaction in natural environments. In this paper we present an effective person identification method for HRI (Human-Robot Interaction) using two different types of expert systems. Most mobile robots, however, run in uncontrolled and complicated environments, which means that face and speech information cannot be guaranteed under varying conditions such as lighting, noise, and robot orientation. Based on the illumination level and the signal-to-noise ratio around the mobile robot, our proposed fuzzy rules produce a reasonable person identification result. Two embedded HMMs (Hidden Markov Models) are used, one for the visual and one for the audio modality, to identify a person. The performance of our proposed system and experimental results are compared with single-modality identification and with a simple mixture of the two modalities.
Mobile Robot Competitions and Autonomous Navigation
The 13th Annual Intelligent Ground Vehicle Competition: intelligent ground vehicles created by intelligent teams
The Intelligent Ground Vehicle Competition (IGVC) is one of three unmanned-systems student competitions founded by the Association for Unmanned Vehicle Systems International (AUVSI) in the 1990s. The IGVC is a multidisciplinary exercise in product realization that challenges college engineering student teams to integrate advanced control theory, machine vision, vehicular electronics, and mobile platform fundamentals to design and build an unmanned system. Teams from around the world focus on developing a suite of dual-use technologies to equip ground vehicles of the future with intelligent driving capabilities. Over the past 13 years, the competition has challenged undergraduate, graduate, and Ph.D. students with real-world applications in intelligent transportation systems, the military, and manufacturing automation. To date, teams from over 50 universities and colleges have participated. This paper describes some of the applications of the technologies required by this competition and discusses the educational benefits. The primary goal of the IGVC is to advance engineering education in intelligent vehicles and related technologies. The employment and professional networking opportunities created for students and industrial sponsors through a series of technical events over the three-day competition are highlighted. Finally, an assessment of the competition based on participant feedback is presented.
Software design for an autonomous ground vehicle for the 13th Annual Intelligent Ground Vehicle Competition
Tim Roberts, Daniel Barber, Brian C. Becker, et al.
This paper presents the vision and path planning software design of a totally autonomous vehicle built to compete in the 13th Intelligent Ground Vehicle Competition (IGVC). The vehicle, Calculon, is based on a powered wheelchair and uses a variety of sensors for navigation and obstacle avoidance, including a 3CCD Sony color camera, an outdoor laser range finder, a DGPS, two wheel encoders, and a solid-state compass. The modules forming the core vision system include filters, a color classifier, image segmentation, and a line finder. Our color classifier is based on a modified implementation of an adaptive Gaussian color model similar to those used in some skin detection algorithms. A combination of various image-enhancing filters and the color classifier allows for the isolation of possible obstacles within the image. After filtering the image for areas of high brightness and contrast, the line finder performs a Hough Transform to find lines in the image. Our path planning is accomplished using a combination of known and custom algorithms, including a modified road map method, a rapidly-exploring trees method, and a Gaussian potential fields method. This paper presents the software design and methods of our autonomous vehicle, focusing mainly on the two most difficult components: vision and path planning.
Autonomous Navigation, Obstacle Avoidance, and Path Planning
A simple approach to a vision-guided unmanned vehicle
Christopher Archibald, Evan Millar, Jon D. Anderson, et al.
This paper describes the design and implementation of a vision-guided autonomous vehicle that represented BYU in the 2005 Intelligent Ground Vehicle Competition (IGVC), in which autonomous vehicles navigate a course marked with white lines while avoiding obstacles consisting of orange construction barrels, white buckets, and potholes. Our project began in the context of a senior capstone course in which multi-disciplinary teams of five students were responsible for the design, construction, and programming of their own robots. Each team received a computer motherboard, a camera, and a small budget for the purchase of additional hardware, including a chassis and motors. The resource constraints resulted in a simple vision-based design that processes the sequence of images from the single camera to determine motor controls. Color segmentation separates white and orange from each image, and then the segmented image is examined using a 10x10 grid system, effectively creating a low-resolution picture for each of the two colors. Depending on its position, each filled grid square influences the selection of an appropriate turn magnitude. Motor commands determined from the white and orange images are then combined to yield the final motion command for each video frame. We describe the complete algorithm and the robot hardware, and we present results that show the overall effectiveness of our control approach.
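The grid scheme lends itself to a very small implementation. The sketch below guesses at the general shape of such a controller, with an assumed weighting (closer rows and more lateral columns push harder), not the BYU team's actual weights:

```python
import numpy as np

def turn_from_mask(mask, grid=10):
    """Reduce a binary color mask (e.g. 'orange pixels') to a grid x grid
    occupancy picture and sum each filled cell's steering influence.
    Returns a turn magnitude in [-1, 1]; the sign convention and the
    weighting are assumptions of this sketch."""
    h, w = mask.shape
    mask = mask[:h - h % grid, :w - w % grid]          # crop to a multiple of grid
    cells = mask.reshape(grid, mask.shape[0] // grid,
                         grid, mask.shape[1] // grid).any(axis=(1, 3))
    turn = 0.0
    for row in range(grid):
        for col in range(grid):
            if cells[row, col]:
                lateral = (col - (grid - 1) / 2) / (grid / 2)  # -1 left .. +1 right
                nearness = (row + 1) / grid                    # lower rows are closer
                turn -= lateral * nearness                     # steer away from filled cells
    return float(np.clip(turn, -1.0, 1.0))
```

The white-line and orange-obstacle masks would each produce a turn value, and the two are combined into the frame's final motor command.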
The XH-map algorithm: A method to process stereo video to produce a real-time obstacle map
This paper presents a novel, simple, and fast algorithm to produce a "floor plan" obstacle map in real time using video. The XH-map algorithm is a transformation of stereo vision data in disparity map space into a two-dimensional obstacle map space using a method that can be likened to a histogram reduction of image information. The classic floor-ground background noise problem is addressed with a simple one-time semi-automatic calibration method incorporated into the algorithm. Our implementation of the algorithm utilizes the Intel Performance Primitives library and OpenCV libraries for extremely fast and efficient execution, creating a scaled obstacle map from a 480x640x256 stereo pair in 1.4 milliseconds. This algorithm has many applications in robotics and computer vision, including enabling an intelligent robot to "see" for path planning and obstacle avoidance.
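A plain-NumPy sketch of the histogram-reduction idea follows (the authors' IPP/OpenCV implementation is far faster; the bin sizes and vote threshold here are illustrative assumptions):

```python
import numpy as np

def xh_map(disparity, d_max=64, min_votes=20):
    """Collapse a disparity map into a 2D obstacle grid indexed by
    (disparity, image column): each image column votes, per disparity
    level, and bins with enough votes mark an obstacle at that bearing
    and range. Floor pixels (disparity <= 0 after calibration) are ignored."""
    h, w = disparity.shape
    grid = np.zeros((d_max, w), dtype=np.int32)
    for x in range(w):
        col = disparity[:, x]
        valid = col[(col > 0) & (col < d_max)].astype(int)
        grid[:, x] = np.bincount(valid, minlength=d_max)[:d_max]
    return grid >= min_votes   # boolean "floor plan": rows = range, cols = bearing
```

Since larger disparity means nearer, flipping the row axis of the result gives a map whose vertical axis reads directly as distance from the camera.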
Intelligent mobile robot system for path planning using stereo camera-based geometry information
Jung-Hwan Ko, Dong-Choon Hwang, Yong-Woo Jung, et al.
In this paper, a new real-time, intelligent mobile robot system for path planning and navigation using a stereo camera mounted on a pan/tilt system is proposed. In the proposed system, the face area of a moving person is detected from a sequence of stereo image pairs by using the YCbCr color model, and depth information is extracted from the disparity map computed from the left and right images captured by the pan/tilt-controlled stereo camera. The distance between the mobile robot and the face of the moving person is then calculated from the detected depth information. Based on the analysis of these data, three-dimensional objects can be detected. Finally, using these detected data, a 2D spatial map is constructed for a visually guided robot that can plan paths, navigate around surrounding objects, and explore an indoor environment. Experiments on target tracking with 480 frames of sequential stereo images show that the error ratio between the calculated and measured values of the relative position is very low, 1.4% on average. The proposed target tracking system also achieved a high speed of 0.04 sec/frame for target detection and 0.06 sec/frame for target tracking.
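The face-to-robot distance mentioned above comes from the standard stereo range relation; a one-line sketch (parameter names are generic, not from the paper):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard pinhole-stereo range equation Z = f * B / d, with the
    focal length in pixels, baseline in meters, and disparity in pixels."""
    return focal_px * baseline_m / disparity_px
```

For example, with a 0.1 m baseline, a 700-pixel focal length, and a 35-pixel disparity on the detected face, the person would be about 2 m away.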
Design and simulation of a motion controller for a wheeled mobile-robot autonomous navigation
This paper describes the development of PD Computed-Torque (CT), PID Computed-Torque, and PD digital motion controllers for the autonomous navigation of a Wheeled Mobile Robot (WMR) in outdoor environments. The controllers select suitable control torques so that the WMR follows the desired path produced by a navigation algorithm described in a previous paper. The PD CT, PID CT, and PD digital controllers were developed using a linear system design procedure to select the feedback control signal that stabilizes the tracking error equation. The torques needed for the motors were computed by using the inverse of the dynamic equation for the WMR. Simulation software was developed to evaluate the performance and efficiency of the controllers. Simulation results verified the effectiveness of the controllers under different motion trajectories; a comparison of the three controllers shows that the PD digital controller was the best, with a tracking error that did not exceed 0.05 using a 20 ms sample period. The significance of this work lies in the development of CT and digital controllers for WMR navigation, instead of robot manipulators. These CT controllers will facilitate the use of WMRs in many applications, including defense, industrial, personal, and medical robots.
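For reference, the standard PD computed-torque law on which such controllers are built (the exact terms the paper uses for the WMR dynamics may differ) is:

```latex
\tau \;=\; M(q)\,\bigl(\ddot{q}_d + K_v\,\dot{e} + K_p\,e\bigr) \;+\; V(q,\dot{q}) \;+\; G(q),
\qquad e = q_d - q .
```

Substituting this torque into the dynamics $M(q)\ddot{q} + V(q,\dot{q}) + G(q) = \tau$ yields the linear error equation $\ddot{e} + K_v\dot{e} + K_p e = 0$, which is exactly the tracking error equation the gains $K_p, K_v$ are chosen to stabilize; the PID variant adds an integral term $K_i \int e\,dt$ inside the parentheses.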
Stereo vision-based autonomous mobile robot
In this paper, we propose a method to give more autonomy to a mobile robot by providing vision sensors. The proposed autonomous mobile robot consists of vision, decision, and moving systems. The vision system is based on stereo technology, which requires correspondence between sets of identical points in the left and right images. Although the mean square difference (MSD) is generally used as the measure of correspondence, it is prone to various types of error caused by shades, color change, and repetitive texture, to name a few. To correct this error, the four-direction method, which incorporates surrounding information into correspondence measuring, can be used to improve the accuracy of correspondence. Object edges are first extracted from the Laplacian-of-Gaussian (LoG) filtered image, and post-processing is performed to eliminate remaining high-frequency noise. During the process of minimizing the change of edges, an adaptive threshold value is applied. The extracted edge image is then segmented based on a histogram, and candidate blocks are scanned precisely for accurate extraction of the object. Even if the mobile robot is guaranteed to move autonomously, it must avoid meaningless movement. To this end, a target is set up so that the robot can move toward the designated target, and the robot is made to perceive the target using structural information. The decision system utilizes three-dimensional (3D) distance information extracted from stereo vision and enables dynamic movement to look for a target. Experimental results show an average error of 1.25% in distance estimation, a 97% recognition rate for target objects, and a 2.3% collision rate with obstacles.
Trapezium model-based crop field structure recognition for guidance system of off-road vehicle
Trapezium models are designed to be matched against intensity profiles to locate crop rows. Two kinds of model were designed: a single-trapezium model and a double-trapezium model. The former was applied to a single grass row, while the latter was applied to double maize rows. The intensity profiles were extracted by summing the intensities in each image column. To locate a crop row quickly, a fast positioning algorithm was designed in which a simplified trapezium model is constructed first, according to the distribution of gray levels, and a detailed model then locates the row position accurately. The location of the maximum correlation coefficient between the model and the real intensity data is taken as the position of the crop row. The mean correlation coefficient of the single-trapezium model at the row location is 0.91, and that of the double model is 0.7. This approach has been tested in real time on a field at ZJU and proved to work robustly.
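The matching step reduces to sliding a trapezium-shaped template along the column-sum profile and keeping the offset of maximum correlation. A minimal sketch follows (template construction is omitted; the normalization is the ordinary Pearson correlation coefficient):

```python
import numpy as np

def locate_crop_row(image, template):
    """Slide a trapezium template along the column-intensity profile and
    return (best_offset, best_corr), where best_corr is the correlation
    coefficient the abstract reports (~0.91 for the single model)."""
    profile = image.sum(axis=0).astype(float)   # intensity summed per column
    n = len(template)
    best_offset, best_corr = 0, -1.0
    for i in range(len(profile) - n + 1):
        window = profile[i:i + n]
        r = np.corrcoef(window, template)[0, 1]
        if r > best_corr:
            best_offset, best_corr = i, r
    return best_offset, best_corr
```

The paper's coarse-to-fine trick corresponds to running this first with a simplified template to get a rough offset, then refining with the detailed model over a narrow search range.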
Color Processing, Calibration, and Imaging
4D-RGB diffractive-optical correlator in the human eye: the hardware for a hierarchy of spectral space-time transformations in color vision
3D space and time in optics and in human vision are linked together in spectral diffractive-optical transformations of the visible world. A 4D-RGB correlator hardware - integrated in an optical imaging system like the human eye - processes a hierarchy of relativistic equilibrium states and a sequence of double-cone transformations. The full chain of light-like events ends in von Laue interference maxima in reciprocal space, where 4D-RGB signals are miniaturized down to the level of individual photoreceptors. The diffractive-optical correlator relates local information to global data in the visual field and illustrates the potential of future development of cameras towards more intelligent 4D optical sensors.
Edge and corner detection in grayscale and color images using first Fourier basis
M. C. Prakash, V. Jagannadan, R. Raghunath Sharma, et al.
In this paper we present an algorithm for edge and corner detection in grayscale and color images using a Fourier amplitude measure. We use the magnitude of the first Fourier basis to detect changes in the image intensity profile. We show the mathematical equivalence of the first Fourier coefficient to the first-order partial derivatives. Thus we interpret these coefficients as an estimate of the derivative measure and call them Fourier Derivative Estimates (FDE). We then modify the first-order-derivative-based methods for edge and corner detection to use the FDEs, producing robust edges and corner points. We compare the repeatability rate of the FDE method with that of first-order derivative methods on grayscale images. We then show examples in which the generalization of this method to color images is stable under illumination, blur, color contrast, and affine variations, suggesting the utility of this method for image registration and mosaicing purposes.
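The claimed equivalence can be seen by splitting the first Fourier coefficient of a local window $g[0],\dots,g[N-1]$ into its even and odd parts (a standard decomposition, shown here as a reading aid rather than the authors' derivation):

```latex
c_1 \;=\; \sum_{n=0}^{N-1} g[n]\, e^{-i 2\pi n/N}
    \;=\; \underbrace{\sum_{n} g[n]\cos\frac{2\pi n}{N}}_{\text{even (smoothing) kernel}}
    \;-\; i\,\underbrace{\sum_{n} g[n]\sin\frac{2\pi n}{N}}_{\text{odd (derivative-like) kernel}}
```

The sine term is an odd-symmetric tap set about the window center, the same structure as first-order derivative masks such as central differences, so $|c_1|$ behaves as a smoothed estimate of the local derivative magnitude; applying it along rows and columns gives the two partial-derivative estimates used for edges and corners.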
Image Processing and Applications for Robots and Computer Vision
The spectrum enhancement algorithm for feature extraction and pattern recognition
The "enhanced spectrum" of an image g[.] is a function h[.] of wavenumber u obtained as follows. A reflection operation Q[.] is applied to g[.]; the power spectral density I G[u]2 of Q[g[.]] is converted to the Log scale and averaged over a suitable arc; the function s[.] of u alone is thus obtained, from which a known function, the "model" m[u], is subtracted: this yields h[u]. Models m(p)[.] used herewith have a roll-off like -1OLog10[uP]. As a consequence spectrum enhancement is a non-linear image filter which is shown to include partial spatial differentiation of Q[g[.]] of suitable order. The function h[.] emphasizes deviations of s[.] from the prescribed behaviour m(p)[.]. The enhanced spectrum is used herewith as the morphological descriptor of the image after polynomial interpolation. Multivariate statistical analysis of enhanced spectra by means of principal components analysis is applied with the objective of maximizing discrimination between classes of images. Recent applications to materials science, cell biology and environmental monitoring are reviewed.
Scene understanding and intelligent tactical behavior of mobile robots with perception system based on network-symbolic models
Intelligent tactical behaviors of robots and UGVs cannot be achieved without a perception system that is similar to human vision. The traditional linear bottom-up "segmentation-grouping-learning-recognition" approach to image processing and analysis cannot provide a reliable separation of an object from its background or clutter, while human vision unambiguously solves this problem. The nature of the informational processes in the visual system does not allow separating them from the informational processes in the top-level knowledge system. An Image/Video Analysis that is based on the Network-Symbolic approach is a combination of recursive hierarchical bottom-up and top-down processes. Instead of precise computations of 3-dimensional models, a Network-Symbolic system converts image information into an "understandable" Network-Symbolic format that is similar to relational knowledge models. The logic of visual scenes can be captured in the Network-Symbolic models and used for the reliable disambiguation of visual information, including object detection and identification. View-based object recognition is a hard problem for traditional algorithms that directly match a primary view of an object to a model. In Network-Symbolic Models, the derived structure, and not the primary view, is the subject of recognition. Such recognition is not affected by local changes and appearances of the object from a set of similar views, and a robot can better interpret images and video for intelligent tactical behavior.
Toward accurate and automatic morphing
Image morphing has proved to be a powerful tool for generating compelling and pleasing visual effects and has been widely used in the entertainment industry. However, traditional image morphing methods suffer from a number of drawbacks: feature specification between images is tedious, and the reliance on 2D information ignores the possible advantages to be gained from 3D knowledge. In this paper, we utilize recent advances in computer vision technology to diminish these drawbacks. Drawing on multi-view geometry, we propose a processing pipeline based on three reference images. We first seek a few seed correspondences using robust methods and then recover the multi-view geometries from the seeds through bundle adjustment. Guided by the recovered two- and three-view geometries, a novel line matching algorithm across three views is then deduced, based on edge growth, line fitting, and two- and three-view geometry constraints. Corresponding lines in a novel image are then obtained by an image transfer method, and finally the matched lines are fed into traditional morphing methods to generate novel images. Novel images generated by this pipeline have advantages over traditional morphing methods: they have an inherent 3D foundation and are therefore physically close to real scenes; not only images located on the baseline connecting the two reference image centers, but also extrapolated images away from the baseline, are possible; and the whole process can be either wholly automatic, or at least the tedious task of feature specification in traditional morphing methods can be greatly relieved.
Detection and localization of unexploded ordnance using small unmanned helicopters
This paper presents the development and testing of a sensor payload for use on unmanned helicopters. The payload is designed to detect and geo-position unexploded ordnance (UXO). This technology will be beneficial to explosive ordnance disposal personnel in their test range clearance operations. The payload is capable of gathering image, attitude, and position information during flight, and a suite of software programs was developed capable of modeling, classifying, and geo-positioning UXO.
Development of a written music-recognition system using Java and open source technologies
Gernot Loibner, Andreas Schwarzl, Matthias Kovač, et al.
We report on the development of a software system to recognize and interpret printed music. The overall goal is to scan printed music sheets; analyze and recognize the notes, timing, and written text; and derive all the information necessary to use the computer's MIDI sound system to play the music. This function is primarily useful for musicians who want to digitize printed music for editing purposes. There exist a number of commercial systems that offer such functionality. However, on testing these systems, we were astonished at how weakly they perform in their pattern recognition components. Although we submitted very clean and rather flawless scanning input, none of these systems was able to, for example, recognize all notes, staff lines, and systems. They all require a high degree of interaction, post-processing, and editing to get a decent digital version of the hard copy material. In this paper we focus on the pattern recognition area. In a first approach we tested more or less standard methods of adaptive thresholding, blob detection, line detection, and corner detection to find the notes, staff lines, and candidate objects subject to OCR. Many of the objects in this type of material can be learned in a training phase. None of the commercial systems we saw offers the option to train special characters or unusual signatures. A second goal in this project is to use a modern software engineering platform. We were interested in how well Java and open source technologies are suited for pattern recognition and machine vision. The scanning of music served as a case study.
High performance file compression algorithm for video-on-demand e-learning system
Yoshihiko Nomura, Ryutaro Matsuda, Ryota Sakamoto, et al.
Information processing and communication technology are progressing quickly and are prevailing throughout various technological fields. The development of such technology should therefore respond to the need for quality improvement in e-learning education systems. The authors propose a new video-image compression processing system that ingeniously exploits the features of the lecturing scene: recognizing the lecturer and the lecture stick by pattern recognition techniques, the system deletes the figure of the lecturer, which is of low importance, and displays only the end point of the lecture stick. This enables us to create highly compressed lecture video files suitable for Internet distribution. We compare this technique with other simple methods, such as lower frame-rate video files and ordinary MPEG files. The experimental results show that the proposed compression processing system is much more effective than the others.
New hierarchical SVM classifier for multi-class target recognition
Yu-Chiang Wang, David Casasent
We propose a binary hierarchical classifier to solve the multi-class classification problem with aspect variations in objects and with rejection of non-object false targets. The hierarchical architecture design is automated using our new k-means SVRM (support vector representation machine) clustering algorithm. At each node in the hierarchy, we use a new SVRDM (support vector representation and discrimination machine) classifier, which has good generalization and offers good rejection ability. We also provide a theoretical basis for our choice of kernel function (K), and our method of parameter selection (for σ and p). Using this hierarchical SVRDM classifier with magnitude Fourier transform features, experimental results on both simulated and real infra-red (IR) databases are excellent.
Applications of Autonomous Robots
Perception system with scene understanding capabilities upon network-symbolic models for intelligent tactical behavior of mobile robots in real-world environments
Tactical behavior of UGVs, which is needed for successful autonomous off-road driving, can in many cases be achieved by covering most possible driving situations with a set of rules and switching into a "drive-me-away" semi-autonomous mode when no such rule exists. However, the unpredictable and rapidly changing nature of combat situations requires more intelligent tactical behavior that must be based on predictive situation awareness with ongoing scene understanding and fast autonomous decision making. The implementation of image understanding and active vision is possible in the form of biologically inspired Network-Symbolic models, which combine the power of Computational Intelligence with graph and diagrammatic representation of knowledge. A Network-Symbolic system converts image information into an "understandable" Network-Symbolic format, which is similar to relational knowledge models. The traditional linear bottom-up "segmentation-grouping-learning-recognition" approach cannot provide a reliable separation of an object from its background/clutter, while human vision unambiguously solves this problem. An Image/Video Analysis that is based on the Network-Symbolic approach is a combination of recursive hierarchical bottom-up and top-down processes. The logic of visual scenes can be captured in the Network-Symbolic models and used for the reliable disambiguation of visual information, including object detection and identification. Such a system can better interpret images/video for situation awareness, target recognition, navigation, and actions, and it integrates seamlessly into the 4D/RCS architecture.
Analysis of gaits for a rotating tripedal robot
Damian M. Lyons, Kiran Pamnany
A goal of robotics has been to develop mechanisms that have the efficiency and speed of wheeled robots with the terrain flexibility of legged robots. In previous work, we have proposed a unique three-legged mechanism, the rotopod, designed to integrate these two useful approaches to locomotion. In this paper, we present an analysis of the variety of gaits that can be exhibited by the rotopod. We present examples of many of these gaits and discuss their potential use. A trajectory generation algorithm is presented that can be used to generate point to point trajectories using one of three different styles of gait.
Energy-driven design choices for skid-steering robotic vehicles
Aakash Sinha, Jyoti Vashishtha
In research on unmanned robotic vehicles, there is a need for systematic analysis of locomotion and energy dynamics, which would enable efficient design of the vehicle. This research builds upon earlier work by the authors and develops techniques to derive efficient design parameters for a skid-steering vehicle in order to achieve optimal performance by minimizing energy losses and consumption. The two major components of the vehicle's energy losses are losses in skid-steer turning and losses in rolling. Our focus is on skid steering; we present a detailed analysis of skid steering for different turning modes: elastic-mode steering, half-slip steering, skid turns, low-radius turns, and zero-radius turns. Each of the energy loss components is modeled from physics in terms of the design variables. The effect of the design variables on the total energy losses and consumption is then studied using simulated data for different types of surfaces, i.e., hard surfaces and muddy surfaces. Finally, we make suggestions about efficient vehicle design choices in terms of the design variables.
FPGA implementation of vision algorithms for small autonomous robots
J. D. Anderson, D. J. Lee, J. K. Archibald
The use of on-board vision with small autonomous robots has been made possible by advances in Field Programmable Gate Array (FPGA) technology. By connecting a CMOS camera to an FPGA board, on-board vision has been used to reduce the computation time inherent in vision algorithms. The FPGA board allows the user to create custom hardware in a faster, safer, and more easily verifiable manner that decreases the computation time and allows the vision processing to be done in real time. Real-time vision tasks for small autonomous robots include object tracking, obstacle detection and avoidance, and path planning. Competitions were created to demonstrate that our algorithms work with our small autonomous vehicles in dealing with these problems. These competitions include Mouse-Trapped-in-a-Box, where the robot has to detect the edges of the box it is trapped in and move toward them without touching them; Obstacle Avoidance, where an obstacle is placed at an arbitrary point in front of the robot and the robot has to navigate around it; Canyon Following, where the robot has to move to the center of a canyon and follow the canyon walls while trying to stay in the center; the Grand Challenge, where the robot has to navigate a hallway and return to its original position in a given amount of time; and Stereo Vision, where a separate robot has to catch tennis balls launched from an air-powered cannon. Teams competed in each of these competitions, which were designed for a graduate-level robotic vision class, and each team had to develop its own algorithm and hardware components. This paper discusses one team's approach to each of these problems.
World map based on RFID tags for indoor mobile robots
A new navigation method is described for an indoor mobile robot. The robot system is composed of a Radio Frequency Identification (RFID) tag sensor and a commercial three-wheel mobile platform with ultrasonic rangefinders. The RFID tags are used as landmarks for navigation, and the topological relation map, which records the connections between the tags scattered through the environment, serves as the course instructions to a goal. The robot automatically follows paths using the ultrasonic rangefinders until a tag is found, and then consults the topological map to decide its next movement. Our proposed technique would be useful for real-world robotic applications such as intelligent navigation for motorized wheelchairs.
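Route planning over such a topological map is a plain graph search. A minimal sketch follows (the graph representation and tag ids are illustrative assumptions):

```python
from collections import deque

def plan_route(tag_graph, start_tag, goal_tag):
    """Breadth-first search over the topological tag map. `tag_graph`
    maps each RFID tag id to the ids of directly connected tags; the
    returned list is the sequence of tags to pass on the way to the goal."""
    parent = {start_tag: None}
    frontier = deque([start_tag])
    while frontier:
        tag = frontier.popleft()
        if tag == goal_tag:                 # reconstruct the tag sequence
            path = []
            while tag is not None:
                path.append(tag)
                tag = parent[tag]
            return path[::-1]
        for neighbor in tag_graph.get(tag, ()):
            if neighbor not in parent:
                parent[neighbor] = tag
                frontier.append(neighbor)
    return None                             # goal not reachable in the map
```

Between consecutive tags in the returned sequence, the robot falls back on its ultrasonic path-following behavior, exactly as the abstract describes.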
Positioning approach of a spherical rolling robot
Liangqing Wang, Hanxu Sun, Qingxuan Jia, et al.
In comparison to other mobile robots, spherical rolling robots offer greater mobility, stability, and scope for operation in hazardous environments. Spherical rolling robots have recently been attracting much attention in not only the mechanical but also the control literature, due both to their relevance to practical applications and to the difficulties in the analysis and control of these robots. Positioning a spherical rolling robot at an arbitrary pose at any time is one of the fundamental and difficult problems in research on spherical rolling robots. Because a spherical rolling robot touches the floor at a single point, positioning is difficult, especially when the robot moves at high speed. Up to now, this problem has not been solved perfectly. In this paper, we present an efficient positioning approach for a spherical rolling robot. Based on this approach, a moving spherical rolling robot can be positioned at an arbitrary pose at any time.
Kinematics analysis on a spherical robot
A spherical robot rolls by shifting its barycenter with an internal actuating device; it has a spherical or spheroidal housing, and its motive force is supplied by the friction between the housing and the ground as it rolls. Spherical robots have attracted particular research attention in recent years. This paper presents a new omnidirectional bi-driver spherical robot driven by two motors that directly rotate the internal balancer about two orthogonal axes. A spherical robot is a nonholonomic system with 3 DOF while it rolls on the ground, so the robot presented in this paper is a nonholonomic, under-actuated system featuring omnidirectional movement and a simple configuration.