Proceedings Volume 7252

Intelligent Robots and Computer Vision XXVI: Algorithms and Techniques

David P. Casasent, Ernest L. Hall, Juha Röning
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 19 January 2009
Contents: 10 Sessions, 29 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2009
Volume Number: 7252

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7252
  • Computer Vision and Robotics Invited Session
  • Face Detection and Tracking
  • New Techniques for 3D and Shape Information
  • Pedestrian Detection and Tracking
  • Intelligent Ground Vehicle Competition
  • Mobile and Cognitive Robotics
  • Scene Analysis, Localization, and Computer Vision I
  • Scene Analysis, Localization, and Computer Vision II
  • Interactive Paper Session
Front Matter: Volume 7252
This PDF file contains the front matter associated with SPIE-IS&T Proceedings Volume 7252, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Computer Vision and Robotics Invited Session
Fast FFT-based distortion-invariant kernel filters for general object recognition
Rohit Patnaik, David Casasent
General object recognition involves recognizing an object in a scene in the presence of several distortions and when its location is not known. Since the location of the test object in the scene is unknown, a classifier needs to be applied at different locations of the object over the test input. In this scenario, distortion-invariant filters (DIFs) are attractive, since they can be applied efficiently and quickly at different shifts using the fast Fourier transform (FFT). A single DIF handles different object distortions (e.g., all aspect views and some range of scale and depression angle). In this paper, we show a new approach that combines DIFs and the kernel technique (to form "kernel DIFs"), addresses the need for fast on-line filter shifts, and improves performance. We consider polynomial and Gaussian kernels (polynomial results are emphasized here). We consider kernel versions of the synthetic discriminant function (SDF) filter and of DIFs that minimize an energy function, such as the minimum average correlation energy (MACE) filter. We provide insight into and compare several different formulations of kernel DIFs. We emphasize proper formulations of kernel DIFs and provide data in many cases to show that they perform better. Since kernel SDF filters are the most computationally efficient, we emphasize them. We use the performance of the minimum noise and correlation energy (MINACE) filter as the baseline to which we compare kernel SDF filter results. We consider the classification of two true-class objects and the rejection of unseen clutter and unseen confuser-class objects with full 360° aspect-view distortions and a range of scale distortions present (shifts of all test images are addressed for the first time for kernel DIFs); we use CAD (computer-aided design) infrared (IR) data to synthesize objects with the necessary distortions, and we use only problematic (blob) real IR clutter data.
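For readers unfamiliar with the FFT shift property this abstract relies on, the numpy sketch below applies a correlation filter at every shift of a test input at once; the filter `h` is a stand-in for a trained DIF, not the authors' kernel formulation.

```python
import numpy as np

def correlate_fft(scene, h):
    """Apply a correlation filter h at every shift of `scene` via the FFT.

    scene : 2D array (test input); h : 2D filter, here assumed already
    zero-padded to scene.shape. Returns the correlation plane; its peak
    marks the best-matching shift.
    """
    S = np.fft.fft2(scene)
    H = np.fft.fft2(h)
    # Cross-correlation = inverse FFT of S * conj(H) (correlation theorem).
    return np.real(np.fft.ifft2(S * np.conj(H)))

rng = np.random.default_rng(0)
scene = rng.standard_normal((128, 128))
h = np.zeros_like(scene)
h[:16, :16] = scene[40:56, 60:76]        # plant the template at a known offset
peak = np.unravel_index(np.argmax(correlate_fft(scene, h)), scene.shape)
print(peak)                               # -> (40, 60)
```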
Architectures for intelligent robots in the age of exploitation
History shows that problems that cause human confusion often lead to inventions that solve them, which then leads to exploitation of the invention, creating a confusion-invention-exploitation cycle. Robotics, which started as a new type of universal machine implemented with a computer-controlled mechanism in the 1960s, has progressed through an Age of Over-expectation, a Time of Nightmare, and an Age of Realism, and is now entering the Age of Exploitation. The purpose of this paper is to propose an architecture for the modern intelligent robot in which sensors that permit adaptation to changes in the environment are combined with a "creative controller" that permits adaptive critic, neural network learning, and a dynamic database that permits task selection and criteria adjustment. This ideal model may be compared to various controllers that have been implemented using Ethernet, CAN Bus, and JAUS architectures, and to modern, embedded, mobile computing architectures. Several prototypes and simulations are considered in view of peta-computing. The significance of this comparison is that it provides some insights that may be useful in designing future robots for various manufacturing, medical, and defense applications.
Micromanipulation platform for micro- and nanoscale applications
Risto Sipola, Tero Vallius, Marko Pudas, et al.
This paper presents a micromanipulation platform for micro- and nanoscale applications. The micromanipulation platform is a device platform that can be used for different applications that require actuation and sensing at nanometer resolution. Presently, nanoactuation devices on the market are very expensive and often limited in application. Our approach is to build the platform from off-the-shelf components and thus keep the cost of the instrument reasonable. In this paper we present a generalized modular architecture for both the device hardware and the control software on a PC. The modular architecture enables swift changing of actuators, sensors, and tools with minimal effort, making it an ideal framework for various applications. As a test case we present an adhesion measurement performed by pushing a small particle on a coated surface and show how the architecture is used in this context. The test case exposes several problems that occur in nanoscale devices and shows how the device platform overcomes them. The results of the test case are analyzed and presented, and show that the architecture is suitable for its purpose.
The 16th Annual Intelligent Ground Vehicle Competition: intelligent students creating intelligent vehicles
The Intelligent Ground Vehicle Competition (IGVC) is one of three unmanned-systems student competitions founded by the Association for Unmanned Vehicle Systems International (AUVSI) in the 1990s. The IGVC is a multidisciplinary exercise in product realization that challenges college engineering student teams to integrate advanced control theory, machine vision, vehicular electronics, and mobile platform fundamentals to design and build an unmanned system. Teams from around the world focus on developing a suite of dual-use technologies to equip ground vehicles of the future with intelligent driving capabilities. Over the past 16 years, the competition has challenged undergraduate, graduate, and Ph.D. students with real-world applications in intelligent transportation systems, the military, and manufacturing automation. To date, teams from nearly 70 universities and colleges have participated. This paper describes some of the applications of the technologies required by this competition and discusses the educational benefits. The primary goal of the IGVC is to advance engineering education in intelligent vehicles and related technologies. The employment and professional networking opportunities created for students and industrial sponsors through a series of technical events over the four-day competition are highlighted. Finally, an assessment of the competition based on participation is presented.
Face Detection and Tracking
Multiview face detection using multilayer chained structure
Jung-Bae Kim, Haibing Ren, SeongDeok Lee
To capture human pictures with good quality, auto focus, exposure, and white balance on human face areas are very important. This paper presents a novel method to detect multi-view faces quickly and accurately. It combines an accurate 3-level all-chain structure algorithm with a fast skin color algorithm. The 3-level all-chain structure algorithm has three levels, all linked from the top to the bottom. Level 1 rejects the non-face samples for all views using an improved real-boosting method. Level 2 uses a specially designed cascade structure with two sub-levels to estimate and verify the view class of a face sample from coarse to fine. Level 3 is an independent view verifier for each view. Between neighboring levels (or sub-levels), the sample classification confidence of the previous level is passed to the next level. Within each level (or sub-level), the classification confidence of the previous stage becomes the first weak classifier of the next stage, because the previous classification result contains very useful information for the current situation. The fast skin color algorithm can remove non-skin areas with little computation, which makes the system work much faster. The experimental results show that this method is very efficient and can correctly detect multi-view human faces in real time. It can also estimate the face view class at the same time.
Detecting low-resolution faces in video
Neil Robertson, Nils Janssen
This paper presents a method for the detection of faces (via skin regions) in images where faces may be low-resolution and no assumptions are made about fine facial features being visible. This type of data is challenging because changes in appearance of skin regions occur due to changes in both lighting and resolution. We present a non-parametric classification scheme based on a histogram similarity measure. By comparing performance of commonly-used colour-spaces we find that the YIQ colour space with 16 histogram bins (in both 1 and 2 dimensions) gives the most accurate performance over a wide range of imaging conditions for non-parametric skin classification. We demonstrate better performance of the non-parametric approach vs. colour thresholding and a Gaussian classifier. Face detection is subsequently achieved via a simple aspect-ratio and we show results from indoor and outdoor scenes.
New Techniques for 3D and Shape Information
Synthetic aperture method for GPR considering depth-changed shape of reflected waveform
Hirofumi Kotaki, Yoshihiko Nomura, Hirokazu Fujii, et al.
By using synthetic aperture methods for Ground Penetrating Radar (GPR), subsurface structural images are reconstructed from spatial and temporal two-dimensional images that are known as B-Scope images. The spatial and temporal coordinates in B-Scope images correspond to the horizontal position on the surface and the propagation time of the reflected waveforms from the buried object. The synthetic aperture methods visualize buried objects by deconvolving the B-Scope image with the transfer function of the reflected waveforms. Based on the characteristic that the transfer function continuously changes with depth, the authors proposed an algorithm for suppressing the ill effect of the change of the transfer function to enhance the reconstructed images. When applying the deconvolution of the B-Scope images and the transfer function, the B-Scope images are divided into several sectors in the depth direction based on the amount of the change of the transfer function, which is defined for respective sectors. Experimental results demonstrated the effectiveness of the proposed algorithm.
Depth-from-trajectories for uncalibrated multiview video
Paul A. Ardis, Amit Singhal, Christopher M. Brown
We propose a method for efficiently determining qualitative depth maps for multiple monoscopic videos of the same scene without explicitly solving for stereo or calibrating any of the cameras involved. By tracking a small number of feature points and determining trajectory correspondence, it is possible to determine correct temporal alignment as well as establish a similarity metric for fundamental matrices relating each trajectory. Modeling of matrix relations with a weighted digraph and performing Markov clustering results in a determination of emergent depth layers for feature points. Finally, pixels are segmented into depth layers based upon motion similarity to feature point trajectories. Initial experimental results are demonstrated on stereo benchmark and consumer data.
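For reference, a bare-bones version of Markov clustering (the standard MCL process) on a weighted digraph's adjacency matrix, as used above to find emergent depth layers; the inflation parameter and convergence handling are illustrative, not taken from the paper.

```python
import numpy as np

def markov_cluster(adj, inflation=2.0, iters=50):
    """Basic Markov clustering (MCL) on a weighted adjacency matrix.

    Alternates expansion (matrix squaring, spreading flow along paths) with
    inflation (elementwise power + column renormalisation, sharpening strong
    edges). Nonzero rows of the limit matrix collect one cluster's members.
    """
    m = adj + np.eye(len(adj))            # self-loops stabilise the iteration
    m /= m.sum(axis=0, keepdims=True)     # make columns stochastic
    for _ in range(iters):
        m = m @ m                         # expansion
        m = m ** inflation                # inflation
        m /= m.sum(axis=0, keepdims=True)
    return [np.nonzero(row > 1e-6)[0] for row in m if row.max() > 1e-6]
```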
Geometric alignment for large point cloud pairs using sparse overlap areas
Keisuke Fujimoto, Nobutaka Kimura, Fumiko Beniyama, et al.
We present a novel approach for geometric alignment of 3D sensor data. The Iterative Closest Point (ICP) algorithm is widely used for geometric alignment of 3D models as a point-to-point matching method when an initial estimate of the relative pose is known. However, accurate point-to-point correspondence is difficult to obtain when the points are sparsely distributed. In addition, the search cost is high because the ICP algorithm requires a nearest-neighbor search at every minimization step. In this paper, we describe a plane-to-plane registration method. We define the distance between two planes and estimate the translation parameter by minimizing the distance between the planes. The plane-to-plane method is able to register sparsely distributed, low-density point sets at low cost. We tested this method on large point clouds of a manufacturing plant and show the effectiveness of our proposed method.
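The abstract does not give the paper's exact plane-distance formulation, but under the common convention n·x = d for a plane with unit normal n, the translation part of such a registration reduces to a small linear least-squares problem; a sketch under that assumption:

```python
import numpy as np

def translation_from_planes(normals, d_src, d_dst):
    """Estimate the translation aligning matched plane pairs.

    Each plane is n . x = d with unit normal n; translating by t shifts a
    plane's offset from d to d + n . t, so matched pairs give the linear
    least-squares system  normals @ t = d_dst - d_src.  Needs >= 3 planes
    with linearly independent normals.
    """
    rhs = np.asarray(d_dst) - np.asarray(d_src)
    t, *_ = np.linalg.lstsq(np.asarray(normals), rhs, rcond=None)
    return t

normals = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(translation_from_planes(normals, [0, 0, 0], [2.0, -1.0, 0.5]))  # [ 2. -1.  0.5]
```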
A probabilistic approach for the reconstruction of polyhedral objects using shape from shading technique
Manoj Kumar, R. Balasubramanian, Rama Bhargava, et al.
Many objects in the real world, especially man-made objects, have a polyhedral shape. Shape from shading (SFS) is a well-known and one of the most robust techniques in computer vision. SFS is a first-order, nonlinear, ill-posed problem. The main idea for solving ill-posed problems is to restrict the class of admissible solutions by introducing suitable a priori knowledge. To overcome the ill-posedness of SFS techniques, Bayesian estimation with geometrical constraints is used. The Lambertian reflectance model is used in this method due to its wide applicability in SFS techniques. The priors, or constraints, are represented in the form of probability distribution functions so that the Bayesian approach can be applied. The Monte Carlo method is applied to generate sample fields from the distribution so that the model can represent our prior knowledge and constraints. The optimal estimators are also computed using the Monte Carlo method. The geometric constraints for lines and planes are used in a probabilistic manner to eliminate the rank deficiency and obtain a unique solution. In the case of incorrect line drawings, it is not always possible to reconstruct the object shape uniquely. To deal with this problem, we process each planar face separately. Hence, the proposed method remains applicable when there are slight errors in the computation of vertex positions in images of polyhedral objects. The proposed method is applied to various synthetic and real images, and satisfactory results are obtained.
Pedestrian Detection and Tracking
People detection in crowded scenes using active contour models
The detection of pedestrians in real-world scenes is a daunting task, especially in crowded situations. Our experience over recent years has shown that active shape models (ASMs) can contribute significantly to a robust pedestrian detection system. The paper starts with an overview of shape model approaches; it then explains our approach, which builds on top of Eigenshape models trained on real-world data. These models are placed over candidate regions and matched to image gradients using a scoring function that integrates (i) point distribution, (ii) local gradient orientations, and (iii) local image gradient strengths. A matching and shape model update process is applied iteratively in order to fit the flexible models to the local image content. The weights of the scoring function have a significant impact on ASM performance. We analyze different settings of the scoring weights for gradient magnitude, relative orientation difference, and distance between model and gradient in an experiment using real-world data. Although computation time is low for a single pedestrian model in an image, the number of processing cycles needed to track many people in crowded scenes can become the bottleneck in a real-time application. We describe the measures taken to improve the speed of the ASM implementation and make it real-time capable.
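As an illustration of the three-term scoring function described above (not the authors' exact weighting), here is a sketch that combines point-to-edge distance, gradient magnitude, and orientation difference with tunable weights:

```python
import numpy as np

def match_score(model_pts, model_ori, edge_pts, edge_mag, edge_ori,
                w_dist=1.0, w_mag=1.0, w_ori=1.0):
    """Score an active-shape-model hypothesis against image gradients.

    For every model contour point, find the nearest edge pixel and combine
    (i) the point-to-edge distance, (ii) that edge's gradient magnitude, and
    (iii) the orientation difference between the model contour normal and
    the local gradient, using the weights w_*.
    """
    score = 0.0
    for p, m_ori in zip(model_pts, model_ori):
        d = np.linalg.norm(edge_pts - p, axis=1)
        j = np.argmin(d)                                  # nearest edge pixel
        d_ori = np.abs(np.angle(np.exp(1j * (edge_ori[j] - m_ori))))  # wrapped
        score += w_mag * edge_mag[j] - w_dist * d[j] - w_ori * d_ori
    return score / len(model_pts)
```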
Homography-based multiple-camera person-tracking
Multiple video cameras are cheaply installed overlooking an area of interest. While computerized single-camera tracking is well-developed, multiple-camera tracking is a relatively new problem. The main multi-camera problem is to give the same tracking label to all projections of a real-world target. This is called the consistent labelling problem. Khan and Shah (2003) introduced a method to use field of view lines to perform multiple-camera tracking. The method creates inter-camera meta-target associations when objects enter at the scene edges. They also said that a plane-induced homography could be used for tracking, but this method was not well described. Their homography-based system would not work if targets use only one side of a camera to enter the scene. This paper overcomes this limitation and fully describes a practical homography-based tracker. A new method to find the feet feature is introduced. The method works especially well if the camera is tilted, when using the bottom centre of the target's bounding-box would produce inaccurate results. The new method is more accurate than the bounding-box method even when the camera is not tilted. Next, a method is presented that uses a series of corresponding point pairs "dropped" by oblivious, live human targets to find a plane-induced homography. The point pairs are created by tracking the feet locations of moving targets that were associated using the field of view line method. Finally, a homography-based multiple-camera tracking algorithm is introduced. Rules governing when to create the homography are specified. The algorithm ensures that homography-based tracking only starts after a non-degenerate homography is found. The method works when not all four field of view lines are discoverable; only one line needs to be found to use the algorithm. To initialize the system, the operator must specify pairs of overlapping cameras. Aside from that, the algorithm is fully automatic and uses the natural movement of live targets for training. No calibration is required. Testing shows that the algorithm performs very well in real-world sequences. The consistent labelling problem is solved, even for targets that appear via in-scene entrances. Full occlusions are handled. Although implemented in Matlab, the multiple-camera tracking system runs at eight frames per second. A faster implementation would be suitable for real-world use at typical video frame rates.
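A sketch of the homography-fitting step, assuming foot-point pairs have already been associated via the field-of-view line method; cv2.findHomography with RANSAC stands in for whatever robust estimator the paper uses, and the pair-count and rank checks are illustrative guards against a degenerate homography.

```python
import numpy as np
import cv2

def homography_from_foot_pairs(feet_cam1, feet_cam2, min_pairs=20):
    """Fit the ground-plane homography from corresponding foot points.

    feet_cam1 / feet_cam2 : (N, 2) arrays of foot locations "dropped" by
    tracked targets, already associated across the two views. RANSAC rejects
    mislabelled pairs; the rank check guards against degenerate geometry
    (e.g. all points collected along a single line).
    """
    if len(feet_cam1) < min_pairs:
        return None                       # not enough evidence yet
    H, _ = cv2.findHomography(np.float32(feet_cam1),
                              np.float32(feet_cam2), cv2.RANSAC, 3.0)
    if H is None or np.linalg.matrix_rank(H) < 3:
        return None                       # degenerate: keep collecting pairs
    return H
```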
Hybrid real-time tracking of non-rigid objects under occlusions
The mean shift algorithm has gained special attention in recent years due to its simplicity, which enables real-time tracking. However, the traditional mean shift tracking algorithm can fail to track a target under occlusion. In this paper we propose a novel technique that alleviates this limitation of mean shift tracking. Our algorithm employs the Kalman filter to estimate the target's dynamics. Moreover, the proposed algorithm performs a background check to calculate a similarity value expressing how similar the background is to the target. We then find the exact target position by combining the motion estimate from the Kalman filter with the color-based estimate from the mean shift algorithm, weighted by the similarity value. The proposed algorithm can therefore robustly track targets under several types of occlusion where the mean shift and mean shift-Kalman filter algorithms fail.
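A minimal sketch of such a fusion rule, assuming the similarity is a Bhattacharyya coefficient between color histograms; the blending scheme and the threshold are illustrative, since the abstract does not give the exact combination formula.

```python
import numpy as np

def fuse_estimates(x_kalman, x_meanshift, bhattacharyya, occl_thresh=0.5):
    """Blend motion- and colour-based position estimates.

    bhattacharyya : similarity between the candidate region's colour
    histogram and the target model (1 = perfect match). When the match is
    poor the target is presumed occluded and the Kalman prediction is
    trusted; otherwise the estimates are mixed by the similarity value.
    """
    x_kalman, x_meanshift = np.asarray(x_kalman), np.asarray(x_meanshift)
    if bhattacharyya < occl_thresh:       # occlusion: colour cue unreliable
        return x_kalman
    return bhattacharyya * x_meanshift + (1.0 - bhattacharyya) * x_kalman
```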
Intelligent Ground Vehicle Competition
Combining a modified vector field histogram algorithm and real-time image processing for unknown environment navigation
Kumud Nepal, Adam Fine, Nabil Imam, et al.
Q is an unmanned ground vehicle designed to compete in the Autonomous and Navigation Challenges of the AUVSI Intelligent Ground Vehicle Competition (IGVC). Built on a base platform of a modified PerMobil Trax off-road wheelchair frame, and running off a Dell Inspiron D820 laptop with an Intel T7400 Core 2 Duo processor, Q gathers information from a SICK laser range finder (LRF), video cameras, differential GPS, and a digital compass to localize its behavior and map out its navigational path. This behavior is handled by intelligent closed-loop speed control and robust sensor data processing algorithms. In the Autonomous Challenge, data taken from two IEEE 1394 cameras and the LRF are integrated and plotted on a custom-defined occupancy grid and converted into a histogram which is analyzed for openings between obstacles. The image processing algorithm consists of a series of steps involving plane extraction, normalization of the image histogram for effective dynamic thresholding, texture and morphological analysis, and particle filtering to allow optimum operation at varying ambient conditions. In the Navigation Challenge, a modified Vector Field Histogram (VFH) algorithm is combined with an auto-regressive path planning model for obstacle avoidance and better localization. Q also features Joint Architecture for Unmanned Systems (JAUS) Level 3 compliance. All algorithms are developed and implemented using National Instruments (NI) hardware and LabVIEW software. The paper focuses on explaining the various algorithms that make up Q's intelligence and the different ways and modes of their implementation.
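For context, a compact sketch of the core VFH step referenced above, building the polar obstacle-density histogram from an occupancy grid and finding candidate openings, without the paper's modifications or the auto-regressive planner; all parameters are illustrative.

```python
import numpy as np

def polar_histogram(grid, pose, cell_size=0.1, sectors=72):
    """Build a VFH-style polar obstacle-density histogram around the robot.

    grid : 2D occupancy grid (0 = free, higher = more certainly occupied);
    pose : (row, col) robot cell. Each occupied cell votes into an angular
    sector with weight falling off with distance; low-valued sectors are
    candidate openings for the steering controller.
    """
    rows, cols = np.nonzero(grid)
    dy, dx = rows - pose[0], cols - pose[1]
    angles = np.arctan2(dy, dx) % (2 * np.pi)
    dist = np.hypot(dy, dx) * cell_size
    weight = grid[rows, cols] / np.maximum(dist, cell_size)  # nearer = heavier
    hist = np.zeros(sectors)
    np.add.at(hist, (angles / (2 * np.pi) * sectors).astype(int) % sectors, weight)
    return hist

def openings(hist, thresh):
    """Indices of angular sectors clear enough to drive through."""
    return np.nonzero(hist < thresh)[0]
```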
Development of a vision system for an intelligent ground vehicle
Robert L. Nagel, Kenneth Perry, Robert B. Stone, et al.
The development of a vision system for an autonomous ground vehicle designed and constructed for the Intelligent Ground Vehicle Competition (IGVC) is discussed. The requirements for the vision system of the autonomous vehicle are explored via functional analysis considering the flows (materials, energies, and signals) into the vehicle and the changes required of each flow within the vehicle system. Functional analysis leads to a vision system based on a laser range finder (LIDAR) and a camera. Input from the vision system is processed via a ray-casting algorithm whereby the camera data and the LIDAR data are analyzed as a single array of points representing obstacle locations, which for the IGVC consist of white lines on the horizontal plane and construction markers on the vertical plane. Functional analysis also leads to a multithreaded application where the ray-casting algorithm is a single thread of the vehicle's software, which consists of multiple threads controlling motion, providing feedback, and processing the data from the camera and LIDAR. LIDAR data is collected as distances and angles from the front of the vehicle to obstacles. Camera data is processed using an adaptive threshold algorithm to identify color changes within the collected image; the image is also corrected for camera angle distortion, adjusted to the global coordinate system, and processed using the least-squares method to identify white boundary lines. Our IGVC robot, MAX, is used as the continuing example for all methods discussed in the paper, and all testing and results provided are based on it.
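The white boundary-line fit mentioned above reduces to ordinary least squares once candidate pixels are in ground-plane coordinates; a minimal sketch (names and coordinate convention are assumptions, not the authors' code):

```python
import numpy as np

def fit_boundary_line(points):
    """Least-squares line fit to thresholded white-line pixels.

    points : (N, 2) array of (x, y) ground-plane coordinates of pixels that
    survived the adaptive threshold. Returns slope m and intercept b of
    y = m * x + b minimising the squared vertical error.
    """
    x, y = points[:, 0], points[:, 1]
    m, b = np.polyfit(x, y, deg=1)
    return m, b
```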
Kratos: Princeton University's entry in the 2008 Intelligent Ground Vehicle Competition
Christopher A. Baldassano, Gordon H. Franken, Jonathan R. Mayer, et al.
In this paper we present Kratos, an autonomous ground robot capable of static obstacle field navigation and lane following. A sole color stereo camera provides all environmental data. We detect obstacles by generating a 3D point cloud and then searching for nearby points of differing heights, and represent the results as a cost map of the environment. For lane detection we merge the output of a custom set of filters and iterate the RANSAC algorithm to fit parabolas to lane markings. Kratos' state estimation is built on a square root central difference Kalman filter, incorporating input from wheel odometry, a digital compass, and a GPS receiver. A 2D A* search plans the straightest optimal path between Kratos' position and a target waypoint, taking vehicle geometry into account. A novel C++ wrapper for Carnegie Mellon's IPC framework provides flexible communication between all services. Testing showed that obstacle detection and path planning were highly effective at generating safe paths through complicated obstacle fields, but that Kratos tended to brush obstacles due to the proportional law control algorithm cutting turns. In addition, the lane detection algorithm made significant errors when only a short stretch of a lane line was visible or when lighting conditions changed. Kratos ultimately earned first place in the Design category of the Intelligent Ground Vehicle Competition, and third place overall.
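A sketch of iterating RANSAC to fit a parabola to lane-marking points, as described above; the sample size, iteration budget, and inlier tolerance are illustrative rather than Kratos' actual values.

```python
import numpy as np

def ransac_parabola(pts, iters=200, tol=0.05, rng=None):
    """Fit x = a*y^2 + b*y + c to lane-marking points with RANSAC.

    pts : (N, 2) array of (x, y) ground-plane lane pixels, N >= 3.
    Repeatedly fits a parabola to 3 random points, keeps the model with the
    most inliers, then refits on all inliers.
    """
    rng = rng or np.random.default_rng()
    best_inliers = np.zeros(len(pts), bool)
    for _ in range(iters):
        sample = pts[rng.choice(len(pts), 3, replace=False)]
        coeffs = np.polyfit(sample[:, 1], sample[:, 0], 2)   # exact 3-point fit
        inliers = np.abs(np.polyval(coeffs, pts[:, 1]) - pts[:, 0]) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return np.polyfit(pts[best_inliers, 1], pts[best_inliers, 0], 2)
```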
Mobile and Cognitive Robotics
Construction engineering robot kit: warfighter experiment
Bernard L. Theisen, Paul Richardson
The Construction Equipment Robot Kit (CERK) Warfighter Experiment was conducted March 17-28, 2008 at Fort Leonard Wood, Missouri to assess whether robotic construction equipment systems are an operationally effective and suitable means to support hasty route clearance and route remediation operations. This paper presents the findings of Soldier testing of two different commercial-off-the-shelf (COTS) robotic kits installed on two different pieces of field construction equipment through a variety of operational vignettes. These vignettes were performed during both day and night operations, with and without all-weather gear, by Soldiers who are experienced operators of construction equipment. This paper provides a detailed analysis of the communications systems used between the robots and their operator control units (OCUs) and the need to improve the communications of both robotic kits. The results show that these operations can be accomplished while removing the Soldier from harm's way.
Convoy active safety technologies war fighter experiment II
The operational ability to project and sustain forces in distant, anti-access and area denial environments poses new challenges for combatant commanders. One of the new challenges is the ability to conduct sustainment operations at operationally feasible times and places on the battlefield. Combatant commanders require a sustainment system that is agile, versatile, and survivable throughout the range of military operations and across the spectrum of conflict. A key component of conducting responsive, operationally feasible sustainment operations is the ability to conduct sustainment convoys. Sustainment convoys are critical to providing combatant commanders the right support, at the right time and place, and in the right quantities, across the full range of military operations. The ability to conduct sustainment convoys in a variety of hostile environments requires force protection measures that address the enemy threat and protect the Soldier. One cost-effective, technically feasible method of increasing the force protection for sustainment convoys is the use of robotic follower technology and autonomous navigation. The Convoy Active Safety Technologies (CAST) system is a driver-assist, convoy autopilot technology aimed at addressing these issues. The CAST Warfighter Experiment II, being held at the Nevada Automotive Test Center in the fall of 2008, will continue analysis of the utility of this vehicle-following technology, not only in measures of system integrity and performance versus manual driving, but also in the physiological effects on the operators themselves. This paper will detail the experiment's methodology and analysis. Results will be presented at the SPIE Electronic Imaging 2009 symposium.
Locating and tracking objects by efficient comparison of real and predicted synthetic video imagery
A mobile robot moving in an environment in which there are other moving objects and active agents, some of which may represent threats and some of which may represent collaborators, needs to be able to reason about the potential future behaviors of those objects and agents. In this paper we present an approach to tracking targets with complex behavior, leveraging a 3D simulation engine to generate predicted imagery and comparing that against real imagery. We introduce an approach to compare real and simulated imagery and present results using this approach to locate and track objects with complex behaviors. In this approach, the salient points in the real and synthetic images are identified and an affine image transformation that maps the real scene to the synthetic scene is generated. An image difference operation is developed that ensures that the matched points in both images produce a zero difference. In this way, synchronization differences are reduced and content differences are enhanced. A number of image pairs are processed and presented to illustrate the approach.
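A rough sketch of the align-then-difference idea, assuming matched salient points are already available; the least-squares affine solve and cv2.warpAffine resampling are stand-ins for the paper's unspecified implementation.

```python
import numpy as np
import cv2

def aligned_difference(real, synthetic, pts_real, pts_syn):
    """Difference a real frame against a rendered prediction of it.

    pts_real / pts_syn : (N, 2) matched salient points, N >= 3. A
    least-squares affine transform maps the real image onto the synthetic
    one, so matched points difference to ~zero and genuine content
    differences stand out.
    """
    A = np.hstack([pts_real, np.ones((len(pts_real), 1))])   # rows [x y 1]
    M, *_ = np.linalg.lstsq(A, pts_syn, rcond=None)          # (3, 2) affine
    h, w = synthetic.shape[:2]
    warped = cv2.warpAffine(real, np.float32(M.T), (w, h))   # real -> syn frame
    return cv2.absdiff(warped, synthetic)
```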
Scene Analysis, Localization, and Computer Vision I
Scene categorization with multiscale category specific visual words
Jianzhao Qin, Nelson H. C. Yung
In this paper, we propose a scene categorization method based on multi-scale, category-specific visual words. The proposed method quantizes visual words in a multi-scale manner, combining the global-feature-based and local-feature-based scene categorization approaches into a uniform framework. Unlike traditional visual word creation methods, which quantize visual words from the whole set of training images without considering their categories, we form visual words from the training images grouped by category and then collate the visual words from the different categories to form the final codebook. This category-specific strategy provides more discriminative visual words for scene categorization. Based on the codebook, we compile a feature vector that encodes the presence of the different visual words to represent a given image. An SVM classifier with a linear kernel is then employed to select the features and classify the images. The proposed method is evaluated over two scene classification datasets of 6,447 images altogether using 10-fold cross-validation. The results show that the classification accuracy is improved significantly compared with methods using traditional visual words. The proposed method is also comparable in classification accuracy to the best results published in the previous literature, while having the advantage of simplicity.
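A minimal sketch of the category-specific codebook construction, assuming local descriptors have already been extracted per training category; k-means (here via scikit-learn) stands in for whatever quantizer the paper uses, and the word count per category is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors_by_category, words_per_category=50):
    """Quantize visual words per category, then collate one codebook.

    descriptors_by_category : list of (N_i, D) local-descriptor arrays, one
    per scene category. Clustering each category separately keeps words that
    are discriminative for that category; the final codebook concatenates
    all per-category centroids.
    """
    codebooks = [KMeans(n_clusters=words_per_category, n_init=10)
                 .fit(d).cluster_centers_ for d in descriptors_by_category]
    return np.vstack(codebooks)

def bag_of_words(descriptors, codebook):
    """Histogram of nearest-codeword assignments for one image."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(codebook))
    return hist / max(hist.sum(), 1)
```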
Photographic expert-like capturing by analyzing scenes with representative image set
D. Chung, S. Kim, J. Bae, et al.
In this work we propose a method for building digital still cameras that can take pictures of a given scene with the knowledge of photographic experts, i.e., professional photographers. Photographic experts' knowledge here means the experts' camera controls, i.e., shutter speed, aperture size, and ISO value, for taking pictures of a given scene. To implement photographic experts' knowledge, we redefine the Scene Mode of currently commercially available digital cameras. For example, instead of the single Night Scene Mode of conventional digital cameras, we break it into 76 scene modes with the Night Scene Representative Image Set. The night scene representative image set is an image set intended to cover all the cases of night scenes with respect to camera controls. To enable appropriate picture taking in all the complex night scene cases, each member of the scene representative image set comes with the corresponding photographic experts' camera controls, such as shutter speed, aperture size, and ISO value. Our method first pairs a given scene with one of our redefined scene modes automatically, which realizes the photographic experts' knowledge. Using the scene representative set, we apply likelihood analysis to the given scene to detect whether it falls within the boundary of the representative set. If the given scene is classified as within the representative set, we proceed to calculate similarities by computing the correlation coefficient between the given scene and each of the representative images. Finally, the camera controls for the most similar member of the representative image set are used for taking a picture of the given scene, with finer tuning according to the degree of similarity.
Learning the fusion of multiple video analysis detectors
X. Desurmont, F. Lavigne, J. Meessen, et al.
This paper presents a new fusion scheme for enhancing result quality based on the combination of multiple different detectors. We present a study showing the fusion of multiple video analysis detectors, such as detectors of unattended luggage in video sequences. One of the problems is the time jitter between different detectors; typically, one system can trigger an event several seconds before another. Another issue is the computation of an adequate fusion of the realigned events. We propose a fusion system that overcomes these problems by being able (i) in the learning stage, to match off-line the ground truth events with the detectors' output events using a dynamic programming scheme; (ii) to learn the relation between ground truth and results; and (iii) to fuse in real time the events from the different detectors, using the learning stage, in order to maximize the global quality of the result. We show promising results by combining the outputs of different video analysis detector technologies.
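A sketch of step (i), aligning ground-truth events with detector events under time jitter via dynamic programming; the match cost and gap penalty are illustrative, since the abstract does not specify them.

```python
import numpy as np

def align_events(gt, det, gap=5.0):
    """Edit-distance-style DP alignment of ground-truth and detector events.

    gt, det : sorted event timestamps (seconds). Matching two events costs
    their time difference (absorbing detector jitter); skipping an event on
    either side costs `gap` (a miss / false alarm). Returns matched index
    pairs.
    """
    n, m = len(gt), len(det)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = np.arange(m + 1) * gap
    D[:, 0] = np.arange(n + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j - 1] + abs(gt[i - 1] - det[j - 1]),
                          D[i - 1, j] + gap, D[i, j - 1] + gap)
    pairs, i, j = [], n, m              # backtrack the optimal alignment
    while i > 0 and j > 0:
        if D[i, j] == D[i - 1, j - 1] + abs(gt[i - 1] - det[j - 1]):
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif D[i, j] == D[i - 1, j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```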
Object localization using adaptive feature selection
'Fast' and 'robust' are the most beautiful keywords in computer vision. Unfortunately, they are in a trade-off relationship. We present a method to have one's cake and eat it too, using adaptive feature selection. Our chief insight is to compare reference patterns to query patterns so as to smartly select the more important and useful features for finding the target. The probability that a pixel in the query belongs to the target is calculated from the importance of the features. Our framework has three distinct advantages: (1) it saves computational cost dramatically compared to the conventional approach, making it possible to locate an object in real time; (2) it can smartly select robust features of a reference pattern while adapting to a query pattern; (3) it is highly flexible with respect to features: it does not matter which features are used; many color-space, texture, motion, and other features fit, provided they meet the histogram criteria.
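The abstract leaves the histogram criteria unspecified; one common realization of this kind of feature-importance measure is the variance ratio of the log-likelihood of target vs. background histograms (in the spirit of Collins et al.), sketched below as an assumption rather than the paper's exact criterion.

```python
import numpy as np

def variance_ratio(fg_vals, bg_vals, bins=32, value_range=(0.0, 1.0)):
    """Rank a candidate feature by how well it separates target from background.

    fg_vals / bg_vals : this feature's values over reference-target pixels
    and surrounding background pixels. Builds the per-bin log-likelihood
    ratio of the two histograms; a good feature spreads that ratio widely
    overall while keeping it tight within each class.
    """
    eps = 1e-6
    p, _ = np.histogram(fg_vals, bins=bins, range=value_range)
    q, _ = np.histogram(bg_vals, bins=bins, range=value_range)
    p = p / max(p.sum(), 1)
    q = q / max(q.sum(), 1)
    llr = np.log((p + eps) / (q + eps))     # log-likelihood ratio per bin

    def var(w):                              # variance of llr under weights w
        mean = np.sum(w * llr)
        return np.sum(w * llr ** 2) - mean ** 2

    return var((p + q) / 2) / (var(p) + var(q) + eps)
```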
A study on SSD calculation between input image and subpixel-translated template images and its applications to a subpixel image matching problem
Pattern matching between input and template images, carried out using the Sum of Squared Differences (SSD) similarity value, has been widely used in computer vision applications such as stereo measurement and super-resolution image synthesis. The crucial process in the pattern matching problem is estimating the translation of the input image that matches the two images; a technique exists for improving the accuracy of the translation estimate at the subpixel level. In addition, subpixel estimation accuracy is improved by synthetic template images that represent subpixel-translated images generated using linear interpolation. However, the calculation cost increases because the technique requires additional SSD calculations for the synthetic template images. To eliminate the need for these, we found that the additional SSD values for the synthetic subpixel-translated images can be obtained by calculating just the SSD values for the original template images: no additional SSD calculations are needed. Moreover, based on this insight, we propose a novel algorithm for speeding up the estimation error cancellation (EEC) method, which was developed for estimating subpixel displacements in pattern matching. Experimental results demonstrate the advantages of the proposed algorithm.
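For background, here is the standard parabola estimator that turns three integer-shift SSD values into a subpixel displacement, i.e. the quantity that EEC-style corrections refine; this is textbook material, not the paper's new algorithm.

```python
def subpixel_offset(ssd_left, ssd_min, ssd_right):
    """Subpixel displacement from three SSD values around the discrete minimum.

    Fits a parabola through the SSD at pixel offsets -1, 0, +1 and returns
    the offset of its vertex, which lies in (-0.5, 0.5). Uses only SSD
    values already computed for the original templates.
    """
    denom = ssd_left - 2.0 * ssd_min + ssd_right
    return 0.0 if denom == 0 else 0.5 * (ssd_left - ssd_right) / denom
```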
Scene Analysis, Localization, and Computer Vision II
Hand-gesture extraction and recognition from the video sequence acquired by a dynamic camera using condensation algorithm
Dan Luo, Jun Ohya
To achieve environments in which humans and mobile robots co-exist, technologies for recognizing hand gestures from video sequences acquired by a dynamic camera could be useful for human-to-robot interface systems. Most conventional hand gesture technologies deal only with still-camera images. This paper proposes a very simple and stable method for extracting hand motion trajectories based on the Human-Following Local Coordinate System (HFLC System), which is obtained from the located human face and both hands. We then apply the Condensation algorithm to the extracted hand trajectories so that the hand motion is recognized. We demonstrate the effectiveness of the proposed method by conducting experiments on 35 kinds of sign-language-based hand gestures.
N-dimension geometry used in the design of a dynamic neural-network pattern-recognition system
If the main features, or the skeleton (e.g., the corner points and the boundary lines), of a 3D moving object can be represented by an ND analog vector, then the whole history of movement (rotation, translation, deformation, etc.) of this object can be described by an ND curve in the ND state space. Each point on the curve corresponds to a snapshot of the 3D object at a certain time during the course of movement. We can approximate this ND curve by an ND broken line, just like the linearization of a 2D curve by a 2D broken line. But linearizing a branch of an ND curve is simply applying the ND convex operation to the two end points of that branch. Therefore, remembering all the end points (or all the extreme points) of the ND curve allows us to approximately reconstruct the ND curve, or the whole 3D object's moving history, by means of a simple mathematical operation, the ND convex operation. Based on this ND geometry principle, a very simple, yet very robust and accurate dynamic neural network system (a computer graphics program) is proposed for recognizing any moving object, not only by its static images but also by the special way the object moves.
Interactive Paper Session
Using spatially varying pixels exposure technique for increasing accuracy of the optical-digital pattern recognition correlator
The registration of correlation signals with high dynamic range leads to increased recognition accuracy and robustness. Digital photo sensors with a common Bayer colour filter array can be used for this purpose. With the quasi-monochromatic illumination used in an optical-digital correlator, it is possible to register correlation signals with high dynamic range. For signal registration, not only the colour channel corresponding to the wavelength of the illumination can be used, but the other colour channels as well. In this work, the application of the spatially varying pixel exposure technique for obtaining linear high-dynamic-range images of correlation signals from digital photo sensors with a Bayer mosaic is presented. The Bayer colour filter array is treated as an array of attenuating filters under quasi-monochromatic light. Images are reconstructed using information from all colour channels and correction coefficients obtained in a preliminary calibration step. The registered image of the correlation signal is mapped to a linear high-dynamic-range image using a simple and efficient algorithm. The calibration procedure for obtaining the correction coefficients is described. A quantitative estimation of the optical-digital correlator's accuracy is provided. Experimental results on obtaining images of correlation signals with linear high dynamic range are presented.
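A conceptual sketch of the reconstruction step, treating the Bayer colour channels as differently attenuated exposures of the same quasi-monochromatic signal; the attenuation coefficients would come from the calibration step, and all names and values here are illustrative.

```python
import numpy as np

def hdr_from_channels(channels, attenuation, saturation=0.95):
    """Reconstruct a linear high-dynamic-range signal from Bayer channels.

    channels : (C, H, W) stack of demosaiced colour channels recorded under
    quasi-monochromatic light; attenuation : per-channel transmission
    coefficients from calibration. Saturated samples are excluded; the rest
    are rescaled by their attenuation and averaged, which extends the usable
    dynamic range.
    """
    channels = np.asarray(channels, float)
    valid = channels < saturation                    # drop clipped samples
    scaled = channels / np.asarray(attenuation)[:, None, None]
    num = (scaled * valid).sum(axis=0)
    den = np.maximum(valid.sum(axis=0), 1)
    return num / den
```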
Visual tracking method based on target feature dynamic extracting
Jie Su, Guisheng Yin, Yahui Liu
The main problem of now visual tracking algorithm is that the algorithm is lack of robustness, precision and speed. This paper gives a visual tracking method based on dynamic object features extracting. First extract object features according to the value of current frame image and build feature base. Then evaluate the recognition ability of every feature in feature base using fisher criteria and select high-recognition features to generate object feature set. Dynamic adjust the feature vectors of feature set according to the changes of environment object lie in. finally process visual tracking adopting particle filter method using feature vectors of feature set. Experiments have proved that this method can improve the tracking speed while assure tracking accuracy when lighting environment that moving objects lie in changes.