Proceedings Volume 8661

Image Processing: Machine Vision Applications VI


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 12 March 2013
Contents: 9 Sessions, 36 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2013
Volume Number: 8661

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8661
  • Systems I
  • Applications I
  • Algorithms I
  • Systems II
  • Algorithms II
  • Applications II
  • Algorithms III: Pattern Recognition
  • Interactive Paper Session
Front Matter: Volume 8661
Front Matter: Volume 8661
This PDF file contains the front matter associated with SPIE Proceedings Volume 8661, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.
Systems I
A polynomial phase-shift algorithm for high precision three-dimensional profilometry
Fuqin Deng, Chang Liu, Wuifung Sze, et al.
The perspective effect is common in real optical systems that use projected patterns for machine vision applications. In the past, the frequencies of these sinusoidal patterns have been assumed to be uniform at different heights when reconstructing moving objects. The error caused by a perspective projection system therefore becomes pronounced in phase-measuring profilometry, especially for high-precision metrology applications such as measuring the surfaces of semiconductor components at the micrometer level. In this work, we investigate the perspective effect on phase-measuring profilometry when reconstructing the surfaces of moving objects. Using a polynomial to approximate the phase distribution under a perspective projection system, which we call a polynomial phase-measuring profilometry (P-PMP) model, we generalize the phase-measuring profilometry model discussed in our previous work and solve the phase reconstruction problem effectively. Furthermore, we characterize how the frequency of the projected pattern changes with height variations and how the phase of the projected pattern is distributed in the measuring space. We also propose a polynomial phase-shift algorithm (P-PSA) to correct the phase-shift error due to the perspective effect during phase reconstruction. Simulation experiments show that the proposed method improves reconstruction quality both visually and numerically.
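As background for the P-PSA, the classical uniform-frequency N-step phase-shift algorithm that the paper generalizes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' polynomial model; the fringe model I_k = A + B cos(phi + 2*pi*k/N) is the standard one.

```python
import math

def phase_shift_retrieve(intensities):
    # Classical N-step phase-shifting: recover the phase phi from N samples
    # I_k = A + B*cos(phi + 2*pi*k/N) taken at uniform phase shifts.
    N = len(intensities)
    s = sum(I * math.sin(2 * math.pi * k / N) for k, I in enumerate(intensities))
    c = sum(I * math.cos(2 * math.pi * k / N) for k, I in enumerate(intensities))
    # Orthogonality gives c = (N/2)*B*cos(phi) and s = -(N/2)*B*sin(phi).
    return math.atan2(-s, c)
```

The P-PSA replaces the uniform shift 2*pi*k/N with a polynomial function of height to model the perspective distortion.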
High-temperature dual-band thermal imaging by means of high-speed CMOS camera system
When measuring rapid temperature changes as well as high temperatures (>2000 K), commercial pyrometers quickly reach the limits of their performance. Thus, a novel type of high-temperature measurement system using a high-speed camera as a two-color pyrometer is introduced. In addition to the high temporal resolution, ranging between 10 μs and 100 μs, the presented system also allows the determination of the radiation temperature distribution at a very high spatial resolution. The principle of operation, including various image processing algorithms and filters, is explained by means of a concrete example in which the surface temperature decay of a carbon electrode heated by an electric arc is measured. The measurement results yield a temperature of a hot spot on the contact surface of 3100 K, which declines to approx. 1800 K within 105 ms. The spatial distribution of surface temperatures reveals local temperature variations on the contact. These variations might result from surface irregularities, such as protrusions or micro-peaks, due to inhomogeneous evaporation. An error analysis is given for evaluating the potential accuracy inherent in practical temperature measurements.
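The two-color principle can be illustrated with a minimal Python sketch based on Wien's approximation to Planck's law, assuming a gray body so that emissivity cancels in the band ratio. The wavelengths and the band model below are illustrative assumptions, not values from the paper.

```python
import math

C2 = 1.4388e-2  # second radiation constant [m*K]

def wien_intensity(lam, T, eps=1.0):
    # Wien approximation to Planck's law (valid when lam*T << C2).
    return eps * lam ** -5 * math.exp(-C2 / (lam * T))

def two_color_temperature(i1, i2, lam1, lam2):
    # Gray-body assumption: emissivity cancels in the intensity ratio i1/i2,
    # so T follows from ln(i1/i2) = 5*ln(lam2/lam1) - (C2/T)*(1/lam1 - 1/lam2).
    num = C2 * (1.0 / lam1 - 1.0 / lam2)
    den = 5.0 * math.log(lam2 / lam1) - math.log(i1 / i2)
    return num / den
```

In the camera-based system, i1 and i2 would be the pixel responses in the two spectral bands, yielding a temperature per pixel.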
A state observer for using a slow camera as a sensor for fast control applications
Reinhard Gahleitner, Martin Schagerl
This contribution addresses a problem that often arises in vision-based control when a camera is used as a sensor for fast control applications, or more precisely, when the sample rate of the control loop is higher than the frame rate of the camera. In control applications for mechanical axes, e.g., in robotics or automated production, a camera and some image processing can be used as a sensor to detect positions or angles. The sample time in these applications is typically in the range of a few milliseconds or less, which demands a camera with a high frame rate of up to 1000 fps. The presented solution is a special state observer that can work with a slower, and therefore cheaper, camera to estimate the state variables at the higher sample rate of the control loop. To simplify the image processing for the determination of positions or angles and make it more robust, LED markers are applied to the plant. Simulation and experimental results show that the concept can be used even if the plant is unstable, such as an inverted pendulum.
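The idea of predicting at the fast control rate and correcting only when a camera frame arrives can be sketched as follows. The constant-velocity plant and the hand-tuned observer gains are hypothetical stand-ins, not the authors' design.

```python
def simulate(steps=300, dt=0.01, frame_every=10):
    # True plant: constant-velocity model, state x = [position, velocity].
    x = [1.0, 0.5]
    xh = [0.0, 0.0]       # observer estimate, deliberately wrong at start
    l1, l2 = 0.8, 4.0     # hypothetical observer gains (tuned by hand)
    for k in range(steps):
        # Fast prediction step at the control sample rate (model-based).
        x = [x[0] + dt * x[1], x[1]]
        xh = [xh[0] + dt * xh[1], xh[1]]
        # Slow correction step: a camera frame arrives only every
        # `frame_every` control samples and measures position only.
        if k % frame_every == 0:
            innov = x[0] - xh[0]
            xh = [xh[0] + l1 * innov, xh[1] + l2 * innov]
    return x, xh
```

Between frames the estimate evolves purely on the model; each frame pulls both position and velocity estimates toward the measurement, so the full state is available at every fast sample.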
Applications I
Multiple-level patch-based object tracking using MLBP-based integral histogram
Jirui Yuan, Karen Egiazarian
This paper presents a novel multiple-level patch-based approach for object tracking using Modified Local Binary Pattern (MLBP) histograms. The initial template is divided into overlapping rectangular patches, and each of these patches is tracked independently by finding the most similar match within a search region. Every patch votes on the possible locations of the object in the current frame by comparing its MLBP histogram with its correspondence in the target frame. To reduce the individual tracking error of a given patch due to partial occlusions, the idea of multiple-level patch partitioning is further developed: the similarity between template and target object is compared patch by patch and level by level. The comparison starts from the highest level and progressively propagates to the lowest level through a median operation. The proposed algorithm provides additional robustness and effectiveness in several ways. First, the spatial relationship among patches is improved by the overlapping partitioning. Second, the MLBP operator significantly improves tracking accuracy. Third, the median operation used in the multiple-level vote-combining process provides additional robustness against outliers resulting from occluded patches and pose changes. The proposed method is evaluated on both face and pedestrian sequences and compared with several state-of-the-art tracking algorithms. Experimental results show that the proposed method significantly outperforms them in cases of occlusion and pose change. Moreover, the tracking results under scale changes further demonstrate the effectiveness and efficiency of the proposed method.
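As a rough illustration of the histogram features involved, a plain (unmodified) LBP histogram of a gray-level patch can be computed as below; the paper's MLBP variant differs in details not specified in the abstract.

```python
def lbp_histogram(img):
    # Basic 8-neighbour local binary pattern: each inner pixel gets an
    # 8-bit code (one bit per neighbour >= centre), pooled into 256 bins.
    h, w = len(img), len(img[0])
    hist = [0] * 256
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

Per-patch histograms like this one would then be compared between template and target frame to produce the location votes.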
Periodicity estimation of nearly regular textures based on discrepancy norm
Gernot Stübl, Peter Haslinger, Volkmar Wieser, et al.
This paper proposes a novel approach to determine the texture periodicity, the texture element size, and further characteristics, such as the area of the basin of attraction, when computing the similarity of a test image patch with a reference. The presented method utilizes the properties of a novel metric, the so-called discrepancy norm. Due to its Lipschitz and monotonicity properties, the discrepancy norm distinguishes itself from other metrics by well-formed and stable convergence regions. The periodicity and the convergence regions are closely related and have an immediate impact on the performance of a subsequent template matching and evaluation step. The general form of the proposed approach relies on the generation of discrepancy-norm-induced similarity maps at random positions in the image. By applying standard image processing operations such as watershed and blob analysis to the similarity maps, a robust estimate of the characteristic periodicity can be computed. From the general approach, a tailored version for orthogonally aligned textures is derived, which is robust to noise-disturbed images and suitable for estimation on near-regular textures. In an experimental set-up, the estimation performance is tested on samples from standardized image databases and compared with state-of-the-art methods. Results show that the proposed method is applicable to a wide range of nearly regular textures and that its estimates keep up with current methods. When a hypothesis generation/selection mechanism is added, it even outperforms the current state of the art.
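In one dimension, the discrepancy norm has a simple closed form: the maximum absolute sum over all intervals, which equals the range of the running partial sums. A minimal sketch (the paper applies the concept to 2-D similarity maps):

```python
def discrepancy_norm(seq):
    # ||f||_D = max over all intervals [i, j] of |sum_{k=i..j} f_k|,
    # which equals max(S) - min(S) of the partial sums S (with S_0 = 0).
    s, smax, smin = 0.0, 0.0, 0.0
    for v in seq:
        s += v
        smax = max(smax, s)
        smin = min(smin, s)
    return smax - smin
```

Unlike an L2 distance, sign cancellations matter: an alternating difference signal scores low, while a one-sided run of errors scores high, which is what produces the well-formed convergence regions around the correct alignment.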
Gradient feature matching for in-plane rotation invariant face sketch recognition
Ann Theja Alex, Vijayan K. Asari, Alex Mathew
Automatic recognition of face sketches is a challenging and interesting problem. An artist-drawn sketch is compared against a mugshot database to identify criminals. Manually comparing images is a very cumbersome task, which necessitates a pattern recognition system to perform the comparisons. Existing methods fall into two main categories: those that allow recognition across modalities, and those that require a sketch/photo synthesis step and then compare within a single modality. The methods that require synthesis demand considerable computing power, since they involve high time and space complexity. Our method allows recognition across modalities. It uses the edge features of a face sketch and a face photo to create a feature string called an 'edge-string', which is a polar coordinate representation of the edge image. To generate a polar coordinate representation, we need a reference point and a reference line. Using the center point of the edge image as the reference point and a horizontal line as the reference line is the simplest solution, but it cannot handle in-plane rotations. For this reason, we propose an approach for finding the reference line and the centroid point. The edge-strings of the face photo and face sketch are then compared using the Smith-Waterman algorithm for local string alignment. The face photo with the highest similarity score is the match for the test face sketch input. Results on the CUHK (Chinese University of Hong Kong) student dataset show the effectiveness of the proposed approach in face sketch recognition.
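The Smith-Waterman local alignment used to compare edge-strings can be sketched as follows; plain character sequences and a simple match/mismatch/gap scoring stand in for the paper's edge-string alphabet.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    # Smith-Waterman local alignment: dynamic-programming matrix H where
    # H[i][j] is the best score of an alignment ending at a[i-1], b[j-1];
    # clamping at 0 is what makes the alignment local.
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The highest-scoring photo edge-string would then be reported as the match for the query sketch.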
An iris segmentation algorithm based on edge orientation for off-angle iris recognition
Mahmut Karakaya, Del Barstow, Hector Santos-Villalobos, et al.
Iris recognition is known as one of the most accurate and reliable biometrics. However, the accuracy of iris recognition systems depends on the quality of the captured data and is negatively affected by several factors such as angle, occlusion, and dilation. In this paper, we present a segmentation algorithm for off-angle iris images that uses edge detection, edge elimination, edge classification, and ellipse fitting techniques. In our approach, we first detect all candidate edges in the iris image using the Canny edge detector; this collection contains edges from the iris and pupil boundaries as well as eyelashes, eyelids, iris texture, etc. Edge orientation is used to eliminate the edges that cannot be part of the iris or pupil. We then classify the remaining edge points into two sets: pupil edges and iris edges. Finally, we randomly generate subsets of iris and pupil edge points, fit ellipses to each subset, select ellipses with similar parameters, and average them to form the resultant ellipses. Based on results from real experiments, the proposed method shows effective segmentation of off-angle iris images.
Algorithms I
Dense sampling of shape interiors for improved representation
Matching shapes accurately is an important requirement in various applications, the most notable of which is object recognition. Precisely matching shapes is a difficult task and is an active area of research in the computer vision community. Most shape matching techniques rely on the contour of the object to provide the object's shape properties. However, we show that using the contour alone cannot help in matching all kinds of shapes. Many objects are recognised because of their overall visual similarity, rather than just their contour properties. In this paper, we assert that modelling the interior properties of the shape can help in extracting this overall visual similarity. We propose a simple way to extract the shape's interior properties: densely sampling points from within the shape and using them to describe the shape's features. We show that such an approach provides an effective way to match shapes that are visually similar to each other but have vastly different contour properties.
Efficient defect detection with sign information of Walsh Hadamard transform
Qiang Zhang, Peter van Beek, Chang Yuan, et al.
We propose a method for defect detection based on the sign information of Walsh-Hadamard Transform (WHT) coefficients. The core of the proposed algorithm involves only three steps that can all be implemented very efficiently: applying the forward WHT, taking the sign of the transform coefficients, and taking an inverse WHT using only the sign information. Our implementation takes only 7 milliseconds for a 512 × 512 image on a PC platform. As a result, the proposed method is more efficient than the PHase Only Transform (PHOT) method and other methods in the literature. In addition, the proposed approach is capable of detecting defects of varying shapes by combining the 2-dimensional and 1-dimensional WHT, and can detect defects in images with strong object boundaries by utilizing a reference image. The proposed algorithm is robust over different background image patterns and varying illumination conditions. We evaluated the proposed method both visually and quantitatively and obtained good results on images from various defect detection applications.
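The three-step core can be sketched directly in 1-D with a fast Walsh-Hadamard transform (the WHT is self-inverse up to a factor of 1/n). In this toy example the largest response of the sign-only reconstruction flags the position where a periodic pattern is broken; the 2-D version and the reference-image handling are omitted.

```python
def fwht(a):
    # Fast Walsh-Hadamard transform (input length must be a power of two).
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def sign_map(signal):
    # Forward WHT -> keep only coefficient signs -> inverse WHT.
    # Regular structure is flattened; irregularities (defects) stand out.
    signs = [(c > 0) - (c < 0) for c in fwht(signal)]
    n = len(signal)
    return [v / n for v in fwht(signs)]  # WHT is its own inverse up to 1/n
```

Thresholding the magnitude of the sign map then yields the defect mask.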
Improving the performance of interest point detectors with contrast stretching functions
The initial stage of many computer vision algorithms, such as object recognition and tracking, is to detect interest points in an image. Some of the existing interest point detection algorithms are robust to illumination variations to a certain extent. We have recently proposed the contrast stretching technique to improve the repeatability rate of the Harris corner detector under large illumination changes. In this paper, the contrast stretching technique is incorporated into two scale-invariant interest point detectors, specifically the multi-scale Harris and multi-scale Hessian detectors. We show that, with the adoption of the contrast stretching technique, the performance of these detectors improves not only under illumination variations but also under variations of viewpoint, scale, blur, and compression. In addition, we discuss a GPU implementation of the proposed technique.
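The abstract does not spell out the authors' stretching function; as an illustrative stand-in, a percentile-based linear contrast stretch applied before interest point detection looks like this:

```python
def contrast_stretch(img, low_pct=2, high_pct=98):
    # Percentile-based linear stretch of a 2-D gray image to [0, 255].
    # Clipping at the chosen percentiles makes the mapping robust to
    # a few extreme pixels (a generic choice, not the paper's function).
    flat = sorted(v for row in img for v in row)
    lo = flat[(low_pct * (len(flat) - 1)) // 100]
    hi = flat[(high_pct * (len(flat) - 1)) // 100]
    span = max(hi - lo, 1e-9)
    return [[min(255.0, max(0.0, 255.0 * (v - lo) / span)) for v in row]
            for row in img]
```

Normalizing the intensity range this way stabilizes the cornerness scores that the Harris and Hessian measures compute from image derivatives.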
Object detection using feature-based template matching
Pattern matching, also known as template matching, is a computationally intensive problem aimed at localizing the instances of a given template within a query image. In this work we present a fast technique for template matching that is able to use histogram-based similarity measures on complex descriptors. In particular, we focus on Color Histograms (CH), Histograms of Oriented Gradients (HOG), and Bag of visual Words histograms (BOW). The image is compared with the template via histogram matching exploiting integral histograms. In order to introduce spatial information, the template and candidates are divided into sub-regions, and descriptors at multiple sizes are computed. The proposed solution is compared with the Full-Search-equivalent Incremental Dissimilarity Approximations, a state-of-the-art approach, in terms of both accuracy and execution time on different standard datasets.
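The integral-histogram trick that makes this fast can be sketched as follows: after one pass over a label image (e.g. quantized colors or visual-word indices), the histogram of any axis-aligned rectangle costs O(bins) regardless of its area.

```python
def integral_histogram(labels, n_bins):
    # ih[y][x][b] = count of bin b in the rectangle [0, y) x [0, x).
    h, w = len(labels), len(labels[0])
    ih = [[[0] * n_bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            for b in range(n_bins):
                ih[y + 1][x + 1][b] = ((labels[y][x] == b)
                                      + ih[y][x + 1][b]
                                      + ih[y + 1][x][b]
                                      - ih[y][x][b])
    return ih

def region_hist(ih, y0, x0, y1, x1):
    # Histogram of the rectangle [y0, y1) x [x0, x1) via four lookups per bin.
    return [ih[y1][x1][b] - ih[y0][x1][b] - ih[y1][x0][b] + ih[y0][x0][b]
            for b in range(len(ih[0][0]))]
```

Sliding a template window then needs only these constant-time lookups per candidate position and sub-region, instead of rebuilding each histogram from scratch.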
Systems II
Touch sensing analysis using multi-modal acquisition system
Jeffrey S. King, Dragan Pikula, Zachi Baharav
Touch sensing is ubiquitous in many consumer electronic products. Users expect to be able to touch the surface of a display with a finger and interact with it. Yet the actual mechanics and physics of the touch process are little known, as they depend on many independent variables, ranging from the physical structure of the fingertip, composed of ridges, valleys, and pores, through the layers of skin and flesh, to the bone itself. Moreover, as we will see, sweat glands and wetting are critical as well. As for the mechanics, the pressure with which one touches the screen, and the manner in which the surface responds to this pressure, have a major impact on touch sensing. In addition, different touch sensing methods, such as capacitive or optical, have different dependencies. For example, the color of the finger might affect the latter, whereas the former is insensitive to it. In this paper we describe a system that captures multiple modalities of the touch event and synchronizes them in post-processing. This enables us to look for correlations between various effects and uncover their influence on the performance of touch sensing algorithms. Moreover, investigating these relations allows us to improve various sensing algorithms, as well as find areas where they complement each other. We conclude by pointing to possible future extensions and applications of this system.
Structural deformation measurement via efficient tensor polynomial calibrated electro-active glass targets
This paper describes the physical setup and mathematical modelling of a device for the measurement of structural deformations over large scales, e.g., a mining shaft. Image processing techniques are used to determine the deformation by measuring the position of a target relative to a reference laser beam. A particular novelty is the incorporation of electro-active glass; the polymer-dispersed liquid crystal shutters enable the simultaneous calibration of any number of consecutive measurement units without manual intervention, i.e., the process is fully automatic. It is necessary to compensate for optical distortion if high accuracy is to be achieved in a compact hardware design where lenses with short focal lengths are used. Wide-angle lenses exhibit significant distortion, which is typically characterized using Zernike polynomials. Radial distortion models assume that the lens is rotationally symmetric; such models are insufficient in the application at hand. This paper presents a new coordinate mapping procedure based on a tensor product of discrete orthogonal polynomials. Both lens distortion and the projection are compensated by a single linear transformation. Once calibrated, to acquire the measurement data, it is necessary to localize a single laser spot in the image. For this purpose, complete interpolation and rectification of the image is not required; hence, we have developed a new hierarchical approach based on a quad-tree subdivision. Cross-validation tests verify the validity, demonstrating that the proposed method accurately models both the optical distortion and the projection. The achievable accuracy is e ≤ ±0.01 [mm] over a field of view of 150 [mm] x 150 [mm] at a distance of 120 [m] from the laser source. Finally, a Kolmogorov-Smirnov test shows that the error distribution in localizing a laser spot is Gaussian; consequently, due to the linearity of the proposed method, this also applies to the algorithm's output. Therefore, first-order covariance propagation provides an accurate estimate of the measurement uncertainty, which is essential for any measurement device.
Machine vision system for the control of tunnel boring machines
This paper presents a machine vision system for the control of dual-shield tunnel boring machines. The system consists of a camera with ultra-bright LED illumination and a target composed of multiple retro-reflectors. The camera, mounted on the gripper shield, measures the relative position and orientation of the target, which is mounted on the cutting shield. In this manner, the position of the cutting shield relative to the gripper shield is determined. Morphological operators are used to detect the retro-reflectors in the image, and a covariance-optimized circle fit is used to determine the center point of each reflector. A graph matching algorithm ensures robust matching of the constellation of the observed target with the ideal target geometry.
Algorithms II
Eliminating illumination effects by discrete cosine transform (DCT) coefficients' attenuation and accentuation
Shan Du, Mohamed Shehata, Wael Badawy, et al.
In this paper, we propose a discrete cosine transform (DCT)-based attenuation and accentuation method to remove lighting effects from face images, facilitating the face recognition task under varying lighting conditions. In the proposed method, a logarithm transform is first used to convert a face image into the logarithm domain. The discrete cosine transform is then applied to obtain DCT coefficients. The low-frequency DCT coefficients are attenuated, since illumination variations mainly concentrate in the low-frequency band. The high-frequency coefficients are accentuated, since under poor illumination the high-frequency features become more important for recognition. The log image reconstructed by the inverse DCT of the modified coefficients is used for the final recognition. Experiments are conducted on the Yale B database, the combination of the Yale B and Extended Yale B databases, and the CMU-PIE database. The proposed method does not require modeling and model-fitting steps. It can be applied directly to a single face image, without any prior information on 3D shape or light sources.
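A 1-D sketch of the attenuation/accentuation pipeline (log transform, DCT-II, frequency-dependent scaling, inverse DCT) is shown below; the cutoff and scale factors are illustrative assumptions, not the paper's values, and a face image would be processed with the 2-D DCT instead.

```python
import math

def dct(x):
    # DCT-II: X[k] = sum_n x[n] * cos(pi*(n+0.5)*k/N)
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct(X):
    # Inverse of the DCT-II above (DCT-III with matching normalization).
    N = len(X)
    return [X[0] / N + (2.0 / N) * sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                                       for k in range(1, N))
            for n in range(N)]

def normalize_illumination(row, cutoff=2, attenuate=0.1, accentuate=1.5):
    # Log domain -> DCT -> damp low frequencies (illumination),
    # boost high frequencies (detail) -> inverse DCT.
    logged = [math.log(1.0 + v) for v in row]
    X = dct(logged)
    X = [c * (attenuate if k < cutoff else accentuate) for k, c in enumerate(X)]
    return idct(X)
```

The output stays in the log domain, as in the paper, and feeds the recognizer directly.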
Non-rigid ultrasound image registration using generalized relaxation labeling process
Jong-Ha Lee, Yeong Kyeong Seong, MoonHo Park, et al.
This research proposes a novel non-rigid registration method for ultrasound images. The most predominant anatomical features in medical images are tissue boundaries, which appear as edges. In ultrasound images, however, other features can be identified as well due to the specular reflections that appear as bright lines superimposed on the ideal edge location. In this work, an image’s local phase information (via the frequency domain) is used to find the ideal edge location. The generalized relaxation labeling process is then formulated to align the feature points extracted from the ideal edge location. In this work, the original relaxation labeling method was generalized by taking n compatibility coefficient values to improve non-rigid registration performance. This contextual information combined with a relaxation labeling process is used to search for a correspondence. Then the transformation is calculated by the thin plate spline (TPS) model. These two processes are iterated until the optimal correspondence and transformation are found. We have tested our proposed method and the state-of-the-art algorithms with synthetic data and bladder ultrasound images of in vivo human subjects. Experiments show that the proposed method improves registration performance significantly, as compared to other state-of-the-art non-rigid registration algorithms.
Mammogram CAD, hybrid registration and iconic analysis
A. Boucher, F. Cloppet, N. Vincent
This paper aims to develop a computer-aided diagnosis (CAD) system based on a two-step methodology to register and analyze pairs of temporal mammograms. The concept of a "medical file", including all the previous medical information on a patient, enables the joint analysis of different acquisitions taken at different times, and the detection of significant modifications. The developed registration method aims to superimpose the different anatomical structures of the breast as well as possible. The registration is designed to remove deformations introduced by the acquisition process while preserving those due to breast changes indicative of malignancy. To reach this goal, a reference image is computed from control points based on anatomical features that are extracted automatically. The second image of the pair is then realigned to the reference image using a coarse-to-fine approach, guided by expert knowledge, that allows both rigid and non-rigid transforms. The joint analysis detects the evolution between two images representing the same scene. To achieve this, it is important to know the registration error limits in order to adapt the observation scale. The approach used in this paper is based on a sparse image representation: decomposed into regular patterns, the images are analyzed from a new angle. The evolution detection problem has many practical applications, especially in medical imaging. The CAD is evaluated using the recall and precision of differences in mammograms.
Applications II
Neutron imaging for geothermal energy systems
Philip Bingham, Yarom Polsky, Lawrence Anovitz
Geothermal systems extract heat energy from the interior of the earth using a working fluid, typically water. Three components are required for a commercially viable geothermal system: heat, fluid, and permeability. Current commercial electricity production using geothermal energy occurs where the three main components exist naturally. These are called hydrothermal systems. In the US, there is an estimated 30 GW of base-load electrical power potential for hydrothermal sites. Next-generation geothermal systems, named Enhanced Geothermal Systems (EGS), have an estimated potential of 4500 GW. EGSs lack in-situ fluid, permeability, or both. As such, the heat exchange system must be developed or “engineered” within the rock. The envisioned method for producing permeability in the EGS reservoir is hydraulic fracturing, which is rarely practiced in the geothermal industry and not well understood for the rocks typically present in geothermal reservoirs. High costs associated with trial-and-error learning in the field have led to an effort to characterize fluid flow and fracturing mechanisms in the laboratory to better understand how to design and manage EGS reservoirs. Neutron radiography has been investigated for potential use in this characterization. An environmental chamber has been developed that is suitable for reproducing EGS pressures and temperatures and has been tested for both flow and precipitation studies, with success in air/liquid interface imaging and 3D reconstruction of precipitation within the core.
Wave front distortion based fluid flow imaging
In this paper, a transparent flow surface reconstruction based on wave front distortion is investigated. A camera lens is used to focus the image formed by the micro-lens array to the camera imaging plane. The irradiance of the captured image is transformed to frequency spectrum and then the x and y spatial components are separated. A rigid spatial translation followed by low pass filtering yields a single frequency component of the image intensity. Index of refraction is estimated from the inverse Fourier transform of the spatial frequency spectrum of the irradiance. The proposed method is evaluated with synthetic data of a randomly generated index of refraction value and used to visualize a fuel injection volumetric data.
Autonomous ship classification using synthetic and real color images
Deniz Kumlu, B. Keith Jenkins
This work classifies color images of ships acquired using cameras mounted on ships and in harbors. Our data-sets contain 9 different types of ship, each with 18 different perspectives, for our training, development, and testing sets. The training data-set contains modeled synthetic images; the development and testing data-sets contain real images. The database of real images was gathered from the internet, and 3D models for the synthetic images were imported from Google 3D Warehouse. A key goal of this work is to use synthetic images to increase overall classification accuracy. We present a novel approach to autonomous segmentation and feature extraction for this problem. A support vector machine is used for multi-class classification. This work reports three experimental results for the multi-class ship classification problem. The first experiment trains on the synthetic image data-set and tests on a real image data-set, obtaining an accuracy of 87.8%. The second experiment trains on a real image data-set and tests on a separate real image data-set, obtaining an accuracy of 87.8%. The last experiment trains on the combined real + synthetic image data-sets and tests on a separate real image data-set, obtaining an accuracy of 93.3%.
Fast and flexible 3D object recognition solutions for machine vision applications
Ira Effenberger, Jens Kühnle, Alexander Verl
In automation and handling engineering, supplying work pieces between different stages along the production process chain is of special interest. Often the parts are stored unordered in bins or lattice boxes and hence have to be separated and ordered for feeding purposes. An alternative to complex and spacious mechanical systems such as bowl feeders or conveyor belts, which are typically adapted to the parts’ geometry, is using a robot to grip the work pieces out of a bin or from a belt. Such applications are in need of reliable and precise computer-aided object detection and localization systems. For a restricted range of parts, there exists a variety of 2D image processing algorithms that solve the recognition problem. However, these methods are often not well suited for the localization of randomly stored parts. In this paper we present a fast and flexible 3D object recognizer that localizes objects by identifying primitive features within the objects. Since technical work pieces typically consist to a substantial degree of geometric primitives such as planes, cylinders and cones, such features usually carry enough information in order to determine the position of the entire object. Our algorithms use 3D best-fitting combined with an intelligent data pre-processing step. The capability and performance of this approach is shown by applying the algorithms to real data sets of different industrial test parts in a prototypical bin picking demonstration system.
Algorithms III: Pattern Recognition
Low complexity smile detection technique for mobile devices
Valeria Tomaselli, Mirko Guarnera, Claudio Domenico Marchisio, et al.
In this paper, we propose a low-complexity smile detection technique able to detect smiles under a variety of lighting conditions, face positions, and image resolutions. The proposed approach first detects the faces in the image, then applies an almost cost-free mouth detection, extracts features from this region, and finally classifies between smiling and non-smiling states. Different feature extraction methods and classification techniques are analyzed from both the performance and computational complexity standpoints. The best compromise between performance and complexity is represented by a combined approach that exploits both a shape feature and a texture feature and uses a Mahalanobis-distance-based classifier. This solution achieves good performance with very low complexity, making it suitable for implementation on mobile devices.
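A minimal Mahalanobis-distance classifier of the kind mentioned can be sketched as follows; the per-class means and inverse covariance matrices are assumed to have been estimated beforehand, and the numbers in the test are made up for illustration.

```python
def mahalanobis_sq(x, mean, inv_cov):
    # Squared Mahalanobis distance: (x - mean)^T * inv_cov * (x - mean).
    d = [a - b for a, b in zip(x, mean)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))

def classify(x, class_stats):
    # class_stats maps label -> (mean vector, inverse covariance matrix);
    # pick the class whose distribution is nearest in Mahalanobis distance.
    return min(class_stats,
               key=lambda c: mahalanobis_sq(x, class_stats[c][0], class_stats[c][1]))
```

With only a distance evaluation per class at run time, such a classifier fits the paper's low-complexity, mobile-oriented setting.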
Density-induced oversampling for highly imbalanced datasets
Daniel Fecker, Volker Märgner, Tim Fingscheidt
The problem of highly imbalanced datasets with only sparse data of the minority class is investigated in the context of two-class classification. A novel synthetic data oversampling technique is proposed which utilizes estimates of the probability density distribution in the feature space. First, a Gaussian mixture model (GMM) is generated from the data of the well-sampled majority class, and with its help a new GMM is approximated by Bayesian adaptation using the sparse minority class data. Random synthetic data is generated from the adapted GMM, and an additional assignment rule either assigns this data to the minority class or discards it. The obtained synthetic data is employed in combination with the available original data to train a support vector machine classifier. The application examined in this paper is optical on-line process monitoring of laser brazing, with only rare, sporadically occurring defects. Experiments with different amounts of minority class data samples and comparisons to other methods show that this approach performs very well for highly imbalanced datasets.
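The Bayesian (MAP-style) adaptation of majority-class GMM means toward the sparse minority data can be sketched in 1-D with hard assignments. This is a simplification of the soft, EM-based adaptation usually used, and the relevance factor is a hypothetical tuning parameter.

```python
def adapt_means(majority_means, minority_data, relevance=4.0):
    # Hard-assign each minority point to its nearest mixture component,
    # then shrink each component mean toward the empirical mean of its
    # assigned points; components with little data barely move.
    k = len(majority_means)
    counts = [0] * k
    sums = [0.0] * k
    for x in minority_data:
        i = min(range(k), key=lambda j: abs(x - majority_means[j]))
        counts[i] += 1
        sums[i] += x
    adapted = []
    for m, n, s in zip(majority_means, counts, sums):
        w = n / (n + relevance)          # data-dependent adaptation weight
        emp = s / n if n else m
        adapted.append(w * emp + (1 - w) * m)
    return adapted
```

Synthetic minority samples would then be drawn from the adapted mixture and filtered by the assignment rule before training the SVM.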
Coherent image layout using an adaptive visual vocabulary
Scott E. Dillard, Michael J. Henry, Shawn Bohn, et al.
When querying a huge image database containing millions of images, the result of the query may still contain many thousands of images that need to be presented to the user. We consider the problem of arranging such a large set of images into a visually coherent layout, one that places similar images next to each other. Image similarity is determined using a bag-of-features model, and the layout is constructed from a hierarchical clustering of the image set by mapping an in-order traversal of the hierarchy tree into a space-filling curve. This layout method provides strong locality guarantees so we are able to quantitatively evaluate performance using standard image retrieval benchmarks. Performance of the bag-of-features method is best when the vocabulary is learned on the image set being clustered. Because learning a large, discriminative vocabulary is a computationally demanding task, we present a novel method for efficiently adapting a generic visual vocabulary to a particular dataset. We evaluate our clustering and vocabulary adaptation methods on a variety of image datasets and show that adapting a generic vocabulary to a particular set of images improves performance on both hierarchical clustering and image retrieval tasks.
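The layout construction can be sketched as follows: an in-order traversal of the cluster hierarchy gives a 1-D ordering of images, and each position is mapped onto a Hilbert space-filling curve (the standard iterative distance-to-coordinate conversion), so images adjacent in the hierarchy land in adjacent grid cells. The binary-tuple tree encoding is an illustrative stand-in for the paper's actual hierarchy:

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve filling a 2^order x 2^order
    grid to (x, y) cell coordinates."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def layout(tree, order):
    """Place in-order leaves of a binary cluster tree (nested 2-tuples)
    along the Hilbert curve, preserving hierarchy locality in 2-D."""
    def leaves(node):
        return (leaves(node[0]) + leaves(node[1])
                if isinstance(node, tuple) else [node])
    return {img: hilbert_d2xy(order, i) for i, img in enumerate(leaves(tree))}
```

The Hilbert curve is what provides the strong locality guarantee the abstract mentions: consecutive curve positions are always neighbouring grid cells.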
Shape recognition for capacitive touch display
In this paper we present a technique to classify five common classes of shapes acquired with a capacitive touch display: finger, ear, cheek, hand hold, and half ear-half cheek. The need for algorithms able to discriminate among the aforementioned shapes comes from the growing diffusion of touch-screen-based consumer devices (e.g., smartphones and tablets). In this context, the detection and recognition of fingers are fundamental tasks in many touch-based user applications (e.g., mobile games). Shape recognition algorithms are also extremely useful for identifying accidental touches in order to avoid involuntary activation of device functionalities (e.g., accidental calls). Our solution makes use of simple descriptors designed to capture discriminative information about the considered classes of shapes. The recognition is performed through a decision-tree-based approach whose parameters are learned on a set of labeled samples. Experimental results demonstrate that the proposed solution achieves good recognition accuracy.
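As a sketch of what "simple descriptors" for a touch blob might look like, the snippet below computes two illustrative ones, area and elongation; these are plausible examples of the kind of descriptor such a decision tree could branch on, not the paper's actual feature set:

```python
import numpy as np

def blob_descriptors(mask):
    """Descriptors of a binary touch blob: area (pixel count) and
    elongation (ratio of principal-axis variances). A small, round blob
    suggests a finger; a large or elongated one suggests a cheek or ear."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cov = np.cov(np.vstack([xs, ys]).astype(float))
    evals = np.linalg.eigvalsh(cov)          # ascending order
    elongation = float(evals[-1] / max(evals[0], 1e-9))
    return area, elongation
```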
Interactive Paper Session
An elliptic phase-shift algorithm for high speed three-dimensional profilometry
Fuqin Deng, Zhao Li, Jia Chen, et al.
A high throughput is often required in many machine vision systems, especially on the assembly line in the semiconductor industry. To develop a non-contact three-dimensional dense surface reconstruction system for real-time surface inspection and metrology applications, in this work, we project sinusoidal patterns onto the inspected objects and propose a high speed phase-shift algorithm. First, we use an illumination-reflectivity-focus (IRF) model to investigate the factors in image formation for phase-measuring profilometry. Second, by visualizing and analyzing the characteristic intensity locus projected onto the intensity space, we build a two-dimensional phase map to store the phase information for each point in the intensity space. Third, we develop an efficient elliptic phase-shift algorithm (E-PSA) for high speed surface profilometry. In this method, instead of calculating the time-consuming inverse trigonometric function, we only need to normalize the measured image intensities and then index the precomputed two-dimensional phase map during real-time phase reconstruction. Finally, experimental results show that the proposed method is about two times faster than the conventional phase-shift algorithm.
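The core speedup idea, replacing a per-pixel inverse trigonometric call with an index into a precomputed two-dimensional phase map, can be illustrated as below. This is only the lookup mechanism; the paper's actual map is built from its IRF model, and the table size and normalization here are assumptions:

```python
import numpy as np

N = 256                                   # table resolution (assumed)
u = np.linspace(-1.0, 1.0, N)
V, U = np.meshgrid(u, u, indexing="ij")
PHASE_LUT = np.arctan2(V, U)              # built once, offline

def phase_lookup(s, c):
    """Recover phase from normalized sine-like (s) and cosine-like (c)
    intensity combinations by table indexing instead of arctan2."""
    i = np.clip(np.rint((s + 1.0) * 0.5 * (N - 1)).astype(int), 0, N - 1)
    j = np.clip(np.rint((c + 1.0) * 0.5 * (N - 1)).astype(int), 0, N - 1)
    return PHASE_LUT[i, j]
```

The lookup trades a small quantization error (bounded by the table step) for the elimination of transcendental-function evaluations in the real-time loop.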
An incompressible fluid flow model with mutual information for MR image registration
Leo Tsai, Herng-Hua Chang
Image registration is one of the fundamental and essential tasks within image processing. It is a process of determining the correspondence between structures in two images, called the template image and the reference image, respectively. The challenge of registration is to find an optimal geometric transformation between corresponding image data. This paper develops a new MR image registration algorithm that uses a closed incompressible viscous fluid model associated with mutual information. In our approach, we treat the image pixels as the fluid elements of a viscous fluid flow governed by the nonlinear Navier-Stokes partial differential equation (PDE). We replace the pressure term with the body force mainly used to guide the transformation with a weighting coefficient, which is expressed by the mutual information between the template and reference images. To solve this modified Navier-Stokes PDE, we adopted the fast numerical techniques proposed by Seibold [1]. The registration process of updating the body force and the velocity and deformation fields is repeated until the mutual information weight reaches a prescribed threshold. We applied our approach to the BrainWeb and real MR images. Consistent with the theory of the proposed fluid model, we found that our method accurately transformed the template images into the reference images based on the intensity flow. Experimental results indicate that our method has potential in a wide variety of medical image registration applications.
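The mutual information weight driving the registration can be estimated from a joint intensity histogram of the two images, as in the standard plug-in estimator sketched below (bin count is an assumed parameter):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Joint-histogram estimate of mutual information between two images:
    MI = sum p(x,y) * log(p(x,y) / (p(x) p(y))) over non-empty bins."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of image b
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

MI is maximal when the two images' intensities are perfectly predictive of each other and shrinks toward zero as they become independent, which is why it serves as a convergence measure for the deformation loop.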
Improved skin detection method by iteratively eliminating pseudo-skin colors through combined skin filter
Oh-Yeol Kwon, Kyung-Ah Kim, Sung-Il Chien
Skin color detection methods often wrongly include pseudo-skin colors, which are similar to skin color, and fail to detect highlight and dark pixels that belong to actual skin inside or around the detected skin regions. This paper proposes an improved iterative skin detection method that eliminates pseudo-skin colors by combining the characteristics of Wang's [2] and Cheddad's [4] skin filters. The highlight and dark pixels are detected by adaptively adjusting the saturation level.
A modified hierarchical graph cut based video segmentation approach for high frame rate video
Xuezhang Hu, Sumit Chakravarty, Qi She, et al.
Video object segmentation entails selecting and extracting objects of interest from a video sequence. Video Segmentation of Objects (VSO) is a critical task with many applications, such as video editing, video decomposition, and object recognition. The core of a VSO system consists of two major problems of computer vision, namely object segmentation and object tracking. These two difficulties need to be solved in tandem in an efficient manner to handle variations in shape deformation, appearance alteration, and background clutter. Along with segmentation efficiency, computational expense is a critical parameter for algorithm development. Most existing methods apply advanced tracking algorithms, such as mean shift and particle filters, together with object segmentation schemes like level sets or graph methods. As video is spatiotemporal data, it offers an extensive opportunity to focus on the regions of high spatiotemporal variation. We propose a new algorithm that concentrates on the high variations of the video data and uses modified hierarchical processing to capture the spatiotemporal variation. The novelty of the research presented here is to utilize a fast object tracking algorithm conjoined with graph-cut-based segmentation in a hierarchical framework. This involves modifying both the object tracking algorithm and the graph cut segmentation algorithm to work in an optimized manner in a local spatial region while also ensuring that all relevant motion is accounted for. Using an initial estimate of the object and a hierarchical pyramid framework, the proposed algorithm tracks and segments the object of interest in subsequent frames. The modified hierarchical framework enables local processing of the video, thereby allowing the proposed algorithm to target specific regions where high spatiotemporal variations occur. Experiments performed with high frame rate video data show the viability of the proposed approach.
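The hierarchical pyramid framework mentioned above rests on a standard construction that can be sketched in a few lines; the 2x2 block averaging below is one common choice of downsampling, not necessarily the paper's:

```python
import numpy as np

def build_pyramid(frame, levels=3):
    """Build a hierarchical pyramid by repeated 2x2 block averaging.
    Coarse levels localize high spatiotemporal variation cheaply before
    fine-level processing (e.g., graph cuts) runs on a local region."""
    pyr = [frame.astype(float)]
    for _ in range(levels - 1):
        f = pyr[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
        pyr.append(f[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyr
```

Processing coarse-to-fine lets the tracker restrict the expensive graph cut to the small region flagged at the coarse level.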
Power and execution performance tradeoffs of GPGPU computing: a case study employing stereo matching
Sarala Arunagiri, Jaime Jaloma, Ricardo Portillo, et al.
GPGPUs and multicore processors have become commonplace, with wide usage in traditional high performance computing systems as well as mobile computing devices. A significant speedup can be achieved for a variety of general-purpose applications by using these technologies. Unfortunately, this speedup is often accompanied by high power and/or energy consumption. As a result, energy conservation is increasingly becoming a major concern in designing these computing devices. For large-scale systems such as massive data centers, the cost and environmental impact of powering and cooling computer systems is the main driver for energy efficiency. For the mobile computing sector, on the other hand, energy conservation is driven by the need to extend battery life, and power capping is mandated by the restrictive power budget of mobile platforms such as Unmanned Aerial Vehicles (UAVs). Our focus is to understand the power-performance tradeoffs in executing Army applications on portable or tactical computing platforms. For a GPGPU computing platform, this study investigates how host processors (CPUs) with different Thermal Design Power (TDP) affect the execution time and the power consumption of an Army-relevant stereo-matching code accelerated by a GPGPU. For image pairs of approximately one megapixel, we observed a decrease in execution time of nearly 50% and a decrease in average power of 5% when executed on a low-TDP Intel Xeon host processor; the decrease in energy consumption was over 50%. For a larger image pair, although there was no substantial decrease in execution time, there was a decrease in power and energy consumption of approximately 6%. Although we cannot draw general conclusions from a case study, it points to the possibility that for some tactical-HPC GPGPU-accelerated applications, a host processor with a lower TDP might provide better power consumption while not degrading execution-time performance.
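The relationship between the reported percentages follows directly from energy = average power x execution time; the absolute wattages and runtimes below are illustrative, not measurements from the paper:

```python
def energy_joules(avg_power_w, exec_time_s):
    """Energy consumed = average power (W) times execution time (s)."""
    return avg_power_w * exec_time_s

# Illustrative numbers matching the reported relative changes: a 50%
# shorter runtime at 5% lower average power cuts energy by 52.5%,
# consistent with the "over 50%" energy decrease observed.
e_high_tdp = energy_joules(100.0, 10.0)
e_low_tdp = energy_joules(95.0, 5.0)
savings = 1.0 - e_low_tdp / e_high_tdp
```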
An efficient algorithm for food quality control based on multispectral signatures
Multispectral imaging has motivated new applications related to quality monitoring for industrial applications due to its capability of analysis based on spectral signatures. In practice, however, a multispectral system used for such purposes is limited by the large amount of data to be analyzed, making it necessary to develop fast methods for the unsupervised classification task. This manuscript introduces a fast and efficient algorithm that is used in combination with a multispectral system for the unsupervised classification of food based on quality. In particular, given two types of fruits previously characterized, we first acquire a multispectral image of them and perform a dimensionality reduction by taking into account the most representative spectral bands of their reflection spectra. From the reduced set, the min-W and max-M lattice associative memories are computed, and a subset of their columns is used as centroids of specific clusters. Then, the Euclidean distance computed between each centroid and all spectral vectors in the image allows the image to be subdivided into clusters. The results show that the technique is fast, reliable, and non-invasive for food classification.
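The final clustering step, assigning every spectral vector to its nearest centroid by Euclidean distance, can be sketched as below. The centroids here are placeholders for the columns selected from the min-W and max-M lattice memories, which are not reproduced:

```python
import numpy as np

def assign_clusters(spectra, centroids):
    """spectra: (n_pixels, n_bands); centroids: (n_clusters, n_bands).
    Returns the index of the nearest centroid for each spectral vector."""
    d = np.linalg.norm(spectra[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)
```

The broadcasted distance matrix makes this a single vectorized pass over the image, which is where the method's speed claim for large multispectral data plausibly comes from.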
Bottle inspector based on machine vision
A machine vision system for fault detection in PET bottles is presented. The bottle inspector is divided into three modules for image acquisition of the bottle finish, bottle wall, and bottle bottom. The captured images are corrected by adaptive gamma correction. An algorithm based on frequency filtering of n images is proposed for defect detection in the bottle wall and bottle finish. We obtain correct classification rates of 85.5% for the bottle finish, 80.64% for the bottle wall, and 95.0% for the bottle bottom.
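Adaptive gamma correction can take several forms; the sketch below uses one common heuristic, choosing gamma so the mean brightness is pulled toward mid-gray. The paper's exact adaptation rule is not specified in the abstract, so treat this as an assumed variant:

```python
import numpy as np

def adaptive_gamma(img):
    """Gamma chosen from mean brightness: dark frames are brightened,
    bright frames darkened; a mid-gray mean (0.5) gives gamma = 1."""
    x = img.astype(float) / 255.0
    m = float(np.clip(x.mean(), 1e-6, 1.0 - 1e-6))
    gamma = np.log(0.5) / np.log(m)
    return (255.0 * x ** gamma).astype(np.uint8)
```

Normalizing brightness this way before frequency filtering makes the defect response less dependent on acquisition lighting.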
Defect inspection technology for a gloss-coated surface using patterned illumination
Tsuyoshi Nagato, Takashi Fuse, Tetsuo Koezuka
In this paper, we discuss the development of an inspection system for a gloss-coated surface using patterned illumination. A convex defect on a gloss-coated surface is caused by top-coating paint over a primary coating on which minute particles such as dust remain. Since the convex defect is transparent, it is difficult to observe under conventional illumination. We therefore developed an optical system with patterned illumination and an inspection system based on a phase-shifting method, exploiting the behavior of specular reflection on a gloss surface. The inspected surface is illuminated with the patterned illumination while the phase of a stripe pattern is shifted, and a camera takes multiple images of the specular reflection. By calculating the amplitude of the luminance modulation according to the phase-shifting method, an amplitude image can be obtained from the multiple images. The amplitude image represents the distribution of reflectance. Scratches and dirt, as well as small convex defects on a gloss surface, can be observed in the amplitude image. This inspection system can image both the shape and the specular reflectance of a gloss surface and allows inspection of gloss coatings, which was difficult with conventional methods.
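The amplitude-image computation described above is the standard N-step phase-shifting formula: with N images under stripe patterns shifted by 2*pi*k/N, the modulation amplitude at each pixel is A = (2/N) * sqrt((sum_k I_k sin(2*pi*k/N))^2 + (sum_k I_k cos(2*pi*k/N))^2). A minimal sketch, assuming equally spaced shifts:

```python
import numpy as np

def amplitude_image(frames):
    """Per-pixel modulation amplitude from N phase-shifted images
    (array of shape (N, H, W)), shifts of 2*pi*k/N."""
    n = len(frames)
    k = np.arange(n).reshape(-1, 1, 1)
    s = (frames * np.sin(2.0 * np.pi * k / n)).sum(axis=0)
    c = (frames * np.cos(2.0 * np.pi * k / n)).sum(axis=0)
    return (2.0 / n) * np.sqrt(s * s + c * c)
```

The constant (ambient) component cancels in the sums, so the result isolates the reflectance-dependent modulation in which defects become visible.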
A semi-automatic annotation tool for cooking video
In order to create a cooking assistant application that guides users in the preparation of dishes relevant to their dietary profiles and food preferences, it is necessary to accurately annotate the video recipes, identifying and tracking the foods handled by the cook. These videos present particular annotation challenges, such as frequent occlusions and food appearance changes. Manually annotating the videos is a time-consuming, tedious, and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error-free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. The annotation accuracy is increased with respect to completely automatic tools, and the human effort is reduced with respect to completely manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.
Intensity and color descriptors for texture classification
In this paper we present a descriptor for texture classification based on the histogram of a local measure of color contrast. The descriptor has been concatenated with several other state-of-the-art color and intensity texture descriptors and evaluated on three datasets. Results show, in nearly every case, a performance improvement with respect to the baseline methods, demonstrating the effectiveness of the proposed texture features. The descriptor has also proven robust to global changes in lighting conditions.
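A sketch of a histogram-of-local-color-contrast descriptor follows. The contrast measure used here, the distance between each pixel and the mean of its 4-neighbourhood, is an illustrative choice; the paper's exact contrast definition may differ:

```python
import numpy as np

def color_contrast_histogram(img, bins=16):
    """Normalized histogram of a local color-contrast measure over the
    image interior. img: (H, W, C) float or uint8 array."""
    f = img.astype(float)
    # mean of the 4-neighbourhood via shifts (interior pixels only)
    nbr = (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]) / 4.0
    contrast = np.linalg.norm(f[1:-1, 1:-1] - nbr, axis=-1)
    h, _ = np.histogram(contrast, bins=bins,
                        range=(0.0, contrast.max() + 1e-9))
    return h / h.sum()
```

Because the histogram is built from intensity differences rather than absolute values, it is insensitive to global lighting offsets, which matches the robustness property the abstract reports.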