Proceedings Volume 9405

Image Processing: Machine Vision Applications VIII


Purchase the printed version of this volume at proceedings.com or access the digital version at SPIE Digital Library.

Volume Details

Date Published: 3 April 2015
Contents: 8 Sessions, 33 Papers, 0 Presentations
Conference: SPIE/IS&T Electronic Imaging 2015
Volume Number: 9405

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9405
  • Detection, Identification, and Monitoring I
  • Imaging and Machine Vision Algorithms
  • Algorithms and Techniques
  • Inspection and Metrology
  • Detection, Identification, and Monitoring II
  • Imaging Applications
  • Interactive Paper Session
Front Matter: Volume 9405
Front Matter: Volume 9405
This PDF file contains the front matter associated with SPIE Proceedings Volume 9405 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Detection, Identification, and Monitoring I
Multiple object detection in hyperspectral imagery using spectral fringe-adjusted joint transform correlator
Hyperspectral imaging (HSI) sensors provide abundant spectral information to uniquely identify materials by their reflectance spectra, and this information has been used effectively for object detection and identification applications. Joint transform correlation (JTC) based object detection techniques for HSI have been proposed in the literature, such as spectral fringe-adjusted joint transform correlation (SFJTC) and its several improvements. However, to our knowledge, the SFJTC-based techniques were designed to detect only similar patterns in a hyperspectral data cube, not dissimilar patterns. Thus, in this paper, a new deterministic object detection approach using SFJTC is proposed to perform multiple dissimilar target detection in hyperspectral imagery. In this technique, input spectral signatures from a given hyperspectral image data cube are correlated with multiple reference signatures using the class-associative technique. To achieve better correlation output, the concept of SFJTC and the modified Fourier-plane image subtraction technique are incorporated in the multiple target detection process. The output of this technique provides sharp and high correlation peaks for a match and negligible or no correlation peaks for a mismatch. Test results using a real-life hyperspectral data cube show that the proposed algorithm can successfully detect multiple dissimilar patterns with high discrimination.
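For illustration, a minimal sketch of the fringe-adjusted joint transform correlation idea applied to 1-D spectral signatures, assuming synthetic data; the filter constants `a` and `b` and the function name are illustrative, not the authors' code:

```python
import numpy as np

def sfjtc_correlation(scene_spectrum, ref_spectrum, a=1e-3, b=1.0):
    """Toy 1-D fringe-adjusted JTC on spectral signatures (illustrative only).

    Forms the joint power spectrum of a scene pixel's signature and a
    reference signature, applies the fringe-adjusted filter b/(a + |R|^2),
    and returns the correlation output, which peaks for matching signatures.
    """
    n = len(ref_spectrum)
    joint = np.concatenate([ref_spectrum, np.zeros(n), scene_spectrum])
    jps = np.abs(np.fft.fft(joint)) ** 2          # joint power spectrum
    R = np.fft.fft(ref_spectrum, n=len(joint))    # reference spectrum, zero-padded
    haf = b / (a + np.abs(R) ** 2)                # fringe-adjusted filter
    return np.abs(np.fft.ifft(haf * jps))         # correlation plane

ref = np.random.rand(64)
out = sfjtc_correlation(ref + 0.01 * np.random.randn(64), ref)  # peak => match
```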
Dynamic hierarchical algorithm for accelerated microfossil identification
Cindy M. Wong, Dileepan Joseph
Marine microfossils provide a useful record of the Earth's resources and prehistory via biostratigraphy. To study hydrocarbon reservoirs and prehistoric climate, geoscientists visually identify the species of microfossils found in core samples. Because microfossil identification is labour-intensive, automation has been investigated since the 1980s. With the initial rule-based systems, users still had to examine each specimen under a microscope. While artificial neural network systems showed more promise for reducing expert labour, they also did not displace manual identification, for a variety of reasons that we aim to overcome. In our human-based computation approach, the most difficult step, namely taxon identification, is outsourced via a frontend website to human volunteers. A backend algorithm, called dynamic hierarchical identification, uses unsupervised, supervised, and dynamic learning to accelerate microfossil identification. Unsupervised learning clusters specimens so that volunteers need not identify every specimen during supervised learning. Dynamic learning means interim computation outputs prioritize subsequent human inputs. Using a dataset of microfossils identified by an expert, we evaluated correct and incorrect genus and species rates versus simulated time, where each specimen identification defines a moment. The proposed algorithm accelerated microfossil identification effectively, especially compared to benchmark results obtained using a k-nearest neighbour method.
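As a point of reference, the k-nearest-neighbour benchmark could be reproduced in spirit with scikit-learn; the feature vectors and taxon labels below are random placeholders:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# k-NN identification from specimen feature vectors (placeholder data).
features = np.random.rand(200, 16)           # one row per specimen
labels = np.random.randint(0, 5, size=200)   # expert-identified taxa

knn = KNeighborsClassifier(n_neighbors=5).fit(features[:150], labels[:150])
predicted_taxa = knn.predict(features[150:])
accuracy = (predicted_taxa == labels[150:]).mean()
```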
Monitoring Arctic landscape variation by pole and kite mounted cameras
Ruşen Öktem, Baptiste Dafflon, John E. Peterson, et al.
Optical surveillance is an important part of monitoring environmental changes in various ecological settings. Although remote sensing provides extensive data, its resolution is not yet sufficient for scientific research focusing on small-scale landscape variations. We are interested in exploiting high-resolution image data to observe and investigate landscape variations at small spatial scale along an Arctic corridor in Barrow, AK, as part of the DOE Next-Generation Ecosystem Experiments (NGEE-Arctic). A 35 m transect was continuously imaged by two separate pole-mounted consumer-grade stationary cameras, one capturing in the NIR and the other in the visible range, from June to August 2014. Surface and subsurface features along this 35 m transect were also sampled by electrical resistivity tomography (ERT), temperature loggers, and water content reflectometers. We track behavioral change along this transect by collecting samples from the pole images and look for a relation between the image features and electrical conductivity. Results show that the correlation coefficient between inferred vegetation indices and soil electrical resistivity (closely related to water content) increased during the growing season, reaching 0.89 at the peak of the vegetation. To extrapolate such results to a larger scale, we use a high-resolution RGB map of a 500x40 m corridor at this site, obtained occasionally using a low-altitude kite-mounted consumer-grade (RGB) camera. We introduce a segmentation algorithm that operates on the mosaic generated from the kite images to classify the landscape features of the corridor.
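A vegetation index such as NDVI is one way such image features can be derived from paired NIR and visible imagery; the sketch below, with placeholder values standing in for the transect samples, shows the index computation and its correlation with resistivity (the paper does not specify NDVI as the exact index used):

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized difference vegetation index from co-registered NIR and red bands."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# Hypothetical per-sample index means along the transect vs. co-located resistivity.
ndvi_samples = np.array([0.31, 0.42, 0.55, 0.61])   # placeholder values
resistivity = np.array([55.0, 68.0, 80.0, 95.0])    # placeholder values
r = np.corrcoef(ndvi_samples, resistivity)[0, 1]    # Pearson correlation coefficient
```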
Hyperspectral imaging using a color camera and its application for pathogen detection
Seung-Chul Yoon, Tae-Sung Shin, Gerald W. Heitschmidt, et al.
This paper reports the results of a feasibility study on the development of a hyperspectral image recovery (reconstruction) technique using an RGB color camera and regression analysis in order to detect and classify colonies of foodborne pathogens. The target bacterial pathogens were the six representative non-O157 Shiga toxin-producing Escherichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) grown in Petri dishes of Rainbow agar. The purpose of the feasibility study was to evaluate whether a DSLR camera (Nikon D700) could be used to predict hyperspectral images in the wavelength range from 400 to 1,000 nm and even to predict the types of pathogens using a previously developed hyperspectral STEC classification algorithm. Unlike many other studies that use color charts with known and noise-free spectra for training reconstruction models, this work used hyperspectral and color images measured separately by a hyperspectral imaging spectrometer and the DSLR color camera. The color images were calibrated (i.e., normalized) to relative reflectance, subsampled, and spatially registered to match counterpart pixels in hyperspectral images that were also calibrated to relative reflectance. Polynomial multivariate least-squares regression (PMLR) had previously been developed with simulated color images. In this study, partial least squares regression (PLSR) was also evaluated as a spectral recovery technique to minimize multicollinearity and overfitting. The two spectral recovery models (PMLR and PLSR) and their parameters were evaluated by cross-validation. QR decomposition was used to find a numerically more stable solution of the regression equation. The preliminary results showed that PLSR was more effective than PMLR, especially with higher-order polynomial regressions. The best classification accuracy measured with an independent test set was about 90%. The results suggest the potential of cost-effective color imaging using hyperspectral image classification algorithms for rapidly differentiating pathogens on agar plates.
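A hedged sketch of the PLSR spectral recovery step, using scikit-learn and random arrays in place of the measured RGB and hyperspectral reflectances; the polynomial degree and number of latent components are illustrative choices, not the paper's tuned values:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import PolynomialFeatures

# X: N x 3 calibrated RGB reflectances; Y: N x B co-registered hyperspectral
# reflectances (400-1000 nm). Random data stands in for the real measurements.
X = np.random.rand(500, 3)
Y = np.random.rand(500, 120)

poly = PolynomialFeatures(degree=2, include_bias=False)  # polynomial terms of R, G, B
Xp = poly.fit_transform(X)

pls = PLSRegression(n_components=8)   # latent components curb multicollinearity
pls.fit(Xp, Y)
recovered_spectra = pls.predict(poly.transform(np.random.rand(10, 3)))
```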
Imaging and Machine Vision Algorithms
Fast face recognition by using an inverted index
Christian Herrmann, Jürgen Beyerer
This contribution addresses the task of searching for faces in large video datasets. Despite vast progress in the field, face recognition remains a challenge for uncontrolled large-scale applications like searching for persons in surveillance footage or internet videos. While current production systems focus on the best-shot approach, where only one representative frame from a given face track is selected, thus sacrificing recognition performance, systems achieving state-of-the-art recognition performance, like the recently published DeepFace, ignore recognition speed, which makes them impractical for large-scale applications. We suggest a set of measures to address the problem. First, considering the feature location allows collecting the extracted features in corresponding sets. Second, the inverted index approach, which became popular in the area of image retrieval, is applied to these feature sets. A face track is thus described by a set of local indexed visual words, which enables a fast search. This way, all information from a face track is collected, which allows better recognition performance than best-shot approaches, and the inverted index permits consistently high recognition speeds. Evaluation on a dataset of several thousand videos shows the validity of the proposed approach.
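A minimal inverted index over quantized local features ("visual words") illustrates why the search is fast: a query only touches the index entries of its own words. The track IDs and word IDs below are hypothetical:

```python
from collections import defaultdict

# Inverted index: visual word -> set of face-track IDs containing that word.
index = defaultdict(set)

def add_track(track_id, visual_words):
    for w in visual_words:
        index[w].add(track_id)

def query(visual_words):
    """Rank tracks by the number of visual words shared with the query."""
    votes = defaultdict(int)
    for w in visual_words:
        for track_id in index[w]:
            votes[track_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

add_track("video1_track3", {17, 42, 133})
add_track("video2_track1", {42, 99})
print(query({42, 133}))   # video1_track3 ranks first
```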
Advanced colour processing for mobile devices
Eugen Gillich, Helene Dörksen, Volker Lohweg
Mobile devices such as smartphones are going to play an important role in professional image processing tasks. However, mobile systems were not designed for such applications, especially in terms of image processing requirements like stability and robustness. One major drawback is the automatic white balance that comes with the devices. It is necessary for many applications, but of no use when applied to shiny surfaces. Such an issue appears when image acquisition takes place under differently coloured illumination caused by different environments. This results in inhomogeneous appearances of the same subject. In our paper we show a new approach for handling the complex task of generating a low-noise and sharp image without spatial filtering. Our method is based on analyzing the spectral and saturation distribution of the channels. Furthermore, the RGB space is transformed into a more convenient space, a particular HSI space. We generate the greyscale image by a control procedure that takes the colour channels into account. This results in an adaptive colour mixing model with reduced noise. The optimized images are used to show how, e.g., image classification benefits from our colour adaptation approach.
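The paper uses a particular HSI space; for orientation, the textbook RGB-to-HSI conversion looks like this (a generic sketch, not the authors' variant):

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Standard RGB -> HSI conversion for float images in [0, 1].

    rgb has shape (..., 3); returns hue (radians), saturation, intensity.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8
    i = (r + g + b) / 3.0
    s = 1.0 - np.minimum(np.minimum(r, g), b) / (i + eps)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    h = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b > g, 2.0 * np.pi - h, h)   # hue wraps past pi when B > G
    return np.stack([h, s, i], axis=-1)
```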
Image calibration and registration in cone-beam computed tomogram for measuring the accuracy of computer-aided implant surgery
Walter Y. H. Lam, Henry Y. T. Ngan, Peter Y. P. Wat, et al.
Medical radiography is the use of radiation to “see through” a human body without breaching its integrity (surface). With computed tomography (CT)/cone beam computed tomography (CBCT), three-dimensional (3D) imaging can be produced. Such imaging not only facilitates disease diagnosis but also enables computer-aided surgical planning/navigation. In dentistry, the common method for transferring the virtual surgical planning to the patient (reality) is the use of a surgical stent, either with preloaded planning (static), like a channel, or with real-time surgical navigation (dynamic) after registration with fiducial markers (RF). This paper describes using the corner of a cube as a radiopaque fiducial marker on an acrylic (plastic) stent; this RF allows robust calibration and registration of Cartesian (x, y, z) coordinates for linking the patient (reality) and the imaging (virtuality), so that the surgical planning can be transferred in either a static or dynamic way. The accuracy of computer-aided implant surgery was measured with reference to these coordinates. In our preliminary model surgery, a dental implant was planned virtually and placed with a preloaded surgical guide. The deviation of the placed implant apex from the planning was x=+0.56 mm [more right], y=−0.05 mm [deeper], z=−0.26 mm [more lingual], which was within the clinical 2 mm safety range. For comparison with the virtual planning, the physically placed implant was CT/CBCT scanned, a step in which errors may be introduced. The difference of the actual implant apex from the virtual apex was x=0.00 mm, y=+0.21 mm [shallower], z=−1.35 mm [more lingual], and this should be borne in mind when interpreting the results.
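The calibration/registration step can be illustrated with a generic least-squares rigid transform (Kabsch algorithm) between fiducial coordinates in planning and patient space; the point values below are hypothetical, and this is a sketch of the registration idea, not the authors' implementation:

```python
import numpy as np

def rigid_register(planning_pts, patient_pts):
    """Least-squares rigid transform (Kabsch) so that patient ~ R @ planning + t."""
    p_mean = planning_pts.mean(axis=0)
    q_mean = patient_pts.mean(axis=0)
    H = (planning_pts - p_mean).T @ (patient_pts - q_mean)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Corner-of-cube fiducial coordinates (hypothetical values, in mm).
P = np.array([[0., 0., 0.], [10., 0., 0.], [0., 10., 0.], [0., 0., 10.]])
rot = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])   # 90-degree turn
Q = P @ rot.T + np.array([5., 2., 1.])
R, t = rigid_register(P, Q)
```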
Algorithms and Techniques
Depth-map and albedo estimation with superior information-theoretic performance
Adam P. Harrison, Dileepan Joseph
Lambertian photometric stereo (PS) is a seminal computer vision method. However, using depth maps in the image formation model, instead of surface normals as in PS, reduces model parameters by a third, making it preferable from an information-theoretic perspective. The Akaike information criterion (AIC) quantifies this trade-off between goodness of fit and overfitting. Obtaining superior AIC values requires an effective maximum likelihood (ML) depth-map and albedo estimation method. Recently, the authors published an ML estimation method that uses a two-step approach based on PS. While effective, approximations of noise distributions and the decoupling of depth-map and albedo estimation have limited its accuracy. Overcoming these limitations, this paper presents an ML method operating directly on images. The previous two-step ML method provides a robust initial solution, which kick-starts a new nonlinear estimation process. An innovative formulation of the estimation task, including a separable nonlinear least-squares approach, reduces the computational burden of the optimization process. Experiments demonstrate visual improvements under noisy conditions by avoiding overfitting. In addition, a comprehensive analysis shows that refined depth maps and albedos produce superior AIC metrics and enjoy better predictive accuracy than literature methods. The results indicate that the new method is a promising means for depth-map and albedo estimation with superior information-theoretic performance.
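For reference, the AIC trade-off being optimized is AIC = 2k − 2 ln L̂, where k counts model parameters and L̂ is the maximized likelihood; a short sketch under a Gaussian noise assumption:

```python
import numpy as np

def aic(num_params, residuals, sigma=1.0):
    """Akaike information criterion, AIC = 2k - 2 ln(L), for Gaussian noise.

    Fewer parameters (e.g., a depth map vs. surface normals) improve AIC
    whenever the fit, measured by the residuals, stays comparable.
    """
    n = residuals.size
    log_lik = (-0.5 * n * np.log(2 * np.pi * sigma ** 2)
               - np.sum(residuals ** 2) / (2 * sigma ** 2))
    return 2 * num_params - 2 * log_lik
```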
Shot boundary detection and label propagation for spatio-temporal video segmentation
This paper proposes a two-stage algorithm for streaming video segmentation. In the first stage, shot boundaries are detected within a window of frames by comparing the dissimilarity between 2-D segmentations of each frame. In the second stage, the 2-D segments are propagated across the window of frames in both the spatial and temporal directions. The window is moved across the video to find all shot transitions and obtain spatio-temporal segments simultaneously. As opposed to techniques that operate on the entire video, the proposed approach consumes significantly less memory and enables segmentation of lengthy videos. We tested our segmentation-based shot detection method on the TRECVID 2007 video dataset and compared it with a block-based technique. Cut detection results on the TRECVID 2007 dataset indicate that our algorithm is comparable to the best of the block-based methods. The streaming video segmentation routine also achieves promising results on a challenging video segmentation benchmark database.
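A minimal streaming sketch of the cut-detection idea; a grayscale-histogram distance stands in for the paper's 2-D segmentation dissimilarity, and the threshold is illustrative:

```python
import numpy as np

def detect_cuts(frame_iter, threshold=0.4):
    """Streaming cut detector: flag a boundary when consecutive frames differ.

    Only one previous frame's histogram is kept in memory, mirroring the
    low-memory streaming property the paper emphasizes.
    """
    cuts, prev_hist, idx = [], None, 0
    for frame in frame_iter:
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        hist = hist / max(hist.sum(), 1)
        if prev_hist is not None and 0.5 * np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(idx)            # shot boundary before this frame
        prev_hist, idx = hist, idx + 1
    return cuts
```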
Inspection and Metrology
An edge-from-focus approach to 3D inspection and metrology
Fuqin Deng, Jia Chen, Jianyang Liu, et al.
We propose an edge-based depth-from-focus technique for high-precision non-contact industrial inspection and metrology applications. In our system, an objective lens with a large numerical aperture is chosen to resolve the edge details of the measured object. By motorizing this imaging system, we capture the high-resolution edges within every narrow depth of field. We can therefore extend the measurement range while keeping a high resolution. Yet, on surfaces with large depth variation, a significant amount of data around each measured point is out of focus within the captured images. It is then difficult to extract valuable information from these out-of-focus data due to the depth-variant blur. Moreover, these data impede the extraction of continuous contours of the measured objects in high-level machine vision applications. The proposed approach, however, makes use of the out-of-focus data to synthesize a depth-invariant smoothed image, and then robustly locates the positions of high-contrast edges based on non-maximum suppression and hysteresis thresholding. Furthermore, by focus analysis of both the in-focus and the out-of-focus data, we reconstruct high-precision 3D edges for metrology applications.
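The depth-from-focus core can be sketched generically: compute a per-pixel focus measure in every frame of the motorized focal stack and take the argmax over z. This is only the baseline; the paper's edge-based pipeline adds non-maximum suppression and hysteresis on top:

```python
import numpy as np
from scipy import ndimage

def depth_from_focus(stack, z_positions):
    """Per-pixel depth from a focal stack of shape (num_z, H, W).

    Each pixel is assigned the z position of the frame where a local focus
    measure (windowed Laplacian energy) is maximal.
    """
    focus = np.empty(stack.shape, dtype=float)
    for k, img in enumerate(stack):
        lap = ndimage.laplace(img.astype(float))
        focus[k] = ndimage.uniform_filter(lap ** 2, size=9)   # local focus energy
    best = np.argmax(focus, axis=0)                            # (H, W) frame indices
    return np.asarray(z_positions)[best]
```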
Improved metrology of implant lines on static images of textured silicon wafers using line integral method
Kuldeep Shah, Eli Saber, Kevin Verrier
In solar wafer manufacturing processes, measuring implant mask wear over time is important to maintain the quality of wafers and the overall yield. Mask wear can be estimated by measuring the width of the lines the mask implants on the substrate. Previously proposed image analysis methods to detect and measure these lines have been shown to perform well on polished wafers. Although it is easier to capture images of textured wafers, the contrast between the foreground and background is extremely low. In this paper, an improved technique to detect and measure implant line widths on textured solar wafers is proposed. As a pre-processing step, a fast non-local means method is used to denoise the image, exploiting the presence of repeated patterns of textured lines. Following image enhancement, the previously proposed line integral method is used to extract the position of each line in the image. Full-Width One-Third Maximum approximation is then used to estimate the line widths in pixel units. These widths are converted into real-world metric units using a photogrammetric approach involving the Sampling Distance. The proposed technique is evaluated using real images of textured wafers and compared with the state of the art using identical synthetic images to which varying amounts of noise were added. Precision, recall, and F-measure values are calculated to benchmark the proposed technique. The proposed method is found to be more robust to noise, with the critical SNR value reduced by 10 dB in comparison to the existing method.
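A simplified sketch of the line integral profile plus a full-width one-third-maximum width estimate, assuming vertical lines; real use would threshold the detected peaks and convert pixel widths to metric units via the sampling distance:

```python
import numpy as np

def line_widths(image, level_fraction=1.0 / 3.0):
    """Column-wise line integral profile and FW-1/3-max widths in pixels."""
    profile = image.astype(float).sum(axis=0)    # integrate along each column
    profile -= profile.min()                     # remove background offset
    peaks = [i for i in range(1, len(profile) - 1)
             if profile[i] >= profile[i - 1] and profile[i] > profile[i + 1]]
    widths = []
    for p in peaks:
        level = level_fraction * profile[p]
        left, right = p, p
        while left > 0 and profile[left] > level:
            left -= 1
        while right < len(profile) - 1 and profile[right] > level:
            right += 1
        widths.append(right - left)              # width at one-third maximum
    return peaks, widths
```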
Multispectral imaging: an application to density measurement of photographic paper in the manufacturing process control
In this paper, we present an industrial application of multispectral imaging: density measurement of colorants in photographic paper. We designed and developed a 9-band LED-illumination-based multispectral imaging system specifically for this application in collaboration with FUJIFILM Manufacturing Europe B.V., Tilburg, Netherlands. Unlike a densitometer, which is a spot density measurement device, the proposed system enables fast density measurement over a large area of photo paper. Densities of the four colorants (CMYK) at every surface point in an image are calculated from the spectral reflectance image. Fast density measurements facilitate automatic monitoring of density changes (which are proportional to thickness changes), which helps control the manufacturing process for quality and consistent output. Experimental results confirm the effectiveness of the proposed system.
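The underlying computation is the standard optical density relation D = −log10(R) applied per band of the spectral reflectance image; the band-to-colorant mapping below is hypothetical, and the paper's CMYK separation is more involved:

```python
import numpy as np

# Optical density from a spectral reflectance image: D = -log10(R).
# reflectance_cube is (H, W, num_bands); a band near a colorant's absorption
# peak gives a density map proportional to that colorant's thickness.
reflectance_cube = np.clip(np.random.rand(64, 64, 9), 1e-3, 1.0)  # placeholder
density_cube = -np.log10(reflectance_cube)
cyan_density_map = density_cube[..., 5]   # hypothetical band index for cyan
```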
Self-calibration of monocular vision system based on planar points
Yu Zhao, Lichao Xu
This paper proposes a method of self-calibration of a monocular vision system based on planar points. Using the proposed method, we can easily obtain an initial value of the three-dimensional (3D) coordinates of the feature points in the scene, although there is a nonzero factor between the initial value and the real value of the 3D coordinates. From different viewpoints, we take different pictures and calculate the initial external parameters of each picture. Finally, through overall optimization, we obtain all the parameters, including the internal parameters, the distortion parameters, the external parameters of each picture, and the 3D coordinates of the feature points. According to the experimental results, in an approximately 100 mm × 200 mm field of view, the mean error and the variance of the 3D coordinates of the feature points are less than 10 μm.
Detection, Identification, and Monitoring II
A comparative study of outlier detection for large-scale traffic data by one-class SVM and kernel density estimation
Henry Y. T. Ngan, Nelson H. C. Yung, Anthony G. O. Yeh
This paper presents a comparative study of outlier detection (OD) for large-scale traffic data. Traffic data nowadays are massive in scale and collected every second throughout any modern city. In this research, the traffic flow dynamics were collected from one of the busiest 4-armed junctions in Hong Kong over a 31-day sampling period (764,027 vehicles in total). The traffic flow dynamic is expressed in a high-dimensional spatial-temporal (ST) signal format (i.e., 80 cycles), which has a high degree of similarity within the same signal and across different signals in one direction. A total of 19 traffic directions are identified at this junction, and many ST signals were collected over the 31-day period (874 signals). To reduce dimensionality, the ST signals first undergo a principal component analysis (PCA) to be represented as (x,y)-coordinates. These PCA (x,y)-coordinates are then assumed to be Gaussian distributed. Under this assumption, the data points are further evaluated by (a) a correlation study with three variant coefficients, (b) a one-class support vector machine (SVM), and (c) kernel density estimation (KDE). The correlation study could not give any explicit OD result, while the one-class SVM and KDE provide average DSRs of 59.61% and 95.20%, respectively.
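Both detectors are available in scikit-learn, so the pipeline can be sketched directly; the signal matrix, ν, bandwidth, and percentile cut-off below are placeholders, not the paper's tuned values:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM
from sklearn.neighbors import KernelDensity

# signals: (num_signals, dim) spatio-temporal traffic signals (placeholder data).
signals = np.random.rand(874, 80)
xy = PCA(n_components=2).fit_transform(signals)     # reduce to (x, y) coordinates

ocsvm = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(xy)
svm_outliers = ocsvm.predict(xy) == -1              # -1 marks outliers

kde = KernelDensity(kernel="gaussian", bandwidth=0.1).fit(xy)
log_density = kde.score_samples(xy)
kde_outliers = log_density < np.percentile(log_density, 5)  # lowest-density 5%
```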
Image-based dynamic deformation monitoring of civil engineering structures from long ranges
Matthias Ehrhart, Werner Lienhart
In this paper, we report on the vibration and displacement monitoring of civil engineering structures using a state-of-the-art image assisted total station (IATS) and passive target markings. By utilizing the telescope camera of the total station, it is possible to capture video streams in real time at 10 fps with an angular resolution of approximately 2″/px. Due to the high angular resolution resulting from the 30x optical magnification of the telescope, large distances to the monitored object are possible. The laser distance measurement unit integrated in the total station allows the camera's focus position to be set precisely and relates the angular quantities gained from image processing to units of length. To accurately measure the vibrations and displacements of civil engineering structures, we use circular target markings rigidly attached to the object. The targets' centers are computed by a least squares adjustment of an ellipse according to the Gauß-Helmert model, from which the parameters of the ellipse and their standard deviations are derived. In laboratory experiments, we show that movements can be detected with an accuracy better than 0.2 mm for single frames and distances up to 30 m. For static applications, where many video frames can be averaged, accuracies better than 0.05 mm are possible. In a field test on a full-scale footbridge, we compare the vibrations measured by the IATS to reference values derived from accelerometer measurements.
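A simple algebraic ellipse fit illustrates the center computation; the paper's Gauß-Helmert adjustment is more rigorous and also yields standard deviations of the parameters:

```python
import numpy as np

def ellipse_center(x, y):
    """Algebraic least-squares conic fit ax^2+bxy+cy^2+dx+ey = 1; returns center."""
    A = np.column_stack([x ** 2, x * y, y ** 2, x, y])
    coef, *_ = np.linalg.lstsq(A, np.ones_like(x), rcond=None)
    a, b, c, d, e = coef
    # Center where both partial derivatives of the conic vanish.
    cx, cy = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return cx, cy

# Synthetic check: noisy points on an ellipse centered at (3, -2).
t = np.linspace(0, 2 * np.pi, 200)
x = 3 + 5 * np.cos(t) + 0.01 * np.random.randn(t.size)
y = -2 + 2 * np.sin(t) + 0.01 * np.random.randn(t.size)
print(ellipse_center(x, y))
```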
Building and road detection from large aerial imagery
Building and road detection from aerial imagery has many applications in a wide range of areas, including urban design, real-estate management, and disaster relief. Extracting buildings and roads from aerial imagery has traditionally been performed manually by human experts, making it a costly and time-consuming process. Our goal is to develop a system for automatically detecting buildings and roads directly from aerial imagery. Many attempts at automatic aerial imagery interpretation have been proposed in the remote sensing literature, but much of the early work uses local features to classify each pixel or segment with an object label, so this kind of approach needs prior knowledge of object appearance or the class-conditional distribution of pixel values. Some works also need a segmentation step as pre-processing. We therefore use Convolutional Neural Networks (CNN) to learn a mapping from raw pixel values in aerial imagery to three object labels (buildings, roads, and others); in other words, we generate three-channel maps from raw aerial imagery input. We take a patch-based semantic segmentation approach: we first divide large aerial imagery into small patches and then train the CNN with those patches and the corresponding three-channel map patches. Finally, we evaluate our system on a large-scale road and building detection dataset that is publicly available.
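The patch-based preparation step can be sketched as plain tiling; the CNN that maps patches to three-channel label maps is omitted, and the patch size is illustrative:

```python
import numpy as np

def extract_patches(image, patch=64, stride=64):
    """Tile large aerial imagery of shape (H, W, C) into fixed-size patches.

    The same tiling would be applied to the three-channel label maps so
    that image patches and label patches stay aligned for CNN training.
    """
    h, w = image.shape[:2]
    return [image[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, stride)
            for c in range(0, w - patch + 1, stride)]

aerial = np.random.rand(512, 512, 3)    # placeholder imagery
patches = extract_patches(aerial)       # 64 patches of 64x64x3
```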
Segmentation and learning in the quantitative analysis of microscopy images
Christy Ruggiero, Amy Ross, Reid Porter
In materials science and biomedical domains, the quantity and quality of microscopy images are rapidly increasing, and there is a great need to automatically detect, delineate, and quantify particles, grains, cells, neurons, and other functional "objects" within these images. These are challenging problems for image processing because of the variability in object appearance that inevitably arises in real-world image acquisition and analysis. One of the most promising (and practical) ways to address these challenges is interactive image segmentation. These algorithms are designed to incorporate input from a human operator to tailor the segmentation method to the image at hand. Interactive image segmentation is now a key tool in a wide range of applications in microscopy and elsewhere. Historically, interactive image segmentation algorithms have tailored segmentation on an image-by-image basis, and information derived from operator input is not transferred between images. Recently, however, there has been increasing interest in using machine learning in segmentation to provide interactive tools that accumulate and learn from operator input over longer periods of time. These new learning algorithms reduce the need for operator input over time and can potentially provide a more dynamic balance between customization and automation for different applications. This paper reviews the state of the art in this area, provides a unified view of these algorithms, and compares the segmentation performance of various design choices.
Imaging Applications
Camera-based forecasting of insolation for solar systems
Daniel Manger, Frank Pagel
With the transition towards renewable energies, electricity suppliers face huge challenges. In particular, the increasing integration of solar power systems into the grid becomes more and more complicated because of their dynamic feed-in capacity. To assist in stabilizing the grid, the feed-in capacity of a solar power system within the next hours, minutes, and even seconds should be known in advance. In this work, we present a consumer-camera-based system for forecasting the feed-in capacity of a solar system over a horizon of 10 seconds. A camera is aimed at the sky, and clouds are segmented, detected, and tracked. A quantitative prediction of the insolation is performed based on the tracked clouds. Image data as well as ground-truth data for the feed-in capacity were synchronously collected at 1 Hz using a small solar panel, a resistor, and a measuring device. Preliminary results demonstrate both the applicability and the limits of the proposed system.
3D barcodes: theoretical aspects and practical implementation
This paper introduces the concept of three-dimensional (3D) barcodes. A 3D barcode is composed of an array of 3D cells, called modules, each of which can be either filled or empty, corresponding to the two possible values of a bit. These barcodes hold great theoretical promise thanks to their very large information capacity, which grows as the cube of the linear size of the barcode, and in addition are becoming practically manufacturable thanks to the ubiquitous use of 3D printers. In order to make these 3D barcodes practical for consumers, it is important to keep the decoding simple using commonly available means like smartphones. We therefore limit ourselves to decoding mechanisms based only on three projections of the barcode, which implies specific constraints on the barcode itself. The three projections produce the marginal sums of the 3D cube, which are the counts of filled-in modules along each Cartesian axis. In this paper we present some of the theoretical aspects of the 2D and 3D cases and describe the resulting complexity of the 3D case. We then describe a method to reduce these complexities to a practical application. The method features an asymmetric coding scheme, where the decoder is much simpler than the encoder. We close by demonstrating 3D barcodes we created and their usability.
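The marginal sums a three-view decoder works from are straightforward to compute; a sketch on a random cube:

```python
import numpy as np

# A 3D barcode as an n x n x n boolean array of filled/empty modules.
n = 4
barcode = np.random.rand(n, n, n) > 0.5

# The three projections a decoder would see: counts of filled modules
# along each Cartesian axis (the marginal sums of the cube).
proj_x = barcode.sum(axis=0)   # n x n counts along x
proj_y = barcode.sum(axis=1)
proj_z = barcode.sum(axis=2)

# A valid codeword must be recoverable from (proj_x, proj_y, proj_z) alone,
# which is the constraint the encoder must respect.
```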
Still-to-video face recognition in unconstrained environments
Face images from video sequences captured in unconstrained environments usually contain several kinds of variations, e.g., pose, facial expression, illumination, image resolution, and occlusion. Motion blur and compression artifacts also deteriorate recognition performance. Moreover, in various practical systems such as law enforcement, video surveillance, and e-passport identification, only a single still image per person is enrolled as the gallery set. Many existing methods may fail to work due to variations in face appearance and the limited number of available gallery samples. In this paper, we propose a novel approach for still-to-video face recognition in unconstrained environments. By assuming that faces from still images and video frames share the same identity space, a regularized least squares regression method is utilized to tackle the multi-modality problem. Regularization terms based on heuristic assumptions are introduced to avoid overfitting. In order to deal with the single-image-per-person problem, we exploit face variations learned from training sets to synthesize virtual samples for the gallery. We adopt a learning algorithm combining both affine/convex hull-based approaches and regularizations to match image sets. Experimental results on a real-world dataset consisting of unconstrained video sequences demonstrate that our method outperforms state-of-the-art methods.
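The generic form of such a regularized least-squares regression is ridge-style; the paper's heuristic regularizers differ, so this is only the skeleton:

```python
import numpy as np

def regularized_lsq(X, y, lam=1.0):
    """Closed-form solution of min ||Xw - y||^2 + lam * ||w||^2.

    Ridge-style shrinkage is the simplest instance of the regularized
    regression family used to bridge still and video modalities.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```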
Realistic texture extraction for 3D face models robust to self-occlusion
Chengchao Qu, Eduardo Monari, Tobias Schuchert, et al.
In the context of face modeling, probably the most well-known approach to representing 3D faces is the 3D Morphable Model (3DMM). When the 3DMM is fitted to a 2D image, the shape as well as the texture and illumination parameters are simultaneously estimated. However, if real facial texture is needed, texture extraction from the 2D image is necessary. This paper addresses the problems in texture extraction from a single image caused by self-occlusion. Unlike common approaches that leverage the symmetry of the face by mirroring the visible facial part, which is sensitive to inhomogeneous illumination, this work first generates a virtual texture map for the skin area iteratively by averaging the colors of neighboring vertices. Although this step creates unrealistic, overly smoothed texture, illumination stays consistent between the real and virtual texture. In the second pass, the mirrored texture is gradually blended with the real or generated texture according to visibility. This scheme ensures gentle handling of illumination and yet yields realistic texture. Because the blending area relates only to non-informative regions, the main facial features retain a unique appearance in the two face halves. Evaluation results reveal realistic rendering in novel poses, robust to challenging illumination conditions and small registration errors.
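The second-pass blending amounts to a visibility-weighted mix of mirrored and real/virtual texture; all arrays below are hypothetical per-vertex values:

```python
import numpy as np

# Per-vertex blending of mirrored texture with the real/virtual texture,
# weighted by visibility (1 = clearly visible, 0 = self-occluded).
real_or_virtual = np.random.rand(1000, 3)   # real color where visible, else smoothed
mirrored = np.random.rand(1000, 3)          # color from the symmetric vertex
visibility = np.random.rand(1000)           # derived from the fitted 3DMM pose

alpha = visibility[:, None]                 # trust the real side where visible
blended = alpha * real_or_virtual + (1.0 - alpha) * mirrored
```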
Interactive Paper Session
Context-based handover of persons in crowd and riot scenarios
In order to control riots in crowds, it is helpful to get ringleaders under control and pull them out of the crowd once they have become offenders. A great support in achieving these tasks is the capability of observing the crowd and the ringleaders automatically using cameras. This also allows better conservation of evidence in riot control. A ringleader who has become an offender should be tracked across and recognized by several cameras, regardless of whether the cameras' fields of view overlap or not. We propose a context-based approach for the handover of persons between different camera fields of view. This approach can be applied to overlapping as well as non-overlapping fields of view, so that fast and accurate identification of individual persons in camera networks is feasible. Within the scope of this paper, the approach is applied to the handover of persons between single images without any temporal information. It is particularly developed for semi-automatic video editing and the handover of persons between cameras in order to improve conservation of evidence. The approach was developed on a dataset collected during a Crowd and Riot Control (CRC) training of the German armed forces, which consisted of three different levels of escalation: first, the crowd started with a peaceful demonstration; later, there were violent protests; and third, the riot escalated and offenders charged the chain of guards. One result of the work is a reliable context-based method for person re-identification between single images of different camera fields of view in crowd and riot scenarios. Furthermore, a qualitative assessment shows that the use of contextual information can additionally support this task. It can decrease the time needed for handover and the number of confusions, which supports the conservation of evidence in crowd and riot scenarios.
3D motion artifact compensation in CT image with depth camera
Computed tomography (CT) is a medical imaging technology that uses computer-processed X-ray projections to acquire tomographic images, or slices, of specific organs of the body. Motion artifacts caused by patient motion are a common problem in CT systems and may introduce undesirable distortions in CT images. This paper analyzes the critical problems underlying motion artifacts and proposes a new CT system for motion artifact compensation. We employ depth cameras to capture the patient motion and account for it in the CT image reconstruction. In this way, we achieve a significant improvement in motion artifact compensation that is not possible with previous techniques.
Human action classification using procrustes shape theory
Wanhyun Cho, Sangkyoon Kim, Soonyoung Park, et al.
In this paper, we propose a new method that can classify human actions using Procrustes shape theory. First, we extract a pre-shape configuration vector of landmarks from each frame of an image sequence representing an arbitrary human action, and then derive the Procrustes fit vector for the pre-shape configuration vector. Second, we extract a set of pre-shape vectors from training samples stored in a database, and compute a Procrustes mean shape vector for these pre-shape vectors. Third, we extract a sequence of pre-shape vectors from the input video and project this sequence onto the tangent space with respect to the pole, taken as the sequence of mean shape vectors corresponding to a target video. We then calculate the Procrustes distance between the two sequences: the projected pre-shape vectors on the tangent space and the mean shape vectors. Finally, we classify the input video into the human action class with the minimum Procrustes distance. We assess the performance of the proposed method using one public dataset, namely the Weizmann human action dataset. Experimental results reveal that the proposed method performs very well on this dataset.
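SciPy ships a Procrustes routine, so the sequence distance used for classification can be sketched directly; the landmark counts and sequences below are hypothetical:

```python
import numpy as np
from scipy.spatial import procrustes

def sequence_procrustes_distance(seq_a, seq_b):
    """Sum of per-frame Procrustes disparities between two landmark sequences.

    Each frame is a k x 2 landmark array; classification would pick the
    action class whose mean-shape sequence minimizes this distance.
    """
    return sum(procrustes(a, b)[2] for a, b in zip(seq_a, seq_b))

# Two hypothetical 10-frame sequences of 15 landmarks each.
seq1 = [np.random.rand(15, 2) for _ in range(10)]
seq2 = [np.random.rand(15, 2) for _ in range(10)]
print(sequence_procrustes_distance(seq1, seq2))
```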
Understanding video transmission decisions in cloud based computer vision services
Nijad Anabtawi, Rony M. Ferzli
This paper presents a study of the effect of the quality of the input video source on the robustness of a computer vision system, and of how the findings can be used to create a framework generating a set of recommendations or rules for researchers and developers in the field. The study is of high importance especially for cloud-based computer vision platforms, where the transmission of raw uncompressed video is not possible; as such, it is desirable to find a sweet spot where bandwidth usage is at an optimal level while a high recognition rate is maintained. Experimental results showed that creating such rules is possible and that they are beneficial to integrate into an end-to-end cloud-based computer vision service.
An auto focus framework for computer vision systems
Nijad Anabtawi, Rony M. Ferzli
Capturing clean video from a source camera is crucial for accurate results in a computer vision system. In particular, blurry images can considerably affect detection, tracking, and pattern matching algorithms. This paper presents a framework for applying quality control by monitoring the captured video, with the ability to detect whether the camera is out of focus, thus identifying blurry, defective images and providing a feedback channel to the camera to adjust the focal length. The framework relies on a no-reference objective quality metric for the loopback channel to adjust the camera focus. The experimental results show how the framework reduces unnecessary computations, enabling more power-efficient cameras.
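One common no-reference sharpness metric is the variance of the Laplacian; the paper does not name its metric, so this sketch is only a plausible stand-in for the loopback check:

```python
import cv2

def is_in_focus(gray_frame, threshold=100.0):
    """No-reference sharpness check on a grayscale frame.

    Variance of the Laplacian is a widely used blur metric; frames scoring
    below threshold would trigger a focus adjustment instead of running the
    full computer vision pipeline.
    """
    score = cv2.Laplacian(gray_frame, cv2.CV_64F).var()
    return score >= threshold, score

# Usage: gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY); ok, s = is_in_focus(gray)
```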
Innovative hyperspectral imaging (HSI) based techniques applied to end-of-life concrete drill core characterization for optimal dismantling and materials recovery
Giuseppe Bonifazi, Nicoletta Picone, Silvia Serranti
The reduction of EOL concrete disposal in landfills, together with lower exploitation of primary raw materials, generates strong interest in developing, setting up, and applying innovative technologies to maximize the conversion of Construction and Demolition Waste (C&DW) into useful secondary raw materials. Such a goal can be reached starting from a punctual, efficient in-situ characterization of the objects to dismantle, in order to plan demolition actions and set up innovative mechanical-physical processes to recover the different materials and products to recycle. In this paper, an innovative recycling-oriented characterization strategy based on HyperSpectral Imaging (HSI) is described to identify aggregates and mortar in drill core samples from end-of-life concrete. To reach this goal, concrete drill cores from a demolition site were systematically investigated by HSI in the short-wave infrared range (1000-2500 nm). Results obtained with the HSI approach showed that this technology can be successfully applied to analyze the quality and characteristics of C&DW both before dismantling and as a final product to be reused after demolition-milling-classification actions. The proposed technique and the related recognition logics, through the spectral signature detection of finite physical domains (i.e., a concrete slice and/or particle) of different nature and composition, allow: i) the development of characterization procedures able to quantitatively assess the compositional/textural characteristics of end-of-life concrete, and ii) the set-up of innovative sorting strategies to qualify the different materials constituting drill core samples.
Localizing people in crosswalks with a moving handheld camera: proof of concept
Marc Lalonde, Claude Chapdelaine, Samuel Foucher
Although people and object tracking in uncontrolled environments has been addressed in the literature, the accurate localization of a subject with respect to a reference ground plane remains a major issue. This study describes an early prototype for the tracking and localization of pedestrians with a handheld camera. One application envisioned here is to analyze the trajectories of blind people crossing long crosswalks while following different audio signals as a guide. This kind of study is generally conducted manually, with an observer following a subject and logging his/her current position at regular time intervals with respect to a white grid painted on the ground. This study aims at automating the manual logging activity: with a marker attached to the subject's foot, a video of the crossing is recorded by a person following the subject, and a semi-automatic tool analyzes the video and estimates the trajectory of the marker with respect to the painted markings. Challenges include robustness to variations in lighting conditions (shadows, etc.), occlusions, and changes in camera viewpoint. Results are promising when compared to GNSS measurements.
Fused methods for visual saliency estimation
In this work, we present a new model of visual saliency that combines results from existing methods, improving upon their performance and accuracy. By fusing pre-attentive and context-aware methods, we exploit the strengths of state-of-the-art models while compensating for their deficiencies. We put this theory to the test in a series of experiments, comparatively evaluating the visual saliency maps and employing them for content-based image retrieval and thumbnail generation. We find that on average our model yields definitive improvements in recall and F-measure metrics with comparable precision. In addition, we find that all image searches using our fused method return more correct images and rank them higher than searches using the original methods alone.
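One simple fusion rule consistent with the description is a normalized weighted sum of the two saliency maps; the weight below is illustrative, not the paper's scheme:

```python
import numpy as np

def fuse_saliency(pre_attentive, context_aware, w=0.5):
    """Combine two saliency maps after normalizing each to [0, 1].

    A weighted sum is the simplest fusion of a pre-attentive map with a
    context-aware map; more elaborate combinations are possible.
    """
    def norm(m):
        m = m.astype(float)
        return (m - m.min()) / (m.max() - m.min() + 1e-8)
    return w * norm(pre_attentive) + (1.0 - w) * norm(context_aware)
```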
Classification of hyperspectral images based on conditional random fields
Yang Hu, Eli Saber, Sildomar Monteiro, et al.
A significant increase in the availability of high-resolution hyperspectral images has led to the need for pertinent image analysis techniques, such as classification. Hyperspectral images that are correlated spatially and spectrally provide ample information across the bands for this purpose. Conditional Random Fields (CRFs) are discriminative models that carry several advantages over conventional techniques: no requirement of the independence assumption for observations, flexibility in defining local and pairwise potentials, and independence between the modules of feature selection and parameter learning. In this paper, we present a framework for classifying remotely sensed imagery based on CRFs. We apply a Support Vector Machine (SVM) classifier to raw remotely sensed imagery data to generate more meaningful feature potentials for the CRF model. This approach produces promising results when tested with publicly available AVIRIS Indian Pines imagery.
Pros and cons of using GenICam-based standard interfaces (GEV, U3V, CXP, and CLHS) in a camera or image processing design
Werner Feith
When designing image processing applications in hardware and software today, such as cameras, FPGA-embedded processing, or PC applications, it is a big task to select the right interfaces, in terms of both function and bandwidth, for the complex system. This paper presents existing specifications that are well established in the market and can help in building such a system.
Geological applications of machine learning on hyperspectral remote sensing data
C. H. Tse, Yi-liang Li, Edmund Y. Lam
The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of spectral detail in addition to an ever-increasing level of spatial information. A major challenge brought about by these data is the burden of processing and interpreting the datasets and extracting the relevant information. This research aims to approach the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately to devise an efficient means of identifying minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed, and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The results demonstrate that this approach is capable of processing the huge amount of hyperspectral data and can potentially provide guidance to scientists for more detailed studies.
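An unsupervised clustering stage of such a pipeline can be sketched with k-means over per-pixel spectra; the cube below is random and the cluster count illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# A hyperspectral cube (H, W, bands) reshaped to per-pixel spectra, then
# clustered without labels; cluster mean spectra can be compared against a
# spectral library. Random data stands in for a CRISM cube.
cube = np.random.rand(100, 100, 50)
spectra = cube.reshape(-1, cube.shape[-1])

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(spectra)
cluster_map = kmeans.labels_.reshape(cube.shape[:2])  # per-pixel cluster IDs
mean_spectra = kmeans.cluster_centers_                # compare to ASTER library
```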