Proceedings Volume 9399

Image Processing: Algorithms and Systems XIII


Volume Details

Date Published: 7 April 2015
Contents: 8 Sessions, 41 Papers, 0 Presentations
Conference: SPIE/IS&T Electronic Imaging 2015
Volume Number: 9399

Table of Contents

  • Front Matter: Volume 9399
  • Pattern Classification and Recognition
  • Image Analysis and Filtering
  • Special Session: Panorama: Ultra Wide Context and Content Aware Imaging I
  • Special Session: Panorama: Ultra Wide Context and Content Aware Imaging II
  • Transform-Domain Image Processing
  • Multi-Dimensional and Multi-Modal Image Processing
  • Interactive Paper Session
Front Matter: Volume 9399
Front Matter: Volume 9399
This PDF file contains the front matter associated with SPIE Proceedings Volume 9399, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and Conference Committee listing.
Pattern Classification and Recognition
Links between binary classification and the assignment problem in ordered hypothesis machines
Reid Porter, Beate G. Zimmer
Ordered Hypothesis Machines (OHM) are large margin classifiers that belong to the class of Generalized Stack Filters, which were originally developed for non-linear signal processing. In previous work we showed how OHM classifiers are equivalent to a variation of Nearest Neighbor classifiers, with the advantage that training involves minimizing a loss function which includes a regularization parameter that controls class complexity. In this paper we report a new connection between OHM training and the Linear Assignment problem, a combinatorial optimization problem that can be solved efficiently with (amongst others) the Hungarian algorithm. Specifically, for balanced classes and particular choices of parameters, OHM training is the dual of the Assignment problem. The duality sheds new light on the OHM training problem, opens the door to new training methods, and suggests several new directions for research.
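As a hedged illustration of this connection (not the authors' OHM training code), the balanced-class assignment step can be sketched with SciPy's Hungarian-algorithm solver; the cost matrix below is a made-up stand-in for the pairwise costs that would arise in training.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical pairwise cost matrix between n positive and n negative
# training examples (balanced classes, as in the duality discussed above).
rng = np.random.default_rng(0)
cost = rng.random((5, 5))

# linear_sum_assignment solves the linear assignment problem (Hungarian
# algorithm family) and returns an optimal row-to-column matching.
rows, cols = linear_sum_assignment(cost)
print("optimal matching:", list(zip(rows, cols)))
print("total cost:", cost[rows, cols].sum())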
Optimized curve design for image analysis using localized geodesic distance transformations
Billy Braithwaite, Harri Niska, Irene Pöllänen, et al.
We consider geodesic distance transformations for digital images. Given an M × N digital image, a distance image is produced by evaluating local pixel distances. The Distance Transformation on Curved Space (DTOCS) evaluates the shortest geodesics of a given pixel neighborhood by evaluating the height displacements between pixels. In this paper, we propose an optimization framework for geodesic distance transformations in a pattern recognition scheme, yielding more accurate machine-learning-based image analysis, exemplified by initial experiments on complex breast cancer images. Furthermore, we outline future research directions extending the work presented in this paper.
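As a minimal sketch of the DTOCS idea (a simplified chamfer-style iteration, not the authors' optimized implementation), gray-level differences act as height displacements in the local step cost:

import numpy as np

def dtocs(height, seeds, n_passes=4):
    # Approximate DTOCS: geodesic distance over the image intensity surface,
    # where a step between neighbors costs 1 plus the height (gray-level)
    # displacement. 'seeds' is a boolean mask of zero-distance pixels.
    d = np.where(seeds, 0.0, np.inf)
    H, W = height.shape
    for _ in range(n_passes):  # alternating raster scans until (near) convergence
        for ys, xs in ((range(H), range(W)),
                       (range(H - 1, -1, -1), range(W - 1, -1, -1))):
            for y in ys:
                for x in xs:
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (dy or dx) and 0 <= ny < H and 0 <= nx < W:
                                step = 1.0 + abs(float(height[y, x]) - float(height[ny, nx]))
                                if d[ny, nx] + step < d[y, x]:
                                    d[y, x] = d[ny, nx] + step
    return d

img = np.random.default_rng(0).integers(0, 256, (32, 32))
seeds = np.zeros((32, 32), dtype=bool)
seeds[0, 0] = True
print(dtocs(img, seeds)[-1, -1])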
Adaptive graph construction for Isomap manifold learning
Loc Tran, Zezhong Zheng, Guoqing Zhou, et al.
Isomap is a classical manifold learning approach that preserves the geodesic distances of nonlinear data sets. One of its main drawbacks is susceptibility to leaking, where a shortcut appears between normally separated portions of a manifold. We propose an adaptive graph construction approach based upon the sparsity property of the ℓ1 norm. The ℓ1-enhanced graph construction replaces the k-nearest-neighbors step of the classical approach. The proposed algorithm is first tested on data sets from the UCI database repository, where it performs better than the classical approach. Next, it is applied to two image data sets, achieving improved performance over standard Isomap.
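For orientation, classical Isomap with its k-nearest-neighbor graph is available in scikit-learn; the ℓ1-based construction proposed above would replace that neighborhood-graph step. A baseline sketch:

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Classical Isomap: k-nearest-neighbor graph + geodesic (shortest-path)
# distances + classical MDS. The paper replaces this k-NN step with a sparse
# l1-based graph to avoid "leaking" shortcuts between separated manifold parts.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)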
Real-time affine invariant gesture recognition for LED smart lighting control
Xu Chen, Miao Liao, Xiao-Fan Feng
Gesture recognition has attracted extensive research interest in the field of human-computer interaction. Real-time affine invariant gesture recognition is an important and challenging problem. This paper presents a robust, affine view invariant gesture recognition system for real-time LED smart light control. To our knowledge, this is the first time that gesture recognition has been applied to control LED smart lighting in real time. Employing skin detection, hand blobs captured from a top-view camera are first localized and aligned. SVM classifiers trained on HOG features and robust shape features are then utilized for gesture recognition. By accurately recognizing two types of gestures (a "gesture 8" and a "5 finger gesture"), a user can toggle lighting on/off efficiently and control light intensity on a continuous scale. In each case, gesture recognition is rotation- and translation-invariant. Extensive evaluations in an office setting demonstrate the effectiveness and robustness of the proposed gesture recognition algorithm.
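A hedged sketch of the classification stage only (skin detection and hand-blob alignment are omitted; the patch size and random training data are assumptions): HOG features from aligned hand patches train an SVM.

import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(patch):
    # patch: aligned 64x64 grayscale hand blob (assumed preprocessing output).
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Hypothetical training data: aligned hand patches with gesture labels
# (0 = "gesture 8", 1 = "5 finger gesture").
rng = np.random.default_rng(0)
patches = rng.random((20, 64, 64))
labels = rng.integers(0, 2, 20)

X = np.array([hog_features(p) for p in patches])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X[:3]))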
Steganography in clustered-dot halftones using orientation modulation and modification of direct binary search
Yung-Yao Chen, Sheng-Yi Hong, Kai-Wen Chen
This paper proposes a novel message-embedded halftoning scheme that is based on orientation modulation (OM) encoding. To achieve high image quality, we employ a human visual system (HVS)-based error metric between the continuous-tone image and a data-embedded halftone, and integrate a modified direct binary search (DBS) framework into the proposed message-embedded halftoning method. The modified DBS framework ensures that the resulting data-embedded halftones have optimal image quality from the viewpoint of the HVS.
Image Analysis and Filtering
Machine learning for adaptive bilateral filtering
We describe a supervised learning procedure for estimating the relation between a set of local image features and the locally optimal parameters of an adaptive bilateral filter. A set of two entropy-based features is used to represent the properties of the image at a local scale. Experimental results show that our entropy-based adaptive bilateral filter outperforms other extensions of the bilateral filter where parameter tuning is based on empirical rules. Beyond the bilateral filter, our learning procedure represents a general framework that can be used to develop a wide class of adaptive filters.
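A minimal sketch of the idea under stated assumptions: a local-entropy feature summarizes each neighborhood, and a regressor (faked here by a hypothetical linear mapping, not the paper's trained model) maps it to bilateral filter parameters; OpenCV's bilateralFilter stands in for the filter.

import numpy as np
import cv2

def local_entropy(gray, ksize=9):
    # Shannon entropy of the intensity histogram in each ksize x ksize window.
    out = np.zeros(gray.shape, dtype=np.float64)
    pad = ksize // 2
    padded = np.pad(gray, pad, mode="reflect")
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            win = padded[y:y + ksize, x:x + ksize]
            p = np.bincount(win.ravel(), minlength=256) / win.size
            p = p[p > 0]
            out[y, x] = -(p * np.log2(p)).sum()
    return out

def predict_params(mean_entropy):
    # Hypothetical stand-in for the learned feature-to-parameter regressor.
    return 10.0 + 5.0 * mean_entropy, 5.0  # (sigmaColor, sigmaSpace)

gray = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)
sigma_color, sigma_space = predict_params(local_entropy(gray).mean())
filtered = cv2.bilateralFilter(gray, d=-1, sigmaColor=sigma_color, sigmaSpace=sigma_space)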
Real-time 3D adaptive filtering for portable imaging systems
Olivier Bockenbach, Murtaza Ali, Ian Wainwright, et al.
Portable imaging devices have proven valuable for emergency medical services both in the field and hospital environments and are becoming more prevalent in clinical settings where the use of larger imaging machines is impractical. 3D adaptive filtering is one of the most advanced techniques for noise reduction and feature enhancement, but it is computationally very demanding and hence often cannot run with sufficient performance on a portable platform. In recent years, advanced multicore DSPs have been introduced that attain high processing performance while maintaining low levels of power dissipation. These processors enable the implementation of complex algorithms like 3D adaptive filtering, improving the image quality of portable medical imaging devices. In this study, the performance of a 3D adaptive filtering algorithm on a digital signal processor (DSP) is investigated. The performance is assessed by filtering a volume of 512x256x128 voxels sampled at a rate of 10 Mvoxels/s.
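For context on that workload (simple arithmetic from the stated figures, not an additional claim from the paper): 512 × 256 × 128 = 16,777,216 voxels, so at 10 Mvoxels/s one full volume's worth of samples arrives roughly every 1.7 s, which is the time budget a real-time implementation must meet.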
Joint demosaicking and integer-ratio downsampling algorithm for color filter array image
This paper presents a joint demosaicking and integer-ratio downsampling algorithm for color filter array (CFA) images. Color demosaicking is a necessary part of image signal processing to obtain a full-color image in digital image recording systems using a single sensor. Moreover, on devices such as mobile phones, the image obtained from the sensor has to be downsampled for display because the display resolution is smaller than that of the image. The conventional approach is "demosaicking first and downsampling later". However, this procedure requires significant hardware resources and computational cost. In this paper, we propose a method in which demosaicking and downsampling are performed simultaneously. We analyze the Bayer CFA image in the frequency domain, and then jointly demosaick and downsample with an integer-ratio scheme based on signal decomposition into luma and chrominance components. Experimental results show that the proposed method produces high-quality results at much lower computational cost and with fewer hardware resources.
Intermediate color interpolation for color filter array containing the white channel
Recently, a color filter array sensor with a white channel has been developed. This color filter array differs from the Bayer CFA, which is composed of red, green, and blue channels. Since the white channel shows high sensitivity across broad spectral bands and a high light sensitivity, it presents many advantages. However, the various color interpolation methods for the Bayer CFA cannot be applied directly to CFA patterns that contain the white channel. In this paper, a method for generating a quincuncial pattern from such a CFA pattern is proposed. By generating an intermediate quincuncial pattern, various color interpolation algorithms can be applied to it. Experimental results are compared with the conventional method in terms of PSNR measurements.
Special Session: Panorama: Ultra Wide Context and Content Aware Imaging I
The future of consumer cameras
In the last two decades, multimedia devices, and in particular imaging devices (camcorders, tablets, mobile phones, etc.), have diffused dramatically. Moreover, their increasing computational performance, combined with higher storage capacity, allows them to process large amounts of data. In this paper an overview of the current trends of the consumer camera market and technology is given, also providing some details about the recent past (from the digital still camera up to today) and forthcoming key issues.
Image quality based x-ray dose control in cardiac imaging
Andrew G. Davies, Stephen M. Kengyelics, Amber J. Gislason-Lee
An automated closed-loop dose control system balances the radiation dose delivered to patients and the quality of images produced in cardiac x-ray imaging systems. Using computer simulations, this study compared two designs of automatic x-ray dose control in terms of the radiation dose and quality of images produced. The first design, common in x-ray systems today, maintained a constant dose rate at the image receptor. The second design maintained a constant image quality in the output images. A computer model represented patients as a polymethylmethacrylate phantom (which has similar x-ray attenuation to soft tissue) containing a detail representative of an artery filled with contrast medium. The model predicted the entrance surface dose to the phantom and the contrast-to-noise ratio of the detail as an index of image quality. Results showed that for the constant dose control system, phantom dose increased substantially with phantom size (a fivefold increase between 20 cm and 30 cm thick phantoms), yet image quality decreased by 43% over the same range. For the constant quality control system, phantom dose increased at a greater rate with phantom thickness (more than tenfold between the 20 cm and 30 cm phantoms). Image quality based dose control could tailor the x-ray output to just achieve the quality required, which would reduce dose to patients where the current dose control produces images of unnecessarily high quality. However, maintaining higher levels of image quality for large patients would result in a significant dose increase over current practice.
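As a hedged sketch, the contrast-to-noise ratio used above as the image quality index can be computed directly from region statistics (a common definition; the study's exact variant may differ):

import numpy as np

def cnr(image, detail_mask, background_mask):
    # Contrast-to-noise ratio of a detail against its background:
    # |mean(detail) - mean(background)| / std(background).
    signal = image[detail_mask].mean() - image[background_mask].mean()
    return abs(signal) / image[background_mask].std()

# Synthetic example: a contrast-filled detail on a noisy background.
img = np.random.default_rng(0).normal(100, 5, (64, 64))
img[28:36, 28:36] += 20
detail = np.zeros_like(img, dtype=bool)
detail[28:36, 28:36] = True
print(round(cnr(img, detail, ~detail), 2))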
Selecting stimuli parameters for video quality assessment studies based on perceptual similarity distances
Asli Kumcu, Ljiljana Platiša, Heng Chen, et al.
This work presents a methodology to optimize the selection of multiple parameter levels of an image acquisition, degradation, or post-processing process applied to stimuli intended for use in a subjective image or video quality assessment (QA) study. It is known that processing parameters (e.g. compression bit-rate) or technical quality measures (e.g. peak signal-to-noise ratio, PSNR) are often non-linearly related to human quality judgment, and the model of either relationship may not be known in advance. Using these approaches to select parameter levels may lead to an inaccurate estimate of the relationship between the parameter and subjective quality judgments, i.e., the system's quality model. To overcome this, we propose a method for modeling the relationship between parameter levels and perceived quality distances using a paired comparison parameter selection procedure in which subjects judge the perceived similarity in quality. Our goal is to enable the selection of evenly sampled parameter levels within the considered quality range for use in a subjective QA study. This approach is tested on two applications: (1) selection of compression levels for a laparoscopic surgery video QA study, and (2) selection of dose levels for an interventional X-ray QA study. Subjective scores, obtained from the follow-up single stimulus QA experiments conducted with expert subjects who evaluated the selected bit-rates and dose levels, were roughly equidistant in the perceptual quality space, as intended. These results suggest that a similarity judgment task can help select parameter values corresponding to desired subjective quality levels.
Special Session: Panorama: Ultra Wide Context and Content Aware Imaging II
On detailed 3D reconstruction of large indoor environments
Egor Bondarev
In this paper we present techniques for highly detailed 3D reconstruction of extra large indoor environments. We discuss the benefits and drawbacks of low-range, far-range and hybrid sensing and reconstruction approaches. The proposed techniques for low-range and hybrid reconstruction, enabling a reconstruction density of 125 points/cm³ on large 100,000 m³ models, are presented in detail. The techniques tackle the core challenges for the above requirements, such as multi-modal data fusion (fusion of LIDAR data with Kinect data), accurate sensor pose estimation, high-density scanning, and depth data noise filtering. Other important aspects for extra large 3D indoor reconstruction are point cloud decimation and real-time rendering. In this paper, we present a method for planar-based point cloud decimation, allowing a point-cloud size reduction of 80-95%. Besides this, we introduce a method for online rendering of extra large point clouds enabling real-time visualization of huge cloud spaces in conventional web browsers.
Person re-identification by pose priors
Slawomir Bak, Filipe Martins, Francois Bremond
The person re-identification problem is a well known retrieval task that requires finding a person of interest in a network of cameras. In a real-world scenario, state-of-the-art algorithms are likely to fail due to serious perspective and pose changes as well as variations in lighting conditions across the camera network. The most effective approaches try to cope with all these changes by applying metric learning tools to find a transfer function between a camera pair. Unfortunately, this transfer function is usually dependent on the camera pair and requires labeled training data for each camera. This might be unattainable in a large camera network. In this paper, instead of learning the transfer function that addresses all appearance changes, we propose to learn a generic metric pool that only focuses on pose changes. This pool consists of metrics, each one learned to match a specific pair of poses. Automatically estimated poses determine the proper metric, thus improving matching. We show that metrics learned using a single camera improve the matching across the whole camera network, providing a scalable solution. We validated our approach on a publicly available dataset, demonstrating an increase in re-identification performance.
Fast planar segmentation of depth images
Hani Javan Hemmat, Arash Pourtaherian, Egor Bondarev, et al.
One of the major challenges for applications dealing with the 3D concept is the real-time execution of the algorithms. Besides this, for indoor environments, perceiving the geometry of surrounding structures plays a prominent role in application performance. Since indoor structures mainly consist of planar surfaces, fast and accurate detection of such features has a crucial impact on the quality and functionality of 3D applications, e.g. decreasing model size (decimation), enhancing localization, mapping, and semantic reconstruction. The available planar-segmentation algorithms are mostly developed using surface normals and/or curvatures. Therefore, they are computationally expensive and challenging for real-time performance. In this paper, we introduce a fast planar-segmentation method for depth images that avoids surface normal calculations. Firstly, the proposed method searches for 3D edges in a depth image and finds the lines between identified edges. Secondly, it merges all the points on each pair of intersecting lines into a plane. Finally, various enhancements (e.g. filtering) are applied to improve the segmentation quality. The proposed algorithm is capable of handling VGA-resolution depth images at a frame rate of 6 FPS with a single-thread implementation. Furthermore, due to the multi-threaded design of the algorithm, we achieve a factor of 10 speedup by deploying a GPU implementation.
Machine vision image quality measurement in cardiac x-ray imaging
Stephen M. Kengyelics, Amber Gislason-Lee, Claire Keeble, et al.
The purpose of this work is to report on a machine vision approach for the automated measurement of x-ray image contrast of coronary arteries filled with iodine contrast media during interventional cardiac procedures. A machine vision algorithm was developed that creates a binary mask of the principal vessels of the coronary artery tree by thresholding a standard deviation map of the direction image of the cardiac scene derived using a Frangi filter. Using the mask, average contrast is calculated by fitting a Gaussian model to the greyscale profile orthogonal to the vessel centre line at a number of points along the vessel. The algorithm was applied to sections of single image frames from 30 left and 30 right coronary artery image sequences from different patients. Manual measurements of average contrast were also performed on the same images. A Bland-Altman analysis indicates good agreement between the two methods, with 95% confidence intervals of -0.046 to +0.048 and a mean bias of 0.001. The machine vision algorithm has the potential to provide real-time, context-sensitive information so that radiographic imaging control parameters could be adjusted on the basis of clinically relevant image content.
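A hedged sketch of the contrast measurement step: fit a Gaussian model to a greyscale profile taken orthogonally to the vessel centre line (the profile here is synthetic, and the exact model parametrization is an assumption):

import numpy as np
from scipy.optimize import curve_fit

def gauss(x, amplitude, mu, sigma, baseline):
    # Dark (contrast-filled) vessel: a Gaussian dip below the background level.
    return baseline - amplitude * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Synthetic profile across a vessel on a brighter background, with noise.
x = np.arange(-15, 16, dtype=float)
profile = gauss(x, amplitude=40, mu=0.5, sigma=3.0, baseline=120) \
          + np.random.default_rng(0).normal(0, 2, x.size)

popt, _ = curve_fit(gauss, x, profile, p0=(30, 0, 2, profile.max()))
print("fitted vessel contrast (dip amplitude):", round(popt[0], 1))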
Multiview image sequence enhancement
Ljubomir Jovanov, Hiêp Luong, Tijana Ružic, et al.
Realistic visualization is crucial for a more intuitive representation of complex data and for medical imaging, simulation, and entertainment systems. Multiview autostereoscopic displays are a great step towards achieving a complete immersive user experience. However, providing high quality content for this type of display is still a great challenge. Due to the different characteristics/settings of the cameras in the multiview setup and the varying photometric characteristics of the objects in the scene, the same object may have a different appearance in the sequences acquired by different cameras. In practice, images representing views recorded using different cameras have different local noise, color, and sharpness characteristics. View synthesis algorithms introduce artifacts due to errors in disparity estimation, bad occlusion handling, or erroneous warping function estimation. If the input multiview images are not of sufficient quality and have mismatched color and sharpness characteristics, these artifacts may become even more disturbing. The main goal of our method is to simultaneously perform multiview image sequence denoising, color correction, and the improvement of sharpness in slightly blurred regions. Results show that the proposed method significantly reduces the amount of artifacts in multiview video sequences, resulting in a better visual experience.
How much image noise can be added in cardiac x-ray imaging without loss in perceived image quality?
Amber J. Gislason-Lee, Asli Kumcu, Stephen M. Kengyelics, et al.
Dynamic X-ray imaging systems are used for interventional cardiac procedures to treat coronary heart disease. X-ray settings are controlled automatically by specially-designed X-ray dose control mechanisms whose role is to ensure an adequate level of image quality is maintained with an acceptable radiation dose to the patient. Current commonplace dose control designs quantify image quality by performing a simple technical measurement directly from the image. However, the utility of cardiac X-ray images is in their interpretation by a cardiologist during an interventional procedure, rather than in a technical measurement. With the long term goal of devising a clinically-relevant image quality metric for an intelligent dose control system, we aim to investigate the relationship of image noise with clinical professionals’ perception of dynamic image sequences.

Computer-generated noise was added, in incremental amounts, to angiograms of five different patients selected to represent the range of adult cardiac patient sizes. A two-alternative forced-choice staircase experiment was used to determine the amount of noise which can be added to a patient image sequence without changing image quality as perceived by clinical professionals. Twenty-five viewing sessions (five for each patient) were completed by thirteen observers. Results demonstrated scope to increase the noise of cardiac X-ray images by up to 21% ± 8% before it became noticeable to clinical professionals. This indicates a potential for 21% radiation dose reduction, since X-ray image noise and radiation dose are directly related; this would be beneficial to both patients and personnel.
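A minimal sketch of an adaptive forced-choice staircase of the kind used in such studies (the observer model, step size, and 2-down/1-up rule are assumptions, not the paper's exact protocol):

import random

def observer_detects(noise_level, threshold=0.21):
    # Hypothetical observer: reliably detects added noise above ~21%,
    # otherwise guesses (50% correct in a two-alternative forced choice).
    return noise_level > threshold or random.random() < 0.5

def staircase(start=0.40, step=0.02, trials=60):
    level, correct_in_row, reversals, direction = start, 0, [], -1
    for _ in range(trials):
        if observer_detects(level):
            correct_in_row += 1
            if correct_in_row == 2:  # 2-down/1-up rule: converges near 70.7% correct
                correct_in_row = 0
                if direction == +1:
                    reversals.append(level)
                direction = -1
                level = max(0.0, level - step)
        else:
            correct_in_row = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step
    # Threshold estimate: average the last few reversal levels.
    return sum(reversals[-6:]) / len(reversals[-6:]) if reversals else level

random.seed(0)
print("estimated detection threshold:", round(staircase(), 3))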
Transform-Domain Image Processing
Metamerism in the context of aperture sampling reconstruction
Alfredo Restrepo, Julian D. Garzon
The traditional mathematical context for the modelling of metamerism is linear algebra, in particular linear transformations of finite-dimensional vector spaces; metamerism is thus traditionally considered in the natural (electromagnetic wavelength or electromagnetic frequency) domain. Aperture sampling, and sampling theory in general, consider both the natural domain of signals as well as their Fourier frequency domain. If (visible spectrum) signals are considered as nearly Fourier-frequency bandlimited as well as natural-domain limited, it turns out that the discrete approach of a finite-dimensional vector space of signals, commonly used in the study of metamerism, is justified. Three aperture samples are not enough to recover the signal in most cases. Five aperture samples may be enough to recover a cone spectral function.
Tensor representation of color images and fast 2D quaternion discrete Fourier transform
In this paper, a general, efficient, split algorithm to compute the two-dimensional quaternion discrete Fourier transform (2-D QDFT), using a special partitioning in the frequency domain, is introduced. The partition determines an effective transformation, or color image representation, in the form of 1-D quaternion signals which allow for splitting the N × M-point 2-D QDFT into a set of 1-D QDFTs. Comparative estimates revealing the efficiency of the proposed algorithms with respect to known ones are given. In particular, a proposed method of calculating the 2^r × 2^r-point 2-D QDFT uses 18N² fewer multiplications than the well-known column-row method and the method of calculation based on the symplectic decomposition. The proposed algorithm is simple to apply and design, which makes it very practical for color image processing in the frequency domain.
Algorithms of the q2^r× q2^r-point 2D discrete Fourier transform
In this paper, the concept of partitions revealing the two-dimensional discrete Fourier transform (2-D DFT) of order q2^r × q2^r, where r > 1 and q is a positive odd number, is described. Two methods of calculating the 2-D DFT are analyzed. The q2^r × q2^r-point 2-D DFT can be calculated by the traditional column-row method with 2(q2^r) 1-D DFTs, and we propose a fast algorithm which splits each 1-D DFT into short transforms by means of the fast paired transforms. Another effective algorithm for calculating the q2^r × q2^r-point 2-D DFT is based on the tensor or paired representations of the image, in which the image is represented as a set of 1-D signals which define the 2-D transform in different subsets of frequency-points and together cover the complete set of frequencies. In this case, the splittings of the q2^r × q2^r-point 2-D DFT are performed by the 2-D discrete tensor or paired transforms, respectively, which lead to the calculation with a minimum number of 1-D DFTs. Examples of the transforms and the computational complexity of the proposed algorithms are given.
A method for predicting DCT-based denoising efficiency for grayscale images corrupted by AWGN and additive spatially correlated noise
Results of denoising based on the discrete cosine transform for a wide class of images corrupted by additive noise are obtained. Three types of noise are analyzed: additive white Gaussian noise and additive spatially correlated Gaussian noise with middle and high correlation levels. The TID2013 image database and some additional images are taken as test images. The conventional DCT filter and BM3D are used as denoising techniques. Denoising efficiency is described by the PSNR and PSNR-HVS-M metrics. Within the hard-thresholding denoising mechanism, DCT-spectrum coefficient statistics are used to characterize images and, subsequently, the denoising efficiency for them. Denoising efficiency results are fitted to these statistics and efficient approximations are obtained. It is shown that the obtained approximations provide high accuracy in predicting denoising efficiency.
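The conventional DCT filter mentioned above amounts, in essence, to blockwise hard thresholding of DCT coefficients; a minimal sketch with non-overlapping 8×8 blocks (practical implementations average overlapping blocks, and β ≈ 2.7 is a commonly used threshold factor):

import numpy as np
from scipy.fft import dctn, idctn

def dct_denoise(img, sigma, block=8, beta=2.7):
    # Hard-threshold DCT denoising: zero coefficients below beta * sigma.
    out = np.zeros_like(img, dtype=float)
    thr = beta * sigma
    for y in range(0, img.shape[0] - block + 1, block):
        for x in range(0, img.shape[1] - block + 1, block):
            coeffs = dctn(img[y:y + block, x:x + block], norm="ortho")
            dc = coeffs[0, 0]
            coeffs[np.abs(coeffs) < thr] = 0.0
            coeffs[0, 0] = dc  # always keep the DC term
            out[y:y + block, x:x + block] = idctn(coeffs, norm="ortho")
    return out

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 255, 64), (64, 1))
noisy = clean + rng.normal(0, 10, clean.shape)  # AWGN, sigma = 10
print(np.abs(dct_denoise(noisy, sigma=10) - clean).mean()
      < np.abs(noisy - clean).mean())  # True: error decreases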
Multi-Dimensional and Multi-Modal Image Processing
Cost volume refinement filter for post filtering of visual corresponding
Shu Fujita, Takuya Matsuo, Norishige Fukushima, et al.
In this paper, we propose a generalized framework of cost volume refinement filtering for visual correspondence problems. When we estimate a visual correspondence map, e.g., a depth map, optical flow, or segmentation, the estimated map often contains noise and blur. One solution for this problem is post filtering. Edge-preserving filtering, such as joint bilateral filtering, can remove the noise, but it also blurs object boundaries. As an approach to removing noise without blurring, cost volume refinement filtering (CVRF) is an effective solution for refining such labelings of correspondence problems. Several papers propose methods categorized as CVRF for various applications. These methods use various reconstruction metric functions, such as the L1 norm, L2 norm, or exponential functions, and various edge-preserving filters, such as joint bilateral filtering and guided image filtering. In this paper, we generalize these factors and add a range-spatial domain resizing factor to CVRF. Experimental results show that our generalized formulation outperforms the conventional approaches, and also show which form of CVRF is appropriate for various applications of stereo matching and optical flow estimation.
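A hedged sketch of the CVRF pipeline for a discrete label map (a Gaussian filter stands in here for the joint bilateral or guided filter, and the L1 cost is one of the reconstruction metrics named above):

import numpy as np
from scipy.ndimage import gaussian_filter

def refine_labels(label_map, n_labels, sigma=2.0):
    # Build one cost slice per label, smooth each slice, take per-pixel argmin.
    h, w = label_map.shape
    volume = np.empty((n_labels, h, w))
    for lab in range(n_labels):
        cost = np.abs(label_map.astype(float) - lab)  # L1 reconstruction cost
        volume[lab] = gaussian_filter(cost, sigma)    # slice-wise filtering
    return volume.argmin(axis=0)                      # winner-takes-all

rng = np.random.default_rng(0)
noisy = np.where(np.arange(64)[None, :] < 32, 0, 3) + rng.integers(-1, 2, (64, 64))
refined = refine_labels(np.clip(noisy, 0, 3), n_labels=4)
print(refined[:2, 30:34])  # label noise suppressed away from the boundary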
Depth remapping using seam carving for depth image based rendering
Ikuko Tsubaki, Kenichi Iwauchi
Depth remapping is a technique to control the depth range of stereo images. Conventional remapping, which applies a transform function to the whole image, has a stable characteristic; however, it sometimes reduces the 3D appearance too much. To cope with this problem, a depth remapping method which preserves the details of the depth structure is proposed. We apply seam carving, an effective technique for image retargeting, to depth remapping. An extended depth map is defined as a space-depth volume, and a seam surface which is a 2D monotonic and connected manifold is introduced. The depth range is reduced by removing depth values on the seam surface from the space-depth volume. Finally, a stereo image pair is synthesized from the corrected depth map and an input color image by depth image based rendering.
Depth map occlusion filling and scene reconstruction using modified exemplar-based inpainting
V. V. Voronin, V. I. Marchuk, A. V. Fisunov, et al.
RGB-D sensors are relatively inexpensive and are commercially available off-the-shelf. However, owing to their low complexity, one encounters several artifacts in the depth map, such as holes, misalignment between the depth and color images, and a lack of sharp object boundaries. Depth maps generated by Kinect cameras also contain a significant number of missing pixels and strong noise, limiting their usability in many computer vision applications. In this paper, we present an efficient hole filling and damaged region restoration method that improves the quality of the depth maps obtained with the Microsoft Kinect device. The proposed approach is based on modified exemplar-based inpainting and LPA-ICI filtering that exploits the correlation between color and depth values in local image neighborhoods. As a result, edges of the objects are sharpened and aligned with the objects in the color image. Several examples considered in this paper show the effectiveness of the proposed approach for removing large holes as well as recovering small regions on several test depth maps. We perform a comparative study and show that, statistically, the proposed algorithm delivers superior quality results compared to existing algorithms.
Real-time depth image-based rendering with layered dis-occlusion compensation and aliasing-free composition
Sergey Smirnov, Atanas Gotchev
Depth Image-based Rendering (DIBR) is a popular view synthesis technique which utilizes the RGB+D image format, also referred to as the view-plus-depth scene representation. Classical DIBR is prone to dis-occlusion artefacts, caused by the lack of information in areas behind foreground objects, which become visible in the synthesized images. A number of recently proposed compensation techniques have addressed the problem of hole filling. However, their computational complexity does not allow for real-time view synthesis, and they may require additional user input. In this work, we propose a hole-compensation technique which works fully automatically and in a perceptually-correct manner. The proposed technique applies a two-layer model of the given RGB+D imagery, which is specifically tailored for rendering with free viewpoint selection. The two main components of the proposed technique are an adaptive layering of depth into relative 'foreground' and 'background' layers to be rendered separately, and an additional blending filter aimed at creating a blending function for aliasing cancellation during view composition. The proposed real-time implementation turns ordinary view-plus-depth images into true 3D scene representations, which allow visualization in a fly-around manner.
Interactive Paper Session
No-reference visual quality assessment for image inpainting
V. V. Voronin, Vladimir A. Frantc, V. I. Marchuk, et al.
Inpainting has received a lot of attention in recent years, and quality assessment is an important task for evaluating different image reconstruction approaches. In many cases inpainting methods introduce blur into sharp transitions and image contours when recovering large areas with missing pixels, and they often fail to recover curved boundary edges. Quantitative metrics for inpainting results currently do not exist, and researchers use human comparisons to evaluate their methodologies and techniques. Most objective quality assessment methods rely on a reference image, which is often not available in inpainting applications. Usually researchers use subjective quality assessment by human observers, a difficult and time-consuming procedure. This paper focuses on a machine learning approach for no-reference visual quality assessment for image inpainting based on properties of the human visual system. Our method is based on the observation that Local Binary Patterns describe the local structural information of an image well. We use a support vector regression model, trained on human-assessed images, to predict the perceived quality of inpainted images. We demonstrate how our predicted quality value correlates with qualitative opinion in a human observer study. Results are shown on a human-scored dataset for different inpainting methods.
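A minimal sketch of the described pipeline under stated assumptions (random arrays stand in for inpainted images and their human scores): uniform LBP histograms feed a support vector regressor.

import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVR

def lbp_histogram(gray, P=8, R=1.0):
    # Uniform LBP yields P + 2 distinct codes; a normalized histogram of the
    # codes summarizes local structure.
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Hypothetical training set: inpainted images with mean-opinion scores.
rng = np.random.default_rng(0)
images = rng.random((30, 64, 64))
mos = rng.uniform(1, 5, 30)

X = np.array([lbp_histogram(im) for im in images])
model = SVR(kernel="rbf").fit(X, mos)
print(model.predict(X[:1]))  # predicted perceived quality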
Pentachromatic colour spaces
Alfredo Restrepo
We generalise previous results to dimension 5. We exploit the geometric properties of the 5-hypercube [0, 1]^5 in order to give a mathematical model for colour vision in the case of 5 photoreceptor types, and for the corresponding additive colour combination with five primary lights. Five photoreceptors, or five types of camera pixel filters, with responses normalised to the interval [0, 1], give rise to a 5-dimensional hypercube [0, 1]^5 of combined responses (colours). As previously done for the trichromatic and tetrachromatic cases, we identify an equatorial PL 3-sphere in the PL 4-sphere boundary ∂[0, 1]^5 of the hypercube. This equatorial sphere is the set of hues of the chromatic colour points in the hypercube. The remaining attributes of luminance and chromatic saturation are given by the midrange and range of the colour coordinates. From the 5-cube we go to a polytopal hexcone-type space, to a double-cone-type space, and to a round Runge space.
A comparative study of two prediction models for brain tumor progression
Deqi Zhou, Loc Tran, Jihong Wang, et al.
The MR diffusion tensor imaging (DTI) technique, together with traditional T1- or T2-weighted MRI scans, supplies rich information sources for brain cancer diagnosis. These images form large-scale, high-dimensional data sets. Because significant correlations exist among these images, we assume low-dimensional geometric data structures (manifolds) are embedded in the high-dimensional space. Those manifolds might be hidden from radiologists because it is challenging for human experts to interpret high-dimensional data. Identification of the manifold is a critical step for successfully analyzing multimodal MR images.

We have developed various manifold learning algorithms (Tran et al. 2011; Tran et al. 2013) for medical image analysis. This paper presents a comparative study of an incremental manifold learning scheme (Tran et al. 2013) versus the deep learning model (Hinton et al. 2006) in the application of brain tumor progression prediction. Incremental manifold learning is a variant of manifold learning designed to handle large-scale datasets, in which a representative subset of the original data is sampled first to construct a manifold skeleton and the remaining data points are then inserted into the skeleton by following their local geometry. The incremental manifold learning algorithm aims at mitigating the computational burden associated with traditional manifold learning methods on large-scale datasets. Deep learning is a recently developed multilayer perceptron model that has achieved state-of-the-art performance in many applications. A recent technique named "Dropout" can further boost the deep model by preventing weight co-adaptation to avoid over-fitting (Hinton et al. 2012).

We applied the two models on multiple MRI scans from four brain tumor patients to predict tumor progression and compared the performances of the two models in terms of average prediction accuracy, sensitivity, specificity and precision. The quantitative performance metrics were calculated as averages over the four patients. Experimental results show that both the manifold learning and deep neural network models produced better results compared to using raw data and principal component analysis (PCA), and the deep learning model is a better method than manifold learning on this data set. The averaged sensitivity and specificity of deep learning are comparable with those of the manifold learning approach, while its precision is considerably higher. This means that the progression points predicted by deep learning are more likely to correspond to the actual progression region.
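For reference, the four reported metrics follow directly from per-voxel confusion counts; a minimal sketch with hypothetical masks:

import numpy as np

def metrics(pred, truth):
    # Accuracy, sensitivity, specificity, precision from binary masks.
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp)}

rng = np.random.default_rng(0)
truth = rng.random((32, 32)) > 0.7           # hypothetical progression mask
pred = truth ^ (rng.random((32, 32)) > 0.9)  # noisy prediction of the mask
print(metrics(pred, truth))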
Enhancement of galaxy images for improved classification
In this paper, the classification accuracy of galaxy images is demonstrated to be improved by enhancing the images. Galaxy images often contain faint regions that are of similar intensity to stars and the image background, resulting in data loss during background subtraction and galaxy segmentation. Enhancement darkens these faint regions, enabling them to be distinguished from other objects in the image and from the image background, relative to their original intensities. The heap transform is employed for the purpose of enhancement. Segmentation then produces a galaxy image which closely resembles the structure of the original galaxy image and is suitable for further processing and classification. Six morphological feature descriptors are applied to the segmented images after a preprocessing stage and are used to extract the galaxy image structure for use in training the classifier. The support vector machine learning algorithm performs training and validation on the original and enhanced data, and a comparison between the classification accuracy of each data set is included. Principal component analysis is used to compress the data sets for the purposes of classification visualization and a comparison between the reduced and original feature spaces. Future directions for this research, including galaxy image enhancement by various methods and classification performed with the use of a sparse dictionary, are introduced.
Face retrieval in video sequences using Web images database
Face processing techniques for automatic recognition in broadcast video attract research interest because of their value in applications such as video indexing, retrieval, and summarization. In multimedia press review, the automatic annotation of broadcast news programs is a challenging task because people can appear with large appearance variations, such as hair styles, illumination conditions, and poses, that make the comparison between similar faces more difficult. In this paper a technique for automatic face identification in TV broadcasting programs based on a gallery of faces downloaded from the Web is proposed. The approach is based on the joint use of the Scale Invariant Feature Transform descriptor and Eigenfaces-based algorithms, and it has been tested on video sequences using a database of images acquired from a web search. Experimental results show that the joint use of these two approaches improves the recognition rate for both Standard Definition (SD) and High Definition (HD) content.
Development and validation of an improved smartphone heart rate acquisition system
G. Karapetyan, R. Barseghyan, H. Sarukhanyan, et al.
In this paper we propose a robust system for touchless heart rate (HR) acquisition on mobile devices. The application allows monitoring the heart rate signal during usual mobile device usage such as watching videos, playing games, reading articles, etc. The system is based on an algorithm that acquires heart rate by recording skin color variations with the built-in camera of a mobile device. The signal is acquired from different ROIs of the human face, which makes it cleaner, and amplification of the signal improves robustness in low lighting conditions. The effectiveness and robustness of the developed system have been demonstrated under different distances from the camera and different illumination conditions. In the experiments, HRs were collected from 10 subjects, ages 22 to 65, using three different mobile devices. Moreover, we compared the developed method with Food and Drug Administration (FDA) approved sensors and related commercial applications for remote heart rate measurement on mobile devices.
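A hedged sketch of the core measurement principle described above (remote photoplethysmography), using a synthetic 72 bpm trace in place of real camera data: average the green channel over a face ROI per frame, remove the trend, and read the heart rate from the dominant spectral peak.

import numpy as np

fps = 30.0
t = np.arange(0, 20, 1 / fps)  # 20 s of frames
rng = np.random.default_rng(0)
# Synthetic per-frame mean green intensity of a face ROI:
# 72 bpm (1.2 Hz) pulse + slow drift + sensor noise.
green = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.3 * t / t[-1] \
        + rng.normal(0, 0.02, t.size)

detrended = green - np.polyval(np.polyfit(t, green, 2), t)  # remove slow drift
spectrum = np.abs(np.fft.rfft(detrended * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fps)

band = (freqs > 0.7) & (freqs < 4.0)  # plausible HR band: 42-240 bpm
hr_bpm = 60 * freqs[band][spectrum[band].argmax()]
print(round(hr_bpm))  # ~72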
New 2D discrete Fourier transforms in image processing
In this paper, the concept of the two-dimensional discrete Fourier transform (2-D DFT) is defined in the general case, when the form of the relation between the spatial points (x, y) and frequency points (ω1, ω2) in the exponential kernel of the transform is defined by a nonlinear form L(x, y; ω1, ω2). The traditional concept of the 2-D DFT uses the Diaphanous form xω1 + yω2, and this 2-D DFT is the particular case of the Fourier transform described by the form L(x, y; ω1, ω2). Properties of the general 2-D discrete Fourier transform are described and examples are given. The special case of the N × N-point 2-D Fourier transforms, when N = 2^r, r > 1, is analyzed and an effective representation of these transforms is proposed. The proposed concept of nonlinear forms can also be applied to other transforms, such as the Hartley, Hadamard, and cosine transforms.
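In LaTeX form, the generalization described above can be written as follows (notation reconstructed from the abstract; the traditional 2-D DFT is recovered when L reduces to the bilinear form):

F(\omega_1,\omega_2) \;=\; \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,
\exp\!\Big(-\frac{2\pi i}{N}\, L(x,y;\omega_1,\omega_2)\Big),
\qquad
L(x,y;\omega_1,\omega_2) = x\omega_1 + y\omega_2 \ \ \text{(traditional case).}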
Printed Arabic optical character segmentation
Khader Mohammad, Muna Ayyesh, Aziz Qaroush, et al.
Considerable progress in recognition techniques for many non-Arabic characters has been achieved. In contrast, little effort has been devoted to research on Arabic characters. In any Optical Character Recognition (OCR) system, the segmentation step is usually the essential stage, to which an extensive portion of processing is devoted and to which a considerable share of recognition errors is attributed. In this research, a novel segmentation approach for machine-printed Arabic text with diacritics is proposed. The proposed method reduces computation and errors, gives a clear description of the sub-word, and has advantages over skeleton-based approaches, in which data and information about the character can be lost. Initial evaluation and testing of the proposed method were developed in MATLAB and show promising results of 98.7%.
Super Resolution Algorithm for CCTVs
Recently, security cameras and CCTV systems have become an important part of our daily lives. The rising demand for such systems has created business opportunities in this field, especially in big cities. Analogue CCTV systems are being replaced by digital systems, and HDTV CCTV has become quite common. HDTV CCTV can achieve images with high contrast and decent quality if they are captured in daylight. However, an image captured at night does not always have sufficient contrast and resolution because of poor lighting conditions. CCTV systems depend on infrared light at night to compensate for insufficient lighting, thereby producing monochrome images and videos. However, these images and videos do not have high contrast and are blurred. We propose a nonlinear signal processing technique that significantly improves the visual and image quality (contrast and resolution) of low-contrast infrared images. The proposed method enables the use of infrared cameras for various purposes, such as night shots and environments with poor lighting conditions.
Intended motion estimation using fuzzy Kalman filtering for UAV image stabilization with large drifting
Tiantian Xin, Hongying Zhao, Sijie Liu, et al.
Videos from a small Unmanned Aerial Vehicle (UAV) are always unstable because of the wobble of the vehicle and the impact of the surroundings, especially when the motion has a large drift. Electronic image stabilization aims at removing the unwanted wobble and obtaining a stable video. Estimation of the intended motion, which represents the tendency of the global motion, is therefore the key to image stabilization. General methods of intended motion estimation usually cannot simultaneously obtain a stable intended motion, retain as much image information as possible, and produce a path close to the real flight path. This paper proposes a fuzzy Kalman filtering method to estimate the intended motion and solve these problems. Compared with traditional methods, fuzzy Kalman filtering achieves better intended-motion estimates.
A perceptual quality metric for high-definition stereoscopic 3D video
F. Battisti, M. Carli, Alessio Stramacci, et al.
The use of 3D video is growing in several fields, such as entertainment, military simulation, and medical applications. However, the process of recording, transmitting, and processing 3D video is prone to errors, producing artifacts that may affect the perceived quality. A challenging task nowadays is the definition of a new metric able to predict perceived quality with low computational complexity, in order to be usable in real-time applications. Research in this field is very active due to the complexity of analyzing the influence of stereoscopic cues. In this paper we present a novel stereoscopic metric based on the combination of relevant features able to predict the subjective quality rating more accurately.
Content-aware video quality assessment: predicting human perception of quality using peak signal to noise ratio and spatial/temporal activity
B. Ortiz-Jaramillo, J. Niño-Castañeda, Ljiljana Platiša, et al.
Since the end-user of video-based systems is often a human observer, prediction of human perception of quality (HPoQ) is an important task for increasing user satisfaction. Despite the large variety of objective video quality measures, one problem is their lack of generalizability. This is mainly due to the strong dependency between HPoQ and video content. Although this problem is well known, few existing methods directly account for the influence of video content on HPoQ.

This paper proposes a new method to predict HPoQ by using simple distortion measures and introducing video content features into their computation. Our methodology is based on analyzing the level of spatio-temporal activity and combining HPoQ content-related parameters with simple distortion measures. Our results show that even very simple distortion measures such as PSNR, combined with simple spatio-temporal activity measures, lead to good results. Results on four different public video quality databases show that the proposed methodology, while faster and simpler, is competitive with current state-of-the-art methods, i.e., correlations between objective and subjective assessment are higher than 80%, and it is only two times slower than PSNR.
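A minimal sketch of the ingredients named above (the combination rule at the end is a hypothetical placeholder, not the paper's model): per-frame PSNR plus simple spatial and temporal activity measures.

import numpy as np

def psnr(ref, dist, peak=255.0):
    mse = np.mean((ref.astype(float) - dist.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def spatial_activity(frame):
    # Mean gradient magnitude as a simple spatial-activity measure.
    gy, gx = np.gradient(frame.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2).mean()

def temporal_activity(frame, prev_frame):
    # Mean absolute frame difference as a simple temporal-activity measure.
    return np.abs(frame.astype(float) - prev_frame.astype(float)).mean()

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (2, 64, 64))  # two consecutive reference frames
dist = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255)

q = psnr(ref[1], dist[1])
sa, ta = spatial_activity(ref[1]), temporal_activity(ref[1], ref[0])
# A content-aware score would modulate q by sa and ta (hypothetical form):
score = q / (1 + 0.01 * sa + 0.01 * ta)
print(round(q, 1), round(sa, 1), round(ta, 1), round(score, 1))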
Multi-volume mapping and tracking for real-time RGB-D sensing
Lingni Ma, Egor Bondarev, Peter H. N. de With
In recent years, much research has been devoted to real-time dense mapping and tracking techniques due to the availability of low-cost RGB-D cameras. In this paper, we present a novel multi-volume mapping and tracking algorithm that generates photo-realistic maps while maintaining accurate and robust camera tracking. The algorithm deploys one small volume of high voxel resolution to obtain detailed maps of near-field objects, while utilizing another big volume of low voxel resolution to increase the robustness of tracking by including far-field scenes. The experimental results show that our multi-volume processing scheme achieves an objective quality gain of 2 dB in PSNR and 0.2 in SSIM. Our approach is capable of real-time sensing at approximately 30 fps and can be implemented on a modern GPU.
Preserving natural scene lighting by strobe-lit video
Olli Suominen, Atanas Gotchev
Capturing images in low light intensity, and preserving ambient light in such conditions pose significant problems in terms of achievable image quality. Either the sensitivity of the sensor must be increased, filling the resulting image with noise, or the scene must be lit with artificial light, destroying the aesthetic quality of the image. While the issue has been previously tackled for still imagery using cross-bilateral filtering, the same problem exists in capturing video. We propose a method of illuminating the scene with a strobe light synchronized to every other frame captured by the camera, and merging the information from consecutive frames alternating between high gain and high intensity lighting. The motion between the frames is compensated using motion estimation based on block matching between strobe-illuminated frames. The uniform lighting conditions between every other frame make it possible to utilize conventional motion estimation methods, circumventing the image registration challenges faced in fusing flash/non-flash pairs from non-stationary images. The results of the proposed method are shown to closely resemble those computed using the same filter based on reference images captured at perfect camera alignment. The method can be applied starting from a simple set of three frames to video streams of arbitrary lengths with the only requirements being sufficiently accurate syncing between the imaging device and the lighting unit, and the capability to switch states (sensor gain high/low, illumination on/off) fast enough.