Proceedings Volume 8666

Visual Information Processing and Communication IV

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 13 March 2013
Contents: 7 Sessions, 19 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2013
Volume Number: 8666

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
Sessions
  • Session 1
  • Session 2
  • Session 3
  • Session 4
  • Session 5
  • Interactive Paper Session
  • Front Matter: Volume 8666
Session 1
Fairness issues in resource allocation schemes for wireless visual sensor networks
Katerina Pandremmenou, Lisimachos P. Kondi, Konstantinos E. Parsopoulos
This work addresses the problem of fairness and efficiency evaluation of various resource allocation schemes for wireless visual sensor networks (VSNs). These schemes are used to optimally allocate the source coding rates, channel coding rates, and power levels among the nodes of a wireless direct sequence code division multiple access (DS-CDMA) VSN. All of the considered schemes optimize a function of the video qualities of the nodes. However, no single scheme maximizes the video quality of every node simultaneously. In fact, all presented schemes provide a Pareto-optimal solution, meaning that there is no other solution that is simultaneously preferred by all nodes. Thus, it is not clear which scheme results in the best resource allocation for the whole network. To handle the resulting tradeoffs, in this study we examine four metrics that investigate fairness and efficiency from different perspectives. Specifically, we apply a metric that considers both fairness and performance issues, and another metric that measures the “equality” of a resource allocation (equal utilities for the nodes). The third metric computes the total system utility, while the last metric computes the total power consumption of the nodes. Ideally, a desirable scheme would achieve high total utility while being equally fair to all nodes and requiring little power.
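The abstract does not name the specific indices it applies; as a hedged illustration, the sketch below computes one common equality measure (Jain's fairness index, assumed here as a stand-in for the "equality" metric) together with total-utility and total-power measures for a candidate allocation.

```python
import numpy as np

def fairness_metrics(utilities, powers):
    """Illustrative fairness/efficiency metrics for one allocation.

    utilities: per-node video quality (utility) values
    powers:    per-node transmit power levels
    """
    u = np.asarray(utilities, dtype=float)
    p = np.asarray(powers, dtype=float)
    # Jain's fairness index: 1.0 when all utilities are equal,
    # approaching 1/N as the allocation becomes maximally unequal.
    jain = u.sum() ** 2 / (len(u) * (u ** 2).sum())
    return {
        "jain_fairness": jain,
        "total_utility": u.sum(),   # system efficiency
        "total_power": p.sum(),     # energy cost of the allocation
    }

# Example: three VSN nodes under two candidate allocation schemes
print(fairness_metrics([34.1, 33.9, 34.0], [0.8, 0.7, 0.9]))  # near-equal
print(fairness_metrics([38.0, 30.0, 32.0], [0.5, 1.2, 0.9]))  # skewed
```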
Discussion on information theoretic and simulation analysis of linear shift-invariant edge detection operators
Generally, the designs of digital image processing algorithms and image gathering devices remain separate. However, experiments show that the image gathering process profoundly impacts the performance of digital image processing and the quality of the resulting images. We propose an end-to-end, information-theory-based system to assess linear shift-invariant edge detection algorithms, in which the different parts, such as the scene, image gathering, and processing, are assessed in an integrated manner using Shannon’s information theory. We evaluate the performance of the different algorithms as a function of the characteristics of the scene and of the parameters, such as sampling and additive noise, that define the image gathering system. An edge detection algorithm is regarded as having high performance only if the information rate from the scene to the edge image approaches its maximum possible value; this goal can be achieved only by jointly optimizing all processes. To validate the information-theoretic conclusions, a series of experiments simulating the whole image acquisition process was conducted. Comparison of the theoretical and simulation analyses shows that the proposed information-theoretic assessment provides a new tool for comparing different linear shift-invariant edge detectors in a common environment.
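As a rough, hedged illustration of such an assessment (not the paper's exact formulation), the information shared between a ground-truth edge map and a detector's output can be estimated from their joint histogram:

```python
import numpy as np

def edge_mutual_information(ref_edges, det_edges):
    """Estimate I(reference edge map; detected edge map) in bits
    from the 2x2 joint histogram of two binary edge images."""
    r = np.asarray(ref_edges).ravel().astype(bool)
    d = np.asarray(det_edges).ravel().astype(bool)
    joint = np.histogram2d(r, d, bins=2)[0] / r.size  # joint pmf
    pr, pd = joint.sum(axis=1), joint.sum(axis=0)     # marginals
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if joint[i, j] > 0:
                mi += joint[i, j] * np.log2(joint[i, j] / (pr[i] * pd[j]))
    return mi
```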
Session 2
Inter-layer motion field mapping for the scalable extension of HEVC
Xiaoyu Xiu, Yan Ye, Yong He, et al.
The next-generation video coding standard, High Efficiency Video Coding (HEVC), is under development by the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG. As the first version of the single-layer HEVC standard comes close to completion, there is great interest in extending the standard with scalable capabilities. In this paper, an inter-layer Motion Field Mapping (MFM) algorithm is proposed for the scalable extension of HEVC to generate the motion field of inter-layer reference pictures, such that the correlation between the motion vectors (MVs) of the base layer and the enhancement layer can be exploited. Moreover, as the proposed method does not change any block-level operation, the existing single-layer encoder and decoder logic of HEVC can be applied directly, without modification of motion vector prediction for the enhancement layer. The experimental results show the effectiveness of the proposed MFM method in improving the performance of enhancement-layer motion prediction in scalable HEVC.
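The core idea can be sketched as follows (a minimal illustration, not the JCT-VC algorithm): each enhancement-layer block inherits the motion vector of its collocated base-layer block, with both the block position and the vector rescaled by the spatial ratio.

```python
import numpy as np

def map_motion_field(bl_mv, scale):
    """Map a base-layer block motion field to enhancement-layer
    resolution.

    bl_mv: (H, W, 2) array of base-layer MVs, one per block
    scale: spatial scalability ratio (e.g. 2.0 for 2x scalability)
    """
    h, w = bl_mv.shape[:2]
    eh, ew = int(h * scale), int(w * scale)
    el_mv = np.empty((eh, ew, 2), dtype=float)
    for y in range(eh):
        for x in range(ew):
            # each enhancement-layer block takes the MV of the
            # collocated base-layer block, rescaled to EL resolution
            el_mv[y, x] = bl_mv[int(y / scale), int(x / scale)] * scale
    return el_mv
```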
An HEVC extension for spatial and quality scalable video coding
Tobias Hinz, Philipp Helle, Haricharan Lakshman, et al.
This paper describes an extension of the upcoming High Efficiency Video Coding (HEVC) standard for supporting spatial and quality scalable video coding. Besides scalable coding tools known from the scalable profiles of prior video coding standards such as H.262/MPEG-2 Video and H.264/MPEG-4 AVC, the proposed scalable HEVC extension includes new coding tools that further improve the coding efficiency of the enhancement layer. In particular, new coding modes have been added by which base- and enhancement-layer signals are combined to form an improved enhancement-layer prediction signal. All scalable coding tools have been integrated in a way that leaves the low-level syntax and decoding process of HEVC largely unchanged. Simulation results for typical application scenarios demonstrate the effectiveness of the proposed design. For spatial and quality scalable coding with two layers, bit-rate savings of about 20-30% have been measured relative to simulcasting the layers, which corresponds to a bit-rate overhead of about 5-15% relative to single-layer coding of the enhancement layer.
Towards a next generation open-source video codec
Jim Bankoski, Ronald S. Bultje, Adrian Grange, et al.
Google has recently been developing a next-generation open-source video codec called VP9, as part of the experimental branch of the libvpx repository included in the WebM project (http://www.webmproject.org/). Starting from the VP8 video codec released by Google in 2010 as the baseline, a number of enhancements and new tools have been added to improve the coding efficiency. This paper provides a technical overview of the current status of the project, along with comparisons against other state-of-the-art video codecs, H.264/AVC and HEVC. The new tools that have been added so far include: larger prediction block sizes up to 64x64, various forms of compound INTER prediction, more modes for INTRA prediction, 1/8-pel motion vectors and 8-tap switchable sub-pel interpolation filters, improved motion reference generation and motion vector coding, improved entropy coding and frame-level entropy adaptation for various symbols, improved loop filtering, incorporation of Asymmetric Discrete Sine Transforms and larger 16x16 and 32x32 DCTs, frame-level segmentation to group similar areas together, etc. Other tools and various bitstream features are being actively worked on as well. The VP9 bitstream is expected to be finalized by early to mid-2013. Results show VP9 to be quite competitive in performance with mainstream state-of-the-art codecs.
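To make the sub-pel interpolation concrete: an 8-tap interpolation filter evaluates a kernel (such as a windowed sinc) at a fractional sample offset. The sketch below builds a generic windowed-sinc 8-tap filter; the coefficients are illustrative only, not VP9's actual switchable filter values.

```python
import numpy as np

def subpel_filter_8tap(frac):
    """Generic 8-tap sub-pixel interpolation filter: a Hamming-windowed
    sinc sampled at fractional offset `frac` in (0, 1). Illustrative
    coefficients, not VP9's actual filters."""
    taps = np.arange(-3, 5)                    # 8 integer sample positions
    h = np.sinc(taps - frac) * np.hamming(8)   # windowed ideal interpolator
    return h / h.sum()                         # unity DC gain

# interpolate the half-pel positions of a pixel row
row = np.array([10, 12, 15, 20, 26, 30, 31, 29], dtype=float)
half_pel = np.correlate(row, subpel_filter_8tap(0.5), mode="valid")
```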
Session 3
Scalable extensions of HEVC for next generation services
Kiran Misra, Andrew Segall, Jie Zhao, et al.
The high efficiency video coding (HEVC) standard being developed by ITU-T VCEG and ISO/IEC MPEG achieves the compression goal of halving the bitrate at the same visual quality compared with earlier video compression standards such as H.264/AVC. It achieves this goal through several new tools, such as quad-tree-based partitioning of data, larger block sizes, improved intra prediction, sophisticated prediction of motion information, and an in-loop sample adaptive offset process. This paper describes an approach in which the HEVC framework is extended to achieve spatial scalability using a multi-loop approach. Enhancement-layer inter-predictive coding efficiency is improved by including multiple up-sampled versions of the decoded base-layer picture in the decoded picture buffer. This approach has the advantage of achieving significant coding gains with a simple extension of base-layer tools such as inter prediction and motion information signaling. Coding efficiency of the enhancement layer is further improved using an adaptive loop filter and internal bit-depth increment. The performance of the proposed scalable video coding approach is compared to simulcast transmission of video data using high efficiency model version 6.1 (HM-6.1). The bitrate savings are measured using the Bjontegaard Delta (BD) rate against simulcast anchors for spatial scalability factors of 2 and 1.5. The proposed approach provides average luma BD-rate gains of 33.7% and 50.5%, respectively.
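BD-rate, the comparison metric used here, averages the bitrate difference between two rate-distortion curves over their common quality range. A minimal sketch of the standard computation (cubic fit of log-rate as a function of PSNR, integrated over the overlapping interval):

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta rate: average bitrate difference (%) of the
    test codec vs. the reference at equal quality, from four or more
    rate-PSNR points per curve."""
    lr_ref, lr_test = np.log10(rates_ref), np.log10(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)     # log-rate as cubic in PSNR
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))     # overlapping PSNR range
    hi = min(max(psnr_ref), max(psnr_test))
    # average log-rate difference over the common interval
    int_ref = np.polyval(np.polyint(p_ref), [lo, hi])
    int_test = np.polyval(np.polyint(p_test), [lo, hi])
    avg_diff = ((int_test[1] - int_test[0])
                - (int_ref[1] - int_ref[0])) / (hi - lo)
    return (10 ** avg_diff - 1) * 100           # percent bitrate change
```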
An improved hypothetical reference decoder for HEVC
Sachin Deshpande, Miska M. Hannuksela, Kimihiko Kazui, et al.
The Hypothetical Reference Decoder (HRD) is a hypothetical decoder model that specifies constraints on the variability of conforming network abstraction layer unit streams or conforming byte streams that an encoding process may produce. High Efficiency Video Coding (HEVC) builds upon and improves the design of the generalized hypothetical reference decoder of H.264/AVC. This paper describes some of the main improvements of the HEVC hypothetical reference decoder.
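At its core, an HRD is a leaky-bucket model of the coded picture buffer (CPB): bits arrive from the channel at a specified rate, each access unit is removed at its decoding time, and a conforming stream must never underflow or overflow the buffer. A simplified sketch, ignoring HEVC's CBR/VBR distinctions and sub-picture timing:

```python
def cpb_conforms(frame_bits, bitrate, buffer_size, initial_fullness, fps):
    """Leaky-bucket check of coded-picture-buffer conformance.
    Illustrative model only, not the normative HEVC HRD equations."""
    fullness = initial_fullness
    per_frame_input = bitrate / fps      # bits arriving per frame slot
    for bits in frame_bits:
        fullness += per_frame_input      # fill from the channel
        if fullness > buffer_size:
            return False                 # overflow
        fullness -= bits                 # instantaneous removal at decode
        if fullness < 0:
            return False                 # underflow: frame arrived late
    return True
```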
On lossless coding for HEVC
Wen Gao, Minqiang Jiang, Haoping Yu
In this paper, we first review the lossless coding mode in version 1 of the HEVC standard, which has recently been finalized. We then provide a performance comparison between the lossless coding modes of the HEVC and MPEG-4 AVC/H.264 standards and show that HEVC lossless coding has limited coding efficiency. To improve the performance of the lossless coding mode, several new coding tools that were contributed to JCT-VC but not adopted in version 1 of the HEVC standard are introduced. In particular, we discuss sample-based intra prediction and coding of residual coefficients in more detail. Finally, we briefly address a new class of coding tools, namely a dictionary-based coder, that is efficient in encoding screen content, including graphics and text.
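Sample-based intra prediction replaces block-wise prediction with a per-sample predictor, which suits lossless coding because every reconstructed neighbour is exact. The sketch below uses the median edge detector from JPEG-LS as an assumed stand-in for the proposal's predictor:

```python
import numpy as np

def sample_based_intra_residual(block):
    """Sample-wise DPCM-style intra prediction: each sample is
    predicted from its left (a), top (c), and top-left (d) neighbours
    via the JPEG-LS median edge detector. Returns the residual block;
    a sketch, not the actual JCT-VC proposal."""
    b = block.astype(int)
    res = np.zeros_like(b)
    for y in range(b.shape[0]):
        for x in range(b.shape[1]):
            a = b[y, x - 1] if x > 0 else 0
            c = b[y - 1, x] if y > 0 else 0
            d = b[y - 1, x - 1] if x > 0 and y > 0 else 0
            if d >= max(a, c):
                pred = min(a, c)       # horizontal/vertical edge above
            elif d <= min(a, c):
                pred = max(a, c)
            else:
                pred = a + c - d       # planar gradient
            res[y, x] = b[y, x] - pred
    return res
```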
Edge adaptive intra field de-interlacing of video images
Vladimir Lachine, Gregory Smith, Louie Lee
Expanding an image by an arbitrary scale factor, thereby creating an enlarged image, is a crucial image processing operation. De-interlacing is an example of such an operation, in which a video field is enlarged in the vertical direction by a factor of two. The most advanced de-interlacing algorithms use a few consecutive input fields to generate one output frame. In order to save hardware resources in video processors, the missing lines in each field may instead be generated without reference to the other fields. Line doubling, known as “bobbing”, is the simplest intra-field de-interlacing method, but it may generate visual artifacts. For example, interpolating an inserted line from a few neighboring lines with a vertical filter may produce visual artifacts such as “jaggies.” In this work we present an edge-adaptive image up-scaling and/or enhancement algorithm that can produce output frames free of “jaggies.” As a first step, an edge and its parameters are detected at each interpolated pixel from the gradient squared tensor, based on local signal variances. Then, according to the edge parameters, including orientation, anisotropy, and variance strength, the algorithm determines the footprint and frequency response of a two-dimensional interpolation filter for the output pixel. The filter’s coefficients are defined by the edge parameters, so the quality of the output frame is controlled by the local content. The proposed method may be used for image enlargement or enhancement (for example, anti-aliasing without resampling). It has been implemented in hardware in a video display processor for intra-field de-interlacing of video images.
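The gradient squared (structure) tensor mentioned here is a standard way to recover local edge orientation and anisotropy; a minimal sketch over a luminance patch:

```python
import numpy as np

def edge_params(patch):
    """Edge orientation, anisotropy, and strength from the gradient
    squared (structure) tensor of a local luminance patch."""
    gy, gx = np.gradient(patch.astype(float))
    # tensor entries averaged over the local window
    jxx, jyy, jxy = (gx * gx).mean(), (gy * gy).mean(), (gx * gy).mean()
    # eigenvalues of the 2x2 tensor
    tr, det = jxx + jyy, jxx * jyy - jxy * jxy
    disc = np.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    # dominant gradient direction (the edge runs perpendicular to it)
    orientation = 0.5 * np.arctan2(2 * jxy, jxx - jyy)
    anisotropy = (l1 - l2) / (l1 + l2 + 1e-12)  # 0 flat, 1 strong edge
    return orientation, anisotropy, l1           # l1 ~ variance strength
```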
On the efficiency of image completion methods for intra prediction in video coding with large block structures
Dimitar Doshkov, Oscar Jottrand, Thomas Wiegand, et al.
Intra prediction is a fundamental tool in video coding with a hybrid block-based architecture. Recent investigations have shown that one of the most beneficial elements for higher compression performance in high-resolution videos is the incorporation of larger block structures. In this work, we therefore investigate the performance of novel intra prediction modes based on different image completion techniques in a new video coding scheme with large block structures. Image completion methods exploit the fact that high-frequency image regions yield high coding costs when classical H.264/AVC prediction modes are used. This problem is tackled by investigating the incorporation of several intra predictors based on the Laplace partial differential equation (PDE), least-squares (LS) based linear prediction, and the autoregressive model. A major aspect of this article is the quantitative evaluation of the coding performance (i.e., coding efficiency). Experimental results show significant improvements in compression (up to 7.41%) from integrating the LS-based linear intra prediction.
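LS-based linear intra prediction learns predictor weights on the fly from the already-reconstructed causal neighbourhood and then extrapolates the block sample by sample. A hedged sketch under assumed choices (four causal neighbours, an 8-pixel training band, reconstructed pixels available above and to the left); not the authors' exact configuration:

```python
import numpy as np

def ls_linear_intra(recon, y0, x0, h, w):
    """Predict the h x w block at (y0, x0) from `recon`, the causally
    reconstructed image, using least-squares-trained linear weights."""
    def nbrs(img, y, x):  # left, top-left, top, top-right neighbours
        return np.array([img[y, x-1], img[y-1, x-1],
                         img[y-1, x], img[y-1, x+1]], float)
    # training set: a causal band of pixels above the block
    A, b = [], []
    for y in range(max(1, y0 - 8), y0):
        for x in range(max(1, x0 - 8), min(x0 + w, recon.shape[1] - 1)):
            A.append(nbrs(recon, y, x)); b.append(recon[y, x])
    wts = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    pred = recon.astype(float).copy()
    for y in range(y0, y0 + h):        # raster-order extrapolation,
        for x in range(x0, x0 + w):    # reusing predicted samples
            pred[y, x] = nbrs(pred, y, x) @ wts
    return pred[y0:y0 + h, x0:x0 + w]
```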
Session 4
Depth-layer-based multiview image synthesis and coding for interactive z- and x-dimension view switching
Yu Mao, Gene Cheung, Yusheng Ji
In an interactive multiview image navigation system, a user requests switches to adjacent views while observing a static 3D scene from different viewpoints. In response, the server transmits encoded data to enable client-side decoding and rendering of the requested viewpoint images. Clearly, there exists correlation between consecutively requested viewpoint images that can be exploited to lower the transmission rate. In previous works, this is done using a pixel-based synthesis and coding approach for view-switches along the x-dimension (horizontal camera motion): given the texture and depth maps of the previous view, texture pixels are individually shifted horizontally to the newly requested view, each according to its disparity value, via depth-image-based rendering (DIBR). Unknown pixels in the disoccluded region of the new view (pixels not visible in the previous view) are either inpainted, or intra-coded and transmitted by the server for reconstruction at the decoder. In this paper, to enable efficient view-switches along the z-dimension (camera motion into/out of the scene), we propose an alternative layer-based synthesis and coding approach. Specifically, we first divide each multiview image into depth layers, where adjacent pixels with similar depth values are grouped into the same layer. During a view-switch into the scene, the spatial region of a layer is enlarged via super-resolution, where the scale factor is determined by the distance between the layer and the camera. Conversely, during a view-switch out of the scene, the spatial region of a layer is shrunk via low-pass filtering and down-sampling. Because depth layers in the new view are reconstructed with high quality via rescaling, coding and transmission of a depth layer by the server is necessary only in the rare case when the quality of the layer-based reconstruction is poor, saving transmission rate. Experiments show that our layer-based approach can reduce the bit-rate by up to 35% compared to the previous pixel-based approach.
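The per-layer rescaling follows from simple pinhole geometry: a layer at depth z magnifies by z/(z - dz) when the camera moves a distance dz toward it, so near layers rescale more than far ones. A hedged sketch (scipy resampling stands in for the paper's super-resolution and low-pass pipeline; assumes dz is smaller than the layer depth):

```python
from scipy.ndimage import zoom

def rescale_depth_layer(layer_rgba, layer_depth, dz):
    """Resynthesize one depth layer after a camera move dz along the
    optical axis (dz > 0: into the scene). Illustrative sketch."""
    scale = layer_depth / (layer_depth - dz)   # pinhole magnification
    # scale > 1: view-switch into the scene (layer enlarged);
    # scale < 1: view-switch out (cubic resampling here approximates
    # explicit low-pass filtering plus down-sampling)
    return zoom(layer_rgba, (scale, scale, 1), order=3)
```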
Wyner-Ziv coding of depth maps exploiting color motion information
Matteo Salmistraro, Marco Zamarin, Søren Forchhammer
Distributed video coding of multi-view data and depth maps is an interesting and challenging research field, in which interest is growing thanks to recent advances in depth estimation and the development of affordable devices able to acquire depth information. In applications like video surveillance and object tracking, the availability of depth data can be beneficial and allow for more accurate processing. In these scenarios, the encoding complexity is typically limited, and therefore distributed coding approaches are desirable. In this paper, a novel algorithm for distributed compression of depth maps exploiting corresponding color information is proposed. Because motion in the color video and in the corresponding depth video is highly correlated, motion information from the decoded color signal can effectively be exploited to generate accurate side information for the depth signal, allowing for higher rate-distortion performance without increasing the delay at the decoder side. The proposed scheme has been evaluated against state-of-the-art distributed video coding techniques applied to depth data. Experimental results show that the proposed algorithm provides PSNR improvements between 2.18 dB and 3.40 dB on depth data compared to the reference DISCOVER decoder, for GOP 2 and QCIF resolution.
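The reuse of color motion for depth side information can be sketched as block-wise motion compensation of the previously decoded depth frame with vectors estimated from the color sequence (a minimal illustration assuming integer-pel block MVs, not the paper's exact generator):

```python
import numpy as np

def depth_side_information(prev_depth, color_mv, block=8):
    """Generate Wyner-Ziv side information for a depth frame by
    motion-compensating the previous decoded depth frame with
    block MVs estimated from the decoded color sequence."""
    h, w = prev_depth.shape
    si = np.zeros_like(prev_depth)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = color_mv[by // block, bx // block]
            sy = int(np.clip(by + dy, 0, h - block))  # clamp source
            sx = int(np.clip(bx + dx, 0, w - block))  # block in frame
            si[by:by+block, bx:bx+block] = \
                prev_depth[sy:sy+block, sx:sx+block]
    return si
```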
Multimodal image registration by iteratively searching keypoint correspondences
This paper proposes a multimodal image registration algorithm that searches for the best-matched keypoints by employing global information. Keypoints are detected in both the reference and test images. For each test keypoint, a certain number of reference keypoints are chosen as mapping candidates. A triplet of keypoint mappings determines an affine transformation, which is then evaluated with a similarity metric between the reference image and the test image transformed by it. An iterative process is conducted over triplets of keypoint mappings, and for every test keypoint the best-matched reference keypoint is updated and stored. The similarity metric is defined as the number of overlapping edge pixels over the entire images, allowing global information to be incorporated in evaluating triplets of mappings. Experimental results show that the proposed algorithm provides more accurate registration than existing methods on EO-IR images.
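Each candidate triplet determines a 6-parameter affine transform, which a single 3x3 linear solve recovers exactly (a sketch; the three points must be non-collinear):

```python
import numpy as np

def affine_from_triplet(src, dst):
    """Solve the affine transform mapping three source keypoints to
    three destination keypoints: dst = A @ src + t.
    src, dst: 3x2 arrays of (x, y) coordinates."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    M = np.hstack([src, np.ones((3, 1))])  # rows [x, y, 1]
    params = np.linalg.solve(M, dst)       # 3x2 parameter matrix
    A, t = params[:2].T, params[2]
    return A, t

# A candidate triplet would then be scored by warping the test edge
# map with (A, t) and counting edge pixels that overlap the reference
# edge map, as in the paper's similarity metric.
```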
A spatially varying PSF model for Seidel aberrations and defocus
Contrary to common assumptions in the literature, the blur kernel corresponding to lens-effect blur has been demonstrated to be spatially varying across the image plane. Existing models for the corresponding point spread function (PSF) are either parameterized and spatially invariant, or spatially varying but ad hoc and discretely defined. In this paper, we develop and present a novel, spatially varying, parameterized PSF model that accounts for Seidel aberrations and defocus in an imaging system. We also demonstrate that the parameters of this model can easily be determined from a set of discretely defined PSF observations, and that the model accurately describes the spatial variation of the PSF of a test camera.
Session 5
Cubic-panorama image dataset analysis for storage and transmission
Saeed Salehi, Eric Dubois
In this paper we address the problem of disparity estimation required for free navigation in acquired cubic-panorama image datasets. A client-server scheme is assumed, in which a remote user requests information at each navigation step. Both the initial compression of such image datasets for storage and the transmission of the required data are addressed in this work. For compression for storage, a fast method that uses properties of the epipolar geometry together with the cubic format of the panoramas is used to estimate disparity vectors efficiently. Assuming the use of B pictures, the concept of forward and backward prediction is addressed. For the transmission stage, a new disparity-vector transcoding-like scheme is introduced and a frame conversion scenario is addressed. We explain in detail how to pick the best vector among candidate disparity vectors. In all of the above cases, results are compared both visually, through error images, and objectively, using Peak Signal-to-Noise Ratio (PSNR) versus time.
Efficient streaming of stereoscopic depth-based 3D videos
Dogancan Temel, Mohammed Aabed, Mashhour Solh, et al.
In this paper, we propose a method to extract depth from motion, texture, and intensity. We first analyze the depth map to extract a set of depth cues. Then, based on these depth cues, we process the color reference video, using its texture, motion, luminance, and chrominance content, to extract the depth map. Each channel in the YCbCr color space is processed separately. We tested this approach on video sequences with different monocular properties. Our simulations show that the extracted depth maps generate a 3D video whose quality is close to that of the video rendered using the ground-truth depth map. We report objective results using 3VQM and subjective analysis via comparison of rendered images. Furthermore, we analyze the savings in bitrate that result from eliminating the need for two video codecs, one for the reference color video and one for the depth map; in this case, only the depth cues are sent as side information with the color video.
Interactive Paper Session
Block-layer bit allocation for quality constrained video encoding based on constant perceptual quality
Chao Wang, Xuanqin Mou, Wei Hong, et al.
In lossy image/video encoding, there is a compromise between the number of bits (rate) and the extent of distortion. Bits need to be properly allocated to different sources, such as frames and macroblocks (MBs). Since the human eye is more sensitive to differences than to the absolute values of signals, the MINMAX criterion suggests minimizing the maximum distortion of the sources to limit quality fluctuation. Many works aim at such constant-quality encoding; however, almost all of them focus on frame-layer bit allocation and use PSNR as the quality index. We suggest that the bit allocation for MBs should also be constrained to constant quality and, furthermore, that perceptual quality indices should be used instead of PSNR. Based on this idea, we propose a multi-pass, block-layer bit allocation scheme for quality-constrained encoding. The experimental results show that the proposed method achieves much better encoding performance.

Keywords: bit allocation, block-layer, perceptual quality, constant quality, quality constrained
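A multi-pass block-layer allocation can be sketched as iteratively nudging each block's quantization toward the quality target; `encode_pass` below is a hypothetical re-encoding hook supplied by the caller, not a function from the paper:

```python
def allocate_block_qp(encode_pass, init_qps, target_quality,
                      passes=4, step=1):
    """Multi-pass block-layer allocation sketch: after each pass,
    lower the QP of blocks below the perceptual quality target and
    raise it for blocks above, pushing all blocks toward constant
    quality (a MINMAX-style allocation).

    encode_pass: callable taking a QP list and returning per-block
                 perceptual quality scores (hypothetical hook)."""
    qps = list(init_qps)
    qualities = encode_pass(qps)
    for _ in range(passes):
        for i, q in enumerate(qualities):
            if q < target_quality:
                qps[i] -= step   # spend more bits on the worst blocks
            elif q > target_quality:
                qps[i] += step   # reclaim bits from over-coded blocks
        qualities = encode_pass(qps)
    return qps
```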
A bilateral hole filling algorithm for time-of-flight depth camera
In this paper, we present a simple but effective hole-filling algorithm for depth images acquired by time-of-flight (ToF) cameras. The proposed algorithm recovers a hole region of a depth image by taking into account the contour pixels surrounding the hole region. In particular, eight contour pixels are selected and then grouped into four pairs according to the four representative directions, i.e., the horizontal, vertical, and two diagonal directions. The four pairs of contour pixels are then combined via a bilateral filtering framework in which the filter coefficients are obtained by considering the photometric distance between the two depth pixels in each pair and the geometric distance between the hole pixel and the contour pixels. The experimental results demonstrate that the proposed algorithm effectively recovers depth edges disconnected by the hole region.
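The pairwise bilateral combination can be sketched per hole pixel as follows (a hedged reading of the abstract; the Gaussian weights and parameter names are assumptions, not the paper's exact formulation):

```python
import numpy as np

def fill_hole_pixel(depth, y, x, contour_pairs,
                    sigma_d=10.0, sigma_s=20.0):
    """Fill one hole pixel from four directional pairs of contour
    pixels, weighted by photometric similarity within each pair and
    geometric distance to the hole pixel.

    contour_pairs: four pairs [((y1,x1),(y2,x2)), ...] along the
    horizontal, vertical, and two diagonal directions."""
    num, den = 0.0, 0.0
    for p1, p2 in contour_pairs:
        d1, d2 = float(depth[p1]), float(depth[p2])
        # photometric weight: pairs straddling a depth edge count less
        photo = np.exp(-((d1 - d2) ** 2) / (2 * sigma_d ** 2))
        for (py, px), dv in ((p1, d1), (p2, d2)):
            geo = np.exp(-((py - y) ** 2 + (px - x) ** 2)
                         / (2 * sigma_s ** 2))
            w = photo * geo
            num += w * dv
            den += w
    return num / max(den, 1e-12)
```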
Front Matter: Volume 8666
Front Matter: Volume 8666
This PDF file contains the front matter associated with SPIE Proceedings Volume 8666, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.