Proceedings Volume 7543

Visual Information Processing and Communication

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 17 January 2010
Contents: 12 Sessions, 35 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2010
Volume Number: 7543

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7543
  • Image and Video Coding I
  • GPU-based Processing
  • Multiview Imaging and 3D
  • Distributed Coding
  • Image and Video Coding II
  • H.264/AVC Video Coding I
  • Computer Vision and Tracking
  • Keynote Session II
  • H.264/AVC Video Coding II
  • Image and Video Processing
  • Interactive Paper Session
Front Matter: Volume 7543
This PDF file contains the front matter associated with SPIE Proceedings Volume 7543, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Image and Video Coding I
Improved video coding efficiency exploiting tree-based pixelwise coding dependencies
Giuseppe Valenzise, Antonio Ortega
In a conventional hybrid video coding scheme, the choice of encoding parameters (motion vectors, quantization parameters, etc.) is carried out by optimizing, frame by frame, the output distortion for a given rate budget. While it is well known that motion estimation naturally induces a chain of dependencies among pixels, this is usually not explicitly exploited in the coding process to improve overall coding efficiency. Specifically, when considering a group of pictures with an IPPP... structure, each pixel of the first frame can be thought of as the root of a tree whose children are the pixels of the subsequent frames predicted by it. In this work, we demonstrate the advantages of such a representation by showing that, in some situations, the best motion vector is not the one that minimizes the energy of the prediction residual, but the one that produces a better tree structure, e.g., one that is globally more favorable from a rate-distortion perspective. In this new structure, pixels with more descendants are allocated extra rate to produce higher-quality predictors. As a proof of concept, we verify this assertion by assigning the quantization parameter in a video sequence in such a way that pixels with a larger number of descendants are coded at a higher quality. In this way we are able to improve RD performance by nearly 1 dB. Our preliminary results suggest that a deeper understanding of these temporal dependencies can potentially lead to substantial gains in coding performance.
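The dependency tree the abstract describes can be made concrete with a toy sketch (plain Python, not the authors' codec): each pixel of a P frame points, via its motion vector, at the predictor pixel in the previous frame, and a backward pass accumulates descendant counts for the I-frame pixels, which would then receive extra rate.

```python
def count_descendants(parents_per_frame):
    """parents_per_frame[t] maps each pixel (x, y) of frame t+1 to the
    pixel of frame t that predicts it (selected by the motion vector)."""
    counts = [dict() for _ in range(len(parents_per_frame) + 1)]
    # walk backwards: a pixel's descendants = its children plus theirs
    for t in range(len(parents_per_frame) - 1, -1, -1):
        for child, parent in parents_per_frame[t].items():
            child_desc = counts[t + 1].get(child, 0)
            counts[t][parent] = counts[t].get(parent, 0) + 1 + child_desc
    return counts

# toy IPP group of pictures: two P frames, one motion mapping per frame
frame1 = {(0, 0): (0, 0), (1, 0): (0, 0)}   # frame 1 pixels -> frame 0
frame2 = {(0, 0): (0, 0)}                   # frame 2 pixels -> frame 1
counts = count_descendants([frame1, frame2])
print(counts[0][(0, 0)])  # I-frame pixel (0, 0) has 3 descendants
```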
Anisotropic multiscale sparse learned bases for image compression
Angélique Drémeau, Cédric Herzet, Christine Guillemot, et al.
This paper proposes a new compression algorithm based on multi-scale learned bases. We first explain the construction of a set of image bases using a bintree segmentation and the optimization procedure used to select the image basis from this set. We then present the sparse orthonormal transforms introduced by Sezer et al.1 and propose extensions aimed at improving the convergence of the learning algorithm on the one hand, and at adapting the transforms to the coding scheme used on the other. Comparisons in terms of rate-distortion performance are finally made with the current compression standards JPEG and JPEG2000.
Variable block size transforms with higher order kernels for ultra-high definition video coding
Bumshik Lee, Sangsoo Ahn, Munchurl Kim, et al.
In this paper, 16-order and 32-order integer transform kernels are designed for HD video coding in H.264|MPEG-4 AVC, and performance analyses for the large transforms are presented. An adaptive block size transform coding scheme is also proposed based on the proposed transform kernels: 16-order (16×16, 16×8 and 16×16) and 32-order (32×32, 32×16 and 16×32) transforms are performed in addition to the 8×8 and 4×4 transforms exploited in the Fidelity Range Extensions of H.264|MPEG-4 AVC. The experimental results show that variable block size transforms with the proposed higher order transform kernels yield bit savings of up to 14.96% for HD video sequences.
GPU-based Processing
Beyond pixels: applying the GPU to accelerate computer vision
As a massively parallel processor, the GPU is well suited for performing 'per-pixel' operations in image processing and computer vision. New developments in hardware, software, and algorithm mappings now allow entire vision algorithms to be performed exclusively on the GPU. In this paper we present the GPU mapping of a natural image feature processing pipeline used in an image stitching application. We examine how to utilize hardware features of the GPU for efficient processing, demonstrating that GPU programming now goes beyond per-pixel mappings and provides speedups in image feature processing and matching.
A CUDA implementation of thumbnail-assisted decoder motion search for error concealment
We describe a real-time CUDA implementation of an error concealment algorithm for high definition video at 720p. The concealment method is based on decoder motion search on the high-resolution frame, using a thumbnail as a guide, and is therefore comparable in complexity to encoder motion search. We discuss how the requirements of decoder motion search differ from those of encoder search, and present a fast motion search algorithm suitable for parallel implementation on the GPU. The design of the real-time CUDA implementation and its performance analysis are also presented.
GPU-aided motion adaptive video deinterlacing
Xiaolin Wu, Jie Cao
In most applications, video deinterlacing has to be performed in real time. Numerous algorithms have been developed to strike a good balance between throughput and quality. A motion adaptive deinterlacing algorithm switches between two modes: direct merging of the two fields in areas of no motion, or intra-field adaptive interpolation where motion is detected. In this paper, we propose a fast GPU-aided implementation of a motion adaptive deinterlacing algorithm using NVIDIA CUDA (Compute Unified Device Architecture) technology. We discuss techniques for adapting the computations in motion detection and adaptive directional interpolation to the GPU architecture for the maximum possible video throughput. The objective is to fully utilize the processing power of the GPU without compromising the visual quality of the deinterlaced video. Experimental results are reported and discussed to demonstrate the performance of the proposed GPU-aided motion adaptive video deinterlacer in both speed and visual quality.
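The per-pixel decision rule of a motion adaptive deinterlacer can be sketched as follows (illustrative plain Python, not the paper's CUDA kernel; the motion threshold is a hypothetical parameter): static areas weave the two fields, moving areas interpolate within the current field.

```python
def deinterlace_pixel(prev_field_px, next_field_px, above_px, below_px,
                      motion_thresh=10):
    """Reconstruct one missing pixel of an interlaced frame."""
    if abs(prev_field_px - next_field_px) <= motion_thresh:
        # no motion detected: merge (weave) the temporally adjacent fields
        return (prev_field_px + next_field_px) // 2
    # motion detected: intra-field interpolation from the lines above/below
    return (above_px + below_px) // 2

print(deinterlace_pixel(100, 104, 90, 110))  # static area -> 102
print(deinterlace_pixel(100, 200, 90, 110))  # moving area -> 100
```

In a GPU implementation each thread would evaluate this rule for one output pixel, which is why the algorithm parallelizes so well.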
GPU implementation of JPEG XR
Ming-Chao Che, Jie Liang
JPEG XR (formerly Microsoft Windows Media Photo and HD Photo) is the latest image coding standard. By integrating various advanced technologies such as integer hierarchical lapped transform, context adaptive Huffman coding, and high dynamic range coding, it achieves competitive performance to JPEG-2000, but with lower computational complexity and memory requirement. In this paper, the GPU implementation of the JPEG XR codec using NVIDIA CUDA (Compute Unified Device Architecture) technology is investigated. Design considerations to speed up the algorithm are discussed, by taking full advantage of the properties of the CUDA framework and JPEG XR. Experimental results are presented to demonstrate the performance of the GPU implementation.
Multiview Imaging and 3D
Geometry-based block partitioning for efficient intra prediction in depth video coding
Min-Koo Kang, Jaejoon Lee, Jin Young Lee, et al.
In a 3D video system including depth information, once a depth video is coded by state-of-the-art video compression tools such as H.264/AVC, depth errors around the boundaries of objects can be intensified, and these can significantly affect the quality of the rendered virtual views. Despite this drawback, depth video compression is essential because of the enormous amount of input data in a 3D video system. In this paper, we propose a line-based partitioned intra prediction method which exploits the geometric redundancy of depth video for efficient compression without significant errors around boundaries. The proposed algorithm efficiently divides the current coded block into two partitioned regions and independently predicts each region from previously coded neighbor pixel information. Finally, the generated prediction mode adaptively replaces the conventional DC intra prediction mode. To evaluate the intra prediction performance, we have implemented the proposed method in the H.264/AVC intra prediction scheme. Experimental results demonstrate that our method provides higher coding performance: for depth video compression itself, up to a 3.71% bit saving or a 0.309 dB peak signal-to-noise ratio (PSNR) gain on depth sequences which contain line-like boundaries.
Depth map coding with distortion estimation of rendered view
Woo-Shik Kim, Antonio Ortega, PoLin Lai, et al.
New data formats that include both video and the corresponding depth maps, such as multiview plus depth (MVD), enable new video applications in which intermediate video views (virtual views) can be generated using the transmitted/stored video views (reference views) and the corresponding depth maps as inputs. We propose a depth map coding method based on a new distortion measurement by deriving relationships between distortions in coded depth map and rendered view. In our experiments we use a codec based on H.264/AVC tools, where the rate-distortion (RD) optimization for depth encoding makes use of the new distortion metric. Our experimental results show the efficiency of the proposed method, with coding gains of up to 1.6 dB in interpolated frame quality as compared to encoding the depth maps using the same coding tools but applying RD optimization based on conventional distortion metrics.
Multiple description coding of 3D dynamic meshes based on temporal subsampling
M. Oguz Bici, Gozde Bozdagi Akar
In this paper, we propose a Multiple Description Coding (MDC) method for reliable transmission of compressed time consistent 3D dynamic meshes. It trades off reconstruction quality for error resilience to provide the best expected reconstruction of 3D mesh sequence at the decoder side. The method is based on partitioning the mesh frames into two sets by temporal subsampling and encoding each set independently by a 3D dynamic mesh coder. The encoded independent bitstreams or so-called descriptions are transmitted independently. The 3D dynamic mesh coder is based on predictive coding with spatial and temporal layered decomposition. In addition, the proposed method allows for different redundancy allocations by including a number of encoded spatial layers of the frames in the other set. The algorithm is evaluated with redundancy-rate-distortion curves and it is shown that, when one of the descriptions is lost, acceptable quality can be achieved with around 50% redundancy.
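The temporal-subsampling split behind the two descriptions can be sketched in a few lines (illustrative Python with scalar "frames" standing in for meshes; the actual coder compresses each set with a predictive 3D dynamic mesh codec):

```python
def split_descriptions(frames):
    """Description 0 carries the even frames, description 1 the odd ones."""
    return frames[0::2], frames[1::2]

def conceal_from_even(even):
    """If the odd description is lost, approximate each odd frame by
    averaging its two surviving even neighbours."""
    out = []
    for a, b in zip(even, even[1:]):
        out.extend([a, (a + b) / 2])
    out.append(even[-1])
    return out

frames = [0.0, 1.0, 2.0, 3.0, 4.0]
even, odd = split_descriptions(frames)
print(even, odd)                 # [0.0, 2.0, 4.0] [1.0, 3.0]
print(conceal_from_even(even))   # linear motion is recovered exactly
```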
Distributed Coding
Compression efficiency analysis of Wyner-Ziv video coding with motion compensated side information interpolation
João Ascenso, Catarina Brites, Fernando Pereira
The Wyner-Ziv video coding (WZVC) rate-distortion performance is highly dependent on the quality of the side information, an estimation of the original frame created at the decoder. This paper characterizes WZVC efficiency when motion compensated frame interpolation (MCFI) techniques are used to generate the side information, a difficult problem in WZVC especially because the decoder only has some reference decoded frames available. The proposed WZVC compression efficiency rate model relates the power spectral density of the estimation error to the accuracy of the MCFI motion field. Some interesting conclusions may then be derived regarding the impact of motion field smoothness, and of the correlation to the true motion trajectories, on the compression performance.
Image and Video Coding II
A second-order-residual (SOR) coding approach to high-bit-rate video compression
A novel video compression scheme that exploits the idea of second-order-residual (SOR) coding is proposed for high-bit-rate video applications in this work. We first study the limitations of today's high performance video coding standard, H.264/AVC, and show that it is not effective in coding the small image features and variations of high-bit-rate video content. For low to medium quality video streams, these small image features can be removed by the quantization process. However, when the quantization step size becomes small in high-bit-rate video, their presence degrades the rate-distortion coding performance significantly. To address this problem, we propose a coding scheme that decomposes the residual signals into two layers: the first-order-residual (FOR) and the second-order-residual (SOR). The FOR contains the low frequency residuals while the SOR contains the high frequency residuals. We adopt H.264/AVC for the FOR coding and propose two schemes, called SOR-freq and SOR-bp, for the SOR coding. Experimental results show that the proposed FOR/SOR scheme outperforms H.264/AVC by a significant margin (about 20% bit rate saving) in high-bit-rate video.
H.264/AVC Video Coding I
Phase refinement for image prediction based on sparse representation
Aurélie Martin, Jean-Jacques Fuchs, Christine Guillemot, et al.
In this work, we propose the use of sparse signal representation techniques to solve the problem of closed-loop spatial image prediction. The reconstruction of the signal in the block to be predicted is based on basis functions selected with the iterative Matching Pursuit (MP) algorithm to best match a causal neighborhood. We evaluate this new method in terms of PSNR and bitrate in an H.264/AVC encoder. Experimental results indicate an improvement in rate-distortion performance. In this paper, we also present results concerning the use of phase correlation to improve the reconstruction through shifted basis functions.
Prediction matching for video coding
Yunfei Zheng, Peng Yin, Òscar Divorra Escoda, et al.
Modern video coding schemes such as H.264/AVC employ multi-hypothesis motion compensation for improved coding efficiency. However, an additional cost has to be paid for the improved prediction performance in these schemes. Based on the observed high correlation among the multiple hypotheses in H.264/AVC, in this paper we propose a new method (Prediction Matching) to jointly combine explicit and implicit prediction approaches. The first motion hypothesis on a predicted block is explicitly coded, while any additional hypotheses are implicitly derived at the decoder based on the first one and the available data from previously decoded frames. Thus, the overhead to indicate motion information is reduced, while prediction accuracy may be better with respect to fully implicit multi-hypothesis prediction. Proof-of-concept simulation results show that up to 7.06% bitrate saving with respect to state-of-the-art H.264/AVC can be achieved using our Prediction Matching.
Computer Vision and Tracking
Automatic pose initialization of swimmers in videos
Christian X. Ries, Rainer Lienhart
We propose an approach to the task of automatic pose initialization of swimmers in videos. Thus, our goal is to detect a swimmer inside a target video and assign an estimated position to her/his body parts. We first apply a non-skin-color filter to reduce the search space inside each target frame. We then match previously devised template sequences of Gaussian feature descriptors against sequences of feature vectors which are computed within the remaining image regions. Finally, relative average joint positions from annotated images featuring the key pose are assigned to the detection result and three-dimensional joint positions are estimated. We present detection results for test videos of three different swim strokes and examine the performance of four types of feature descriptors.
A kinematic model for Bayesian tracking of cyclic human motion
Thomas Greif, Rainer Lienhart
We introduce a two-dimensional kinematic model for cyclic motions of humans, which is suitable for the use as temporal prior in any Bayesian tracking framework. This human motion model is solely based on simple kinematic properties: the joint accelerations. Distributions of joint accelerations subject to the cycle progress are learned from training data. We present results obtained by applying the introduced model to the cyclic motion of backstroke swimming in a Kalman filter framework that represents the posterior distribution by a Gaussian. We experimentally evaluate the sensitivity of the motion model with respect to the frequency and noise level of assumed appearance-based pose measurements by simulating various fidelities of the pose measurements using ground truth data.
A Viterbi tracker for local features
Gary Baugh, Anil Kokaram
The long term tracking of sparse local features in an image is important for many applications including camera calibration for stereo applications, camera or global motion estimation and people surveillance. The majority of existing tracking frameworks are based on some kind of prediction/correction idea e.g. KLT and Particle Filters. However, given a careful selection of interest points throughout the sequence, the problem of tracking can be solved with the Viterbi algorithm. This work introduces a novel approach to interest point selection for tracking using the Mean Shift algorithm over short time windows. The resulting points are then articulated within a Viterbi algorithm for creating very long term tracking data. The tracks are shown to be more accurate than traditional KLT implementations and also do not suffer from accumulation of error with time.
Object tracking initialization using automatic moving object detection
In this paper we present new methods for object tracking initialization using automated moving object detection based on background subtraction. The new methods are integrated into the real-time object tracking system we previously proposed. Our proposed background model updating method and adaptive thresholding are used to produce a foreground object mask for object tracking initialization. Traditional background subtraction detects moving objects by subtracting the background model from the current image. Compared to other common moving object detection algorithms, background subtraction segments foreground objects more accurately and detects foreground objects even if they are motionless. However, one drawback of traditional background subtraction is that it is susceptible to environmental changes, for example gradual or sudden illumination changes. The reason for this drawback is that it assumes a static background, and hence a background model update is required for dynamic backgrounds. The major challenges then are how to update the background model, and how to determine the threshold for classification of foreground and background pixels. We propose a method to determine the threshold automatically and dynamically, depending on the intensities of the pixels in the current frame, and a method to update the background model with a learning rate that depends on the differences between the pixels in the background model and the previous frame.
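The update and thresholding steps outlined above can be sketched as follows (a minimal plain-Python illustration with hypothetical parameters `alpha` and `k`, not the authors' exact formulas): the background follows the scene with a learning rate, and the foreground threshold adapts to the current frame's differences.

```python
def update_background(bg, frame, alpha=0.05):
    """Running-average background update; alpha is the learning rate."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, k=2.0):
    """Threshold adapts to the mean absolute difference of this frame."""
    diffs = [abs(f - b) for f, b in zip(frame, bg)]
    thresh = k * sum(diffs) / len(diffs)
    return [d > thresh for d in diffs]

bg = [10.0, 10.0, 10.0, 10.0]
frame = [10.0, 11.0, 10.0, 60.0]       # one pixel covered by a moving object
print(foreground_mask(bg, frame))      # [False, False, False, True]
print(update_background(bg, frame))    # background drifts toward the frame
```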
Keynote Session II
Image analysis and compression: renewed focus on texture
Thrasyvoulos N. Pappas, Jana Zujovic, David L. Neuhoff
We argue that a key to further advances in the fields of image analysis and compression is a better understanding of texture. We review a number of applications that critically depend on texture analysis, including image and video compression, content-based retrieval, visual to tactile image conversion, and multimodal interfaces. We introduce the idea of "structurally lossless" compression of visual data that allows significant differences between the original and decoded images, which may be perceptible when they are viewed side-by-side, but do not affect the overall quality of the image. We then discuss the development of objective texture similarity metrics, which allow substantial point-by-point deviations between textures that according to human judgment are essentially identical.
H.264/AVC Video Coding II
Texture refinement framework for improved video coding
F. Racapé, M. Babel, O. Déforges, et al.
The H.264/AVC standard offers an efficient way of reducing the noticeable artefacts of former video coding schemes, but it remains perfectible for the coding of detailed texture areas. This paper presents a conceptual coding framework, exploiting visual perception redundancy, which aims at improving both bit-rate and quality on textured areas. The approach is generic and can be integrated into a usual coding scheme. The proposed scheme is divided into three steps: a first algorithm analyses texture regions in order to build a dictionary of the most representative texture sub-regions (RTS). The encoder then preserves them at a higher quality than the rest of the picture, in order to enable a refinement algorithm to finally spread the preserved information over the textured areas. In this paper, we present a first solution to validate the framework, then detailing the encoder side in order to define a simple method for dictionary building. The proposed H.264/AVC compliant scheme creates a dictionary of macroblocks.
Smoothed reference inter-layer texture prediction for bit depth scalable video coding
Zhan Ma, Jiancong Luo, Peng Yin, et al.
We present a smoothed reference inter-layer texture prediction mode for bit depth scalability based on the Scalable Video Coding extension of the H.264/MPEG-4 AVC standard. In our approach, the base layer encodes an 8-bit signal that can be decoded by any existing H.264/MPEG-4 AVC decoder, and the enhancement layer encodes a higher bit depth signal (e.g. 10/12-bit) which requires a bit depth scalable decoder. The approach presented uses base layer motion vectors to conduct motion compensation upon enhancement layer reference frames. Then, the motion compensated block is tone mapped and summed with the co-located base layer residue block prior to being inverse tone mapped to obtain a smoothed reference predictor. In addition to the original inter-/intra-layer prediction modes, the smoothed reference prediction mode enables inter-layer texture prediction for blocks whose co-located block is inter-coded. The proposed method is designed to improve coding efficiency for sequences with non-linear tone mapping, for which we obtain gains of up to 0.4 dB over the CGS-based BDS framework.
An enhancement of H.264 coding mode for R-D optimization of ultra-high-resolution video coding under low bit rate
Tomonobu Yoshino, Sei Naito, Shigeyuki Sakazawa, et al.
So far, no efficient coding scheme for ultra-high-resolution (8K) video under low bit-rate conditions has been proposed. Within the H.264 coding framework, highly efficient coding is realized through optimal control of the macroblock (MB) coding mode decision. However, the coding modes available in H.264 are not necessarily appropriate for 8K full resolution coding under considerably low bit-rate conditions, and satisfactory coding performance cannot be achieved within H.264. In this paper, we propose an extended coding mode defined from an analysis of the R-D performance of the conventional coding modes. Coding experiments confirmed that the maximum coding gain reached 0.18 dB at the target bit-rate assumed in this study.
Image and Video Processing
Image deblurring and denoising with non-local regularization constraint
Peter van Beek, Junlan Yang, Shuhei Yamamoto, et al.
In this paper, we investigate the use of the non-local means (NLM) denoising approach in the context of image deblurring and restoration. We propose a novel deblurring approach that utilizes a non-local regularization constraint. Our interest in the NLM principle is its potential to suppress noise while effectively preserving edges and texture detail. Our approach leads to an iterative cost function minimization algorithm, similar to common deblurring methods, but incorporating update terms due to the non-local regularization constraint. The data-adaptive noise suppression weights in the regularization term are updated and improved at each iteration, based on the partially denoised and deblurred result. We compare our proposed algorithm to conventional deblurring methods, including deblurring with total variation (TV) regularization. We also compare our algorithm to combinations of the NLM-based filter followed by conventional deblurring methods. Our initial experimental results suggest that the use of NLM-based filtering and regularization is beneficial in the context of image deblurring, reducing the risk of over-smoothing or suppression of texture detail, while suppressing noise. Furthermore, the proposed deblurring algorithm with non-local regularization outperforms other methods, such as deblurring with TV regularization or separate NLM-based denoising followed by deblurring.
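The non-local means principle underlying the regularization term can be illustrated on a 1-D signal (a toy Python sketch, not the paper's deblurring iteration): each sample is replaced by a weighted average of all samples, with weights decaying in the patch dissimilarity.

```python
import math

def nlm_1d(signal, patch=1, h=10.0):
    """Toy 1-D non-local means; h controls the decay of the weights."""
    n = len(signal)
    out = []
    for i in range(patch, n - patch):
        pi = signal[i - patch:i + patch + 1]
        num = den = 0.0
        for j in range(patch, n - patch):
            pj = signal[j - patch:j + patch + 1]
            d2 = sum((a - b) ** 2 for a, b in zip(pi, pj))
            w = math.exp(-d2 / (h * h))   # similar patches get large weights
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

noisy = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1]
print(nlm_1d(noisy))  # samples are pulled toward the common level near 10
```

In the paper these data-adaptive weights enter a regularization term of the deblurring cost function and are recomputed at each iteration from the partially restored image.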
Image reconstruction from videos distorted by atmospheric turbulence
Xiang Zhu, Peyman Milanfar
To correct geometric distortion and reduce blur in videos that suffer from atmospheric turbulence, a multi-frame image reconstruction approach is proposed in this paper. This approach contains two major steps. In the first step, a B-spline based non-rigid image registration algorithm is employed to register each observed frame with respect to a reference image. To improve the registration accuracy, a symmetry constraint is introduced, which penalizes inconsistency between the forward and backward deformation parameters during the estimation process. A fast Gauss-Newton implementation method is also developed to reduce the computational cost of the registration algorithm. In the second step, a high quality image is restored from the registered observed frames under a Bayesian reconstruction framework, where we use L1 norm minimization and a bilateral total variation (BTV) regularization prior, to make the algorithm more robust to noise and estimation error. Experiments show that the proposed approach can effectively reduce the influence of atmospheric turbulence even for noisy videos with relatively long exposure time.
Adaptive motion estimation using warping for video frame rate up-conversion
Ying Chen, Mark J.T. Smith, Edward Delp
In this paper, a new motion compensated frame interpolation method is proposed based on the reliability of motion vectors determined by the block residual energy. Additional motion re-estimation is applied to those blocks where unreliable motion vectors are detected. The motion estimation algorithm employed in this work combines block-based motion estimation with optical flow-based estimation, resulting in a more accurate representation with only modest computational complexity. The experimental results show that it can improve the visual quality of the interpolated frames where competing methods fail.
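The reliability test on motion vectors can be sketched like this (illustrative plain Python; the threshold and the fallback are hypothetical stand-ins for the combined block/optical-flow re-estimation the abstract describes):

```python
def residual_energy(block_a, block_b):
    """Sum of squared differences between two motion-compensated blocks."""
    return sum((a - b) ** 2 for a, b in zip(block_a, block_b))

def interpolate_block(prev_blk, next_blk, reliable_thresh=100.0):
    if residual_energy(prev_blk, next_blk) <= reliable_thresh:
        # reliable motion vector: motion-compensated average
        return [(a + b) / 2 for a, b in zip(prev_blk, next_blk)]
    # unreliable: would trigger motion re-estimation (here a simple fallback)
    return list(prev_blk)

print(interpolate_block([10.0, 12.0], [12.0, 14.0]))  # [11.0, 13.0]
print(interpolate_block([10.0, 12.0], [50.0, 60.0]))  # [10.0, 12.0]
```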
Interactive Paper Session
Adaptation of H.264/AVC predictions for enabling fast transrating
Philippe Bordes, Safa Cherigui
Fast video transrating algorithms for DCT-based video coding standards have proven their efficiency in many applications and are widely used in the industry. However, they cannot be re-used for H.264/AVC because they introduce an unacceptable level of drift. To settle this issue, this paper proposes to adapt the H.264/AVC predictions by processing the DC component separately from the other AC coefficients. This allows the drift to be removed from requantization transrating algorithms. Experimental results show the amount of bits in our prediction scheme is only increased by 2.46% for CIF and 1.87% for 720p in Intra in comparison with the H.264/AVC codec at the same PSNR. The performance of fast transrating algorithms applied to streams generated with our method improves dramatically, allowing them to compete directly with the best-in-class, but computationally demanding, Cascaded Pixel Domain decode and recode Transcoding (CPDT) architecture. Additionally, one potential application induced by this new prediction principle is the partial decoding of video streams to obtain reduced-size images.
Exact JPEG recompression
Andrew B. Lewis, Markus G. Kuhn
We present a variant of the JPEG baseline image compression algorithm optimized for images that were generated by a JPEG decompressor. It inverts the computational steps of one particular JPEG decompressor implementation (Independent JPEG Group, IJG), and uses interval arithmetic and an iterative process to infer the possible values of intermediate results during the decompression, which are not directly evident from the decompressor output due to rounding. We applied our exact recompressor on a large database of images, each compressed at ten different quality factors. At the default IJG quality factor 75, our implementation reconstructed the exact quantized transform coefficients in 96% of the 64-pixel image blocks. For blocks where exact reconstruction is not feasible, our implementation can output transform-coefficient intervals, each guaranteed to contain the respective original value. Where different JPEG images decompress to the same result, we can output all possible bit-streams. At quality factors 90 and above, exact recompression becomes infeasible due to combinatorial explosion; but 68% of blocks still recompressed exactly.
Seamless heterogeneous tessellation via smoothing and mosaicking in the DWT domain
K. Hayat, W. Puech, G. Gesquiere
In this paper we propose a strategy for the seamless tessellation of varying-resolution tiles, based on smoothing and mosaicking in the DWT domain. The scenario involves a tessellation with three different tile qualities or levels of detail (LOD) at a given instant, depending on the viewpoint distance, the time of rendering and hardware resources. The LOD relies on the multiresolution characteristic of wavelets from the now widely accepted JPEG2000 codec. Treating a change in viewpoint focus as analogous to a sliding window, at worst the window may be composed of three different tile qualities, with the resultant artifacts at tile interfaces. To dilute these artifacts, we treat the tiles at the subband level, in the DWT domain, by employing operations involving suitable subband-sized composite masks conceived with smoothing and mosaicking in perspective. The resultant composite subbands are subjected to a global inverse DWT to obtain the final seamless tessellation.
Video coding mode decision as a classification problem
Rashad Jillani, Urvang Joshi, Chiranjib Bhattacharya, et al.
In this paper, we show that it is possible to reduce the complexity of Intra MB coding in H.264/AVC using a novel chance-constrained classifier. Using pairs of simple mean-variance values, our technique reduces the complexity of the Intra MB coding process with a negligible loss in PSNR. We thus cast the coding mode decision as a classification problem amenable to machine learning. Implementation results show that the proposed method reduces encoding time to about 20% of the reference implementation, with an average loss of 0.05 dB in PSNR.
JP3D compressed-domain watermarking of volumetric medical data sets
Azza Ouled Zaid, Achraf Makhloufi, Christian Olivier
Increasing transmission of medical data across multiple user systems raises concerns for medical image watermarking. Additionally, the use of volumetric images triggers the need for efficient compression techniques in picture archiving and communication systems (PACS) and telemedicine applications. This paper describes a hybrid data hiding/compression system adapted to volumetric medical imaging. The central contribution is the integration of blind watermarking, based on turbo trellis-coded quantization (TCQ), into the JP3D encoder. Results of our method applied to magnetic resonance (MR) and computed tomography (CT) medical images show that our watermarking scheme is robust to JP3D compression attacks and can provide a relatively high data embedding rate while keeping distortion relatively low.
Improved quantization index modulation based watermarking integrated to JPEG2000 coding scheme
Azza Ouled Zaid, Achraf Makhloufi, Christian Olivier
In recent years it has been recognized that embedding information in the wavelet transform domain leads to more robust watermarks. In particular, several approaches have been proposed to address the problem of watermark embedding combined with wavelet-based image coding. In this paper, we present an alternative quantization-based blind watermarking strategy in the framework of JPEG2000 still image compression. The central contribution is a modified Quantization Index Modulation watermark design that reduces the fidelity problem. We also show that the proposed watermarking scheme exhibits high robustness with respect to JPEG2000 compression and Gaussian noise attacks. After detailing the proposed solution, we evaluate system performance in terms of image quality as well as robustness.
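Baseline Quantization Index Modulation, which the paper's modified design builds on, is easy to state: each host coefficient is re-quantized onto one of two interleaved lattices selected by the message bit, and the detector decides by nearest lattice. A minimal sketch of plain QIM follows, with a hypothetical step size Δ = 8; the paper's fidelity-improving modifications are not reproduced here.

```python
import numpy as np

def qim_embed(coeffs, bits, delta=8.0):
    """Embed one bit per coefficient: bit 0 quantizes onto multiples of
    delta, bit 1 onto the lattice shifted by delta/2."""
    coeffs = np.asarray(coeffs, dtype=float)
    offsets = np.where(np.asarray(bits) == 1, delta / 2.0, 0.0)
    return np.round((coeffs - offsets) / delta) * delta + offsets

def qim_extract(coeffs, delta=8.0):
    """Blind extraction: decide each bit by the nearer of the two
    lattices. Correct as long as attack noise stays below delta/4."""
    c = np.asarray(coeffs, dtype=float)
    d0 = np.abs(c - np.round(c / delta) * delta)
    d1 = np.abs(c - (np.round((c - delta / 2) / delta) * delta + delta / 2))
    return (d1 < d0).astype(int)
```

The step size trades fidelity against robustness: a larger delta survives stronger compression noise but perturbs the host coefficients more, which is exactly the tension the paper's modified design targets.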
Dynamic algorithm for correlation noise estimation in distributed video coding
Kuganeswaran Thambu, Xavier Fernando, Ling Guan
Low-complexity encoders at the expense of high-complexity decoders are advantageous in wireless video sensor networks. Distributed video coding (DVC) achieves this complexity balance: the receiver computes side information (SI) by interpolating the key frames, and the SI is modeled as a noisy version of the input video frame. In practice, correlation noise estimation at the receiver is a difficult problem; currently the noise is estimated from the residual variance between pixels of the key frames, and this estimated (fixed) variance is used to calculate the bit-metric values. In this paper, we introduce a new variance estimation technique that relies on the bit pattern of each pixel and is dynamically calculated over the entire motion environment, which helps to calculate the soft-value information required by the decoder. Our results show that the proposed bit-based dynamic variance estimation significantly improves peak signal-to-noise ratio (PSNR) performance.
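The soft-value computation the decoder needs can be sketched for a single pixel: given the side-information value and a Laplacian correlation-noise model with scale α, the log-likelihood ratio of the next bit-plane bit is obtained by summing the noise likelihood over all pixel values consistent with the already-decoded more-significant bits. The brute-force sketch below uses illustrative names; the paper's dynamic per-bit variance estimate would supply α.

```python
import math

def bit_llr(si, higher_bits, alpha, depth=8):
    """LLR (log p(bit=0) / p(bit=1)) of the next bit-plane bit of an
    8-bit pixel, given side-information value `si` and a Laplacian
    correlation-noise model p(n) proportional to exp(-alpha * |n|).
    `higher_bits` lists the already-decoded bits, MSB first."""
    p = [0.0, 0.0]
    for x in range(2 ** depth):
        bits = [(x >> (depth - 1 - i)) & 1 for i in range(depth)]
        if bits[:len(higher_bits)] != list(higher_bits):
            continue  # inconsistent with the decoded bit planes
        p[bits[len(higher_bits)]] += math.exp(-alpha * abs(x - si))
    return math.log(p[0] / p[1])
```

A positive LLR means the bit is more likely 0; the channel decoder consumes these values directly as its soft input.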
A novel embedding technique for dirty paper trellis codes watermarking
Dirty Paper Trellis Codes (DPTC) watermarking, published in 2004, is a very efficient high-rate scheme. Nevertheless, it has two strong drawbacks: its security weakness and its CPU computation complexity. We propose an embedding space that is at least as secure, and a faster embedding. The embedding space is built on the projections of some wavelet coefficients onto secret carriers. It keeps a good security level and also has good psycho-visual properties. The embedding is based on a dichotomous rotation in the Cox, Miller and Bloom plane. It gives better performance than previous fast embedding approaches. Four different attacks were performed and revealed good robustness and speed.
A sliced synchronous iteration architecture for real-time global stereo matching
Soon Kwon, ChungHee Lee, Young-Chul Lim, et al.
In this paper, we present a low-memory-cost message iteration architecture for a fast belief propagation (BP) algorithm. To meet the real-time goal, our architecture follows the multi-scale BP method with a truncated-linear smoothness cost model. We observe that the message iteration process in BP requires a huge intermediate buffer to store the four directional messages of every node. Therefore, instead of updating all node messages in each iteration sequence, we propose completing the iteration process for each individual node first, executing it consecutively node by node. The key ideas in this paper focus on both maximizing the architecture's parallelism and minimizing implementation cost overhead. We first apply a pipelined architecture to each iteration stage, which executes independently; pipelining yields faster message throughput within a single iteration cycle instead of consuming the whole iteration cycle time as before. We also group multiple message update nodes into a minimal processing unit to maximize parallelism. For the multi-scale BP method, the proposed parallel architecture incurs no additional execution time for processing the nodes in the down-scaled Markov random field (MRF). Considering VGA image size, 4 iterations per scale, and 64 disparity levels, our approach reduces memory complexity by 99.7% and is 340 times faster than the general multi-scale BP architecture.
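The per-node message computation that such architectures pipeline can be written compactly: with a truncated-linear smoothness cost V(d, d') = min(w·|d − d'|, t), the min-sum message over L disparity labels is computable in O(L) with the standard two-pass distance transform of Felzenszwalb and Huttenlocher, rather than the naive O(L²) minimization. A software sketch follows; it is illustrative only, as the paper realizes this in hardware.

```python
import numpy as np

def bp_message(h, w, t):
    """Min-sum BP message m[d] = min over d' of h[d'] + min(w*|d-d'|, t),
    where h combines the data cost and incoming messages at a node,
    w is the smoothness weight, and t the truncation ceiling."""
    m = h.astype(float).copy()
    L = len(m)
    for d in range(1, L):               # forward pass: lower labels
        m[d] = min(m[d], m[d - 1] + w)
    for d in range(L - 2, -1, -1):      # backward pass: upper labels
        m[d] = min(m[d], m[d + 1] + w)
    np.minimum(m, h.min() + t, out=m)   # apply the truncation ceiling
    return m - m.mean()                 # normalize for numerical stability
```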