Proceedings Volume 7073

Applications of Digital Image Processing XXXI


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 8 October 2008
Contents: 13 Sessions, 75 Papers, 0 Presentations
Conference: Optical Engineering + Applications 2008
Volume Number: 7073

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7073
  • Implementation and Processing Scenarios I
  • Design and Implementation of Color Transformations
  • Advanced Video Coding
  • Mobile Video and Applications
  • Rate Control for Video Systems
  • Performance Evaluation in Image Coding Applications
  • Initiatives in Image Coding and Accessories and Applications
  • Biomedical Applications
  • Optical and Digital Image Processing Systems
  • Implementation and Processing Scenarios II
  • Implementation and Processing Scenarios III
  • Poster Session
Front Matter: Volume 7073
This PDF file contains the front matter associated with SPIE Proceedings Volume 7073, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Implementation and Processing Scenarios I
An improved image scene registration with wavelet preprocessing
Subpixel scene registration is useful for certain image processing applications, since scene shifts need not be integer-valued. In this paper, we present an image registration approach based on the wavelet decomposition and the Fitts correlation algorithm. The original Fitts algorithm is ideal for small-scale translations; a successful image-based tracker that uses Fitts correlation for position measurement therefore requires additional modifications to the original algorithm to handle such translations reliably. We used a wavelet transform to preprocess the images before performing scene registration.
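As a hedged illustration of the kind of estimator involved (a gradient-based Fitts-style step in one dimension, with illustrative names; not the authors' wavelet-preprocessed implementation):

```python
import math

def fitts_shift_1d(ref, shifted):
    """Estimate a small subpixel shift between two 1-D signals using a
    Fitts-style gradient method: delta ~ sum(g*(ref-shifted)) / sum(g^2),
    where g is the central-difference gradient of the reference."""
    num = den = 0.0
    for i in range(1, len(ref) - 1):
        g = 0.5 * (ref[i + 1] - ref[i - 1])   # spatial gradient
        num += g * (ref[i] - shifted[i])
        den += g * g
    return num / den

# A smooth test signal displaced by 0.3 samples. The first-order
# approximation holds only for small shifts, which is why trackers add
# preprocessing such as the wavelet decomposition described above.
n = 256
x = [2.0 * math.pi * i / (n - 1) for i in range(n)]
dx = x[1] - x[0]
ref = [math.sin(v) for v in x]
shifted = [math.sin(v - 0.3 * dx) for v in x]
est = fitts_shift_1d(ref, shifted)  # close to 0.3
```

The estimate degrades quickly once the true shift exceeds a sample or two, which motivates coarse-to-fine schemes.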
Pattern recognition of an implicitly given target
Classic correlation filters for detection and location estimation of a reference require explicit information about the object to be recognized. In this work we assume that a reference signal is not available; however, we suppose that the target is placed at unknown coordinates in a noisy reference image. Optimal correlation filters with respect to signal-to-noise ratio and peak-to-output energy for detection and localization of a target embedded in an input scene are derived. Computer simulation results obtained with the proposed filters are presented and compared with those of common correlation filters.
Wavelet domain denoising by using the universal hidden Markov tree model
In this paper, a new image denoising method based on the universal hidden Markov tree (uHMT) model in the wavelet domain is proposed. The maximum a posteriori (MAP) estimate is adopted to deal with ill-conditioned problems, such as image denoising, in the wavelet domain. The uHMT model is used to construct a prior for the MAP estimate. Using conjugate gradient optimization, the closest approximation to the true result is achieved. The results show that images restored by our method are better and sharper than those restored by other methods, both visually and quantitatively.
Design and Implementation of Color Transformations
CIELAB to CMYK color conversion in the transform domain for JPEG compressed images
A new technique is described for color conversions of JPEG images. For each input block of each component, the conversion for the 63 AC coefficients is processed in the transform domain instead of the spatial domain. Only the DC coefficients for each input block of the color components are transformed to the spatial domain and then processed through the traditional lookup table to create color-converted output DC coefficients for each block. Given each converted DC value for each block, the remaining 63 AC coefficients are then converted directly in the transform domain via scaling functions that are accessed via a table as a function of only the DC term. For n-dimensional color space to m-dimensional color space conversion, n component blocks create m component blocks. An IDCT can then be applied to the m component blocks to create spatial domain data or these output blocks can be quantized and entropy encoded to create JPEG compressed data in the m-dimensional color space.
Lifting-based reversible color transformations for image compression
Henrique S. Malvar, Gary J. Sullivan, Sridhar Srinivasan
This paper reviews a set of color spaces that allow reversible mapping between red-green-blue and luma-chroma representations in integer arithmetic. The YCoCg transform and its reversible form YCoCg-R can improve coding gain by over 0.5 dB with respect to the popular YCrCb transform, while achieving much lower computational complexity. We also present extensions of the YCoCg transform for four-channel CMYK pixel data. Thanks to their reversibility under integer arithmetic, these transforms are useful for both lossy and lossless compression. Versions of these transforms are used in the HD Photo image coding technology (which is the basis for the upcoming JPEG XR standard) and in recent editions of the H.264/MPEG-4 AVC video coding standard.
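The lifting steps of the reversible YCoCg-R transform are well documented; a minimal integer sketch of the forward and inverse mappings (function names are illustrative):

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: a cascade of integer lifting steps.
    Each step is exactly invertible, so the mapping is lossless."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse lifting: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

The round trip is bit-exact for every integer triple, because each lifting step reuses the very same shifted value it added; the chroma channels simply gain one extra bit of dynamic range.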
Faster color conversion via lookup table mesh dimension management
Converting images between color spaces is a computationally demanding task. The conversion is based on lookup tables that have an output-colorspace value defined for each node of a mesh covering the input space. If the input color value to be converted does not fall on a mesh node, the output is computed by interpolating the values at the surrounding mesh nodes. For a three-dimensional input space such as RGB, tetrahedral and trilinear interpolation are used; if the input space is four-dimensional, quadrilinear interpolation is used. This paper discusses how to reduce the complexity of a lookup table implementation by exploiting relationships between input and output color space components and using moderate lookup table expansion to achieve a significant speed advantage. For example, a CMYK-to-K conversion, commonly implemented using quadrilinear interpolation on a 9x9x9x9 mesh, can be reduced to bilinear interpolation, assuming the mesh can grow from 6561 nodes to 684288 nodes.
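As a toy sketch of the node-and-interpolate lookup described above (the three-dimensional trilinear case; the paper's mesh-expansion trick and the tetrahedral variant are not reproduced here):

```python
def trilinear_lookup(lut, n, r, g, b):
    """Interpolate one output value from an n x n x n mesh `lut`
    (indexed lut[i][j][k]) whose nodes cover inputs in [0, 255]."""
    step = 255.0 / (n - 1)
    def locate(v):
        i = min(int(v / step), n - 2)   # index of the lower mesh node
        return i, v / step - i           # node index, fractional offset
    i, fr = locate(r); j, fg = locate(g); k, fb = locate(b)
    acc = 0.0
    # Weighted sum over the 8 surrounding mesh nodes.
    for di, wi in ((0, 1 - fr), (1, fr)):
        for dj, wj in ((0, 1 - fg), (1, fg)):
            for dk, wk in ((0, 1 - fb), (1, fb)):
                acc += wi * wj * wk * lut[i + di][j + dj][k + dk]
    return acc

# A linear toy "conversion" stored on a 9x9x9 mesh; trilinear
# interpolation reproduces a linear function exactly.
n = 9
step = 255.0 / (n - 1)
lut = [[[0.3 * i * step + 0.4 * j * step + 0.3 * k * step
         for k in range(n)] for j in range(n)] for i in range(n)]
val = trilinear_lookup(lut, n, 10, 100, 200)  # 0.3*10 + 0.4*100 + 0.3*200 = 103
```

For real device transforms the LUT entries are measured rather than linear, which is exactly why interpolation error and mesh density matter.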
Continued fractions, diophantine approximations, and design of color transforms
We study the problem of approximate computation of color transforms (with real and possibly irrational factors) using integer arithmetic. We show that the precision of such computations can be significantly improved if we allow input or output variables to be scaled by some constant. The problem of finding such a constant turns out to be related to the classic Diophantine approximation problem. We use this relation to explain how the best scaled approximations can be derived, and provide several examples of using this technique for the design of color transforms.
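As a small illustration of the underlying number theory (not the paper's design procedure), continued-fraction convergents give the best rational approximations to a real factor, here shown for √2, a factor that arises in many transform designs:

```python
from fractions import Fraction
import math

def convergents(x, count):
    """First `count` continued-fraction convergents p/q of x. Each
    convergent is the best rational approximation to x among all
    fractions with denominator no larger than q."""
    out = []
    h1, k1 = 1, 0   # h_{n-1}, k_{n-1}
    h2, k2 = 0, 1   # h_{n-2}, k_{n-2}
    for _ in range(count):
        a = math.floor(x)
        h, k = a * h1 + h2, a * k1 + k2   # standard recurrence
        out.append(Fraction(h, k))
        h2, k2, h1, k1 = h1, k1, h, k
        frac = x - a
        if frac < 1e-12:
            break
        x = 1.0 / frac
    return out

cs = convergents(math.sqrt(2), 5)  # 1, 3/2, 7/5, 17/12, 41/29
```

A dyadic-friendly choice such as 17/12 then dictates the scaling constant: computing 17·x and shifting costs only integer operations while keeping the error below 1/(12·29).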
Complex color management using optimized nonlinear three-dimensional look-up tables
Many researchers are working on various aspects of color imaging, yet the published literature shows that producing high-quality color matching and reproduction remains difficult. While the steps needed to produce calibrated color images are fairly well developed, implementing high-quality color matching and reproduction continues to be a serious problem: it tends to require huge color-matching data sets or complicated procedures, and the complexity of creating an appropriate match is often very significant. To surmount this final barrier, this paper proposes a method that reduces the complexity of color matching and reproduction without sacrificing quality, while also significantly reducing the amount of data that must be collected. Color tables are used for typical color matching on many device types; most device color spaces are nonlinear, and matching and characterization are performed under changing conditions. Typical color image processing techniques use profiles consisting of sparse multi-dimensional lookup tables that interpolate between adjacent nodes to prepare an image for rendering. With the proliferation of low-cost colour devices (digital colour cameras, scanners, printers, etc.) in recent years, colour calibration has become an important issue.
Advanced Video Coding
Low-complexity hierarchical lapped transform for lossy-to-lossless image coding in JPEG XR / HD Photo
Chengjie Tu, Sridhar Srinivasan, Gary J. Sullivan, et al.
JPEG XR is a draft international standard undergoing standardization within the JPEG committee, based on a Microsoft technology known as HD Photo. One of the key innovations in the draft JPEG XR standard is its integer-reversible hierarchical lapped transform. The transform can provide both bit-exact lossless and lossy compression in the same signal flow path. The transform requires only a small memory footprint while providing the compression benefits of a larger block transform. The hierarchical nature of the transform naturally provides three levels of multi-resolution signal representation. Its small dynamic range expansion, use of only integer arithmetic and its amenability to parallelized implementation lead to reduced computational complexity. This paper provides an overview of the key ideas behind the transform design in JPEG XR, and describes how the transform is constructed from simple building blocks.
SVC overview and performance evaluation
Tobias Oelbaum, Heiko Schwarz, Mathias Wien, et al.
This paper presents an overview of the new Scalable Video Coding (SVC) Amendment of H.264/AVC and the results of a performance evaluation for this new video coding specification. Whereas temporal scalability is already enabled by the existing H.264/AVC specification, the introduction of spatial and quality scalability requires new coding tools. Here, the layered structure of SVC and the main new coding tools are briefly described, and an overview of the newly defined SVC profiles and levels is provided. The second part of the paper describes a subjective evaluation that was carried out to test the efficiency of the SVC concept. The results of this evaluation show that the coding tools introduced in the scalable extension of H.264/AVC provide a reasonable degree of spatial and quality scalability at very low cost in terms of additional bit rate. The evaluation consisted of a series of subjective quality tests and is backed up by objective PSNR measurements. The results show that SVC supports spatial and quality scalability with a bit rate overhead of less than or about 10%, and with visual quality indistinguishable from state-of-the-art single-layer coding.
Low complexity inter-layer residual coding and efficient mode decision for spatial and temporal scalable video coding
We introduce an efficient mode selection method for the enhancement layers of spatial scalability in the SVC encoder by selectively performing the SVC's inter-layer residual coding. The proposed method analyzes the characteristics of the integer-transform coefficients of the difference between the residuals of the lower and upper spatial layers. It then selectively performs inter-layer residual prediction coding in the spatial scalability case when the SAD values of the inter-layer residuals exceed adaptive threshold values. By classifying the residuals according to the properties of their integer-transform coefficients using only the SAD of the inter-layer residual signals between the two layers, the SVC encoder can perform inter-layer residual coding selectively, reducing total encoding time by 51.2% on average while maintaining rate-distortion performance with negligible quality degradation.
Toward a 3D video format for auto-stereoscopic displays
There has been increased momentum recently in the production of 3D content for cinema applications; for the most part, this has been limited to stereo content. There are also a variety of display technologies on the market that support 3DTV, each offering a different viewing experience and having different input requirements. More specifically, stereoscopic displays support stereo content and require glasses, while auto-stereoscopic displays avoid the need for glasses by rendering view-dependent stereo pairs for a multitude of viewing angles. To realize high quality auto-stereoscopic displays, multiple views of the video must either be provided as input to the display, or these views must be created locally at the display. The former approach has difficulties in that the production environment is typically limited to stereo, and transmission bandwidth for a large number of views is not likely to be available. This paper discusses an emerging 3D data format that enables the latter approach to be realized. A new framework for efficiently representing a 3D scene and enabling the reconstruction of an arbitrarily large number of views prior to rendering is introduced. Several design challenges are also highlighted through experimental results.
High-throughput architecture for H.264/AVC CABAC encoding and decoding system
This paper presents a high-throughput and low-cost context adaptive binary arithmetic coding (CABAC) codec for the H.264/AVC High and Main Profiles. We analyze the similarities between the CABAC encoding and decoding algorithms and propose an efficient pipeline architecture to accelerate their operations. A binary arithmetic unit is carefully designed that integrates the three engines (regular, bypass, and terminate) for both the encoder and decoder by taking advantage of hardware sharing. The proposed CABAC codec has been integrated into an H.264 hardware core and achieves real-time decoding for H.264/AVC High Profile HD level 4.1. The implemented design can operate at 265 MHz with a 39.2k gate count in 0.18 μm silicon technology.
Modern transform design for advanced image/video coding applications
Trac D. Tran, Pankaj N. Topiwala
This paper offers an overall review of recent advances in the design of modern transforms for image and video coding applications. Transforms have been an integral part of signal coding from the beginning, but for most of that history the emphasis was on true floating-point transforms. Recently, with the proliferation of low-power handheld multimedia devices, a new vision of integer-only transforms that provide high performance at very low complexity has quickly gained ascendancy. We explore two key design approaches to creating integer transforms, focusing on a systematic, universal method based on decomposition into lifting steps and the use of (dyadic) rational coefficients. This method provides a wealth of solutions, many of which are already in use in leading media codecs today, such as H.264, HD Photo/JPEG XR, and scalable audio. We give early indications in this paper, with a fuller treatment elsewhere.
A fast macroblock mode decision scheme using ROI-based coding of H.264/MPEG-4 Part 10 AVC for mobile video telephony applications
Taeyoung Na, Yousun Lee, Jeongyeon Lim, et al.
We propose a fast macroblock mode decision scheme in H.264|MPEG-4 Part 10 Advanced Video Coding (AVC) for mobile video telephony applications. In video telephony, the face region around a speaker is generally considered the region of interest (ROI), while the background is of little importance and is regarded as non-ROI. Two issues must usually be considered: (1) mobile video telephony platforms are computationally limited, while AVC codecs are computationally expensive; and (2) the allowed channel bandwidths are usually very small, so compressed video streams are transmitted with degraded visual quality. In this paper, we address both issues: first, a fast macroblock mode decision scheme is contrived to alleviate the computational complexity of H.264|MPEG-4 Part 10 AVC; and second, ROI/non-ROI coding is incorporated to enhance subjective visual quality by encoding ROI data at higher quality and non-ROI data at lower quality. Our proposed fast macroblock mode decision scheme consists of three parts: early skip mode detection, fast inter macroblock mode decision, and intra prediction skipping. The skip mode detection method decides whether or not to evaluate the remaining inter macroblock modes in P-slices. The fast inter macroblock mode method reduces the candidate block modes using SATDMOTION from 16x16 and 8x8 block motion estimation. The intra prediction skipping condition decides whether or not to perform 4x4 intra prediction in P-slices, using the relation between the magnitude of the motion vectors of the current macroblock and the occurrence frequencies of intra-predicted macroblocks. The experimental results show that the proposed scheme reduces total encoding time by up to 51.88% with negligible PSNR drop and bit rate increment.
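As a hedged sketch of one ingredient of such mode decisions, the SATD cost widely used in H.264 encoders (sum of absolute 4x4 Hadamard-transformed differences; normalization conventions vary between encoders, and this is not the paper's SATDMOTION measure):

```python
def satd4x4(orig, pred):
    """SATD of a 4x4 block: Hadamard-transform the difference block
    as H * D * H^T, then sum absolute coefficients (halved, as in
    common reference-software conventions)."""
    H = [[1, 1, 1, 1],
         [1, 1, -1, -1],
         [1, -1, -1, 1],
         [1, -1, 1, -1]]
    d = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    t = [[sum(H[i][k] * d[k][j] for k in range(4)) for j in range(4)]
         for i in range(4)]                      # t = H * d
    m = [[sum(t[i][k] * H[j][k] for k in range(4)) for j in range(4)]
         for i in range(4)]                      # m = t * H^T
    return sum(abs(m[i][j]) for i in range(4) for j in range(4)) // 2

cost = satd4x4([[5] * 4] * 4, [[4] * 4] * 4)  # DC-only difference -> 8
```

Because the Hadamard transform concentrates structured residual energy, SATD ranks candidate modes more like the final transform coding cost than plain SAD does.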
Apply network coding for H.264/SVC multicasting
In a packet-erasure network environment, video streaming benefits from error control in two ways to achieve graceful degradation. The first approach is application-level (or link-level) forward error correction (FEC) to provide erasure protection. The second is error concealment at the decoder end to compensate for lost packets. A large amount of research has been done in both areas. More recently, network coding (NC) techniques have been proposed for efficient data multicast over networks. Our previous work showed that multicast video streaming benefits from NC through improved throughput. In this work, an algebraic model is given to analyze the performance. By exploiting linear combinations of video packets along the nodes of a network together with the SVC video format, the system achieves path diversity automatically and enables efficient video delivery to heterogeneous receivers over packet-erasure channels. Network coding can protect video packets against erasures; however, the rank-deficiency problem of random linear network coding makes error concealment inefficient. Computer simulation shows that the proposed NC video multicast scheme enables heterogeneous receivers to receive according to their capacity constraints, but special design is needed to improve video transmission performance when network coding is applied.
Mobile Video and Applications
Error resiliency of distributed video coding in wireless video communication
Distributed Video Coding (DVC) is a new paradigm in video coding, based on the Slepian-Wolf and Wyner-Ziv theorems. DVC offers a number of potential advantages: flexible partitioning of the complexity between the encoder and decoder, robustness to channel errors due to intrinsic joint source-channel coding, codec independent scalability, and multi-view coding without communications between the cameras. In this paper, we evaluate the performance of DVC in an error-prone wireless communication environment. We also present a hybrid spatial and temporal error concealment approach for DVC. Finally, we perform a comparison with a state-of-the-art AVC/H.264 video coding scheme in the presence of transmission errors.
Temporal flicker reduction and denoising in video using sparse directional transforms
The bulk of the video content available today over the Internet and over mobile networks suffers from many imperfections caused during acquisition and transmission. In the case of user-generated content, which is typically produced with inexpensive equipment, these imperfections manifest in various ways through noise, temporal flicker and blurring, just to name a few. Imperfections caused by compression noise and temporal flicker are present in both studio-produced and user-generated video content transmitted at low bit-rates. In this paper, we introduce an algorithm designed to reduce temporal flicker and noise in video sequences. The algorithm takes advantage of the sparse nature of video signals in an appropriate transform domain that is chosen adaptively based on local signal statistics. When the signal corresponds to a sparse representation in this transform domain, flicker and noise, which are spread over the entire domain, can be reduced easily by enforcing sparsity. Our results show that the proposed algorithm reduces flicker and noise significantly and enables better presentation of compressed videos.
Exploiting spatio-temporal characteristics of human vision for mobile video applications
Video applications on handheld devices such as smart phones pose a significant challenge to achieve high quality user experience. Recent advances in processor and wireless networking technology are producing a new class of multimedia applications (e.g. video streaming) for mobile handheld devices. These devices are light weight and have modest sizes, and therefore very limited resources - lower processing power, smaller display resolution, lesser memory, and limited battery life as compared to desktop and laptop systems. Multimedia applications on the other hand have extensive processing requirements which make the mobile devices extremely resource hungry. In addition, the device specific properties (e.g. display screen) significantly influence the human perception of multimedia quality. In this paper we propose a saliency based framework that exploits the structure in content creation as well as the human vision system to find the salient points in the incoming bitstream and adapt it according to the target device, thus improving the quality of new adapted area around salient points. Our experimental results indicate that the adaptation process that is cognizant of video content and user preferences can produce better perceptual quality video for mobile devices. Furthermore, we demonstrated how such a framework can affect user experience on a handheld device.
Mobile video processing for visual saliency map determination
The visual saliency map represents the most attractive regions in video. Automatic saliency map determination is important in mobile video applications such as autofocusing during video capture. It is well known that motion plays a critical role in visual attention modeling. Motion in video consists of the camera's motion and the foreground target's motion; in determining the visual saliency map, we are concerned with the latter. To isolate it, we estimate the camera (global) motion and then separate the moving target from the background. Specifically, we propose a three-step procedure for visual saliency map computation: 1) motion vector (MV) field filtering, 2) background extraction, and 3) contrast map computation. In the first step, the mean value of the MV field is treated as the camera's motion; the MVs of the background can then be detected and eliminated, and the saliency map roughly determined. In the second step, we further remove noisy image blocks in the background and refine the saliency map. In the third step, a contrast map is computed and integrated with the result of foreground extraction. All computations required by the proposed algorithm are of low complexity, so it can run on mobile devices. The accuracy and robustness of the proposed algorithm are supported by experimental results.
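The first step above can be sketched in a toy form (mean MV as camera motion, residual-magnitude thresholding; the threshold and names are illustrative, not the paper's):

```python
def foreground_mask(mv_field, thresh=1.0):
    """Treat the mean motion vector of the field as camera motion,
    subtract it from every block's MV, and keep blocks whose residual
    motion magnitude exceeds a threshold."""
    n = len(mv_field)
    mean_x = sum(v[0] for v in mv_field) / n
    mean_y = sum(v[1] for v in mv_field) / n
    mask = []
    for vx, vy in mv_field:
        rx, ry = vx - mean_x, vy - mean_y
        mask.append((rx * rx + ry * ry) ** 0.5 > thresh)
    return mask

# A camera pan of (2, 0) over nine blocks, plus one faster-moving block:
mv_field = [(2, 0)] * 9 + [(8, 0)]
mask = foreground_mask(mv_field)   # only the outlier block survives
```

Real MV fields are noisy, which is why the paper follows this with background block cleanup and a contrast map rather than trusting the raw mask.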
Scorebox extraction from mobile sports videos using Support Vector Machines
The scorebox plays an important role in understanding the content of sports videos. However, a tiny scorebox may give small-display viewers an uncomfortable experience in grasping the game situation. In this paper, we propose a novel framework to extract the scorebox from sports video frames. We first extract candidates by using accumulated intensity and edge information after a short learning period. Since various types of scoreboxes are inserted in sports videos, multiple attributes need to be used for efficient extraction. Based on those attributes, the information gain is computed and the top three ranked attributes are selected as a three-dimensional feature vector for Support Vector Machines (SVM) to distinguish the scorebox from other candidates, such as logos and advertisement boards. The proposed method is tested on various videos of sports games, and experimental results show its efficiency and robustness.
Sub-component modeling for face image reconstruction in video communications
Emerging communications trends point to streaming video as a new form of content delivery. These systems are implemented over wired links, such as cable or Ethernet, and over wireless networks, cell phones, and portable game systems. Such communications systems require sophisticated methods of compression and error-resilience encoding to enable communications across band-limited and noisy delivery channels. Additionally, the transmitted video data must be of high enough quality to ensure a satisfactory end-user experience. Traditionally, video compression makes use of temporal and spatial coherence to reduce the information required to represent an image. In many communications systems, the channel is characterized by a probabilistic model which describes its capacity or fidelity. The implication is that information is lost or distorted in the channel and requires concealment on the receiving end. We demonstrate a generative-model-based transmission scheme to compress human face images in video, which has the advantage of a potentially higher compression ratio while maintaining robustness to errors and data corruption. This is accomplished by training an offline face model and using the model to reconstruct face images on the receiving end. We propose a sub-component AAM that models the appearance of sub-facial components individually, and show face reconstruction results under different types of video degradation using weighted and non-weighted versions of the sub-component AAM.
Video quality measure for mobile IPTV service
Mobile IPTV is a multimedia service based on wireless networks with interactivity and mobility. Under mobile IPTV scenarios, people can watch various contents whenever they want and even deliver requests to service providers through the network. However, frequent changes in the wireless channel bandwidth may hinder the quality of service. In this paper, we propose an objective video quality measure (VQM) for mobile IPTV services, focused on jitter measurement. Jitter results from frame repetition during delay and is one of the most severe impairments in video transmission over mobile channels. We first employ the YUV color space to compute the duration and occurrences of jitter and the motion activity. The VQM is then modeled as a combination of these three factors together with the results of a subjective assessment. Since the proposed VQM is based on a no-reference (NR) model, it can be applied in real-time applications. Experimental results show that the proposed VQM correlates highly with subjective evaluation.
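As a toy sketch of the frame-repetition detection idea in the luma channel (the paper's actual VQM combines jitter duration, occurrences, and motion activity; the threshold here is an assumption):

```python
def jitter_events(luma_frames, eps=0.5):
    """Count frame-repetition (jitter) events: a frame whose mean
    absolute luma difference from its predecessor is near zero is
    treated as a repeated (frozen) frame."""
    events = 0
    for prev, cur in zip(luma_frames, luma_frames[1:]):
        mad = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mad < eps:
            events += 1
    return events

# Four tiny "frames" (flattened luma): the second repeats the first,
# giving one jitter event.
frames = [[10, 20, 30], [10, 20, 30], [40, 50, 60], [70, 80, 90]]
count = jitter_events(frames)  # 1
```

Being no-reference, such a detector needs only the decoded frames themselves, which is what makes it usable in real time.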
Rate Control for Video Systems
Rate-controlled requantization transcoding for H.264/AVC video streams
Stijn Notebaert, Jan De Cock, Peter Lambert, et al.
Nowadays, most video material is coded using a non-scalable format. When transmitting these single-layer video bitstreams, connection links with limited capacity can pose a problem. To solve it, requantization transcoding is often used: the requantization transcoder applies coarser quantization in order to reduce the amount of residual information in the compressed video bitstream. In this paper, we extend a requantization transcoder for H.264/AVC video bitstreams with a rate-control algorithm. A simple algorithm is proposed that limits the computational complexity. The bit allocation is based on the bit distribution in the original video bitstream; using the bit budget and a linear model between rate and quantizer, the new quantizer is calculated. The target bit rate is attained with an average deviation below 6%, while the rate-distortion performance shows small improvements over transcoding without rate control.
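A toy sketch of picking a coarser quantizer from a linear rate model (the specific model form, slope handling, and clipping here are assumptions for illustration, not the paper's algorithm):

```python
def new_quantizer(bits_in, q_in, bits_target, slope):
    """Choose a requantization QP from a linear rate model
    R(q) ~ bits_in - slope * (q - q_in): solving R(q) = bits_target
    gives q = q_in + (bits_in - bits_target) / slope. Never requantize
    finer than the input, and clip to the H.264 QP range [0, 51]."""
    q = q_in + (bits_in - bits_target) / slope
    return max(q_in, min(51, round(q)))

q_new = new_quantizer(bits_in=1200, q_in=30, bits_target=800, slope=40)  # 40
```

In a real transcoder the slope would be re-estimated as blocks are coded, so the model tracks the actual rate-quantizer behavior of the stream.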
Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding
This paper proposes a network-aware macroblock (MB) coding mode decision method, which is both error resilient and coding efficient. This method differs from traditional mode decision methods since MB mode decisions are made by simultaneously taking into account: i) their rate-distortion (RD) cost and also ii) their impact on error resilience by considering feedback information from the underlying network regarding current error characteristics. By doing so, the amount of Intra coded MBs can be varied to better suit, in a cost efficient way, the current state of the network and, therefore, further improve the decoded video quality for a given packet loss rate. The proposed approach outperforms a network-aware version of the H.264/AVC reference software with cyclic MB Intra refresh, for typical test sequences encoded at various bit rates and for several error conditions in terms of packet loss rate.
Rate controlling for color and depth based 3D video coding
B. Kamolrat, W. A. C. Fernando, M. Mrak
Rate control is a very important tool for any kind of video coding and transmission. Although many rate control techniques have been designed for 2D video, little work has been carried out in this area for 3D video. In this paper, a novel rate control approach for 3D video based on color and depth maps is introduced. The aim of this rate control algorithm is to keep the 3D video quality nearly constant. Since the 3D video is synthesized from color and depth maps, and its final quality is influenced more by the color sequence than by the depth maps, the qualities of both color and depth maps are first varied until the target 3D quality is achieved at minimal bit rate. Subsequently, the PSNR values of both color and depth maps at the optimum point are maintained for the entire group of pictures (GOP). The bit-allocation problem is solved by introducing a Lagrangian optimization cost function. Experimental results show that the proposed rate control technique is capable of adjusting the bit rate allocated to the color and depth map sequences adaptively in order to maximize 3D video quality, measured as the PSNR of the synthesized left and right views using reconstructed color and depth map sequences.
Performance Evaluation in Image Coding Applications
Toward objective image quality metrics: the AIC Eval Program of the JPEG
Thomas Richter, Chaker Larabi
Objective quality assessment of lossy image compression codecs is an important part of the recent JPEG call for Advanced Image Coding. The target of the AIC ad hoc group is twofold: first, to receive state-of-the-art still image codecs and to propose suitable technology for standardization; and second, to study objective image quality metrics for evaluating the performance of such codecs. Even though the performance of an objective metric is defined by how well it predicts the outcome of a subjective assessment, one can also study the usefulness of a metric indirectly, in a non-traditional way, namely by measuring the subjective quality improvement of a codec that has been optimized for that specific objective metric. This approach is demonstrated here on the recently proposed HD Photo format introduced by Microsoft and an SSIM-tuned version of it by one of the authors. We compare these two implementations with JPEG in two variations and with a visually and PSNR-optimal JPEG 2000 implementation. To this end, we use subjective and objective tests based on multiscale SSIM and a new DCT-based metric.
A comparative study of color image compression standards using perceptually driven quality metrics
Francesca De Simone, Daniele Ticca, Frederic Dufaux, et al.
The task of comparing the performance of different codecs is closely related to research in the field of objective quality metrics. Even though several objective quality metrics have been proposed in the literature, the lack of standardization in objective quality assessment, and the lack of extensive and reliable comparisons among the state-of-the-art metrics, often make results obtained with objective metrics unreliable. In this paper we compare the performance of three existing alternatives for compression of digital pictures, i.e. JPEG, JPEG 2000, and JPEG XR, using different objective full-reference metrics, including perceptual quality metrics that take into account the color information of the data under analysis.
Application specific performance measurements of image compression codecs
Thomas Richter
Traditionally, the performance of an image compression codec is measured by its rate-distortion curve, where a distortion metric measures the fitness of the reconstructed image for a particular purpose. However, the metrics typically employed address the needs of professional photography only partially: images are usually not published as taken, but are post-processed to express the intent of the photographer. Compression codecs might degrade the image visibly through an interaction of the codec-specific loss with the editing tools. In this work, we present the idea of application-specific metrics and measure the robustness of state-of-the-art compression codecs under prototypical image manipulations.
Initiatives in Image Coding and Accessories and Applications
Image coding design considerations for cascaded encoding-decoding cycles and image editing: analysis of JPEG 1, JPEG 2000, and JPEG XR / HD Photo
This paper discusses cascaded multiple encoding/decoding cycles and their effect on image quality for lossy image coding designs. Cascaded multiple encoding/decoding is an important operating scenario in professional editing industries. In such scenarios, it is common for a single image to be edited by several people while the image is compressed between editors for transit and archival. In these cases, it is important that decoding followed by re-encoding introduce minimal (or no) distortion across generations. A significant number of potential sources of distortion introduction exist in a cascade of decoding and re-encoding, especially if such processes as conversion between RGB and YUV color representations, 4:2:0 resampling, etc., are considered (and operations like spatial shifting, resizing, and changes of the quantization process or coding format). This paper highlights various aspects of distortion introduced by decoding and re-encoding, and remarks on the impact of these issues in the context of three still-image coding designs: JPEG, JPEG 2000, and JPEG XR. JPEG XR is a draft standard under development in the JPEG committee based on Microsoft technology known as HD Photo. The paper focuses particularly on the JPEG XR technology, and suggests that the design of the draft JPEG XR standard has several quite good characteristics in regard to re-encoding robustness.
ITU-T T.851: an enhanced entropy coding design for JPEG baseline images
The Joint Photographic Experts Group (JPEG) baseline standard remains a popular and pervasive standard for continuous-tone, still image coding. The "J" in JPEG acknowledges its two main parent organizations, ISO (International Organization for Standardization) and the ITU-T (International Telecommunications Union - Telecommunication). Notwithstanding their joint efforts, both groups have subsequently (and separately) standardized many improvements for still image coding. Recently, ITU-T Study Group 16 completed the standardization of a new entropy coder, called the Q15-coder, whose statistical model is from the original JPEG-1 standard. This new standard, ITU-T Rec. T.851, can be used in lieu of the traditional Huffman entropy coder (a form of variable-length coding), and complements the QM arithmetic coder; both were originally standardized in JPEG as ITU-T T.81 | ISO/IEC 10918-1. In contrast to Huffman entropy coding, arithmetic coding makes no prior assumptions about an image's statistics, but rather adapts to them in real time. This paper will present a tutorial on arithmetic coding, provide a history of arithmetic coding in JPEG, share the motivation for T.851, outline its changes, and provide comparison results against both the baseline Huffman and the original QM-coder entropy coders. It will conclude with suggestions for future work.
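The advantage of adaptation that the abstract mentions can be illustrated with a toy model. The sketch below is not the Q15-coder or the QM-coder; it only compares the ideal code length (sum of -log2 p) of a fixed-probability model, a crude stand-in for a static Huffman design, against a model whose symbol probability is re-estimated after every symbol, which is the behavior adaptive arithmetic coders approximate.

```python
import math

def ideal_code_length(bits, adaptive=True, p_one=0.5):
    """Sum of -log2(p) over a bit stream.  With adaptive=True the probability
    of a '1' is re-estimated after every symbol using Laplace-smoothed counts;
    otherwise a fixed probability p_one is assumed throughout."""
    ones, total, length = 1, 2, 0.0   # Laplace-smoothed counts
    for b in bits:
        p = (ones / total) if adaptive else p_one
        length += -math.log2(p if b else 1.0 - p)
        if adaptive:
            ones += b
            total += 1
    return length

if __name__ == "__main__":
    skewed = [1] * 90 + [0] * 10      # heavily skewed source
    print(ideal_code_length(skewed, adaptive=False))  # 100 bits at p = 0.5
    print(ideal_code_length(skewed, adaptive=True))   # far fewer bits
```

On the skewed stream the fixed model costs one bit per symbol, while the adaptive model learns the skew as it codes and spends far fewer bits, with no prior knowledge of the statistics.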
Techniques for enhancing JPEG XR / HD Photo rate-distortion performance for particular fidelity metrics
Daniel Schonberg, Shijun Sun, Gary J. Sullivan, et al.
This paper explores several encoder-side techniques aimed at improving the compression performance of encoding for the draft JPEG XR standard. Though the syntax and decoding process are fixed by the standard, significant variation in encoder design and some variation in decoder design are possible. For a variety of selected quality metrics, the paper discusses techniques for achieving better compression performance according to each metric. As a basic reference encoder and decoder for the discussion and modifications, the publicly available Microsoft HD Photo DPK (Device Porting Kit) 1.0, on which the draft JPEG XR standard was based, was used. The quality metrics considered include simple mathematical objective metrics (PSNR and L∞) as well as pseudo-perceptual metrics (single-scale and multi-scale MSSIM).
Coding of high dynamic range images in JPEG XR / HD Photo
Sridhar Srinivasan, Zhi Zhou, Gary J. Sullivan, et al.
High Dynamic Range (HDR) imaging support is one of the major features of the emerging draft JPEG XR standard. JPEG XR is being standardized within the JPEG committee based on Microsoft technology known as HD Photo. JPEG XR / HD Photo is primarily an integer-based coding technology design, accepting integer-valued samples at the encoder and producing integer-valued samples at the decoder, with internal processing entirely in the integer space. Yet it can support compression of multiple HDR formats, including 16- and 32-bit float, 16- and 32-bit signed and unsigned integer, and RGBE. Further, JPEG XR enables lossless compression of some HDR formats, such as 16-bit signed and unsigned integer, 16-bit float, and RGBE. This paper describes how HDR formats are handled in JPEG XR. It examines in depth how these various HDR formats are converted to and from integer-valued samples within the JPEG XR codec, and the internal processing of these HDR formats. The paper describes how JPEG XR provides flexible ways to compress HDR formats within the same codec framework as integer-valued formats, while maintaining high compression efficiency and low computational complexity.
Split field coding: low complexity error-resilient entropy coding for image compression
James J. Meany, Christopher J. Martens
In this paper, we describe split field coding, an approach for low-complexity, error-resilient entropy coding which splits code words into two fields: a variable-length prefix and a fixed-length suffix. Once a prefix has been decoded correctly, the associated fixed-length suffix is error-resilient, with bit errors causing no loss of code word synchronization and only a limited amount of distortion in the decoded value. When the fixed-length suffixes are segregated into a separate block, this approach becomes suitable for use with a variety of methods which provide varying protection to different portions of the bitstream, such as unequal error protection or progressive ordering schemes. Split field coding is demonstrated in the context of a wavelet-based image codec, with examples of various error resilience properties, and comparisons to the rate-distortion and computational performance of JPEG 2000.
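The prefix/suffix split can be illustrated with a Rice/Golomb-style code, which has exactly this structure: a unary (variable-length) prefix for the high bits and a fixed-length suffix for the low bits. This is only a sketch of the general idea, not the paper's codec; the parameter k and the example values are arbitrary.

```python
def rice_encode(values, k):
    """Rice-style split-field code: unary prefix for n >> k, then k fixed
    suffix bits.  Prefix and suffix bits are kept in separate lists, mirroring
    the idea of segregating fixed-length suffixes into their own block."""
    prefix, suffix = [], []
    for n in values:
        q, r = n >> k, n & ((1 << k) - 1)
        prefix += [1] * q + [0]                  # unary quotient, 0-terminated
        suffix += [(r >> i) & 1 for i in reversed(range(k))]
    return prefix, suffix

def rice_decode(prefix, suffix, k):
    values, q, s = [], 0, 0
    for bit in prefix:
        if bit:
            q += 1
        else:                                    # end of one code word
            r = 0
            for _ in range(k):
                r = (r << 1) | suffix[s]
                s += 1
            values.append((q << k) | r)
            q = 0
    return values

if __name__ == "__main__":
    vals = [5, 12, 0, 7]
    p, sfx = rice_encode(vals, k=2)
    assert rice_decode(p, sfx, 2) == vals
    sfx[0] ^= 1                                  # flip one suffix bit
    print(rice_decode(p, sfx, 2))                # same word count, small error
```

Flipping a suffix bit perturbs one decoded value by at most 2^k - 1 and leaves every other code word intact, which is the error-resilience property the paper exploits; a bit error in the unary prefix, by contrast, would desynchronize the stream.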
Scalable low complexity image coder for remote volume visualization
Hariharan G. Lalgudi, Michael W. Marcellin, Ali Bilgin, et al.
Remote visualization of volumetric data has gained importance over the past few years as a way to realize the full potential of tele-radiology. Volume rendering is a computationally intensive process, often requiring hardware acceleration to achieve real-time visualization. Hence, a remote visualization model well-suited to high-speed networks is to transmit rendered images from the server (with dedicated hardware) based on viewpoint requests from clients. In this regard, a compression scheme for the rendered images is vital for efficient utilization of the server-client bandwidth. The complexity of the decompressor must also be considered, so that a low-end client workstation can decode images at the desired frame rate. We present a scalable, low-complexity image coder that has good compression efficiency and high throughput.
Biomedical Applications
Hyperspectral imaging with wavelet transform for classification of colon tissue biopsy samples
Automatic classification of medical images is part of our computerised medical imaging programme to support pathologists in their diagnosis. Hyperspectral data has found applications in medical imagery, and its use in biopsy analysis is increasing significantly. In this paper, we present a histopathological analysis for the classification of colon biopsy samples into benign and malignant classes. The proposed study compares 3D spectral/spatial analysis with 2D spatial analysis. Wavelet textural features are used in both approaches for classification of colon biopsy samples. Experimental results indicate that incorporating wavelet textural features with a support vector machine, in 2D spatial analysis, achieves the best classification accuracy.
Optical and Digital Image Processing Systems
Optical flow processing for sub-pixel registration of speckle image sequences
Corneliu Cofaru, Wilfried Philips, Wim Van Paepegem
In recent years, digital image processing techniques have become a very popular way of determining strains and full-field displacements in experimental mechanics, owing to advances in image processing and to the fact that the measurement process itself is simpler and non-intrusive compared to traditional sensor-based techniques. This paper presents a filtering technique that processes the polar components of image displacement fields. First, pyramidal gradient-based optical flow is calculated between blocks of each pair of frames of a speckle image sequence, while compensating in the calculation for small rotations and shears of the image blocks. The polar components of the resulting motion vectors, phase and amplitude, are then extracted. Each motion vector angle is smoothed temporally using a Kalman filter that takes into account previously calculated angles at the same spatial position in the motion fields. A subsequent adaptive spatial filter processes both the temporally smoothed angles and the amplitudes of the motion field. Finally, we present test results of the proposed method applied to artificial data sets and to a speckle image sequence of plastic materials subjected to uniaxial stress.
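The temporal smoothing step can be sketched with a scalar Kalman filter. This is a generic illustration, not the paper's filter: the random-walk state model and the noise variances q and r are assumptions made for the example, and a per-pixel angle sequence is represented by a short list of numbers.

```python
def kalman_smooth(measurements, q=1e-3, r=0.05):
    """Scalar Kalman filter with a random-walk state model, sketching how a
    per-pixel motion-vector angle could be smoothed over time.  q is the
    process noise variance and r the measurement noise variance (both
    illustrative, not taken from the paper)."""
    x, p = measurements[0], 1.0      # initial state and its variance
    out = [x]
    for z in measurements[1:]:
        p += q                       # predict: state assumed nearly constant
        k = p / (p + r)              # Kalman gain
        x += k * (z - x)             # update with the new measurement
        p *= (1.0 - k)
        out.append(x)
    return out

if __name__ == "__main__":
    noisy_angles = [0.50, 0.55, 0.42, 0.58, 0.47, 0.53]
    print(kalman_smooth(noisy_angles))
```

The filtered sequence has a visibly smaller spread than the raw measurements while tracking their level, which is the effect sought when stabilizing angle estimates across frames.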
Advanced super-resolution image enhancement process
Hai-Wen Chen, Dennis Braunreiter
An advanced super-resolution image enhancement (SRIE) process has been developed for improving image resolution, enhancing image signal-to-noise ratio (SNR) and contrast, and correcting image blur and distortion caused by atmospheric turbulence. The SRIE process is composed of five sub-processes: 1) over-sampling, 2) image registration, 3) image averaging, 4) high-pass filtering, and 5) histogram equalization. These five sub-processes are discussed in detail in this paper. The performance of the SRIE process was tested using long-wave infrared (LWIR) imagery and charge-coupled device (CCD) camera video imagery.
Fibered fluorescence microscopy (FFM) of intra epidermal nerve fibers--translational marker for peripheral neuropathies in preclinical research: processing and analysis of the data
Frans Cornelissen, Steve De Backer, Jan Lemeire, et al.
Peripheral neuropathy can be caused by diabetes or AIDS, or be a side effect of chemotherapy. Fibered Fluorescence Microscopy (FFM) is a recently developed imaging modality using a fiber-optic probe connected to a laser scanning unit. It allows in-vivo scanning of small animal subjects by moving the probe along the tissue surface. In preclinical research, FFM enables non-invasive, longitudinal in-vivo assessment of intra-epidermal nerve fibre density in various models of peripheral neuropathy. Because images are continuously captured while the probe moves, FFM allows visualization of areas larger than the field of view of the probe. For analysis purposes, we need to obtain a single static image from the multiple overlapping frames, so we introduce a mosaicing procedure for this kind of video sequence. Construction of mosaic images with sub-pixel alignment is indispensable and must be integrated into a globally consistent image alignment. An additional motivation for the mosaicing is to use the overlapping, redundant information to improve the signal-to-noise ratio of the acquisition, because the individual frames tend to have both high noise levels and intensity inhomogeneities. For longitudinal analysis, mosaics captured at different times must be aligned as well. For alignment, global correlation-based matching is compared with interest point matching. The use of algorithms running on multiple CPUs (parallel processor, cluster, or grid) is imperative for use in a screening model.
Algorithm design of liquid lens inspection system
In the mobile lens domain, glass lenses are often used where high resolution is required, but a glass zoom lens must be combined with movable machinery and a voice-coil motor, which imposes space limits on miniaturized designs. With the development of high-level molding component technology, the liquid lens has become a focus of mobile phone and digital camera companies. A liquid lens combined with solid optical lenses and a driving circuit can replace the original components; as a result, the volume requirement is decreased to merely 50% of the original design. In addition, with its fast focus adjustment, low energy requirement, high durability, and low-cost manufacturing process, the liquid lens shows advantages in a competitive market. In the past, only scrape defects caused by external force had to be inspected for glass lenses. For the liquid lens, the state of four different structural layers must be inspected because of its different design and structure. In this paper, we apply machine vision and digital image processing technology to perform inspections in a particular layer according to the needs of users. According to our experimental results, the proposed algorithm can automatically remove the out-of-focus background, extract the region of interest, and find and analyze defects efficiently in the particular layer. In the future, we will combine this algorithm with automatic-focus technology to implement inside inspection based on product inspection demands.
Parallel magnetic resonance imaging using compressed sensing
Ali Bilgin, Yookyung Kim, Hariharan G. Lalgudi, et al.
Although magnetic resonance imaging (MRI) is routinely used in clinical practice, long acquisition times limit its practical utility in many applications. To increase the data acquisition speed of MRI, parallel MRI (pMRI) techniques have recently been proposed. These techniques utilize multi-channel receiver arrays and are based on simultaneous acquisition of data from multiple receiver coils. Recently, a novel framework called Compressed Sensing (CS) was introduced. Since this new framework shows how signals can be reconstructed from far fewer samples than suggested by the Nyquist theory, it has the potential to significantly accelerate data acquisition in MRI. This paper illustrates that CS and pMRI techniques can be combined, and that such joint processing yields results superior to those obtained from independent utilization of each technique.
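The CS idea can be sketched in one dimension with a toy experiment: keep only a random subset of unitary DFT coefficients (a crude stand-in for undersampled k-space) and recover a sparse signal by iterative soft thresholding (ISTA). This is a minimal sketch under invented parameters (sampling fraction, threshold, iteration count), not the joint CS/pMRI method of the paper.

```python
import numpy as np

def ista_recover(y, mask, lam=0.05, iters=200):
    """Toy compressed-sensing recovery: y holds the retained DFT coefficients
    (zeros elsewhere), mask is the 0/1 sampling pattern, and ISTA minimizes a
    least-squares data term plus an l1 sparsity penalty on the signal."""
    x = np.zeros(mask.size)
    for _ in range(iters):
        # Gradient step on ||y - M F x||^2 with a unitary FFT (norm="ortho"),
        # so a unit step size is stable.
        resid = y - mask * np.fft.fft(x, norm="ortho")
        x = x + np.real(np.fft.ifft(mask * resid, norm="ortho"))
        x = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # soft threshold
    return x

if __name__ == "__main__":
    n = 128
    x_true = np.zeros(n)
    x_true[[5, 40, 90]] = [1.0, -1.5, 2.0]                  # 3-sparse signal
    rng = np.random.default_rng(0)
    mask = (rng.random(n) < 0.5).astype(float)              # keep ~50% of k-space
    y = mask * np.fft.fft(x_true, norm="ortho")
    x_hat = ista_recover(y, mask)
    print(np.linalg.norm(x_hat - x_true))
```

Despite discarding roughly half the Fourier samples, the sparse signal is recovered to small error; pMRI extends this by exploiting the distinct sensitivity profiles of multiple receiver coils as additional measurements.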
Implementation and Processing Scenarios II
SAR automatic target recognition using maximum likelihood template-based classifiers
A review of several recently developed maximum likelihood template-based automatic target recognition (ATR) algorithms for extended targets in synthetic aperture radar (SAR) imagery is presented. The algorithms are based on 'gradient' peaks, 'ceiling' peaks, edges, corners, shadows, and rectangular fits. A weight-based Bayesian maximum likelihood scheme to combine multiple template-based classifiers is presented. The feature weights are derived from prior recognition accuracies, i.e., confidence levels, achieved by the individual template-based classifiers. Applying common feature-based weights instead of target-specific feature-based weights reduces the resulting ATR accuracy by only a small amount. Preliminary results indicate that (1) the ceiling peaks provide the most target-discriminating power, and (2) inclusion of more target-discriminating features leads to higher classification accuracy. The Dempster-Shafer rule of combination is suggested as a potential alternative to the implemented Bayesian decision theory approach to resolve conflicting reports from multiple template-based classifiers.
Automatic generation of 360 degree panorama from image sequences
Recently, there has been increasing interest in using panoramic images in surveillance and target tracking applications. With the wide availability of off-the-shelf web-based pan-tilt-zoom (PTZ) cameras and the advances of CPUs and GPUs, object tracking using mosaicked images that cover a 360° scene in near real time has become a reality. This paper presents a system that automatically constructs and maps full-view panoramic mosaics to a cube map from images captured by an active PTZ camera with 1-25x optical zoom. A hierarchical approach is used for storing and mosaicking multi-resolution images captured from a PTZ camera. Techniques based on scale-invariant local features and probabilistic models for verification are used in the mosaicking process. Our algorithm is automatic and robust in mapping each incoming image to one of the six faces of a cube with no prior knowledge of the scene structure. This work can easily be integrated into a surveillance system that needs to track moving objects in its 360° surroundings.
Implementation and Processing Scenarios III
Implementation and evaluation of real-time pan-tilt-zoom camera calibration
Pan-tilt-zoom (PTZ) cameras are frequently used in surveillance applications, as they can observe a much larger region of the environment than a fixed-lens camera while still providing high-resolution imagery. The pan, tilt, and zoom parameters of a single camera may be simultaneously controlled by online users as well as automated surveillance applications. To accurately register autonomously tracked objects to a world model, the surveillance system requires accurate knowledge of the camera parameters. Due to imprecision in the PTZ mechanism, these parameters cannot be obtained from PTZ control commands but must be calculated directly from camera imagery. This paper describes the efforts undertaken to implement a real-time calibration system for a stationary PTZ camera. The approach continuously tracks distinctive image feature points from frame to frame and, from these correspondences, robustly calculates the homography transformation between frames. Camera internal parameters are then calculated from these homographies. The calculations are performed by a self-contained program that continually monitors images collected by the camera as it performs pan, tilt, and zoom operations. The accuracy of the calculated calibration parameters is compared to ground truth data. Problems encountered include inaccuracies for large orientation changes and long algorithm execution time.
Adaptive fingerprint enhancement and identification using linear parametric models
Historically, due to their uniqueness and immutability, fingerprints have been used as evidence in criminal cases and in security identification as well as authorization verification applications. In this research, adaptive linear DWT models are developed to describe the fingerprint features (DWT coefficients) to be identified. The proposed model can be used to enhance the fingerprint characteristics identified from fingerprint images to improve recognition. This adaptive model identification technique is then applied to degraded or incomplete fingerprint images to demonstrate the efficacy of the technique under non-ideal conditions. The performance of the method is then compared to previously published research by the authors on identification of degraded fingerprints using PCA- and ICA-based features.
Design and evaluation of sparse quantization index modulation watermarking schemes
Bruno Cornelis, Joeri Barbarien, Ann Dooms, et al.
In the past decade the use of digital data has increased significantly. The advantages of digital data are, amongst others, easy editing; fast, cheap, and cross-platform distribution; and compact storage. The most crucial disadvantages are unauthorized copying and copyright issues, through which authors and license holders can suffer considerable financial losses. Many inexpensive methods are readily available for editing digital data and, unlike with analog information, reproduction in the digital case is simple and robust. Hence, there is great interest in developing technology that helps to protect the integrity of a digital work and the copyrights of its owners. Watermarking, the embedding of a signal (known as the watermark) into the original digital data, is one method that has been proposed for the protection of digital media elements such as audio, video, and images. In this article, we examine watermarking schemes for still images based on selective quantization of the coefficients of a wavelet-transformed image, i.e. sparse quantization-index modulation (QIM) watermarking. Different grouping schemes for the wavelet coefficients are evaluated and experimentally verified for robustness against several attacks. Wavelet tree-based grouping schemes yield a slightly improved performance over block-based grouping schemes. Additionally, the impact of the deployment of error correction codes on the most promising configurations is examined. The utilization of BCH codes (Bose, Ray-Chaudhuri, Hocquenghem) results in improved robustness as long as the capacity of the error codes is not exceeded (cliff effect).
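The core QIM mechanism can be sketched with scalar quantizers. This is a minimal illustration of quantization-index modulation in general, not the paper's sparse wavelet-domain scheme; the step size delta and the example coefficients are arbitrary.

```python
def qim_embed(coeffs, bits, delta=8.0):
    """Scalar QIM: each coefficient is quantized onto one of two interleaved
    lattices (offset by delta/2) according to the watermark bit."""
    out = []
    for c, b in zip(coeffs, bits):
        offset = (delta / 2.0) * b
        out.append(round((c - offset) / delta) * delta + offset)
    return out

def qim_detect(coeffs, delta=8.0):
    """Recover the embedded bits by nearest-lattice detection."""
    bits = []
    for c in coeffs:
        d0 = abs(c - round(c / delta) * delta)
        d1 = abs(c - (round((c - delta / 2) / delta) * delta + delta / 2))
        bits.append(0 if d0 <= d1 else 1)
    return bits

if __name__ == "__main__":
    wm = [1, 0, 1, 1, 0]
    marked = qim_embed([13.2, -4.7, 0.9, 25.0, 7.3], wm)
    noisy = [c + 1.5 for c in marked]          # mild additive 'attack'
    print(qim_detect(noisy))                   # watermark survives
```

Detection remains correct as long as the perturbation stays below delta/4, which is why the choice of quantization step trades off imperceptibility against robustness; the paper's sparse variant applies this only to selected wavelet coefficient groups.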
Acquisition and registration of aerial video imagery of urban traffic
Rohan C. Loveland, Edward Rosten
The amount of information available about urban traffic from aerial video imagery is extremely high. Here we discuss the collection of such video imagery from a helicopter platform with a low-cost sensor, and the post-processing used to correct radial distortion in the data and to register it. The radial distortion correction is accomplished using a Harris model. The registration is implemented as a two-step process, using a globally applied polyprojective correction model followed by a fine-scale local displacement field adjustment. The resulting cleaned-up data is sufficiently well registered to allow subsequent straightforward vehicle tracking.
High speed acquisition system of photo-colorimetric images to record and to model human visual signal
V. Boucher, F. Greffier, F. Fournela
Various problems in different domains are related to the operation of the Human Visual System (HVS). This is notably the case when interest turns to the driver's visual perception and road safety in general. As is well known, 90% of the information used by a driver is supplied by the visual system. That is why the Laboratoire Régional des Ponts et Chaussées d'Angers has developed a human visual signal capture system. This system is based on a CCD video camera calibrated for luminance and chrominance, the two physical quantities that make up the human visual signal. The first phase of the development is to adapt the spectral response of the optical system to the spectral characteristics of the human eye. Once the camera is adapted, the second stage is calibration, which consists in transforming the gray levels recorded by the camera into values of luminance and chrominance. The innovative feature of this system is the ability to record a single image containing the entire set of information carried by the visual signal. The frame rate is 10 frames per second, which allows the camera to be carried inside a vehicle and to record images of road scenes exactly as the driver actually perceived them. After recording, vision algorithms can be applied to these images in order to reproduce the physiological processes which take place in the retina or in the brain. These tools can then be used to evaluate visibility levels of roadway infrastructure, of public lighting, or the saliency of objects.
Poster Session
Restoration of atmospherically degraded images using a sparse prior
The reconstruction of turbulence-affected images has been an active research topic in the field of astronomical imaging, and many approaches have been proposed in the literature. Recently, researchers have extended these methods to the recovery of natural scenes in long-path terrestrial surveillance, which is affected even more by air turbulence. Some approaches from astronomical imaging also work well on the latter problem. However, although these methods have involved statistics, such as a statistical model of atmospheric turbulence or the probability distribution of photons forming an image, they have not taken into account the statistical properties of natural scenes observed in long-path horizontal imagery. Recent research by others has made use of the fact that a real-world image generally has a sparse distribution of its derivatives. In this paper, we investigate algorithms with such a constraint imposed during the restoration of turbulence-affected images. We propose an iterative, blind deconvolution algorithm that follows a registration-and-averaging method to remove anisoplanatic warping in a time sequence of degraded images. The sparse prior helps to reduce noise, produce sharper edges, and remove unwanted artifacts in the estimated image, because it pushes only a small number of pixels to have non-zero (or large) derivatives. We test the new algorithm with simulated and natural data, and experiments show that it performs well.
Efficient video segmentation using temporally updated mean shift clustering
This paper presents a new method for unsupervised video segmentation based on mean shift clustering in the spatio-temporal domain. The main novelty of the proposed approach is dynamic temporal adaptation of the clusters, thanks to which the segmentation evolves quickly and smoothly over time. The method consists of a short initialization phase and an update phase, and it significantly reduces the computational load of mean shift clustering: in the update phase, only the positions of a relatively small number of cluster centers are updated, and new frames are segmented based on the segmentation of previous frames. The method segments video in real time and tracks video objects effectively.
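For context, plain (non-temporal) mean shift mode seeking can be sketched on scalar features as below. This is the baseline procedure, with an invented bandwidth and toy data; the paper's speed-up would seed each new frame with the previous frame's converged centers instead of restarting from scratch, which is not shown here.

```python
def mean_shift_modes(points, bandwidth=1.0, iters=50):
    """Flat-kernel mean shift on scalar features: every center moves to the
    mean of the data points within its bandwidth until convergence; centers
    that end at the same location form one cluster (mode)."""
    centers = list(points)
    for _ in range(iters):
        centers = [
            sum(p for p in points if abs(p - c) <= bandwidth)
            / max(1, sum(1 for p in points if abs(p - c) <= bandwidth))
            for c in centers
        ]
    # merge converged centers that lie within one bandwidth of each other
    modes = []
    for c in sorted(centers):
        if not modes or c - modes[-1] > bandwidth:
            modes.append(c)
    return modes

if __name__ == "__main__":
    data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
    print(mean_shift_modes(data))        # two modes, near 1.03 and 4.97
```

In a video setting the feature vectors would combine color and spatial (and temporal) coordinates, and the cost of re-running this iteration per frame is exactly what the temporal update in the paper avoids.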
Wavelet and PCA-based approach for 3D shape recovery from image focus
This paper introduces a new approach for 3D shape recovery based on the Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA). Instead of computing focus quality locally by summing all values in a 2D or 3D window after applying a focus measure, a vector of seven neighboring pixels is populated for each pixel in the image volume. Each vector in the sequence is decomposed using the DWT, and PCA is then applied to the energies of the detail coefficients to transform the data into eigenspace. The first feature, as it contains the maximum variation, is employed to compute the depth. Though DWT and PCA are both computationally expensive transformations, the reduced number of data elements and algorithm iterations makes the proposed method efficient. The new approach was evaluated experimentally, and its performance was compared with other methods using synthetic and real image sequences. The evaluation is gauged on the basis of unimodality, monotonicity, and resolution of the focus curve. Two other global statistical metrics, root mean square error (RMSE) and correlation, have also been applied for the synthetic image sequence. Experimental results demonstrate the effectiveness and robustness of the new method.
Optimization of focus measure using genetic algorithm
This paper presents the use of a Genetic Algorithm as a search method for focus measure in Shape From Focus (SFF). Previous methods compute the focus value for each pixel locally by summing all values within a small window. This summation is a good approximation of focus quality, but not an optimal one. The Genetic Algorithm is used as a fine-tuning process, in which a measure of best focus serves as the fitness function for the motion parameter values that make up each gene. The experimental results show that the proposed method performs better than previous algorithms such as Sum of Modified Laplacian (SML), Grey Level Variance (GLV), and the Tenenbaum focus measure. The results are compared using root mean square error (RMSE) and correlation. The experiments are conducted using a simulated cone, a real cone, and a TFT-LCD color filter to evaluate the performance of the proposed algorithm.
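The window-summed baseline that the Genetic Algorithm refines can be sketched as follows: compute the Sum-Modified-Laplacian (SML) focus measure for each frame of a focus stack, sum it over a small window, and take the per-pixel argmax over frames as depth. The window size and the toy focus stack are invented for illustration.

```python
import numpy as np

def sml(img):
    """Sum-Modified-Laplacian focus measure: absolute second differences in
    the x and y directions, a standard sharpness criterion in SFF."""
    img = np.asarray(img, dtype=float)
    f = np.zeros_like(img)
    f[1:-1, :] += np.abs(2 * img[1:-1, :] - img[:-2, :] - img[2:, :])
    f[:, 1:-1] += np.abs(2 * img[:, 1:-1] - img[:, :-2] - img[:, 2:])
    return f

def depth_from_focus(stack, window=3):
    """For each pixel, pick the frame index whose locally summed SML is
    largest; the result is a depth map indexed by lens step."""
    vols = []
    for img in stack:
        f = sml(img)
        padded = np.pad(f, window // 2)
        # simple box sum over the window (cumulative sums would be faster)
        summed = sum(
            padded[i:i + f.shape[0], j:j + f.shape[1]]
            for i in range(window) for j in range(window)
        )
        vols.append(summed)
    return np.argmax(np.stack(vols), axis=0)

if __name__ == "__main__":
    flat = np.full((8, 8), 0.5)                              # defocused frames
    sharp = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # in-focus frame
    print(depth_from_focus([flat, sharp, flat]))             # selects frame 1
```

On this toy stack every pixel is assigned to the sharp middle frame; the GA approach in the paper replaces the fixed window summation with a tuned combination of values.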
Fast three-dimensional shape recovery in TFT-LCD manufacturing
In this paper, we propose a shape recovery method for measuring protrusions on LCD color filters in the TFT-LCD manufacturing process. We use a 3D focus measure operator to find focused points, and then find the lens step that maximizes the sum of the focus measure. To reduce the computational complexity, we apply a successive focus measure update algorithm. The 3D shape of the object can then be easily estimated from the best-focused points. Experiments are conducted on both synthetic and real images to evaluate the performance of the proposed algorithms. The experimental results show that our new method is faster than previous methods.
Using wavelets for edge directed image interpolation
Resizing an image is an important technique in image processing. When increasing the size of an image, some details are smeared or blurred by common interpolation techniques such as bilinear interpolation, and edges do not appear as sharp as in the original image. In addition, when interpolating at high magnification, blocking effects start to appear. In this paper, we present an approach that performs interpolation in the direction of the edges, rather than just the horizontal and vertical directions. Wavelet preprocessing is used to extract edge direction information before performing interpolation in multiple directions.
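The benefit of interpolating along an edge rather than across it can be shown with the smallest possible example: the new sample at the center of a 2x2 block. This sketch uses a simple gradient comparison to pick the direction; it is a generic illustration, not the paper's wavelet-based direction estimator.

```python
def directional_center(a, b, c, d):
    """For the new sample at the center of the 2x2 block [[a, b], [c, d]],
    average along the diagonal with the smaller intensity difference, so the
    interpolation follows the edge instead of blurring across it."""
    if abs(a - d) <= abs(b - c):
        return (a + d) / 2.0    # edge runs along the a-d diagonal
    return (b + c) / 2.0

if __name__ == "__main__":
    # A diagonal edge: bright on the a-d diagonal, dark off it.
    print(directional_center(200, 10, 10, 200))  # 200.0, edge kept sharp
    # Plain bilinear would give (200 + 10 + 10 + 200) / 4 = 105, a blurred edge.
```

In the paper's method the edge orientation would come from wavelet subband information rather than from this local gradient test, but the principle of averaging along the detected direction is the same.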
A novel multiscale segmentation method for HRCT images
In the medical field, many image segmentation methods have been proposed for the segmentation of medical images. However, there are few multiscale segmentation methods that can segment a medical image so that the various components within the image are separated at multiple resolutions or scales. In this paper, we present a new algorithm for multiscale segmentation of high-resolution computed tomography (HRCT) images. With this new segmentation technique, we demonstrate that it is possible to segment HRCT images into their various components at multiple scales, thereby separating the information available in an HRCT image. We show that an HRCT image can be segmented to yield separate images for bones, tissues, lungs, and anatomical structures within the lungs. The processing is done in the frequency domain using the Discrete Cosine Transform (DCT).
Estimating surface roughness using image focus
Surface roughness is an important parameter for many applications, including optics, polymers, and semiconductors. In this paper, we propose to estimate surface roughness using one of the passive optical 3D shape recovery methods, shape from focus. Three-dimensional shape recovery from one or multiple observations is a challenging problem in computer vision. The objective of shape from focus is to calculate a depth map, which can then be used in techniques and algorithms that recover the three-dimensional structure of an object, as required in many high-level vision applications. The same depth map can also be used for surface roughness estimation. Researchers often need to quickly compare fabricated samples based on various measures, including surface roughness. However, the high cost of surface roughness estimation limits its extensive and exhaustive use. We therefore propose an inexpensive and fast method based on Shape From Focus (SFF). We use two microscopic test objects, a coin and a TFT-LCD cell, to estimate the surface roughness.
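Once SFF has produced a depth map, standard areal roughness parameters can be read off it directly. A minimal sketch, assuming a flat mean reference plane (the paper may fit a different reference surface):

```python
import numpy as np

def roughness(depth_map):
    """Areal roughness from a depth map: Sa (mean absolute deviation
    from the mean plane) and Sq (RMS deviation)."""
    z = depth_map - depth_map.mean()
    Sa = np.abs(z).mean()
    Sq = np.sqrt((z ** 2).mean())
    return Sa, Sq
```

Sa and Sq are the areal analogues of the classical profile parameters Ra and Rq.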
Estimation of depth map based on focus adjustment
The objective of 3D shape recovery using focus is to estimate the depth map of a scene or object based on the best-focused points of the camera lens. In Shape From Focus (SFF), the measure of focus, i.e., sharpness, is crucial for the final 3D shape estimation. Conventional methods compute sharpness by applying a focus measure operator to each 2D frame of the image sequence. However, such methods do not reflect accurate focus levels, because the focus levels of curved objects also require information from neighboring pixels in adjacent frames. To address this issue, we propose a new method based on focus adjustment, which takes the values of neighboring pixels from the adjacent image frames that have the same initial depth as the center pixel and then re-adjusts the center value accordingly. Experimental results show that the proposed technique generates better shape and takes less computation time than previous SFF methods based on the Focused Image Surface (FIS) and dynamic programming.
A fast multi-pattern motion estimation algorithm based on the nature of error surfaces
In this paper, we propose an algorithm for reducing the complexity of the motion estimation module in standard video compression applications. In several video coding standards, such as H.264/AVC, motion estimation is the most time-consuming subsystem. Recent research has therefore focused on developing novel algorithms that save computation with minimal effect on video distortion. Since real-world video sequences usually exhibit a wide range of motion content, from uniform to random, adaptive algorithms have emerged as the most robust general-purpose solutions. In this paper, a simple, computationally efficient, and robust multi-pattern motion estimation algorithm based on the nature of error surfaces is proposed. A combination of spatial and temporal predictors is used for multiple initial search center prediction, determination of the magnitude of motion, and search pattern selection. The multiple initial predictors help to identify absolute zero-motion blocks and the true location of the global minimum based on the characteristics of error surfaces. Hence, the final predictive search center is closer to the global minimum. This results in a smaller number of search steps to reach the minimum location and increases the computation speed. Further speedup is obtained with a half-stop technique and a threshold on the minimum distortion point. The computational complexity of the proposed algorithm is drastically decreased (average speedup ~43%), while the image quality measured in terms of PSNR (~0.20 dB loss with respect to Full Search) remains close to that of the Full Search algorithm.
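The half-stop and threshold ideas mentioned above can be illustrated with a plain SAD block-matching search; this is a sketch only, and the paper's predictor-guided search-pattern selection is omitted.

```python
import numpy as np

def sad_search(cur, ref, bx, by, bs, sr, thresh=0):
    """Find the motion vector for the block at (by, bx) of size bs in
    `cur` by searching `ref` within +/- sr. The SAD is accumulated row
    by row and abandoned once it exceeds the best found so far (the
    half-stop idea); the search exits early when SAD <= thresh."""
    block = cur[by:by + bs, bx:bx + bs].astype(int)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue
            cand = ref[y:y + bs, x:x + bs].astype(int)
            sad = 0
            for row in range(bs):            # partial SAD accumulation
                sad += np.abs(block[row] - cand[row]).sum()
                if sad >= best:              # half-stop: abandon candidate
                    break
            if sad < best:
                best, best_mv = sad, (dy, dx)
                if best <= thresh:           # good-enough threshold
                    return best_mv, best
    return best_mv, best
```

Both shortcuts skip arithmetic without changing the result when the threshold is zero, which is where the speedup of such schemes comes from.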
Image restoration based on camera microscanning
Common restoration techniques perform signal processing using a single observed image. In this paper, we show that the accuracy of restoration can be significantly increased if at least three degraded images obtained from a microscanning camera are used. It is assumed that the degraded images contain information about the original image, linear degradation and illumination functions, and additive sensor noise. Using the spatial information from the camera, a set of equations and an objective function are formed. By solving the system of equations with an iterative algorithm, the original image can be recovered. Computer simulation results are presented and discussed.
Remote counseling using HyperMirror quasi space-sharing system
Sayuri Hashimoto, Osamu Morikawa, Nobuyuki Hashimoto, et al.
In the modern information society, networks are getting faster, costs are getting lower, and displays are getting clearer. Today, just about anyone can easily use precise, dynamic image distribution systems in everyday life. The question now is how to bring the benefits of network systems to the local community as well as to each individual. This study was designed to use communication with realistic sensations to examine the effectiveness of remote individual counseling intervention in reducing depression, anxiety, and stress in child-rearing mothers. Three child-rearing mothers residing in the city of Osaka each received one session of remote counseling intervention. The results showed an alleviation of stress related to child-rearing, i.e., a reduction in state anxiety, depression, and subjective stress related to child-rearing. The experimental demonstration employed a HyperMirror system capable of presenting visual and auditory images similar to reality, in order to provide the counselees with realistic sensations. Although the voice communication environment was poor, the remote counseling allowed for the communication of sensory information, i.e., touch ("skinship") that conveyed assurance and peace of mind, and auditory information, i.e., whispering voices through which signals of affection were transmitted; the realistic sensation contributed to a reduction in stress levels. The positive effects of the intervention were confirmed through a pre- and post-intervention study. The results suggest the need for future studies to confirm mid- and long-term improvements from the intervention, as well as the need to improve the voice transmission environment.
Alignment of vectorial shearing interferometer using a simple recognition algorithm
The vectorial shearing interferometer includes a pair of wedge prisms as a shearing system. Perfect alignment of the shearing system is crucial for the optimal detection and analysis of asymmetrical wavefronts. This paper describes a recognition algorithm for detecting optical misalignment and prism orientation based on the intensity pattern obtained in the calibration process. The key to the present algorithm is the comparison of a reference intensity pattern against a sheared interferogram that depends on the wedge prism position. First, an optimum phase-only filter is obtained from a set of reference images with the objective of discriminating between different phase changes. Then, the optimal filter is used in a digital correlator, which results in a simple and robust calibration system.
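The correlator step can be sketched with a single-reference phase-only filter (POF) applied in the Fourier domain; the paper's optimum filter is synthesized from a whole training set of reference images, which this minimal version does not attempt.

```python
import numpy as np

def pof_correlate(scene, ref):
    """Correlate `scene` with a phase-only filter built from `ref`.
    Returns the complex correlation plane; its magnitude peak locates
    the reference pattern within the scene."""
    F = np.fft.fft2(ref)
    H = np.conj(F) / np.maximum(np.abs(F), 1e-12)   # phase-only filter
    return np.fft.ifft2(np.fft.fft2(scene) * H)
```

A sharp, well-located correlation peak indicates a match; a displaced or weak peak flags misalignment of the interference pattern relative to the reference.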
System of invariant correlation to rotation using a ring mask
A new rotation-invariant computational filter is presented. The filter was applied to a problem image, in this case, a 256 by 256 pixel image of a centered white Arial letter on a black background. The complete alphabet is represented in these images. Each image is rotated in one-degree increments through a full 360 degrees; hence, for each letter of the alphabet we generate 360 images. To achieve rotational invariance, translational invariance is first applied, and then a 256 by 256 binary mask of concentric circular rings, three pixels in thickness and separation, is used. The sum of the information in the circular rings represents the signature of the image. The average of the signatures of the 360 images of a selected letter is the filter used to compute the phase correlation with all alphabet letters and their rotated images. The confidence level is calculated from the mean value with two standard errors (2SE) of the 360 correlation values for each letter. The confidence level shows that this system discriminates efficiently between letters.
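The ring-signature step can be sketched as follows: after centering, pixel values are summed inside concentric rings of fixed thickness, and the vector of ring sums is unchanged by rotation about the image center. The ring geometry (three-pixel thickness and separation) follows the abstract; the normalization and centering convention are our assumptions.

```python
import numpy as np

def ring_signature(img, thickness=3, gap=3):
    """Rotation-invariant signature: sums of pixel values inside
    concentric rings of `thickness` pixels separated by `gap` pixels,
    centered on the image."""
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - (h - 1) / 2, x - (w - 1) / 2)
    sig, inner, rmax = [], 0.0, min(h, w) / 2
    while inner < rmax:
        mask = (r >= inner) & (r < inner + thickness)
        sig.append(img[mask].sum())
        inner += thickness + gap
    return np.array(sig)
```

Because each pixel's radius from the center is preserved under rotation, a rotated copy of the image produces the same ring sums, which is what makes the averaged signature a usable rotation-invariant filter.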
Invariant correlation of objects using non-linear composite filters
In this work, we use nonlinear composite filters for object recognition, even under rotation, scale, and noise distortions. We generated 936 images of the letters E, F, H, P, and B, scaled from 70% to 130% and rotated through 360°. The maximum number of images supported by these filters was determined by a numerical experiment, generating filters with different numbers of images each. We have images at 13 scales, each with 72 different angles. Tests were done on two different kinds of filters: one in which all the scales were present and more angles were added to increase the number of images, and another in which all the angles were present and more scales were added. Considering a system confidence level of at least 80%, the maximum number of images allowed by the filter is around 216. In one type of filter, the letter is rotated through 360°. We found a "rotation problem": circles were introduced in the Fourier plane, in other words, first-order Bessel functions were introduced in the image spectrum, which creates complications when working with images that also have circles in their spectrum. We therefore propose a segmented filter that breaks the circular symmetry. Nonlinear composite filters can recognize the target in the presence of distortions.
Distortion-invariant pattern recognition with nonlinear correlation filters
Classical correlation-based methods for pattern recognition are very sensitive to geometric distortions of the objects to be recognized. Moreover, most captured images are corrupted by noise. In this work, we use novel nonlinear composite filters for distortion-invariant pattern recognition. The filters are designed with an iterative algorithm to reject background noise and to achieve a desired discrimination capability. The recognition performance of the proposed filters is compared with that of linear composite filters in terms of noise robustness and discrimination capability. Computer simulation results are provided and discussed.
Optical inspection for electronic assemblies using nonlinear correlation filters
This paper describes an image recognition system designed to inspect the quality standards of electronic assemblies. The essence of the present algorithm is locating, in the input image, the electronic components that violate the acceptance requirements for the manufacture of printed circuit board assemblies adopted by the Association Connecting Electronics Industries. To this end, image processing modules based on a nonlinear composite filter are employed to discriminate between electronic components that meet the acceptance condition and those that are defective. The proposed recognition system is based on a nonlinear composite filter obtained from a training set of reference images. The optimal filter is then used in a digital correlator, which results in a simple and robust inspection system.
Invariant recognition to position, rotation, and scale considering vectorial signatures
This work presents the development and use of vectorial signature filters, obtained by applying properties of the scale and Fourier transforms, for image recognition. The filters were applied to different input scenes consisting of the 26 letters of the alphabet. Each letter is a 256 × 256 pixel image of a centered white Arial letter on a black background. Each image was rotated through 360 degrees in increments of 1° and scaled from 70% to 130% in increments of 0.5%. To build a new invariant digital correlation system, we obtained two one-dimensional vectors after applying different mathematical transformations to the target as well as the input scene. To recognize a target, signatures were compared by calculating the Euclidean distance between the target and the input scene; confidence levels were then obtained. The results demonstrate that this system performs well at discriminating between letters.
Motivation and implementation of a software SVC real-time VGA encoder for mobile TV
V. Bottreau, T. Viellard, C. Chevance, et al.
An encoding system was implemented purely in software for sending SVC video multicast to mobile devices. The SVC encoder is a multi-layer, multi-threaded, parallel-GOP encoder. It is capable of running in real time on an Intel® Xeon® 5160 dual-core processor platform in Scalable Baseline profile (from H.264/AVC Baseline QVGA@25fps at 250 kbps to SVC Scalable Baseline VGA@25fps at 1 Mbps). Real-time video encoding is accomplished with aggressive Single Instruction Multiple Data assembly code optimizations and advanced algorithms.
Digital holographic video of plankton
The use of digital holography for plankton detection is discussed, namely for evaluating the shapes, orientation, and 3-D location of plankton particles. Algorithms are considered for improving the quality of the reconstructed plankton holographic image; they are based on pre-processing the digital hologram before reconstruction. An algorithm for recording and reconstructing digital holographic video is described. We show the possibility of reconstructing either a fixed plane of the volume containing plankton, or a plane moving together with a plankton specimen. Experimental results are presented and discussed.
Study on depth of field of wavefront coding imaging system
Chao Pan, Jiabi Chen, Dawei Zhang, et al.
Wavefront coding is a good approach to extending the DOF (depth of field) of an imaging optical system with an identical exit pupil. A phase mask introduced in the exit pupil of the optical system makes the MTF of the system defocus-independent. As a result, the DOF of the system is extended without a loss of resolution. How much can this technique extend the DOF of an optical system? Experimental and simulation results in previous papers have shown that a wavefront coding imaging system can extend the DOF by an order of magnitude. However, until now there has been no theoretical analysis of the DOF extension achievable with wavefront coding. In this paper, a special condition under which the MTF of a wavefront coding system is defocus-independent is derived by investigating the OTF of a wavefront coding imaging system with a cubic phase mask. Based on this condition, an expression for the DOF of the wavefront coding imaging system is obtained. The conclusion is verified by simulation results.
An efficient algorithm based on dynamic programming for 3D shape recovery
Luis M. Jimenez-Medina, Vitaly Kober, Hugo H. Hidalgo-Silva
In the field of machine vision, determining three-dimensional information from images is of great importance. One of the most widely used approaches is stereo vision. The shape-from-stereo method computes a depth map from a stereo pair of images; using several images from different perspectives, a three-dimensional representation of a test scene can be obtained. The key step in this process is stereo matching, which determines the change in position of a pixel in one image of the stereo pair with respect to the other image. In this paper, a stereo matching algorithm based on dynamic programming is suggested for 3D shape recovery from textured images. The proposed technique uses the information of a small two-dimensional neighborhood to solve the correspondence problem.
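Dynamic-programming stereo matching can be sketched along a single scanline: a pixel matching cost plus a smoothness penalty on the disparity jump between adjacent columns, with a backtracking pass to recover the optimal path. This is a 1-D illustration under assumed costs; the paper's method aggregates over a 2-D neighborhood.

```python
import numpy as np

def dp_scanline_disparity(left, right, max_d, penalty=0.2):
    """Disparity per scanline by dynamic programming: absolute pixel
    matching cost plus penalty * |disparity jump| between columns."""
    h, w = left.shape
    nd = max_d + 1
    disp = np.zeros((h, w), dtype=int)
    drange = np.arange(nd)
    jump = penalty * np.abs(drange[:, None] - drange[None, :])
    for y in range(h):
        # matching cost: cost[x, d] = |left[y, x] - right[y, x - d]|
        cost = np.full((w, nd), 1e9)
        for d in range(nd):
            cost[d:, d] = np.abs(left[y, d:].astype(float) - right[y, :w - d])
        # forward accumulation of the cheapest path cost
        acc = cost.copy()
        for x in range(1, w):
            acc[x] += (acc[x - 1][None, :] + jump).min(axis=1)
        # backtrack the optimal disparity path
        d = int(acc[w - 1].argmin())
        disp[y, w - 1] = d
        for x in range(w - 2, -1, -1):
            d = int((acc[x] + jump[:, d]).argmin())
            disp[y, x] = d
    return disp
```

The smoothness term is what distinguishes this from independent per-pixel matching: isolated bad matches are overruled when a small disparity jump is cheaper than a large matching cost.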
Application of digital image processing method for fish age estimation
Ting-wan Wu, Jian-hua Hong
In this paper, study results on fish age estimation, as an application of digital image processing, are presented based on the analysis of otolith images. First, a kind of Artificial Neural Network (ANN), the Pulse-Coupled Neural Network (PCNN), is proposed and used to identify the different summer and winter year-ring patterns. Second, a well-founded approach using morphological features is brought forward to automatically detect the nucleus within the otolith images. Finally, a morphological method is used to deduce the fish's age. The results of this paper may be significant in fishery research, and the methods can be used in other biological feature identification fields.
Nonlinear filter for pattern recognition using the scale transform
Angel Coronel-Beltrán, Josué Álvarez-Borrego
An invariant digital correlation system using a nonlinear filter is presented. Invariance to position, rotation, and scale of the target is achieved via the Fourier transform, polar mapping, and the Scale transform, respectively. We analyzed the performance of this filter with different nonlinearity values k according to the peak-to-correlation energy (PCE) metric. We found experimentally the best k value for rotation and scale, and the confidence levels of the filters. The filter was applied to the complete set of alphabet letters, where each letter is a problem image of 256 × 256 pixels. The results are presented and show better performance when compared with linear filters.
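The PCE metric referred to above can be computed directly from the correlation plane. Definitions of PCE vary slightly in the literature; the variant sketched here, squared peak magnitude over total plane energy, is a common one and may differ in normalization from the paper's.

```python
import numpy as np

def pce(corr_plane):
    """Peak-to-correlation energy: squared correlation peak magnitude
    divided by the total energy of the correlation plane."""
    e = np.abs(corr_plane) ** 2
    return e.max() / e.sum()
```

A perfectly sharp peak (a single nonzero sample) gives PCE = 1; a flat, noisy plane gives a value near 1/N for N samples, so higher PCE means better discrimination.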
Ordinal-based method for robust image/video signature generation
Daniel Chongli Chen, Lekha Chaisorn, Susanto Rahardja
This paper proposes an algorithm for generating a video signature based on an ordinal measure. Current methods that use a measure of temporal ordinal rank are robust to many transformations but can only detect the entire query video, not a segment of it, while methods that use local features may be more robust to certain transformations but less robust to excessive noise. The proposed algorithm incorporates region-based spatial information while maintaining strong robustness to noise, different resolutions, illumination shifts, and video file formats. In our method, a frame is first divided into blocks. For each pixel in a block, a slice (a binary image computed by comparing the greyscale intensity of each pixel in the frame with that of the reference pixel) is generated. The slices of all the pixels in a block are then added component-wise to obtain a metaslice for the block. To compute the distance between any two frames, the Euclidean distance between corresponding metaslices of the two frames is computed to obtain the metadistance between two blocks. Summing the metadistances over all blocks and normalizing gives the final measure of distance between the two frames. To improve the speed of the algorithm, keyframes are first downsized and pixel intensity values are represented by the average of a small block. A table of frame distances between two sets of keyframes from two video sequences is constructed and then converted to a similarity matrix using a threshold. The longest chain of consecutive similar keyframes is found, and this produces the best matching video sequence between the two videos. This algorithm is capable of taking into account differences between videos at various scales and is useful for finding duplicate or modified copies of a query video in a database. Preliminary experimental results are encouraging and demonstrate the potential of the proposed algorithm.
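The slice/metaslice construction can be sketched as follows. The abstract leaves the block size, the exact comparison operator, and the normalization open, so the choices below (4-pixel blocks, a strict ">" comparison, no normalization) are assumptions for illustration only.

```python
import numpy as np

def metaslices(frame, block):
    """Per-block metaslice: for each pixel in a block, the slice is the
    binary map (frame > that pixel's intensity); slices of all pixels in
    the block are summed component-wise."""
    h, w = frame.shape
    out = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            refs = frame[by:by + block, bx:bx + block].ravel()
            # component-wise sum of (frame > r) over reference pixels r
            out[(by, bx)] = sum((frame > r).astype(int) for r in refs)
    return out

def frame_distance(f1, f2, block=4):
    """Sum of Euclidean distances between corresponding metaslices."""
    m1, m2 = metaslices(f1, block), metaslices(f2, block)
    return sum(np.linalg.norm(m1[k] - m2[k]) for k in m1)
```

Because each slice depends only on intensity orderings, adding a constant brightness offset to a frame leaves every metaslice unchanged, which illustrates the illumination-shift robustness claimed above.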
Large-scale 3-D profilometer
In 3-D profilometry, it is necessary to locate the in-focus region of the image and to reconstruct the best 3D profile. A series of images is collected on the fly, and the contrast and intensity indices of each region of each image are calculated during the scanning procedure. The proposed method reconstructs the 3D shape from a moving platform. It is applied in some preliminary experiments, which show that large-scale 3-D profile reconstruction can be realized.