Proceedings Volume 7542

Multimedia on Mobile Devices 2010

View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 27 January 2010
Contents: 6 Sessions, 28 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2010
Volume Number: 7542

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7542
  • New Emerging Technologies and Services
  • Secure Services
  • Watermarking and Forensics
  • Media Processing and Services
  • Interactive Paper Session
Front Matter: Volume 7542
Front Matter: Volume 7542
This PDF file contains the front matter associated with SPIE Proceedings Volume 7542, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
New Emerging Technologies and Services
Ergonomic evaluation of ubiquitous computing with monocular head-mounted display
Takashi Kawai, Jukka Häkkinen, Takashi Yamazoe, et al.
In this paper, the authors conducted an experiment to evaluate the user experience (UX) in an actual outdoor environment, assuming casual use of a monocular HMD to view video content while walking short distances. Eight subjects were asked to view news videos on a monocular HMD while walking through a large shopping mall. Two types of monocular HMDs and a hand-held media player were used, and the psycho-physiological responses of the subjects were measured before, during, and after the experiment. The VSQ, SSQ, and NASA-TLX were used to assess subjective workloads and symptoms. The objective indexes were heart rate, stride, and a video recording of the environment in front of the subject's face. The results revealed differences between the two types of monocular HMDs as well as between the monocular HMDs and the other conditions. Differences between the types of monocular HMDs may have been due to screen vibration during walking, which was considered a major factor in the UX in terms of workload. Future experiments, to be conducted in other locations with higher cognitive loads, will study performance and situational awareness with respect to actual and media environments.
Camera assisted multimodal user interaction
Jari Hannuksela, Olli Silvén, Sami Ronkainen, et al.
Since more processing power and new sensing and display technologies are already available in mobile devices, there has been increased interest in building systems that communicate via different modalities such as speech, gesture, expression, and touch. In user interfaces based on context identification, these independent modalities are combined to create new ways for users to interact with hand-helds. While these are unlikely to completely replace traditional interfaces, they will considerably enrich and improve the user experience and task performance. We demonstrate a set of novel user interface concepts that rely on the built-in sensors of modern mobile devices for recognizing the context and sequences of actions. In particular, we use the camera to detect whether the user is watching the device, for instance, to decide when to turn on the display backlight. In our approach the motion sensors are first employed to detect handling of the device. Then, based on ambient illumination information provided by a light sensor, the cameras are turned on. The frontal camera is used for face detection, while the back camera provides supplemental contextual information. The applications subsequently triggered by the context can be, for example, image capturing or bar code reading.
Image-based mobile service: automatic text extraction and translation
Jérôme Berclaz, Nina Bhatti, Steven J. Simske, et al.
We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras. Such capability represents a new paradigm for users where a simple image provides the basis for a service. The ubiquity and ease of use of cell-phone cameras enables acquisition and transmission of images anywhere and at any time a user wishes, delivering rapid and accurate translation over the phone's MMS and SMS facilities. Target text is extracted completely automatically, requiring no bounding box delineation or related user intervention. The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. Further novelties include that no software installation is required on the handset, any service provider or camera phone can be used, and the entire service is implemented on the server side.
Mobile cosmetics advisor: an imaging based mobile service
Nina Bhatti, Harlyn Baker, Hui Chao, et al.
Selecting cosmetics requires visual information and often benefits from the assessments of a cosmetics expert. In this paper we present a unique mobile imaging application that enables women to use their cell phones to get immediate expert advice when selecting personal cosmetic products. We derive the visual information from analysis of camera phone images, and provide the judgment of the cosmetics specialist through use of an expert system. The result is a new paradigm for mobile interactions: image-based information services exploiting the ubiquity of camera phones. The application is designed to work with any handset over any cellular carrier using commonly available MMS and SMS features. Targeted at the unsophisticated consumer, it must be quick and easy to use, not requiring download capabilities or preplanning. Thus, all application processing occurs in the back-end system and not on the handset itself. We present the imaging pipeline technology and a comparison of the service's accuracy with respect to human experts.
Secure Services
New normalized expansions for redundant number systems: adaptive data hiding techniques
In this paper, we propose a new adaptive embedding technique which decomposes the image into various bit-planes based on redundant number systems. This technique is driven by three separate functions: 1) adaptive selection of the locations and number of bits per pixel to embed; 2) adaptive selection of the bit-plane decomposition for the cover image; and 3) adaptive selection of the manner in which the information is inserted. Through the application of sensitive directional-based statistical estimation and a recorded account of actions taken, the proposed algorithms are able to provide the desired level of security, both visually and statistically. In comparison with other methods offering the same level of security, the new technique offers a greater embedding capacity.
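The adaptive bit-plane idea can be sketched as a toy LSB scheme. This is an illustrative assumption, not the authors' algorithm: here the per-pixel capacity (1 or 2 bits) is decided from the local variance of the six most significant bit-planes only, so the extractor can reproduce the decision without access to the cover image.

```python
import numpy as np

def _capacity(img, y, x, threshold):
    # decide bits-per-pixel from the 6 MSB planes only, so the same
    # decision can be reproduced identically at extraction time
    block = img[y-1:y+2, x-1:x+2].astype(np.float64) // 4
    return 2 if block.var() > threshold else 1

def embed(cover, bits, threshold=25.0):
    """Hide `bits` (list of 0/1) in the LSBs: busy pixels carry 2 bits,
    smooth pixels only 1."""
    stego = cover.astype(np.int32).copy()
    h, w = stego.shape
    k = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if k >= len(bits):
                return stego.astype(np.uint8)
            n = _capacity(stego, y, x, threshold)
            chunk = bits[k:k + n]
            k += len(chunk)
            mask = (1 << len(chunk)) - 1
            val = int("".join(map(str, chunk)), 2)
            stego[y, x] = (stego[y, x] & ~mask) | val
    return stego.astype(np.uint8)

def extract(stego, nbits, threshold=25.0):
    img = stego.astype(np.int32)
    h, w = img.shape
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if len(out) >= nbits:
                return out
            n = min(_capacity(img, y, x, threshold), nbits - len(out))
            val = int(img[y, x]) & ((1 << n) - 1)
            out.extend(int(b) for b in format(val, f"0{n}b"))
    return out
```

Because the capacity decision ignores the two embedded bit-planes, embedding never perturbs its own side information, which is the property an adaptive scheme of this kind needs.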
Fused number representation systems and their barcode applications
Sarkis Agaian
In this paper we focus on: a) enhancing the performance of existing barcode systems and b) building a barcode system for mobile applications. First we introduce a new concept of generating a parametric number representation system by fusing a number of representation systems that use multiplication, addition, and other operations. Second we show how one can generate a secure, reliable, and high capacity color barcode by using the fused system. The representation, symbols, and colors may be used as encryption keys that can be encoded into barcodes, thus eliminating the direct dependence on cryptographic techniques. To supply an extra layer of security, the fused system also allows one to encrypt given data using different types of encryption methods. In addition, this fused system can be used to improve image processing applications and cryptography.
Visual cryptography by use of polarization
Hirotsugu Yamamoto, Takanori Imagawa, Shiro Suyama
Visual cryptography is a powerful method to share secret information, such as identification numbers, among plural members. There have been many papers on visual cryptography by use of intensity modulation. Although the use of intensity modulation is suitable for printing, degradation of image quality is a problem. Another problem for conventional visual cryptography is the risk of theft of physical keys. To cope with these problems, we propose a new field of visual cryptography by use of polarization. In this study, we have implemented polarization decoding by stacking films. The use of polarization processing improves the image quality of visual cryptography. The purpose of this paper is to construct visual cryptography based on polarization processing. Furthermore, we construct a new type of visual cryptography that uses stacking order as a key for decryption. The use of stacking order multiplies the complexity of the encryption and is effective in protecting the secret against theft, because a thief cannot determine the secret merely by collecting the encrypted films.
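The order-dependence of stacked polarizing films can be illustrated with Malus's law. This is a generic physical sketch, not the authors' encoding: for ideal linear polarizers, each film attenuates intensity by cos² of the angle difference to the previous one, so the same three films transmit light in one order and block it in another.

```python
import math

def transmit(angles_deg):
    """Relative intensity of unpolarized light passed through a stack of
    ideal linear polarizers at the given angles (Malus's law per pair)."""
    intensity = 0.5  # the first polarizer passes half of unpolarized light
    for a, b in zip(angles_deg, angles_deg[1:]):
        intensity *= math.cos(math.radians(b - a)) ** 2
    return intensity

# Stacking order acts as a key: the same three films decode differently.
leak = transmit([0, 45, 90])    # 0.125: the middle film lets light through
dark = transmit([0, 90, 45])    # ~0: the crossed pair blocks everything
```

This is exactly why stacking order can serve as a decryption key: with two films the result is order-independent, but from three films on, permuting the stack changes the decoded intensity.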
Private anonymous fingerprinting for color images in the wavelet domain
W. Abdul, P. Gaborit, P. Carré
An online buyer of multimedia content does not want to reveal his identity or his choice of multimedia content whereas the seller or owner of the content does not want the buyer to further distribute the content illegally. To address these issues we present a new private anonymous fingerprinting protocol. It is based on superposed sending for communication security, group signature for anonymity and traceability and single database private information retrieval (PIR) to allow the user to get an element of the database without giving any information about the acquired element. In the presence of a semi-honest model, the protocol is implemented using a blind, wavelet based color image watermarking scheme. The main advantage of the proposed protocol is that both the user identity and the acquired database element are unknown to any third party and in the case of piracy, the pirate can be identified using the group signature scheme. The robustness of the watermarking scheme against Additive White Gaussian Noise is also shown.
Improvement of information fusion-based audio steganalysis
In this paper we extend an existing information-fusion-based audio steganalysis approach with three different kinds of evaluations. The first addresses the so-far neglected evaluation of sensor-level fusion. Our results show that this fusion removes content dependency while achieving classification rates similar to single classifiers (especially for the considered global features) on the three exemplarily tested audio data hiding algorithms. The second extends the observations on fusion from segmental features alone to combinations of segmental and global features, reducing the computational complexity required for testing by about two orders of magnitude while maintaining the same degree of accuracy. The third tries to build a basis for estimating the plausibility of the introduced steganalysis approach by measuring the sensitivity of the models used in supervised classification of steganographic material to typical signal modification operations such as de-noising or 128 kbit/s MP3 encoding. Our results show that for some of the tested classifiers the probability of false alarms rises dramatically after such modifications.
Watermarking and Forensics
Cell phone camera ballistics: attacks and countermeasures
Martin Steinebach, Huajian Liu, Peishuai Fan, et al.
Multimedia forensics deals with the analysis of multimedia data to gather information on its origin and authenticity. One therefore needs to distinguish between classical criminal forensics (which today also uses multimedia data as evidence) and multimedia forensics, where the actual case is based on a media file. One example of the latter is camera forensics, where pixel error patterns are used as fingerprints identifying a camera as the source of an image. Of course, multimedia forensics can become a tool for criminal forensics when evidence used in a criminal investigation is likely to be manipulated. At this point important questions arise: How reliable are these algorithms? Can a judge trust their results? How easy are they to manipulate? In this work we show how camera forensics can be attacked and introduce a potential countermeasure against these attacks.
Reverse-engineering a watermark detector based on a more precise model
Detection results obtained from an oracle can be used to reverse-engineer the underlying detector structure, or parameters thereof. In particular, if a detector uses a common structure like correlation or normalized correlation, detection results can be used to estimate feature space dimensionality, watermark strength, and detector threshold values. Previous estimation techniques used a simplistic but tractable model for a watermarked image in the detection cone of a normalized correlation detector; in particular, a watermarked image was assumed to lie along the axis of the detection cone, essentially corresponding to an image of zero magnitude. This produced useful results for lower-dimensional feature spaces, but increasingly imprecise estimates for larger ones. In this paper we model the watermarked image properly as the sum of a cover vector and an approximately orthogonal watermark vector, offsetting the image within the cone, which is the geometry of a detector using normalized correlation. This symmetry breaking produces a far more complex model which boils down to a quartic equation. Although it is infeasible to find its symbolic solution even with the aid of a computer, our numerical analysis results show certain critical behavior which reveals the relationship between the attacking noise strength and the detector parameters. The critical behavior predicted by our model extends our reverse-engineering capability to detectors with large feature space dimensions, which are not uncommon in multimedia watermarking algorithms.
Toward a simplified perceptual quality metric for watermarking applications
Maurizio Carosi, Vinod Pankajakshan, Florent Autrusseau
This work is motivated by the limitations of statistical quality metrics in assessing the quality of images distorted in distinct frequency ranges. Common quality metrics, which have basically been designed and tested for various kinds of global distortions such as image coding, may not be efficient for watermarking applications, where the distortions might be restricted to a very narrow portion of the frequency spectrum. We propose an objective quality metric whose performance does not depend on the distortion frequency range, but which nevertheless remains simple, in contrast to the complex Human Visual System (HVS) based quality metrics recently made available. The proposed algorithm is generic (not designed for a particular distortion), and exploits the contrast sensitivity function (CSF) along with an adapted Minkowski error pooling. The results show a high correlation between the proposed objective metric and the mean opinion score (MOS) given by observers. A comparison with relevant existing objective quality metrics is provided.
Chain of evidence generation for contrast enhancement in digital image forensics
The quality of images obtained by digital cameras has improved greatly since the early days of digital photography. Unfortunately, it is not unusual in image forensics to find wrongly exposed pictures. This is mainly due to obsolete techniques or old technologies, but also due to backlight conditions. To extrapolate invisible details, a stretching of the image contrast is obviously required. The forensics rules for producing evidence require complete documentation of the processing steps, enabling replication of the entire process. The automation of enhancement techniques is thus quite difficult and needs to be carefully documented. This work presents an automatic procedure to find contrast enhancement settings, allowing both image correction and automatic script generation. The technique is based on a preprocessing step which extracts the features of the image and selects correction parameters. The parameters are then saved in JavaScript code that is used in the second step of the approach to correct the image. The generated script is Adobe Photoshop compliant (Photoshop being widely used in image forensics analysis), thus permitting replication of the enhancement steps. Experiments on a dataset of images are also reported, showing the effectiveness of the proposed methodology.
Media Processing and Services
A signal strength priority based position estimation for mobile platforms
Global Positioning System (GPS) products help users navigate while driving, hiking, boating, and flying. GPS uses a combination of orbiting satellites to determine position coordinates. This works well in most outdoor areas, but the satellite signals are not strong enough to penetrate most indoor environments. As a result, a new strain of indoor positioning technologies that make use of 802.11 wireless LANs (WLAN) is beginning to appear on the market. In WLAN positioning, the system either monitors propagation delays between wireless access points and wireless device users to apply trilateration techniques, or it maintains a database of location-specific signal fingerprints, which is used to identify the most likely match between incoming signal data and the fingerprints previously surveyed and saved in the database. In this paper we investigate the issue of deploying WLAN positioning software on mobile platforms with typically limited computational resources. We suggest a novel received-signal-strength rank-order-based location estimation system that reduces computational load while maintaining robust performance. The performance of the proposed system is compared to conventional approaches.
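A rank-order fingerprint match of the kind described can be sketched as follows. This is an illustrative reading of the approach with hypothetical data: only the ordering of access-point signal strengths is compared, which is cheap to compute and robust to device-specific RSS offsets.

```python
def rank(rss):
    # rank access points by signal strength (0 = strongest)
    order = sorted(range(len(rss)), key=lambda i: -rss[i])
    ranks = [0] * len(rss)
    for pos, i in enumerate(order):
        ranks[i] = pos
    return ranks

def rank_distance(observed, fingerprint):
    # Spearman footrule distance between the two rank vectors
    return sum(abs(a - b) for a, b in zip(rank(observed), rank(fingerprint)))

def locate(observed, survey):
    # pick the surveyed location whose RSS rank order best matches
    return min(survey, key=lambda loc: rank_distance(observed, survey[loc]))

# hypothetical 3-AP survey database (RSS in dBm) and a new observation
survey = {"lobby": [-40, -60, -70], "lab": [-70, -45, -50]}
where = locate([-42, -65, -72], survey)   # -> "lobby"
```

Because only integer ranks are compared, the matching step needs no floating-point signal model, which is what makes the approach attractive on resource-limited handsets.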
Seam carving with improved edge preservation
Johannes Kiess, Stephan Kopf, Benjamin Guthier, et al.
In this paper, we propose a new method to adapt the resolution of images to the limited display resolution of mobile devices. We use the seam carving technique to identify and remove less relevant content in images. Seam carving achieves a high adaptation quality for landscape images and distortions caused by the removal of seams are very low compared to other techniques like scaling or cropping. However, if an image depicts objects with straight lines or regular patterns like buildings, the visual quality of the adapted images is much lower. Errors caused by seam carving are especially obvious if straight lines become curved or disconnected. In order to preserve straight lines, our algorithm applies line detection in addition to the normal energy function of seam carving. The energy in the local neighborhood of the intersection point of a seam and a straight line is increased to prevent other seams from removing adjacent pixels. We evaluate our improved seam carving algorithm and compare the results with regular seam carving. In case of landscape images with no straight lines, traditional seam carving and our enhanced approach lead to very similar results. However, in the case of objects with straight lines, the quality of our results is significantly better.
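The energy-boosting idea can be sketched on top of a basic dynamic-programming seam carver. This is a minimal illustration, assuming a precomputed binary mask of detected line pixels; the mask adds a large penalty to the energy map so the minimum-cost seam routes around straight lines.

```python
import numpy as np

def energy(img, line_mask, boost=1000.0):
    # gradient-magnitude energy; pixels on detected straight lines get a
    # large additive boost so seams avoid them
    gy, gx = np.gradient(img.astype(np.float64))
    return np.abs(gx) + np.abs(gy) + boost * line_mask

def remove_vertical_seam(img, line_mask):
    e = energy(img, line_mask)
    h, w = e.shape
    # dynamic programming: accumulate minimum seam cost top to bottom
    cost = e.copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            cost[y, x] += cost[y - 1, lo:hi].min()
    # backtrack the minimum-cost seam from the bottom row
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(cost[-1].argmin())
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(cost[y, lo:hi].argmin())
    # remove one pixel per row along the seam
    out = np.zeros((h, w - 1), dtype=img.dtype)
    for y in range(h):
        out[y] = np.delete(img[y], seam[y])
    return out, seam
```

With `boost` set high enough, crossing a marked line pixel always costs more than any detour through ordinary image content, which is the behavior the line-preservation scheme relies on.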
Design of an H.264/SVC resilient watermarking scheme
The rapid dissemination of media technologies has led to an increase in unauthorized copying and distribution of digital media. Digital watermarking, i.e. embedding information in the multimedia signal in a robust and imperceptible manner, can tackle this problem. Recently, there has been a huge growth in the number of different terminals and connections that can be used to consume multimedia. To tackle the resulting distribution challenges, scalable coding is often employed. Scalable coding allows the adaptation of a single bit-stream to varying terminal and transmission characteristics. As a result of this evolution, watermarking techniques that are robust against scalable compression become essential in order to control illegal copying. In this paper, a watermarking technique resilient against scalable video compression using the state-of-the-art H.264/SVC codec is therefore proposed and evaluated.
Pointing into remote 3D environments
In anticipation of the proliferation of micro-projectors on our handheld imaging devices, we designed and tested a camera-projector system that allows a distant user to point into a remote 3D environment with a projector. The solution involves a means for locating a projected dot, and for adjusting its location to correspond to a position indicated by a remote user viewing the scene through a camera. It was designed to operate efficiently, even in the presence of camera noise. While many camera-projector display systems require a calibration phase, the presented approach allows calibration-free operation. The tracking algorithm is implemented with a modified 2D gradient descent method that performs even in the presence of spatial discontinuities. Our prototype was constructed using a standard web-camera and network to perform real-time tracking, navigating the projected dot across irregularly shaped and colored surfaces accurately. Our tests included a camera-projector system and client on either side of the Atlantic Ocean with no loss of responsiveness.
Mixed resolution framework for distributed multiview coding
Diogo C. Garcia, Camilo C. Dórea, Bruno Macchiavello, et al.
This work presents a new distributed multiview coding framework, based on the H.264/AVC standard operating with mixed resolution frames. It allows for a scalable complexity transfer from the encoder to the decoder, which is particularly suited for low-power video applications, such as multiview surveillance systems. Greater quality sequences are generated by exploiting the spatial and temporal correlation between views at the decoder. The results show a good potential for objective quality improvement over simulcast coding, with no extra rate cost.
Interactive Paper Session
Intelligent video surveillance with abandoned object detection and multiple pedestrian counting
Taekyung Kim, Joonki Paik
We present a novel intelligent video surveillance system with efficient detection of abandoned objects and counting of multiple pedestrians. In the proposed algorithm, the adaptively generated background makes it possible to handle illumination changes and occlusions. After building the adaptive background model, the counting procedure starts counting the detected objects. Experimental results show that the proposed system outperforms existing abandoned object detection and pedestrian counting methods.
How to secretly share the treasure map of the captain?
N. Islam, W. Puech, R. Brouzet
In this paper we present a new approach for sharing a secret image between l users, exploiting the additive homomorphic property of the Paillier algorithm. With a traditional approach, when a dealer wants to share an image between l players, the secret image must be sequentially encrypted l + 1 times using l + 1 keys (secret or public keys). When the dealer and the l players want to extract the secret image, they must decrypt sequentially, keeping the same order as in the encryption step, using l + 1 keys (secret or private). With the proposed approach, during the encryption step, each player encrypts his own secret image using the same public key given by the dealer, the dealer encrypts the secret image to be shared with the same key, and then the l secret encrypted images plus the encrypted image to be shared are multiplied together to get a scrambled image. After this step, the dealer can securely use the private key to decrypt this scrambled image into a new scrambled image which corresponds to the addition of the l + 1 original images, because of the additive homomorphic property of the Paillier algorithm. When the l players want to extract the secret image, they need neither the dealer nor any keys. Indeed, with our approach, to extract the secret image the l players only need to subtract their own secret images from the scrambled image. In this paper we illustrate our approach with the example of a captain who wants to share a secret treasure map between l pirates. Experimental results and security analysis show the effectiveness of the proposed scheme.
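The additive homomorphic property of Paillier that the protocol relies on, i.e. that multiplying two ciphertexts yields an encryption of the sum of the plaintexts, can be demonstrated with a toy key pair. The tiny primes below are for illustration only; a real deployment needs large primes.

```python
import random
from math import gcd

# toy Paillier key pair -- tiny primes, for illustration only
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1                                       # a standard simple choice

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)             # modular inverse mod n

def encrypt(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:                       # r must be a unit mod n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# multiplying ciphertexts adds the hidden plaintexts
c = (encrypt(17) * encrypt(25)) % n2
assert decrypt(c) == 42
```

In the protocol above, this is what lets the dealer decrypt the product of all encrypted images into the pixel-wise sum of the l + 1 originals without ever seeing the individual shares.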
Extending the Clark-Wilson security model for digital long-term preservation use-cases
A continuously growing amount of today's information not only exists in digital form but was actually born digital. This information needs to be preserved as part of our cultural and scientific heritage or because of legal requirements. As much of this information is born digital, it has no analog original and cannot be preserved by traditional means without losing its original representation. Digital long-term preservation thus becomes increasingly important and is tackled by several international and national projects such as the US National Digital Information Infrastructure and Preservation Program [1], the German NESTOR project [2], and the EU FP7 SHAMAN Integrated Project [3]. In digital long-term preservation, the integrity and authenticity of the preserved information are of great importance and are challenging to enforce over a long time, often assumed to be at least 100 years. In a previous work [4] we therefore showed the general feasibility of the Clark-Wilson security model [5] for digital long-term preservation, in combination with a syntactic and semantic verification approach [6], to tackle these issues. In this work we conduct a more detailed investigation and show, by example, the influence of applying such a security model on the use cases and roles of a digital long-term preservation environment. Our goal is a scalable security model, i.e. one with no fixed limitations on usable operations, users, and objects, mainly for preserving the integrity of objects but also for ensuring their authenticity.
Fast motion vector recovery algorithm in H.264 video streams
Kavish Seth, V. Kamakoti, S. Srinivasan
This paper proposes a fast statistical approach to recover lost motion vectors in the H.264 video coding standard. Unlike other video coding standards, the motion vectors of H.264 cover a smaller area of the video frame being encoded. This leads to a strong correlation between neighboring motion vectors, making the H.264 standard amenable to statistical analysis for recovering lost motion vectors. This paper proposes a Pearson Correlation Coefficient based matching algorithm that speeds up the recovery of lost motion vectors with very little compromise in the visual quality of the recovered video. To the best of our knowledge, this is the first attempt to employ the correlation coefficient for motion vector recovery. Experimental results obtained by applying the proposed algorithm to standard benchmark video sequences show that it yields comparable quality of recovered video with significantly less computation than the best reported in the literature, making it suitable for real-time applications.
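As a hedged sketch of correlation-based matching (a hypothetical boundary-matching variant, not the paper's exact algorithm): candidate vectors taken from neighbouring macroblocks are scored by the Pearson correlation between the pixels just above the lost block and the corresponding pixels in the reference frame, and the best-correlated candidate is kept.

```python
import numpy as np

def pearson(a, b):
    # Pearson correlation coefficient of two equal-length pixel rows
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def recover_mv(ref, cur, x, y, size, candidates):
    """Score each candidate (dx, dy) by correlating the row of pixels
    just above the lost block with the same row displaced by the
    candidate motion vector in the reference frame."""
    top = cur[y - 1, x:x + size]   # available pixels above the lost block
    best, best_r = candidates[0], -2.0
    for dx, dy in candidates:
        cand = ref[y - 1 + dy, x + dx:x + dx + size]
        r = pearson(top, cand)
        if r > best_r:
            best, best_r = (dx, dy), r
    return best
```

Restricting the search to a handful of candidates inherited from neighbouring macroblocks is what keeps the cost low compared to a full block-matching search.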
Adaptive down-sampling video coding
Ren-Jie Wang, Ming-Chen Chien, Pao-Chi Chang
Down-sampling coding, which sub-samples the image and encodes the smaller images, is one solution for raising image quality at insufficiently high rates. In this work, we propose Adaptive Down-Sampling (ADS) coding for H.264/AVC. The overall system distortion can be analyzed as the sum of the down-sampling distortion and the coding distortion. The down-sampling distortion is mainly the loss of high frequency components and is highly dependent on the spatial difference. The coding distortion can be derived from classical rate-distortion theory. For a given rate and video sequence, the optimum down-sampling resolution ratio can be derived by using optimization theory to minimize the system distortion based on the models of the two distortions. This optimal resolution ratio is used in both the down-sampling and up-sampling processes of the ADS coding scheme. As a result, the rate-distortion performance of ADS coding is consistently higher than that of fixed-ratio coding or H.264/AVC, by 2 to 4 dB at low to medium rates.
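The ratio selection can be sketched numerically with hypothetical distortion models. The constants and functional forms below are illustrative assumptions, not the paper's fitted models: down-sampling loss grows as the ratio shrinks, coding loss follows the classical exponential rate-distortion curve, and the best ratio minimizes their sum.

```python
def total_distortion(r, rate, S=120.0, sigma2=400.0, npix=101376):
    # hypothetical models: S*(1-r)^2 for lost high frequencies,
    # classical sigma^2 * 2^(-2*bpp) for coding distortion
    d_down = S * (1.0 - r) ** 2
    bpp = rate / (r * r * npix)          # bits per pixel after down-sampling
    d_code = sigma2 * 2.0 ** (-2.0 * bpp)
    return d_down + d_code

def best_ratio(rate):
    # exhaustive search over candidate down-sampling ratios 0.30 .. 1.00
    candidates = [round(0.3 + 0.01 * i, 2) for i in range(71)]
    return min(candidates, key=lambda r: total_distortion(r, rate))
```

Under this model, a low target rate pushes the optimum toward stronger down-sampling (the saved bits per pixel outweigh the high-frequency loss), while at high rates the optimum approaches full resolution, which matches the qualitative behavior the abstract describes.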
Sharp, bright, three-dimensional: open profiling of quality for mobile 3DTV coding methods
Dominik Strohmeier, Gerhard Tech
The choice of the right coding method is a critical factor in the development of mobile 3D television and video. Several coding methods are available, each based on a different approach. These differences lead to method-specific artefacts; content and bit rate are also important parameters for performance. In our study, we evaluated Simulcast, Multi View Coding, Mixed Resolution Stereo Coding, and Video + Depth Coding. Each method was optimized at a high and a low bit rate using parameters typical for mobile devices. The goal of the study was to gain knowledge about the optimum coding method for mobile 3DTV, but also about the underlying rationale of quality perception. We used Open Profiling of Quality (OPQ) for the comparison. OPQ combines quantitative rating and sensory profiling of the content. This allowed us to obtain a preference order of the coding methods and additional individual quality factors that were formed into a quality model. The results show that MVC and V+D outperform the other two approaches, but content itself remains an important factor.
Influence of camera parameters on the quality of mobile 3D capture
We investigate the effect of camera de-calibration on the quality of depth estimation. A dense depth map is a format particularly suitable for mobile 3D capture (scalable and screen independent). However, in real-world scenarios cameras might move (through vibration or temperature-induced bending) from their designated positions. For the experiments, we create a test framework, described in the paper. We investigate how mechanical changes affect four different stereo-matching algorithms. We also assess how different geometric corrections (none, motion-compensation-like, full rectification) affect the estimation quality, i.e. how much offset can still be compensated with a "crop" over a larger CCD. Finally, we show how the estimated camera pose change (E) relates to stereo-matching performance, which can be used as a "rectification quality" measure.
Chrominance watermark for mobile applications
Alastair Reed, Eliot Rogers, Dan James
Creating an imperceptible watermark which can be read by a broad range of cell phone cameras is a difficult problem. The problems are caused by the inherently low resolution and noise levels of typical cell phone cameras. The quality limitations of these devices compared to a typical digital camera are caused by the small size of the cell phone and cost trade-offs made by the manufacturer. In order to achieve this, a low resolution watermark is required which can be resolved by a typical cell phone camera. The visibility of a traditional luminance watermark was too great at this lower resolution, so a chrominance watermark was developed. The chrominance watermark takes advantage of the relatively low sensitivity of the human visual system to chrominance changes. This enables a chrominance watermark to be inserted into an image which is imperceptible to the human eye but can be read using a typical cell phone camera. Sample images will be presented showing images with a very low visibility which can be easily read by a typical cell phone camera.
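The chrominance-only embedding idea can be sketched with a BT.601 colour transform. This is a generic illustration, not the authors' actual method: the watermark pattern is added to the Cb channel, and because chroma-only changes leave the luma component of the inverse transform untouched, the luminance of the marked image is essentially unchanged.

```python
import numpy as np

def embed_chroma_watermark(rgb, pattern, strength=4.0):
    """Add a low-amplitude watermark `pattern` to the Cb channel only."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # BT.601 RGB -> YCbCr
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    cb += strength * pattern          # the mark lives in chrominance only
    # YCbCr -> RGB
    r2 = y + 1.402 * (cr - 128.0)
    g2 = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b2 = y + 1.772 * (cb - 128.0)
    return np.clip(np.stack([r2, g2, b2], axis=-1), 0, 255)
```

Recomputing Y on the marked image recovers the original luma to within rounding, while Cb carries the embedded pattern; this separation is exactly what lets the mark stay imperceptible yet remain detectable by correlating against the known pattern.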
User-centered quality of experience of mobile 3DTV: how to evaluate quality in the context of use?
Satu Jumisko-Pyykkö, Timo Utriainen
Subjective quality evaluation experiments are conducted for optimizing critical system components during the process of system development. Conventionally, the experiments take place in controlled viewing conditions even though the target application is meant to be used in heterogeneous mobile settings. The goal of the paper is two-fold. First, we present a hybrid User-Centered Quality of Experience (UC-QoE) evaluation method for measuring quality in the context of use. The method combines quantitative preference ratings, qualitative descriptions of quality and context, characterization of context at the macro and micro levels, and measures of effort. Second, we present the results of two experiments using this method in different field settings, compared to laboratory settings. We conducted the experiments with a relatively low quality range for current and future mobile (3D) television data rates by varying the encoding parameters of simulcast stereo video. The study was conducted on a portable device with parallax barrier display technology. The results show significant differences between the different field conditions and between field and laboratory measures.