Proceedings Volume 8667

Multimedia Content and Mobile Devices


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 26 March 2013
Contents: 18 Sessions, 57 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2013
Volume Number: 8667

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • JEI Focal Track Presentations
  • Special Session on Multimedia Event Detection
  • Semantic Multimedia Content Analysis
  • Bay Area Multimedia and Beyond
  • Interactive Paper Session
  • Session 4
  • Session 5
  • Session 6
  • Interactive Paper Session
  • Object Classification and Remote Sensing I
  • Object Classification and Remote Sensing II
  • Image Quality Evaluation Methods/Standards for Mobile and Digital Photography I: Joint Session with Conferences 8653, 8660, and 8667C
  • Image Quality Evaluation Methods/Standards for Mobile and Digital Photography II: Joint Session with Conferences 8653, 8660, and 8667C
  • Keynote Session
  • Plenoptic Cameras: Theory
  • Image Processing
  • Plenoptic Cameras: Depth of Field
  • Image Tracking and Stabilization
  • Front Matter: Volume 8667
JEI Focal Track Presentations
Introduction to the JEI Focal Track Presentations
Todor Georgiev, Andrew Lumsdaine
A special section of the Journal of Electronic Imaging (JEI) will replace the conference proceedings for papers presented at the SPIE conference on Mobile Computational Photography (SPIE Conference 8667D). The papers will be published early in 2013 and can be viewed at http://electronicimaging.spiedigitallibrary.org, Vol. 22 (2013).
On the application of the plenoptic camera to mobile phones
Icíar Montilla, Marta Puga, J. G. Marichal-Hernandez, et al.
The plenoptic camera was originally created to allow the capture of the Light Field, a four-variable volume representation of all rays and their directions, which allows the creation by synthesis of a 3D image of the observed object. This method has several advantages over 3D capture systems based on stereo cameras, since it needs neither frame synchronization nor geometric and color calibration, and it has many applications, from 3DTV to medical imaging. A plenoptic camera uses a microlens array to measure the radiance and direction of all the light rays in a scene. The array is placed at the focal plane of the objective lens, and the sensor is at the focal plane of the microlenses. In this paper we study the application of our super resolution algorithm to mobile phone cameras. With a commercial camera, it is already possible to obtain images of good resolution and a sufficient number of refocused planes, simply by placing a microlens array in front of the detector.
Special Session on Multimedia Event Detection
Sparse conditional mixture model: late fusion with missing scores for multimedia event detection
Ramesh Nallapati, Eric Yeh, Gregory Myers
The problem of event detection in multimedia clips is typically handled by modeling each of the component modalities independently, then combining their detection scores in a late fusion approach. One of the problems of a late fusion model in the multimedia setting is that the detection scores may be missing from one or more components for a given clip, e.g., when there is no speech in the clip or when there is no overlay text. Standard fusion techniques typically address this problem by assuming a default backoff score for a component when its detection score is missing for a clip. This may potentially bias the fusion model, especially if there are many missing detections from a given component. In this work, we present the Sparse Conditional Mixture Model (SCMM), which models only the observed detection scores for each example, thereby avoiding the assumptions about score distributions that backoff models make. Our experiments in multimedia event detection using the TRECVID-2011 corpus demonstrate that SCMM achieves statistically significant performance gains over standard late fusion techniques. The SCMM model is very general and is applicable to fusion problems with missing data in any domain.
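The key idea of fusing only observed scores can be illustrated with a minimal sketch (this is not the SCMM itself, which learns a sparse conditional mixture; the weights and scores below are illustrative):

```python
import numpy as np

def fuse_observed(scores, weights):
    """Fuse only the observed component scores per clip (NaN = missing):
    component weights are renormalized over the observed components,
    so no default backoff score is ever substituted."""
    scores = np.asarray(scores, dtype=float)   # shape: (n_clips, n_components)
    w = np.where(np.isnan(scores), 0.0, np.asarray(weights, dtype=float))
    w /= w.sum(axis=1, keepdims=True)          # renormalize per clip
    return np.nansum(w * scores, axis=1)

# Clip 1 has no speech score; clip 2 has no overlay-text score.
print(fuse_observed([[0.9, np.nan, 0.4], [0.2, 0.7, np.nan]], [0.5, 0.3, 0.2]))
```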
Can object detectors aid Internet video event retrieval?
Davide Modolo, Cees G. M. Snoek
The problem of event representation for automatic event detection in Internet videos is acquiring increasing importance, owing to its relevance to a large number of applications. Existing methods focus on representing events in terms of either low-level descriptors or domain-specific models suited only for a limited class of videos, ignoring the high-level meaning of the events. Ultimately aiming for a more robust and meaningful representation, in this paper we question whether object detectors can aid video event retrieval. We propose an experimental study that investigates the utility of present-day local and global object detectors for video event search. By evaluating object detectors optimized for high-quality photographs on low-quality Internet video, we establish that present-day detectors can successfully be used for recognizing objects in web videos. We use an object-based representation to re-rank the results of an appearance-based event detector. Results on the challenging TRECVID multimedia event detection corpus demonstrate that objects can indeed aid event retrieval. While much remains to be studied, we believe that our experimental study is a first step towards revealing the potential of object-based event representations.
Multimedia event detection using visual concept signatures
Ehsan Younessian, Michael Quinn, Teruko Mitamura, et al.
Multimedia Event Detection (MED) is a multimedia retrieval task with the goal of finding videos of a particular event in a large-scale Internet video archive, given example videos and text descriptions. In this paper, we mainly focus on an 'ad-hoc' scenario in MED where we do not use any example video. We aim to retrieve test videos based on their visual semantics using a Visual Concept Signature (VCS) generated for each event, derived only from the event description provided as the query. Visual semantics are described using the Semantic INdexing (SIN) feature, which represents the likelihood of predefined visual concepts in a video. To generate a VCS for an event, we project the given event description onto a visual concept list using the proposed textual semantic similarity. Exploiting SIN feature properties, we harmonize the generated visual concept signature and the SIN feature to improve retrieval performance. We conduct different experiments to assess the quality of the generated visual concept signatures with respect to human expectation, and in the context of the MED task using the SIN feature of videos in the test dataset when we have no or only very few training videos.
Semantic Multimedia Content Analysis
A fast approach for integrating ORB descriptors in the bag of words model
Costantino Grana, Daniele Borghesani, Marco Manfredi, et al.
In this paper we propose to integrate the recently introduced ORB descriptors into the currently favored approach for image classification, that is, the Bag of Words model. In particular, the problem to be solved is to provide a clustering method able to deal with the binary-string nature of the ORB descriptors. We suggest a k-means-like approach, called k-majority, substituting Hamming distance for Euclidean distance and using the bitwise majority vote of the assigned descriptors as the new cluster center. Results combining this new approach with other features are provided over the ImageCLEF 2011 dataset.
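A minimal sketch of the k-majority step, assuming descriptors stored as unpacked 0/1 bit arrays (parameter names and defaults are illustrative, not from the paper):

```python
import numpy as np

def k_majority(descriptors, k, iters=20, seed=None):
    """k-means-style clustering for binary descriptors (e.g., 256-bit ORB).

    Hamming distance replaces Euclidean distance, and the new cluster
    center is the bitwise majority vote of the assigned descriptors.
    `descriptors` is an (n, d) uint8 array of 0/1 bits.
    """
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Hamming distance = number of differing bit positions
        dists = (descriptors[:, None, :] != centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                # per-bit majority vote becomes the new center
                centers[j] = (members.mean(axis=0) >= 0.5).astype(np.uint8)
    return centers, labels
```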
Video-based analysis of motion skills in simulation-based surgical training
Qiang Zhang, Lin Chen, Qiongjie Tian, et al.
Analysis of motion expertise is an important problem in many domains including sports and surgery. In recent years, surgical simulation has emerged at the forefront of new technologies for improving the education and training of surgical residents. In simulation-based surgical training, a key task is to rate the performance of the operators, which is currently done by senior surgeons. This paper introduces a novel solution to this problem through employing vision-based techniques. We develop an automatic, video-based approach to analyzing the motion skills of a surgeon in simulation-based surgical training, where a surgical action is captured by multiple video cameras with little or no calibration, resulting in multiple video streams of heterogeneous properties. Typical multiple-view vision techniques are inadequate for processing such data. We propose a novel approach that employs both canonical correlation analysis (CCA) and the bag-of-words model to classify the expertise level of the subject based on the heterogeneous video streams capturing both the motion of the subject's hands and the resultant motion of the tools. Experiments were designed and performed to validate the proposed approach using realistic data captured from resident surgeons in local hospitals. The results suggest that the proposed approach may provide a promising practical solution to the real-world problem of evaluating motion skills in simulation-based surgical training.
Exploiting visual search theory to infer social interactions
Paolo Rota, Duc-Tien Dang-Nguyen, Nicola Conci, et al.
In this paper we propose a new method to infer human social interactions using typical techniques adopted in the literature for visual search and information retrieval. The main piece of information we use to discriminate among different types of interactions is provided by proxemics cues acquired by a tracker, and it is used to distinguish between intentional and casual interactions. The proxemics information is acquired through the analysis of two different metrics: on the one hand we observe the current distance between subjects, and on the other hand we measure the O-space synergy between subjects. The obtained values are taken at every time step over a temporal sliding window and processed in the Discrete Fourier Transform (DFT) domain. The features are eventually merged into a single array and clustered using the K-means algorithm. The clusters are reorganized, using a second, larger temporal window, into a Bag of Words framework, so as to build the feature vector that feeds the SVM classifier.
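A sketch of the feature-extraction stage under stated assumptions (the window length, hop size, and use of DFT magnitudes are illustrative choices; the K-means, Bag of Words, and SVM stages are standard and omitted):

```python
import numpy as np

def interaction_features(distances, synergies, win=32, hop=8):
    """Per-window DFT features from the two proxemics cues.

    `distances` and `synergies` are 1-D arrays sampled at each time step.
    For each sliding window, the DFT magnitudes of both cues are
    concatenated into a single feature vector, as the paper describes.
    """
    feats = []
    for start in range(0, len(distances) - win + 1, hop):
        d = np.abs(np.fft.rfft(distances[start:start + win]))
        s = np.abs(np.fft.rfft(synergies[start:start + win]))
        feats.append(np.concatenate([d, s]))
    return np.asarray(feats)   # rows feed K-means, then BoW, then the SVM
```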
Bay Area Multimedia and Beyond
Presentation video retrieval using automatically recovered slide and spoken text
Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.
VidCat: an image and video analysis service for personal media management
Lee Begeja, Eric Zavesky, Zhu Liu, et al.
Cloud-based storage and consumption of personal photos and videos provides increased accessibility, functionality, and satisfaction for mobile users. One cloud-service frontier that has recently been growing is personal media management. This work presents a system called VidCat that assists users in the tagging, organization, and retrieval of their personal media by faces and visual content similarity, time, and date information. Evaluations of the effectiveness of the copy detection and face recognition algorithms on standard datasets are also discussed. Finally, the system includes a set of application programming interfaces (APIs) allowing content to be uploaded, analyzed, and retrieved on any client with simple HTTP-based methods, as demonstrated with a prototype developed on the iOS and Android mobile platforms.
Interactive Paper Session
Audio stream classification for multimedia database search
M. Artese, S. Bianco, I. Gagliardi, et al.
Search and retrieval over huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which to perform the subsequent search. In particular, when new entries are continuously added to the database, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2], which is mainly composed of popular songs and other audio records describing popular traditions handed down from generation to generation, such as traditional fairs and customs. The peculiarities of this database are that it is continuously updated, the audio recordings are acquired in unconstrained environments, and it is difficult for a non-expert human user to create the ground-truth labels. In our experiments, half of all the available audio files were randomly extracted and used as the training set; the remaining ones were used as the test set. The classifier was trained to distinguish among three different classes: speech, music, and song. All the audio files in the dataset had previously been manually labeled by domain experts into the three classes defined above.
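A minimal sketch of a CART-style classifier for this task using scikit-learn, with stand-in features (the paper's actual audio features are not specified in the abstract):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stand-in data: rows are per-file audio feature vectors, labels are
# 0=speech, 1=music, 2=song.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = rng.integers(0, 3, size=300)

# Half the files for training, half for testing, as in the paper's setup.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
clf = DecisionTreeClassifier()  # CART: fast threshold-based decisions
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```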
Structuring a sharded image retrieval database
Eric Liang, Avideh Zakhor
In previous work we described an approach to localization based on image retrieval. Specifically, we assume coarse localization based on GPS or cell tower and refine it by matching a user-generated image query to a geotagged image database. We partition the image dataset into overlapping cells, each of which contains its own approximate nearest-neighbors search structure. By combining search results from multiple cells as specified by the coarse localization, we have demonstrated superior retrieval accuracy on a large image database covering downtown Berkeley. In this paper, we investigate how to select the parameters of such a system, e.g., the size and spacing of the cells, and show how the combination of many cells outperforms a single search structure over a large region.
Diversification of visual media retrieval results using saliency detection
Oleg Muratov, Giulia Boato, Francesco G. B. De Natale
Diversification of retrieval results allows for better and faster search. Recently, different methods for diversifying image retrieval results have been proposed, mainly utilizing text information and techniques imported from the natural language processing domain. However, images contain visual information that is impossible to describe in text, so the use of visual features is inevitable. Visual saliency is information about the main object of an image implicitly included by humans while creating visual content. For this reason it is natural to exploit this information for the task of diversifying the content. In this work we study whether visual saliency can be used for the task of diversification and propose a method for re-ranking image retrieval results using saliency. The evaluation has shown that the use of saliency information results in higher diversity of retrieval results.
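One plausible way to re-rank by saliency, sketched under the assumption that each image is summarized by a descriptor of its salient region (the paper's actual re-ranking rule may differ):

```python
import numpy as np

def rerank_by_saliency(ranked_ids, saliency_descs, top_n=20):
    """Greedy diversification: among the top results, repeatedly promote
    the image whose salient-region descriptor is least similar to the
    ones already selected. `saliency_descs` is an (n, d) array indexed
    by image id; `ranked_ids` is the relevance-ordered id list."""
    selected = [ranked_ids[0]]
    remaining = list(ranked_ids[1:top_n])
    while remaining:
        def novelty(i):
            d = saliency_descs[i]
            return min(np.linalg.norm(d - saliency_descs[j]) for j in selected)
        nxt = max(remaining, key=novelty)   # most novel salient content next
        selected.append(nxt)
        remaining.remove(nxt)
    return selected + list(ranked_ids[top_n:])
```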
Session 4
Enabling customer self service through image processing on mobile devices
Ingmar Kliche, Sascha Hellmann, Jörn Kreutel
Our paper will outline the results of a research project that employs image processing for the automatic diagnosis of technical devices whose internal state is communicated through visual displays. In particular, we developed a method for detecting exceptional states of retail wireless routers, analysing the state and blinking behaviour of the LEDs that make up most routers’ user interface. The method was made configurable by means of abstracting away from a particular device’s display properties, thus being able to analyse a whole range of different devices whose displays are covered by our abstraction. The method of analysis and its configuration mechanism were implemented as a native mobile application for the Android Platform. It employs the local camera of mobile devices for capturing a router’s state, and uses overlaid visual hints for guiding the user toward that perspective from where an analysis is possible.
Cognitive styles and visual quality
Satu Jumisko-Pyykkö, Dominik Strohmeier
Assessors are the main measurement instruments in subjective quality evaluation studies. Although perceptual abilities and constraints are influenced by multiple demographic and psychographic factors, these are typically disregarded as a part of quality assessment. Cognitive styles refer to an individual's consistent approaches to organizing and representing information. The goal of this study is to explore the impact of cognitive styles on visual quality requirements. The data collection was conducted using the Style of Processing (SOP) questionnaire in three video quality experiments with a total of 72 participants. All participants were categorized into four groups according to sensorial preferences in information processing (visual, verbal, bimodal - high processing, and bimodal - low processing). The experiments were conducted under controlled circumstances, varying depth in video quality with several content types on a mobile device. The results show variation in quality requirements between these groups. Finally, these results also indicate that sensorial processing styles are essential to explore for sample characterization in quality evaluation experiments and for exploring more user-aware quality adjustments in future services and products.
Subjective evaluation of HEVC in mobile devices
Ray Garcia, Hari Kalva
Mobile compute environments provide a unique set of user needs and expectations that designers must consider. With increased multimedia use in mobile environments, video encoding methods within the smartphone market segment are key factors that contribute to a positive user experience. Currently available display resolutions and expected cellular bandwidth are major factors the designer must consider when determining which encoding methods should be supported, with the goals of maximizing the consumer experience, reducing cost, and reducing time to market. This paper presents a comparative evaluation of the quality of user experience when the HEVC and AVC/H.264 video coding standards are used. The goal of the study was to evaluate any improvements in user experience when using HEVC. Subjective comparisons were made between the H.264/AVC and HEVC encoding standards in accordance with the double-stimulus impairment scale (DSIS) as defined by ITU-R BT.500-13. Test environments are based on smartphone LCD resolutions and expected cellular bit rates, such as 200 kbps and 400 kbps. Subjective feedback shows both encoding methods are adequate at a 400 kbps constant bit rate. However, a noticeable consumer experience gap was observed at 200 kbps. Significantly lower H.264 subjective quality was noticed with video sequences that have multiple moving objects and no single point of visual attraction. Video sequences with single points of visual attraction or few moving objects tended to have higher H.264 subjective quality.
Session 5
Location-based tracking using long-range passive RFID and ultrawideband communications
Faranak Nekoogar, Farid Dowla
Reliable positioning capability is a crucial need for first responders in emergency and disaster situations. Lack of a dependable positioning system can result in disruptions in situational awareness between the local responders in the field and the emergency command and control centers. Indoor localization and navigation pose many challenges for search and rescue teams (e.g., firefighters), such as the inability to determine their exact location and to communicate with the incident commander outside the building. Although RF navigation and tracking systems have many advantages over other technologies, the harsh indoor RF environment demands new ways of developing and using RF sensor and communication systems. A recently proposed approach [1-4] employs passive RFID for geo-location and tracking of a first responder. However, because conventional passive RFID tags have limited communication ranges, a very large number of these tags would be required to fully cover a large multi-storied building without any dead spots. Another technical challenge for conventional RF communications is the transmission of data from the mobile RFID platform (the tag reader) to the outside command and control node, as the building's walls impose challenges such as attenuation and multipath. In this paper, we introduce a mobile platform architecture that makes optimal use of long-range passive tags and takes advantage of the frequency diversity of ultra-wideband (UWB) communication systems for a reliable, robust, and yet low-cost infrastructure.
Real-time content-aware video retargeting for tunnel vision assistance
Thomas Knack, Andreas Savakis
Image and video retargeting technologies are effective means of resizing media for aspect ratio constrained applications. In this paper, a real-time video retargeting model is proposed for smartphone implementation and potential application to low vision assistance for individuals with tunnel vision. Seam carving is a content-aware retargeting operator which defines 8-connected horizontal or vertical paths, or seams of pixels. The optimality of these seams is based on a specific energy function. Seam removal permits changes in the aspect ratio while simultaneously preserving important regions. This paper introduces a video retargeting model that incorporates spatial and temporal considerations to preserve visual integrity. Face detection masks and salience maps are used to achieve more comprehensive results. Additionally, formulation of a novel temporal coherence measure is presented that allows for retargeting on streaming video in real-time. Integration of the model with a mobile platform emphasizes its portability.
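A minimal sketch of the seam-carving core: dynamic programming over an energy map to find the least-energy 8-connected vertical seam (the energy function, face masks, and temporal coherence measure are omitted):

```python
import numpy as np

def find_vertical_seam(energy):
    """Return the 8-connected vertical seam of least total energy:
    seam[y] is the column to remove in row y."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]   # predecessor at x-1
        up = cost[y - 1]                          # predecessor at x
        right = np.r_[cost[y - 1, 1:], np.inf]    # predecessor at x+1
        cost[y] += np.minimum(np.minimum(left, up), right)
    # backtrack from the cheapest bottom pixel
    seam = [int(cost[-1].argmin())]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam.append(lo + int(cost[y, lo:hi].argmin()))
    return seam[::-1]
```

Removing one such seam narrows the frame by one pixel while preserving high-energy (important) regions; repeated removal changes the aspect ratio.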
Human movement activity classification approaches that use wearable sensors and mobile devices
Sahak Kaghyan, Hakob Sarukhanyan, David Akopian
Cell phones and other mobile devices have become part of human culture and are changing activity and lifestyle patterns. Mobile phone technology continuously evolves and incorporates more and more sensors for enabling advanced applications. The latest generations of smartphones incorporate GPS and WLAN location-finding modules, vision cameras, microphones, accelerometers, temperature sensors, etc. The availability of these sensors in mass-market communication devices creates exciting new opportunities for data mining applications; healthcare applications exploiting built-in sensors are particularly promising. This paper reviews different approaches to human activity recognition.
Concept for practical exercises for studying autonomous flying robots in a university environment: Part I
Ricardo Band, Johann Pleban, Stefan Schön, et al.
The aim of this paper is to demonstrate the usefulness of a concept of practical exercises for studying autonomous flying robots for computer science students in a university environment. It shows how students may assemble, program, fly, network, and apply autonomous flying robots (e.g., drones, quadrocopters, hexacopters, octocopters, helicopters, helicams, and bugbots) in different exercises, improving their skills and their theoretical and practical knowledge in different aspects.
Applications of multimedia technology on autonomous flying robots for university technology transfer projects
Stefan Schön, Ricardo Band, Johann-Sebastian Pleban, et al.
The aim of this study is to provide an overview of the wide range of potential applications of multimedia technology in autonomous flying robots in technology transfer projects between universities and industry. In particular it describes the current status in industry and science, and depicts their potential in strengthening the links between universities and industry.
Session 6
Digitized forensics: retaining a link between physical and digital crime scene traces using QR-codes
The digitization of physical traces from crime scenes in forensic investigations in effect creates a digital chain-of-custody and entails the challenge of creating a link between the two or more representations of the same trace. In order to be forensically sound, the two security aspects of integrity and authenticity especially need to be maintained at all times. Ensuring authenticity by technical means proves to be a particular challenge at the boundary between the physical object and its digital representations. In this article we propose a new method of linking physical objects with their digital counterparts using two-dimensional bar codes and additional meta-data accompanying the acquired data, for integration into the conventional documentation of the collection of items of evidence (the bagging and tagging process). Using the QR code as an exemplary bar code implementation and a model of the forensic process, we also supply a means to integrate our suggested approach into forensically sound proceedings as described by Holder et al.1 We use the example of digital dactyloscopy as a forensic discipline where progress is currently being made by digitizing some of the processing steps. We show an exemplary demonstrator of the suggested approach using a smartphone as a mobile device for the verification of the physical trace, extending the chain-of-custody from the physical to the digital domain. Our evaluation of the demonstrator addresses the readability of the code and the verification of its contents. We can read the bar code, despite its limited size of 42 x 42 mm and rather large amount of embedded data, using various devices. Furthermore, the QR code's error correction features help to recover the contents of damaged codes. Finally, our appended digital signature allows for detecting malicious manipulations of the embedded data.
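A sketch of generating such a physical-trace label, assuming hypothetical metadata fields and file names (a real deployment would also append a digital signature over the payload, as the paper describes):

```python
import hashlib
import json
import qrcode  # third-party package: pip install qrcode[pil]

# Hypothetical metadata accompanying a digitized fingerprint trace.
scan_bytes = open("trace_scan.png", "rb").read()   # assumed scan file
record = {
    "trace_id": "T-0042",                          # illustrative id
    "acquired": "2013-01-15T10:30:00Z",
    # hash links the physical label to the exact digital representation
    "sha256": hashlib.sha256(scan_bytes).hexdigest(),
}
# QR error correction helps recover the contents of damaged codes.
qrcode.make(json.dumps(record)).save("trace_label.png")
```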
Smart apps for applied machine learning on mobile devices: the MOMO project
Stefan Edlich, Mathias Vogler
The MOMO project consists of two parts: Mobile Computing and ECO Mobility. We present research results from a subproject of the Mobile Computing part. In particular, we develop smart apps for modern smartphones. These applications gather data from their owners to be filtered and processed centrally or in a distributed fashion. The applications target persons or streams of visitors at events such as concerts and amusement parks, or in other public buildings. The smart apps process this data, visualize it, and, most of all, allow intelligent prediction of user behavior together with recommendations. We therefore discuss visualization strategies as well as the underlying machine learning concepts and applications on the mobile and on the server side.
Real-time volume rendering of digital medical images on an iOS device
Christian Noon, Joseph Holub, Eliot Winer
Performing high-quality 3D visualizations on mobile devices, while tantalizingly close in many areas, is still a quite difficult task. This is especially true for 3D volume rendering of digital medical images. Achieving it would give medical personnel a powerful tool to diagnose and treat patients and to train the next generation of physicians. This research focuses on performing real-time volume rendering of digital medical images on iOS devices using custom-developed GPU shaders for orthogonal texture slicing. An interactive volume renderer was designed and developed with several new features, including dynamic modification of render resolutions, an incremental render loop, a shader-based clipping algorithm to support OpenGL ES 2.0, and an internal backface culling algorithm for properly sorting rendered geometry with alpha blending. The application was developed using several application programming interfaces (APIs), with OpenSceneGraph (OSG) as the primary graphics renderer, coupled with iOS Cocoa Touch for user interaction and DCMTK for DICOM I/O. The developed application rendered volume datasets of over 450 slices at up to 50-60 frames per second, depending on the specific model of the iOS device. All rendering is done locally on the device, so no Internet connection is required.
MessageSpace: a messaging system for health research
Rodrigo D. Escobar, David Akopian, Deborah Parra-Medina, et al.
Mobile Health (mHealth) has emerged as a promising direction for the delivery of healthcare services via mobile communication devices such as cell phones. Examples include texting-based interventions for chronic disease monitoring, diabetes management, control of hypertension, smoking cessation, monitoring medication adherence, appointment keeping, and medical test result delivery, as well as improving patient-provider communication, health information communication, data collection, and access to health records. While existing messaging systems support bulk messaging and some polling applications very well, they are not designed for the data collection and processing needs of health research studies. For that reason, known studies based on text-messaging campaigns have been constrained in participant numbers. In order to empower healthcare promotion and education research, this paper presents a system dedicated to healthcare research. It is designed for convenient communication with various study groups, feedback collection, and automated processing.
Multi-resolution edge detection with edge pattern analysis
Edge detection is defined as the process of detecting and representing the presence and locations of image signal discontinuities; it serves as the basic transformation of signals into symbols and influences the performance of subsequent processing. In general, the edge detection operation has two main steps: filtering, and detection and localization. In the first step, finding an optimal scale for the filter is an ill-posed problem, especially when a single, global scale is used over the entire image. A multi-resolution description of the image, which can fully represent the image features occurring across a range of scales, is used, where a combination of Gaussian filters with different scales can ameliorate the single-scale issue. In the second step, edge detectors have often been designed to capture simple ideal step functions in image data, but real image signal discontinuities deviate from this ideal form. Three further types of deviation from the step function, which relate to real distortions occurring in natural images, are examined: impulse, ramp, and sigmoid functions, which respectively represent narrow line signals, simplified blur effects, and more accurate blur modeling. General rules for edge detection based upon the classification of edge types into four categories (ramp, impulse, step, and sigmoid) are developed from this analysis. Experimental performance analysis supports the claim that the proposed multi-resolution edge detection algorithm with edge pattern analysis leads to more effective edge detection and localization with improved accuracy.
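A sketch of the multi-resolution filtering step, combining Gaussian gradient magnitudes across scales (the scales, threshold, and combination rule are illustrative; the edge-pattern classification stage is not shown):

```python
import numpy as np
from scipy import ndimage

def multiscale_edges(image, scales=(1.0, 2.0, 4.0), thresh=0.1):
    """Keep edges visible at any of several Gaussian filter scales,
    which ameliorates the single-global-scale problem."""
    combined = np.zeros(image.shape, dtype=float)
    for sigma in scales:
        mag = ndimage.gaussian_gradient_magnitude(image.astype(float), sigma)
        mag /= mag.max() + 1e-12          # normalize each scale's response
        combined = np.maximum(combined, mag)
    return combined > thresh              # boolean edge map
```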
Interactive Paper Session
Client-side Skype forensics: an overview
Tina Meißner, Knut Kröger, Reiner Creutzburg
IT security and computer forensics are important components of information technology. In the present study, a client-side forensic analysis of Skype is performed. It is designed to explain which kinds of user data are stored on a computer and which tools allow the extraction of those data for a forensic investigation. Both methods are described: a manual analysis, and an analysis with (mainly) open source tools.
Gradient-based fusion of infrared and visual face images using support vector machine for human face identification
Priya Saha, Mrinal K. Bhowmik, Debotosh Bhattacharjee, et al.
Pose- and illumination-invariant face recognition is nowadays an emerging problem in the field of information security. In this paper, a gradient-based method for fusing gradient visual images and corresponding infrared face images is proposed to overcome the problem of varying illumination conditions. This technique mainly extracts illumination-insensitive features under different conditions for effective face recognition. The gradient image is computed from a visible-light image, and information fusion is performed in the gradient-map domain. The fusion of the infrared image and the corresponding visual gradient image is done in the wavelet domain by taking the maximum information of the approximation and detail coefficients. These fused images are then dimension-reduced using Independent Component Analysis (ICA). The reduced face images are taken for training and testing purposes from different classes of different datasets of the IRIS face database. The 'one-vs.-all' SVM multiclass strategy has been adopted in the experiments. For training the support vector machine, the Sequential Minimal Optimization (SMO) algorithm has been used, with a linear kernel and a polynomial kernel of degree 3 as SVM kernel functions. The experimental results show that the proposed approach yields good classification accuracies for face images under different lighting conditions.
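A sketch of the wavelet-domain fusion step using PyWavelets, assuming a max-magnitude rule for both approximation and detail coefficients (the wavelet choice is illustrative, not from the paper):

```python
import numpy as np
import pywt  # third-party package: pip install PyWavelets

def fuse_wavelet(grad_visible, infrared, wavelet="db2"):
    """Merge the visible-gradient image and the infrared image in the
    wavelet domain, keeping per coefficient the one with the larger
    magnitude, then invert the transform."""
    cA1, d1 = pywt.dwt2(grad_visible, wavelet)
    cA2, d2 = pywt.dwt2(infrared, wavelet)
    cA = np.where(np.abs(cA1) >= np.abs(cA2), cA1, cA2)
    details = tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                    for a, b in zip(d1, d2))
    return pywt.idwt2((cA, details), wavelet)
```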
Future mobile access for open-data platforms and the BBC-DaaS system
Stefan Edlich, Sonam Singh, Ingo Pfennigstorf
In this paper, we develop an open-data platform on multimedia devices to act as a marketplace of data for information seekers and data providers. We explore the important aspects of a Data-as-a-Service (DaaS) offering in the cloud with a mobile access point. The basis of the DaaS service is to act as a marketplace for information, utilizing new technologies and recent scalable polyglot architectures based on NoSQL databases. Whereas open-data platforms are beginning to be widely accepted, their mobile use is not. We compare similar products, their approaches, and possible mobile usage. We discuss several approaches to mobile access (a native app, HTML5, and a mobile-first approach) together with several frontend presentation techniques. Big-data visualization itself is in its early days, and we explore some possibilities for making big data and open data accessible to mobile users.
Location tracking forensics on mobile devices
Stefan Sack, Knut Kröger, Reiner Creutzburg
The use of navigation devices has increased significantly over the last 10 years. Thanks to the ongoing miniaturization of navigation receiver units, it is now possible to navigate with almost any current smartphone. Modern navigation systems are no longer limited to satellite navigation, but use additional techniques such as WLAN localization. Because navigation devices and navigation-capable smartphones have become commonplace, their relevance to forensic investigations has risen rapidly, as has the amount of stored navigation data. Navigation data is forensically interesting because the location and traveled path of the owner can in most cases be reconstructed from the positions of the device. These developments make a forensic analysis of navigation devices necessary, yet very few established procedures for investigating them exist. In this work, practices for the forensic analysis of navigation devices are developed. Different devices are analyzed, and it is attempted, by means of forensic procedures, to restore the traveled path of the mobile device. Different software and hardware are used for the analysis of the various devices. Common procedures for securing and examining mobile devices are presented, as well as the specifics of investigating each device class; the classes considered are GPS handhelds, mobile navigation devices, and smartphones. Wherever possible, all data on a device are read, with the aim of restoring complete histories of the navigation data and forensically studying and analyzing these data. This is realized by using current forensic software, e.g., TomTology or Oxygen Forensic Suite; free software is used whenever possible. Alternative methods (e.g., rooting) are also used to access locked data on a unit. To limit the practical work, data extraction focuses on a frequently used sample device of each class, as the procedure is similar for many groups of devices. In the present work, a Garmin Dakota 10, a TomTom GO 700, an iPhone 4 (iOS), and a Samsung Galaxy S Plus (Android) are used because of their wide circulation.
Conception of a course for professional training and education in the field of computer and mobile forensics: Part II: Android Forensics
Knut Kröger, Reiner Creutzburg
The growth of Android in the mobile sector and the interest in investigating these devices from a forensic point of view have rapidly increased. Many companies have security problems with mobile devices in their own IT infrastructure. To respond to these incidents, it is important to have professionally trained staff. Furthermore, it is necessary to further train existing employees in the practical applications of mobile forensics, owing to the fact that a lot of companies are entrusted with very sensitive data. Inspired by these facts, this paper - a continuation of a paper of January 2012 [1], which showed the conception of a course for professional training and education in the field of computer and mobile forensics - addresses training approaches and practical exercises for investigating Android mobile devices.
Possibilities and modification of the forensic investigation process of solid-state drives
This paper addresses the possibilities of a forensic investigation of solid-state drives. The aim of this study is to clarify what information can be gained via a forensic analysis of these media, and to explain the differences from conventional forensic examinations of hard disk drives. A variety of hardware and software was used within each test design. An interesting result is that the built-in TRIM function of the SSD has an adverse effect on a forensic investigation.
Mobile learning in medicine
This paper outlines the main infrastructure for implementing mobile learning in medicine and presents a sample mobile learning application for medical learning within the framework of mobile learning systems. Mobile technology is developing rapidly, and it is useful to build different learning environments on these innovations in Internet-based distance education. M-learning makes the most of being on location, providing immediate access, and being connected, and it acknowledges learning that occurs beyond formal settings, in places such as the workplace, the home, and outdoors. Central to m-learning is the principle that it is the learner who is mobile, rather than the device used to deliver the learning. The integration of mobile technologies into training has made learning more accessible and portable: a learner can access a computer, and subsequently learning material and activities, at any time and in any place. Mobile devices include mobile phones, personal digital assistants (PDAs), and portable digital media players (e.g., iPods and MP3 players). Mobile learning (m-learning) is particularly important in medical education, and practitioners in medicine are among the major users of mobile devices; the contexts and environments in which learning occurs necessitate m-learning. Medical students are placed in hospital and clinical settings very early in training and require access to course information, and the ability to record and reflect on their experiences, while on the move. This paper therefore strives to compare and contrast mobile learning with conventional learning in medicine from various perspectives and to give insights and advice into the essential characteristics of both for sustaining medical education.
Overview and forensic investigation approaches of the gaming console Sony PlayStation Portable
Stephan Schön, Ralph Schön, Knut Kröger, et al.
This paper addresses the forensically interesting features of the Sony PlayStation Portable game console. The construction and the internal structure are analyzed precisely and interesting forensic features of the operating system and the file system are presented.
Reconstruction of the image on the Cartesian lattice from a finite number of projections in computed-tomographic imaging
The reconstruction of the image f(x, y) from a finite number of projections on the discrete N × N Cartesian lattice is described. The reconstruction is exact within the framework of the model, in which the image is considered as a set of N² cells, or image elements, each with constant intensity. Such a reconstruction is achieved because of the following two facts: each basis function of the tensor transformation is determined by a set of parallel rays, and therefore the components of the tensor transform can be calculated from ray-sums. These sums can be determined from the ray-integrals, and we introduce here the concept of geometrical rays, or G-rays, to solve this problem. Examples of image reconstruction by the proposed method are given, and the reconstruction on the 7 × 7 Cartesian lattice is described in detail.
Method of G-particles for image reconstruction from a finite number of projections
To reconstruct the image from a finite number of projections, the concept of the point map of projections is described. Each projection is described by the corresponding set of line-integrals along a finite set of rays. The image element, with its geometry, is considered as a particle, or G-particle, which is described by a field function. The map of each particle is represented as a matrix that describes all rays passing through this particle. The field function of each particle counts the rays that pass through it together with other particles at the same time. The consideration of the field functions of these G-particles leads to a representation of the image by field functions, and this representation allows the image to be reconstructed from its projections. The reconstruction f_{n,m} of the image f(x, y) on the 64×64 and 128×128 Cartesian lattices by the method of G-particles is demonstrated on images with random rectangles.
Object Classification and Remote Sensing I
Determination of sensor oversize for stereo-pair mismatch compensation and image stabilization
Stereoscopic cameras consist of two camera modules that in theory are mounted parallel to each other at a fixed distance along a single plane. Practical tolerances in the manufacturing and assembly process can, however, cause mismatches in the relative orientation of the modules. One solution to this problem is to design sensors that image a larger field-of-view than is necessary to meet system specifications. This requires the computation of the sensor oversize needed to compensate for the various types of mismatch. This work presents a mathematical framework to determine these oversize values for mismatch along each of the six degrees of freedom. One module is considered as the reference and the extreme rays of the field-of-view of the second sensor are traced in order to derive equations for the required horizontal and vertical oversize. As a further application, by modeling user hand-shake as the displacement of the sensor from its intended position, these deterministic equations could be used to estimate the sensor oversize required to stabilize images that are captured using cell phones.
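A back-of-envelope example of the kind of computation the framework enables, here for a yaw mismatch only (all numbers are assumed; the paper derives equations for all six degrees of freedom):

```python
import math

# Assumed numbers: horizontal oversize needed to compensate a yaw
# mismatch of the second module, relative to the nominal image width.
half_fov = math.radians(30.0)   # half of the horizontal field of view
yaw_err = math.radians(1.0)     # assembly tolerance about the vertical axis

# Tracing the extreme ray: the rotated edge of the field of view lands at
# tan(half_fov + yaw_err) instead of tan(half_fov) on the sensor plane.
extra = math.tan(half_fov + yaw_err) - math.tan(half_fov)
oversize = extra / (2 * math.tan(half_fov))
# A +/- tolerance requires this margin on both sides of the sensor.
print(f"extra width per side: {oversize:.2%} of nominal image width")
```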
Object Classification and Remote Sensing II
Nokia PureView oversampling technology
Tero Vuori, Juha Alakarhu, Eero Salmelin, et al.
This paper describes Nokia's PureView oversampling imaging technology as well as the product featuring it, the Nokia 808 PureView. The PureView imaging technology combines a large, super-high-resolution 41 Mpix sensor with high-performance Carl Zeiss optics. The large sensor enables a pixel oversampling technique that reduces an image taken at full resolution to a lower-resolution picture, achieving higher definition and light sensitivity: one oversampled super pixel in the image file is formed from many sensor pixels. The large sensor also enables lossless zoom. When the user zooms, the sensor image is cropped; however, no up-scaling is needed, unlike the traditional digital zoom usually used in mobile devices. Lossless zooming means image quality free of digital zooming artifacts as well as of optical zooming artifacts such as zoom lens distortions. Zooming with PureView is also completely silent. PureView imaging technology is the result of many years of research and development, and the tangible fruits of this work are exceptional image quality, lossless zoom, and superior low-light performance.
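A minimal sketch of the oversampling idea: averaging blocks of sensor pixels into one super pixel (plain block averaging is an assumption; the actual PureView pipeline and weighting are proprietary):

```python
import numpy as np

def oversample(raw, factor=4):
    """Average factor x factor blocks of full-resolution sensor pixels
    into one 'super pixel'; independent noise averages down within
    each block, raising light sensitivity at reduced resolution."""
    h, w = raw.shape
    h, w = h - h % factor, w - w % factor        # trim to a multiple
    blocks = raw[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```

Lossless zoom then corresponds to cropping `raw` before (or instead of) this reduction, so no up-scaling is ever required.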
Image quality evaluation using moving targets
The basic concept of testing a digital imaging device is to reproduce a known target and to analyze the resulting image. This semi-reference approach can be used for various different aspects of image quality. Each part of the imaging chain can have an influence on the results: lens, sensor, image processing, and the target itself; the results are valid only for the complete system. If we want to test a single component, we have to make sure that we change only one and keep all others constant. When testing mobile imaging devices, we run into the problem that hardly anything can be manually controlled by the tester: manual exposure control is not available for most devices, the focus cannot be influenced, and hardly any settings for the image processing are available. Due to these limitations in the hardware, the image pipeline in the digital signal processor (DSP) of mobile imaging devices is a critical part of the image quality evaluation. The processing power of the DSPs allows sharpening, tonal correction, and noise reduction to be non-linear and adaptive, which makes the behavior very hard to describe for an objective image quality evaluation. The image quality is highly influenced by the signal processing for noise and resolution, and this processing is the main reason for the loss of low-contrast, fine details, the so-called texture blur. We present our experience in describing the image processing in more detail. All standardized test methods use a defined chart and require that the chart and the camera not be moved in any way during the test. In this paper, we present our results investigating the influence of chart movement during the test. Different structures, optimized for different aspects of image quality evaluation, are moved with a defined speed during the capturing process. The chart movement changes the input to the signal processing depending on the speed of the target during the test. The basic theoretical change in the image is the introduction of motion blur. With the known speed and the measured exposure time, we can calculate the theoretical motion blur, and we compare its theoretical influence with the measured results. We use different methods to evaluate image quality parameters vs. the motion speed of the chart: slanted edges are used to obtain an SFR and to check for image sharpening, and the aspect of texture blur is measured using dead leaves structures. The theoretical and measured results are plotted against the speed of the chart and allow an insight into the behavior of the DSP.
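The theoretical motion blur follows directly from the known chart speed and the measured exposure time; a worked example with assumed numbers:

```python
# Blur length in pixels = chart speed (projected onto the sensor)
# multiplied by the exposure time.
speed_px_per_s = 120.0   # assumed chart speed on the sensor, px/s
exposure_s = 1 / 30      # measured exposure time, s
blur_px = speed_px_per_s * exposure_s
print(f"expected motion blur: {blur_px:.1f} px")   # 4.0 px
```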
Multiple-field approach for aberration correction in miniature imaging systems based on wafer-level production
Eric Logean, Toralf Scharf, Nicolas Bongard, et al.
In mobile imaging systems, the most difficult element to integrate is the objective lens. Here we present an intermediate approach between costly traditional objectives and the low-resolution objectives inspired by the compound eyes of insects. Our multi-field approach uses a small number of optical channels, each imaging a portion of the desired field of view. The full-field image is reconstructed digitally. The optics of each channel is kept simple for wafer-level fabrication, and its size is sufficient to obtain a reasonable resolution. We present the design and fabrication of a prototype using 9 plano-convex lenses for 9 channels. Glass lenses glued on a wafer are used to image a full field of ±40° with an f-number of 3. The images obtained show field curvature correction. A simple image reconstruction scheme is presented. In conclusion, multi-field objectives fabricated with micro-optics technology are thin, simple to mount, robust, and easily replicated.
Auto-focus algorithm based on statistical blur estimation
Conventional auto-focus techniques in movable-lens camera systems use a measure of image sharpness to determine the lens position that brings the scene into focus. This paper presents a novel wavelet-domain approach to determining the position of best focus. In contrast to current techniques, the proposed algorithm estimates the level of blur in the captured image at each lens position. Image blur is quantified by fitting a Generalized Gaussian Density (GGD) curve to a high-pass version of the image using second-order statistics. The system then moves the lens to the position that yields the lowest measure of image blur. The algorithm overcomes shortcomings of sharpness-based approaches, namely the application of large band-pass filters, sensitivity to image noise, and the need for calibration under different imaging conditions. Since noise has no effect on the proposed blur metric, the algorithm works with a short filter and needs no parameter tuning. Furthermore, the algorithm can be simplified to use a single high-pass filter to reduce complexity. These advantages, along with the optimization presented in the paper, make the proposed algorithm very attractive for hardware implementation on cell phones. Experiments show that the algorithm performs well in the presence of noise as well as under resolution and data scaling.
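A sketch of a GGD-based blur measure under stated assumptions: the shape parameter is estimated from high-pass coefficients by matching the ratio of the first two absolute moments (the paper's exact estimator and filter are not given in the abstract):

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def ggd_shape(coeffs):
    """Fit the GGD shape parameter to high-pass coefficients.

    Sharp images give heavy-tailed coefficients (small shape); blur
    drives the distribution toward Gaussian (shape near 2), so the
    shape parameter can serve as a blur measure.
    """
    x = np.asarray(coeffs, dtype=float).ravel()
    rho = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)   # moment ratio
    # For a GGD: rho(beta) = Gamma(2/beta)^2 / (Gamma(1/beta) * Gamma(3/beta))
    f = lambda b: gamma(2 / b) ** 2 / (gamma(1 / b) * gamma(3 / b)) - rho
    return brentq(f, 0.1, 10.0)   # invert the monotonic moment ratio

# Pure Gaussian data recovers a shape near 2.0:
print(ggd_shape(np.random.default_rng(0).normal(size=10000)))
```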
Image Quality Evaluation Methods/Standards for Mobile and Digital Photography I: Joint Session with Conferences 8653, 8660, and 8667C
Low light performance of digital still cameras
The major difference between a dSLR camera, a consumer camera, and a camera in a mobile device is the sensor size, which is also related to the overall system size, including the lens. As sensors get smaller, the individual light-sensitive areas also get smaller, so less light falls onto each pixel. This requires higher signal amplification, which leads to higher noise levels or to other problems that may occur due to denoising algorithms. These problems become more visible in low-light conditions because of the lower signal levels. The decreasing sensitivity of cameras makes customers ask for a standardized way to measure the low-light performance of cameras. The CEA (Consumer Electronics Association), together with ANSI, has addressed this for camcorders in the CEA-639 [1] standard. The ISO technical committee 42 (photography) is currently also considering a potential standard on this topic for still-picture cameras. This paper is part of the preparation work for this standardization activity; it addresses the differences compared to camcorders as well as potential additional problems with noise reduction that have emerged over the past few years. The result of this paper is a proposed test procedure, with a few open questions that have to be answered in future work.
Image Quality Evaluation Methods/Standards for Mobile and Digital Photography II: Joint Session with Conferences 8653, 8660, and 8667C
Noise evaluation standard of image sensor using visual spatio-temporal frequency characteristics
Takeyuki Fujii, Shoichi Suzuki, Shinichiro Saito
For the noise evaluation of image sensors, it is important to establish an objective evaluation method that correlates highly with perceived appearance. The well-known visual noise standard is a noise evaluation metric based on human visual characteristics; the visual noise level can vary depending on the viewing distance, spatial frequency, color, and viewing conditions. A method for measuring the visual noise level is provided in ISO 15739.[1][2] It was subsequently discovered that visual characteristics also depend on contrast and frame rate, which the ISO method does not consider. For example, since ISO 15739 focuses on the absolute threshold of the human visual system for still images, in some cases the correlation between subjective and objective evaluation was not very high. And for moving image sequences, the faster the frame rate becomes, the lower the perception of noise becomes. We propose solutions to these problems using visual spatio-temporal frequency characteristics. First, we investigated visual spatial frequency characteristics that depend on contrast and propose a new evaluation method; it shows that an image sensor with a large pixel count is effective for noise reduction. Second, we investigated visual temporal frequency characteristics and propose a new evaluation method for moving image sequences; it shows that an image sensor with a high frame rate is effective for noise reduction. Finally, by combining the two proposed methods, we arrive at a method with which noise evaluation is possible both for still images and for moving image sequences. We applied the proposed method to moving image sequences acquired by the image sensor and investigated the validity of the method.
Keynote Session
Lytro camera technology: theory, algorithms, performance analysis
Todor Georgiev, Zhan Yu, Andrew Lumsdaine, et al.
The Lytro camera is the first implementation of a plenoptic camera for the consumer market. We consider it a successful example of the miniaturization, aided by the increase in computational power, that characterizes mobile computational photography. The plenoptic camera approach to radiance capture uses a microlens array as an imaging system focused on the focal plane of the main camera lens. This paper analyzes the performance of the Lytro camera from a system-level perspective, considering the Lytro camera as a black box and using our interpretation of the Lytro image data saved by the camera. We present our findings based on our interpretation of the Lytro camera file structure, image calibration, and image rendering; in this context, artifacts and final image resolution are discussed.
Plenoptic Cameras: Theory
Wave analysis of a plenoptic system and its applications
Traditional imaging systems directly image a 2D object plane on to the sensor. Plenoptic imaging systems contain a lenslet array at the conventional image plane and a sensor at the back focal plane of the lenslet array. In this configuration the data captured at the sensor is not a direct image of the object. Each lenslet effectively images the aperture of the main imaging lens at the sensor. Therefore the sensor data retains angular light-field information which can be used for a posteriori digital computation of multi-angle images and axially refocused images. If a filter array, containing spectral filters or neutral density or polarization filters, is placed at the pupil aperture of the main imaging lens, then each lenslet images the filters on to the sensor. This enables the digital separation of multiple filter modalities giving single snapshot, multi-modal images. Due to the diversity of potential applications of plenoptic systems, their investigation is increasing. As the application space moves towards microscopes and other complex systems, and as pixel sizes become smaller, the consideration of diffraction effects in these systems becomes increasingly important. We discuss a plenoptic system and its wave propagation analysis for both coherent and incoherent imaging. We simulate a system response using our analysis and discuss various applications of the system response pertaining to plenoptic system design, implementation and calibration.
Fourier analysis of the focused plenoptic camera
Andrew Lumsdaine, Lili Lin, Jeremiah Willcock, et al.
The focused plenoptic camera is a recently developed approach to lightfield capture that uses the microlens array as an imaging system focused on the focal plane of the main camera lens. Since lightfields can be captured with significantly higher spatial resolution than with the traditional approach, images can be rendered at resolutions that meet the expectations of modern photographers. The focused plenoptic camera captures lightfields with a different tradeoff between spatial and angular information than with the traditional approach. To more rigorously characterize these tradeoffs, including the limits of this new approach, this paper presents a Fourier analysis of the focused plenoptic camera. Based on this analysis, we also present an extended Fourier-slice rendering algorithm that can be used to render high-resolution images from lightfields.
Image Processing
Design of user interfaces for selective editing of digital photos on touchscreen devices
Thomas Binder, Meikel Steiding, Manuel Wille, et al.
When editing images it is often desirable to apply a filter with spatially varying strength. With the usual selection tools such as gradient, lasso, brush, or quick selection tools, creating masks containing such spatially varying strength values is time-consuming and cumbersome. We present an interactive filtering approach which allows photos to be processed selectively without the intermediate step of creating a mask of strength values. With this approach, the user only needs to place reference points (called control points) on the image and adjust the spatial influence and filter strength for each control point. The filter is then applied selectively to the image, with strength values interpolated for each pixel between control points. The interpolation is based on a mixture of distances in space, luminance, and color; it is therefore a low-level operation. Since the main goal of the approach is to make selective image editing intuitive, easy, and playful, emphasis is put on the user interface: we describe the process of developing an existing mouse-driven user interface into a touch-driven one. Many questions needed to be answered anew, such as how to present a slider widget on a touchscreen. Several variants are discussed and compared.
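A minimal sketch of the interpolation idea, assuming Gaussian falloff in each distance term; the actual distance mixture, scales, and falloffs used in the product are not specified by the abstract. The resulting map can then blend filtered and unfiltered versions of the photo per pixel.

```python
import numpy as np

def strength_map(img, points, sigma_xy=80.0, sigma_lum=0.15, sigma_col=0.15):
    """img: float RGB in [0,1]; points: list of ((x, y), strength) with
    x = column, y = row. Returns a per-pixel strength mask interpolated
    between control points by closeness in position, luminance, and color."""
    h, w, _ = img.shape
    lum = img @ np.array([0.299, 0.587, 0.114])
    ys, xs = np.mgrid[0:h, 0:w]
    num = np.zeros((h, w))
    den = np.full((h, w), 1e-12)
    for (px, py), s in points:
        d_xy = ((xs - px) ** 2 + (ys - py) ** 2) / sigma_xy ** 2
        d_lum = (lum - lum[py, px]) ** 2 / sigma_lum ** 2
        d_col = ((img - img[py, px]) ** 2).sum(axis=2) / sigma_col ** 2
        wgt = np.exp(-(d_xy + d_lum + d_col))
        num += wgt * s
        den += wgt
    return num / den
```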
Touch HDR: photograph enhancement by user controlled wide dynamic range adaptation
Steve Verrall, Hasib Siddiqui, Kalin Atanassov, et al.
High Dynamic Range (HDR) technology enables photographers to capture a greater range of tonal detail. HDR is typically used to bring out detail in a dark foreground object set against a bright background. HDR technologies include multi-frame HDR and single-frame HDR. Multi-frame HDR combines a sequence of images taken at different exposures. Single-frame HDR post-processes a single image with histogram equalization, a technique referred to as local tone mapping (LTM). Images generated using HDR technology can look less natural than their non-HDR counterparts, and sometimes it is desired to enhance only small regions of the original image; for example, the tonal detail of one subject's face while preserving the original background. The Touch HDR technique described in this paper achieves these goals by enabling selective blending of HDR and non-HDR versions of the same image into a hybrid image. The HDR version of the image can be generated by either multi-frame or single-frame HDR. Selective blending can be performed as a post-processing step, for example as a feature of a photo editor application, at any time after the image has been captured. HDR and non-HDR blending is controlled by a weighting surface, which is configured by the user through a sequence of touches on a touchscreen.
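One plausible realization of the weighting surface, assuming each touch deposits a Gaussian "splat"; the splat shape, accumulation rule, and parameter values are invented for illustration and are not taken from the paper.

```python
import numpy as np

def touch_weight_surface(shape, touches, sigma=60.0):
    """Accumulate Gaussian 'splats' at touch locations into a [0,1]
    weighting surface; touches is a list of (x, y) screen coordinates."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    surface = np.zeros(shape)
    for tx, ty in touches:
        surface += np.exp(-((xs - tx) ** 2 + (ys - ty) ** 2) / (2 * sigma ** 2))
    return np.clip(surface, 0.0, 1.0)

def blend(hdr, non_hdr, surface):
    """Per-pixel blend of the tone-mapped HDR and original images."""
    return surface[..., None] * hdr + (1.0 - surface[..., None]) * non_hdr
```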
Temporal image stacking for noise reduction and dynamic range improvement
Kalin Atanassov, James Nash, Sergio Goma, et al.
The dynamic range of an imager is determined by the ratio of the pixel well capacity to the noise floor. When the scene dynamic range exceeds the imager dynamic range, the choice is to saturate some parts of the scene or to “bury” others in noise. In this paper we propose an algorithm that produces high dynamic range images by “stacking” sequentially captured frames, which reduces noise and creates additional bits. The frame stacking is done by frame alignment subject to a projective transform, followed by temporal anisotropic diffusion. The noise sources contributing to the noise floor are sensor thermal noise, quantization noise, and sensor fixed-pattern noise. We demonstrate that stacking images reduces the quantization and thermal noise, with the reduction limited only by the fixed-pattern noise. As the noise is reduced, the resulting cleaner image enables the use of adaptive tone mapping algorithms which render HDR images in an 8-bit container without significant noise increase.
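A simplified sketch of the stacking pipeline using OpenCV: homography-based alignment from feature matches followed by plain averaging (the paper's temporal anisotropic diffusion step is omitted, and grayscale frames are assumed). Averaging N aligned frames cuts read and quantization noise by roughly sqrt(N), which is where the "additional bits" come from.

```python
import numpy as np
import cv2

def stack_frames(frames):
    """Align uint8 grayscale frames to the first one with a projective
    (homography) transform estimated from ORB matches, then average."""
    ref = frames[0].astype(np.float64)
    acc, n = ref.copy(), 1
    orb = cv2.ORB_create()
    kp_ref, des_ref = orb.detectAndCompute(frames[0], None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    for f in frames[1:]:
        kp, des = orb.detectAndCompute(f, None)
        matches = matcher.match(des_ref, des)
        src = np.float32([kp[m.trainIdx].pt for m in matches])
        dst = np.float32([kp_ref[m.queryIdx].pt for m in matches])
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC)  # maps f -> ref
        acc += cv2.warpPerspective(f, H, ref.shape[1::-1])
        n += 1
    return acc / n  # float result retains sub-LSB precision
```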
Accelerating defocus blur magnification
Florian Kriener, Thomas Binder, Manuel Wille
A shallow depth-of-field is often used as a creative element in photographs. This, however, comes at the cost of expensive and heavy camera equipment, such as large-sensor DSLR bodies and fast lenses. In contrast, cheap small-sensor cameras with fixed lenses usually exhibit a larger depth-of-field than desirable. In this case a computational solution suggests itself, since a shallow depth-of-field cannot be achieved by optical means. One possibility is to algorithmically increase the defocus blur already present in the image. Yet existing algorithmic solutions to this problem suffer from poor performance due to its ill-posedness: the amount of defocus blur can be estimated at edges only; homogeneous areas contain no such information. However, to magnify the defocus blur we need to know the amount of blur at every pixel position, and estimating it requires solving an optimization problem with many unknowns. We propose a faster way to propagate the amount of blur from the edges to the entire image by solving the optimization problem on a small scale, followed by edge-aware upsampling using the original image as guide. The resulting approximate defocus map can be used to synthesize images with a shallow depth-of-field of quality comparable to the original approach. This is demonstrated by experimental results.
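The edge-aware upsampling step could be realized with a guided filter (He et al.); the sketch below upsamples a small-scale blur map using the full-resolution image as guide, so blur estimates snap to image edges. Estimating the small-scale map itself, and the specific solver the authors accelerate, are not shown.

```python
import numpy as np
import cv2

def box(img, r):
    return cv2.blur(img, (2 * r + 1, 2 * r + 1))

def guided_upsample(blur_map_small, guide, r=8, eps=1e-4):
    """Upsample a low-resolution defocus map with the full-resolution
    image as guide. guide: float grayscale image in [0,1]."""
    p = cv2.resize(blur_map_small, guide.shape[1::-1],
                   interpolation=cv2.INTER_LINEAR)
    mean_I, mean_p = box(guide, r), box(p, r)
    cov_Ip = box(guide * p, r) - mean_I * mean_p
    var_I = box(guide * guide, r) - mean_I ** 2
    a = cov_Ip / (var_I + eps)       # local linear model p ~ a*I + b
    b = mean_p - a * mean_I
    return box(a, r) * guide + box(b, r)
```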
Plenoptic Cameras: Depth of Field
Adaptive DOF for plenoptic cameras
Alexander Oberdörster, Hendrik P. A. Lensch
Plenoptic cameras promise arbitrary refocusing through a scene after capture. In practice, however, the refocusing range is limited by the depth of field (DOF) of the plenoptic camera. For the focused plenoptic camera, this range is given by the range of object distances for which the microimages are in focus. We propose a technique for recording light fields with an adaptive depth of focus: between multiple exposures, i.e. multiple recordings of the light field, the distance between the microlens array (MLA) and the image sensor is adjusted. The depth and quality of focus are chosen by changing the number of exposures and the spacing of the MLA movements. In contrast to traditional cameras, extending the DOF does not necessarily lead to an all-in-focus image; instead, the refocus range is extended. There is full creative control over the focus depth; images with shallow or selective focus can be generated.
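In spirit, planning the exposures reduces to tiling the desired refocus range with per-exposure depths of focus. The helper below is a heavily simplified sketch of such a schedule; the true mapping from MLA-to-sensor displacement to object-side focus range depends on the optical design, so every quantity here is a placeholder.

```python
def mla_offsets(refocus_range_mm, dof_per_exposure_mm, overlap=0.2):
    """Plan MLA-to-sensor offsets so that per-exposure depths of focus
    tile the desired refocus range with some fractional overlap."""
    step = dof_per_exposure_mm * (1.0 - overlap)
    n = max(1, int(round(refocus_range_mm / step)))
    return [i * step for i in range(n + 1)]

# e.g. a 2 mm refocus range covered by 0.5 mm slices with 20% overlap
print(mla_offsets(2.0, 0.5))  # -> [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
```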
Plenoptic depth map in the case of occlusions
Zhan Yu, Jingyi Yu, Andrew Lumsdaine, et al.
Recent realizations of hand-held plenoptic cameras have given rise to previously unexplored effects in photography. Designing a mobile phone plenoptic camera is becoming feasible with the significant increase in computing power of mobile devices and the introduction of System on a Chip designs. However, capturing a high number of views is still impractical due to special requirements such as an ultra-thin camera and low cost. In this paper, we analyze a mobile plenoptic camera solution with a small number of views. Such a camera can produce a refocusable high-resolution final image if a depth map is generated for every pixel in the sparse set of views. With the captured multi-view images, the main obstacle to recovering a high-resolution depth map is occlusion. To resolve it robustly, we first analyze the behavior of pixels in such situations and show that even under severe occlusion one can still distinguish different depth layers based on statistics. We estimate the depth of each pixel by discretizing the space in the scene and conducting plane sweeping: for each given depth, we gather all corresponding pixels from the other views and model the in-focus pixels as a Gaussian distribution. We show how occlusion pixels can be distinguished from in-focus pixels in order to find the depths. Final depth maps are computed for real scenes captured by a mobile plenoptic camera.
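A condensed sketch of the plane-sweep idea with a crude occlusion-robust statistic: pixels gathered across views at each depth hypothesis are treated as Gaussian, and samples far from the mean (likely occluders) are rejected before scoring. The `warp` callable and the sigma threshold are assumptions standing in for the paper's statistical analysis.

```python
import numpy as np

def plane_sweep_depth(views, warp, depths, sigma_factor=1.5):
    """views: list of grayscale images; warp(view_idx, depth) returns
    that view resampled into the reference frame for a given depth
    hypothesis. Returns a per-pixel depth map."""
    h, w = views[0].shape
    best_cost = np.full((h, w), np.inf)
    depth_map = np.zeros((h, w))
    for d in depths:
        stack = np.stack([warp(i, d) for i in range(len(views))])
        mu, sd = stack.mean(axis=0), stack.std(axis=0) + 1e-6
        inlier = np.abs(stack - mu) < sigma_factor * sd  # reject occluders
        cnt = inlier.sum(0).clip(1)
        mu_in = np.where(inlier, stack, 0).sum(0) / cnt
        cost = np.where(inlier, (stack - mu_in) ** 2, 0).sum(0) / cnt
        better = cost < best_cost
        best_cost[better] = cost[better]
        depth_map[better] = d
    return depth_map
```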
Reduced depth of field using multi-image fusion
Boris Ajdin, Timo Ahonen
This paper presents a multi-image fusion approach for artificially reducing the depth of field in handheld phone camera photographs. The system captures a low-resolution focal stack of images with varying focus settings, plus two high-resolution images: one with maximum scene blur and one with maximum sharpness. The focal stack is used to guide the segmentation of the object of interest from the sharp image, which is then blended onto the background obtained from the blurry image, resulting in a visually pleasing image with a shallow depth-of-field effect.
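A minimal sketch of how a focal stack could guide the segmentation: per-frame local sharpness (absolute Laplacian) determines, for each region, which stack index is sharpest, and regions won by the subject's focus index form the blend mask. The authors' actual segmentation and blending are more elaborate; the kernel size and focus-index input are assumptions.

```python
import numpy as np
import cv2

def shallow_dof(sharp, blurry, focal_stack, focus_idx, k=15):
    """Blend the sharp capture onto the maximally blurred one using a
    mask from the focal stack. focal_stack: list of low-res grayscale
    frames; focus_idx: index of the frame focused on the subject."""
    # local sharpness = absolute Laplacian response, box-smoothed
    resp = [cv2.blur(np.abs(cv2.Laplacian(f, cv2.CV_64F)), (k, k))
            for f in focal_stack]
    in_focus = (np.argmax(np.stack(resp), axis=0) == focus_idx)
    mask = cv2.resize(cv2.blur(in_focus.astype(np.float64), (k, k)),
                      sharp.shape[1::-1])
    return mask[..., None] * sharp + (1.0 - mask[..., None]) * blurry
```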
Optimizing depth-of-field based on a range map and a wavelet transform
Mike Wellner, Thomas Käster, Thomas Martinetz, et al.
The imaging properties of small cameras in mobile devices preclude a restricted depth-of-field and the range-dependent blur that can provide a sensation of depth. Algorithmic solutions to this problem usually fail because high-quality, dense range maps are hard to obtain, especially with a mobile device. However, methods like stereo, shape from focus stacks, and the use of flashlights may yield coarse and sparse range maps. A standard procedure is to regularize such range maps to make them dense and more accurate. In most cases, regularization leads to insufficient localization, and sharp edges in depth cannot be handled well. In a wavelet basis, an image is defined by its significant wavelet coefficients, and only these need to be encoded; if we wish to perform range-dependent image processing, we only need to know the range for the significant wavelet coefficients. We therefore propose a method that determines a sparse range map only for significant wavelet coefficients, then weights the wavelet coefficients depending on the associated range information. The image reconstructed from the resulting wavelet representation exhibits space-variant, range-dependent blur. We present results based on images and range maps obtained with a consumer stereo camera and a stereo mobile phone.
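A sketch of the coefficient-weighting step using PyWavelets, assuming two helper callables that are not part of the library: `coeff_range(level, shape)`, returning a range estimate per detail coefficient (this is where the sparse range map enters), and `attenuation(r)`, mapping range to a [0,1] weight. The wavelet and level count are arbitrary choices.

```python
import pywt

def range_dependent_blur(img, coeff_range, attenuation, levels=4):
    """img: grayscale float image. Weight detail coefficients by scene
    range and invert the transform, yielding space-variant,
    range-dependent blur."""
    coeffs = pywt.wavedec2(img, "db2", level=levels)
    out = [coeffs[0]]  # approximation band kept as-is
    for lvl, (cH, cV, cD) in enumerate(coeffs[1:], start=1):
        w = attenuation(coeff_range(lvl, cH.shape))
        out.append((cH * w, cV * w, cD * w))
    return pywt.waverec2(out, "db2")
```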
Image Tracking and Stabilization
A new fusion-based low light still-shot stabilization
Young-Su Moon, Shi-Hwa Lee
Digital cameras under dark illumination produce artifacts such as motion blur in a long-exposure shot or salient noise corruption in a short-exposure (high-ISO) shot. To suppress such artifacts effectively, multi-frame fusion approaches that use multiple short-exposure images have been studied actively, and they have recently been applied in various consumer digital cameras for practical still-shot stabilization. However, conducting both multi-frame noise filtering and brightness/color appearance restoration well, from a set of images acquired in a harsh low-light situation, is computationally complex and costly. In this paper, we propose a new fusion-based low-light stabilization approach whose input is one properly-/long-exposed blurry image together with multiple short-exposure noisy images. First, coarse-to-fine motion-compensated noise filtering produces a clean image from the multiple short-exposure images. Then, online low-light image restoration recovers a good visual appearance from the denoised image using the blurry long-exposure input image. More specifically, the noise filtering is conducted by simple block-wise temporal averaging based on between-frame motion information, which provides a denoising result with good detail preservation. Our simulations and real-scene tests show the potential of the proposed algorithm for fast and effective low-light stabilization on a programmable computing platform.
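A heavily simplified sketch of the two-stage idea: motion-compensated temporal averaging of the short exposures (here a single global shift from phase correlation stands in for the paper's block-wise motion search), followed by a crude low-frequency brightness transfer from the long exposure. Grayscale frames are assumed for brevity.

```python
import numpy as np
import cv2

def fuse_low_light(short_frames, long_frame, block=16):
    """Average motion-compensated short exposures, then transfer the
    long exposure's low-frequency brightness onto the denoised result."""
    ref = short_frames[0].astype(np.float64)
    acc, n = ref.copy(), 1
    for f in short_frames[1:]:
        f = f.astype(np.float64)
        # global shift per frame, a stand-in for block motion search
        (dx, dy), _ = cv2.phaseCorrelate(ref, f)
        M = np.float32([[1, 0, -dx], [0, 1, -dy]])
        acc += cv2.warpAffine(f, M, ref.shape[1::-1])
        n += 1
    denoised = acc / n
    # brightness/appearance transfer via ratio of local means
    gain = (cv2.blur(long_frame.astype(np.float64), (block, block)) + 1.0) \
         / (cv2.blur(denoised, (block, block)) + 1.0)
    return denoised * gain
```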
Real-time skeleton tracking for embedded systems
Foti Coleca, Sascha Klement, Thomas Martinetz, et al.
Touch-free gesture technology is beginning to become more popular with consumers and may have a significant future impact on interfaces for digital photography. However, almost every commercial software framework for gesture and pose detection is aimed at either desktop PCs or high-powered GPUs, making mobile implementations of gesture recognition an attractive area for research and development. In this paper we present an algorithm for hand skeleton tracking and gesture recognition that runs on an ARM-based platform (Pandaboard ES, OMAP 4460 architecture). The algorithm uses self-organizing maps to fit a given topology (skeleton) to a 3D point cloud. This is a novel way of approaching the problem of pose recognition, as it does not employ complex optimization techniques or data-based learning. After an initial background segmentation step, the algorithm is run in parallel with heuristics that detect and correct artifacts arising from insufficient or erroneous input data. We then optimize the algorithm for the ARM platform using fixed-point computation and the NEON SIMD architecture provided by the OMAP 4460. We tested the algorithm with two different depth-sensing devices (Microsoft Kinect, PMD Camboard). For both input devices we were able to accurately track the skeleton at the native frame rate of the cameras.
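A minimal SOM-fitting sketch on a 1D chain topology (a real hand skeleton would use a branched graph): each sampled point pulls its best-matching node and, with Gaussian falloff, that node's topological neighbors. All parameters are illustrative, and the paper's heuristics, fixed-point arithmetic, and NEON optimizations are not shown.

```python
import numpy as np

def fit_som_chain(points, n_nodes=16, iters=5000, lr=0.1, sigma=2.0):
    """Fit a 1D chain of SOM nodes to a 3D point cloud (N x 3 array)."""
    # initialize nodes along the cloud's principal axis
    mean = points.mean(axis=0)
    axis = np.linalg.svd(points - mean)[2][0]
    t = np.linspace(-1, 1, n_nodes)[:, None]
    nodes = mean + t * axis * points.std()
    idx = np.arange(n_nodes)
    for i in range(iters):
        p = points[np.random.randint(len(points))]
        bmu = np.argmin(((nodes - p) ** 2).sum(axis=1))  # best match
        h = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))[:, None]
        decay = 1.0 - i / iters  # shrinking learning rate
        nodes += lr * decay * h * (p - nodes)
    return nodes
```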
Front Matter: Volume 8667
Front Matter: Volume 8667
This PDF file contains the front matter associated with SPIE Proceedings Volume 8667, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.