Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI

Volume Details

Date Published: 8 February 2012

Contents: 8 Sessions, 32 Papers, 0 Presentations

Conference: IS&T/SPIE Electronic Imaging 2012

Volume Number: 8304

Sessions

Emerging Mobile Applications
Processing and Displays for Mobile Applications
Security, Safety, and Location Technologies
Algorithms for Mobile Computing
Interactive Paper Session
Multimedia Content Classification
Semantic Multimedia Access
Interactive Paper Session
Front Matter: Volume 8304

Emerging Mobile Applications

Location-aware gang graffiti acquisition and browsing on a mobile device

Albert Parra, Mireille Boutin, Edward J. Delp

In this paper we describe a mobile-based system that allows first responders to identify and track gang graffiti by combining image analysis and location-based services. The gang graffiti image and metadata (geoposition, date and time), obtained automatically, are transferred to a server and uploaded to a database of graffiti images. The database can then be queried, with the matched results sent back to the mobile device, where the user can review the results and provide extra inputs to refine the information.

Dietary intake assessment using integrated sensors and software

Junqing Shang, Eric Pepin, Eric Johnson, et al.

The area of dietary assessment is becoming increasingly important as obesity rates soar, but valid measurement of the food intake in free-living persons is extraordinarily challenging. Traditional paper-based dietary assessment methods have limitations due to bias, user burden and cost, and therefore improved methods are needed to address important hypotheses related to diet and health. In this paper, we will describe the progress of our mobile Diet Data Recorder System (DDRS), where an electronic device is used for objective measurement of dietary intake in real time and at moderate cost. The DDRS consists of (1) a mobile device that integrates a smartphone and an integrated laser package, (2) software on the smartphone for data collection and laser control, (3) an algorithm to process acquired data for food volume estimation, which is the largest source of error in calculating dietary intake, and (4) a database and interface for data storage and management. The estimated food volume, together with direct entries of food questionnaires and voice recordings, could provide dietitians and nutritional epidemiologists with more complete food descriptions and more accurate food portion sizes. In this paper, we will describe the system design of DDRS and initial results of dietary assessment.

FCam for multiple cameras

Alejandro Troccoli, Dawid Pajak, Kari Pulli

The Frankencamera (FCam) architecture and API enable precise control over the camera in computational photography applications. We present an extension to the FCam API for systems equipped with multiple cameras. The proposed extension allows for an enumeration of cameras and their corresponding properties, such as position or orientation. In addition, we explicitly support camera synchronization, either through hardware mechanisms or software primitives. If hardware synchronization is available, cameras can be grouped together under the concept of a multi-sensor. Otherwise, multiple camera streams are scheduled asynchronously and synchronized using our software control primitives.

Processing and Displays for Mobile Applications

Continuously adjustable Pulfrich spectacles for mobile devices

Ken Jacobs, Ron Karpf

Mobile devices present a challenging platform for 3D video because of inherent device limitations. Continuously Adjustable Pulfrich Spectacles for Mobile Devices (CAPS-MD) is a new implementation of the Pulfrich 3D stereoscopic effect. For every scene that contains lateral motion in a 2D movie, CAPS-MD provides realistic 3D. Since it requires minimal additional processing, it is appropriate for mobile devices. 3D movies utilizing the Pulfrich stereoscopic effect have been made for 80 years using passive viewing spectacles. CAPS-MD uses active viewing spectacles to overcome the limitations of passive spectacles. 3D movies normally employ the asymmetry of dual images to produce stereopsis. CAPS-MD works on the principle of illumination asymmetry, and only needs to control the differential lens optical densities. CAPS-MD is fabricated from optoelectronic materials that electronically control the lens optical densities. The eye's retinal triggering is used by CAPS-MD to determine the differential lens optical densities. Motion estimation calculations from the digital image processing used to display 2D video on mobile devices are reused to calculate real-time lens adjustments, so that CAPS-MD always conforms to the optical density that optimizes the Pulfrich stereoscopic effect. Only negligible additional processing is necessary for CAPS-MD to show 3D for every scene that contains lateral motion in any 2D movie.

Parameters of the human 3D gaze while observing portable autostereoscopic display: a model and measurement results

Atanas Boev, Marianne Hanhela, Atanas Gotchev, et al.

We present an approach to measure and model the parameters of human point-of-gaze (PoG) in 3D space. Our model considers the following three parameters: position of the gaze in 3D space, volume encompassed by the gaze and time for the gaze to arrive on the desired target. Extracting the 3D gaze position from binocular gaze data is hindered by three problems. The first problem is the lack of convergence - due to microsaccadic movements the optical lines of both eyes rarely intersect at a point in space. The second problem is resolution - the combination of short observation distance and limited comfort disparity zone typical for a mobile 3D display does not allow the depth of the gaze position to be reliably extracted. The third problem is measurement noise - due to the limited display size, the noise range is close to the range of properly measured data. We have developed a methodology which allows us to suppress most of the measurement noise. This allows us to estimate the typical time which is needed for the point-of-gaze to travel in the x, y or z direction. We identify three temporal properties of the binocular PoG. The first is reaction time, which is the minimum time in which vision reacts to a stimulus position change, and is measured as the time between the event and the time the PoG leaves the proximity of the old stimulus position. The second is the travel time of the PoG between the old and new stimulus position. The third is the time-to-arrive, which combines the reaction time, travel time, and the time required for the PoG to settle in the new position. We present the method for filtering the PoG outliers, for deriving the PoG center from binocular eye-tracking data and for calculating the gaze volume as a function of the distance between PoG and the observer. As an outcome from our experiments we present binocular heat maps aggregated over all observers who participated in a viewing test. We also show the mean values for all temporal properties separately for the x, y and z directions, averaged over all observers. We show the typical size of a binocular area of interest for a portable autostereoscopic display, as well as the typical time in which 3D vision can react to sudden changes in a 3D scene.
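
The convergence problem mentioned in the abstract (the two optical lines rarely intersect) is commonly handled by taking the midpoint of the shortest segment between the two lines. The following is a minimal sketch of that standard geometric construction, not the authors' method; the function name `gaze_point_from_rays` and the eye coordinates in the usage note are hypothetical.

```python
def gaze_point_from_rays(p1, d1, p2, d2, eps=1e-12):
    """Estimate a 3D point of gaze from two eye rays.

    Each ray is given by an origin p (eye position) and a direction d.
    Returns (midpoint, gap), where midpoint is the middle of the shortest
    segment joining the two lines and gap is the length of that segment.
    Returns None when the lines are (nearly) parallel.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # parallel lines: no unique closest pair of points

    t = (b * e - c * d) / denom  # parameter along the first line
    s = (a * e - b * d) / denom  # parameter along the second line
    q1 = [p + t * k for p, k in zip(p1, d1)]
    q2 = [p + s * k for p, k in zip(p2, d2)]
    midpoint = tuple((x + y) / 2 for x, y in zip(q1, q2))
    gap = sum((x - y) ** 2 for x, y in zip(q1, q2)) ** 0.5
    return midpoint, gap
```

For example, with eyes assumed at (-3, 0, 0) and (3, 0, 0) both directed at (0, 0, 40), the lines intersect and the gap is zero; for non-intersecting lines the gap gives a rough measure of how far the two eyes disagree.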

Deblocking of mobile stereo video

Lucio Azzari, Atanas Gotchev, Karen Egiazarian

Most candidate methods for compression of mobile stereo video apply block-transform compression based on the H.264 standard, with quantization of transform coefficients driven by a quantization parameter (QP). The compression ratio and the resulting bit rate are directly determined by the QP level, and high compression is achieved at the price of visually noticeable blocking artifacts. Previous studies on the perceived quality of mobile stereo video have revealed that blocking artifacts are the most annoying and most influential in the acceptance/rejection of mobile stereo video and can even completely cancel the 3D effect and the corresponding quality added value. In this work, we address the problem of deblocking of mobile stereo video. We modify a powerful non-local transform-domain collaborative filtering method originally developed for denoising of images and video. The method groups similar block patches residing in the spatial and temporal vicinity of a reference block and filters them collaboratively in a suitable transform domain. We study the most suitable way of finding similar patches in both channels of stereo video and suggest a hybrid four-dimensional transform to process the collected synchronized (stereo) volumes of grouped blocks. The results benefit from the additional correlation available between the left and right channels of the stereo video. Furthermore, additional sharpening is applied through an embedded alpha-rooting in the transform domain, which improves the visual appearance of the deblocked frames.

Security, Safety, and Location Technologies

SUPL support for mobile devices

Jayanthi Narisetty, Arpine Soghoyan, Mohanapriya Sundaramurthy, et al.

Conventional Global Positioning System (GPS) receivers operate well in open-sky environments, but their performance degrades in urban canyons, indoors and underground due to multipath, foliage, dissipation, etc. To overcome such situations, several enhancements have been suggested, such as Assisted GPS (A-GPS). Using this approach, orbital parameters including ephemeris and almanac, along with reference time and coarse location information, are provided to GPS receivers to assist in the acquisition of weak signals. To test A-GPS enabled receivers, high-end simulators are used, which are not affordable for many academic institutions. This paper presents an economical A-GPS supplement for inexpensive simulators which operates at the application layer. In particular, the proposed solution is integrated with National Instruments' (NI) GPS Simulation Toolkit and implemented using NI's LabVIEW environment. This A-GPS support works for J2ME and Android platforms. The communication between the simulator and the receiver is in accordance with the Secure User Plane Location (SUPL) protocol encapsulating the Radio Resource Location Protocol (RRLP), which applies to Global System for Mobile Communications (GSM) and Universal Mobile Telecommunications System (UMTS) cellular networks.

Measuring ionizing radiation with a mobile device

Matthias Michelsburg, Thomas Fehrenbach, Fernando Puente León

In cases of nuclear disasters it is desirable to know one's personal exposure to radioactivity and the related health risk. Usually, Geiger-Mueller tubes are used to assess the situation. Equipping everyone with such a device in a short period of time is very expensive. We propose a method to detect ionizing radiation using the integrated camera of a mobile consumer device, e.g., a cell phone. In emergency cases, millions of existing mobile devices could then be used to monitor the exposure of their owners. In combination with internet access and GPS, measured data can be collected by a central server to get an overview of the situation. During a measurement, the CMOS sensor of a mobile device is shielded from surrounding light by an attachment in front of the lens or an internal shutter. The high-energy radiation produces free electrons on the sensor chip, resulting in an image signal. Image analysis on the mobile device separates signal components due to incident ionizing radiation from the sensor noise. With radioactive sources present, significant increases in detected pixels can be seen. Furthermore, the cell phone application can make a preliminary estimate of the collected dose of an individual and the associated health risks.

Design and evaluation of security multimedia warnings for children's smartphones

Wiebke Menzel, Sven Tuchscheerer, Jana Fruth, et al.

This article primarily describes the development and empirical validation of a design for security warning messages on smartphones for primary school children (7-10 years old). Our design approach for security warnings for children uses a specific character and is based on recommendations of a paediatrician expert. The design criteria are adapted to children's skills, e.g. their visual, acoustic, and haptic perception and their literacy. The developed security warnings are prototypically implemented in an iOS application (on the iPhone 3G/4G) where children are warned by a simulated anti-malware background service while they are busy with another task. For the evaluation we select methods for empirical validation of the design approach from the field of usability testing ("think aloud" test, questionnaires, log-files, etc.). Our security warnings prototype is evaluated in an empirical user study with 13 primary school children, aged between 8 and 9 years and of different gender (5 girls, 8 boys). The evaluation analysis shows that nearly all children liked the design of our security warnings. Surprisingly, for several security warning messages most of the children reacted in the right way after reading the warning, although they could not interpret its meaning correctly. Another interesting result is that several children relate specific information, e.g. update, to a specific character. Furthermore, it could be seen that most of the primary school test candidates have little awareness of security threats on smartphones. This is a very strong argument for developing e.g. tutorials or websites in order to raise awareness and teach children how to recognize security threats and how to react to them. Our design approach of security warnings for children's smartphones can be a basis for warnings on other systems or applications, like tutorials, which are used by children.
In a second investigation, we focus on webpages designed for children, since smartphones and webpages (the services behind them) are more and more interconnected. From this point of view, those services should continue the security approaches for children's smartphones. The web services were evaluated against different criteria, e.g. data protection. The results of a first investigation are reported in this paper.

Using wi-fi hotspots as an intrusion vector into corporate networks

Maximilian Scharsich, Friedrich L. Holl

The following paper describes a method of gaining access to corporate networks through users who use virtual private networks over a modified Hotspot under the control of an attacker.

Algorithms for Mobile Computing

Frame rate up-conversion assisted with camera auto exposure information

Liang Liang, Bob Hung, Gokce Dane

Frame rate up-conversion (FRC) is the process of converting between different frame rates for targeted display formats. Besides scanning format applications for large displays, FRC can be used to increase the frame rate of video at the receiver end for video telephony, video streaming or playback applications for mobile platforms where bandwidth savings are crucial. Many algorithms have been proposed for decoder/receiver side FRC. However, most of them take a video encoding/decoding point of view. We have systematically studied strategies for using camera 3A (auto exposure, auto white balance and auto focus) information to assist the FRC process; in this paper we focus on the technique that uses camera exposure information to assist decoder-side FRC. In the proposed strategy, the exposure information as well as other camera 3A related information is packetized as metadata, which is attached to the corresponding frame and transmitted together with the main video bit stream to the decoder side for FRC assistance. The metadata contains information such as zooming, auto focus, AE (auto exposure) and AWB (auto white balance) statistics, scene change detection, and global motion detected from motion sensors. The proposed metadata consists of camera-specific information, which is different from just sending motion vectors or mode information to aid the FRC process. Compared to traditional FRC approaches used in mobile platforms, the proposed approach is a low-complexity, low-power solution, which is crucial in resource-constrained environments such as mobile platforms.

Fused Fibonacci-like (p,q) sequences with compression and barcoding applications

Sarkis Agaian, Jose Garcia, Salahodeen Abdul-Kafi, et al.

A double-base number system (DBNS) has recently been introduced and investigated [1] [2] [3]. This system has been shown to have some interesting and potentially far-reaching applications in digital filtering, encryption, digital electronics, and image enhancement. In this paper we present a new concept of generating parametric number representations by fusing systems such as DBNS using multiplication and addition operations. We introduce Fibonacci-like (p,q)-sequences and determine their efficiency in representing data. We develop an algorithm to test the sparsity of fused number representation systems and explore the dual relationship between sparsity and memory. We also consider the applications of these representations in data compression and barcoding. Simulation results are presented to demonstrate the performance of the new class of systems. A comparison with commonly used double-base number systems is also presented.
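
As background for the double-base number system mentioned above, the sketch below shows the well-known greedy way to write a positive integer as a sum of terms of the form 2^a * 3^b. It illustrates plain DBNS only, under the assumption of bases 2 and 3; it is not the fused (p,q)-sequence representation proposed in the paper, and the function names are hypothetical.

```python
def largest_2_3_number(n):
    """Return (value, a, b) with value = 2**a * 3**b the largest such number <= n."""
    best = (1, 0, 0)
    pow3, b = 1, 0
    while pow3 <= n:
        # For this power of 3, multiply by 2 as often as possible without exceeding n.
        value, a = pow3, 0
        while value * 2 <= n:
            value *= 2
            a += 1
        if value > best[0]:
            best = (value, a, b)
        pow3 *= 3
        b += 1
    return best


def greedy_dbns(n):
    """Greedy double-base representation of n >= 1 as a list of (a, b) exponent pairs."""
    terms = []
    while n > 0:
        value, a, b = largest_2_3_number(n)
        terms.append((a, b))
        n -= value
    return terms


# 127 = 2^2 * 3^3 + 2^1 * 3^2 + 2^0 * 3^0 = 108 + 18 + 1
terms_127 = greedy_dbns(127)
```

The number of terms returned is a simple measure of how sparse the representation is; the greedy choice is short in practice but is not guaranteed to be the shortest possible.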

White synthesis with user input for color balancing on mobile camera systems

Satyam Srivastava, Chang Xu, Edward J. Delp

In this paper we extend the manual white balancing technique available on most imaging devices by allowing a user to specify arbitrary colors in the scene. We derive an interpolation technique to assign weights to the arbitrary colors which are then used to estimate the RGB complements corresponding to a white target. We obtain the user input by displaying a captured image alongside a color grid of commonly occurring colors. The user specifies color pairs - patches in the scene and veridical colors on the grid. We then use these pairs to estimate the white point with our interpolation method. The estimated white point is then used to construct a diagonal transform to determine the camera output under a desired illuminant. We will present results from testing our methods on images acquired under several illumination conditions. Our approach is very suitable for mobile devices because most mobile devices are equipped with moderately sophisticated imaging systems and our method allows better color capture with relatively little user input. Further, we can realize our method on mobile devices since these devices have built-in tools for graphical user input. Our method can be useful in several photography and image analysis applications.
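
The last step described above, applying a diagonal transform built from an estimated white point, can be sketched as follows. This is a minimal illustration that assumes the white point has already been estimated and that pixel values are linear RGB in the range 0 to 1; the interpolation method for estimating the white point from user-selected color pairs is not reproduced here, and the function name is hypothetical.

```python
def diagonal_white_balance(pixels, white_point, target_white=(1.0, 1.0, 1.0)):
    """Apply a diagonal (per-channel gain) color correction.

    pixels       -- iterable of (r, g, b) tuples, linear values in [0, 1]
    white_point  -- estimated camera response (r, g, b) to a white surface
    target_white -- desired response to that white surface
    """
    # One gain per channel: the diagonal entries of the 3x3 transform.
    gains = tuple(t / w for t, w in zip(target_white, white_point))
    corrected = []
    for px in pixels:
        # Scale each channel and clip to the valid range.
        corrected.append(tuple(min(1.0, max(0.0, c * g)) for c, g in zip(px, gains)))
    return corrected
```

With an estimated white point of (0.8, 1.0, 0.5), a pixel that has exactly that value is mapped to white, and a pixel at half that intensity is mapped to mid-gray.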

Detection of symmetric shapes on a mobile device with applications to automatic sign interpretation

Andrew W. Haddad, Shanshan Huange, Mireille Boutin, et al.

We present a light-weight method for automatically detecting shapes that have an approximate rotational symmetry (e.g., a square or equilateral triangle) on discrete-space images. Our motivation is the problem of automatically detecting and recognizing hazardous material placards on a mobile platform (e.g., a mobile telephone) equipped with a camera. The proposed method is well-suited for mobile device applications, which are characterized by limited memory, processing power and battery life. It is based on comparing the magnitude of the coefficients of the Fourier series of the centralized moments of the Radon transform of the image after segmentation. However, in our approach, the computation of the Radon transform is bypassed as we obtain these coefficients directly from the rows of the Pascal Triangle of the segmented image. The Pascal Triangle of an image is composed of complex moments arranged in a pyramidal fashion similar to the binomial coefficients. These complex moments are obtained from a coarse segmentation of the shape represented by a gray-scale image. In particular, the contours of the object do not need to be precisely defined, and the shape need not be connected. Moreover, our approach is invariant under translation, rotation, and scaling. We tested our method on images from the MPEG-7 shape database as well as images from our own database of hazardous material placards.

Raster image adaptation for mobile devices using profiles

René Rosenbaum, Bernd Hamann

Focusing on digital imagery, this paper introduces a strategy to handle heterogeneous hardware in mobile environments. Constrained system resources of most mobile viewing devices require contents that are tailored to the requirements of the user and the capabilities of the device. Appropriate image adaptation is still an unsolved research question. Due to the complexity of the problem, available solutions are either too resource-intensive or inflexible to be more generally applicable. The proposed approach is based on scalable image compression and progressive refinement as well as data and user profiles. A scalable image is created once and used multiple times for different kinds of devices and user requirements. Profiles available on the server side allow for an image representation that is adapted to the most important resources in mobile computing: screen space, computing power, and the volume of the transmitted data. Options for progressively refining content thereby allow for a fluent viewing experience during adaptation. Due to its flexibility and low complexity, the proposed solution is much more general compared to related approaches. To document the advantages of our approach we provide empirical results obtained in experiments with an implementation of the method.

Interactive Paper Session

Low complexity bit-plane entropy coding for 3-D DWT-based video compression

E. Belyaev, K. Egiazarian, M. Gabbouj

This paper is dedicated to entropy coding for scalable video compression based on the three-dimensional discrete wavelet transform (3-D DWT). A new simple bit-plane entropy coding of wavelet subband matrices is proposed. Practical results show that a 3-D DWT video codec with the proposed entropy coding increases the encoding speed by a factor of 2-3 for the same quality level in comparison with the x264 codec, which is one of the fastest software implementations of the H.264/AVC standard.

Bi-directional probabilistic hypergraph matching method using Bayes theorem

Wanhyun Cho, Sunworl Kim, Sangcheol Park

Establishing correspondences between two hyper-graphs is a fundamental issue in computer vision, pattern recognition, and machine learning. A hyper-graph is modeled by a feature set where the complex relations are represented by hyper-edges. Hence, a match between two vertex sets determines a hyper-graph matching problem. We propose a new bidirectional probabilistic hyper-graph matching method using the Bayesian inference principle. First, we formulate the corresponding hyper-graph matching problem as the maximization of a matching score function over all permutations of the vertices. Second, we induce an algebraic relation between the hyper-edge weight matrices and derive the desired vertex-to-vertex probabilistic matching algorithm using Bayes' theorem. Third, we apply the well-known convex relaxation procedure with a probabilistic soft matching matrix to get a complete hard matching result. Finally, we have conducted comparative experiments on synthetic data and real images. Experimental results show that the proposed method clearly outperforms existing algorithms, especially in the presence of noise and outliers.

SeamCrop for image retargeting

Johannes Kiess, Benjamin Guthier, Stephan Kopf, et al.

In this paper, we present a novel approach for the adaptation of large images to small display sizes. As a recent study suggests, most viewers prefer the loss of content over the insertion of deformations in the retargeting process [1]. Therefore, we combine the two image retargeting operators seam carving and cropping in order to resize an image without manipulating the important objects in an image at all. First, seams are removed carefully until a dynamic energy threshold is reached to prevent the creation of visible artifacts. Then, a cropping window is selected in the image that has the smallest possible window size without having the removed energy rise above a second dynamic threshold. As the number of removed seams and the size of the cropping window are not fixed, the process is repeated iteratively until the target size is reached. Our results show that by using this method, more important content of an image can be included in the cropping window than in normal cropping. The "squeezing" of objects which might occur in approaches based on warping or scaling is also prevented.
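
The seam removal step builds on standard seam carving, which finds a connected top-to-bottom path of minimum total energy by dynamic programming. The sketch below shows that generic building block only, assuming a precomputed energy map given as a list of rows; the dynamic thresholds and the cropping window selection of SeamCrop are not reproduced, and the function names are hypothetical.

```python
def min_vertical_seam(energy):
    """Return (seam, total) for the minimum-energy vertical seam.

    energy -- list of rows, each a list of non-negative numbers
    seam   -- list with one column index per row, adjacent rows differ by at most 1
    total  -- summed energy along the seam
    """
    rows, cols = len(energy), len(energy[0])
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols - 1, c + 1)
            # Cheapest way to reach (r, c) from the three neighbours above.
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)

    # Backtrack from the cheapest cell in the last row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    total = cost[-1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, c - 1), min(cols - 1, c + 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam, total


def remove_seam(image, seam):
    """Remove one pixel per row at the given column indices."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]
```

A caller could compare the returned `total` against an energy threshold before calling `remove_seam`, which is one plausible way to stop removing seams once they start to cut through important content.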

Collecting fingerprints for recognition using mobile phone cameras

Bian Yang, Xue Li, Christoph Busch

We present in this paper a sample quality control approach for the case of using a mobile phone's camera as a fingerprint sensor for fingerprint recognition. Our approach directly estimates the maximum ridge frequency orientation by the amplitude-frequency features of the Fast Fourier Transform and takes the frequency features' difference in two perpendicular orientations as a distinguishing feature for ridge-like patterns. Then a decision criterion which combines the frequency components' energy and ridge orientation features is used to determine if an image block should be classified as high-quality fingerprint area or not. The number of such high-quality blocks can thus be used to indicate the whole fingerprint sample's quality. Experiments show this approach's effectiveness in distinguishing the high-quality blocks from other low-quality ones or background area. Mapping the quality metric to the sample utility as derived from the NIST minutiae extractor "mindtct" function is also given to verify the approach's quality prediction effectiveness. Keywords: Fingerprint, quality assessment, mobile phone camera

Overview of potential forensic analysis of an android smartphone

Stefan Sack, Knut Kröger, Reiner Creutzburg

This paper deals with the forensic examination of Android smartphones. The structure of the Android system was analyzed and a forensic guide was created. As an example, this guide was used to examine an HTC Desire. The paper concludes that all data stored on the smartphone can be examined. The main problem is that some of the procedures used do not meet forensic requirements.

Forensics of location data collected by Google android mobile devices

Knut Kröger, Reiner Creutzburg

This paper deals with forensic investigation of stored location data collected by Android mobile devices. The main aspects of the study are the extraction and examination of the location data and the possibilities for additional use of the extracted data.

Template-based education toolkit for mobile platforms

Santosh Chandana Golagani, Moosa Esfahanian, David Akopian

Nowadays mobile phones are the most widely used portable devices, and they evolve very fast, adding new features and improving user experiences. The latest generation of hand-held devices, called smartphones, is equipped with superior memory, cameras and rich multimedia features, empowering people to use their mobile phones not only as a communication tool but also for entertainment purposes. With many young students showing interest in learning mobile application development, one should introduce novel learning methods which can adapt to fast technology changes and introduce students to application development. Mobile phones have become common devices, and the engineering community incorporates phones in various solutions. To overcome the limitations of conventional undergraduate electrical engineering (EE) education, this paper explores the concept of template-based education in mobile phone programming. The concept is based on developing small exercise templates which students can manipulate and revise for a quick hands-on introduction to application development and integration. The Android platform is used as a popular open source environment for application development. The exercises relate to image processing topics typically studied by many students. The goal is to enable enhancements of conventional courses by incorporating short hands-on learning modules in them.

Combining associative computing and distributed arithmetic methods for efficient implementation of multiple inner products

David Guevorkian, Timo Yli-Pietilä, Petri Liuha, et al.

Many multimedia processing algorithms as well as communication algorithms implemented in mobile devices are based on intensive use of linear algebra methods, in particular the computation of a large number of inner products in real time. Among the most efficient approaches to perform inner products are the Associative Computing (ASC) approach and the Distributed Arithmetic (DA) approach. In ASC, computations are performed on Associative Processors (ASP), where Content-Addressable Memories (CAMs) are used instead of traditional processing elements to perform basic arithmetic operations. In the DA approach, computations are reduced to look-up table reads with respect to binary planes of inputs. In this work, we propose a modification of Associative Processors that supports efficient implementation of the DA method. Thus, the two powerful methods are combined to further improve the efficiency of multiple inner product computation. Computational complexity analysis of the proposed method illustrates significant speed-up when computing multiple inner products as compared both to the pure ASC method and to the pure DA method, as well as to other state-of-the-art traditional methods for inner product calculation.
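
The DA idea of replacing multiplications with look-up table reads over the bit planes of the inputs can be illustrated in software. The sketch below is a plain textbook form of distributed arithmetic, assuming fixed integer coefficients and unsigned integer inputs of a known bit width; it does not model the associative processor modification proposed in the paper, and the function names are hypothetical.

```python
def build_da_table(coeffs):
    """Precompute, for every bit pattern, the sum of the selected coefficients.

    Entry `addr` holds the sum of coeffs[i] over all i whose bit is set in addr.
    The table has 2**len(coeffs) entries.
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_inner_product(table, xs, bits):
    """Compute sum(coeffs[i] * xs[i]) using only table reads, shifts and adds.

    xs   -- unsigned integers, each smaller than 2**bits
    bits -- number of bit planes to process
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address (one bit plane).
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # The table entry is the partial inner product for this bit plane.
        acc += table[addr] << b
    return acc


coeffs = [3, -1, 4, 2]
table = build_da_table(coeffs)
result = da_inner_product(table, [5, 7, 0, 9], bits=4)  # equals 3*5 - 7 + 0 + 2*9
```

The table grows exponentially with the number of coefficients, so practical designs usually split long inner products into several smaller tables.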

Remarks on forensically interesting Microsoft XBox 360 console features

Silas Luttenberger, Knut Kröger, Reiner Creutzburg

This paper deals with forensically interesting features of the Microsoft Xbox 360 game console. The construction and the internal structure are analysed more precisely. One of the main aspects of the study is to analyse the used file system which was examined for forensic features. Possible difficulties that might be of importance to the forensic investigator are discussed.

Remarks on forensically interesting Sony Playstation 3 console features

Gunnar Daugs, Knut Kröger, Reiner Creutzburg

This paper deals with forensically interesting features of the Sony Playstation 3 game console. The construction and the internal structure are analyzed more precisely. Interesting forensic features of the operating system and the file system are presented. Differences between a PS3 with and without jailbreak are introduced and possible forensic attempts when using an installed Linux are discussed.

Multimedia Content Classification

Searching through photographic databases with QuickLook

Gianluigi Ciocca, Claudio Cusano, Raimondo Schettini, et al.

We present here the results obtained by including a new image descriptor, which we call the prosemantic feature vector, within the framework of the QuickLook² image retrieval system. By coupling the prosemantic features and the relevance feedback mechanism provided by QuickLook², the user can move in a more rapid and precise way through the feature space toward the intended goal. The prosemantic features are obtained by a two-step feature extraction process. At the first step, low-level features related to image structure and color distribution are extracted from the images. At the second step, these features are used as input to a bank of classifiers, each one trained to recognize a given semantic category, to produce score vectors. We evaluated the efficacy of the prosemantic features under search tasks on a dataset provided by the Fratelli Alinari Photo Archive.

Large-scale classification of traffic signs under real-world conditions

Lykele Hazelhoff, Ivo Creusen, Dennis van de Wouw, et al.

Traffic sign inventories are important to governmental agencies as they facilitate evaluation of traffic sign locations and are beneficial for road and sign maintenance. These inventories can be created (semi-)automatically based on street-level panoramic images. In these images, object detection is employed to detect the signs in each image, followed by a classification stage to retrieve the specific sign type. Classification of traffic signs is a complicated matter, since sign types are very similar with only minor differences within the sign, a high number of different signs is involved, and multiple distortions occur, including variations in capturing conditions, occlusions, viewpoints and sign deformations. Therefore, we propose a method for robust classification of traffic signs, based on the Bag of Words approach for generic object classification. We extend the approach with a flexible, modular codebook to model the specific features of each sign type independently, in order to emphasize the inter-sign differences instead of the parts common to all sign types. Additionally, this allows us to model and label the false detections that are present. Furthermore, analysis of the classification output identifies the unreliable results. This classification system has been extensively tested for three different sign classes, covering 60 different sign types in total. These three data sets contain the sign detection results on street-level panoramic images, extracted from a country-wide database. The introduction of the modular codebook shows a significant improvement for all three sets, where the system is able to classify about 98% of the reliable results correctly.

Human action recognition using a Markovian conditional exponential model

Atulya Velivelli, Alexander G. Hauptmann

We model the sequence of human actions in operating an infusion pump using a Markovian conditional exponential model. We divide each video recorded by a camera into video action units. A video action unit spans the interval from the start to the end of a single human action in operating the infusion pump. We calculate the MOSIFT features of the video action units, which combine the spatial and temporal dimensions of the videos. We vector quantize the MOSIFT features of the video action units using k-means clustering to obtain video codebook elements. We estimate the conditional exponential model parameters from a training set using maximum entropy constraints, with the video codebook elements as the maximum entropy constraint features. We also estimate the parameters of the Markovian conditional exponential model from a training set. This Markovian conditional exponential model has 6 states, which correspond to the 6 classes of infusion pump operation. To find the optimal state sequence of the Markovian conditional exponential model we use the Viterbi algorithm. This optimal state sequence corresponds to the class label sequence. The infusion pump operation is recorded by 4 video cameras. For each of the 4 video cameras, we calculate the classification results for the 6 classes of infusion pump operation using both the conditional exponential model and the Markovian conditional exponential model. The classification performance of the Markovian conditional exponential model is better than that of the conditional exponential model.
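
The decoding step mentioned in this abstract can be illustrated with a generic Viterbi sketch over log-scores. The function name, the two-state toy model, and every number below are assumptions for illustration only; the paper's model has 6 states, and its scores come from a conditional exponential model that is not reproduced here.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Return the highest-scoring state sequence.

    log_init[s]     : log-score of starting in state s
    log_trans[p][s] : log-score of moving from state p to state s
    log_obs[t][s]   : log-score of state s given the observation at step t
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at step t
    for t in range(1, len(log_obs)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
        score = new_score
        back.append(pointers)
    # Trace the best final state back to the first step.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def logs(values):
    """Convert a list of probabilities to log-scores."""
    return [math.log(v) for v in values]


# Toy two-state example with "sticky" transitions (made-up numbers).
init = logs([0.6, 0.4])
trans = [logs([0.9, 0.1]), logs([0.1, 0.9])]
obs = [logs([0.9, 0.1]), logs([0.4, 0.6]), logs([0.9, 0.1])]
path = viterbi(init, trans, obs)
```

In this toy example the middle observation slightly favors state 1 when taken alone, but the transition scores keep the decoded sequence in state 0 throughout. That is the kind of effect a Markovian model can add over independent per-unit classification.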

Semantic Multimedia Access

Swimmer detection and pose estimation for continuous stroke-rate determination

Dan Zecha, Thomas Greif, Rainer Lienhart

In this work we propose a novel approach to automatically detect a swimmer and estimate his/her pose continuously in order to derive an estimate of his/her stroke rate given that we observe the swimmer from the side. We divide a swimming cycle of each stroke into several intervals. Each interval represents a pose of the stroke. We use specifically trained object detectors to detect each pose of a stroke within a video and count the number of occurrences per time unit of the most distinctive poses (so-called key poses) of a stroke to continuously infer the stroke rate. We extensively evaluate the overall performance and the influence of the selected poses for all swimming styles on a data set consisting of a variety of swimmers.
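
Counting key-pose occurrences per time unit can be sketched as follows. The sketch assumes that each key-pose detection has already been reduced to a single timestamp in seconds and that one key pose occurs once per stroke cycle; the function name `stroke_rate` and the sample timestamps are hypothetical.

```python
def stroke_rate(key_pose_times, window_start, window_end):
    """Strokes per minute from key-pose timestamps inside a time window.

    key_pose_times : detection times in seconds (one per stroke cycle, assumed)
    window_start   : start of the observation window in seconds (inclusive)
    window_end     : end of the observation window in seconds (exclusive)
    """
    if window_end <= window_start:
        raise ValueError("window must have positive duration")
    count = sum(1 for t in key_pose_times if window_start <= t < window_end)
    return 60.0 * count / (window_end - window_start)


# Hypothetical detections, one every 1.5 seconds.
detections = [0.5, 2.0, 3.5, 5.0, 6.5, 8.0]
rate = stroke_rate(detections, 0.0, 9.0)  # 6 detections in 9 s
```

Sliding the window along the video gives a continuously updated estimate. A shorter window reacts faster to changes in pace but is more sensitive to missed or spurious detections.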

Multiview face detection based on position estimation over multicamera surveillance system

Ching-chun Huang, Jay Chou, Jia-Hou Shiu, et al.

In this paper, we propose a multi-view face detection system that locates head positions and indicates the direction of each face in 3-D space over a multi-camera surveillance system. To locate 3-D head positions, conventional methods relied on face detection in 2-D images and projected the face regions back to 3-D space for correspondence. However, the inevitable false detections and false rejections of faces usually degrade the system performance. Instead, our system searches for the heads and face directions over the 3-D space using a sliding cube. Each searched 3-D cube is projected onto the 2-D camera views to determine the existence and direction of human faces. Moreover, a preprocessing step that estimates the locations of candidate targets is presented to speed up the search over the 3-D space. In summary, our proposed method can efficiently fuse multi-camera information and suppress the ambiguity caused by detection errors. Our evaluation shows that the proposed approach can efficiently indicate the head position and face direction on real video sequences even under serious occlusion.
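
Projecting a searched 3-D cube onto a 2-D camera view can be sketched with a standard pinhole camera model. The abstract does not describe the camera model, so the 3x4 projection matrix, the function names, and the toy camera below are assumptions for illustration.

```python
from itertools import product


def project_point(P, X):
    """Project a 3-D point X with a 3x4 projection matrix P (pinhole model)."""
    x, y, z = X
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if w <= 0:
        raise ValueError("point is behind the camera")
    return (u / w, v / w)


def cube_bounding_box(P, center, side):
    """Image-plane bounding box (u_min, v_min, u_max, v_max) of a 3-D cube."""
    h = side / 2.0
    corners = [
        (center[0] + dx, center[1] + dy, center[2] + dz)
        for dx, dy, dz in product((-h, h), repeat=3)
    ]
    points = [project_point(P, c) for c in corners]
    us = [p[0] for p in points]
    vs = [p[1] for p in points]
    return (min(us), min(vs), max(us), max(vs))


# Toy camera at the origin looking along +z with unit focal length.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
box = cube_bounding_box(P, center=(0.0, 0.0, 3.0), side=2.0)
```

In a multi-camera setup, the same cube would be projected with each camera's own matrix, and the resulting image regions would be the places where a face detector is evaluated.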

Interactive Paper Session

Keyframe generation from cartoon animation using rule-based optical flow

Pakpoom Tanapichet, Nagul Cooharojananone, Rajalida Lipikorn

This paper proposes a novel method to generate keyframes from cartoon animation with the aim of improving the detail and accuracy of the content represented by keyframes. Because general video summarization techniques usually drop some important content due to their restriction on aspect ratio, this paper proposes a new method that uses panorama technology to include more detail in each keyframe. The concept is to mark time codes based on shot boundaries and optical flow direction. The period of time between every two consecutive marked time codes is used to form a shot sequence, which is a sequence of frames. The global and local optical flows are also used to determine, according to a set of rules, how to select the frames and when to stitch them together. The results of the proposed method are keyframes generated from various types of cartoon animation, which are outstanding compared to their comic adaptations.

Front Matter: Volume 8304

This PDF file contains the front matter associated with SPIE Proceedings Volume 8304, including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.