Proceedings Volume 6061

Internet Imaging VII

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 15 January 2006
Contents: 10 Sessions, 29 Papers, 0 Presentations
Conference: Electronic Imaging 2006
Volume Number: 6061

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Special Session: Benchmarking I
  • Special Session: Benchmarking II
  • Special Session: Benchmarking III
  • Interfaces and Visualization
  • Ontology and Annotation
  • Anthropometrics
  • Content Management and Retrieval
  • Video
  • Vector Displays
  • Poster Session
Special Session: Benchmarking I
Requirements for benchmarking personal image retrieval systems
Jean-Yves Bouguet, Carole Dulong, Igor Kozintsev, et al.
It is now common to have accumulated tens of thousands of personal pictures. Efficient access to that many pictures can only be achieved with a robust image retrieval system. This application is of high interest to Intel processor architects: it is highly compute intensive, and could motivate end users to upgrade their personal computers to the next generations of processors. A key question is how to assess the robustness of a personal image retrieval system. Personal image databases are very different from the digital libraries that have been used by many content-based image retrieval systems [1]. For example, a personal image database has many pictures of people, but a small set of different people, typically family, relatives, and friends. Pictures are taken in a limited set of places such as home, work, school, and vacation destinations. The most frequent queries are searches for people and for places. These attributes, and many others, affect how a personal image retrieval system should be benchmarked, and such benchmarks need to differ from existing ones based on, for example, art images or medical images. The attributes of the data set do not change the list of components needed for benchmarking such systems, as specified in [2]: data sets, query tasks, ground truth, evaluation measures, and benchmarking events. This paper proposes a way to build these components so that they are representative of personal image databases and of the corresponding usage models.
On usage models of content-based image search, filtering, and annotation
David Telleen-Lawton, Edward Y. Chang, Kwang-Ting Cheng, et al.
VIMA has experienced an increasing demand for content-based image retrieval (CBIR) systems since late 2004. In this paper, we report on the search, filtering, and annotation systems that we have developed and deployed, and on the usage models of these systems. The objective of this paper is to provide researchers and developers in the area of image retrieval with guidelines for measuring the performance of their algorithms and systems in a way that is consonant with the requirements of the users. We also enumerate the technical challenges of building CBIR systems, and outline our solutions to these challenges.
Special Session: Benchmarking II
Human factors in automatic image retrieval system design and evaluation
Image retrieval is a human-centered task: images are created by people and are ultimately accessed and used by people for human-related activities. In designing image retrieval systems and algorithms, or measuring their performance, it is therefore imperative to consider the conditions that surround both the indexing of image content and the retrieval. This includes examining the different levels of interpretation for retrieval, possible search strategies, and image uses. Furthermore, we must consider different levels of similarity and the role of human factors such as culture, memory, and personal context. This paper takes a human-centered perspective in outlining levels of description, types of users, search strategies, image uses, and human factors that affect the construction and evaluation of automatic content-based retrieval systems, such as human memory, context, and subjectivity.
Special Session: Benchmarking III
Using heterogeneous annotation and visual information for the benchmarking of image retrieval systems
Henning Müller, Paul Clough, William Hersh, et al.
Many image retrieval systems, and the evaluation methodologies of these systems, make use of either visual or textual information only. Only a few combine textual and visual features for retrieval and evaluation. When text is used, it often relies upon having a standardised and complete annotation schema for the entire collection. This, in combination with high-level semantic queries, makes visual/textual combinations almost useless, as the information need can often be satisfied using textual features alone. In reality, many collections do have some form of annotation, but it is often heterogeneous and incomplete. Web-based image repositories such as Flickr even allow collective, as well as multilingual, annotation of multimedia objects. This article describes an image retrieval evaluation campaign called ImageCLEF. Unlike previous evaluations, we offer a range of realistic tasks and image collections in which combining text and visual features is likely to obtain the best results. In particular, we offer a medical retrieval task which models exactly the situation of heterogeneous annotation by combining four collections with annotations of varying quality, structure, extent, and language. Two collections have an annotation per case rather than per image, which is normal in the medical domain but makes it difficult to relate parts of the accompanying text to the corresponding images. This is also typical of image retrieval from the web, in which adjacent text does not always describe an image. The ImageCLEF benchmark shows the need for realistic and standardised datasets, search tasks, and ground truths for visual information retrieval evaluation.
On benchmarking content-based image retrieval applications
Yuanyuan Zuo, Jinhui Yuan, Dayong Ding, et al.
Constructing a benchmark for content-based image retrieval (CBIR) applications is an important task because researchers in this area highly depend on experiments to compare different systems. Image collection, concept annotation and performance evaluation are the three main issues that should be considered carefully. Based on our previous work and experiments on both Corel image collection and TRECVID dataset, we present some basic principles of constructing a benchmark for CBIR applications. According to our experience in the collaborative annotation of TRECVID 2005 data, we propose a hierarchical concept annotation strategy to produce ground truth for the CBIR benchmark image collection. To address the conflicts among collaborative annotations from multiple annotators, we present a fuzzy annotation method, in which a membership function is defined to indicate the probability that an image contains a given concept. Evaluation criteria corresponding to the fuzzy annotation method are also presented so as to give a more reasonable evaluation of performance for different CBIR applications.
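The fuzzy-annotation idea lends itself to a compact illustration. A minimal sketch, assuming the membership function is simply the fraction of annotators who tagged the concept (the paper's exact function is not given in the abstract):

```python
import numpy as np

def fuzzy_membership(votes, n_annotators):
    """Membership of a concept in an image: fraction of annotators who
    tagged it (one simple choice; the paper's exact function may differ)."""
    return votes / n_annotators

def fuzzy_precision(retrieved_memberships):
    """Fuzzy analogue of precision: average membership of the retrieved
    images instead of a hard hit/miss count."""
    return float(np.mean(retrieved_memberships))

# Three images retrieved for "beach"; 4 of 5, 1 of 5, and 5 of 5
# annotators tagged the concept, respectively.
memberships = [fuzzy_membership(v, 5) for v in (4, 1, 5)]
print(fuzzy_precision(memberships))  # 0.666...
```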
TRECVID: the utility of a content-based video retrieval evaluation
TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of some supporting technologies, including shot boundary detection, extraction of semantic features, and the automatic segmentation of TV news broadcasts. Evaluations done in the context of the TRECVID benchmarks show that, generally, speech transcripts and annotations provide the single most important clue for successful retrieval. However, automatically finding the individual images is still a tremendous and unsolved challenge. The evaluations repeatedly found that none of the multimedia analysis and retrieval techniques provide a significant benefit over retrieval using only textual information such as automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video/image search. For interactive tasks, efficient interfaces require few key clicks but display large numbers of images for visual inspection by the user. In general, text search finds the right context region in the video, but selecting specific relevant images requires good interfaces for easily browsing the storyboard pictures. Overall, TRECVID has motivated the video retrieval community to be honest about what we don't know how to do well (sometimes through painful failures), and has focused our work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.
Interfaces and Visualization
A color selection tool for the readability of textual information on web pages
Silvia Zuffi, Giordano Beretta, Carla Brambilla
One of the issues in Web page design is the selection of appropriate combinations of background and foreground colors to display textual information. Colors have to be selected to guarantee legibility across different devices and viewing conditions and, more importantly, for all users, including those with deficient color vision. In this paper we present a tool to select background and foreground colors for the display of textual information. The tool is based on the Munsell Book of Colors; it allows browsing of the atlas and indicates plausible colors based on a set of legibility rules, which have been defined experimentally.
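The paper's legibility rules are derived experimentally from the Munsell atlas. As a rough illustration of the kind of check such a tool performs, here is a sketch using the WCAG relative-luminance contrast ratio, a different but well-known legibility criterion (an assumption of this note, not the paper's rule set):

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color (0-255 channels)."""
    def channel(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between foreground and background colors."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black text on a white background: ratio 21:1, comfortably legible.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))
```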
A color interface for audio clustering visualization
The availability of large audio collections calls for ways to efficiently access and explore them by providing an effective overview of their contents at the interface level. In this paper we present an innovative strategy exploiting color to visualize the content of a database of audio records, part of a website dedicated to ethnographic information in a region of Italy.
Interactive Internet delivery of scientific visualization via structured prerendered imagery
In this paper, we explore leveraging industry-standard media formats to effectively deliver interactive, 3D scientific visualization to a remote viewer. Our work is motivated by the need for remote visualization of time-varying, 3D data produced by scientific simulations or experiments while taking several practical factors into account, including: maximizing ease of use from the user's perspective, maximizing reuse of image frames, and taking advantage of existing software infrastructure wherever possible. Visualization or graphics applications first generate images at some number of view orientations for 3D scenes and temporal locations for time-varying scenes. We then encode the resulting imagery into one of two industry-standard formats: QuickTime VR Object Movies, or a combination of HTML and JavaScript code implementing the client-side navigator. Using an industry-standard QuickTime player or web browser, remote users may freely navigate through the pre-rendered images of time-varying, 3D visualization output. Since the only inputs consist of image data, a viewpoint, and time stamps, our approach is generally applicable to all visualization and graphics rendering applications capable of generating image files in an ordered fashion. Our design is a form of latency-tolerant remote visualization infrastructure in which processing time for visualization, rendering, and content delivery is effectively decoupled from interactive exploration. Our approach trades unconstrained exploration for increased interactivity, reduced load, and effective reuse of coherent frames among multiple users (from the server's perspective). This paper presents the system architecture along with an analysis and discussion of its strengths and limitations.
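The abstract does not specify the file layout. A minimal sketch, assuming one pre-rendered frame per (time step, azimuth, elevation) triple and a naming scheme invented purely for illustration:

```python
import os

def frame_path(root, t, azimuth, elevation):
    """Deterministic path for one pre-rendered frame. The naming scheme
    is an assumption for illustration, not the paper's convention."""
    return os.path.join(root, f"t{t:04d}_az{azimuth:03d}_el{elevation:03d}.jpg")

# Server side: render every frame once, in an ordered fashion.
# for t in range(n_steps):
#     for az in range(0, 360, 10):
#         render_and_save(frame_path("frames", t, az, 0))

# Client side: navigation becomes a path lookup, with no re-rendering.
print(frame_path("frames", 12, 90, 0))  # frames/t0012_az090_el000.jpg
```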
Ontology and Annotation
Clustering and semantically filtering web images to create a large-scale image ontology
S. Zinger, C. Millet, B. Mathieu, et al.
In our effort to contribute to the closing of the "semantic gap" between images and their semantic description, we are building a large-scale ontology of images of objects. This visual catalog will contain a large number of images of objects, structured in a hierarchical catalog, allowing image processing researchers to derive signatures for wide classes of objects. We are building this ontology using images found on the web. We describe in this article our initial approach for finding coherent sets of object images. We first perform two semantic filtering steps: the first involves deciding which words correspond to objects and using these words to access databases which index text found associated with an image (e.g. Google Image search) to find a set of candidate images; the second semantic filtering step involves using face recognition technology to remove images of people from the candidate set (we have found that often requests for objects return images of people). After these two steps, we have a cleaner set of candidate images for each object. We then index and cluster the remaining images using our system VIKA (VIsual KAtaloguer) to find coherent sets of objects.
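A sketch of the two-step filtering pipeline described above may help fix the idea; every helper passed in here is hypothetical and stands in for a real component (a WordNet-style object test, a text-indexed image search, a face detector, the VIKA clusterer):

```python
def build_object_image_sets(words, is_object, image_search, contains_face, cluster):
    """Sketch of the two semantic filtering steps plus clustering.
    All callables are hypothetical placeholders for real components."""
    catalog = {}
    for word in words:
        if not is_object(word):                  # step 1: keep only object nouns
            continue
        candidates = image_search(word)          # text-indexed image retrieval
        candidates = [img for img in candidates
                      if not contains_face(img)] # step 2: drop pictures of people
        catalog[word] = cluster(candidates)      # coherent image sets per object
    return catalog
```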
Ontology and image semantics in multimodal imaging: submission and retrieval
Yun Bei, Mounia Belmamoune, Fons J. Verbeek
In scientific communities, images play a dominant role in conveying a message and, more importantly, serve as a tool for experimental output. The meaning of these images develops from the annotation provided by the researchers. Annotation can be accomplished in a number of ways. In this paper we describe graphical and textual annotations that are developed from ontologies. The Internet has provided the research community with a medium for the exchange of images. Images are, however, not straightforwardly suitable for exchange. Knowledge about what is depicted in the image, as well as specific image content, is important for image understanding. This holds in particular for scientific images that are the result of experimentation. For the purpose of image exchange, that is, query-based search, image retrieval mechanisms based on pixel content as well as semantics are developed. In the field of experimental imaging, new paradigms will have to be developed so that a search query results in correct image collections.
Anthropometrics
Integrating colour models for more robust feature detection
The choice of a colour space is of great importance for many computer vision algorithms (e.g. edge detection and object recognition), because it induces the equivalence classes used by the actual algorithms. Since there are many colour spaces available, the problem is how to automatically select the weighting with which to integrate the colour spaces in order to produce the best result for a particular task. In this paper we propose a method to learn these weights, exploiting the imperfect correlation of features between colour spaces through the principle of diversification. As a result, an optimal trade-off is achieved between repeatability and distinctiveness, and the resulting weighting scheme ensures maximal feature discrimination. The method is experimentally verified on three feature detection tasks: skin colour detection, edge detection, and corner detection. In all three tasks the method achieved an optimal trade-off between (colour) invariance (repeatability) and discriminative power (distinctiveness).
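The diversification principle is borrowed from portfolio theory. A minimal sketch of one way to realize it, assuming per-colour-space detector scores on training data and minimum-variance weights (a reading of the principle, not the paper's exact estimator):

```python
import numpy as np

def diversified_weights(scores):
    """scores: (n_samples, n_color_spaces) detector responses.
    Minimum-variance weights summing to 1, exploiting the imperfect
    correlation between colour spaces (portfolio-style diversification)."""
    cov = np.cov(scores, rowvar=False)
    inv = np.linalg.pinv(cov)
    ones = np.ones(scores.shape[1])
    w = inv @ ones
    return w / w.sum()

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Three correlated "colour space" responses with different noise levels.
scores = np.hstack([base + rng.normal(scale=s, size=(200, 1)) for s in (0.3, 0.5, 0.8)])
print(diversified_weights(scores))  # noisier spaces receive smaller weight
```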
Using context and similarity for face and location identification
This paper describes a new approach to the automatic detection of human faces and places depicted in photographs taken on cameraphones. Cameraphones offer a unique opportunity to pursue new approaches to media analysis and management: namely to combine the analysis of automatically gathered contextual metadata with media content analysis to fundamentally improve image content recognition and retrieval. Current approaches to content-based image analysis are not sufficient to enable retrieval of cameraphone photos by high-level semantic concepts, such as who is in the photo or what the photo is actually depicting. In this paper, new methods for determining image similarity are combined with analysis of automatically acquired contextual metadata to substantially improve the performance of face and place recognition algorithms. For faces, we apply Sparse-Factor Analysis (SFA) to both the automatically captured contextual metadata and the results of PCA (Principal Components Analysis) of the photo content to achieve a 60% face recognition accuracy of people depicted in our database of photos, which is 40% better than media analysis alone. For location, grouping visually similar photos using a model of Cognitive Visual Attention (CVA) in conjunction with contextual metadata analysis yields a significant improvement over color histogram and CVA methods alone. We achieve an improvement in location retrieval precision from 30% precision for color histogram and CVA image analysis, to 55% precision using contextual metadata alone, to 67% precision achieved by combining contextual metadata with CVA image analysis. The combination of context and content analysis produces results that can indicate the faces and places depicted in cameraphone photos significantly better than image analysis or context analysis alone. We believe these results indicate the possibilities of a new context-aware paradigm for image analysis.
Skin segmentation using multiple thresholding
The segmentation of skin regions in color images is a preliminary step in several applications. Many different methods for discriminating between skin and non-skin pixels are available in the literature. The simplest, and often applied, methods build what is called an "explicit skin cluster" classifier which expressly defines the boundaries of the skin cluster in certain color spaces. These binary methods are very popular as they are easy to implement and do not require a training phase. The main difficulty in achieving high skin recognition rates, and producing the smallest possible number of false positive pixels, is that of defining accurate cluster boundaries through simple, often heuristically chosen, decision rules. In this study we apply a genetic algorithm to determine the boundaries of the skin clusters in multiple color spaces. To quantify the performance of these skin detection methods, we use recall and precision scores. A good classifier should provide both high recall and high precision, but generally, as recall increases, precision decreases. Consequently, we adopt a weighted mean of precision and recall as the fitness function of the genetic algorithm. Keeping in mind that different applications may have sharply different requirements, the weighting coefficients can be chosen to favor either high recall or high precision, or to satisfy a reasonable tradeoff between the two, depending on application demands. To train the genetic algorithm (GA) and test the performance of the classifiers applying the GA suggested boundaries, we use the large and heterogeneous Compaq skin database.
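The fitness function itself is simple to state. A minimal sketch of a weighted mean of recall and precision (the paper's exact weighting coefficients are application-dependent and not given in the abstract):

```python
def fitness(recall, precision, w_r=0.5):
    """Weighted mean of recall and precision used to score a candidate
    skin-cluster boundary; w_r trades recall against precision to match
    the application's demands."""
    return w_r * recall + (1.0 - w_r) * precision

# A recall-hungry application (e.g. a first filtering pass):
print(fitness(recall=0.92, precision=0.70, w_r=0.8))  # 0.876
```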
Content Management and Retrieval
Integration of multimedia content and e-learning resources in a digital library
Mireia Pascual, Núria Ferran, Julià Minguillón
In this paper we describe a proposal for multimedia and e-learning content description based on standards interoperability within a digital library environment integrated into a virtual campus. In any virtual e-learning environment, a complex scenario which usually includes a digital library or, at least, a repository of learning resources, different levels of description are needed for all the elements: learning resources, multimedia content, activities, roles, etc. These elements can be described using library, e-learning, and multimedia standards, depending on the specific needs of each particular scenario of use, but this might lead to an undesirable duplication of metadata, and to inefficient content queries and maintenance. Furthermore, there is a lack of semantic descriptions, which reduces all these contents to mere digital objects in the digital library, without exploiting all the possibilities of an e-learning virtual environment. Due to its flexibility and completeness, we propose to use the MPEG-7 standard for describing all the learning resources in the digital library, combined with the use of an ontology for a formal description of the learning process. The equivalences between the Dublin Core, LOM, and MPEG-7 standards are outlined, and the requirements of a proposal for an MPEG-7 based representation of all the contents in the digital library and the virtual classroom are described. The intellectual property policies for content sharing both within and among organizations are also addressed. With such a proposal, it would be possible to build complex multimedia courses from a repository of learning objects using the digital library as the core repository.
Selecting the kernel type for a web-based adaptive image retrieval system (AIRS)
Anca Doloc-Mihu, Vijay V. Raghavan
The goal of this paper is to investigate the selection of the kernel for a Web-based AIRS. Using the Kernel Perceptron learning method, several kernels having polynomial and Gaussian Radial Basis Function (RBF) like forms (6 polynomials and 6 RBFs) are applied to general images represented by color histograms in the RGB and HSV color spaces. Experimental results show that retrieval performance varies significantly between different kernel types and that choosing an appropriate kernel is important.
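For reference, the two kernel families named above have these standard forms (the paper's twelve variants are parameterizations of them; the parameter values below are illustrative):

```python
import numpy as np

def polynomial_kernel(x, y, degree=2, c=1.0):
    """K(x, y) = (x . y + c)^d -- the polynomial-like form."""
    return (np.dot(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2) -- the Gaussian RBF form."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Two toy RGB histograms (3 bins each):
h1, h2 = np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])
print(polynomial_kernel(h1, h2), rbf_kernel(h1, h2))
```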
Benchmarking without ground truth
Many evaluation techniques for content-based image retrieval are based on the availability of a ground truth, that is, on a "correct" categorization of images so that, say, if the query image is of category A, only the returned images in category A will be considered as "hits." Based on such a ground truth, standard information retrieval measures such as precision and recall are given and used to evaluate and compare retrieval algorithms. Accordingly, the assemblers of benchmarking databases go to certain lengths to have their images categorized. The assumption of the existence of a ground truth is, in many respects, naive. It is well known that the categorization of the images depends on the a priori (from the point of view of such categorization) subdivision of the semantic field in which the images are placed (a trivial observation: a plant subdivision for a botanist is very different from that for a layperson). Even within a given semantic field, however, categorization by human subjects is subject to uncertainty, and it makes little statistical sense to consider the categorization given by one person as the unassailable ground truth. In this paper I propose two evaluation techniques that apply to the case in which the ground truth is subject to uncertainty. In this case, obviously, measures such as precision and recall will themselves be subject to uncertainty. The paper explores the relation between the uncertainty in the ground truth and that in the most commonly used evaluation measures, so that the measurements done on a given system can preserve statistical significance.
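One way to make the point concrete: if each retrieved image belongs to the query category only with some probability (because annotators disagree), precision itself becomes a random variable. A minimal sketch, assuming independent per-image probabilities (a simplification for this note, not necessarily the paper's model):

```python
import numpy as np

def precision_under_uncertainty(p):
    """p[i]: probability that retrieved image i truly belongs to the
    query category. Precision is the mean of Bernoulli(p[i]) variables,
    so E[P] = mean(p) and Var[P] = sum(p*(1-p)) / n^2 (independence assumed)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    expected = p.mean()
    variance = np.sum(p * (1.0 - p)) / n**2
    return expected, variance

# Ten retrieved images whose category membership annotators mostly,
# but not unanimously, agree on:
print(precision_under_uncertainty([0.9] * 6 + [0.6] * 4))
```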
Medical validation and CBIR of spine x-ray images over the Internet
Sameer Antani, Jing Cheng, Jonathan Long, et al.
As found in the literature, most Internet-based prototype content-based image retrieval (CBIR) systems focus on stock photo collections and do not address the challenges of large specialized image collections or topics such as medical information retrieval by image content. Even fewer have medically validated data with which to evaluate retrieval quality in terms of precision and relevance. To date, our research has reported over 75% relevant spine X-ray image retrieval, tested on 888 validated vertebral shapes from 207 images, using our prototype CBIR system operating within our local network. As a next step, we have designed and developed an Internet-based medical validation tool and a CBIR retrieval tool in MATLAB and Java that can remotely connect to our database. The retrieval tool supports hybrid text and image queries and also provides partial shape annotation for pathology-specific querying. These tools were initially developed for domain experts, such as radiologists and educators, to identify design issues for improved workflow. This article describes the tools and the design considerations in their development.
The integration of cartographic information into a content management system
A corporate information system needs to be as accessible as library content, which implies organizing the content in a logical structure, categorizing it, and using the categories to add metadata to the information. Content Management Systems (CMS) are an emerging kind of software component that manages content, usually making heavy use of web technologies, whose main goals are to allow easy creation, publishing, and retrieval of content to fit business needs. The focus of this paper is to describe how we integrated the "map" metaphor into a CMS. Maps are symbolic information and rely on the use of a graphic sign language. A characteristic feature of maps is that their design has traditionally been constrained by the need to create one model of reality for a variety of purposes. The map's primary role as a communication medium involves the application of processes such as selection, classification, displacement, symbolization, and graphic exaggeration. A model of the infrastructure is presented, and the current prototype of the model is briefly discussed together with the currently deployed environment for the dissemination of cultural heritage information.
Video
Enhanced video display and navigation for networked streaming video and networked video playlists
In this paper we present an automatic enhanced video display and navigation capability for networked streaming video and networked video playlists. Our proposed method uses the Synchronized Multimedia Integration Language (SMIL) as the presentation language and the Real Time Streaming Protocol (RTSP) as the network remote-control protocol to automatically generate an "enhanced video strip" display for easy navigation. We propose and describe two approaches: a smart-client approach and a smart-server approach. We also describe a prototype system implementation of our proposed approach.
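SMIL is declarative, so the generated display can be shown compactly. A minimal sketch that emits a SMIL 2.0 document combining the main RTSP stream with a strip of navigation thumbnails (the layout values and URLs are illustrative, not from the paper):

```python
def smil_video_strip(rtsp_url, thumb_urls):
    """Emit a minimal SMIL 2.0 document: the main RTSP stream plus a
    strip of thumbnail images for navigation. Element names are
    standard SMIL; the regions and timing here are illustrative."""
    thumbs = "\n".join(
        f'      <img src="{u}" region="strip" begin="{i * 5}s" dur="5s"/>'
        for i, u in enumerate(thumb_urls))
    return f"""<smil>
  <head><layout>
    <root-layout width="320" height="300"/>
    <region id="main" top="0" height="240"/>
    <region id="strip" top="240" height="60"/>
  </layout></head>
  <body><par>
    <video src="{rtsp_url}" region="main"/>
{thumbs}
  </par></body>
</smil>"""

print(smil_video_strip("rtsp://server/clip.rm", ["s0.jpg", "s1.jpg"]))
```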
3D display technique for moving pictures from web cameras using screen pixel access
T. Hasegawa, T. Namiki, H. Unno, et al.
This paper presents a technique to display real-time 3-D images captured by web cameras on the stereoscopic display of a personal computer (PC) using screen pixel access. Images captured by two side-by-side web cameras are sent through the Internet to a PC and displayed in two conventional viewers for moving images. These processes are carried out independently for the two cameras. The image data displayed in the viewers reside in the video memory of the PC. Our method uses this video-memory data: the two web-camera images are read from the video memory, composed into a 3-D image, and written back to the video memory. The 3-D image can be seen if the PC in use has a 3-D display. We developed an experimental system to evaluate the feasibility of this technique. The web cameras captured images of up to 640 × 480 pixels, compressed them with motion JPEG, and sent them over a LAN. Using this experimental system over a broadband network such as ADSL, we confirmed that the 3-D image had almost the same quality as a conventional TV image.
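The composition step can be illustrated concretely. A minimal sketch, assuming a row-interleaved stereo layout, which many PC stereoscopic displays accept (the paper does not state its exact composition format):

```python
import numpy as np

def line_interleave(left, right):
    """Compose left/right web-camera frames into one frame whose even
    rows come from the left image and odd rows from the right."""
    assert left.shape == right.shape
    out = left.copy()
    out[1::2] = right[1::2]
    return out

left = np.zeros((480, 640, 3), dtype=np.uint8)
right = np.full((480, 640, 3), 255, dtype=np.uint8)
print(line_interleave(left, right)[:2, 0, 0])  # [0 255]
```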
Vector Displays
Dynamic conversion between XML-based languages for vector graphics
Angelo Di Iorio, Fabio Vitali, Gianluca Zonta
Vector graphics is increasingly gaining importance within the World Wide Web community, because it allows users to create images that are easily manageable, modifiable, and understandable. Two formats play a leading role among the languages for vector graphics: SVG and VML. Achieving complete interoperability between these two languages means providing users with full support for vector images across implementations, operating systems, and media. In this paper we describe VectorConverter, a tool that allows easy, automatic, and reasonably good conversion between two vector graphics formats, SVG and VML, and one raster format, GIF. This tool makes good translations between languages with very different functionalities and expressivity by applying translation rules, approximation, and heuristics. A high-level discussion of implementation details, open issues, and future developments of VectorConverter is provided as well.
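A taste of the translation-rule idea, using a tiny element and attribute correspondence table constructed for illustration (both vocabularies are real, but this is not VectorConverter's actual rule set):

```python
# Element- and attribute-level correspondences between SVG and VML.
SVG_TO_VML_ELEMENTS = {
    "rect":     "v:rect",
    "ellipse":  "v:oval",
    "line":     "v:line",
    "polyline": "v:polyline",
}
SVG_TO_VML_ATTRS = {
    "fill":         "fillcolor",
    "stroke":       "strokecolor",
    "stroke-width": "strokeweight",
}

def convert_element(tag, attrs):
    """Translate one SVG element name and its attributes to VML."""
    vml_tag = SVG_TO_VML_ELEMENTS[tag]
    vml_attrs = {SVG_TO_VML_ATTRS.get(k, k): v for k, v in attrs.items()}
    return vml_tag, vml_attrs

print(convert_element("rect", {"fill": "#ff0000", "stroke-width": "2"}))
```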
Bezier curves approximation of triangularized surfaces using SVG
G. Messina, E. Ingrà, S. Battiato, et al.
This paper presents a technique to convert surfaces obtained through a Data Dependent Triangulation into Bézier curves using the Scalable Vector Graphics (SVG) file format. The method starts from a Data Dependent Triangulation and traces a map of the boundaries present in the triangulation using the characteristics of the triangles; the estimated barycenters are then connected, and a final conversion of the resulting polylines into curves is performed. After the curves have been estimated and closed, the final representation is obtained by sorting the surfaces in decreasing order. The proposed techniques have been compared with other raster-to-vector conversions in terms of perceptual quality.
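The polyline-to-curve step can be illustrated with a standard construction. A minimal sketch that converts a polyline to smooth cubic Bézier segments by treating its points as a Catmull-Rom spline (a common conversion, not necessarily the paper's own):

```python
def catmull_rom_to_bezier(points):
    """Convert a polyline to smooth cubic Bezier segments by treating
    the points as a Catmull-Rom spline. Returns (p0, c1, c2, p1)
    control-point tuples ready for an SVG 'C' path command."""
    pts = [points[0]] + list(points) + [points[-1]]  # clamp the ends
    segments = []
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        c1 = (p1[0] + (p2[0] - p0[0]) / 6.0, p1[1] + (p2[1] - p0[1]) / 6.0)
        c2 = (p2[0] - (p3[0] - p1[0]) / 6.0, p2[1] - (p3[1] - p1[1]) / 6.0)
        segments.append((p1, c1, c2, p2))
    return segments

for seg in catmull_rom_to_bezier([(0, 0), (10, 5), (20, 0)]):
    print(seg)
```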
Poster Session
Subjective trajectory characterization: acquisition, matching, and retrieval
We describe a system that automatically tracks moving objects in a scene and subjectively characterizes the object trajectories for storage and retrieval. A multi-target color-histogram particle filter combined with best-hypothesis data association is the foundation of our trajectory acquisition algorithm. To improve computational performance, we use quasi-Monte-Carlo methods to reduce the number of particles required by each filter. The tracking system operates in real-time to produce a stream of XML documents that contain the object trajectories. To characterize trajectories subjectively, we form a set of shape templates that describes basic maneuvers (e.g., gentle turn right, hard turn left, straight line). Procrustes shape analysis provides a scale- and rotation-invariant mechanism to identify occurrences of these maneuvers within a trajectory. To add spatial information to our trajectory representation, we partition the two-dimensional space under surveillance into a set of mutually exclusive regions. A temporal sequence of region-to-region transitions gives a spatial representation of the trajectory. The shape and position descriptions combine to form a compact, high-level representation of a trajectory. We provide similarity measures for the shape, position, and combined shape and position representations.
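Procrustes shape analysis is standard, so the matching step can be sketched directly. A minimal scale- and rotation-invariant shape distance between a trajectory segment and a maneuver template, assuming both are resampled to the same number of 2-D points:

```python
import numpy as np

def procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two (n, 2) point sets after
    removing translation, scale, and rotation (standard formulation)."""
    a = shape_a[:, 0] + 1j * shape_a[:, 1]
    b = shape_b[:, 0] + 1j * shape_b[:, 1]
    a = (a - a.mean()) / np.linalg.norm(a - a.mean())
    b = (b - b.mean()) / np.linalg.norm(b - b.mean())
    # Optimal rotation/scale alignment reduces to the complex inner product.
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(a, b)) ** 2))

line = np.column_stack([np.linspace(0, 1, 20), np.zeros(20)])
turn = np.column_stack([np.cos(np.linspace(0, np.pi / 2, 20)),
                        np.sin(np.linspace(0, np.pi / 2, 20))])
print(procrustes_distance(line, line.copy()), procrustes_distance(line, turn))
```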
Archiving of meaningful scenes for personal TV terminals
Sung Ho Jin, Jun Ho Cho, Yong Man Ro, et al.
In this paper, we propose an archiving method for broadcast content on TV terminals, including set-top boxes (STB) and personal video recorders (PVR). Our goal is to effectively cluster and retrieve semantic video scenes, obtained by real-time filtering of broadcast content, for re-use or transmission. For TV terminals, we generate new video archiving formats which combine broadcast media resources with the related metadata and auxiliary media data. In addition, we implement an archiving system to decode and retrieve the media resources and the metadata within the format. The experiment shows that the proposed format makes it possible to retrieve or browse media data and metadata on the TV terminal effectively, and can maintain compatibility with portable devices.
AVIR: a spoken document retrieval system in e-learning environment
Isabella Gagliardi, Marco Padula, Patrizia Pagliarulo, et al.
In this paper we present AVIR (Audio & Video Information Retrieval), a project of CNR-ITC (Italian National Research Council) to develop tools to support an information system for distance e-learning. AVIR has been designed to store, index, and classify audio and video lessons to make them available to students and other interested users. The core of AVIR is an SDR (Spoken Document Retrieval) system which automatically transcribes the spoken documents into text and indexes them through appropriately created dictionaries. During online use, users can formulate queries to search documents by date, professor, or lesson title, or by selecting one or more specific words. The results are presented to the users: in the case of video lessons, a preview of the first frames is shown. Moreover, lesson slides and associated papers can be retrieved.
Internet-based remote counseling to support stress management: preventing interruptions to regular exercise in elderly people
Sayuri Hashimoto, Tsunestugu Munakata, Nobuyuki Hashimoto, et al.
Our research showed that a high degree of life-stress has a negative mental-health effect that may interrupt regular exercise. We used an Internet-based, remotely conducted, face-to-face preventive counseling program using video monitors to reduce the sources of life-stress that interrupt regular exercise, and evaluated the preventive effects of the program in elderly people. NTSC video signals were converted to the IP protocol and facial images were transmitted to a PC display using the exclusive optical network lines of JGN2. Participants were 22 elderly people in Hokkaido, Japan, who regularly played table tennis. A survey was conducted before the intervention in August 2003. IT remote counseling was conducted on two occasions, for one hour each. A post-intervention survey was conducted in February 2004 and a follow-up survey was conducted in March 2005. Network quality was satisfactory, with little data loss and high display quality. Results indicated that after the intervention self-esteem increased significantly, trait anxiety decreased significantly, cognition of emotional support by people other than family members tended to increase, and sources of stress tended to decrease. Follow-up results indicated that cognition of emotional support by family increased significantly and interpersonal dependency decreased significantly compared to before the intervention. These results suggest that face-to-face IT remote counseling using video monitors is useful for keeping elderly people from feeling anxious and for giving them the confidence to continue exercising regularly. Moreover, it has a stress management effect.
Vertex and face permutation order compression for efficient animation support
Eun-Young Chang, Daiyong Kim, Byeongwook Min, et al.
In MPEG-4, 3-D mesh coding (3DMC) achieves a 40:1 to 50:1 compression ratio on 3-D meshes (in VRML IndexedFaceSet representation) without noticeable visual degradation. This substantial gain does not come for free: 3DMC changes the vertex and face permutation order of the original 3-D mesh model. This change of permutation order may cause serious problems for animation, editing operations, and special effects, where the original permutation order is critical not only to the mesh representation but also to the related tools. To fix this problem, the vertex and face permutation order must be transmitted additionally, which increases the bitstream size. In this paper, we propose a novel vertex and face permutation order compression algorithm that addresses the permutation order change introduced by 3DMC encoding with minimal side information. Our coding method is based on an adaptive probability model that allocates progressively shorter codewords to the vertex and face permutation indices as encoding proceeds, since the set of remaining indices in each distinguishable unit shrinks. In addition to the adaptive probability model, we further increase coding efficiency by representing and encoding the vertex and face permutation order per connected component (CC). Simulation results demonstrate that the proposed algorithm encodes the vertex and face permutation order losslessly while saving up to 12% in bits compared with the logarithmic representation based on a fixed probability model.
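The contrast drawn above can be made concrete: a fixed-probability (logarithmic) representation spends ceil(log2 n) bits on every index, while an adaptive model can shrink the alphabet by one after each symbol, since a transmitted position cannot recur. A minimal sketch of the raw bit counts (the paper's entropy coder is more refined than this):

```python
from math import ceil, log2

def fixed_bits(n):
    """Logarithmic baseline: ceil(log2 n) bits for each of n indices."""
    return n * ceil(log2(n))

def adaptive_bits(n):
    """Shrinking alphabet: the k-th index needs only ceil(log2(n - k))
    bits, because k positions have already been transmitted."""
    return sum(ceil(log2(n - k)) for k in range(n) if n - k > 1)

n = 1000  # e.g. vertices in one connected component
print(fixed_bits(n), adaptive_bits(n))  # roughly a 10% saving here
```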
FaceLab: a tool for performance evaluation of face recognition strategies
Luca Caflisch, Alessandro Colombo, Claudio Cusano, et al.
This paper presents FaceLab, an innovative, open environment created to evaluate the performance of face recognition strategies. It simplifies, through an easy-to-use graphical interface, the basic steps involved in testing procedures such as data organization and preprocessing, definition and management of training and test sets, definition and execution of recognition strategies and automatic computation of performance measures. The user can extend the environment to include new algorithms, allowing the definition of innovative recognition strategies. The performance of these strategies can be automatically evaluated and compared by the tool, which computes several performance measures for both identity verification and identification scenarios.