Proceedings Volume 3656

Storage and Retrieval for Image and Video Databases VII

Minerva M. Yeung, Boon-Lock Yeo, Charles A. Bouman
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 17 December 1998
Contents: 13 Sessions, 69 Papers, 0 Presentations
Conference: Electronic Imaging '99 (1999)
Volume Number: 3656

Table of Contents

  • Image Retrieval Applications
  • Multimedia Management and Retrieval Systems
  • Video Retrieval
  • Image Browsing
  • Video Analysis and Classification
  • Emerging Platforms and Applications
  • Video Shot Detection and Comparative Study
  • Audio Analysis and Audiovisual Summary for Retrieval
  • Image Indexing and Representation
  • Feature Classification
  • Image Databases
  • Query Tools
  • Poster Session
Image Retrieval Applications
Comparing texture feature sets for retrieving core images in petroleum applications
Chung-Sheng Li, John R. Smith, Vittorio Castelli, et al.
In this paper, the performance of similarity retrieval from a database of earth core images using different sets of spatial- and transform-based texture features is evaluated and compared. A benchmark consisting of 69 core images from rock samples is devised for the experiments. We show that the Gabor feature set is far superior to the other feature sets in terms of precision-recall for the benchmark images. This is in contrast to an earlier report by the authors, in which we observed that the spatial-based feature set outperforms the other feature sets by a wide margin for a benchmark set of satellite images when the evaluation window has to be small (32 x 32) in order to extract homogeneous regions. Consequently, we conclude that the optimal texture feature set for texture-based similarity retrieval is highly application dependent, and has to be carefully evaluated for each individual application scenario.
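The precision-recall comparison this abstract relies on is a standard retrieval metric. A minimal sketch follows; the ranking and relevance sets are invented for illustration, not taken from the paper's benchmark:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k retrieved images."""
    retrieved = set(ranked_ids[:k])
    hits = len(retrieved & set(relevant_ids))
    return hits / k, hits / len(relevant_ids)

# Hypothetical ranking for one query: 3 of the top 5 retrieved
# images are among the 4 relevant ones.
p, r = precision_recall_at_k([4, 9, 1, 7, 2, 8], [4, 1, 2, 5], k=5)
# p = 0.6, r = 0.75
```

Sweeping k over the result list and plotting the resulting (recall, precision) pairs yields the precision-recall curves on which feature sets such as Gabor are compared.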
Image query and indexing for digital x rays
The web-based medical information retrieval system (WebMIRS) allows Internet access to databases containing 17,000 digitized x-ray spine images and associated text data from the National Health and Nutrition Examination Surveys (NHANES). WebMIRS allows SQL query of the text, and viewing of the returned text records and images using a standard browser. We are now working (1) to determine the utility of data directly derived from the images in our databases, and (2) to investigate the feasibility of computer-assisted or automated indexing of the images to support retrieval of images of interest to biomedical researchers in the field of osteoarthritis. To build an initial database based on image data, we are manually segmenting a subset of the vertebrae, using techniques from vertebral morphometry. From this, we will derive vertebral features and add them to the database. This image-derived data will enhance the user's data access capability by enabling the creation of combined SQL/image-content queries.
Customized-queries approach to CBIR
Jennifer G. Dy, Carla E. Brodley, Avinash C. Kak, et al.
This paper introduces a new approach called the 'customized-queries' approach to content-based image retrieval (CBIR). The customized-queries approach first classifies a query using the features that best differentiate the major classes and then customizes the query to that class by using the features that best distinguish the subclasses within the chosen major class. This research is motivated by the observation that the features which are most effective in discriminating among images from different classes may not be the most effective for retrieval of visually similar images within a class. This occurs for domains in which not all pairs of images within one class have equivalent visual similarity. We apply this approach to content-based retrieval of high-resolution tomographic images of patients with lung disease and show that this approach yields 82.8 percent retrieval precision. The traditional approach that performs retrieval using a single feature vector yields only 37.9 percent retrieval precision.
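The two-stage scheme the abstract describes can be sketched in a few lines. Everything below is a hypothetical toy (the classifier, feature subsets, and database records are invented), not the paper's actual features or classifier:

```python
def customized_query(query, major_classifier, per_class_features, db):
    """Two-stage retrieval: first pick the major class using the
    class-discriminating features, then rank within that class using
    the feature subset that best separates its subclasses."""
    cls = major_classifier(query)
    feats = per_class_features[cls]
    candidates = [item for item in db if item["class"] == cls]
    # Rank candidates by squared distance over the class-specific features.
    return sorted(
        candidates,
        key=lambda item: sum((query[f] - item[f]) ** 2 for f in feats),
    )

# Toy database with two major classes and hand-picked feature subsets.
db = [
    {"class": "A", "x": 1.0, "y": 9.0},
    {"class": "A", "x": 2.0, "y": 0.0},
    {"class": "B", "x": 1.0, "y": 1.0},
]
per_class_features = {"A": ["y"], "B": ["x", "y"]}
major_classifier = lambda q: "A" if q["x"] < 5 else "B"

results = customized_query({"x": 1.5, "y": 0.5},
                           major_classifier, per_class_features, db)
```

The point of the design is visible even in the toy: ranking within class "A" uses only the feature `y`, which may order the candidates differently than a single global feature vector would.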
Archiving and retrieval of sequential images from tomographic databases in PACS
Chi-Ren Shyu, T. Tony Cai, Lynn S. Broderick
In the picture archiving and communication systems (PACS) used in modern hospitals, the current practice is to retrieve images based on keyword search, which returns a complete set of images from the same scan. Both diagnostically useful and negligible images in the image databases are retrieved and browsed by the physicians. In addition to the text-based search query method, queries based on image contents and image examples have been developed and integrated into existing PACS systems. Most of the content-based image retrieval (CBIR) systems for medical image databases are designed to retrieve images individually. However, in a database of tomographic images, it is often diagnostically more useful to simultaneously retrieve multiple images that are closely related for various reasons, such as physiological contiguousness. For example, high resolution computed tomography (HRCT) images are taken in a series of cross-sectional slices of the human body. Typically, several slices are relevant for making a diagnosis, requiring a PACS system that can retrieve a contiguous sequence of slices. In this paper, we present an extension to our physician-in-the-loop CBIR system, which allows our algorithms to automatically determine the number of adjoining images to retain after certain key images are identified by the physician. Only the key images, so identified by the physician, and the other adjoining images that cohere with the key images are kept on-line for fast retrieval; the rest of the images can be discarded if so desired. This results in a large reduction in the amount of storage needed for fast retrieval.
Query by sketch in DARWIN: digital analysis to recognize whale images on a network
Daniel J. Wilkin, Kelly R. Debure, Zach W. Roberts
DARWIN is a computer vision system that helps researchers identify individual bottlenose dolphins, Tursiops truncatus, by comparing digital images of the dorsal fins of newly photographed dolphins with a database of previously identified dolphin fins. In addition to dorsal fin images, textual information containing sighting data is stored for each of the previously identified dolphins. The software uses a semiautomated process to create an approximation of the fin outline. The outline is used to formulate a sketch-based query of the dolphin database. The system utilizes a variety of image processing and computer vision algorithms to perform the matching process, which is necessary to identify those previously identified fins that most closely resemble the unknown fin. The program presents the database fin images to the researcher in rank order for comparison with the new fin image.
Multimedia Management and Retrieval Systems
Automated semantic structure reconstruction and representation generation for broadcast news
Qian Huang, Zhu Liu, Aaron Rosenberg
This paper addresses the problem of recovering the semantic structure of broadcast news. A hierarchy of retrievable units is automatically constructed by integrating information from different media. The hierarchy provides a compact, yet meaningful, abstraction of the broadcast news data, similar to a conventional table of contents, that can serve as an effective index table, facilitating browsing through large amounts of data in a nonlinear fashion. The recovery of the semantic structure of the data further enables automated solutions for constructing visual representations that are relevant to the semantics, as well as for establishing useful relationships among data units, such as topic categorization and content-based multimedia hyperlinking. Preliminary experiments on integrating different media for hierarchical segmentation of semantics have yielded encouraging results. Some of the results are presented and discussed in this paper.
Multimedia information retrieval by analyzing content and learning from examples
S. Kicha Ganapathy, Zhibin Lei, Robert J. Safranek
Multimedia information systems are experiencing tremendous growth as a direct consequence of the popularity and pervasive use of the world wide web. As a result, it is becoming increasingly important to provide efficient and flexible solutions for accessing and retrieving multimedia data. Images and video are emerging as significant data types in multimedia systems. And yet, most commercial systems are still text- and keyword-based and do not fully exploit the image content of these systems. We believe that there is an opportunity to build a novel interactive multimedia system for some specific applications in electronic commerce. In this paper, we present an overview of our approach, the rationale behind it, and the problems that are inherent in building such a system. We address some of the technical issues in representing and analyzing primitive image features. These are the building blocks of any such system, and they can be generalized to a much broader range of applications as well.
Texture content-based retrieval using text descriptions
Joseph K. P. Kuan, Dan W. Joyce, Paul H. Lewis
We have developed a content-based retrieval scheme for texture using text-based descriptions. The texture technique is based on our previous work, which uses very simple texture primitives, such as edges and plain regions, to generate features. Other methods that apply complicated statistics can be difficult to transcribe into forms understandable to ordinary users. With the simplicity of our features, by contrast, we can express them in terms of simple language, and hence bridge the gap between semantics and computed features. A number of benefits can be achieved, which open a new horizon for content-based retrieval with texture. For example, the user can request a texture image without necessarily knowing what types of textures are stored.
Semiautomatic news analysis, indexing, and classification system based on topic preselection
In this paper, we present the concept of an efficient semiautomatic system for the analysis, classification, and indexing of TV news program material, and show the feasibility of its practical realization. The only input into the system, other than the news program itself, is the spoken words serving as keys for topic prespecification. The chosen topics express the user's current professional or private interests and are used for filtering the news material correspondingly. After the basic analysis steps on a news program stream, including the processes of shot change detection and key frame extraction, the system automatically represents the news program as a series of longer higher-level segments. Each of them contains one or more video shots and belongs to one of the coarse categories, such as anchorperson (news reader) shots, news shot series, and the starting and ending program sequences. The segmentation procedure is performed on the video component of the news program stream and the results are used to define the corresponding segments in the news audio stream. In the next step, the system uses the prespecified audio keys to index the segments and group them into reports, which are the actual retrieval units. This step is performed on the segmented news audio stream by applying the wordspotting procedure to each segment. As a result, all the reports on prespecified topics are easily reachable for efficient retrieval.
MUVIS: a system for content-based indexing and retrieval in large image databases
Faouzi Alaya Cheikh, Bogdan Cramariuc, Carole Reynaud, et al.
Until recently, collections of digital images were stored in classical databases and indexed by keywords entered by a human operator. This is no longer practical, due to the growing size of these collections. Moreover, the keywords associated with an image are either selected from a fixed set of words, and thus cannot cover the content of all images, or they are the operator's personal description of each image and are therefore subjective. That is why systems for indexing images based on their content are needed. In this context, we propose in this paper a new system, MUVIS*, for content-based indexing and retrieval in image database management systems. MUVIS* indexes by keywords, and also allows indexing of objects and images based on color, texture, shape, and the objects' layout inside them. Due to the use of large feature vectors, we adopted pyramid trees for creating the index structure. The block diagram of the system is presented and the functionality of each block is explained. The features used are presented as well.
Video Retrieval
Efficient video sequence retrieval in large repositories
This paper presents algorithms to deal with problems associated with indexing the high-dimensional feature vectors that characterize video data. Indexing high-dimensional vectors is well known to be computationally expensive. Our solution is to optimally split the high-dimensional vector into a few low-dimensional feature vectors and to query the system for each feature vector. This involves solving an important subproblem: developing a model of retrieval that enables us to query the system efficiently. Once we formulate the retrieval problem in terms of a retrieval model, we present an optimality criterion to maximize the number of results using this model. The criterion is based on a novel idea of using the underlying probability distribution of the feature vectors. A branch-and-prune strategy, optimized for each query, is developed, using the set of features derived from the optimality criterion. Our results show that the algorithm performs well, giving a speedup of a factor of 25 with respect to a linear search, while retaining the same level of recall.
Content-based live video retrieval by telop character recognition: TV on demand
Minoru Takahata, Hidetaka Kuwano, Shoji Kurakake, et al.
We have developed a TV-on-demand system, which provides playback of a television program after a period ranging from a few seconds to one week after broadcast, and have conducted usage trials in cooperation with a television station in Nagano Prefecture, Japan. This system has been achieved through the development of various technologies, such as automatic updating of stored television programs and contents retrieval by telop characters. Users in the trials can begin playback of a television program immediately after its broadcast has begun. The purpose of the trials was to evaluate the system's usability in applications such as contents retrieval, selective viewing of commercials, and customer service at the television station. This paper presents the applied technologies and some experimental results, and also addresses a new direction for information retrieval systems based on the evaluation of the usage trials.
VORTEX: video retrieval and tracking from compressed multimedia databases: affine transformation and occlusion invariant tracking from MPEG-2 video
In this paper, we present topics related to the tracking of video objects in compressed video databases, within the context of video retrieval applications. We have developed a video retrieval and tracking system (VORTEX) that operates directly on compressed video data. The structure of the video compression standards is exploited in order to avoid the costly decompression operation. This is achieved by utilizing motion compensation - a critical prediction filter embedded in video compression standards - to drive the template-matching process. Occlusion analysis, filtering, and motion analysis are used to implement fast tracking of objects of interest on the compressed video data. Presented with a query in the form of template images of objects, the system operates on the compressed video to find the images or video sequences where those objects are present, along with their positions in the image. This enables the retrieval and display of the query-relevant sequences.
Image Browsing
Active browsing using similarity pyramids
In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.
Multilinearization data structure for image browsing
Scott A. Craver, Boon-Lock Yeo, Minerva M. Yeung
Image search has been actively studied in recent years. On the other hand, image browsing has received limited attention. Image browsing refers to the process of presenting forms of overview or summary of the image relationships, thus facilitating a user to navigate across the data set and find images of interest. In this paper, we present a new data structure, built on the multi-linearization of image attributes, for efficient organization of the data set and fast visual browsing of the images. We describe new techniques for multi-linearization based on multiple space-filling curves and hierarchical clustering techniques. In addition to providing fast navigation, our proposed data structure allows computationally efficient insertion and deletion of images from the data set. We then present a novel image navigator and browser, built on the multi-linearization data structure and an intuitive presentation of image relevance and relationships. We demonstrate the image navigation process, and report results on databases of 1,000 and 22,000 images. We also discuss how our data structure can be extended to support fast image search.
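The core idea of linearization, mapping multi-dimensional image attributes onto a one-dimensional order so that nearby images stay nearby, can be sketched with a Z-order (Morton) curve. This is only an illustrative stand-in: the paper's actual space-filling curves, attributes, and clustering are not specified here, and the two-dimensional integer features below are invented:

```python
def morton_key(x, y, bits=8):
    """Interleave the bits of two integer feature coordinates,
    giving the image's position along a Z-order space-filling curve."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Sort images by their position on the curve: neighbors on the
# curve tend to be neighbors in feature space, which is what makes
# linear browsing of the collection meaningful.
images = {"a": (3, 5), "b": (3, 4), "c": (200, 7)}
order = sorted(images, key=lambda name: morton_key(*images[name]))
# → ["b", "a", "c"]: the two close points end up adjacent.
```

Insertion and deletion then reduce to ordered-list operations on the keys, which is one way to read the abstract's claim of efficient updates.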
Interfaces for emergent semantics in multimedia databases
In this paper, we introduce our approach to multimedia database interfaces. Although we deal mainly with image databases, most of the ideas we present can be generalized to other types of data. We argue that, when dealing with complex data, such as images, the problem of access must be redefined along different lines than text databases. In multimedia databases, the semantics of the data is imprecise, and depends in part on the user's interpretation. This observation made us consider the development of interfaces in which the user explores the database rather than querying it. In this paper, we give a brief justification of our position and present the exploratory interface, which we have developed for our image database El Nino.
Video Analysis and Classification
Semiautomatic dynamic video object marker creation
Candemir Toklu, Shih-Ping Liou
In this paper, we propose a method for tracking a video object in an ordered sequence of two-dimensional images, where the outcome is the trajectory of the video object throughout the time sequence of images. This method is designed to run in real-time in a synchronous video collaboration environment, and is used for producing dynamic object annotations for enhanced video content understanding. A dynamic object is one whose location or size in the video frame constantly changes, due to the camera motion, its own motion, or both. We suggest a novel method for finding the trajectory of the object in the intermediate frames given the locations and shapes of the object in two end frames. In addition to the shape and location information of the object, its texture information in the end frames is used to predict the location and its search space in the intermediate frames.
Eigen-decomposition-based analysis of video images
Chu-Yin Chang, Anthony A. Maciejewski, Venkataramanan S. Balakrishnan
We present a fast algorithm for computing the singular value decomposition (SVD) of a matrix, consisting of the frames from a video sequence. The computational efficiency of this algorithm derives from the observation that portions of a video sequence will consist of sets of correlated frames. We then show that the information obtained from the SVD can be used to analyze video sequences to obtain information such as scene breaks, scene query, reduced-order shot representation and key frame determination. We illustrate this approach on several video sequences.
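The key observation, that correlated frames make the frame matrix effectively low-rank, can be checked with a small NumPy experiment. The "video" below is synthetic (two shots of near-identical random frames), so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": two shots, each five nearly identical 64-pixel frames.
shot1 = rng.random(64)          # base frame of shot 1, flattened
shot2 = rng.random(64)          # base frame of shot 2
frames = np.column_stack(
    [shot1 + 0.01 * rng.random(64) for _ in range(5)]
    + [shot2 + 0.01 * rng.random(64) for _ in range(5)]
)

# Each column is one frame; correlated frames make the matrix
# effectively low-rank, which the singular values reveal.
U, s, Vt = np.linalg.svd(frames, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
rank = int(np.searchsorted(energy, 0.99)) + 1   # effective rank
```

With two distinct shots the effective rank comes out as 2: two singular vectors suffice as a reduced-order shot representation, and a jump in the required rank as frames are appended is one signal of a scene break.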
Choosing efficient feature sets for video classification
Stephan Fischer, Ralf Steinmetz
In this paper, we address the problem of choosing appropriate features to describe the content of still pictures or video sequences, including audio. As the computational analysis of these features is often time-consuming, it is useful to identify a minimal set that allows for the automatic classification of some class or genre. Further, it can be shown that disregarding the coherence of the features characterizing a class does not guarantee an optimal classification result. The central questions of the paper are thus which features should be selected, and how they should be weighted, to optimize a classification problem.
Similarity sequence and its application in shot organization
Xuesheng Bai, Guang-you Xu, Yuanchun Shi
Organizing video shots into hierarchical structures is very important for efficient browsing and retrieval in large video databases, and many shot organization methods have been proposed. Most algorithms are based on automatic clustering schemes, which usually fail to give satisfactory results in real applications. In this paper, we propose a preprocessing technique for interactive shot organization: the similarity sequence. It differs from traditional shot organization methods in that it does not classify shots; rather, it reorders the shot sequence so that similar shots appear near each other, thus providing an effective interactive shot organization interface and leaving the classification to the user. A measure called similarity length is introduced to evaluate the similarity between adjacent shots in a shot sequence, and an improved genetic algorithm is developed to compute the similarity sequence. The basic ideas and implementation details are provided, along with experimental results and analysis on real videos.
Emerging Platforms and Applications
Digital television: a new way to deliver information
Samson Huang
Digital television (DTV) is a new way to deliver video, audio, and other data. Why should TV be converted to digital? How does DTV work? What can we do with it? This paper provides an introduction to DTV, its history, and its roll-out plan. It then compares DTV with analog TV and describes how DTV works. It also describes why the computer industry, as well as the consumer electronics industry, is very interested in the DTV market. Next, it describes what Intel has done on DTV, including how we built a PC-based DTV, its test and evaluation results, its new applications, and Intel's DTV station DMRL. This paper also describes remaining issues, our roadmap, vision, and future directions.
Gesture for video content navigation
Gary Bradski, Boon-Lock Yeo, Minerva M. Yeung
This article describes the use of gesture recognition techniques in computer vision as a natural interface for video content navigation, and the design of a navigation and browsing system that caters to these natural means of computer-human interaction. For consumer applications, video content navigation presents two challenges: (1) how to parse and summarize multiple video streams in an intuitive and efficient manner, and (2) what type of interface will enhance the ease of use for video browsing and navigation in a living-room setting or an interactive environment. In this paper, we address these issues and propose techniques that combine video content navigation with gestures, seamlessly and intuitively, into an integrated system. The current framework can also incorporate speech recognition technology. We present a new type of browser for navigating and browsing video content, as well as a gesture-recognition interface for this browser.
Parsing TV programs for identification and removal of nonstory segments
Thomas McGee, Nevenka Dimitrova
Abstracting video information automatically from TV broadcasts requires reliable methods for isolating program and commercial segments from the full broadcast material. In this paper, we present results from cut, static sequence, black frame, and text detection for the purpose of isolating non-program segments. These results are evaluated by comparison to human visual inspection using more than 13 hours of varied program content. Using cut rate detection alone produced high recall with medium precision. Text detection was performed on the commercials and the false positive segments. Adding text detection slightly lowers the recall; however, much higher precision is achieved. A new fast black frame detection algorithm is presented. Black frame detection is important for identifying commercial boundaries. The results indicate that adding text detection to cut rate detection, to reduce the number of false positives, is a promising method. Furthermore, adding information about the position and size of the text, and tracking it through an area, should further increase reliability.
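The black frame detector at the heart of commercial-boundary detection can be sketched as a simple luminance threshold test. The thresholds and toy frames below are invented for illustration; the paper's fast algorithm is not specified in this abstract:

```python
def is_black_frame(luma, black_max=32, fraction=0.98):
    """A frame counts as 'black' if nearly all of its pixels fall
    below a luminance threshold.  `luma` is a flat sequence of 8-bit
    luminance values; both thresholds are illustrative choices."""
    dark = sum(1 for v in luma if v <= black_max)
    return dark >= fraction * len(luma)

def commercial_boundaries(frames):
    """Indices of black frames: candidate commercial-break boundaries."""
    return [i for i, f in enumerate(frames) if is_black_frame(f)]

# Three toy 100-pixel frames: dark, bright, dark-with-one-hot-pixel.
frames = [[10] * 100, [120] * 100, [5] * 99 + [200]]
# commercial_boundaries(frames) → [0, 2]
```

Allowing a small fraction of bright pixels (here 2 percent) keeps the detector robust to noise and station logos, which is why a pure all-pixels test is rarely used.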
Media content management on the DTV platform
Boon-Lock Yeo, Minerva M. Yeung
Digital TV offers many advantages over analog TV. The obvious advantages are higher picture resolution and superior quality, more programs in the same channel bandwidth, and the potential for mixed video and data broadcasting. In addition, the reception of video, audio, and data in digital form offers new opportunities for better filtering, management, and organization of the content, during and after the broadcast. The platform on which these media content management operations are performed is similar to the computing platform with which we are familiar; hence, the goals are to process incoming bits of data and to generate output for users' reception and/or interactivity. In this paper, we present our vision and summarize our research activities in media content management on the DTV platform. In particular, we focus on three key areas, namely, channel surfing, digital video recording, and content filtering. We also demonstrate how media content management can offer new and better viewing experiences in the era of DTV.
Querying multiple-perspective video
Simone Santini, Amarnath Gupta, Ramesh C. Jain
This paper introduces a model of a spatio-temporal database that we are developing to query interesting events in video sequences. The database we are designing pushes the state of the art in a number of fields, and there are many issues still awaiting a satisfactory solution. In this paper, we present our (albeit still partial) answers to some of these problems, and the future direction of our work. Our design is divided into two layers: a logbook, which operates as a short-term repository of unsummarized and unprocessed data, and a long-term spatio-temporal database, which stores and queries summarized data.
Video Shot Detection and Comparative Study
Processing of partial video data for detection of wipes
Hyeokman Kim, Sung-Joon Park, Jinho Lee, et al.
With the currently existing shot change detection algorithms, abrupt changes are detected fairly well. It is thus more challenging to detect gradual changes, including fades, dissolves, and wipes, as these are often missed or falsely detected. In this paper, we focus on the detection of wipes. The proposed algorithm begins by processing the visual rhythm, a portion of the DC image sequence. It is a single image, a sub-sampled version of a full video, in which the sampling is performed in a predetermined and systematic fashion. The visual rhythm contains distinctive patterns or visual features for many different types of video effects. The different video effects manifest themselves differently on the visual rhythm. In particular, wipes appear as curves, which run from the top to the bottom of the visual rhythm. Thus, using the visual rhythm, it becomes possible to automatically detect wipes, simply by determining various lines and curves on the visual rhythm.
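The visual rhythm construction, one sampled line per frame stacked side by side, is simple enough to sketch directly. The diagonal sampling and the toy wipe below are illustrative choices; the paper's exact sampling pattern and frame sizes are not given in this abstract:

```python
def visual_rhythm(frames):
    """Stack one diagonal sample line per frame into a 2-D array.

    `frames` is a list of square 2-D luminance arrays (lists of rows).
    Each frame contributes one column; a wipe then shows up as a curve
    sweeping across consecutive columns of the resulting image."""
    n = len(frames[0])
    return [[frame[i][i] for frame in frames] for i in range(n)]

# Toy horizontal wipe from scene A (luma 0) to scene B (luma 255):
def wipe_frame(t, n=4):
    return [[255 if col < t else 0 for col in range(n)] for _ in range(n)]

frames = [wipe_frame(t) for t in range(5)]
vr = visual_rhythm(frames)
```

In `vr`, the boundary between the 0 and 255 regions moves down by one row per column, i.e. the wipe appears as a slanted line running from the top to the bottom of the visual rhythm, which is exactly the pattern the detector looks for.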
Comparison of automatic shot boundary detection algorithms
Various methods of automatic shot boundary detection have been proposed and claimed to perform reliably. Detection of edits is fundamental to any kind of video analysis, since it segments a video into its basic components, the shots. However, only a few comparative investigations of early shot boundary detection algorithms have been published. These investigations mainly concentrate on measuring edit detection performance; they do not consider the algorithms' ability to classify the types and to locate the boundaries of the edits correctly. This paper extends these comparative investigations. More recent algorithms, designed explicitly to detect specific complex editing operations such as fades and dissolves, are taken into account, and their ability to classify the types and locate the boundaries of such edits is examined. The algorithms' performance is measured in terms of hit rate, number of false hits, and miss rate for hard cuts, fades, and dissolves, over a large and diverse set of video sequences. The experiments show that while hard cuts and fades can be detected reliably, dissolves are still an open research issue. The false hit rate for dissolves is usually unacceptably high, ranging from 50 percent up to more than 400 percent. Moreover, all algorithms seem to fail under roughly the same conditions.
Special-effect edit detection using VideoTrails: a comparison with existing techniques
Video segmentation plays an integral role in many multimedia applications, such as digital libraries, content management systems, and various other video browsing, indexing, and retrieval systems. Many algorithms for the segmentation of video have appeared within the past few years. Most of these algorithms perform well on cuts but yield poor performance on gradual transitions or special effects edits. A complete video segmentation system must also achieve good performance on special effect edit detection. In this paper, we compare the performance of our VideoTrails-based algorithms with that of other existing special effect edit detection algorithms in the literature. We present results from experiments testing the ability to detect edits in TV programs, ranging from commercials to news magazine programs, containing diverse special effect edits.
Audio Analysis and Audiovisual Summary for Retrieval
Audio-guided audiovisual data segmentation, indexing, and retrieval
While current approaches to video segmentation and indexing are mostly focused on visual information, audio signals may actually play a primary role in video content parsing. In this paper, we present an approach for the automatic segmentation, indexing, and retrieval of audiovisual data based on audio content analysis. The accompanying audio signal of the audiovisual data is first segmented and classified into basic types, i.e., speech, music, environmental sound, and silence. This coarse-level segmentation and indexing step is based upon morphological and statistical analysis of several short-term features of the audio signals. Then, environmental sounds are classified into finer classes, such as applause, explosions, bird sounds, etc. This fine-level classification and indexing step is based upon time-frequency analysis of the audio signals and the use of a hidden Markov model as the classifier. On top of this archiving scheme, an audiovisual data retrieval system is proposed. Experimental results show that the proposed approach has an accuracy rate higher than 90 percent for the coarse-level classification, and higher than 85 percent for the fine-level classification. Examples of audiovisual data segmentation and retrieval are also provided.
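Two of the classic short-term features used for this kind of coarse speech/music/silence labeling are frame energy and zero-crossing rate. The sketch below is generic (the paper's actual feature set and frame length are not given in the abstract), and the silence/tone signals are synthetic:

```python
import math

def short_term_features(samples, frame_len=256):
    """Per-frame energy and zero-crossing rate (ZCR), two standard
    short-term audio features for coarse segment classification."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_len - 1)
        feats.append((energy, zcr))
    return feats

# Silence has near-zero energy; a tonal frame has moderate energy
# and a ZCR tied to its frequency.
silence = [0.0] * 256
tone = [math.sin(2 * math.pi * 4 * i / 256) for i in range(256)]
feats = short_term_features(silence + tone)
```

Thresholding energy separates silence; the energy/ZCR pattern over consecutive frames then helps separate speech (alternating voiced and unvoiced frames) from music (more stationary statistics), before the finer HMM-based stage.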
Importance of perceptive adaptation of sound features in audio content processing
The importance of perceptive modeling for the calculation of sound features is well known, and the use of simple perception-based adaptations of physically measured stimuli, such as the dB-scale or loudness, is a minimal requirement. Exactly how much value can be gained by more complex perceptive modeling has not been investigated in detail. This paper examines the question for loudness measures, using well-known psychoacoustic knowledge for their calculation. Profiles of these measures are calculated on audio data from movie material, deliberately using 'natural' sound instead of reverting to artificial laboratory sounds. Ultimately, the quality of a sound feature can only be judged by comparison with human estimates. Therefore, test subjects were asked to express their perception of loudness by continuous classification into five classes (called pp, p, mf, f, and ff). The results were used to evaluate two loudness measures: the sound pressure level, and an integral loudness measure developed in the discussed research. The correlation of the human loudness estimates with the integral loudness measure is about 10 percent higher than with the sound pressure level. In addition, the integral loudness measure yields a significantly better approximation of the curve of human loudness estimates.
Using content models to build audio-video summaries
Janne Saarela, Bernard Merialdo
The amount of digitized video in archives is becoming so huge that easier access and content-browsing tools are desperately needed. Video is also no longer one big piece of data, but a collection of useful smaller building blocks that can be accessed and used independently of their original presentation context. In this paper, we demonstrate a content model for audio-video sequences whose purpose is to enable the automatic generation of video summaries. The model is based on descriptors, which indicate various properties and relations of audio and video segments. In practice, these descriptors could either be generated automatically by analysis methods, or produced manually (or computer-assisted) by the content provider. We analyze the requirements and characteristics of the different data segments with respect to the problem of summarization, and we define our model as a set of constraints that allow good-quality summaries to be produced.
Image Indexing and Representation
Fast indexing method for multidimensional nearest-neighbor search
John A. Shepherd, Xiaoming Zhu, Nimrod Megiddo
This paper describes a snapshot of work in progress on the development of an efficient file-access method for similarity searching in high-dimensional vector spaces. The method has applications in image databases, where images are accessed via high-dimensional feature vectors, as well as in other areas. The technique is based on using a collection of space-filling curves as an auxiliary indexing structure. Initial performance analyses suggest that the method works efficiently even in moderately high-dimensional spaces (256 dimensions), with tolerable storage and execution-time overhead.
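One simple member of such a collection of space-filling curves is the Z-order (Morton) curve, whose key is formed by interleaving coordinate bits. This is an illustrative sketch, not the authors' specific curve family:

```python
def morton_key(coords, bits=8):
    """Interleave the bits of each integer coordinate into one Z-order key.

    Points that are close along the curve tend to be close in space, so
    sorting vectors by key gives a crude one-dimensional index over which
    range scans approximate nearest-neighbor candidates.
    """
    key = 0
    for b in range(bits - 1, -1, -1):  # most significant bit first
        for c in coords:
            key = (key << 1) | ((c >> b) & 1)
    return key
```

Using several shifted or rotated curves, as the paper suggests, reduces the chance that near neighbors are split across distant key ranges.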
Triangle-inequality-based pruning algorithms with triangle tries
Andrew P. Berman, Linda G. Shapiro
A new class of algorithms, based on the triangle inequality, has recently been proposed for use in content-based image retrieval. These algorithms rely on comparing a set of key images to the database images and storing the computed distances. Query images are later compared to the keys, and the triangle inequality is used to speedily compute lower bounds on the distance from the query to each database image. This paper addresses the question of increasing the performance of this algorithm through the addition of a data structure known as the Triangle Trie.
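The pruning idea follows directly from the triangle inequality: for every key image k, d(q, x) >= |d(q, k) - d(x, k)|, so the maximum over keys is a lower bound on the query-to-image distance. A minimal sketch (names illustrative):

```python
def prune(query_to_keys, db_to_keys, threshold):
    """Keep only images whose triangle-inequality lower bound is within threshold.

    query_to_keys: distances d(q, k_i) from the query to each key image.
    db_to_keys:    {image_id: [d(x, k_i), ...]} precomputed at indexing time.
    """
    survivors = []
    for image_id, dists in db_to_keys.items():
        lower_bound = max(abs(a - b) for a, b in zip(query_to_keys, dists))
        if lower_bound <= threshold:  # otherwise d(q, x) must exceed threshold
            survivors.append(image_id)
    return survivors
```

Only the surviving images need their true (expensive) distance to the query computed; the triangle trie organizes the stored key distances to speed up this scan.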
Location hashing: an efficient indexing method for locating object queries in image databases
Tanveer F. Syeda-Mahmood
Queries referring to content embedded within images are an essential component of content-based search, browse, or summarize operations in image databases. Localizing such queries under changes in appearance, occlusion, and background clutter is a difficult problem for which current spatial access structures in databases are not suitable. In this paper, we present a new method of indexing image databases, called location hashing, that uses a special data structure, called the location hash tree, for organizing feature information from the images of a database. Location hashing is based on the principle of geometric hashing. It simultaneously determines the relevant images in the database, and the regions within them most likely to contain the 2D pattern query, without incurring a detailed search of either. Because the location hash tree is a red-black tree, it allows efficient search for candidate locations using pose-invariant feature information derived from the query.
Image descriptors based on fractal transform analysis
Stephen G. Demko, Mehdi Khosravi, Keshi Chen
The Fractal Transform (FT) was originally introduced as a methodology for compressing digital images and representing them at different scales. The process of calculating an FT generates a great deal of information about the affine similarities and dissimilarities of an image, most of which is discarded in compression applications. In this paper, we introduce the concept of Fractal Transform Analysis and use it to derive new image descriptors. We present results of experiments in which description schemes built from some of these FT-based descriptors are applied to the problems of finding objects in an image similar to a given object, indexing images, and querying an image database of about 17,000 images. Complexity and timing data are also presented.
Combining indexing and learning in iterative refinement
Chung-Sheng Li, Vittorio Castelli, John R. Smith, et al.
The similarity measure has been one of the critical issues for successful content-based retrieval. Simple Euclidean or quadratic-form distances are often inadequate, as they neither correspond to perceived similarity nor adapt to different applications. Relevance feedback and iterative refinement techniques, based on user feedback, have been proposed to adjust the similarity metric or the feature space. However, this learning process potentially renders high-dimensional indexing structures, such as the R-tree, useless, as those indexing techniques usually assume a predetermined similarity measure. In this paper, we propose a simultaneous learning and indexing technique for efficient content-based retrieval of images that can be described by feature vectors. The technique builds a compact high-dimensional index while taking into account that the raw feature space needs to be adjusted for each new application. Consequently, much better efficiency can be achieved than with techniques that make no provision for efficient indexing.
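One common way relevance feedback adjusts the similarity metric (shown here purely as a generic illustration, not the authors' method) is to reweight feature dimensions by their inverse variance over the images the user marked relevant:

```python
def reweight(relevant_features, eps=1e-6):
    """Weight each feature dimension by its inverse variance across the
    relevant examples: dimensions on which relevant images agree are
    emphasized in a weighted Euclidean distance."""
    n = len(relevant_features)
    dims = len(relevant_features[0])
    weights = []
    for j in range(dims):
        col = [f[j] for f in relevant_features]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        weights.append(1.0 / (var + eps))
    return weights
```

The indexing challenge the paper addresses is that a static structure such as an R-tree assumes such weights are fixed in advance, whereas feedback changes them per query session.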
Feature Classification
Model-based classification of visual information for content-based retrieval
Most existing approaches to content-based retrieval rely on query by example or on user sketches based on low-level features. However, these are not suitable for semantic (object-level) distinctions. In other approaches, information is classified according to a predefined set of classes, and classification is performed either manually or by class-specific algorithms. Most of these systems lack flexibility: the user cannot define or change the classes, and new classification schemes require the implementation of new class-specific algorithms and/or the input of an expert. In this paper, we present a different approach to content-based retrieval and a novel framework for classification of visual information, in which (1) users define their own visual classes and classifiers are learned automatically, and (2) multiple fuzzy classifiers and machine learning techniques are combined for automatic classification at multiple levels (region, perceptual, object-part, object, and scene). We present The Visual Apprentice, an implementation of our framework for still images and video that uses a combination of lazy learning, decision trees, and evolution programs for classification and grouping. Our system is flexible in that models can be changed by users over time, different types of classifiers are combined, and user-defined models can be applied to object and scene structure classification. Special emphasis is placed on the difference between semantic and visual classes, and between classification and detection. Examples and results are presented to demonstrate the applicability of our approach to visual classification and detection.
Bayesian framework for semantic classification of outdoor vacation images
Aditya Vailaya, Mario A. T. Figueiredo, Anil K. Jain, et al.
Grouping images into (semantically) meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Based on these groupings, effective indices can be built for an image database. In this paper, we cast the image classification problem in a Bayesian framework. Specifically, we consider city vs. landscape classification and, further, classification of landscape into sunset, forest, and mountain classes. We demonstrate how high-level concepts can be inferred from specific low-level image features, under the constraint that the test images do belong to one of the delineated classes. We further demonstrate that a small codebook (whose optimal size is selected using the MDL principle), extracted from a vector quantizer, can be used to estimate the class-conditional densities needed for the Bayesian methodology. Classification based on color histograms, color coherence vectors, edge-direction histograms, and edge-direction coherence vectors as features shows promising results. On a database of 2,716 city and landscape images, our system achieves an accuracy of 95.3 percent for city vs. landscape classification. On a subset of 528 landscape images, it achieves 94.9 percent for sunset vs. forest-and-mountain classification, and 93.6 percent for forest vs. mountain classification. Our final goal is to combine multiple two-class classifiers into a single hierarchical classifier.
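The underlying Bayesian decision rule is argmax over classes c of p(x | c) P(c). A toy one-dimensional version with Gaussian class-conditional densities (the paper instead estimates densities from a vector-quantizer codebook; the class names and parameters below are illustrative) looks like:

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_classify(feature, classes):
    """classes: {name: (prior, mu, var)}.
    Return the maximum a posteriori class for the scalar feature."""
    return max(classes, key=lambda c: classes[c][0]
               * gaussian_pdf(feature, classes[c][1], classes[c][2]))
```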
Hierarchical clustering algorithm for fast image retrieval
Santhana Krishnamachari, Mohamed Abdel-Mottaleb
Image retrieval systems that compare the query image exhaustively with each individual image in the database do not scale to large databases. A scalable search system should ensure that search time does not increase linearly with the number of images in the database. We present a clustering-based indexing technique in which the images in the database are grouped into clusters of images with similar color content, using a hierarchical clustering algorithm. At search time, the query image is compared not with all the images in the database, but only with a small subset. Experiments show that this clustering-based approach offers superior response time with high retrieval accuracy, and experiments with different database sizes indicate that, for a given retrieval accuracy, search time does not increase linearly with database size.
Video and image clustering using relative entropy
Giridharan Iyengar, Andrew B. Lippman
In this paper, we present an approach to clustering video sequences and images for efficient retrieval, using relative entropy as our cost criterion. Our experiments also indicate that relative entropy is a good similarity measure for content-based retrieval. In our clustering work, we treat images and video as probability density functions over the extracted features, which leads us to formulate a general algorithm for clustering densities. In this context, it can be seen that the Euclidean distance between features and the Kullback-Leibler (KL) divergence give equivalent clusterings. In addition, the asymmetry of the KL divergence leads to another clustering, which our experiments indicate is more robust to noise and distortions than the one resulting from the Euclidean norm.
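Relative entropy between two discrete feature distributions, including the asymmetry the paper exploits for the second clustering, can be computed as:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) for discrete distributions.

    eps guards against zero bins; note that in general
    D(p || q) != D(q || p), which is what yields a second clustering.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```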
Image Databases
Integrated system for image storage, retrieval, and transmission using wavelet transform
Dan Yu, Yawen Liu, Ray Yan Mu, et al.
Currently, much work has been done in the area of image storage and retrieval, but overall performance has been far from practical. A highly integrated wavelet-based image management system is proposed in this paper. By integrating wavelet-based solutions for image compression and decompression, content-based retrieval, and progressive transmission, much higher performance can be achieved. The multiresolution nature of the wavelet transform has proven to be a powerful tool for representing images: the transform decomposes an image into a set of subimages at different resolutions, from which solutions to three key aspects of image management follow. For content-based image retrieval (CBIR), our system supports color, contour, texture, sample, keyword, and topic information; the first four features can be extracted naturally from the wavelet transform coefficients. Users' requests are scored for similarity against the images in the database, the highest-scoring images are noted, and the user receives feedback. For image compression and decompression, a good compression ratio can be achieved on the assumption that details at high resolution and in diagonal directions are less visible to the human eye. In each subimage, the wavelet coefficients are vector quantized (VQ) using the LBG algorithm, which we improve to accelerate the process; higher compression ratios are achieved by applying DPCM and entropy coding together, and with a YIQ representation color images can also be compressed effectively. Transmitting compressed image data places a very low load on network bandwidth, and the multiresolution nature of the wavelet enables progressive transmission, which makes the system respond faster and the user interface friendlier. The system shows high overall performance by exploiting these features of the wavelet transform and integrating the key aspects of image management. An image retrieval service implemented in Java is also available on the World Wide Web to demonstrate the system's features.
Gaussian mixture model for human skin color and its applications in image and video databases
Ming-Hsuan Yang, Narendra Ahuja
This paper is concerned with estimating a probability density function of human skin color, using a finite Gaussian mixture model, whose parameters are estimated through the EM algorithm. Hawkins' statistical test on the normality and homoscedasticity (common covariance matrix) of the estimated Gaussian mixture models is performed and McLachlan's bootstrap method is used to test the number of components in a mixture. Experimental results show that the estimated Gaussian mixture model fits skin images from a large database. Applications of the estimated density function in image and video databases are presented.
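A minimal one-dimensional, two-component EM fit illustrates the estimation step (the paper works with multivariate color distributions and adds statistical tests for normality and for the number of components; this toy version uses a crude deterministic initialization):

```python
import math

def em_gmm_1d(data, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM (toy sketch)."""
    mu = [min(data), max(data)]   # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate mixture weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
    return pi, mu, var
```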
Adaptive storage and retrieval of large compressed images
John R. Smith, Vittorio Castelli, Chung-Sheng Li
Enabling the efficient storage, access, and retrieval of large volumes of multidimensional data is one of the important emerging problems in databases. We present a framework for adaptively storing, accessing, and retrieving large images. The framework uses a space-and-frequency graph to generate and select image view elements for storage in the database. By adapting to user access patterns, the system selects and stores the view elements that yield the lowest average cost for accessing the multiresolution subregion image views. A second adaptation strategy divides computation between server and client in the progressive retrieval of image views using view elements. We show that the system speeds up access and retrieval modes such as drill-down browsing and remote zooming and panning, and minimizes the amount of data transferred over the network.
Query Tools
Semantic feature extraction for interior environment understanding and retrieval
Zhibin Lei, Yufeng Liang
In this paper, we propose a novel system of semantic feature extraction and retrieval for interior design and decoration applications. The system, V2ID (Virtual Visual Interior Design), uses colored texture and spatial edge layout to obtain simple information about the global room environment. We address the domain-specific segmentation problem in our application and present techniques for obtaining semantic features from a room environment. We also discuss heuristics for using these features (color, texture, edge layout, and shape) to retrieve objects from an existing database. The final resynthesized room environment, combining the original scene with objects from the database, is created for the purpose of animation and virtual walk-through.
Image indexing using composite regional color channel features
Ahmed R. Appas, Ahmed M. Darwish, Ayman I. El-Desouki, et al.
Color indexing is a technique by which images in a database can be retrieved on the basis of their color content. In this paper, we propose a new set of color features for representing color images, and show how they can be computed and used efficiently to retrieve images that possess a certain similarity. These features are based on the first three moments of each color channel. Two differences distinguish this work from previous work reported in the literature. First, we compute the third moment of the color channel distribution around the second moment, not around the first moment; the second moment is less sensitive to small luminance changes than the first. Second, we combine all three moment values in a single descriptor, which reduces the number of floating point values needed to index the image and hence speeds up the search. To give the user flexibility in terms of defining a center of attention at query time, the proposed approach divides the image into five geometric regions and allows the user to give each region a different weight designating its importance. The approach has been tested on databases of 205 images of airplanes and natural scenes. It proved insensitive to small rotations and translations of the image, and yielded a better hit rate than similar algorithms previously reported in the literature.
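The per-channel moments can be sketched as follows. The third-moment variant is one literal reading of the abstract (centered on the standard deviation rather than the mean) and should be treated as an assumption, not the paper's exact formula:

```python
import math

def channel_moments(channel):
    """First three moments of one color channel.

    Assumption: following the abstract literally, the third moment is
    taken about the second moment (the standard deviation) rather than
    about the mean; a signed cube root keeps it on the channel's scale.
    """
    n = len(channel)
    mean = sum(channel) / n
    std = (sum((v - mean) ** 2 for v in channel) / n) ** 0.5
    m3 = sum((v - std) ** 3 for v in channel) / n
    third = math.copysign(abs(m3) ** (1.0 / 3.0), m3)
    return mean, std, third
```

Concatenating the three values per channel (and per region) yields the compact single descriptor the abstract describes.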
Finding regions of interest for content extraction
Eric J. Pauwels, Greet Frederix
A major problem in content-based image retrieval (CBIR) is the unsupervised identification of perceptually salient regions in images. We contend that this problem can be tackled by mapping the pixels into various feature spaces, whereupon they are subjected to a grouping algorithm. In this paper, we develop a robust and versatile non-parametric clustering algorithm that can handle the unbalanced and highly irregular clusters encountered in such CBIR applications. The strength of our approach lies not so much in the clustering itself as in the definition and use of two cluster-validity indices that are independent of the cluster topology. By combining them, an optimal clustering can be identified, and experiments confirm that the associated clusters do indeed correspond to perceptually salient image regions.
Query vector projection access method
We present a new multidimensional access method for querying by similarity in databases of high-dimensional vectors. The query vector projection access method (QVPAM) addresses the shortcomings of other dimensionality reduction techniques by deriving the best transformation of the vectors at query time. QVPAM creates a projection library that contains building blocks for constructing the transformations, and rapidly searches this library at query time to select the set of projection elements that minimizes the work of processing the query. Since the selected set need not be complete, QVPAM effectively trades off query precision against query response time. We describe QVPAM and demonstrate its performance in the content-based querying of a database of high-dimensional color histograms.
Augmented image histogram for image and video similarity search
Yu Chen, Edward K. Wong
The image histogram is a feature widely used in content-based image retrieval and video segmentation. It is simple to compute, yet very effective in detecting image-to-image similarity or frame-to-frame dissimilarity. While the image histogram captures the global distribution of intensities or colors well, it contains no information about the spatial distribution of pixels. In this paper, we propose to incorporate spatial information into the image histogram by computing features from the spatial distances between pixels belonging to the same intensity or color. In addition to the frequency count of each intensity or color, the mean, variance, and entropy of these distances are computed to form an augmented image histogram. Using the new feature, we performed experiments on a set of color images and a color video sequence. The results demonstrate that the augmented image histogram performs significantly better than the conventional color histogram in both image retrieval and video shot segmentation.
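The augmentation can be sketched as follows; this computes the per-color count plus the mean and variance of pairwise spatial distances (the entropy term from the abstract could be added by histogramming the distances). Requires Python 3.8+ for `math.dist`:

```python
import math
from collections import defaultdict

def augmented_histogram(pixels):
    """Per-color frequency count plus mean and variance of the pairwise
    spatial distances between pixels sharing that color.

    pixels: iterable of (x, y, color) triples. The O(n^2) pairwise loop
    is fine for a sketch; a real system would subsample or bin first.
    """
    groups = defaultdict(list)
    for x, y, color in pixels:
        groups[color].append((x, y))
    features = {}
    for color, pts in groups.items():
        dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
        if dists:
            mean = sum(dists) / len(dists)
            var = sum((d - mean) ** 2 for d in dists) / len(dists)
        else:
            mean = var = 0.0
        features[color] = (len(pts), mean, var)
    return features
```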
Poster Session
Wide-area-distributed storage system for a multimedia database
Masahiro Ueno, Shigechika Kinoshita, Makato Kuriki, et al.
We have developed a wide-area-distributed storage system for multimedia databases, which minimizes the possibility of simultaneous failure of multiple disks in the event of a major disaster. It features a RAID system whose member disks are spatially distributed over a wide area. Each node has a device that includes the controller of the RAID and the controller of the member disks controlled by other nodes. The devices in each node are connected to a computer using fiber optic cables and communicate using fiber-channel technology. Any computer at a node can utilize multiple devices connected by optical fibers as a single 'virtual disk.' The advantage of this structure is that devices and fiber optic cables are shared by the computers. In this report, we first describe the proposed system and the prototype used for testing. We then discuss its performance, i.e., how read and write throughputs are affected by data-access delay, the RAID level, and queuing.
Scene change detection and feature extraction for MPEG-4 sequences
Ajay Divakaran, Hiroshi Ito, Huifang Sun, et al.
In this paper, we present a new, computationally efficient, and effective technique for detecting abrupt scene changes in MPEG-2 and MPEG-4 compressed video sequences. We combine the dc image-based approach of Yeo with the bit allocation-based approach of Feng, Lo, and Mehrpour. The bit allocation-based approach has the advantage of computational simplicity, since it requires only entropy decoding of the sequence. Since extraction of dc images from I-frames/objects is simple, the dc image-based technique of Yeo is a good alternative for comparing I-frames/objects; for P-frames/objects, however, Yeo's algorithm requires additional computation. We find that the bit allocation-based approach is prone to false detection on intracoded objects in MPEG-4 sequences. However, once a suspected scene/object change has been localized to a group of consecutive frames/objects, the bit allocation-based technique quickly and accurately locates the cut point therein. This motivates us to use dc image-based detection between successive I-frames/objects to identify subsequences containing scene/object changes, and then use bit allocation-based detection to find the cut point. Our technique thus has only marginally greater complexity than the purely bit allocation-based technique, but greater accuracy. It is applicable to both MPEG-2 sequences and MPEG-4 multiple-object sequences; in the MPEG-4 multiple-object case, we use a weighted sum of the change in each object of the frame, with the area of the object as the weight.
Image content retrieval from image databases using feature integration by Choquet integral
Mihail Popescu, Paul D. Gader
A novel similarity measure based on the Choquet integral is introduced for retrieving, from an image database, images that 'mostly' fit the query image. We show that under certain conditions the measure is a norm, a fact that can be used to reduce search time via the triangle inequality. To test the new measure, a content-based image retrieval system was built and benchmarked against the Virage visual retrieval cartridge built into the Oracle 8 database system. The results suggest that the new measure is useful for image retrieval.
DrawSearch: a tool for interactive content-based image retrieval over the Internet
Eugenio Di Sciascio, M. Mongiello
Content-based image retrieval has recently become one of the most active research areas, due to the massive increase in the amount and complexity of digitized data being stored, transmitted, and accessed. We present a prototype implementation of DrawSearch, an image retrieval by content system that uses color and shape (and, in the near future, texture) features to index and retrieve images. The system, currently being tested and improved, is designed to increase interactivity with users posing queries over the Internet, and provides a Java client for query by sketch. It also implements relevance feedback to allow users to dynamically refine queries. Experiments show that the proposed approach can greatly reduce the user's effort in composing a query, while capturing the needed information with greater precision.
User interface framework for image searching
Orlie T. Brewer Jr.
This paper describes an API for image searching that isolates the functionality of the GUI from that of the image search engine. The GUI makes calls to the image search API and can therefore be used with any image search engine implementing that API. Different methods of specifying the initial search image are discussed, as well as different methods of displaying the results, including the use of 3D via VRML.
Refining image retrieval based on context-driven methods
Dezhong Hong, Jiankang Wu, Sumeet Sohan Singh
A prototype content-based image retrieval system has been implemented based on the algorithms introduced in this paper. High-level image contents are extracted, and a fuzzy C-means classifier is employed to compute object clusters and provide useful information about overlapping clusters. Automatic image segmentation and categorization are achieved. To obtain context for image retrieval, subjective and objective contexts are modeled by means of fuzzy set theory. The system traces users' interactions during retrieval, and retrieval results are refined as users submit queries specifying their particular requirements.
Image retrieval system based on interactive reduction of feature space
Aki Kobayashi, Toshiyuki Yoshida, Yoshinori Sakai
This paper proposes an image retrieval system that searches a database for images similar to a target imagined by the user. The system uses image features rather than keywords, and retrieves images by reducing the multidimensional feature space generated by the image feature vectors. The system presents the user with sample images having suitable feature vector values, and uses the user's interaction to obtain information about the suitability of specific images. This information is then used to appropriately reduce the feature space, and the process continues until the target region is reduced to a suitable volume. Since the method requires neither a real target image nor keywords, it is quite simple and practical. Experimental results show the advantage and efficiency of the proposed system.
Vector angular distance measure for indexing and retrieval of color
Dimitrios Androutsos, Konstantinos N. Plataniotis, Anastasios N. Venetsanopoulos
A key aspect of image retrieval using color is the creation of robust and efficient indices. The color histogram remains the most popular index, due primarily to its simplicity, but it has a number of drawbacks: histograms capture only global activity, require quantization to reduce dimensionality, are highly dependent on the chosen color space, provide no means to exclude a certain color from a query, and can give erroneous results due to gamma nonlinearity. In this paper, we present a vector angular distance measure, implemented as part of our database system. Our system does away with histogram techniques for color indexing and retrieval and instead uses color vector techniques: color segmentation extracts regions of prominent color, and representative vectors from these regions form the image indices. We thereby obtain a much smaller index without the granularity of a histogram; similarity is instead based on the vector angular distance between a query color vector and the indexed representative vectors.
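An angular distance between color vectors can be sketched as below; the normalization choice is an illustrative assumption, not necessarily the authors' exact measure:

```python
import math

def angular_distance(c1, c2):
    """Angle between two color vectors, normalized to [0, 1].

    For nonnegative RGB vectors the angle is at most 90 degrees, so we
    normalize by pi/2. Scaling a vector (an intensity change) leaves
    the measure unchanged, which is the appeal over raw Euclidean distance.
    """
    dot = sum(a * b for a, b in zip(c1, c2))
    n1 = math.sqrt(sum(a * a for a in c1))
    n2 = math.sqrt(sum(b * b for b in c2))
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos) / (math.pi / 2)
```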
Texture classification by a two-level hybrid scheme
Gouchol Pok, Jyh-Charn S. Liu
In this paper, we propose a novel feature extraction scheme for texture classification, in which texture features are extracted by a two-level hybrid scheme integrating two statistical techniques of texture analysis. In the first step, low-level features are extracted by Gabor filters and encoded as feature map indices using Kohonen's SOFM algorithm. In the next step, the encoded feature images are processed by Gabor filter, Gaussian Markov random field (GMRF), and grey-level co-occurrence matrix (GLCM) methods to extract high-level features. By integrating two methods of texture analysis in a cascaded manner, we obtain texture features that achieve high accuracy in the classification of texture patterns. The proposed schemes were tested on real microtextures, and the Gabor-GMRF scheme achieved a 10 percent increase in recognition rate compared with simple Gabor filtering.
Rotation-, translation-, and scaling-invariant color image indexing
Mehmet Celenk, Yuan Shao
This paper describes a rotation-, translation-, and scaling- (RTS) invariant color image indexing technique for imaging database systems. The features used for image indexing are color based and are extracted using principal component analysis, the Hotelling transform, and moment invariants. This synthesized feature extraction technique is devised to be computationally efficient for fast online image storage and retrieval using color information. Since the database index relies on the average (mean) color vector and seven moment invariants of an image, the index storage requirement of the method is only a ten-dimensional (10-D) vector, an efficiency that is very desirable for many imaging database applications. A new similarity measure, based on the Tanimoto measure for recognizing similar patterns, is also proposed for fast image retrieval in large database systems. This similarity measure is computationally effective, since the vector inner product is the only operation needed for its computation. Four databases are used in the computer simulation of the algorithm to demonstrate the RTS property of the retrieval. We determine experimentally that the proposed method is not affected by substantial rotation, translation, or scaling of the database images. This is attributed to the fact that the moment features used for retrieval are not a predefined set; rather, they are obtained directly from the images submitted for recording or searching, which makes the algorithm robust and attractive for many image storage and retrieval applications.
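The Tanimoto measure on such index vectors indeed needs only inner products, as the abstract notes; a minimal sketch:

```python
def tanimoto(a, b):
    """Tanimoto similarity of two index vectors:
    T(a, b) = a.b / (a.a + b.b - a.b), equal to 1 for identical vectors
    and computable from inner products alone."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)
```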
Delaunay triangulation for image object indexing: a novel method for shape representation
Yi Tao, William I. Grosky
Recent research on image databases has been aimed at the development of content-based retrieval techniques for the management of visual information. Compared with visual features such as color, texture, and spatial constraints, shape is an especially important feature: associated with the image objects of interest, shape alone may be sufficient to identify and classify an object completely and accurately. This paper presents a novel method, based on feature point histogram indexing, for object shape representation in image databases. In this scheme, the feature point histogram is obtained by discretizing the angles produced by the Delaunay triangulation of a set of unique feature points, which characterize object shape in context, and then counting the number of times each discrete angle occurs in the resulting triangulation. The proposed shape representation technique is translation, scale, and rotation independent. Our experiments concluded that the Euclidean distance performs well as the similarity measure, in combination with the feature point histogram computed by counting the two largest angles of each individual Delaunay triangle. Through further experiments, we also found evidence that an image object representation using a feature point histogram provides an effective cue for image object discrimination.
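The feature point histogram described above can be sketched directly: triangulate the points, keep the two largest interior angles of each triangle, and bin them. This is an illustrative reconstruction (bin count and the toy point set are assumptions), using `scipy.spatial.Delaunay` for the triangulation:

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_angles(p):
    # interior angles (degrees) of a triangle given its 3 vertices, sorted ascending
    ang = []
    for i in range(3):
        a, b, c = p[i], p[(i + 1) % 3], p[(i + 2) % 3]
        v1, v2 = b - a, c - a
        cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        ang.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
    return sorted(ang)

def feature_point_histogram(points, n_bins=18):
    # histogram of the two largest angles of each Delaunay triangle, over 0..180 deg
    tri = Delaunay(points)
    largest_two = [a for s in tri.simplices for a in triangle_angles(points[s])[1:]]
    hist, _ = np.histogram(largest_two, bins=n_bins, range=(0.0, 180.0))
    return hist / hist.sum()  # normalize so the descriptor is independent of point count

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
h = feature_point_histogram(pts)
# rotation invariance: rotating the point set leaves the angle histogram unchanged
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
h_rot = feature_point_histogram(pts @ R.T)
```

Because triangle angles are unchanged by translation, rotation, and uniform scaling, the normalized histogram inherits the RTS independence the abstract claims; Euclidean distance between two such histograms then serves as the similarity function.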
Similarity-based retrieval of images using color histograms
Keshi Chen, Stephen G. Demko, Ruifeng Xie
The color histogram of an image has been widely used as a feature descriptor in content-based retrieval applications. In this paper, we report some results from our investigation into its usage. We outline three typical color space quantization schemes used in our experiments, and introduce a soft-decision histogramming method to eliminate the discontinuity problem in the traditional color histogram population process. Then, to improve the effectiveness of color histogram-based retrieval algorithms, several similarity metrics are proposed for comparing color histograms, including three special forms of the Kantorovich metric.
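The appeal of Kantorovich-type metrics is that, unlike bin-wise comparisons, they account for how far mass must move between bins. A one-dimensional sketch (the paper's metrics operate on full color histograms; this 1-D reduction, where the metric equals the L1 distance between cumulative histograms, is only an illustration of the cross-bin property):

```python
import numpy as np

def kantorovich_1d(h1, h2):
    # Kantorovich (earth mover's) distance between two normalized 1-D histograms:
    # with unit bin spacing it reduces to the L1 distance between the CDFs
    c1, c2 = np.cumsum(h1), np.cumsum(h2)
    return np.abs(c1 - c2).sum()

h_a = np.array([1.0, 0.0, 0.0, 0.0])
h_b = np.array([0.0, 0.0, 0.0, 1.0])  # all mass shifted 3 bins away
print(kantorovich_1d(h_a, h_b))  # 3.0: the distance grows with the shift
```

A plain bin-wise L1 comparison would score 2.0 whether the mass moved one bin or three, so nearby colors quantized into adjacent bins would look just as different as very distant colors; the Kantorovich form avoids exactly that discontinuity.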
Comparative study of strategies for illumination-invariant texture representations
Barbara V. Levienaise-Obadia, Josef Kittler, William J. Christmas
Illumination invariance is of paramount importance for annotating video sequences stored in large video databases. However, popular texture analysis methods, such as multichannel filtering techniques, do not yield illumination-invariant texture representations. In this paper, we assess the effectiveness of three illumination normalization schemes for texture representations derived from Gabor filter outputs. The schemes aim at overcoming intensity scaling effects due to changes in illumination conditions. A theoretical analysis and experimental results enable us to select one scheme as the most promising. In this scheme, a normalizing factor is derived at each pixel by combining the energy responses of different filters at that pixel. The scheme overcomes illumination variations well while still preserving discriminatory textural information. Further statistical analysis may shed light on other interesting properties or limitations of the scheme.
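One plausible form of such a per-pixel normalization — dividing each filter's energy by a factor combining all filters' energies at that pixel — can be sketched as below. This is an assumed formulation consistent with the abstract's description, not necessarily the paper's exact scheme:

```python
import numpy as np

def normalize_energies(energies, eps=1e-8):
    # energies: (n_filters, H, W) non-negative Gabor energy responses.
    # The per-pixel normalizing factor pools all filters' responses, so a global
    # intensity scaling of the input cancels out of the normalized representation.
    factor = np.sqrt((energies ** 2).sum(axis=0)) + eps
    return energies / factor

rng = np.random.default_rng(0)
e = rng.random((8, 16, 16))            # stand-in energy responses for 8 filters
n1 = normalize_energies(e)
n2 = normalize_energies(2.5 * e)       # simulate brighter illumination (linear scaling)
print(np.allclose(n1, n2, atol=1e-6))  # True: the representation is scale-invariant
```

The relative pattern across filters at each pixel survives normalization, which is why discriminatory texture information is preserved even as absolute intensity information is discarded.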
Quantitative comparison of shot boundary detection metrics
The detection of shot boundaries in video sequences is an important task for generating indexed video databases. This paper provides a comprehensive quantitative comparison of the metrics that have been applied to shot boundary detection. We additionally consider several standardized statistical tests, which have not been applied to this problem, and three new metrics. A mathematical framework for quantitatively comparing metrics is supplied. Also included are experimental results based on a video database containing 39,000 frames.
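Any such quantitative comparison needs an agreed scoring rule. A common one, sketched here under the assumption of frame-accurate ground truth (the example boundary lists are hypothetical), is precision and recall of detected boundaries:

```python
def precision_recall(detected, truth):
    # frame-accurate precision/recall for shot boundary detection
    detected, truth = set(detected), set(truth)
    tp = len(detected & truth)                       # boundaries found at the right frame
    precision = tp / len(detected) if detected else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall

# hypothetical run: one false alarm (frame 40) and one miss (frame 60)
p, r = precision_recall(detected=[10, 25, 40], truth=[10, 25, 60])
print(p, r)  # 0.666..., 0.666...
```

Sweeping each metric's detection threshold and plotting the resulting precision-recall pairs allows different metrics to be compared independently of any single operating point.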
Efficient color feature extraction in compressed video
Chee Sun Won, Dong Kwon Park, In Yup Na, et al.
In this paper, we propose a new image feature extraction method for MPEG compressed video. To minimize the MPEG decoding process, we use only the DC values of the Y, Cr, and Cb components of each macroblock. We can then obtain a feature vector, using the decoded DC values of the Y, Cr, and Cb components, for all macroblocks in each I-frame. The feature vector consists of histograms for various colors, luminance, and edge types. In obtaining the color and luminance histograms, we consider the ratio of the pure colors and luminance contributing to the chroma DC values of each macroblock, and update all contributing color and/or luminance histograms accordingly. If, instead, the macroblock is classified as an edge block, we update the corresponding edge-type histogram. To demonstrate the performance of the proposed feature extraction method, we apply it to a scene change detection problem.
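The core idea — building a per-frame feature from macroblock DC values alone, with no full decode — can be illustrated with a simplified sketch. The paper's feature additionally weights contributing pure colors and includes edge-type histograms; here we only concatenate plain per-channel histograms, and the 396-macroblock CIF-sized frame is an assumed example:

```python
import numpy as np

def dc_color_histogram(dc_y, dc_cb, dc_cr, n_bins=8):
    # feature vector built from macroblock DC values only (no full MPEG decode):
    # concatenated luminance and chrominance histograms, each normalized to sum to 1
    hists = [np.histogram(ch, bins=n_bins, range=(0, 256))[0] / len(ch)
             for ch in (dc_y, dc_cb, dc_cr)]
    return np.concatenate(hists)

rng = np.random.default_rng(1)
# hypothetical I-frame: 396 macroblocks (CIF resolution), random DC values
y, cb, cr = (rng.integers(0, 256, 396) for _ in range(3))
feat = dc_color_histogram(y, cb, cr)
print(feat.shape)  # (24,) — 3 channels x 8 bins
```

Comparing such vectors between consecutive I-frames (e.g. by histogram difference) then gives a cheap scene change detector, which is the application the abstract uses to evaluate the feature.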
Video segmentation and classification for content-based storage and retrieval using motion vectors
W.A.C. Fernando, Cedric Nishan Canagarajah, David R. Bull
Video parsing is an important step in content-based indexing techniques, where the input video is decomposed into segments with uniform content. In video parsing, detection of scene changes is one of the approaches widely used for extracting key frames from a video sequence. In this paper, an algorithm based on motion vectors is proposed to detect sudden scene changes and gradual scene changes (camera movements such as panning, tilting, and zooming). Unlike some existing schemes, the proposed scheme is capable of detecting both sudden and gradual changes in uncompressed as well as compressed domain video. It is shown that the resultant motion vector can be used to identify and classify gradual changes due to camera movements. Results show that the algorithm performed as well as histogram-based schemes with uncompressed video. The performance of the algorithm was also investigated with H.263 compressed video. The detection and classification of both sudden and gradual scene changes was successfully demonstrated.
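A rough sketch of how a resultant motion vector can separate camera motions (this is an illustrative stand-in, not the paper's algorithm; the threshold and the synthetic motion fields are assumptions): a pan or tilt yields a large, coherent resultant, while a zoom yields a near-zero resultant but a strong radial component.

```python
import numpy as np

def classify_camera_motion(mv, positions, mag_thresh=1.0):
    # mv: (N, 2) macroblock motion vectors; positions: (N, 2) block centers
    # relative to the frame center. A large resultant vector suggests pan/tilt;
    # vectors aligned with the radial direction from the center suggest zoom.
    resultant = mv.mean(axis=0)
    if np.linalg.norm(resultant) > mag_thresh:
        return 'pan/tilt'
    radial = positions / (np.linalg.norm(positions, axis=1, keepdims=True) + 1e-8)
    score = (mv * radial).sum(axis=1).mean()   # mean radial component of the field
    if abs(score) > mag_thresh:
        return 'zoom'
    return 'static'

# synthetic 9x9 grid of macroblock centers around the frame center
grid = np.stack(np.meshgrid(np.arange(-4, 5), np.arange(-4, 5)), -1).reshape(-1, 2).astype(float)
print(classify_camera_motion(np.tile([3.0, 0.0], (81, 1)), grid))  # pan/tilt
print(classify_camera_motion(grid * 0.5, grid))                    # zoom (outward field)
```

Sudden scene changes show up differently: the motion field becomes incoherent and poorly predicted, so neither test fires while the residual prediction error spikes.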
Fast video segmentation using encoding cost data
Ricardo L. de Queiroz, Gozde Bozdagi, Taha H. Sencar
This paper presents a simple and effective pre-processing method developed for the segmentation of MPEG compressed video sequences. The proposed method for scene-cut detection only involves computing the number of bits spent on each frame (encoding cost data), thus avoiding decoding the bitstream. The information is separated by I-, P-, and B-frames, forming three vectors, which are independently processed by a new peak detection algorithm based on overcomplete filter banks and on joint thresholding using a confidence number. Each processed vector yields a set of candidate frame numbers, i.e., 'hints' of positions where scene cuts may have occurred. The 'hints' for all frame types are recombined into one frame sequence and clustered into scene cuts. The algorithm was not designed to distinguish among types of cuts, but rather to indicate their positions and durations. Experimental results show that the proposed algorithm is effective in detecting abrupt scene changes as well as gradual transitions. For precision-demanding applications, the algorithm can be used with a low confidence factor, just to select the frames that are worth investigating with a more complex algorithm. The algorithm is not particularly tailored to MPEG and can be applied to most video compression techniques.
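The underlying intuition — a scene cut forces a spike in encoding cost — can be shown with a much simpler peak detector than the paper's filter-bank scheme. Below, a frame is flagged when its bit count deviates from its neighborhood by more than a confidence factor times the local standard deviation (the window size, confidence value, and cost vector are all assumptions for illustration):

```python
import numpy as np

def scene_cut_hints(bits, window=5, confidence=3.0):
    # flag frames whose encoding cost deviates from the local mean by more than
    # `confidence` local standard deviations — a simple stand-in for the paper's
    # overcomplete-filter-bank peak detector
    hints = []
    for i in range(len(bits)):
        lo, hi = max(0, i - window), min(len(bits), i + window + 1)
        neighbors = np.delete(bits[lo:hi], i - lo)   # local context, excluding frame i
        mu, sd = neighbors.mean(), neighbors.std() + 1e-8
        if abs(bits[i] - mu) > confidence * sd:
            hints.append(i)
    return hints

# synthetic cost vector for one frame type: steady cost with a spike at a cut
bits = np.array([100.0] * 20)
bits[12] = 400.0   # the first frame of a new scene is poorly predicted, costing more bits
print(scene_cut_hints(bits))  # [12]
```

Running this separately on the I-, P-, and B-frame cost vectors and merging the resulting hint lists mirrors the recombination step the abstract describes.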
Fast edge map extraction from MPEG compressed video data for video parsing
For the last few years, shot boundary detection has been recognized as an important research issue in video retrieval. As a preliminary step for the task, it is essential to extract salient features from videos. Recently, it has become common to perform the two tasks in the compressed domain to alleviate their computational costs. In this paper, we propose a novel shot boundary detection technique, which uses two feature images, the DC image and the edge image, extracted directly from MPEG compressed video. While a DC image can be easily obtained, edge image extraction usually requires a considerable computational burden. For fast edge image extraction, we suggest the use of only a few AC coefficients of each DCT block in motion-compensated P-frames, B-frames, and I-frames. This drastically reduces the computational burden, compared to edge extraction in the spatial domain. To further reduce the computational burden, another edge image extraction technique is also suggested, on the basis of AC prediction using DC images. By using the edge energy diagram obtained from edge images, and histograms from DC images, shot boundaries such as abrupt transitions, fades, and dissolves are detected automatically. Simulation results show that the proposed techniques are fast and effective.
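The reason a few AC coefficients suffice is that the lowest-frequency DCT terms of a block approximate its intensity gradient. A sketch of a block-level edge map from just the (0,1) and (1,0) coefficients (the threshold and the toy coefficient arrays are assumed values):

```python
import numpy as np

def block_edge_map(ac01, ac10, thresh=50.0):
    # ac01 / ac10: per-block DCT coefficients (0,1) and (1,0) — the lowest
    # horizontal- and vertical-frequency terms. Their magnitudes approximate the
    # block's gradient, so a coarse edge map needs no full decode.
    magnitude = np.hypot(ac01, ac10)
    orientation = np.arctan2(ac10, ac01)
    return magnitude > thresh, orientation

ac01 = np.array([[0.0, 120.0], [5.0, 0.0]])   # one block with strong horizontal variation
ac10 = np.zeros((2, 2))
edges, _ = block_edge_map(ac01, ac10)
print(edges)  # only the high-AC block is marked as an edge block
```

The resulting map has one entry per DCT block rather than per pixel, which is exactly the coarse-but-cheap edge information the edge energy diagram is built from.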
VideoBase: a prototype of a video database managing system
Xuesheng Bai, Guang-you Xu, Yuanchun Shi
Content-based retrieval on large multimedia databases attracts the interest of many researchers. However, the database architecture needed for content-based retrieval is still a problem. Traditional relational database systems do not support high-dimensional feature-form content description and indexing, and are thus limited in their content-based retrieval functions. Some systems do support high-dimensional feature-form content description and indexing, but lack descriptions of, and query expressions on, media object content and relations. In this paper, we present our study of the query mechanism and propose CbExpr, a powerful, flexible query expression mechanism for media objects. Based on CbExpr, we propose GMA (general mediabase architecture), a general architecture for the management of, and content-based retrieval on, large media databases. We also present VideoBase, a content-based video retrieval system, as an example of GMA. Basic ideas, considerations, and definitions are presented in this paper, along with some implementation details.
Morphological approach to scene change detection and digital video storage and retrieval
Woonkyung Michael Kim, Samuel Moon-Ho Song, Hyeokman Kim, et al.
By abstracting digital video as the corresponding binary video — a process that, upon subjective experimentation, seems to preserve the intelligibility of video content — we can pursue a precise and analytic approach to digital video storage and retrieval algorithm design based upon geometrical and morphological intuition. The foremost tangible benefit of such abstraction is the immediate reduction of both the data and computational complexities involved in implementing various algorithms and databases. The general paradigm presented may be utilized to address all issues pertaining to video library construction, including visualization, optimum feedback query generation, and object recognition. The primary focus of this paper, however, is the detection of fast and gradual scene changes, such as dissolves and fades, and various special effects, such as wipes. Upon simulation, we observed that we can achieve performance comparable to that of other methods, with drastic reductions in both storage and computational complexities. The conversion from grayscale to binary video can be performed directly (with minimal additional computation) in the compressed domain, by thresholding the DCT DC coefficients themselves, or by using the contour information attached to the MPEG-4 format. The algorithms presented herein are ideally suited for performing fast (on-the-fly) determinations of scene changes, object recognition, and/or tracking, as well as other, more intelligent tasks that traditionally place heavy demands on computational and/or storage complexity. The fast determinations may then be used on their own merit, or in conjunction with other higher-layer information in the future.
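The binary abstraction makes frame comparison nearly free: once frames are thresholded, a scene change shows up as a spike in the fraction of flipped pixels. A minimal sketch under assumed parameters (mean thresholding, toy DC-image frames), not the paper's full morphological pipeline:

```python
import numpy as np

def binarize(dc_image, thresh=None):
    # abstract a frame as a binary image by thresholding its DCT DC coefficients;
    # the frame mean is used as a default threshold here (an assumed choice)
    t = dc_image.mean() if thresh is None else thresh
    return dc_image > t

def frame_change(prev_bin, cur_bin):
    # fraction of pixels that flip between consecutive binary frames;
    # a spike in this value marks a candidate scene change
    return np.logical_xor(prev_bin, cur_bin).mean()

a = np.zeros((8, 8)); a[:, :4] = 200.0   # frame 1: bright left half
b = np.zeros((8, 8)); b[:4, :] = 200.0   # frame 2: bright top half (new scene)
print(frame_change(binarize(a), binarize(b)))  # 0.5: half the binary pixels flip
```

Each comparison is a single XOR over a small binary array, which is the storage and computation reduction the abstract emphasizes; gradual transitions such as fades would appear as a ramp rather than a spike in this change measure.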
Emerging Platforms and Applications
Video and image databases: who cares?
In this paper, I will not discuss the research frontiers of image and video databases, but rather who will be the users of these systems. Questions that have not been adequately addressed by the research community are: who are the users, and what do they really want these systems to do? The purpose of this paper is to be controversial and to engage a debate within the research community as to where the real applications of our work lie. It should be noted that the author does not agree with every point made in this paper.