Proceedings Volume 7540

Imaging and Printing in a Web 2.0 World; and Multimedia Content Access: Algorithms and Systems IV

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 8 February 2010
Contents: 11 Sessions, 37 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2010
Volume Number: 7540

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Web Content Creation and Analysis
  • Web Printing and Publishing I
  • Web Printing and Publishing II
  • Web Design and Content Representation
  • Online Photo Services I
  • Online Photo Services II
  • Keynote Session
  • Semantic/Multimodal Retrieval
  • Image Representation, Retrieval, and Techniques I
  • Image Representation, Retrieval, and Techniques II
  • Video Retrieval and Techniques
Web Content Creation and Analysis
Contextual advertisement placement in printed media
Sam Liu, Parag Joshi
Advertisements today provide the revenue model that supports the WWW ecosystem. Targeted, or contextual, ad insertion plays an important role in optimizing the financial return of this model. Nearly all current ads that appear on web sites are geared toward display purposes such as banners and "pay-per-click". Little attention, however, has been paid to deriving additional ad revenue when content is repurposed for an alternative means of presentation, e.g., print. Although more and more content is moving to the Web, there are still many occasions where printed output of web content is desirable, such as maps and articles; printed ad insertion can therefore be lucrative. In this paper, we describe a contextual ad insertion network aimed at realizing new revenue for print service providers in web printing. We introduce a cloud print service that inserts contextual ads, matched to the main web page content, when a printout of the page is requested. To encourage utilization, the service provides higher-quality printouts than current browser print drivers, which generally produce poor output, e.g., ill-formatted pages. At this juncture we limit the scope to article-related web pages, although the concept can be extended to arbitrary web pages. The key components of this system are (1) extraction of the article from the web page, (2) extraction of semantics from the article, (3) querying the ad database for matching advertisements or coupons, and (4) joint content and ad layout for printed output.
Content-based image retrieval with ontological ranking
Images are a much more powerful medium of expression than text; as the adage says, "One picture is worth a thousand words." Compared with text, which consists of an array of words, an image has more degrees of freedom and therefore a more complicated structure. However, the less constrained structure of images presents researchers in the computer vision community with the tough task of teaching machines to understand and organize images, especially when a limited number of learning examples and little background knowledge are given. The advance of internet and web technology in the past decade has changed the way humans gain knowledge. People can exchange knowledge with others by discussing and contributing information on the web. As a result, web pages have become a living and growing source of information. One is therefore tempted to wonder whether machines can learn from this web knowledge base as well. Indeed, it is possible to make computers learn from the internet and provide humans with more meaningful knowledge. In this work, we explore this possibility for image understanding applied to semantic image search. We exploit web resources to obtain links from images to keywords, and a semantic ontology encoding general human knowledge. The former maps visual content to related text, in contrast to the traditional way of associating images with surrounding text; the latter provides relations between concepts so that machines can understand to what extent, and in what sense, an image is close to the image search query. With the aid of these two tools, the resulting image search system is content-based and, moreover, organized. The returned images are ranked and organized such that semantically similar images are grouped together and ranked by semantic closeness to the input query.
The novelty of the system is twofold: first, images are retrieved not only based on text cues but their actual contents as well; second, the grouping is different from pure visual similarity clustering. More specifically, the inferred concepts of each image in the group are examined in the context of a huge concept ontology to determine their true relations with what people have in mind when doing image search.
A case study on rule-based and CRF-based author extraction methods
Shengwen Yang, Yuhong Xiong
Information extraction (IE) is the task of automatically extracting structured information from unstructured documents. A typical application of IE is to process a set of documents written in a natural language and populate a database with the information extracted. This paper presents a case study on author extraction from unstructured documents. A rule-based method and a CRF-based (Conditional Random Field) method are implemented for this task. The rule-based method involves defining a set of heuristic rules and leveraging prior knowledge of author names and affiliations to identify metadata. The CRF-based method involves preparing a labeled training dataset, defining a set of feature functions, learning a CRF model, and applying the model to label new documents. We evaluate and compare the performance of the two methods through experiments, and give some useful hints for application developers on the choice of heuristic and formal methods when addressing real-world information extraction problems.
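To illustrate the rule-based side of this pipeline, the following minimal sketch scans the header lines of a document for capitalized-name patterns. The regular expression, the stop list, and the comma heuristic are invented stand-ins for the paper's richer rules and prior-knowledge lists.

```python
import re

# Hypothetical rule-based author extractor: scan the first few lines of
# a document for capitalized-name patterns, using a small stop list as a
# stand-in for real prior knowledge about names and affiliations.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+(?:\s[A-Z]\.)?(?:\s[A-Z][a-z]+)+)\b")
STOP_WORDS = {"Abstract", "Introduction", "University", "Department"}

def extract_authors(text, header_lines=5):
    """Return candidate author names from the top of a document."""
    names = []
    for line in text.splitlines()[:header_lines]:
        if "," in line:          # affiliation lines usually carry commas
            continue
        for cand in NAME_PATTERN.findall(line):
            # Drop matches containing stop words (section titles etc.).
            if not any(w in STOP_WORDS for w in cand.split()):
                names.append(cand)
    return names

doc = """A Study of Widgets
John A. Smith
Mary Jones
Department of Computer Science, Example University

Abstract Text starts here..."""
print(extract_authors(doc))  # → ['John A. Smith', 'Mary Jones']
```

A CRF-based extractor would replace these hand-written rules with feature functions over token sequences and learn their weights from the labeled training data described in the abstract.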
New performance evaluation models for character detection in images
YanWei Wang, XiaoQing Ding, ChangSong Liu, et al.
Detection of character regions is meaningful research for both highlighting regions of interest and recognition for further information processing. Much research has been performed on character localization and extraction, creating a great need for performance evaluation schemes to assess detection algorithms. In this paper, two probability models are established to accomplish evaluation tasks for different applications. For highlighting regions of interest, a Gaussian probability model, which simulates the low-pass Gaussian filtering property of the human vision system (HVS), is constructed to allocate different weights to different character parts. It shows great potential to describe the performance of detectors, especially when the detected result is an incomplete character, where other methods cannot work effectively. For recognition purposes, we also introduce a weighted probability model that appropriately describes the contribution of detection results to final recognition results. The validity of the performance evaluation models proposed in this paper is demonstrated by experiments on web images and natural scene images. These models may also be applicable to evaluating algorithms that locate other objects, such as faces, though wider experiments are needed to examine this assumption.
Web Printing and Publishing I
Xerox trails: a new web-based publishing technology
Venkatesh G. Rao, David Vandervort, Jesse Silverstein
Xerox Trails is a new digital publishing model developed at the Xerox Research Center, Webster. The primary purpose of the technology is to allow Web users and publishers to collect, organize and present information in the form of a useful annotated narrative (possibly non-sequential) with editorial content and metadata that can be consumed both online and offline. The core concept is a trail: a digital object that improves online content production, consumption and navigation user experiences. When appropriate, trails can also be easily sequenced and transformed into printable documents, thereby bridging the gap between online and offline content experiences. The model is partly inspired by Vannevar Bush's influential idea of the "Memex" [1], which has inspired several generations of Web technology [2]. Xerox Trails is a realization of selected elements from the idea of the Memex, along with several original design ideas. It is based on a primitive data construct, the trail, which supports the architecture of a Web 2.0 product suite called Trailmeme that includes a destination Web site, plugins for major content management systems, and a browser toolbar.
WikiPrints: rendering enterprise Wiki content for printing
Wikis have become a tool of choice for collaborative, informative communication. In contrast to the immense Wikipedia, which serves as a reference web site and typically covers only one topic per web page, enterprise wikis are often used as project management tools and contain several closely related pages authored by members of one project. In that scenario it is useful to print closely related content for review or teaching purposes. In this paper we propose a novel technique for rendering enterprise wiki content for printing, called WikiPrints, which creates a linearized version of wiki content formatted as a mixture between web layout and conventional document layout suitable for printing. Compared to existing print options for wiki content, WikiPrints automatically selects content from different wiki pages given user preferences and usage scenarios. Metadata such as content authors or time of content editing are considered. A preview of the linearized content is shown to the user, and an interface for making manual formatting changes is provided.
Navigating web search results
Web searches for a specific topic can result in multiple document references for the topic, where information on the topic is redundantly presented across the document set. This can make it difficult for the user to locate a unique piece of information from the document set, or to comprehend the full scope of the information, without examining one document after another in the hope of discovering that new or interesting fact. Summarization techniques reduce the redundancy but often at the cost of information loss. Aggregation is difficult and may present information out of context. This paper presents a method for navigating the document set such that the facts or concepts and their redundant presentations are identified. The user can gain an overview of the concepts, and can locate where they are presented. The user can then view a desired concept as presented in the context of the document of choice. The approach also allows the user to move from concept to concept apart from the sequence of any one particular document. Navigation is accomplished via a graph structure in which redundant material is grouped into nodes. Sequential material unique to a document can also be clustered into a node for a more compact graph representation. Methods for identification of redundant content and for the construction of the navigation graph are discussed.
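The grouping of redundant material into graph nodes can be sketched as follows. Exact matching on normalized sentences is an assumed simplification of the paper's redundancy detection, and the document data are made up for illustration.

```python
from collections import defaultdict

# Toy sketch of the navigation-graph idea: sentences shared by several
# documents collapse into a single node; each document then becomes a
# path through the graph, and edges follow document order.
def build_navigation_graph(docs):
    node_of = {}                 # normalized sentence -> node id
    members = defaultdict(set)   # node id -> documents containing it
    edges = set()
    for doc_id, sentences in docs.items():
        path = []
        for s in sentences:
            key = " ".join(s.lower().split())
            node = node_of.setdefault(key, len(node_of))
            members[node].add(doc_id)
            path.append(node)
        edges.update(zip(path, path[1:]))
    return members, edges

docs = {
    "a": ["The comet returns in 2061.", "It was seen in 1986."],
    "b": ["The comet returns in 2061.", "Its tail spans millions of km."],
}
members, edges = build_navigation_graph(docs)
# Node 0 (the shared sentence) is redundant across both documents.
print(members[0])
```

A user could then view the shared fact once, and follow an outgoing edge into the context of whichever document they prefer.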
Cloud-based printing for mobile devices
Nina Bhatti, Eamonn O'Brien-Strain, Jerry Liu
Consumers are increasingly using their smart phones to view web pages. However, there is no native operating system support for printing these web pages. We propose to overcome two barriers to printing from mobile devices: the inability to connect and transmit to a printer, and the typically poor format of printed web pages. Our system includes a client component that causes the web browser to upload the page (as a URL reference for public pages, or the DOM content for private pages) to a cloud service that extracts the content and formats it for printing. We transfer the print-ready content to the HP CloudPrint service and leverage its ability to locate printers and transmit print jobs. We have built a working system that uses iPhone and Windows Mobile device clients, but the system can be extended to other clients.
Web Printing and Publishing II
DIY eBooks: collaborative publishing made easy
Steve Battle, Fabio Vitali, Angelo Di Iorio, et al.
Print is undergoing a revolution as significant as the invention of the printing press. The emergence of ePaper is a major disruption for the printing industry, defining a new medium with the potential to redefine publishing in a way that is as different from today's Web as the Web is from traditional print. In this new eBook ecosystem we don't just see users as consumers of eBooks, but as active prosumers able to collaboratively create, customize and publish their own eBooks. We describe a transclusive, collaborative publishing framework for the web.
Using ePub as framework for the automated collection, tagging, and transformation of web content for cross-media publication
Tona Henderson, Steven Battle, Matt Bernius, et al.
In this paper, we describe the development of Page2Pub, a Firefox extension for gathering, unifying, and publishing a wide variety of web based materials.
MagCloud: magazine self-publishing for the long tail
Kok-Wei Koh, Ehud Chatow
In June of 2008, Hewlett-Packard Labs launched MagCloud, a print-on-demand web service for magazine self-publishing. MagCloud enables anyone to publish their own magazine by simply uploading a PDF file to the site. There are no setup fees, minimum print runs, storage requirements or waste due to unsold magazines. Magazines are only printed when an order is placed, and are shipped directly to the end customer. In the course of building this web service, a number of technological challenges were encountered. In this paper, we discuss these challenges and the methods used to overcome them. Perhaps the most important decision in enabling the successful launch of MagCloud was the choice to offer a single product. This simplified the PDF validation phase and streamlined the print fulfillment process, so that orders can be printed, folded and trimmed in batches rather than one by one. In a sense, MagCloud adopted the Ford Model T approach to manufacturing, where having just a single model with few or no options allows for efficiencies in the production line, enabling a lower product price and opening the market to a much larger customer base. This platform has resulted in a number of new niche publications: the long tail of publishing.
A web-based rapid assessment tool for production publishing solutions
Tong Sun
Solution assessment is a critical first step in understanding and measuring the business process efficiency enabled by an integrated solution package. However, assessing the effectiveness of any solution is usually an expensive and time-consuming task that involves substantial domain knowledge, collecting and understanding the specific customer operational context, defining validation scenarios, and estimating the expected performance and operational cost. This paper presents an intelligent web-based tool that can rapidly assess any given solution package for production publishing workflows via a simulation engine, and create a report of estimated performance metrics (e.g., throughput, turnaround time, resource utilization) and operational cost. By integrating a digital publishing workflow ontology and an activity-based costing model with a Petri-net based workflow simulation engine, this web-based tool allows users to quickly evaluate potential digital publishing solutions side-by-side within their desired operational contexts, and provides a low-cost, rapid assessment for organizations before committing to any purchase. The tool also benefits solution providers by shortening sales cycles, establishing trustworthy customer relationships, and supplementing professional assessment services with a proven quantitative simulation and estimation technology.
Web Design and Content Representation
An investigation of document aesthetics for web-to-print repurposing of small-medium business marketing collateral
J. P. Allebach, Maria Ortiz Segovia, C. Brian Atkins, et al.
Businesses have traditionally relied on different types of media to communicate with existing and potential customers. With the emergence of the Web, the relation between the use of print and electronic media has continually evolved. In this paper, we investigate one possible scenario that combines the use of the Web and print. Specifically, we consider the scenario where a small- or medium-sized business (SMB) has an existing web site from which they wish to pull content to create a print piece. Our assumption is that the web site was developed by a professional designer, working in conjunction with the business owner or marketing team, and that it contains a rich assembly of content presented in an aesthetically pleasing manner. Our goal is to understand the process that a designer would follow to create an effective and aesthetically pleasing print piece. We are particularly interested in understanding the choices made by the designer with respect to placement and size of the text and graphic elements on the page. Toward this end, we conducted an experiment in which professional designers worked with SMBs to create print pieces from their respective web pages. In this paper, we report our findings from this experiment, and examine the underlying conclusions regarding the resulting document aesthetics in the context of the existing design, engineering, and computer science literature that addresses this topic.
Learning from graphic designers: using grids as a scaffolding for automatic print layout
Eamonn O'Brien-Strain, Jerry Liu
We describe an approach for automatically laying out content for high-quality printed formats such as magazines or brochures, producing an aesthetically pleasing layout that correctly conveys the semantic structure of the content and elicits the desired experiential affect in the reader. The semantic structure of the content includes the reading order graph, the association of illustrations with referring paragraphs, and the preservation of perceived text hierarchies. We appropriate a popular conceptual tool used by graphic designers called the grid. A well-designed grid produces a pleasing uniformity across all the pages of a publication while still allowing flexibility in the layout of each page. In the space of automatic layout systems, our approach sits between template-based techniques and generative techniques, with the aesthetics determined by the combination of the grid and a generative algorithm. One consequence of using the grid is that it greatly reduces the space of possible layouts from a high-dimensional continuous space to a discrete space. Using a simple greedy algorithm, our first results are promising.
Ubiquitous picture-rich content representation
Wiley Wang, Jennifer Dean, Russ Muzzolini
The number of digital images taken by the average consumer is consistently increasing. People enjoy the convenience of storing and sharing their pictures through online (digital) and offline (traditional) media. A set of pictures can be uploaded to online photo services, web blogs and social network websites. Alternatively, these images can be used to generate prints, cards, photo books or other photo products. Through uploading and sharing, images are easily transferred from one format to another, and often a different set of associated content (text, tags) is created across formats. For example, on his web blog, a user may journal the experiences of his recent travel; on his social network website, his friends tag and comment on the pictures; in his online photo album, some pictures are titled and keyword-tagged. When the user wants to tell a complete story, perhaps in a photo book, he must collect the pictures, writings, comments, etc. across all formats and organize them in a book format. The user has to arrange the content of his trip in each format; the arrangement, that is, the associations between the images, tags, keywords and text, cannot be shared with other formats. In this paper, we propose a system that allows content to be easily created and shared across various digital media formats. We define a unified data association structure to connect images, documents, comments, tags, keywords and other data. This content structure allows the user to switch representation formats without re-editing. The framework under each format can emphasize (display or hide) content elements based on preference. For example, a slide show view will emphasize the display of pictures with limited text; a blog view will display highlighted images and journal text; and the photo book will try to fit in all images and text content. We discuss the strategy to associate pictures with text content so that they naturally tell a story, and present sample solutions for different formats such as picture view, blog view and photo book view.
A novel XML-based document format with printing quality for web publishing
Ruiheng Qiu, Zhi Tang, Liangcai Gao, et al.
Although many XML-based document formats are available for printing or publishing on the Internet, none of them is well designed to support both high-quality printing and web publishing. We therefore propose a novel XML-based document format for web publishing, called CEBX, in this paper. The proposed format is a fixed-layout document format supporting high-quality printing, with document content organization, physical structure and a protection scheme optimized for web publishing. There are four noteworthy features of CEBX documents: (1) CEBX preserves the original fixed layout through graphic units, ensuring printing quality. (2) The content of a CEBX document can be reflowed to fit the display device, based on content blocks and additional fluid information. (3) The XML Document Archiving model (XDA), the packaging model used in CEBX, supports document linearization and incremental editing well. (4) By introducing a segment-based content protection scheme into CEBX, part of a document can be previewed directly while the remaining part is protected effectively, so that readers need only purchase the partial content of a book that interests them. This is very helpful for document distribution and supports flexible business models such as try-before-buy, on-demand reading, superdistribution, etc.
Smart Browser: a framework for bringing intelligence into the browser
Demiao Lin, Jianming Jin, Yuhong Xiong
Smart Browser is a framework that supports the easy integration of customized background services within a Web browser. The framework utilizes a set of extendable XML message schemas to communicate between the browser and the background services. Based on this framework, a set of background services has been integrated into the Firefox browser. These services bring the intelligence of crowds and machines, which is usually logically complicated, data-intensive and computationally complex, to each end user through the lightweight, daily-used browser. Applications built on this framework can improve users' experiences when surfing the Web.
Online Photo Services I
AutoPhotobook: using technology to streamline photobook creation
Xuemei Zhang, Yuli Gao, C. Brian Atkins, et al.
The design of a computer-assisted photobook authoring solution continues to be a challenging task, since consumers want four things from such an application: simplicity, quality, customizability and speed. Our AutoPhotobook solution uses technology to enable a system that preserves all four characteristics, providing high quality custom photobooks while keeping complexity and authoring time modest. We leverage both design knowledge and image understanding algorithms to automate time-consuming tasks like image selection, grouping, cropping and layout. This streamlines the initial creation phase, so the user is never stuck staring at a blank page wondering where to begin. Our composition engine then allows users to easily edit the book: adding, swapping or moving objects, exploring different page layouts and themes, and even dynamically adjusting the aspect ratio of the final book. Our technologies enable even novice users to easily create aesthetically pleasing photobooks that tell their underlying stories. AutoPhotobook provides advances over prior solutions in the following areas: automatic image selection and theme-based image grouping; dynamic page layout including text support; automatic cropping; design-preserving background artwork transformation; and a simple yet powerful user interface for personalization. In this paper, we present these technologies and illustrate how they work together to improve the photobook authoring process.
Faces from the web: automatic selection and composition of media for casual screen consumption and printed artwork
Phil Cheatle, Darryl Greig, David Slatter
Web image search engines facilitate the production of image sets in which faces appear. Many people enjoy producing and sharing media collections of this type and generating new images or video experiences. Skilled practitioners produce visually appealing artifacts from such collections, but few users have the time or creative ability to do so. The problem is to automatically create an image or ambient experience which sustains interest. A full solution requires agreements with copyright holders and input from graphic designers. We address the underlying technical problems of extraction and composition. We describe an automatic system that identifies regions containing human faces in each image of an image set resulting from a web search. The face regions are composed into dynamically synthesized multilayer graphical backgrounds. The aesthetic aspects of the composition are controlled by active templates. These aspects include face size and positioning, but also face identity and the number of faces in a group. The output structure is multilayer, supporting both the generation of static images and video consisting of transitions between the compositions.
Semi-automatic image personalization tool for variable text insertion and replacement
Image personalization is a widely used technique in personalized marketing, in which a vendor attempts to promote new products or retain customers by sending marketing collateral that is tailored to the customers' demographics, needs, and interests. With current solutions of which we are aware, such as XMPie, DirectSmile, and AlphaPicture, the image templates needed to produce this tailored marketing collateral must be created manually by graphic designers, involving complex grid manipulation and detailed geometric adjustments. Image template design is thus highly manual, skill-demanding and costly, and is essentially the bottleneck for image personalization. We present a semi-automatic image personalization tool for designing image templates. Two scenarios are considered: text insertion and text replacement, with the text replacement option not offered in current solutions. The graphical user interface (GUI) of the tool is described in detail. Unlike current solutions, the tool renders the text in 3-D, which allows easy adjustment of the text. The tool has been implemented in Java, which allows flexible deployment and eliminates the need for any special software or know-how on the part of the end user.
Automatic image cropping for republishing
Phil Cheatle
Image cropping is an important aspect of creating aesthetically pleasing web pages and repurposing content for different web or printed output layouts. Cropping provides both the possibility of improving the composition of the image, and the ability to change the aspect ratio of the image to suit the layout design needs of different document or web page formats. This paper presents a method for aesthetically cropping images on the basis of their content. Underlying the approach is a novel segmentation-based saliency method which identifies some regions as "distractions", as an alternative to the conventional "foreground" and "background" classifications. Distractions are a particular problem with typical consumer photos found on social networking websites such as Facebook, Flickr, etc. Automatic cropping is achieved by identifying the main subject area of the image and then using an optimization search to expand this to form an aesthetically pleasing crop. Evaluation of aesthetic functions like auto-crop is difficult as there is no single correct solution. A further contribution of this paper is an automated evaluation method which goes some way towards handling the complexity of aesthetic assessment. This allows crop algorithms to be easily evaluated against a large test set.
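The final crop-construction step can be sketched roughly as below. This toy version only expands a detected subject rectangle to a requested aspect ratio and clamps it to the image borders, omitting the saliency- and aesthetics-driven optimization search that the paper actually uses.

```python
# Illustrative sketch: grow a subject box (x0, y0, x1, y1) into the
# smallest crop of a requested aspect ratio that still contains it.
def expand_to_aspect(subject, img_w, img_h, target_ratio):
    x0, y0, x1, y1 = subject
    w, h = x1 - x0, y1 - y0
    if w / h < target_ratio:          # too narrow: widen
        w = h * target_ratio
    else:                             # too short: heighten
        h = w / target_ratio
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Centre the crop on the subject, then clamp to the image borders.
    nx0 = min(max(cx - w / 2, 0), img_w - w)
    ny0 = min(max(cy - h / 2, 0), img_h - h)
    return (nx0, ny0, nx0 + w, ny0 + h)

crop = expand_to_aspect((100, 100, 300, 200), 800, 600, 1.0)
print(crop)  # → (100.0, 50.0, 300.0, 250.0), a square crop around the subject
```

An aesthetic auto-crop would score many such candidate rectangles and keep the best-scoring one rather than the minimal one.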
Online Photo Services II
Assessing photographer competence using face statistics
Darryl Greig, Yuli Gao
The rapid growth of photo sharing websites has resulted in some new problems around the management of a large (and quickly increasing) number of photographers with different needs and usage characteristics. Despite significant advances in the field of computer vision, little has been done to leverage these technologies for photographer understanding and management, partly due to the high computational cost of extracting application-specific image features. Recently, robust multi-view face detection technologies have been widely adopted by many photo sharing sites. This affords a limited but "standard" pre-computed set of face features with which to tackle these administrative problems in large-scale settings. In this paper we present a principled statistical model to alleviate one such administrative task: the automatic analysis of photographer competence given only face detection results on a set of their photos. The model uses summary statistics to estimate the probability that a given individual belongs to a population of high-competence photographers rather than a second population of lower-competence photographers. Using this model, we achieved high classification accuracy (84.3% and 90.9%, respectively) on two large image datasets. We discuss an application of this approach to assist in managing a photo-sharing website.
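A minimal sketch of the two-population idea follows, assuming Gaussian models over a single face-derived summary statistic. Every feature and parameter below is invented for illustration; the paper's actual statistics and fitted populations are richer.

```python
import math

# Hedged sketch: model a per-photo summary statistic (here, an invented
# "mean face-area fraction") as Gaussian within each population and
# classify a photographer by the summed log-likelihood ratio.
def gauss_loglik(x, mean, std):
    return -math.log(std * math.sqrt(2 * math.pi)) - ((x - mean) ** 2) / (2 * std ** 2)

def is_high_competence(stats, hi=(0.08, 0.03), lo=(0.20, 0.10)):
    """stats: face-area fractions per photo; True if the (assumed)
    high-competence population fits the data better."""
    llr = sum(gauss_loglik(x, *hi) - gauss_loglik(x, *lo) for x in stats)
    return llr > 0

print(is_high_competence([0.07, 0.10, 0.06]))  # → True
```

The same likelihood-ratio structure extends naturally to several summary statistics at once by summing their per-feature log ratios.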
Automatic eye enhancement by sclera whitening
Changhyung Lee, Morgan T. Schramm, Mireille Boutin, et al.
We propose a method for automatically enhancing the eyes of all the faces in a digital image by whitening their scleras (i.e., the white part of their eyes). The scleras are identified by combining existing face detection and feature alignment technology with a color-based sclera probability map. We then smooth, brighten, and desaturate the scleras. This reduces the appearance of blood vessels and produces a healthier, more "refreshed" look.
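The smooth/brighten/desaturate step might look like the following per-pixel sketch. The coefficients and the blending by a per-pixel sclera probability are illustrative assumptions, and the probability map itself (from the face and feature detection stage) is taken as given.

```python
# Toy sketch of the enhancement step: brighten and desaturate an RGB
# pixel flagged as sclera, blended by a per-pixel probability p.
def whiten_pixel(rgb, p, brighten=1.15, desaturate=0.6):
    r, g, b = rgb
    gray = 0.299 * r + 0.587 * g + 0.114 * b   # luma of the pixel
    out = []
    for c in (r, g, b):
        # Pull the channel toward gray (desaturate), then brighten.
        enhanced = min(255, (gray + desaturate * (c - gray)) * brighten)
        # Blend original and enhanced values by the sclera probability.
        out.append(round((1 - p) * c + p * enhanced))
    return tuple(out)

print(whiten_pixel((200, 170, 150), p=1.0))  # brighter, less saturated
print(whiten_pixel((200, 170, 150), p=0.0))  # unchanged outside the sclera
```

Smoothing (e.g., a small blur over the sclera region) would be applied spatially across neighboring pixels and is not modeled in this single-pixel sketch.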
Automatic digital photo-book making system
Wiley Wang, Patrick Teo, Russ Muzzolini
The diversity of photo products has grown more than ever before. A group of photos is not only printed individually, but can also be arranged in a specific order to tell a story, such as in a photo book, a calendar or a poster collage. As with a traditional scrapbook, digital photo book tools allow the user to choose a book style/theme, page layouts, backgrounds and the way the pictures are arranged. This process is often time-consuming for users, given the number of images and the number of layout/background combinations. In this paper, we develop a system that automatically generates photo books with only a few initial selections required. The system utilizes time stamps, color indices, orientations and other image properties to best fit pictures into a final photo book. The common way of telling a story is to lay the pictures out in chronological order. Pictures that are proximate in time are often logically related, and they naturally cluster along a time line. Breaks between clusters can be used as a guide to separate pages or spreads, so that logically related pictures stay together on the same page or spread. When people make a photo book, it helps to start with chronologically grouped images, but time alone is not enough to complete the process. Each page is limited by the number of available layouts. Many aesthetic rules also apply, such as emphasis of preferred pictures, consistency of local image density throughout the book, matching a background to the content of the images, and variety among adjacent page layouts. We developed an algorithm to group images onto pages under the constraints of these aesthetic rules. We also apply content analysis, based on the color and blurriness of each picture, to match backgrounds and to adjust page layouts. Some of our aesthetic rules are fixed and given by designers. Other aesthetic rules are statistical models trained on customer photo book samples. We evaluate our algorithm with test photo sets, and ask participants both quantitative and qualitative questions for feedback. We have observed improvements in the time it takes users to produce a photo book and in satisfaction with the overall quality.
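The chronological grouping described above can be illustrated with a minimal sketch. The one-hour gap threshold and the function name are our own illustrative choices, not values from the paper:

```python
def cluster_by_time(timestamps, gap_threshold=3600):
    """Group sorted capture times (in seconds) into clusters; a gap
    larger than gap_threshold starts a new page/spread candidate."""
    clusters = []
    current = [timestamps[0]]
    for prev, t in zip(timestamps, timestamps[1:]):
        if t - prev > gap_threshold:
            clusters.append(current)  # close the current time cluster
            current = [t]
        else:
            current.append(t)
    clusters.append(current)
    return clusters
```

Each resulting cluster is then a candidate page or spread, to be refined by the layout and aesthetic constraints.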
The impact of geo-tagging on the photo industry and creating revenue streams
Rolf Richter, Henning Böge, Christoph Weckmann, et al.
Internet geo and mapping services such as Google Maps, Google Earth and Microsoft Bing Maps have reinvented the use of geographical information and have reached enormous popularity. In addition, location technologies such as GPS have become affordable and are now being integrated into many camera phones. GPS is also available for standalone cameras, either as an add-on product or integrated into the camera. These developments enable new products for the photo industry and enhance existing ones. New commercial opportunities have been identified in the areas of photo hardware, internet/software and photo finishing.
Keynote Session
Cooperative classification of shared images
We propose a method for the semi-automatic organization of photo albums. The method analyzes how different users organize their own pictures. The goal is to help the user divide his pictures into groups characterized by similar semantic content. The method is semi-automatic: the user starts assigning labels to the pictures, and unlabeled pictures are tagged with proposed labels. The user can accept the recommendation or make a correction. We use a suitable feature representation of the images to model the different classes that the users have collected. We then look for correspondences between the criteria used by the different users, which are integrated using boosting. A quantitative evaluation of the proposed approach is obtained by simulating the amount of user interaction needed to annotate the albums of a set of members of the Flickr® photo-sharing community.
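As a simplified illustration of label recommendation (not the boosting-based integration the paper actually uses), one could propose for an unlabeled picture the label of its nearest labeled neighbor in feature space; the user then accepts or corrects the proposal:

```python
def propose_label(labeled, query):
    """labeled: list of (feature_vector, label) pairs for pictures the
    user has already tagged; query: feature vector of an untagged one.
    Returns the label of the nearest labeled picture (squared Euclidean
    distance) as the recommendation."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labeled, key=lambda fl: dist(fl[0], query))
    return label
```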
Semantic/Multimodal Retrieval
Semantic retrieval and automatic annotation: linear transformations, correlation, and semantic spaces
Jonathon S. Hare, Paul H. Lewis
This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly mapping an image feature space to a keyword space. The new technique is compared to several related techniques, and a number of salient points about each are discussed and contrasted. The paper also discusses how these techniques might scale to a real-world retrieval problem, and demonstrates this through a case study of a semantic retrieval technique used on a real-world dataset (with a mix of annotated and unannotated images) from a picture library.
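The core idea of a linear map from an image feature space to a keyword space can be sketched as an ordinary least-squares problem. The matrix shapes and synthetic data below are illustrative, not taken from the paper:

```python
import numpy as np

# X: n x d matrix of image features; Y: n x k matrix of keyword scores.
# Learn the linear map W minimizing ||X W - Y||_F, then score keywords
# for a new image x as x @ W.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))        # 50 images, 8-dim features
W_true = rng.normal(size=(8, 3))    # hidden feature-to-keyword map
Y = X @ W_true                      # 3 keyword scores per image
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
keyword_scores = X[0] @ W           # predicted scores for the first image
```

Annotation then amounts to keeping the highest-scoring keywords; retrieval ranks images by the score of the query keyword.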
Generic and optimized framework for multi-content analysis based on learning approaches
During the European Cantata project (ITEA project, 2006-2009), a Multi-Content Analysis framework for the classification of compound images in various categories (text, graphical user interface, medical images, other complex images) was developed within Barco. The framework consists of six parts: a dataset, a feature selection method, a machine learning based Multi-Content Analysis (MCA) algorithm, a Ground Truth, an evaluation module based on metrics and a presentation module. This methodology was built on a cascade of decision tree-based classifiers combined and trained with the AdaBoost meta-algorithm. In order to be able to train these classifiers on large training datasets without excessively increasing the training time, various optimizations were implemented. These optimizations were performed at two levels: the methodology itself (feature selection / elimination, dataset pre-computation) and the decision-tree training algorithm (binary threshold search, dataset presorting and alternate splitting algorithm). These optimizations have little or no negative impact on the classification performance of the resulting classifiers. As a result, the training time of the classifiers was significantly reduced, mainly because the optimized decision-tree training algorithm has a lower algorithmic complexity. The time saved through this optimized methodology was used to compare the results of a greater number of different training parameters.
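One concrete example of the kind of decision-tree training optimization mentioned above is presorting combined with an incremental threshold search for a single-feature stump. The code is our own illustration, not the Cantata implementation, and ignores tied feature values for brevity:

```python
def best_stump(values, labels, weights):
    """Train a stump 'predict +1 if value > threshold' on one feature.
    Presorting the samples lets every candidate threshold be evaluated
    incrementally, in O(n log n) overall instead of O(n^2)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # weighted error when the threshold is below every sample (all +1)
    err = sum(w for y, w in zip(labels, weights) if y != 1)
    best_err, best_thr = err, min(values) - 1.0
    for i in order:
        # moving sample i below the threshold flips its prediction to -1
        err += weights[i] if labels[i] == 1 else -weights[i]
        if err < best_err:
            best_err, best_thr = err, values[i]
    return best_thr, best_err
```

In an AdaBoost loop, the `weights` are the boosting weights, re-evaluated at every round, which is why a low-complexity threshold search pays off.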
Benchmark of multiple approaches for feature extraction and image similarity characterization
Chunlei Yang, Yuli Gao, Jianping Fan
The performance of image classification largely depends on both the discrimination power of the visual features used for image content representation and the effectiveness of the kernels used for image similarity characterization. Different types of kernels have been developed for SVM image classifier training, and different research teams use different types of visual features in their experiments. Thus there is a clear need for benchmark work assessing the real performance of different types of visual features and kernels on various image classification tasks. In this paper, we benchmark multiple approaches for feature extraction and image similarity characterization, so that useful guidelines can be provided for: (a) how to select more effective approaches for feature extraction and enhance the discrimination power of various types of visual features; and (b) how to combine multiple types of visual features and their kernels to enhance the discrimination power of SVM image classifiers. Our experiments on large-scale image collections have also yielded very positive results.
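In its simplest form, combining kernels for an SVM means taking a convex combination of precomputed Gram matrices. The example matrices and equal weights below are illustrative; in practice the weights would be tuned or learned:

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Convex combination of precomputed kernel (Gram) matrices. A
    non-negative weighted sum of PSD matrices is still PSD, so the
    result can be fed to an SVM with a precomputed kernel."""
    return sum(w * K for w, K in zip(weights, kernels))

K_color = np.array([[1.0, 0.2], [0.2, 1.0]])    # e.g. a kernel on color features
K_texture = np.array([[1.0, 0.6], [0.6, 1.0]])  # e.g. a kernel on texture features
K = combine_kernels([K_color, K_texture], [0.5, 0.5])
```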
Image Representation, Retrieval, and Techniques I
Three-domain image representation for personal photo album management
E. Ardizzone, M. La Cascia, M. Morana, et al.
In this paper we present a novel approach to personal photo album management. Pictures are analyzed and described in three representation spaces, namely faces, background and time of capture. Faces are automatically detected and rectified using a probabilistic feature extraction technique. A face representation is then produced by computing PCA (Principal Component Analysis). Backgrounds are represented with low-level visual features based on an RGB histogram and a Gabor filter bank. Temporal data is obtained by extracting EXIF (Exchangeable image file format) metadata. Each image in the collection is then automatically organized using a mean-shift clustering technique. While many systems manage faces and typically allow queries about them, we use a common approach to manage multiple aspects; that is, queries regarding people, time and background are dealt with in a homogeneous way. We report experimental results on a realistic set (a personal photo album of about 2000 images) in which automatic detection and rectification of faces yield approximately 800 faces. The significance of the clustering has been evaluated and the results are very promising.
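A minimal flat-kernel mean-shift sketch in one dimension gives the flavor of the clustering step (the paper applies mean shift to the face, background and time representations; the bandwidth and convergence tolerance here are illustrative):

```python
def mean_shift(points, bandwidth=1.0, iters=50):
    """Flat-kernel mean shift: each point repeatedly moves to the mean
    of the input points within `bandwidth`; points converging to the
    same mode receive the same cluster label."""
    modes = list(points)
    for _ in range(iters):
        new_modes = []
        for m in modes:
            neigh = [q for q in points if abs(q - m) <= bandwidth] or [m]
            new_modes.append(sum(neigh) / len(neigh))
        modes = new_modes
    # group points whose modes converged to (nearly) the same value
    labels, centers = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if abs(c - m) < 1e-3:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

A key property for this application is that the number of clusters is not fixed in advance but emerges from the bandwidth.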
Harvesting weakly tagged images for computer vision tasks
Yi Shen, Chunlei Yang, Yuli Gao, et al.
To crawl large amounts of weakly tagged images for computer vision tasks such as object detection and scene recognition, it is very important to develop new techniques for tag cleansing and word sense disambiguation (i.e., removing irrelevant images from the crawled results). Based on this observation, a topic network is first generated to characterize both the semantic similarity contexts and the visual similarity contexts between image topics; the topic network represents the classes of objects and scenes of interest. Second, both the visual similarity contexts between the images and the semantic similarity contexts between their tags are integrated for tag cleansing and word sense disambiguation. By addressing the issues of polysemy and synonymy more effectively, our word sense disambiguation algorithm can determine the relevance between images and their associated tags more precisely, and thus allows us to crawl large-scale weakly tagged image collections for computer vision tasks.
Image Representation, Retrieval, and Techniques II
Image retrieval for identifying house plants
Hanife Kebapci, Berrin Yanikoglu, Gozde Unal
We present a content-based image retrieval system for plant identification, intended to provide users with a simple way to locate information about their house plants. A plant image consists of a collection of overlapping leaves and possibly flowers, which makes the problem challenging. We study the suitability of various well-known color, texture and shape features for this problem, and also introduce some new ones. The features are extracted from the overall plant region, which is segmented from the background using the max-flow min-cut technique. Results on a database of 132 different plant images are promising: in about 72% of the queries, the correct plant image is retrieved among the top 15 results.
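One of the simple color features such a study typically evaluates, an RGB histogram, might be computed along these lines (the bin count and bucket layout are our illustrative choices):

```python
def rgb_histogram(pixels, bins=4):
    """Quantize each (r, g, b) pixel with channels in 0..255 into
    bins**3 buckets and return a normalized histogram, usable as a
    simple colour feature for retrieval."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + b // step
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist]
```

Histograms of a query image and of database images can then be compared with any standard distance (L1, chi-square, histogram intersection) to rank the top-15 candidates.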
Comparative study of content-based image retrieval and video fingerprinting
Content-based image retrieval (CBIR) has been studied for nearly two decades, since IBM's research on QBIC (Query by Image Content) [1]. In the past decade, another related but distinct area, video fingerprinting, has attracted more and more attention. Numerous papers have been published in both areas, and researchers from the two areas sporadically cite results from each other. However, as far as we know, there has been no comprehensive comparison between the two. This paper attempts to fill this gap by explicitly discussing the wide array of differences and connections between the two areas. We believe that such a comparative study can help researchers migrate, or cross-pollinate ideas, between the two areas.
Video Retrieval and Techniques
Robust video and audio-based synchronization of multimedia files
Benjamin A. Raichel, Peter Bajcsy
This paper addresses the problem of robust and automated synchronization of multiple audio and video signals. The input signals come from a set of independent multimedia recordings made with several camcorders and microphones. While the camcorders are static, the microphones are mobile, as they are attached to people. The motivation for synchronizing all signals is to support studies of human interaction in decision support environments, which have so far been limited by the difficulty of automatically processing observations made during decision-making sessions. The data sets for this work were acquired during training exercises of response teams, rescue workers and fire fighters at multiple locations. The developed synchronization methodology for a set of independent multimedia recordings is based on introducing aural and visual landmarks with a bell and room light switches. Our approach detects the landmarks in the audio and video signals of each camcorder and microphone, and then fuses the results to increase the robustness and accuracy of the synchronization. We report results that demonstrate the accuracy of synchronization based on video and audio.
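Once landmark times (bell rings, light switches) have been detected in each recording, the clock offset between two recordings can be estimated robustly, for example as the median of pairwise time shifts. This is an illustrative sketch, not the authors' exact fusion scheme:

```python
def estimate_offset(events_a, events_b):
    """events_a, events_b: landmark times (seconds) detected in
    recordings A and B, matched in order. The median pairwise shift is
    robust to a few wrong or mismatched detections."""
    shifts = sorted(b - a for a, b in zip(events_a, events_b))
    mid = len(shifts) // 2
    if len(shifts) % 2:
        return shifts[mid]
    return (shifts[mid - 1] + shifts[mid]) / 2
```

With the offset known, all signals can be shifted onto a common timeline; fusing the audio-based and video-based estimates per device further increases accuracy.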
Query-based video event definition using rough set theory and video prototypes
Kimiaki Shirahama, Chieri Sugihara, Kuniaki Uehara
Since a user may want to retrieve a great variety of events, it is impractical to index a video archive with predefined events. Thus, "query-based event definition" is essential for dynamically defining events from example videos provided by the user. In particular, we address how to accurately cover the large variation of low-level features within an event. Due to arbitrary camera techniques and object movements, shots of the same event contain significantly different low-level features; that is, these shots are distributed across different subsets of the low-level feature space. We therefore use "rough set theory" to extract each subset in which example shots can be correctly classified by a simple combination of low-level features. Based on such subsets, we can retrieve various shots of the same event. However, retrieval aimed only at wide coverage is not very accurate, and many irrelevant shots are ranked at top positions. Thus, we re-rank the retrieved shots by finely matching them with the example shots. Since the original representation of a low-level feature is very high-dimensional, we use "video prototypes" that mask the dimensions irrelevant to this matching. Experimental results on the TRECVID 2008 video archive show the promise of our two-step method.
Composition of SIFT features for robust image representation
Ignazio Infantino, Giovanni Spoto, Filippo Vella, et al.
In this paper we propose a novel feature based on the SIFT (Scale Invariant Feature Transform) algorithm [1] for the robust representation of local visual content. SIFT features have raised much interest for their descriptive power: they characterize point-wise information robustly against variations in luminance and changes of viewpoint, and they are very useful for capturing local information. Hundreds of keypoints are found in a single image, and they are particularly suitable for tasks such as image registration or image matching. In this work we stretch the spatial coverage of the descriptors, creating a novel feature as a composition of the keypoints present in an image region while maintaining the invariance properties of SIFT descriptors. The number of descriptors is reduced, limiting the computational cost, and at the same time a more abstract descriptor is obtained. The new feature is therefore suited to describing objects and characteristic image regions. We tested the retrieval performance on a dataset used to evaluate PCA-SIFT [2], as well as the image matching capability among images processed with affine transformations. Experimental results are reported.
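The idea of composing a region's keypoint descriptors into one regional descriptor can be sketched, in its simplest mean-pooling form (the paper's actual composition scheme may differ), as:

```python
def compose_descriptors(keypoint_descriptors):
    """Aggregate the descriptors of all keypoints in a region into a
    single descriptor by averaging and L2-renormalizing; one way local
    descriptors can be composed into a coarser regional feature."""
    dim = len(keypoint_descriptors[0])
    n = len(keypoint_descriptors)
    mean = [sum(d[i] for d in keypoint_descriptors) / n for i in range(dim)]
    norm = sum(x * x for x in mean) ** 0.5 or 1.0  # guard against zero vector
    return [x / norm for x in mean]
```

Averaging keeps the descriptor dimensionality of a single keypoint while shrinking hundreds of descriptors per image to one per region, which is the computational saving the abstract refers to.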
On accuracy, privacy, and complexity in the identification problem
F. Beekhof, S. Voloshynovskiy, O. Koval, et al.
This paper presents recent advances in the identification problem, taking into account the accuracy, complexity and privacy leak of different decoding algorithms. Using a model of different actors from the literature, we show that it is possible to use more accurate decoding algorithms exploiting reliability information without increasing the privacy leak relative to algorithms that use only binary information. Existing algorithms from the literature have been modified to take advantage of reliability information, and we show that a proposed branch-and-bound algorithm can outperform existing work, including the enhanced variants.
This paper presents recent advances in the identification problem taking into account the accuracy, complexity and privacy leak of different decoding algorithms. Using a model of different actors from literature, we show that it is possible to use more accurate decoding algorithms using reliability information without increasing the privacy leak relative to algorithms that only use binary information. Existing algorithms from literature have been modified to take advantage of reliability information, and we show that a proposed branch-and-bound algorithm can outperform existing work, including the enhanced variants.