- Web Content Creation and Analysis
- Web Printing and Publishing I
- Web Printing and Publishing II
- Web Design and Content Representation
- Online Photo Services I
- Online Photo Services II
- Keynote Session
- Semantic/Multimodal Retrieval
- Image Representation, Retrieval, and Techniques I
- Image Representation, Retrieval, and Techniques II
- Video Retrieval and Techniques
Web Content Creation and Analysis
Contextual advertisement placement in printed media
Advertisements today provide the necessary revenue model supporting the WWW ecosystem. Targeted or contextual ad
insertion plays an important role in optimizing the financial return of this model. Nearly all the current ads that appear on
web sites are geared for display purposes such as banner and "pay-per-click". Little attention, however, is focused on
deriving additional ad revenue when the content is repurposed for an alternative means of presentation, e.g., printing.
Although more and more content is moving to the Web, there are still many occasions where printed output of web
content is desirable, such as maps and articles; thus printed ad insertion can potentially be lucrative. In this paper, we
describe a contextual ad insertion network aimed to realize new revenue for print service providers for web printing. We
introduce a cloud print service that enables contextual ads insertion, with respect to the main web page content, when a
printout of the page is requested. To encourage service utilization, it would provide higher-quality printouts than are
possible with current browser print drivers, which generally produce poor output, e.g., ill-formatted pages. At this
juncture we will limit the scope to only article-related web pages although the concept can be extended to arbitrary web
pages. The key components of this system include (1) extraction of the article from web pages, (2) extraction of
semantics from the article, (3) querying the ad database for matching advertisements or coupons, and (4) joint content
and ad layout for printed output.
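The four key components above can be sketched as an end-to-end pipeline. The function names and the toy keyword-matching heuristic below are our own illustration, not the paper's implementation:

```python
import re

def extract_article(html):
    # Hypothetical article extraction: strip tags, keep the text.
    return re.sub(r"<[^>]+>", " ", html)

def extract_keywords(text, stopwords=frozenset({"the", "a", "for", "and"})):
    # Naive semantic extraction: the most frequent non-stopword terms.
    freq = {}
    for w in re.findall(r"[a-z]+", text.lower()):
        if w not in stopwords:
            freq[w] = freq.get(w, 0) + 1
    return sorted(freq, key=freq.get, reverse=True)[:5]

def match_ads(keywords, ad_db):
    # Return ads that share at least one tag with the article keywords.
    return [ad for ad, tags in ad_db.items() if set(tags) & set(keywords)]

def layout(article, ads):
    # Joint content/ad layout, caricatured as article body then matched ads.
    return article.strip() + "\n---\n" + "\n".join(ads)
```

A real system would replace each stage with the paper's components (DOM-based article extraction, semantic analysis, an ad database query, and a print layout engine), but the data flow is the same.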
Content-based image retrieval with ontological ranking
Images are a much more powerful medium of expression than text; as the adage says, "One picture is worth a
thousand words." Compared with text, which consists of an array of words, an image has more degrees of freedom
and therefore a more complicated structure. However, this less constrained structure presents researchers in the
computer vision community with the tough task of teaching machines to understand and organize images, especially
when only a limited number of learning examples and little background knowledge are given.
The advance of Internet and Web technology in the past decade has changed the way humans gain knowledge.
People can now exchange knowledge with others by discussing and contributing information on the web. As
a result, web pages have become a living and growing source of information. One is therefore tempted to
wonder whether machines can learn from this web knowledge base as well. Indeed, it is possible to make
computers learn from the Internet and provide humans with more meaningful knowledge.
In this work, we explore this novel possibility on image understanding applied to semantic image search. We
exploit web resources to obtain links from images to keywords, together with a semantic ontology constituting
general human knowledge. The former maps visual content to related text, in contrast to the traditional way of associating
images with surrounding text; the latter provides relations between concepts for machines to understand to what
extent and in what sense an image is close to the image search query.
With the aid of these two tools, the resulting image search system is thus content-based and, moreover,
organized. The returned images are ranked and organized such that semantically similar images are grouped
together and given a rank based on the semantic closeness to the input query. The novelty of the system is
twofold: first, images are retrieved not only based on text cues but their actual contents as well; second, the
grouping is different from pure visual similarity clustering. More specifically, the inferred concepts of each image
in the group are examined in the context of a huge concept ontology to determine their true relations with what
people have in mind when doing image search.
A case study on rule-based and CRF-based author extraction methods
Shengwen Yang,
Yuhong Xiong
Information extraction (IE) is the task of automatically extracting structured information from unstructured documents.
A typical application of IE is to process a set of documents written in a natural language and populate a database with
the information extracted. This paper presents a case study on author extraction from unstructured documents. A
rule-based method and a CRF-based (Conditional Random Field) method are implemented for this task. The rule-based
method involves defining a set of heuristic rules and leveraging prior knowledge on author names and affiliations to
identify metadata. The CRF-based method involves preparing a labeled training dataset, defining a set of feature
functions, learning a CRF model, and applying the model to label new documents. We evaluate and compare the
performance of the two methods through experiments, and give application developers some useful hints on the
choice of heuristics and formal methods when addressing real-world information extraction problems.
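A minimal sketch of the rule-based side, assuming author names appear near the top of a document as capitalized name pairs; the regex and the header-window heuristic are ours, not the paper's:

```python
import re

# Capitalized first name, optional middle initial, capitalized last name.
NAME = r"[A-Z][a-z]+(?:\s[A-Z]\.)?\s[A-Z][a-z]+"

def extract_authors(lines, max_header_lines=5):
    # Rule: only scan the first few lines, where author blocks usually sit.
    authors = []
    for line in lines[:max_header_lines]:
        for m in re.finditer(NAME, line):
            authors.append(m.group())
    return authors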
New performance evaluation models for character detection in images
Detecting character regions is meaningful research work, both for highlighting regions of interest and for recognition in
further information processing. Much research has been performed on character localization and extraction, and this
creates a great need for performance evaluation schemes to inspect detection algorithms. In this paper, two probability
models are established to accomplish evaluation tasks for different applications. For highlighting regions of
interest, a Gaussian probability model, which simulates the low-pass Gaussian filtering property of the human visual
system (HVS), was constructed to allocate different weights to different character parts. It shows great potential
for describing the performance of detectors, especially when the detected result is an incomplete character, where other
methods cannot work effectively. For recognition purposes, we also introduce a weighted probability model to
appropriately describe the contribution of detection results to final recognition results. The validity of the
performance evaluation models proposed in this paper is demonstrated by experiments on web images and natural scene
images. The proposed models may also be applicable to evaluating algorithms that locate other objects, such as
faces, although wider experiments are needed to examine this assumption.
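The Gaussian weighting idea can be sketched as follows; the rectangular box representation and the sigma value are our assumptions, not the paper's:

```python
import math

def gaussian_weights(width, height, sigma=2.0):
    # Weight map peaking at the character centre, simulating the
    # low-pass Gaussian filtering of the human visual system (HVS).
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def weighted_recall(char_box, det_box, sigma=2.0):
    # Fraction of Gaussian-weighted character mass covered by the detection
    # box; boxes are (x0, y0, x1, y1) with exclusive right/bottom edges.
    x0, y0, x1, y1 = char_box
    w = gaussian_weights(x1 - x0, y1 - y0, sigma)
    total = sum(sum(row) for row in w)
    covered = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if det_box[0] <= x < det_box[2] and det_box[1] <= y < det_box[3]:
                covered += w[y - y0][x - x0]
    return covered / total
```

Under this score an incomplete detection that covers the character's centre is rewarded more than an equal-area detection of a stroke edge, which is the behaviour the abstract claims over unweighted overlap measures.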
Web Printing and Publishing I
Xerox trails: a new web-based publishing technology
Venkatesh G. Rao,
David Vandervort,
Jesse Silverstein
Xerox Trails is a new digital publishing model developed at the Xerox Research Center, Webster. The primary purpose
of the technology is to allow Web users and publishers to collect, organize and present information in the form of a
useful annotated narrative (possibly non-sequential) with editorial content and metadata that can be consumed both
online and offline. The core concept is a trail: a digital object that improves online content production, consumption and
navigation user experiences. When appropriate, trails can also be easily sequenced and transformed into printable
documents, thereby bridging the gap between online and offline content experiences.
The model is partly inspired by Vannevar Bush's influential idea of the "Memex" [1] which has inspired several
generations of Web technology [2]. Xerox Trails is a realization of selected elements from the idea of the Memex,
along with several original design ideas. It is based on a primitive data construct, the trail. In Xerox Trails, the idea of a
trail is used to support the architecture of a Web 2.0 product suite called Trailmeme, that includes a destination Web site,
plugins for major content management systems, and a browser toolbar.
WikiPrints: rendering enterprise Wiki content for printing
Wikis have become a tool of choice for collaborative, informative communication. In contrast to the immense
Wikipedia, which serves as a reference web site and typically covers only one topic per web page, enterprise wikis are
often used as project management tools and contain several closely related pages authored by members of one project. In
that scenario it is useful to print closely related content for review or teaching purposes. In this paper we propose a novel
technique, called WikiPrints, for rendering enterprise wiki content for printing; it creates a linearized version of wiki
content formatted as a mixture of web layout and conventional document layout suitable for printing. Compared to
existing print options for wiki content, WikiPrints automatically selects content from different wiki pages given user
preferences and usage scenarios. Metadata such as content authors or time of content editing are considered. A preview
of the linearized content is shown to the user, and an interface for making manual formatting changes is provided.
Navigating web search results
Web searches for a specific topic can result in multiple document references for the topic, where information on the
topic is redundantly presented across the document set. This can make it difficult for the user to locate a unique piece of
information from the document set, or to comprehend the full scope of the information, without examining one
document after another in the hope of discovering that new or interesting fact. Summarization techniques reduce the
redundancy but often at the cost of information loss. Aggregation is difficult and may present information out of
context. This paper presents a method for navigating the document set such that the facts or concepts and their
redundant presentations are identified. The user can gain an overview of the concepts, and can locate where they are
presented. The user can then view a desired concept as presented in the context of the document of choice. The
approach also allows the user to move from concept to concept apart from the sequence of any one particular document.
Navigation is accomplished via a graph structure in which redundant material is grouped into nodes. Sequential material
unique to a document can also be clustered into a node for a more compact graph representation. Methods for
identification of redundant content and for the construction of the navigation graph are discussed.
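One way to sketch the node-grouping step, treating sentences as word sets and using Jaccard overlap as a stand-in for the paper's redundancy identification:

```python
def build_navigation_graph(docs, threshold=0.5):
    # docs: list of documents, each a list of sentences represented here as
    # word sets. Redundant sentences across documents share a graph node.
    nodes = []      # each node: {"content": word set, "sources": [(doc, pos)]}
    doc_paths = []  # per document, the sequence of node ids visited
    for d, sentences in enumerate(docs):
        path = []
        for i, sent in enumerate(sentences):
            for nid, node in enumerate(nodes):
                overlap = len(sent & node["content"]) / len(sent | node["content"])
                if overlap >= threshold:          # redundant: reuse node
                    node["sources"].append((d, i))
                    path.append(nid)
                    break
            else:                                 # unique: new node
                nodes.append({"content": set(sent), "sources": [(d, i)]})
                path.append(len(nodes) - 1)
        doc_paths.append(path)
    # Edges follow each document's reading order through the shared nodes.
    edges = {(p[i], p[i + 1]) for p in doc_paths for i in range(len(p) - 1)}
    return nodes, edges
```

Each node records every document and position where its concept appears, so a reader can jump from the concept to any document's presentation of it, which is the navigation behaviour the abstract describes.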
Cloud-based printing for mobile devices
Consumers are increasingly using their smart phones to view web pages. However, there is no native operating system
support for printing these web pages. We propose to overcome two barriers to printing from mobile devices - the
inability to connect and transmit to a printer and the typically poor format of printed web pages. Our system includes a
client component that causes the web browser to upload the page (as a URL reference for public pages or the DOM
content for private pages) to a cloud service that extracts the content and formats it for printing. We transfer the
print-ready content to the HP CloudPrint service and leverage its ability to locate printers and transmit print jobs. We have
built a working system that uses iPhone and Windows Mobile device clients, but the system can be extended to include
other clients.
Web Printing and Publishing II
DIY eBooks: collaborative publishing made easy
Print is undergoing a revolution as significant as the invention of the printing press. The emergence of ePaper
is a major disruption for the printing industry, defining a new medium with the potential to redefine publishing
in a way that is as different from today's Web as the Web is from traditional print. In this new eBook ecosystem we
don't just see users as consumers of eBooks, but as active prosumers able to collaboratively create, customize
and publish their own eBooks. We describe a transclusive, collaborative publishing framework for the web.
Using ePub as framework for the automated collection, tagging, and transformation of web content for cross-media publication
In this paper, we describe the development of Page2Pub, a Firefox extension for gathering, unifying,
and publishing a wide variety of web based materials.
MagCloud: magazine self-publishing for the long tail
Kok-Wei Koh,
Ehud Chatow
In June of 2008, Hewlett-Packard Labs launched MagCloud, a print-on-demand web service for magazine
self-publishing. MagCloud enables anyone to publish their own magazine by simply uploading a PDF file to the site. There
are no setup fees, minimum print runs, storage requirements or waste due to unsold magazines. Magazines are only
printed when an order is placed, and are shipped directly to the end customer. In the course of building this web service,
a number of technological challenges were encountered. In this paper, we will discuss these challenges and the methods
used to overcome them. Perhaps the most important decision in enabling the successful launch of MagCloud was the
choice to offer a single product. This simplified the PDF validation phase and streamlined the print fulfillment process
such that orders can be printed, folded and trimmed in batches, rather than one-by-one. In a sense, MagCloud adopted
the Ford Model T approach to manufacturing, where having just a single model with few or no options allows for
efficiencies in the production line, enabling a lower product price and opening the market to a much larger customer
base. This platform has resulted in a number of new niche publications - the long tail of publishing.
A web-based rapid assessment tool for production publishing solutions
Tong Sun
Solution assessment is a critical first step in understanding and measuring the business process efficiency enabled by an
integrated solution package. However, assessing the effectiveness of any solution is usually a very expensive and
time-consuming task that requires substantial domain knowledge: collecting and understanding the specific customer's
operational context, defining validation scenarios, and estimating the expected performance and operational cost. This
paper presents an intelligent web-based tool that can rapidly assess any given solution package for production publishing
workflows via a simulation engine and create a report for various estimated performance metrics (e.g. throughput,
turnaround time, resource utilization) and operational cost. By integrating the digital publishing workflow ontology and
an activity based costing model with a Petri-net based workflow simulation engine, this web-based tool allows users to
quickly evaluate any potential digital publishing solutions side-by-side within their desired operational contexts, and
provides a low-cost and rapid assessment for organizations before committing to any purchase. The tool also benefits
solution providers by shortening sales cycles, establishing trustworthy customer relationships, and supplementing
professional assessment services with proven quantitative simulation and estimation technology.
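A toy version of the metrics the tool estimates, combining a bottleneck throughput analysis with activity-based costing; the paper's Petri-net simulation engine is far richer, and the station model below is our own simplification:

```python
def assess_workflow(stations, jobs_per_day):
    # stations: list of (name, seconds_per_job, cost_per_hour) for each
    # activity in the publishing workflow, assumed strictly sequential.
    bottleneck = max(sec for _, sec, _ in stations)
    max_throughput = 8 * 3600 / bottleneck        # jobs per 8-hour day
    # Activity-based costing: each job is charged each station's
    # time-weighted hourly rate.
    cost_per_job = sum(sec / 3600 * rate for _, sec, rate in stations)
    utilization = {name: jobs_per_day * sec / (8 * 3600)
                   for name, sec, _ in stations}
    return {"max_throughput": max_throughput,
            "cost_per_job": cost_per_job,
            "utilization": utilization}
```

Running two candidate solution packages through such a model side by side is the kind of comparison the web tool automates, with the workflow ontology supplying the station definitions.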
Web Design and Content Representation
An investigation of document aesthetics for web-to-print repurposing of small-medium business marketing collateral
Businesses have traditionally relied on different types of media to communicate with existing and potential customers.
With the emergence of the Web, the relation between the use of print and electronic media has continually evolved. In
this paper, we investigate one possible scenario that combines the use of the Web and print. Specifically, we consider the
scenario where a small- or medium-sized business (SMB) has an existing web site from which they wish to pull content
to create a print piece. Our assumption is that the web site was developed by a professional designer, working in
conjunction with the business owner or marketing team, and that it contains a rich assembly of content that is presented
in an aesthetically pleasing manner. Our goal is to understand the process that a designer would follow to create an
effective and aesthetically pleasing print piece. We are particularly interested in understanding the choices made by the
designer with respect to the placement and size of the text and graphic elements on the page. Toward this end, we conducted
an experiment in which professional designers worked with SMBs to create print pieces from their respective web pages.
In this paper, we report our findings from this experiment and examine the underlying conclusions regarding the
resulting document aesthetics in the context of the existing design, engineering, and computer science literatures that
address this topic.
Learning from graphic designers: using grids as a scaffolding for automatic print layout
We describe an approach for automatically laying out content for high quality printed formats such as magazines or
brochures, producing an aesthetically pleasing layout that correctly conveys the semantic structure of the content and
elicits the desired experiential affect in the reader. The semantic structure of the content includes the reading order graph,
the association of illustrations with referring paragraphs, and the preservation of perceived text hierarchies.
We appropriate a popular conceptual tool used by graphic designers called the grid. A well-designed grid will cause a
pleasing uniformity through all the pages of a publication while still allowing flexibility in the layout of each page.
In the space of different automatic layout systems, our approach lies somewhere between template-based techniques and
generative techniques, with the aesthetics determined by the combination of the grid and a generative algorithm.
One consequence of using the grid is that it greatly reduces the space of possible layouts from a high dimensional
continuous space to a discrete space. Using a simple greedy algorithm, our first results are promising.
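The greedy search over the discrete grid space might look like the following sketch; representing each content block as a (column span, row span) pair is our simplification of the designer's grid:

```python
def greedy_grid_layout(blocks, columns=3, rows=4):
    # blocks: list of (name, col_span, row_span) in reading order.
    # Greedily place each block at the first free grid position in
    # row-major order, which preserves reading order on the page.
    occupied = set()
    placement = {}
    for name, cw, ch in blocks:
        placed = False
        for r in range(rows - ch + 1):
            for c in range(columns - cw + 1):
                cells = {(r + dr, c + dc)
                         for dr in range(ch) for dc in range(cw)}
                if not cells & occupied:
                    occupied |= cells
                    placement[name] = (r, c)
                    placed = True
                    break
            if placed:
                break
        if not placed:
            return None  # block does not fit on this page
    return placement
```

Because candidate positions are grid cells rather than arbitrary coordinates, the search space is small enough for such a simple greedy pass to produce usable pages, which matches the abstract's observation about dimensionality reduction.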
Ubiquitous picture-rich content representation
Wiley Wang,
Jennifer Dean,
Russ Muzzolini
The amount of digital images taken by the average consumer is consistently increasing. People enjoy the convenience
of storing and sharing their pictures through online (digital) and offline (traditional) media. A set of
pictures can be uploaded to: online photo services, web blogs and social network websites. Alternatively, these
images can be used to generate: prints, cards, photo books or other photo products. Through uploading and
sharing, images are easily transferred from one format to another. And often, a different set of associated content
(text, tags) is created across formats. For example, on his web blog, a user may journal his experiences of his
recent travel; on his social network website, his friends tag and comment on the pictures; in his online photo
album, some pictures are titled and keyword-tagged. When the user wants to tell a complete story, perhaps in a
photo book, he must collect, across all formats: the pictures, writings and comments, etc. and organize them in
a book format. The user has to arrange the content of his trip in each format. The arrangement, the associations
between the images, tags, keywords and text, cannot be shared with other formats. In this paper, we propose a
system that allows the content to be easily created and shared across various digital media formats. We define a
unified data association structure to connect images, documents, comments, tags, keywords and other data.
This content structure allows the user to switch representation formats without reediting. The framework under
each format can emphasize (display or hide) content elements based on preference. For example, a slide show
view will emphasize the display of pictures with limited text; a blog view will display highlighted images and
journal text; and the photo book will try to fit in all images and text content. In this paper, we will discuss the
strategy for associating pictures with text content so that they naturally tell a story. We will also present sample
solutions for different formats, such as picture view, blog view and photo book view.
A novel XML-based document format with printing quality for web publishing
Although many XML-based document formats are available for printing or publishing on the Internet, none of them is
well designed to support both high quality printing and web publishing. Therefore, we propose a novel XML-based
document format for web publishing, called CEBX, in this paper. The proposed format is a fixed-layout document
supporting high quality printing, which has optimized document content organization, physical structure and protection
scheme to support web publishing. There are four noteworthy features of CEBX documents: (1) CEBX provides original
fixed layout by graphic units for printing quality. (2) The content in a CEBX document can be reflowed to fit the display
device based on the content blocks and additional fluid information. (3) The XML Document Archiving model (XDA), the
packaging model used in CEBX, supports document linearization and incremental edit well. (4) By introducing a
segment-based content protection scheme into CEBX, some part of a document can be previewed directly while the
remaining part is protected effectively, such that readers need only purchase the portion of a book that they are
interested in. This will be very helpful for document distribution and supports flexible business models such as
try-before-buy, on-demand reading, superdistribution, etc.
Smart Browser: a framework for bringing intelligence into the browser
Smart Browser is a framework that supports the easy integration of customized background services within a Web
browser. The framework utilizes a set of extendable XML message schemas to communicate between the browser and
the background services. Based on this framework, a set of background services are integrated into Firefox browser.
These services bring the intelligence of the crowd and of the machine, which is typically complicated in logic, data
intensive and computationally complex, to each end user through the lightweight browser they use daily. Applications
built on this framework can thus improve users' experiences when surfing the Web.
Online Photo Services I
AutoPhotobook: using technology to streamline photobook creation
The design of a computer-assisted photobook authoring solution continues to be a challenging task, since consumers
want four things from such an application: simplicity, quality, customizability and speed. Our AutoPhotobook solution
uses technology to enable a system that preserves all four characteristics, providing high quality custom photobooks
while keeping complexity and authoring time modest. We leverage both design knowledge and image understanding
algorithms to automate time-consuming tasks like image selection, grouping, cropping and layout. This streamlines the
initial creation phase, so the user is never stuck staring at a blank page wondering where to begin. Our composition
engine then allows users to easily edit the book: adding, swapping or moving objects, exploring different page layouts
and themes, and even dynamically adjusting the aspect ratio of the final book. Our technologies enable even novice
users to easily create aesthetically pleasing photobooks that tell their underlying stories. AutoPhotobook provides
advances over prior solutions in the following areas: automatic image selection and theme-based image grouping;
dynamic page layout including text support; automatic cropping; design-preserving background artwork transformation;
and a simple yet powerful user interface for personalization. In this paper, we present these technologies and illustrate
how they work together to improve the photobook authoring process.
Faces from the web: automatic selection and composition of media for casual screen consumption and printed artwork
Phil Cheatle,
Darryl Greig,
David Slatter
Web image search engines facilitate the production of image sets in which faces appear. Many people enjoy producing
and sharing media collections of this type and generating new images or video experiences. Skilled practitioners
produce visually appealing artifacts from such collections but few users have the time or creative ability to do so. The
problem is to automatically create an image or ambient experience which sustains interest. A full solution requires
agreements with copyright holders and input from graphics designers. We address the underlying technical problems of
extraction and composition.
We describe an automatic system that identifies regions containing human faces in each image of an image set resulting
from a web search. The face regions are composed into dynamically synthesized multilayer graphical backgrounds. The
aesthetic aspects of the composition are controlled by active templates. These aspects include face size and positioning
but also face identity and the number of faces in a group. The output structure is multi-layer, supporting both the
generation of static images and video consisting of transitions between the compositions.
Semi-automatic image personalization tool for variable text insertion and replacement
Image personalization is a widely used technique in personalized marketing [1], in which a vendor attempts to
promote new products or retain customers by sending marketing collateral that is tailored to the customers'
demographics, needs, and interests. With current solutions of which we are aware, such as XMPie [2], DirectSmile [3],
and AlphaPicture [4], in order to produce this tailored marketing collateral, image templates need to be created
manually by graphic designers, involving complex grid manipulation and detailed geometric adjustments. In fact,
image template design is highly manual, skill-demanding and costly, and is essentially the bottleneck for image
personalization.
We present a semi-automatic image personalization tool for designing image templates. Two scenarios are
considered: text insertion and text replacement, with the text replacement option not offered in current solutions.
The graphical user interface (GUI) of the tool is described in detail. Unlike current solutions, the tool renders
the text in 3-D, which allows easy adjustment of the text. In particular, the tool has been implemented in Java,
which introduces flexible deployment and eliminates the need for any special software or know-how on the part
of the end user.
Automatic image cropping for republishing
Phil Cheatle
Image cropping is an important aspect of creating aesthetically pleasing web pages and repurposing content for different
web or printed output layouts. Cropping provides both the possibility of improving the composition of the image, and
also the ability to change the aspect ratio of the image to suit the layout design needs of different document or web page
formats. This paper presents a method for aesthetically cropping images on the basis of their content. Underlying the
approach is a novel segmentation-based saliency method which identifies some regions as "distractions", as an
alternative to the conventional "foreground" and "background" classifications. Distractions are a particular problem with
typical consumer photos found on social networking websites such as FaceBook, Flickr etc. Automatic cropping is
achieved by identifying the main subject area of the image and then using an optimization search to expand this to form
an aesthetically pleasing crop. Evaluation of aesthetic functions like auto-crop is difficult as there is no single correct
solution. A further contribution of this paper is an automated evaluation method which goes some way towards handling
the complexity of aesthetic assessment. This allows crop algorithms to be easily evaluated against a large test set.
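The optimization search can be caricatured as exhaustive scoring of fixed-aspect windows over a saliency map in which "distraction" regions carry negative weight; this is our simplification of the paper's segmentation-based method:

```python
def best_crop(saliency, aspect, step=1):
    # saliency: 2-D list with positive values for subject pixels and
    # negative values for distraction regions. Search axis-aligned crops
    # whose width is round(aspect * height) and return the box
    # (x, y, w, h) with the highest enclosed saliency.
    H, W = len(saliency), len(saliency[0])
    best, best_box = float("-inf"), None
    for h in range(1, H + 1):
        w = round(aspect * h)
        if w < 1 or w > W:
            continue
        for y in range(0, H - h + 1, step):
            for x in range(0, W - w + 1, step):
                score = sum(saliency[y + dy][x + dx]
                            for dy in range(h) for dx in range(w))
                if score > best:
                    best, best_box = score, (x, y, w, h)
    return best_box
```

Negative weights make the search actively exclude distractions rather than merely maximize covered foreground, which is the key difference the abstract draws from conventional foreground/background saliency.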
Online Photo Services II
Assessing photographer competence using face statistics
Darryl Greig,
Yuli Gao
The rapid growth of photo sharing websites has resulted in some new problems around the management of a large (and
quickly increasing) number of photographers with different needs and usage characteristics. Despite significant advances
in the field of computer vision, little has been done to leverage these technologies for photographer understanding and
management, partly due to the high computational cost of extracting application-specific image features. Recently robust
multi-view face detection technologies have been widely adopted by many photo sharing sites. This affords a limited but
"standard" pre-computed set of face features with which to tackle these administrative problems in large-scale settings. In
this paper we present a principled statistical model to address one such administrative task: the automatic analysis of
photographer competency given only face detection results on a set of their photos. The model uses summary statistics to
estimate the probability that a given individual belongs to a population of high-competence photographers rather than a
second population of lower-competence photographers. Using this model, we have achieved high classification accuracy
(respectively 84.3% and 90.9%) on two large image datasets. We discuss an application of this approach to assist in
managing a photo-sharing website.
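A hedged sketch of the two-population model, using independent Gaussians per summary statistic; the statistic names and parameters below are invented for illustration, not taken from the paper:

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def p_competent(stats, high_params, low_params, prior=0.5):
    # stats: dict of face-derived summary statistics for one photographer,
    # e.g. mean face size or fraction of photos containing faces.
    # Each population is modelled with an independent Gaussian (mean, std)
    # per statistic; returns P(high competence | stats) by Bayes' rule.
    lh, ll = prior, 1 - prior
    for key, value in stats.items():
        lh *= gaussian_pdf(value, *high_params[key])
        ll *= gaussian_pdf(value, *low_params[key])
    return lh / (lh + ll)
```

In practice the population parameters would be fitted from labelled sets of high- and low-competence photographers, and the same pre-computed face features the sharing site already stores supply the per-photographer statistics.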
Automatic eye enhancement by sclera whitening
We propose a method for automatically enhancing the eyes of all the faces in a digital image by whitening
their scleras (i.e., the white part of their eyes). The scleras are identified by combining existing face detection
and feature alignment technology with a color-based sclera probability map. We then smooth, brighten, and
desaturate the scleras. This reduces the appearance of blood vessels and produces a healthier, more "refreshed"
look.
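The brighten-and-desaturate step on a pixel already classified as sclera might be sketched as follows; the blending scheme and the brightening factor are our assumptions:

```python
def whiten_pixel(r, g, b, strength=0.5):
    # Brighten and desaturate a single sclera pixel. strength in [0, 1]
    # controls how far each channel is pulled toward the brightened grey
    # value, reducing the visibility of blood vessels (the red cast).
    grey = min(255.0, (r + g + b) / 3 * 1.2)      # brightened grey level

    def mix(c):
        return round(c + (grey - c) * strength)   # pull channel toward grey

    return mix(r), mix(g), mix(b)
```

Applying this only within the color-based sclera probability mask, with strength scaled by the per-pixel probability, would give the smooth falloff at the sclera boundary that an abrupt mask would lack.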
Automatic digital photo-book making system
Wiley Wang,
Patrick Teo,
Russ Muzzolini
The diversity of photo products has grown more than ever before. A group of photos are not only printed
individually, but also can be arranged in specific order to tell a story, such as in a photo book, a calendar or a
poster collage. Similar to making a traditional scrapbook, digital photo book tools allow the user to choose a
book style/theme, layouts of pages, backgrounds and the way the pictures are arranged. This process is often
time consuming to users, given the number of images and the choices of layout/background combinations. In this
paper, we developed a system to automatically generate photo books with only a few initial selections required.
The system utilizes time stamps, color indices, orientations and other image properties to best fit pictures into
a final photo book. The common way of telling a story is to lay the pictures out in chronological order. If pictures
are close together in time, they tend to belong together and are often logically related. The pictures
are naturally clustered along a time line. Breaks between clusters can be used as a guide to separate pages or
spreads; thus, pictures that are logically related can stay close on the same page or spread. When people are
making a photo book, it is helpful to start with chronologically grouped images, but time alone won't be enough
to complete the process. Each page is limited by the number of layouts available. Many aesthetic rules also
apply, such as, emphasis of preferred pictures, consistency of local image density throughout the whole book,
matching a background to the content of the images, and the variety of adjacent page layouts. We developed an
algorithm to group images onto pages under the constraints of aesthetic rules. We also apply content analysis
based on the color and blurriness of each picture, to match backgrounds and to adjust page layouts. Some
of our aesthetic rules are fixed and given by designers. Other aesthetic rules are statistic models trained by
using customer photo book samples. We evaluate our algorithm with test photo sets, and ask participants both
quantitative and qualitative questions for feedback. We have seen the improvement on the time it takes users to
produce a photo book and on the satisfaction with the overall quality.
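The chronological clustering described above can be sketched as a simple gap test over sorted capture times; the two-hour break threshold is an illustrative assumption, not a value from the paper:

```python
from datetime import datetime, timedelta

def cluster_by_time(timestamps, gap=timedelta(hours=2)):
    """Group capture times chronologically; a gap longer than `gap`
    starts a new cluster (a candidate page/spread break)."""
    times = sorted(timestamps)
    clusters = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > gap:
            clusters.append([cur])     # long gap: start a new page/spread
        else:
            clusters[-1].append(cur)   # proximate in time: same cluster
    return clusters

# Toy day of shooting: a morning event, an afternoon event, one evening shot.
shots = [datetime(2010, 6, 5, h, m) for h, m in
         [(9, 0), (9, 12), (9, 40), (14, 5), (14, 30), (20, 15)]]
pages = cluster_by_time(shots)
```

Each resulting cluster is a candidate page or spread; the aesthetic rules then decide how the clusters map onto the available layouts.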
The impact of geo-tagging on the photo industry and creating revenue streams
Rolf Richter,
Henning Böge,
Christoph Weckmann,
et al.
Show abstract
Internet geo and mapping services like Google Maps, Google Earth, and Microsoft Bing Maps
have reinvented the use of geographical information and have reached enormous popularity.
In addition, location technologies like GPS have become affordable and are now being integrated
into many camera phones; GPS is also available for standalone cameras, either as an add-on
product or built in. These developments enable new products for the photo industry and enhance
existing ones. New commercial opportunities have been identified in the areas of photo hardware,
internet/software, and photo finishing.
Keynote Session
Cooperative classification of shared images
Show abstract
We propose a method for the semi-automatic organization of photo albums. The method analyzes how different
users organize their own pictures. The goal is to help the user divide his pictures into groups characterized
by similar semantic content. The method is semi-automatic: the user starts by assigning labels to pictures,
and unlabeled pictures are tagged with proposed labels; the user can accept a recommendation or make a
correction. We use a suitable feature representation of the images to model the different classes that the users
have collected. Then, we look for correspondences between the criteria used by the different users, which are
integrated using boosting. A quantitative evaluation of the proposed approach is obtained by simulating the
amount of user interaction needed to annotate the albums of a set of members of the Flickr® photo-sharing
community.
Semantic/Multimodal Retrieval
Semantic retrieval and automatic annotation: linear transformations, correlation, and semantic spaces
Show abstract
This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly
mapping an image feature space to a keyword space. The new technique is compared to several related techniques,
and a number of salient points about each of the techniques are discussed and contrasted. The paper also discusses
how these techniques might actually scale to a real-world retrieval problem, and demonstrates this through a
case study of a semantic retrieval technique being used on a real-world data-set (with a mix of annotated and
unannotated images) from a picture library.
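The core idea of a linear map from an image feature space to a keyword space can be sketched with a least-squares fit on toy data; the feature dimensions, keyword vocabulary size, and solver below are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 6 annotated images, 4-D visual features, 3 keywords.
F = rng.normal(size=(6, 4))                         # image features (one row per image)
K = (rng.random(size=(6, 3)) > 0.5).astype(float)   # binary keyword annotations

# Learn the linear map W minimizing ||F W - K||^2 (ordinary least squares).
W, *_ = np.linalg.lstsq(F, K, rcond=None)

def annotate(feature, n=2):
    """Project a feature vector into keyword space; the top-scoring
    keyword indices become the proposed annotation."""
    scores = feature @ W
    return np.argsort(scores)[::-1][:n]

proposal = annotate(F[0])
```

Auto-annotation then amounts to projecting an unannotated image's features through `W`; semantic retrieval ranks images by their scores along the query keyword's axis.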
Generic and optimized framework for multi-content analysis based on learning approaches
Show abstract
During the European Cantata project (ITEA project, 2006-2009), a Multi-Content Analysis framework for the
classification of compound images in various categories (text, graphical user interface, medical images, other complex
images) was developed within Barco. The framework consists of six parts: a dataset, a feature selection method, a
machine learning based Multi-Content Analysis (MCA) algorithm, a Ground Truth, an evaluation module based on
metrics and a presentation module. This methodology was built on a cascade of decision tree-based classifiers combined
and trained with the AdaBoost meta-algorithm. In order to be able to train these classifiers on large training datasets
without excessively increasing the training time, various optimizations were implemented. These optimizations were
performed at two levels: the methodology itself (feature selection / elimination, dataset pre-computation) and the
decision-tree training algorithm (binary threshold search, dataset presorting and alternate splitting algorithm). These
optimizations have little or no negative impact on the classification performance of the resulting classifiers. As a result,
the training time of the classifiers was significantly reduced, mainly because the optimized decision-tree training
algorithm has a lower algorithmic complexity. The time saved through this optimized methodology was used to compare
the results of a greater number of different training parameters.
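A minimal illustration of boosted tree-based classification, using depth-1 stumps in place of the full decision-tree cascade; the dataset, round count, and naive exhaustive threshold scan here are illustrative (the paper's optimized binary threshold search would replace that scan):

```python
import numpy as np

def train_adaboost_stumps(X, y, rounds=10):
    """Minimal AdaBoost over 1-D threshold stumps. Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Naive scan over every feature/threshold/polarity (the paper optimizes this).
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f]):
                for s in (1, -1):
                    pred = s * np.where(X[:, f] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, s)
        err, f, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weak-learner weight
        pred = s * np.where(X[:, f] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)         # re-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * s * np.where(X[:, f] <= t, 1, -1) for a, f, t, s in ensemble)
    return np.sign(score)

# Linearly separable toy set: the class is the sign of the first feature.
X = np.array([[-2., 0.], [-1., 1.], [-0.5, -1.], [0.5, 0.], [1., 1.], [2., -1.]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = train_adaboost_stumps(X, y, rounds=5)
```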
Benchmark of multiple approaches for feature extraction and image similarity characterization
Show abstract
The performance of image classification largely depends on both the discrimination power of the visual features
for image content representation and the effectiveness of the kernels for diverse image similarity characterization.
Different types of kernels have been developed for SVM image classifier training, and different research teams may
use different types of visual features in their experiments. Thus there is an urgent need to provide benchmark work
to assess the real performance of different types of visual features and kernels for various image classification
tasks. In this paper, we have benchmarked multiple approaches for feature extraction and image similarity
characterization, so that some useful guidelines can be provided for: (a) how to select more effective approach
for feature extraction and enhance the discrimination power of various types of visual features; and (b) how to
combine multiple types of visual features and their kernels to enhance the discrimination power of SVM image
classifiers. Our experiments on large-scale image collections have also obtained very positive results.
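One common way to combine multiple feature types and their kernels, in the spirit of the benchmark above, is a weighted sum of per-feature kernels; the RBF form, bandwidths, and weights below are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def combined_kernel(feats_a, feats_b, gammas, weights):
    """Weighted sum of one RBF kernel per feature type (weights sum to 1)."""
    return sum(w * rbf_kernel(a, b, g)
               for a, b, g, w in zip(feats_a, feats_b, gammas, weights))

rng = np.random.default_rng(0)
color = rng.random((4, 8))     # e.g. color-histogram features of 4 images
texture = rng.random((4, 5))   # e.g. texture features of the same 4 images
K = combined_kernel([color, texture], [color, texture],
                    gammas=[0.5, 0.5], weights=[0.7, 0.3])
```

Because each RBF kernel equals 1 on its diagonal and the weights sum to 1, the combined matrix remains a valid kernel that can be passed directly to an SVM trainer.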
Image Representation, Retrieval, and Techniques I
Three-domain image representation for personal photo album management
E. Ardizzone,
M. La Cascia,
M. Morana,
et al.
Show abstract
In this paper we present a novel approach for personal photo album management. Pictures are analyzed and
described in three representation spaces, namely, faces, background and time of capture. Faces are automatically
detected and rectified using a probabilistic feature extraction technique. Face representation is then produced
by computing PCA (Principal Component Analysis). Backgrounds are represented with low-level visual features
based on RGB histogram and Gabor filter bank. Temporal data is obtained through the extraction of EXIF
(Exchangeable image file format) data. Each image in the collection is then automatically organized using a
mean-shift clustering technique. While many systems manage faces and typically allow queries about them,
we use a common approach to manage multiple aspects; that is, queries regarding people, time, and background
are dealt with in a homogeneous way. We report experimental results on a realistic set, i.e., a personal photo
album of about 2000 images, where automatic detection and rectification of faces yield approximately 800
faces. The significance of the clustering has been evaluated, and the results are very interesting.
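The mean-shift step can be sketched on a toy 2-D feature space as follows; the flat kernel, bandwidth, and mode-merging rule are illustrative simplifications, not necessarily the variant used in the paper:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=50):
    """Flat-kernel mean shift: each point iteratively moves to the mean of
    its neighbours within `bandwidth`; modes that land in (almost) the same
    place are merged into one cluster."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i, m in enumerate(modes):
            near = points[np.linalg.norm(points - m, axis=1) <= bandwidth]
            modes[i] = near.mean(axis=0)
    # Merge modes that converged to nearly the same location.
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Two well-separated blobs standing in for two groups of photos.
pts = np.array([[0., 0.], [0.2, 0.1], [0.1, -0.1], [5., 5.], [5.1, 4.9]])
labels = mean_shift(pts, bandwidth=1.0)
```

In the system described above, the same clustering is applied separately in each of the three representation spaces (faces, background, time).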
Harvesting weakly tagged images for computer vision tasks
Yi Shen,
Chunlei Yang,
Yuli Gao,
et al.
Show abstract
To crawl large amounts of weakly-tagged images for computer vision tasks such as object detection and scene
recognition, it is very important to develop new techniques for tag cleansing and word sense disambiguation
(i.e., removing irrelevant images from the crawled results). Based on this observation, a topic network is first
generated to characterize more fully both the semantic similarity contexts and the visual similarity contexts
between the image topics. The topic network is used to represent the classes of objects and scenes of interest.
Second, both the visual similarity contexts between the images and the semantic similarity contexts between
their tags are integrated for tag cleansing and word sense disambiguation. By addressing the issues of polysemes
and synonyms more effectively, our word sense disambiguation algorithm can determine the relevance between
the images and the associated tags more precisely, and thus it can allow us to crawl large-scale weakly-tagged
images for computer vision tasks.
Image Representation, Retrieval, and Techniques II
Image retrieval for identifying house plants
Show abstract
We present a content-based image retrieval system for plant identification, intended to provide users with a
simple way to locate information about their house plants. A plant image consists of a collection of overlapping leaves
and possibly flowers, which makes the problem challenging. We study the suitability of various well-known color, texture,
and shape features for this problem, and also introduce some new ones. The features are extracted from the overall
plant region, which is segmented from the background using the max-flow/min-cut technique. Results on a database of 132
different plant images show promise: in about 72% of the queries, the correct plant image is retrieved among the top 15
results.
Comparative study of content-based image retrieval and video fingerprinting
Show abstract
Content-based image retrieval (CBIR) has been studied for nearly two decades since IBM's research on QBIC (Query by
Image Content) [1]. In the past decade, another related but different area, video fingerprinting, has been attracting
more and more attention. Numerous papers have been published in both areas, and researchers from the two areas
sporadically cite results from each other. However, as far as we know, there is no comprehensive comparison between
the two areas. This paper attempts to fill this gap by explicitly discussing the wide array of differences and
connections between the two areas. We believe that such a comparative study can help researchers migrate or
cross-pollinate between the two areas.
Video Retrieval and Techniques
Robust video and audio-based synchronization of multimedia files
Show abstract
This paper addresses the problem of robust and automated synchronization of multiple audio and video signals. The
input signals are from a set of independent multimedia recordings coming from several camcorders and microphones.
While the camcorders are static, the microphones are mobile, as they are attached to people. The motivation for
synchronizing all signals is to support studies of human interaction in a decision-support environment, which have so
far been limited by the difficulty of automatically processing observations made during decision-making sessions. The
data sets for this work were acquired during training exercises of response teams, rescue workers, and fire fighters
at multiple locations.
The developed synchronization methodology for a set of independent multimedia recordings is based on introducing
aural and visual landmarks with a bell and room light switches. Our approach to synchronization is based on detecting
the landmarks in audio and video signals per camcorder and per microphone, and then fusing the results to increase
robustness and accuracy of the synchronization. We report synchronization results that demonstrate accuracy of
synchronization based on video and audio.
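Once an aural landmark such as the bell ring has been localized, the residual offset between two recordings can be estimated by cross-correlation, a standard alignment step; the signals and the 25-sample delay below are synthetic:

```python
import numpy as np

def estimate_offset(sig_a, sig_b):
    """Estimate the sample offset of sig_a relative to sig_b by locating
    the peak of their cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    return np.argmax(corr) - (len(sig_b) - 1)

# Toy signals: the same decaying impulse ("bell") recorded with a 25-sample delay.
landmark = np.exp(-np.arange(30) / 5.0)
a = np.zeros(200); a[40:70] = landmark
b = np.zeros(200); b[65:95] = landmark
offset = estimate_offset(b, a)   # b lags a by 25 samples
```

Fusing such per-pair offsets across all camcorder/microphone pairs is what gives the method its robustness: an outlier estimate from one noisy pair can be voted down by the others.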
Query-based video event definition using rough set theory and video prototypes
Kimiaki Shirahama,
Chieri Sugihara,
Kuniaki Uehara
Show abstract
Since a user may want to retrieve a great variety of events, it is impractical to index a video archive with predefined
events. "Query-based event definition," which dynamically defines events from example videos provided by the user,
is therefore essential. In particular, we address how to accurately cover the large variation of low-level features
within an event. Because of arbitrary camera techniques and object movements, shots of the same event can contain
significantly different low-level features; that is, these shots are distributed over different subsets of the space of
low-level features. We therefore use "rough set theory" to extract each subset in which example shots can be correctly
classified by a simple combination of low-level features. Based on such subsets, we can retrieve various shots of
the same event. However, retrieval aimed only at wide coverage is not very accurate: many irrelevant shots are
ranked at top positions. Thus, we re-rank retrieved shots by finely matching them with example shots. Since the
original representation of a low-level feature is very high-dimensional, we use "video prototypes," which mask
dimensions irrelevant to this matching. Experimental results on the TRECVID 2008 video archive show the
promise of our two-step method.
Composition of SIFT features for robust image representation
Show abstract
In this paper we propose a novel feature, based on the SIFT (Scale Invariant Feature Transform) algorithm [1], for
the robust representation of local visual content. SIFT features have raised much interest for their descriptive
power: they characterize point-wise information that is invariant to luminance variation and viewpoint change, and
they are very useful for capturing local information. Hundreds of keypoints are found in a single image, and they
are particularly suitable for tasks such as image registration or image matching. In this work we stretch the spatial
coverage of the descriptors, creating a novel feature as a composition of the keypoints present in an image region
while maintaining the invariance properties of the SIFT descriptors. The number of descriptors is reduced, limiting
the computational cost, and at the same time a more abstract descriptor is obtained. The new feature is therefore
suitable for describing objects and characteristic image regions.
We tested retrieval performance on a dataset used to evaluate PCA-SIFT [2], as well as image-matching capability
among images processed with affine transformations. Experimental results are reported.
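The composition step can be sketched as pooling the keypoint descriptors of a region into a single vector; mean pooling with renormalization is an illustrative choice here, and the paper's exact combination rule may differ:

```python
import numpy as np

def compose_region_descriptor(descriptors):
    """Compose many 128-D SIFT keypoint descriptors from one image region
    into a single region-level descriptor (mean pooling + L2 renormalization)."""
    d = np.asarray(descriptors, dtype=float).mean(axis=0)
    norm = np.linalg.norm(d)
    return d / norm if norm else d

# e.g. 40 keypoints detected inside one region, each a 128-D SIFT vector
rng = np.random.default_rng(1)
keypoints = rng.random(size=(40, 128))
region = compose_region_descriptor(keypoints)
```

Because the pooled vector is built from the (already invariant) keypoint descriptors, the region descriptor inherits their robustness while cutting the number of vectors per image from hundreds to one per region.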
On accuracy, privacy, and complexity in the identification problem
Show abstract
This paper presents recent advances in the identification problem taking into account the accuracy, complexity
and privacy leak of different decoding algorithms. Using a model of different actors from the literature, we show
that it is possible to use more accurate decoding algorithms that exploit reliability information without increasing
the privacy leak relative to algorithms that use only binary information. Existing algorithms from the literature have
been modified to take advantage of reliability information, and we show that a proposed branch-and-bound
algorithm can outperform existing work, including the enhanced variants.