Proceedings Volume 4311

Internet Imaging II

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 27 December 2000
Contents: 10 Sessions, 46 Papers, 0 Presentations
Conference: Photonics West 2001 - Electronic Imaging 2001
Volume Number: 4311

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Systems and Architecture
  • Visual Organization
  • Video Summarization
  • Image Retrieval
  • Metadata
  • Visualization
  • Telepresence and Collaborative Design
  • Medical Applications
  • Video Retrieval
  • Color and Compression
Systems and Architecture
Printing images from the Web
Images have become ubiquitous: virtually every personal computer and every Internet Web site contains at least some images. Professional as well as amateur artists display their work on the net, and large image collections exist, spanning all aspects of art, hobby, and daily life. The images vary widely in content, but more importantly, they also vary widely in general image quality attributes. Some of the quality variation is caused by the actual image creation or Internet posting process, whereas other quality variations are caused by the lack of a defining image data description, such as a color-space definition. This paper describes problems and possible solutions associated with the printing of images from unknown sources.
DjVu document browsing with on-demand loading and rendering of image components
Yann Le Cun, Leon Bottou, Andrei Erofeev, et al.
Image-based digital documents are composed of multiple pages, each of which may be composed of multiple components such as the text, picture background, and annotations. We describe the image structure and software architecture that allow the DjVu system to load and render the required components on demand while minimizing the bandwidth requirements and the memory requirements in the client. DjVu document files are merely a list of enriched URLs that point to individual files (or file elements) that contain image components. Image components include: text images, background images, shape dictionaries shared by multiple pages, OCRed text, and several types of annotations. A multithreaded software architecture with smart caching allows individual components to be loaded, pre-decoded, and rendered on demand. Pages are pre-fetched or loaded on demand, allowing users to randomly access pages without downloading the entire document and without the help of a byte server. Components that are shared across pages (e.g., shape dictionaries or background layers) are loaded as required and cached. This greatly reduces the overall bandwidth requirements. Shared dictionaries typically allow a 40% file size reduction for scanned bitonal documents at 300 dpi. Compression ratios on scanned US patents at 300 dpi are 5.2 to 10.2 times higher than Group IV with shared dictionaries and 3.6 to 8.5 times higher than Group IV without shared dictionaries.
Multispectral Internet imaging
Hans Brettel, Francis J. M. Schmitt
We present a system for multispectral image acquisition which is accessible via an Internet connection. The system includes an electronically tunable spectral filter and a monochrome digital camera, both controlled from a PC-type computer acting as a Web server. In contrast to the three fixed color channels of an ordinary WebCam, our system provides a virtually unlimited number of spectral channels. To allow for interactive use of this multispectral image acquisition system through the network, we developed a set of Java servlets which provide access to the system through HyperText Transfer Protocol (HTTP) requests. Since only the standard Common Gateway Interface (CGI) mechanisms for client-server communication are used, the system is accessible from any Web browser.
Multilingual system using Internet imaging
Tadashi Mori, Yoshitsugu Hata, Ryouji Iida, et al.
In recent years, multilingual systems have become important, but most computer environments cannot handle all the languages (scripts) in the world. This paper presents a multilingual imaging system on the Internet. In this system, characters are converted into bitmaps, and therefore we can display multilingual text on WWW browsers. In order to convert multilingual plain text into bitmap images, we have developed software named ctext2pgm and VFlib. VFlib is a software component that rasterizes fonts in various file formats, and ctext2pgm generates bitmap image files from multilingual plain texts. Ctext2pgm is an application program of VFlib, and it supports about 30 languages. We also introduce a language education system for various languages. This is an example of a multilingual system using Internet imaging.
Object and image retrieval over the Internet
Sebastien Gilles, A. Winter, J. Feldmar, et al.
In this article, we describe some of the work that was carried out at LookThatUp for designing an infrastructure enabling image-based search over the Internet. The service was designed to be remotely accessible and easily integrated to partner sites. One application of the technology, called Image-Shopper, is described and demonstrated. The technological basis of the system is then reviewed.
Intellihance: client-side and server-side architectures for photo site image enhancement
David M. Pfeiffer
Consumer Digital Photography (CDP) has many advantages over film photography, including instant preview, on-demand printing, image enhancement, and Internet distribution. Despite its many advantages, CDP will not be broadly accepted until it meets and exceeds the de facto standards for quality, low cost, and ease of use set by the film-based camera and its supporting infrastructure. With the advent of low-cost, high-resolution image sensors, only ease of use will remain as a barrier to consumer acceptance of digital photography. Image enhancement is an easily understood CDP capability that allows the user to reclaim photos that might otherwise have been discarded due to composition problems or poor lighting conditions. While the consumer might recognize the saving power of image enhancement, it must be point-and-shoot simple. Intellihance supports a point-and-shoot approach to image enhancement, making good images better and rescuing images that might otherwise have been discarded. Intellihance, a proven technology in image editing applications, can also be used in photo kiosks, digital cameras, and photo web sites. This paper will specifically examine the Intellihance user interfaces and browser-based Intellihance solutions.
Visual Organization
Vision theory guiding Web communication
Claudio M. Privitera, Lawrence W. Stark, Yuek Fai Ho, et al.
Eye movements, EMs, are one important component of vision: only specific regions of the visual input are fixated and processed by the brain at high resolution. The rest of the image is viewed at lower and coarser resolution by the retina, but the image is still perceived and recognized uniformly and clearly. We embodied this sampling characteristic of human vision within a computational model, A*, based on a collection of image processing algorithms that are able to predict regions of visual interest. Several web-related applications are presented and discussed in this paper.
Indexing and retrieving Web documents as direct manipulation of images
Fernando Ferri, Patrizia Grifoni, Piero Mussio, et al.
The rapid growth of network communication through the World Wide Web has encouraged a large diffusion of connections to the Internet, due to the heavily interactive services which are offered for accessing, using, and producing the incredible mass of information and, more generally, resources now available. People communicating in this environment are usually end users who are not skilled in computer science but are experienced in a specific area; they are generally interested in searching, producing information, and accessibility. The phenomenon of the World Wide Web is producing a significant change in the concept of a document, which is becoming strongly visual and dynamically arranged. A document is an image, and an image is a document. This change requires a new approach to presenting, authoring, indexing, and querying a web document. In this paper we propose a visual language defined to reach the previously introduced goals, discussing the case of an information base containing clinical data. Notwithstanding the amount and heterogeneity of the data available, it is quite difficult to access truly interesting information and to suitably exploit it; this is due to the poor usability of tools, which offer an interaction style still limited with respect to WIMP (Windows, Icons, Menus, Pointers) interfaces, and to the indexing techniques usually adopted to organize web pages by means of robots and search engines.
New classification strategy for color documents
Raimondo Schettini, Carla Brambilla, A. Valsasna, et al.
A hierarchical classification for photographs, graphics, texts, and compound documents is described. The key features of the strategy are the use of CART trees for classification and the indexing of the images using only low-level perceptual features, such as color, texture, and shape, automatically computed on the images. Preliminary results are reported and discussed.
Automatic page layout using genetic algorithms for electronic albuming
Joe Geigel, Alexander C. P. Loui
In this paper, we describe a flexible system for automatic page layout that makes use of genetic algorithms for albuming applications. The system is divided into two modules: a page creator module, which is responsible for distributing images among the various album pages, and an image placement module, which positions images on individual pages. Final page layouts are specified in a textual form using XML for printing or viewing over the Internet. The system makes use of genetic algorithms, a class of search and optimization algorithms based on the concepts of biological evolution, for generating solutions whose fitness is based on graphic design preferences supplied by the user. The genetic page layout algorithm has been incorporated into a web-based prototype system for interactive page layout over the Internet. The prototype system is built on a client-server architecture and is implemented in Java. The system described in this paper has demonstrated the feasibility of using genetic algorithms for automated page layout in albuming and web-based imaging applications. We believe that the system adequately proves the validity of the concept, providing creative layouts in a reasonable number of iterations. By optimizing the layout parameters of the fitness function, we hope to further improve the quality of the final layout in terms of user preference and computation speed.
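The fitness-driven evolution described in this abstract can be illustrated with a toy genetic algorithm. The page size, image list, and overlap-penalty fitness below are hypothetical stand-ins for the paper's user-supplied graphic-design preferences, not the actual system:

```python
import random

PAGE_W, PAGE_H = 800, 600
IMAGES = [(200, 150), (160, 120), (240, 180)]  # (width, height) of images to place

def random_layout():
    # A layout chromosome is a list of top-left (x, y) positions, one per image.
    return [(random.uniform(0, PAGE_W - w), random.uniform(0, PAGE_H - h))
            for (w, h) in IMAGES]

def overlap(a, b, sa, sb):
    # Overlapping area of two axis-aligned rectangles.
    ax, ay = a; bx, by = b
    ox = max(0.0, min(ax + sa[0], bx + sb[0]) - max(ax, bx))
    oy = max(0.0, min(ay + sa[1], by + sb[1]) - max(ay, by))
    return ox * oy

def fitness(layout):
    # Lower is better: penalize pairwise overlap between placed images.
    return sum(overlap(layout[i], layout[j], IMAGES[i], IMAGES[j])
               for i in range(len(layout)) for j in range(i + 1, len(layout)))

def crossover(p1, p2):
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(layout, rate=0.2):
    # Occasionally re-sample an image's position within the page bounds.
    return [((random.uniform(0, PAGE_W - IMAGES[i][0]),
              random.uniform(0, PAGE_H - IMAGES[i][1]))
             if random.random() < rate else pos)
            for i, pos in enumerate(layout)]

def evolve(generations=100, pop_size=30):
    pop = [random_layout() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]  # elitist selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
```

A real fitness function would also reward balance, alignment, and user-preferred orderings rather than only penalizing overlap.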
Image representations for accessing and organizing Web information
Jonathan I. Helfman, James D. Hollan
The web is enormous and constantly growing. User interfaces for web-based applications need to make it easy for people to access relevant information without becoming overwhelmed or disoriented. Today's interfaces employ textual representations almost exclusively, typically organized in lists and hierarchies of web-page titles or URL taxonomies. Given the ability of images to assist memory, and our frequent exploitation of space in everyday problem solving to simplify choice, perception, and mental computation, it is surprising that so little use is made of images and spatial organization in accessing and organizing web information. The work we summarize in this paper suggests that spatial and temporal organization of selectable images may offer multiple advantages over textual lists of titles and URLs. We describe several image-based applications, detail basic image representation techniques, and discuss spatial and temporal strategies for organization.
Quicklink: a system for the generation of similarity links in Web image archives
Isabella Gagliardi, Bruna Zonta
We present here Quicklink, a system that retrieves images similar to a query image in large web archives of artworks by dynamically matching their textual descriptions (usually catalog cards), adapts its behavior to user requests, and presents the retrieval results in HTML pages, where the images are ordered according to their degree of similarity.
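The description-matching step can be sketched as a cosine similarity over bag-of-words vectors built from catalog-card text. The cards and scoring below are illustrative assumptions, not Quicklink's actual similarity metric:

```python
import math
from collections import Counter

def term_vector(text):
    # Bag-of-words term counts from a catalog-card description.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical catalog cards for three artwork images.
cards = {
    "img1": "oil on canvas portrait venetian school",
    "img2": "oil on canvas landscape venetian school",
    "img3": "marble sculpture roman bust",
}

def similarity_links(query_id, k=2):
    # Rank the other images by textual similarity to the query image's card.
    qv = term_vector(cards[query_id])
    scored = [(other, cosine(qv, term_vector(text)))
              for other, text in cards.items() if other != query_id]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]
```

The ranked list would then be rendered as an HTML page of similarity links ordered by score.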
Video Summarization
Distributional clustering for content-based browsing of unstructured video
Giridharan Iyengar
The focus of this paper is to facilitate access into an important class of multimedia content: unscripted and casually shot home videos. Unlike scripted video (such as movies and commercials), video that has been shot for the simple purpose of recording an event or place has no structure to facilitate its access. Our approach to tackling the unstructured video problem is to use clustering techniques to create multiple groupings of the video. We describe two clustering algorithms in this paper, which assume fairly general probability density descriptions of video. In addition, we describe an application built upon these clustering algorithms that enables browsing and rescripting unstructured content.
Mosaic-based query paradigm for content-based video retrieval
Jurgen Assfalg, Alberto Del Bimbo, Masahito Hirakawa
To support widespread deployment and usage of content-based video retrieval (CBVR), the definition of simple (i.e., intuitive) yet powerful query interfaces must accompany the ongoing investigation of feature descriptors and of related extraction and indexing techniques. In this paper we propose a visual query paradigm for CBVR which builds on mosaicing, a well-known technique in computer vision and graphics for creating a comprehensive overview of a scene reproduced in a set of images. The language underlying the paradigm supports querying for video shots by specifying camera motion, as well as motion and visual appearance of objects. This approach supports a consistent reproduction of the spatio-temporal nature of videos in both query specification and visualization of retrieval results, which enables users to specify and refine queries in an iterative way.
Image Retrieval
Prefiltering with Retinex in color image retrieval
We have examined the performance of various color-based retrieval strategies when coupled with a pre-filtering Retinex algorithm to see whether, and to what degree, Retinex improved the effectiveness of the retrieval, regardless of the strategy adopted. The retrieval strategies implemented included color and spatial-chromatic histogram matching, color coherence vector matching, and the weighted sum of the absolute differences between the first three moments of each color channel. The experimental results are reported and discussed.
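Of the strategies listed, plain color histogram matching is the simplest to sketch. In the toy example below, "images" are lists of RGB tuples and each channel is quantized into 4 levels; the Retinex prefiltering step itself is omitted, and the data are invented for illustration:

```python
def color_histogram(pixels, bins=4):
    # Quantize each RGB channel into `bins` levels and count joint bins,
    # then normalize so the histogram sums to 1.
    hist = [0] * (bins ** 3)
    for (r, g, b) in pixels:
        i = ((r * bins // 256) * bins * bins
             + (g * bins // 256) * bins
             + (b * bins // 256))
        hist[i] += 1
    total = len(pixels)
    return [h / total for h in hist]

def intersection(h1, h2):
    # Histogram intersection: 1.0 means identical color distributions.
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy images: mostly-red query, a similar pinkish image, and a blue one.
red_img = [(250, 10, 10)] * 100
pink_img = [(250, 10, 10)] * 80 + [(250, 200, 200)] * 20
blue_img = [(10, 10, 250)] * 100

query = color_histogram(red_img)
scores = {name: intersection(query, color_histogram(img))
          for name, img in [("pink", pink_img), ("blue", blue_img)]}
```

In the study's setting, each image would first pass through the Retinex algorithm before its histogram (or coherence vector, or moment set) is computed.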
Using positive and negative examples for precise image retrieval
Jurgen Assfalg, Alberto Del Bimbo, Pietro Pala
Systems for content-based image retrieval typically support access to database images through the query-by-example paradigm. This includes query-by-image and query-by-sketch. Since query-by-sketch can be difficult in some cases (lack of sketching ability, difficulty in detecting distinguishing image features), querying is generally performed through the query-by-image paradigm. A limiting factor of this paradigm is that a single sample image rarely includes all, and only, the characterizing elements the user is looking for. Querying using multiple examples is a possible solution to overcome this limitation. In this paper some issues and solutions for retrieval by content using positive and negative examples are presented and discussed.
Dynamic multiscale image classification
Alfons H. Salden, Sorin Marcel Iacob
We present and demonstrate a mathematical, physical, and logical framework for classifying images at various scales (dynamic and spatio-temporal resolutions) such that Internet requirements concerning, e.g., MPEG-7/21 standards and available bandwidth are met. The mathematical and physical framework hinges on the (de)categorification (simplification and abstraction) of the dynamics involved in image formation and of the Internet requirements at various scales. Firstly, the dynamics are categorified by an initialization of physical fields, such as color models, subjected to a gauge group capturing various imaging conditions. A decategorification of those fields consists of joint (non-)local geometric and topological equivalences (symmetries or invariants). Secondly, categorifications of dynamic scale-space paradigms for these equivalences are derived, incorporating Internet requirements. These paradigms are set up to be robust to particular imaging conditions, to Lyapunov instabilities (noise) in image formation, and to structural instabilities due to, e.g., changes in Internet requirements. The logical framework consists of a decategorification of the various dynamic scale-space paradigms and their evolutions caused by changing Internet requirements in terms of (non-)local symmetries, conservation laws, and curvatures. Simple examples of (de)categorifications of dynamic scale-space paradigms taking Internet requirements into account are presented.
Metadata
Image retrieval and semiautomatic annotation scheme for large image databases on the Web
Xingquan Zhu, Wenyin Liu, HongJiang Zhang, et al.
Image annotation is used in traditional image database systems. However, without the help of human beings, it is very difficult to extract the semantic content of an image automatically. On the other hand, it is tedious work to annotate images in large databases one by one manually. In this paper, we present a web-based semi-automatic annotation and image retrieval scheme, which integrates image search and image annotation seamlessly and effectively. In this scheme, we use both low-level features and high-level semantics to measure similarity between images in an image database. A relevance feedback process at both levels is used to refine similarity assessment. The annotation process is activated when the user provides feedback on the retrieved images. With the help of the proposed similarity metrics and the relevance feedback approach at these two levels, the system can find those images that are relevant to the user's keyword or image query more efficiently. Experimental results have shown that our scheme is effective and efficient and can be used in large image databases for image annotation and retrieval.
Classifying images using multiple binary-class decision trees for object-based image retrieval
Linhui Jia, Leslie Kitchen
This paper describes an approach to multiclass object classification using a local-information-based invariant object-contour representation and a combination of one-per-class binary-class decision tree classifiers. The object representation scheme is based on polygonal approximations of object contours. C4.5 is used to learn each of the binary-class tree classifiers, which are used to predict the class of each segment of an object. A new decision combination method is used to determine the class of an object based on the class probability distribution of each segment of the object on each of the binary-class trees. The proposed object classification approach is invariant to translation, rotation, and scale changes of objects. Applied to a hand tool image database in an image retrieval setting, the experimental results show that the retrieval performance is significantly better than the results obtained in previous studies.
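The combination step can be sketched as summing per-segment class probability distributions and taking the argmax. The segment distributions and class names below are hypothetical, standing in for the outputs of the C4.5 binary-class trees:

```python
# Hypothetical class probability distributions for three contour segments,
# as would be produced by the one-per-class binary trees.
segment_probs = [
    {"hammer": 0.7, "wrench": 0.2, "pliers": 0.1},
    {"hammer": 0.4, "wrench": 0.5, "pliers": 0.1},
    {"hammer": 0.6, "wrench": 0.1, "pliers": 0.3},
]

def combine(segments):
    # Sum the per-segment class probabilities; the object's predicted
    # class is the one with the largest accumulated score.
    totals = {}
    for dist in segments:
        for cls, p in dist.items():
            totals[cls] = totals.get(cls, 0.0) + p
    return max(totals, key=totals.get), totals

label, totals = combine(segment_probs)
```

Because each segment votes independently, a few misclassified segments (here, the second segment leaning toward "wrench") need not change the object-level decision.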
Multimedia database system with embedding MPEG-7 meta data
In this paper, a multimedia database system using MPEG-7 metadata is proposed. A multimedia content-based retrieval system is implemented with the MPEG-7 metadata by use of a data hiding technique. MPEG-7 descriptors and description schemes are hidden in the original data using data hiding and watermarking techniques. The hidden data are used as a query for the multimedia indexing/retrieval system. In this paper, color and texture descriptors and their description scheme are used for the MPEG-7 multimedia database. To verify the usefulness of the proposed descriptors for characterizing texture content, computer simulations and experiments with the MPEG-7 image database were performed.
Accessing textual information embedded in Internet images
Indexing and searching of WWW pages relies on analyzing text. Current technology cannot process the text embedded in images on WWW pages. This paper argues that this is a significant problem, as text in image form is usually semantically important (e.g., headers, titles). The results of a recent study are presented to show that the majority (76%) of words embedded in images do not appear elsewhere in the main text, and that the majority (56%) of ALT tag descriptions of images are incorrect or do not exist at all. Research under way to devise tools to extract text from images based on the way humans perceive color differences is outlined, and results are presented.
Thematico-visual image retrieval: how to deal with partially indexed corpora
Gerald Duffing
It has become very easy to access large amounts of images when surfing the Internet. Not all images, however, are thematically indexed. We think that partially thematically indexed corpora can be organized in a way that facilitates retrieval. We assume that, concerning visual properties, the corpus is fully indexed by means of generic features. Based on these indexes, a hierarchical clustering technique is used to bring together images that share some similarities: two distinct structures (dendrograms) are built. We propose a new retrieval strategy based on a virtual image that captures the user's need along the retrieval session, taking into account both thematic and visual aspects. Clusters are successively selected in each dendrogram. A combined method, called tunnels, allows the dendrograms to cooperate. Images are then ranked according to the virtual image. After each retrieval step, the virtual image is enriched within a relevance feedback process. The theme, color, and general layout of each image can be rated, and the query is updated accordingly. In our experiments, we used two different corpora (2470 and 1100 images) to assess the performance of our thematico-visual approach under different indexing conditions. Experimental results confirm the relevance of our approach and suggest possibilities for improvement.
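The enrichment of the virtual image by relevance feedback can be sketched as a weighted update of a feature vector, in the spirit of Rocchio-style feedback. The feature names and mixing weight below are assumptions for illustration, not the paper's actual formulation:

```python
def update_virtual_image(virtual, rated, alpha=0.7):
    # Move each feature of the virtual image toward the user's ratings:
    # `rated` maps a feature name to the value taken from a rated image.
    # alpha controls how much of the previous virtual image is retained.
    return {f: alpha * virtual.get(f, 0.0) + (1 - alpha) * v
            for f, v in rated.items()}

# Hypothetical feature vector mixing thematic and visual aspects.
virtual = {"theme:portrait": 1.0, "color:warm": 0.2, "layout:centered": 0.5}

# The user rates a retrieved image: theme and warm colors relevant,
# centered layout not relevant.
feedback = {"theme:portrait": 1.0, "color:warm": 1.0, "layout:centered": 0.0}

virtual = update_virtual_image(virtual, feedback)
```

Candidate images would then be re-ranked against the updated virtual image after each feedback step.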
Visualization
Interactive visualization of energy consumption using VRML
Tomohiro Kuroda, Atsushi Nakamura, Yoshiyuki Kojima, et al.
With the growing demand for energy saving, city planners should consider the efficiency of energy consumption from the beginning. Diversified analysis of end-use energy consumption is indispensable for the exploration of a desirable energy system in an urban area. When the visualization is available on the Internet, city planners can discuss given plans freely on the Internet and can ask for the help and comments of knowledgeable people. This paper proposes a VR-based interactive visualization system utilizing the hyperlink function of VRML. The proposed visualization relates end-use energy consumption to consumers' geometrical arrangements and nests sets of visualizations. City planners can observe them in a virtual environment over the Internet. The proposed system was applied to a set of end-use electric power consumption data for a certain area. Experimental results show that the visualization lets users comprehend end-use trends and the characteristics of each consumer.
Photorealistic rendering over the Internet for restoration support of ancient buildings
Maurizio Rossi, Daniele Marini, Alessandro Rizzi
In the field of cultural heritage restoration, experts are extremely interested in the analysis of the large amounts of data describing the condition and history of ancient monuments. In this paper we describe a method, and its implementation, for providing high-quality photorealistic image synthesis of ancient buildings over the Internet through VRML and Java technology. A network-based Java application manages geometric 3D VRML models of an ancient building to provide an interface for adding information and for computing high-quality photorealistic snapshots of the entire model or any of its parts. The poor-quality VRML real-time rendering is complemented by a slower but more accurate rendering computed on a radiometric basis. The input data for this advanced rendering is taken from the geometric VRML model. We have also implemented some extensions to provide spectral data, including measurements of light and materials obtained experimentally. The interface to access the ancient building database is the descriptive VRML model itself. The Java application enhances interaction with the model to provide and manage high-quality images that allow visual qualitative evaluation of restoration hypotheses, by providing a tool to improve the appearance of the resulting image under assigned lighting conditions.
Indexing VRML objects with triples
YongCheol Yang, JaeDong Yang, HyungJeong Yang, et al.
In this paper, we propose a triple-based indexing model for the retrieval of VRML (Virtual Reality Modeling Language) objects. A VRML object is basically a VRML class, which can be presented on the WWW through 3D visualization tools. Since a composite VRML object consists of several constituent VRML objects having a 3D spatial structure, an indexing mechanism is crucial to fully describe the characteristics of the spatial structure. Triples are used to index the composite VRML object by representing its structure in terms of its constituents and their relative directions. To enhance recall, the proposed model also provides a mechanism to match two objects that are equivalent up to an appropriate rotation transformation. This model is an attempt towards concept-based 3D object retrieval.
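The rotation-tolerant triple matching can be sketched in a simplified form with four compass directions, where a rotation cyclically shifts the direction labels. The objects and triples below are invented for illustration; the paper's model works with 3D directions:

```python
DIRS = ["N", "E", "S", "W"]

def rotate(triples, quarter_turns):
    # Rotate every direction label by 90-degree steps.
    def rot(d):
        return DIRS[(DIRS.index(d) + quarter_turns) % 4]
    return {(a, rot(d), b) for (a, d, b) in triples}

def equivalent(t1, t2):
    # Two objects match if some rotation makes their triple sets equal.
    return any(rotate(t1, k) == set(t2) for k in range(4))

# Hypothetical composite objects indexed as (constituent, direction, constituent).
chair = {("back", "N", "seat"), ("leg", "S", "seat")}
chair_rotated = {("back", "E", "seat"), ("leg", "W", "seat")}
table = {("top", "N", "leg")}
```

Storing triples rather than absolute coordinates is what makes this comparison cheap: equivalence reduces to set equality under a small group of label rotations.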
Mathematical morphology three-dimensional binary image representation
Dragos Nicolae Vizireanu, A. Vizireanu, V. Lazarescu
This research addresses the representation of 3D binary images by means of mathematical morphology. In this work, the main image representation is called the structure representation, which is useful for 3D binary image compression. It consists of first finding the balls (or other predefined 3D convex shapes) of the greatest size contained inside the 3D shape, then taking the residue (set difference) between the original 3D shape and those greatest balls, and finally reiterating the above procedure on the residue until the whole shape is decomposed. The resulting decomposition elements are, therefore, disjoint, and the original shape can be perfectly reconstructed from them. The 3D structure representation is generalized to extend the scope of its algebraic characteristics as much as possible.
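The greedy decomposition loop can be sketched on a voxel set, using Chebyshev balls (cubes) as the predefined convex shape. The toy shape below is an assumption for illustration:

```python
from itertools import product

def ball(center, r):
    # Chebyshev "ball": a cube of radius r around center
    # (a simple stand-in for the predefined 3D convex shapes).
    cx, cy, cz = center
    return {(cx + dx, cy + dy, cz + dz)
            for dx, dy, dz in product(range(-r, r + 1), repeat=3)}

def largest_ball(shape):
    # Find a center and radius of the biggest ball contained in the shape.
    best = None
    for c in shape:
        r = 0
        while ball(c, r + 1) <= shape:
            r += 1
        if best is None or r > best[1]:
            best = (c, r)
    return best

def decompose(shape):
    # Greedily remove the largest contained ball until nothing is left;
    # the removed balls are disjoint and their union is the original shape.
    shape = set(shape)
    parts = []
    while shape:
        c, r = largest_ball(shape)
        parts.append((c, r))
        shape -= ball(c, r)
    return parts

# A 3x3x3 cube with one extra voxel attached to its side.
shape = set(product(range(3), repeat=3)) | {(3, 1, 1)}
parts = decompose(shape)
reconstructed = set().union(*(ball(c, r) for c, r in parts))
```

The (center, radius) pairs are the compressed representation: here the cube collapses to a single radius-1 ball plus a radius-0 ball for the extra voxel.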
Benchmark for image retrieval using distributed systems over the Internet: BIRDS-I
Comparing the performance of CBIR (Content-Based Image Retrieval) algorithms is difficult. Private data sets are used, so it is controversial to compare CBIR algorithms developed by different researchers. Also, the performance of CBIR algorithms is usually measured on an isolated, well-tuned PC or workstation. In a real-world environment, however, the CBIR algorithms would only constitute a minor component among the many interacting components needed to facilitate a useful CBIR application, e.g., Web-based applications on the Internet. The Internet, being a shared medium, dramatically changes many of the usual assumptions about measuring CBIR performance. Any CBIR benchmark should be designed from a networked-systems standpoint. Networked-system benchmarks have been developed for other applications, e.g., text retrieval and relational database management. These benchmarks typically introduce communication overhead because the real systems they model are distributed applications, e.g., an airline reservation system. The most common type of distributed computing architecture uses a client/server model. We present our implementation of a client/server CBIR benchmark called BIRDS-I (Benchmark for Image Retrieval using Distributed Systems over the Internet) to measure image retrieval performance over the Internet. The BIRDS-I benchmark has been designed with the trend toward the use of small personalized wireless Internet systems in mind. Web-based CBIR implies the use of heterogeneous image sets, and this, in turn, imposes certain constraints on how the images are organized and on the type of performance metrics that are applicable. Surprisingly, BIRDS-I only requires controlled human intervention for the compilation of the image collection and none for the generation of ground truth in the measurement of retrieval accuracy.
Benchmark image collections need to be evolved incrementally toward the storage of millions of images, and such scale-up can only be achieved through the use of computer-aided compilation. Finally, the BIRDS-I scoring metric introduces a tightly optimized image-ranking window, which is important for the future benchmarking of large-scale personalized wireless Internet CBIR systems.
Telepresence and Collaborative Design
Low-cost system for real-time integration of virtual and real distributed environments
Nello Balossino, Fulvio Bresciani, Maurizio Lucenteforte, et al.
As is known, the term virtual studio usually refers to the integration of synthetic and real environments: if an action occurs in the real world, the same action must be reproduced in the artificial scene. The integration may be realized using one of the available systems, which are generally characterized by advanced and expensive technology. Our work proposes an approach with two important advantages: low cost and the possibility of supporting collaborative work. The system consists of a site named SM (generator of Synthetic worlds Meshing between real and virtual worlds) and a differently located site indicated as V. The SM system is a powerful graphics-oriented machine, which is able both to perform highly realistic real-time rendering of a complex virtual world and to mesh the virtual scene with the video signal received from the V system. We suppose the V system is characterized by a uniform background and a subject captured by a web cam whose video frames are sent to the SM system. In order to obtain the right information about the position of each video camera in the real-world coordinate system and the zoom parameters, we propose an easy approach based on detecting the shape variations of a flag, of known aspect and dimensions, placed at a defined position against the uniform background. This means that in a particular frame the scene modifications are codified in a few parameters related to the flag variations, so the integration between real and virtual becomes easy. The mesh results are sent to V, while just the selected meshed image is available to a generic user connected to the net service. The system may be applied in different contexts, for example video conferences and multiplayer virtual sets.
Very low rate video processing
Web cameras are becoming more and more common on the Internet, and the technology is ready to make cameras a standard accessory of any computer. The development of applications, however, hasn't followed the explosive diffusion of the cameras. Problems in developing applications for remote web cameras come from the low image quality that they generally provide and from the fact that, unless the application runs locally, the image data are only available at a very low frame rate (typically between 10 sec/image and 30 min/image). New image analysis and processing techniques are needed to take advantage of the opportunity represented by web cameras. This paper presents some early considerations and techniques to deal with what I call Very Low Rate Video (VLRV). Certain operations of fundamental importance in vision, such as motion detection, are impossible in VLRV, due to the large interval between consecutive images. Other operations, like color processing, are made difficult by the low quality and temporal instability of the images. The paper presents techniques to deal with different processes with different time constants, and tries to determine the limits of what is feasible using one web camera and using a whole collection of web cameras.
Collaborative design using distributed virtual reality over the Internet
Fabien Costantini, Christian Toinard, Nicolas Chevassus, et al.
Efficient collaborative virtual environments are missing. First, current solutions do not support mobility to move easily from disconnected work to a meeting. Second, they either do not preserve consistency or they limit parallel working. Third, a client-server approach is inefficient in many ways: it introduces a bottleneck and a point of failure in the system. Finally, requiring a specific Quality of Service (QoS) from the underlying network limits the ease of deployment. This paper addresses these shortcomings. It enables a global scene tree to be distributed among several private spaces. A worker carries out disconnected work to improve his private space while satisfying protection rules. These different puzzle pieces assemble automatically into a global scene tree during a meeting. Workers modify the shared scene in real time. Real-time awareness, parallel working and work persistency are provided. A consistency property guarantees the work progression. The solution is fully distributed. Full replication and multicasting improve performance. A connection facility solves the connectivity problem: it switches automatically from multicast to point-to-point. Finally, security is addressed. Standard, secure email enables authentication and distribution of a session key. A re-keying protocol assures confidentiality without requiring communicating entities to process X.509 certificates.
Distributed monitoring architecture for movable objects over the GSM system
Jui-fa Chen, Wei-Chuan Lin, Chi-Ming Chung, et al.
With the popularity of networks, many applications have been designed as distributed systems. The greater the communication distance, the more hardware and lines are needed. As the GSM system is widely used for telecommunication, it can also be applied to wireless monitoring. This paper proposes a wireless monitoring architecture based on the GSM system and takes a car system as a test case. The proposed architecture is divided into three parts. First, the data collected from the sensors are sent to the GSM provider via simulated GSM messages. Second, the GSM provider processes the received data. To reduce transmission traffic, a dead-reckoning algorithm is applied to decide whether the processed result should be sent to the monitoring center or not. Third, the monitoring center receives the data sent from the GSM provider and gives advice to the car driver. With this proposed architecture, a wireless monitoring architecture based on the GSM system is verified. In addition, the monitoring center can be combined with a GIS to display the car status on an electronic map.
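The abstract does not specify the dead-reckoning criterion it uses. A minimal sketch of the standard idea, assuming a linear extrapolation model and a fixed error threshold (both illustrative), is: extrapolate the vehicle's position from the last transmitted state, and transmit a new update only when the actual position deviates beyond the threshold:

```python
import math

def should_transmit(last_sent, velocity, dt, actual, threshold):
    """Dead reckoning: extrapolate from the last transmitted position
    (last_sent) using the last reported velocity over dt seconds, and
    report only if the actual position deviates beyond threshold."""
    pred_x = last_sent[0] + velocity[0] * dt
    pred_y = last_sent[1] + velocity[1] * dt
    error = math.hypot(actual[0] - pred_x, actual[1] - pred_y)
    return error > threshold
```

When the car follows the predicted path closely, no message is sent, which is how the scheme reduces traffic to the monitoring center.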
Design of a MPEG-4-based multimedia e-mail system
Guoyin Wang, Li Liao
In this paper, we discuss the design of a multimedia e-mail system based on MPEG-4 compression technology. As our tests in an ISDN context indicate, MPEG-4 gives compression ratios of up to 130:1 without discernible quality deterioration in multimedia e-mail applications, where video is characterized by infrequent and slow motion. This is an exciting improvement over other compressors in this area. A performance comparison of our system with other similar systems is also given in this paper. In our multimedia e-mail system, the messaging subsystem is implemented with the Messaging Application Programming Interface (MAPI). A special architecture for the multimedia e-mail package is also presented.
Medical Applications
Telepresence in ear, nose, and throat surgery
Wolfgang Freysinger, Andreas R. Gunkel, Walter F. Thumfart, et al.
The intraoperative orientation of a surgeon during video-endoscopic endonasal procedures is a challenge. Individual anatomical knowledge and modern 3D computer-assisted navigation technologies provide a maximum of information during surgery (i.e., patient safety). As such, the position of a tool is visualized in the preoperative radiologic images as the center of cross-hairs in typical axial, coronal and sagittal views of, e.g., a stack of CT images. We have implemented augmented reality by superimposing the positional data and additional guiding structures (access paths and delicate structures) on the live video of the surgical site. The currently available telecommunication infrastructure makes it possible to connect any two locations in order to facilitate remotely proctored preoperative planning, consultation and guidance. This provides a maximum of intraoperative information, together with expert advice from a remote specialist. We have achieved satisfactory results over telephone, ISDN, Ethernet and ATM connections and could demonstrate that the ARTMA technology provides essential information for a remote expert, who shares the same information as the local surgeon, and can be an essential aid for difficult surgical interventions. ARTMA Knowledge Guided Surgery can become an important tool for further optimizing surgery.
Web tools for effective retrieval, visualization, and evaluation of cardiology medical images and records
Marco Masseroli, Francesco Pinciroli
To provide easy retrieval, integration and evaluation of multimodal cardiology images and data in a web browser environment, distributed application technologies and Java programming were used to implement a client-server architecture based on software agents. The server side manages secure connections and queries to heterogeneous remote databases and file systems containing patients' personal and clinical data. The client side is a Java applet running in a web browser that provides a friendly medical user interface to perform queries on patient and medical test data and to integrate and properly visualize the various query results. A set of tools based on the Java Advanced Imaging API makes it possible to process and analyze the retrieved cardiology images and quantify their features in different regions of interest. The platform independence of Java technology makes the developed prototype easy to manage in a centralized form and to deploy at any site with an intranet or Internet connection. By giving healthcare providers effective tools for querying, visualizing and comprehensively evaluating cardiology medical images and records in all locations where they may need them (i.e., emergency rooms, operating theaters, wards, or even outpatient clinics), the developed prototype represents an important aid in providing more efficient diagnoses and medical treatments.
Acquisition and review of diagnostic images for use in medical research and medical testing examinations via the Internet
Mark A. Pauley, Glenn V. Dalrymple, Quiming Zhu, et al.
With the continued centralization of medical care into large, regional centers, there is a growing need for a flexible, inexpensive, and secure system to rapidly provide referring physicians in the field with the results of the sophisticated medical tests performed at these facilities. Furthermore, the medical community has long recognized the need for a system with similar characteristics to maintain and upgrade patient case sets for oral and written student examinations. With the move toward filmless radiographic instrumentation and the widespread and growing use of digital methods and the Internet, both of these processes can now be realized. This article describes the conceptual development and testing of a protocol that allows users to transmit, modify, remotely store and display the images and textual information of medical cases via the Internet. We also discuss some of the legal issues we encountered regarding the transmission of medical information; these issues have had a direct impact on the implementation of the results of this project.
Visible human slice sequence animation Web server
Jean-Christophe Bessaud, Roger David Hersch
Since June 1998, EPFL's Visible Human Slice Server (http://visiblehuman.epfl.ch) has allowed users to extract arbitrarily oriented and positioned slices. More than 300,000 slices are extracted each year. In order to give a 3D view of anatomic structures, a new service has been added for extracting slice animations along a user-defined trajectory. This service is useful both for research and teaching purposes (http://visiblehuman.epfl.ch/animation/). Extracting animation slices at any desired position and orientation from the Visible Human volume (Visible Man or Woman) requires both high I/O throughput and much processing power. The I/O disk bandwidth can be increased by accessing more than one disk at the same time, i.e. by striping data across several disks and by carrying out parallel asynchronous disk accesses. Since processing operations such as slice and animation extraction are compute-intensive, they require the program execution to be carried out in parallel on several computers. In the present contribution, we describe the new slice sequence animation service as well as the approach taken for parallelizing this service on a multi-PC, multi-disk Web server.
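The striping idea can be sketched as follows. This is not the server's actual implementation (which uses parallel asynchronous accesses across a multi-PC cluster); it is a minimal single-machine illustration, with threads standing in for the parallel disk accesses and all names chosen for the example:

```python
from concurrent.futures import ThreadPoolExecutor

def read_striped(disk_files, stripe_size, offset, length):
    """Read `length` bytes at logical `offset` from data striped
    round-robin in `stripe_size` chunks across several files
    (one per disk), issuing the per-stripe reads in parallel."""
    def read_stripe(i):
        # Logical stripe i lives on disk i % n, at stripe slot i // n.
        path = disk_files[i % len(disk_files)]
        pos = (i // len(disk_files)) * stripe_size
        with open(path, "rb") as f:
            f.seek(pos)
            return f.read(stripe_size)

    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    with ThreadPoolExecutor(max_workers=len(disk_files)) as pool:
        stripes = pool.map(read_stripe, range(first, last + 1))
    data = b"".join(stripes)
    start = offset - first * stripe_size
    return data[start:start + length]
```

Because consecutive stripes live on different disks, a single logical read fans out into several concurrent physical reads, which is what raises the aggregate I/O bandwidth.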
3D foveated visualization on the Web
John Schermann, John L. Barron, Irene A. Gargantini
Recent developments in Internet technology, combined with the computerization of hospital radiology departments, allow the remote viewing of medical image data. Generally, however, medical images are data intensive, and the transmission of such images over a network can consume large amounts of network resources. Previous work by Liptay et al. presented an interactive, progressive program (implemented in Java and requiring a web browser) that allowed the transmission of multi-resolution JPEG image data using various ROI (Region of Interest) strategies in order to minimize Internet bandwidth requirements. This work handled both 2D and 3D image data, but 3D data was treated as a sequence of 2D images, where each 2D image had to be individually requested by the system. The work described in this paper replaces the representation of 3D data as a 2D JPEG image sequence with a single block of lossy 3D image data compressed using wavelets. In a similar fashion, 2D image data is wavelet compressed. Wavelet decomposition has been shown to have consistently better image quality at high compression ratios than other lossy compression methods. We use wavelet compression in a Java application program on the server side to construct a lossy low-resolution version of the data. The Java application also creates high-resolution difference sub-blocks; a difference sub-block combined with the corresponding low-resolution data yields a lossless reconstruction. Transmitting the low-resolution image and difference sub-blocks (as requested) requires only a small fraction of the network bandwidth that would otherwise be needed to transmit the entire lossless data set. The user, via a Java applet on the client side, is provided with a number of methods to choose a trajectory (sequence) of regions of interest in the low-resolution image.
Once the regions of interest are chosen, the sub-blocks of image data in the various trajectories are retrieved and integrated into the low-resolution display to provide lossless reconstruction in the regions of interest. Our program significantly reduces download time, since extraneous information is not transmitted.
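The low-resolution-plus-residual scheme can be sketched in a few lines. This is a deliberately simplified stand-in for the paper's wavelet codec: the low-resolution image here is just a 2x2 block average (the LL band of a one-level Haar decomposition), and all function names are illustrative:

```python
def downsample(img):
    """Lossy low-resolution version: average each 2x2 block of a
    list-of-lists image with even dimensions."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x+1] + img[y+1][x] + img[y+1][x+1]) / 4.0
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def difference_block(img, low):
    """High-resolution residual: original minus upsampled low-res."""
    return [[img[y][x] - low[y // 2][x // 2]
             for x in range(len(img[0]))] for y in range(len(img))]

def reconstruct(low, diff):
    """Upsampled low-res plus residual recovers the original exactly."""
    return [[low[y // 2][x // 2] + diff[y][x]
             for x in range(len(diff[0]))] for y in range(len(diff))]
```

The client always holds the low-resolution version; only the residual sub-blocks covering the chosen regions of interest need to be fetched to make those regions lossless.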
Video Retrieval
Three-dimensional semantic object tracking in video sequences
Jean Gao, Akio Kosaka
One of the difficulties in semantic object tracking is to trace the object precisely as time goes on. In this paper, a system for 3D semantic object motion tracking is proposed. Different from other approaches, which have used regular shapes as the tracked region, our system starts with a specially designed Color Image Segmentation Editor (CISE) to devise shapes that more accurately describe the region of interest (ROI) to be tracked. CISE is an integration of edge and region detection, based on edge linking, split-and-merge, and energy minimization for active contour detection. An ROI is further segmented into single-motion blobs by considering the constancy of the motion parameters in each blob. The tracking of each blob is based on an extended Kalman filter derived from linearization of a constraint equation satisfied by the pinhole model of a camera. The Kalman filter allows the tracker to project the uncertainties associated with the blob feature points to the next frame. Feature point extraction is done by a similarity test based on an optimized semantic search region. Extracted feature points serially update the motion parameters. Experimental results show the different stages of the system.
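The paper's filter is an extended Kalman filter linearized from a pinhole-camera constraint, which the abstract does not spell out. As a minimal illustration of only the predict/update cycle it relies on, here is a scalar Kalman step under an assumed random-walk motion model (a simplification, not the authors' formulation):

```python
def kalman_step(x, P, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.
    x, P: prior state estimate and its variance
    z: new measurement; q, r: process and measurement noise variances."""
    # Predict: random-walk model keeps the state, uncertainty grows.
    x_pred = x
    P_pred = P + q
    # Update: blend in the measurement, weighted by the Kalman gain.
    K = P_pred / (P_pred + r)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1 - K) * P_pred
    return x_new, P_new
```

The growing variance `P_pred` is exactly the "projected uncertainty" the abstract mentions: it widens the search region for the blob's feature points in the next frame.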
Approach of sports programs classification with motion information in MPEG domain
Yuwen He, Shi-Qiang Yang, Yuzhuo Zhong
Classifying different sorts of programs is very important and necessary in order to realize fast retrieval in large multimedia retrieval systems. This paper focuses on the classification of sports programs using motion information in the MPEG domain. It is fast and efficient to analyze motion information in compressed data without the preprocessing of full decoding, and many programs are compressed in MPEG-1/MPEG-2 format. The paper proposes an approach to classify sports programs by dominant motion information. Motion information is present in forward-prediction-coded frames (P-frames) in MPEG compressed data and can be extracted directly from the MPEG domain. The principal component analysis method is utilized to get the dominant motion information from the macroblock motion vectors, which simplifies the motion information. Principal components of the motion information are used to recognize which kind of sport a program belongs to with a hidden Markov model, after the patterns are trained with different kinds of sports programs. Test sets are used to perform experiments in order to evaluate the performance of the method proposed in this paper. The experimental results show that classifying sports programs by motion is feasible.
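For 2D motion vectors, the principal component analysis step reduces to an eigen-decomposition of a 2x2 covariance matrix, which has a closed form. A minimal sketch of extracting the dominant motion direction from a set of macroblock motion vectors (illustrative only; the paper's full pipeline also trains HMMs on these components):

```python
import math

def dominant_motion_direction(vectors):
    """First principal component of 2D motion vectors: the unit
    eigenvector of the 2x2 covariance matrix with the larger eigenvalue."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / n
    syy = sum((v[1] - my) ** 2 for v in vectors) / n
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / n
    # Larger eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if abs(sxy) > 1e-12:
        ex, ey = lam - syy, sxy   # eigenvector for lam when sxy != 0
    else:
        ex, ey = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(ex, ey)
    return ex / norm, ey / norm
```

A mostly horizontal pan (e.g. soccer) yields a component near (1, 0), while diagonal camera motion yields a diagonal component; the projections onto such components form the low-dimensional features fed to the classifier.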
Motion feature extraction for content-based video sequence retrieval
In this paper we present a region-based approach for short-term motion analysis and retrieval of video sequences. Our feature extraction scheme converts the motion information of a video frame pair into a combination of different symbols. First, the system analyzes the global and local motion to get a dense optical flow field for every frame pair. The local optical flow field is segmented using an affine-model-based region growing method. The affine model parameters of the segmented regions, as well as the region size, form a 7-dimensional space, which is partitioned by a vector quantizer. Each region is then mapped to a codebook symbol of the quantizer. With a group of symbols representing each frame pair, we borrow the Vector Space Model and TF*IDF scoring from text document retrieval to index and retrieve the motion information. Preliminary experimental results are shown in the paper. Our approach is able to retrieve complex combinations of different motions in the video, and can easily be scaled up to form a shot-level descriptor as well as integrated with other video features.
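Once each frame pair is a bag of codebook symbols, TF*IDF scoring applies exactly as in text retrieval. A minimal sketch, assuming a standard TF*IDF weighting and cosine similarity (the paper does not state its exact variant):

```python
import math
from collections import Counter

def tfidf_index(docs):
    """docs: list of symbol lists (one per frame pair or shot).
    Returns one {symbol: tf*idf weight} dict per document."""
    n = len(docs)
    df = Counter()                       # document frequency per symbol
    for d in docs:
        df.update(set(d))
    index = []
    for d in docs:
        tf = Counter(d)
        index.append({s: (tf[s] / len(d)) * math.log(n / df[s])
                      for s in tf})
    return index

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(s, 0.0) for s, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Symbols occurring in every document get zero IDF weight, so only distinctive motion patterns drive the ranking, which is the point of borrowing the text-retrieval machinery.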
Color and Compression
Invariant variational principle for model-based interpolation of high-dimensional clustered data
Ravi C. Venkatesan
A self-consistent formulation for the model-based interpolation of high dimensional data, approximated by clusters, has been derived on the basis of the calculus of infinitesimal transformations. The model-based interpolation is represented in the form of a dynamical system, obtained as a consequence of Noether's theorem. The case for intra-cluster interpolation has been derived, and extensions to the case of mixture model interpolation are discussed. The present formulation has been proven to be computationally efficient. Numerical examples for exemplary cases are demonstrated.
Geometric compression with predictable compression ratio
Zisheng Le, David Y. Yun, Tianyu Lu
Due to the proliferation of 3D objects, which are often expensive to manipulate in computers and to transmit across the Internet, techniques of geometric compression are becoming increasingly important. Based on parallelogram coordinate prediction and connectivity compression, this paper presents a near-lossless, two-pass, triangular mesh compression algorithm that achieves a compression ratio ranging from 15:1 to 40:1. An average compression ratio better than 20:1 has also been derived from a large collection of 3D objects ranging from simple man-made shapes to complex natural objects. Several derived compression algorithms, taking advantage of fold angles, have been implemented as part of the compression ratio and algorithm comparison. A linear predictive formula has been derived that effectively foretells the compression ratio from a derivable parameter of the fold-angle histogram of any given 3D object model. The parameter x is defined as x = df/(tf - df), where df is the frequency of a dominant fold angle and tf is the total non-zero frequency. Experiments show that the predictive formula holds for most high-resolution models (>1000 points). This predictability of the compression ratio allows users to effectively predetermine the transmission time and the computing time requirements for any post-processing. Thus, the results presented here not only contribute algorithms for geometric compression that achieve good compression ratios, but also provide valuable predictability for dynamic or online applications.
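The predictor's input parameter is given explicitly in the abstract and is straightforward to compute from a fold-angle histogram. Note that the linear formula mapping x to a compression ratio uses fitted coefficients the abstract does not provide, so only x is computed here:

```python
def fold_angle_parameter(histogram):
    """x = df / (tf - df): df is the frequency of the dominant
    fold angle, tf the total non-zero frequency in the histogram."""
    nonzero = [f for f in histogram if f > 0]
    df = max(nonzero)
    tf = sum(nonzero)
    return df / (tf - df)
```

A model dominated by one fold angle (large df relative to tf) yields a large x, which according to the paper correlates with a higher achievable compression ratio.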
Scene detection for MPEG video sequences
Carmelo Lodato, Salvatore Lopes
The indexing process required by content-based video databases is correlated with the visual characteristics of their content. A preliminary step consists in partitioning the video into a sequence of short dynamic scenes, generally characterized by a set of homogeneous features. Each scene may therefore be characterized by the features of one or more representative frames, i.e. still images. In this paper, a method for scene detection in MPEG-1 and MPEG-2 video sequences is reported. The method does not need to decode the streams, because it is based on the analysis of their external characteristics, such as the frame pattern and the sizes of I-, P- and B-frames. Changes in the above characteristics are used to detect the frames representing each scene, using heuristics and statistical considerations. Since the analysis is based on very simple computations, the algorithm performs very fast: its computational cost is linearly dependent on the number of frames. On the other hand, the method is not well suited for video clips of short length, for which a statistical analysis is not significant. For its low computational cost and accuracy, the method is a suitable tool for the preliminary segmentation step of long video sequences.
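The core intuition (coded frame sizes jump when the content changes) can be sketched as follows. This is a crude stand-in for the paper's heuristics and statistical tests: it looks only at I-frame sizes, and the ratio threshold is illustrative:

```python
def detect_cuts(i_frame_sizes, ratio=1.5):
    """Flag a candidate scene change wherever an I-frame's coded size
    jumps by more than `ratio` relative to its predecessor. Returns the
    indices (into the I-frame list) where jumps occur."""
    cuts = []
    for k in range(1, len(i_frame_sizes)):
        prev, cur = i_frame_sizes[k - 1], i_frame_sizes[k]
        if cur > prev * ratio or prev > cur * ratio:
            cuts.append(k)
    return cuts
```

Because only the frame headers and sizes are inspected, no DCT decoding is needed, which is what makes the approach linear in the number of frames and very fast.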
What do you see in a digital color dot picture such as the Ishihara pseudo-isochromatic plates? Web Accessibility Palette (WAP)
Internet imaging is used for interactive visual communication. It is different from other electronic imaging fields because the imaging is transported from one client to many others. If you and I each had different color vision, we might see Internet imaging differently. So what do you see in a digital color dot picture such as the Ishihara pseudoisochromatic plates? The Ishihara pseudoisochromatic test is the most widely used screening test for red-green color deficiency. The full version contains 38 plates. Plates 18-21 are hidden digit designs. For example, plate 20 has a hidden digit design ('45') that cannot be seen by normal trichromats but can be distinguished by most color-deficient observers. In this study, we present a new digital color palette. This is the Web Accessibility Palette, with which the same information in Internet imaging can be seen correctly by people with any type of color vision. For this study, we have measured the Ishihara pseudoisochromatic test. We used the new Minolta 2D colorimeter system, CL1040i, which can define all pixels in a 4 cm x 4 cm square, to take measurements. From the results, color groups of 8 to 10 colors in the Ishihara plates can be seen on isochromatic lines of the CIE-xy color space. On each plate, the form of a number is composed of 4 colors and the background is composed of the remaining 5 colors. For normal trichromats, it is difficult to find the difference between the 4-color group which makes up the form of the number and the 5-color group of the background. We also found that for normal trichromats, colors like orange and red that are highly salient are included in the warm color group and are distinguished from the cool color group of blue, green and gray. From the results of our analysis of the Ishihara pseudoisochromatic test, we suggest a Web Accessibility Palette consisting of 4 colors.
Combining multiple image descriptions for browsing and retrieval
Retrieving images from large collections using image content is an important problem in this multimedia age. Quick content-based visual access to the stored images is essential for efficient navigation through image collections. In this paper we introduce several techniques which characterize color-homogeneous objects and their spatial relationships for efficient content-based image retrieval. We present a region growing technique for efficient segmentation of color-homogeneous objects and extend the 2D string to an accurate description of spatial information and relationships. In order to improve content-based image retrieval, our method emphasizes several objectives, such as: automated extraction of localized coherent regions and visual features, development of techniques for fast indexing and retrieval, and querying by both features and spatial information coupled with a symbolic level of image representation. We present our flexible image retrieval system and give some experimental results.
Analysis of color management systems (CMS) and measurement devices for monitors as instruments for calibration and profiling
Walter F. Steiger, Christopher M. Li
In view of the increased use of the Internet and cross-media publishing, one of the most important questions is: are current monitors consistent, and are the corresponding CMS good enough to reproduce a colored original or a proof? We have found that there is a large difference when displaying the same color patch on multiple monitors. Also, from CMS to CMS, there is variation between profiles in how the colors are displayed. The largest variations come from the CMS software that makes the ICC profile and from the monitor measuring devices used to calibrate and profile the monitor, i.e. the devices that generate the data from which the deciding color profiles are calculated. If different measuring devices generate different color data (from the very same color patch on a monitor), variations in the color profiles (e.g. for the very same monitor) will be the consequence. The readings of color measuring devices thus become all the more interesting because of their importance for color communication over the Internet. The difference between the color precision published by manufacturers of monitors, CMS and measuring devices (around delta E 1) and the color differences measured in this project (mostly between delta E 3 and 15) is cause for great concern.
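The abstract does not say which delta E formula underlies its figures; the simplest and oldest, CIE76, is just the Euclidean distance between two colors in CIELAB, and suffices to make the reported magnitudes concrete:

```python
import math

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two
    (L*, a*, b*) triples in CIELAB space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))
```

A delta E of about 1 is commonly taken as roughly the threshold of a just-noticeable difference, so measured differences of 3 to 15 between monitors are clearly visible to observers.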