Proceedings Volume 7240

Human Vision and Electronic Imaging XIV


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 28 January 2009
Contents: 12 Sessions, 60 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2009
Volume Number: 7240

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7240
  • Keynote Session
  • Social Software, Internet Experiments, and New Paradigms for the Web
  • Multimodal Interactive Environments
  • Haptics
  • High Dynamic Range
  • Video Perception and Quality
  • Region of Interest, Sharpness and Blurring
  • Image Analysis and Perception
  • 3D Perception, Environments, and Applications
  • Art and Perception
  • Interactive Paper Session
Front Matter: Volume 7240
Front Matter: Volume 7240
This PDF file contains the front matter associated with SPIE Proceedings Volume 7240, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Keynote Session
Towards a true spherical camera
Gurunandan Krishnan, Shree K. Nayar
We present the design of a spherical imaging system with the following properties: (i) A 4π field of view that enables it to "see" in all directions; (ii) a single center of projection to avoid parallax within the field of view; and (iii) a uniform spatial and angular resolution in order to achieve a uniform sampling of the field of view. Our design consists of a spherical (ball) lens encased within a spherical detector shell. The detector shell has a uniform distribution of sensing elements, but with free space between neighboring elements, thereby making the detector partly transparent to light. We determine the optimal dimensions of the sensing elements and the diameter of the detector shell that produce the most compact point spread function. The image captured with such a camera has minimal blur and can be deblurred using spherical deconvolution. Current solid state technologies do not permit the fabrication of a high resolution spherical detector array. Therefore, in order to verify our design, we have built a prototype spherical camera with a single sensing element, which can scan a spherical image one pixel at a time.
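The uniform-sampling requirement in (iii) can be made concrete with a standard construction. The sketch below uses a Fibonacci lattice, which distributes points near-uniformly over a sphere; this is an illustration of uniform spherical sampling in general, not the authors' actual detector layout.

```python
import math

def fibonacci_sphere(n):
    """Generate n near-uniformly distributed points on the unit sphere.

    Standard Fibonacci-lattice construction: z is sampled uniformly,
    and successive points are rotated by the golden angle.
    """
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle in radians
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # z spans (-1, 1) uniformly
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the z-slice
        theta = golden * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points
```

Because the z-coordinates are equally spaced and the azimuthal step is irrational with respect to full turns, the points avoid the pole-clustering of a latitude-longitude grid.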
Behavioral and neural correlates of visual preference decision
Shinsuke Shimojo
Three sets of findings are reported here, all related to behavioral and neural correlates of preference decision. First, when one is engaged in a preference decision task with free observation, one's gaze is biased towards the to-be-chosen stimulus (e.g., a face) long before one is consciously aware of the decision (the "gaze cascade effect"). Second, an fMRI study suggested that implicit activity in a subcortical structure (the nucleus accumbens) precedes the cognitive, conscious decision of preference. Third, both novelty and familiarity causally contribute to attractiveness, but differently across object categories (such as faces and natural scenes). Taken together, these results point to dynamic, implicit processes, over both the short and long term, leading towards conscious preference decisions. Finally, some discussion will be given on aesthetic decision (i.e., "beauty").
Social Software, Internet Experiments, and New Paradigms for the Web
Thousands of on-line observers is just the beginning
Web-based or on-line experiments are still a relatively new research topic. Laboratory or highly controlled experiments are, for many reasons, the preferred methodology for visual experiments. However, recent experiments suggest that on-line experiments have some unique properties and advantages that may in some cases outweigh or offset their disadvantages. This paper considers on-line experiments from both a general and a narrow perspective. Specifically, a range of experiments and on-line tools is surveyed from a broad vantage, and possible themes and considerations for such experiments are discussed.
Presentation of calibrated images over the web
This paper considers the problem of delivering calibrated images over the web with the precision appropriate for psychophysical experimentation. We are interested only in methods that might be employed by a remote participant possessing nothing other than a computer terminal. Therefore, we consider only purely psychophysical methods not requiring any measurement instruments or standards. Because of this limitation, there are certain things we cannot determine, the most significant of which is absolute luminance. We present solutions for three particular problems: linearization, also known as gamma correction; determination of the relative luminances of the display primaries; and colorimetry, i.e. determining the chromaticity of the primaries.
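The linearization step lends itself to a compact illustration. A standard psychophysical trick (sketched here under a pure power-law display model, which is an assumption; it is not necessarily the authors' exact procedure) has the observer match a uniform patch to a 50% spatial dither of full-on and full-off pixels, whose mean luminance is half of maximum. The matching digital value then determines gamma:

```python
import math

def linearize(v, gamma):
    """Display luminance (normalized) produced by normalized digital
    value v, under an assumed pure power-law response."""
    return v ** gamma

def gamma_from_match(v_match):
    """If a full-on/full-off dither (mean luminance 0.5) is visually
    matched by uniform digital value v_match, then v_match**gamma = 0.5,
    so gamma = ln(0.5) / ln(v_match)."""
    return math.log(0.5) / math.log(v_match)
```

For a typical display, a match near v ≈ 0.73 implies gamma ≈ 2.2; real displays deviate from a pure power law, which is why the purely psychophysical calibration the paper describes is needed.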
Tags, micro-tags and tag editing: improving internet search
Social tagging is an emerging methodology that allows individual users to assign semantic keywords to content on the web. Popular web services allow the community of users to search for content based on these user-defined tags. Tags are typically attached to a whole entity such as a web page (e.g., del.icio.us), a video (e.g., YouTube), a product description (e.g., Amazon) or a photograph (e.g., Flickr). However, finding specific information within a whole entity can be a difficult, time-intensive process. This is especially true for content such as video, where the information sought may be a small segment within a very long presentation. Moreover, the tags provided by a community of users may be incorrect, conflicting, or incomplete when used as search terms. In this paper we introduce a system that allows users to create "micro-tags," that is, semantic markers that are attached to subsets of information. These micro-tags give the user the ability to direct attention to specific subsets within a larger and more complex entity, and the set of micro-tags provides a more nuanced description of the full content. Also, when these micro-tags are used as search terms, there is no need to do a serial search of the content, since micro-tags draw attention to the semantic content of interest. This system also provides a mechanism that allows users in the community to edit and delete each other's tags, using the community to refine and improve tag quality. We will also report on empirical studies that demonstrate the value of micro-tagging and tag editing and explore the role micro-tags and tag editing will play in future applications.
Internet experiments: methods, guidelines, metadata
The Internet experiment is now a well-established and widely used method. The present paper describes guidelines for the proper conduct of Internet experiments, e.g. handling of dropout, unobtrusive naming of materials, and pre-testing. Several methods are presented that further increase the quality of Internet experiments and help to avoid frequent errors. These methods include the "seriousness check", "warm-up," "high hurdle," and "multiple site entry" techniques, control of multiple submissions, and control of motivational confounding. Finally, metadata from sites like WEXTOR (http://wextor.org) and the web experiment list (http://genpsylab-wexlist.uzh.ch/) are reported that show the current state of Internet-based research in terms of the distribution of fields, topics, and research designs used.
Multimodal Interactive Environments
Ecological optics of natural materials and light fields
The appearance of objects in scenes is determined by their shape, material properties and by the light field, and, conversely, the appearance of those objects provides us with cues about the shape, material properties and light field. The latter, so-called inverse problem is underdetermined and therefore suffers from interesting ambiguities. Interactions in the perception of shape, material, and luminous environment are thus bound to occur. Textures of illuminated rough materials depend strongly on the illumination and viewing directions. Luminance histogram-based measures such as the average luminance, its variance, shadow and highlight modes, and the contrast provide robust estimates with regard to the surface structure and the light field. Human observers' performance agrees well with predictions on the basis of such measures. If we also take into account the spatial structure of the texture, it is possible to estimate the illumination orientation locally. Image analysis on the basis of second-order statistics and human observers' estimates correspond well, and both are subject to the bas-relief and convex-concave ambiguities. The systematic, robust illuminance flow patterns of local illumination orientation estimates on rough 3D objects are an important entity for shape from shading and for light field estimates. Human observers are able to match and discriminate simple light field properties (e.g. average illumination direction and diffuseness) of objects and scenes, but they make systematic errors, which depend on material properties, object shapes and position in the scene. Moreover, our results show that perception of material and illumination are basically confounded. Detailed analysis of these confounds suggests that observers primarily attend to the low-pass structure of the light field. We measured and visualized this structure, which was found to vary smoothly in natural scenes both indoors and outdoors.
Stereoscopic displays in medical domains: a review of perception and performance effects
Maurice H. P. H. van Beurden, Gert van Hoey, Haralambos Hatzakis, et al.
In this paper we review empirical studies that investigate performance effects of stereoscopic displays for medical applications. We focus on four distinct application areas: diagnosis, pre-operative planning, minimally invasive surgery (MIS) and training/teaching. For diagnosis, stereoscopic displays can augment the understanding of complex spatial structures and increase the detection of abnormalities. Stereoscopic viewing of medical data has been shown to increase the detection rate in breast imaging. A stereoscopic presentation of noisy and transparent images in 3D ultrasound results in better visualization of the internal structures; however, more empirical studies are needed to confirm the clinical relevance. For MRI and CT, where images are frequently rendered in 3D perspective, the added value of binocular depth has not yet been convincingly demonstrated. For MIS, stereoscopic displays can decrease surgery time and increase the accuracy of surgical procedures. Performance of surgical procedures is similar when high-resolution 2D displays are compared with lower-resolution stereoscopic displays, indicating that stereoscopic presentation compensates for the lower resolution. Training and surgical planning already use computer simulations in 2D; however, more research is needed into the benefit of stereoscopic displays in those applications. Overall there is a clear need for more empirical evidence that quantifies the added value of stereoscopic displays in medical domains, such that the medical community will have ample basis to invest in stereoscopic displays in all or some of the described medical applications.
Roughness in sound and vision
René van Egmond, Paul Lemmens, Thrasyvoulos N. Pappas, et al.
In three experiments the perceived roughness of visual and auditory materials was investigated. In Experiment 1, the roughness of frequency-modulated tones was determined using a paired-comparison paradigm; the results obtained with this paradigm were similar to those reported in the literature. In Experiment 2, the perceived visual roughness of textures drawn from the CUReT database was determined; participants could systematically judge the roughness of the textures. In Experiment 3, the perceived pleasantness of the textures used in Experiment 2 was determined. Two groups of participants could be distinguished: one group found rough textures unpleasant and smooth textures pleasant, whereas the other group found rough textures pleasant and smooth textures unpleasant, although for the latter group the relation between relative roughness and perceived pleasantness was less strong.
Sign language perception research for improving automatic sign language recognition
Gineke A. ten Holt, Jeroen Arendsen, Huib de Ridder, et al.
Current automatic sign language recognition (ASLR) seldom uses perceptual knowledge about the recognition of sign language. Using such knowledge can improve ASLR because it can give an indication which elements or phases of a sign are important for its meaning. Also, the current generation of data-driven ASLR methods has shortcomings which may not be solvable without the use of knowledge on human sign language processing. Handling variation in the precise execution of signs is an example of such shortcomings: data-driven methods (which include almost all current methods) have difficulty recognizing signs that deviate too much from the examples that were used to train the method. Insight into human sign processing is needed to solve these problems. Perceptual research on sign language can provide such insights. This paper discusses knowledge derived from a set of sign perception experiments, and the application of such knowledge in ASLR. Among the findings are the facts that not all phases and elements of a sign are equally informative, that defining the 'correct' form for a sign is not trivial, and that statistical ASLR methods do not necessarily arrive at sign representations that resemble those of human beings. Apparently, current ASLR methods are quite different from human observers: their method of learning gives them different sign definitions, they regard each moment and element of a sign as equally important and they employ a single definition of 'correct' for all circumstances. If the object is for an ASLR method to handle natural sign language, then the insights from sign perception research must be integrated into ASLR.
Quantifying the effect of disruptions to temporal coherence on the intelligibility of compressed American Sign Language video
Frank M. Ciaramello, Sheila S. Hemami
Communication of American Sign Language (ASL) over mobile phones would be very beneficial to the Deaf community. ASL video encoded to achieve the rates provided by current cellular networks must be heavily compressed and appropriate assessment techniques are required to analyze the intelligibility of the compressed video. As an extension to a purely spatial measure of intelligibility, this paper quantifies the effect of temporal compression artifacts on sign language intelligibility. These artifacts can be the result of motion-compensation errors that distract the observer or frame rate reductions. They reduce the perception of smooth motion and disrupt the temporal coherence of the video. Motion-compensation errors that affect temporal coherence are identified by measuring the block-level correlation between co-located macroblocks in adjacent frames. The impact of frame rate reductions was quantified through experimental testing. A subjective study was performed in which fluent ASL participants rated the intelligibility of sequences encoded at a range of 5 different frame rates and with 3 different levels of distortion. The subjective data is used to parameterize an objective intelligibility measure which is highly correlated with subjective ratings at multiple frame rates.
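The block-level correlation measure can be sketched as follows. The Pearson form of the correlation and the 8×8 block size are assumptions for illustration; the paper's exact statistic may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    dy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / (dx * dy) if dx and dy else 0.0

def block_correlations(frame_a, frame_b, block=8):
    """Correlation between co-located blocks of two frames (2-D lists
    of luminance values). Blocks whose correlation drops between
    adjacent frames are candidates for disrupted temporal coherence.
    """
    h, w = len(frame_a), len(frame_a[0])
    corrs = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            xs = [frame_a[y][x] for y in range(by, by + block)
                  for x in range(bx, bx + block)]
            ys = [frame_b[y][x] for y in range(by, by + block)
                  for x in range(bx, bx + block)]
            corrs[(by, bx)] = pearson(xs, ys)
    return corrs
```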
Virtual microscopy: merging of computer mediated communication and intuitive interfacing
Huib de Ridder, Johanna G. de Ridder-Sluiter, Philip M. Kluin, et al.
Ubiquitous computing (or Ambient Intelligence) is an upcoming technology that is usually associated with futuristic smart environments in which information is available anytime, anywhere and with which humans can interact in a natural, multimodal way. However spectacular the corresponding scenarios may be, it is equally challenging to consider how this technology may enhance existing situations. This is illustrated by a case study from the Dutch medical field: central quality reviewing for pathology in child oncology. The main goal of the review is to assess the quality of the diagnosis based on patient material. The sharing of knowledge in social face-to-face interaction during such a meeting is an important advantage. At the same time there is the disadvantage that the experts from the seven Dutch academic medical centers have to travel to the review meeting, and that the logistics required to collect and bring patient material and data to the meeting are cumbersome and time-consuming. This paper focuses on how this time-consuming, inefficient way of reviewing can be replaced by a virtual collaboration system, merging technology supporting Computer Mediated Collaboration with intuitive interfacing. This requires insight into the preferred way of communication and collaboration, as well as knowledge about the preferred interaction style with a virtual shared workspace.
A model of memory for incidental learning
Roger A. Browse, Lisa Y. Drewell
This paper describes a radial basis memory system that is used to model the performance of human participants in a task of learning to traverse mazes in a virtual environment. The memory model is a multiple-trace system, in which each event is stored as a separate memory trace. In the modeling of the maze traversal task, the events that are stored as memories are the perceptions and decisions taken at the intersections of the maze. As the virtual agent traverses the maze, it makes decisions based upon all of its memories, but those that match best to the current perceptual situation, and which were successful in the past, have the greatest influence. As the agent carries out repeated attempts to traverse the same maze, memories of successful decisions accumulate, and performance gradually improves. The system uses only three free parameters, the most important of which is the standard deviation of the underlying Gaussian used as the radial basis function. It is demonstrated that adjustments of these parameters can easily result in exact modeling of the average human performance in the same task, and that variation of the parameters matches the variation in human performance. We conclude that human memory interaction that does not involve conscious memorization, as in learning navigation routes, may be much more primitive and simply explained than previously thought.
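The core of such a multiple-trace radial basis memory can be sketched in a few lines. The class and method names, the perception encoding, and the outcome coding below are illustrative assumptions, not the paper's implementation; only the mechanism — every trace votes, weighted by a Gaussian of its distance to the current percept — follows the description above.

```python
import math

class RadialBasisMemory:
    """Minimal multiple-trace memory: every event is stored verbatim,
    and decisions are a similarity-weighted vote over all traces.
    sigma (the Gaussian width) is the key free parameter."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma
        self.traces = []  # list of (perception_vector, action, outcome)

    def store(self, perception, action, outcome):
        """Record one event; outcome > 0 marks a successful decision."""
        self.traces.append((list(perception), action, outcome))

    def decide(self, perception, actions):
        """Pick the action whose supporting traces best match the
        current perception, weighted by past success."""
        scores = {a: 0.0 for a in actions}
        for p, a, outcome in self.traces:
            d2 = sum((x - y) ** 2 for x, y in zip(perception, p))
            w = math.exp(-d2 / (2.0 * self.sigma ** 2))  # radial basis weight
            scores[a] += w * outcome  # successful, similar traces pull harder
        return max(actions, key=lambda a: scores[a])
```

As traces of successful intersection decisions accumulate over repeated traversals, the weighted vote increasingly favors the route that worked, reproducing the gradual improvement described above.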
Haptics
Crossmodal information for visual and haptic discrimination
Flip Phillips, Eric J. L. Egan
Both our visual and haptic systems contribute to the perception of the three-dimensional world, especially the proximal perception of objects. The interaction of these systems has been the subject of some debate over the years, ranging from the philosophically posed Molyneux problem to the more pragmatic examination of their psychophysical relationship. To better understand the nature of this interaction we have performed a variety of experiments characterizing the detection, discrimination, and production of 3D shape. A stimulus set of 25 complex, natural-appearing, noisy 3D target objects was statistically specified in the Fourier domain and manufactured using a 3D printer. A series of paired-comparison experiments examined subjects' unimodal (visual-visual and haptic-haptic) and crossmodal (visual-haptic) perceptual abilities. Additionally, subjects sculpted objects using uni- or crossmodal source information. In all experiments, performance in the unimodal conditions was similar, and unimodal presentation fared better than crossmodal. Also, the spatial frequency of object features affected performance differentially across the range used in this experiment. The sculpted objects were scanned in 3D and the resulting geometry was compared metrically and statistically to the original stimuli. Objects with higher spatial frequency were harder to sculpt when limited to haptic input compared to only visual input; the opposite was found for objects with low spatial frequency. The psychophysical discrimination and comparison experiments yielded similar findings. There is a marked performance difference between the visual and haptic systems, and these differences were systematically distributed along the range of feature details. The existence of non-universal (i.e. modality-specific) representations explains the poor crossmodal performance. Our current findings suggest that either haptic and visual information are integrated into a multimodal form, or each is independent and a somewhat efficient translation between them is possible. Vision shows a distinct advantage when dealing with higher-frequency objects, but both modalities are effective when comparing objects that differ by a large amount.
The haptic cuing of visual spatial attention: evidence of a spotlight effect
Hong Z. Tan, Robert Gray, Charles Spence, et al.
This article provides an overview of an ongoing program of research designed to investigate the effectiveness of haptic cuing to redirect a user's visual spatial attention under various conditions using a visual change detection paradigm. Participants visually inspected displays consisting of rectangular horizontal and vertical elements in order to detect an orientation change in one of the elements. Prior to performing the visual task on each trial, the participants were tapped on the back from one of four locations by a vibrotactile stimulator. The validity of the haptic cues (i.e., the probability that the tactor location coincided with the quadrant where the visual target occurred) was varied. Response times were recorded and eye position was monitored with an eye tracker. Under conditions where the validity of the haptic cue was high (i.e., when the cue predicted the likely target quadrant), initial saccades predominantly went to the cued quadrant and response times were significantly faster as compared to the baseline condition where no haptic cuing was provided. When the cue validity was low (i.e., when the cue provided no information with regard to the quadrant in which the visual target might occur), however, the participants were able to ignore haptic cuing as instructed. Furthermore, a spotlight effect was observed in that the response time increased as the visual target moved away from the center of the cued quadrant. These results have implications for the designers of multimodal (or multisensory) interfaces where a user can benefit from haptic attentional cues in order to detect and/or process the information from a small region within a large and complex visual display.
Psychophysical evaluation of a variable friction tactile interface
Evren Samur, J. Edward Colgate, Michael A. Peshkin
This study explores the haptic rendering capabilities of a variable friction tactile interface through psychophysical experiments. In order to obtain a deeper understanding of the sensory resolution associated with the Tactile Pattern Display (TPaD), friction discrimination experiments are conducted. During the experiments, subjects are asked to explore the glass surface of the TPaD using their bare index fingers, to feel the friction on the surface, and to compare the slipperiness of two stimuli, displayed in sequential order. The fingertip position data is collected by an infrared frame and normal and translational forces applied by the finger are measured by force sensors attached to the TPaD. The recorded data is used to calculate the coefficient of friction between the fingertip and the TPaD. The experiments determine the just noticeable difference (JND) of friction coefficient for humans interacting with the TPaD.
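A JND of the kind measured here can be read off a psychometric function: the stimulus difference at which discrimination reaches a criterion proportion correct. The sketch below uses simple linear interpolation and a 75% criterion; these are generic psychophysics conventions, not necessarily the authors' fitting procedure.

```python
def jnd_from_proportions(deltas, p_correct, threshold=0.75):
    """Estimate the JND as the stimulus difference (e.g. in friction
    coefficient) at which the proportion of correct discriminations
    first reaches `threshold`, by linear interpolation between the
    two bracketing measurement points. Returns None if the threshold
    is never bracketed."""
    pairs = sorted(zip(deltas, p_correct))
    for (d0, p0), (d1, p1) in zip(pairs, pairs[1:]):
        if p0 <= threshold <= p1:
            t = (threshold - p0) / (p1 - p0)
            return d0 + t * (d1 - d0)
    return None
```

For example, if differences in friction coefficient of 0.02–0.08 yield 50%–95% correct responses, the interpolated 75% point gives the JND directly.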
Perceptual dimensions for a dynamic tactile display
Thrasyvoulos N. Pappas, Vivien C. Tartter, Andrew G. Seward, et al.
We propose a new approach for converting graphical and pictorial information into tactile patterns that can be displayed in a static or dynamic tactile device. The key components of the proposed approach are (1) an algorithm that segments a scene into perceptually uniform segments; (2) a procedure for generating perceptually distinct tactile patterns; and (3) a mapping of the visual textures of the segments into tactile textures that convey similar concepts. We used existing digital halftoning and other techniques to generate a wide variety of tactile textures. We then conducted formal and informal subjective tests with sighted (but visually blocked) and visually-impaired subjects to determine the ability of human tactile perception to perceive differences among them. In addition to generating perceptually distinguishable tactile patterns, our goal is to identify significant dimensions of tactile texture perception, which will make it possible to map different visual attributes into independent tactile attributes. Our experimental results indicate that it is possible to generate a number of perceptually distinguishable tactile patterns, and that different dimensions of tactile texture perception can indeed be identified.
Haptics disambiguates vision in the perception of pictorial relief
In this study we demonstrate that touch decreases the ambiguity in a visual image. It has been previously found that visual perception of three-dimensional shape is subject to certain variations. These variations can be described by an affine transformation. While the visual system thus seems unable to capture the Euclidean structure of a shape, touch could potentially be a useful source to disambiguate the image. Participants performed a so-called 'attitude task' from which the structure of the perceived three-dimensional shape was calculated. One group performed the task with only vision and a second group could touch the stimulus while viewing it. We found that the consistency within the haptics+vision group was higher than in the vision-only group. Thus, haptics decreases the visual ambiguity. Furthermore, we found that the touched shape was consistently perceived as having more relief than the untouched shape. It was also found that the direction of affine shear differences within the two groups was more consistent when touch was used. We thus show that haptics has a significant influence on the perception of pictorial relief.
High Dynamic Range
The dynamic range of visual imagery in space
Space operations present the human visual system with a wide dynamic range of images from faint stars and starlit shadows to un-attenuated sunlight. Lunar operations near the poles will result in low sun angles, exacerbating visual problems associated with shadowing and glare. We discuss the perceptual challenges these conditions will present to the human explorers, and consider some possible mitigations and countermeasures. We also discuss the problems of simulating these conditions for realistic training.
Exploring eye movements for tone mapped images
Marina Bloj, Glen Harding, Alan Chalmers
In the real world we can find large intensity ranges: the ratio from the brightest to the darkest part of the scene can be of the order of 10000 to 1. Since most of our electronic displays have a limited range of around 100 to 1, the last 20 years have seen much work done to develop different algorithms that compress the actual dynamic range of an image to that available in the display device. These algorithms, known as tone mappers, attempt to preserve as much of the image's characteristics as possible [1]. An increasing amount of research has also been done to try to evaluate the 'best' tone mapper. Approaches have included pairwise comparisons of tone mapped images [2], comparison with real scenes [3] or using images displayed on a High Dynamic Range (HDR) monitor [4]. None of these approaches is entirely satisfactory and all suffer from potential confounding factors due to participants' interpretation of instructions and biases. There is evidence that the spatial and chronological path of fixations made by observers when viewing an image (i.e. the scanpath) is repeated to some extent when the same image is again presented to the observer (e.g. [5]). In this paper we are the first to investigate the potential of using eye movement recordings, particularly scanpaths, as a discriminatory tool. We propose that if a tone-mapped image gives rise to scanpaths that are different from those obtained when viewing the original image, this might be an indication of a poor-quality tone mapper, since it is eliciting eye movements that differ from those observed when viewing the original image.
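Scanpath similarity is commonly quantified by coding each fixation into a region label and taking the string-edit (Levenshtein) distance between the resulting sequences. The sketch below shows the classic edit-distance computation; the region coding of fixations is assumed to happen upstream, and the paper's own comparison method may differ.

```python
def levenshtein(a, b):
    """Edit distance between two fixation-region strings (e.g. 'ABCB'
    vs 'ABDB'): the minimum number of insertions, deletions, and
    substitutions needed to turn one scanpath coding into the other.
    Uses the standard one-row dynamic programming formulation."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

A tone-mapped image whose scanpath coding sits at a large edit distance from the original image's scanpath would, on the paper's proposal, be evidence of a poor tone mapper.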
SS-SSIM and MS-SSIM for digital cinema applications
Fitri N. Rahayu, Ulrich Reiter, Touradj Ebrahimi, et al.
One of the key issues for a successful roll out of digital cinema is in the quality it offers. The most practical and least expensive way of measuring quality of multimedia content is through the use of objective metrics. In addition to the widely used objective quality metric peak signal-to-noise ratio (PSNR), recently other metrics such as single scale structural similarity (SS-SSIM) and multi scale structural similarity (MS-SSIM) have been claimed as good alternatives for estimation of perceived quality by human subjects. The goal of this paper is to verify by means of subjective tests the validity of such claims for digital cinema content and environment.
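The SSIM family compares mean luminance, contrast, and structure between reference and distorted images. The sketch below evaluates the SSIM formula globally over a whole gray patch for brevity; real SS-SSIM applies it in local (typically Gaussian-weighted) windows and averages, and MS-SSIM repeats this across scales, so this is a simplified illustration of the formula rather than either metric in full.

```python
def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global single-window SSIM between two equal-length flat lists of
    gray values in [0, 255]. c1 and c2 are the usual stabilizing
    constants from the SSIM definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical patches score 1.0; structural disagreement (negative covariance) pulls the score down, which is what makes SSIM more perceptually informative than a pure pixel-difference measure like PSNR.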
Measuring perceptual contrast in a multi-level framework
In this paper, we propose and discuss some approaches for measuring perceptual contrast in digital images. We start from previous algorithms, implementing different local measures of contrast and a parameterized way to recombine local contrast maps and color channels. We propose the idea of recombining the local contrast maps and the channels using particular measures taken from the image itself as weighting parameters. Exhaustive tests and results are presented and discussed; in particular, we compare the performance of each algorithm in relation to contrast perceived by observers. Current results show an improvement in correlation between contrast measures and observers' perceived contrast when the variance of each of the three color channels is used as the weighting parameter for the local contrast maps.
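The proposed weighting can be sketched directly: per-channel contrast measures are combined using each channel's own variance as its weight. The function names and the equal-weight fallback for flat images are illustrative assumptions; the local contrast measures themselves are computed upstream.

```python
def variance(xs):
    """Population variance of a flat list of channel values."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def weighted_contrast(channel_contrasts, channels):
    """Combine per-channel contrast measures into one image-level
    score, weighting each channel by its own variance -- the
    image-derived weighting idea described above (a sketch).

    channel_contrasts: one contrast value per color channel.
    channels: the corresponding flat lists of channel pixel values.
    """
    weights = [variance(c) for c in channels]
    total = sum(weights)
    if total == 0:  # completely flat image: fall back to a plain mean
        return sum(channel_contrasts) / len(channel_contrasts)
    return sum(w * c for w, c in zip(weights, channel_contrasts)) / total
```

A channel with no variation thus contributes nothing to the combined measure, while high-variance channels dominate it.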
A perceptual evaluation of 3D unsharp masking
Matthias Ihrke, Tobias Ritschel, Kaleigh Smith, et al.
Much research has gone into developing methods for enhancing the contrast of displayed 3D scenes. In the current study, we investigated the perceptual impact of an algorithm recently proposed by Ritschel et al. [1] that provides a general technique for enhancing the perceived contrast in synthesized scenes. Their algorithm extends traditional image-based Unsharp Masking to a 3D scene, achieving a scene-coherent enhancement. We conducted a standardized perceptual experiment to test the proposition that a 3D unsharp enhanced scene was superior to the original scene in terms of perceived contrast and preference. Furthermore, the impact of different settings of the algorithm's main parameters, enhancement strength (λ) and gradient size (σ), was studied in order to provide an estimate of a reasonable parameter space for the method. All participants preferred a clearly visible enhancement over the original, non-enhanced scenes, and the setting for objectionable enhancement was far above the preferred settings. The effect of the gradient size σ was negligible. The general pattern found for the parameters provides a useful guideline for designers when making use of 3D Unsharp Masking: as a rule of thumb they can easily determine the strength for which they start to perceive an enhancement and use twice this value for a good effect. Since the value for objectionable results was twice as large again, artifacts should not impose restrictions on the applicability of this rule.
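Classic image-space unsharp masking, which the 3D method generalizes, makes the two parameters concrete: the result is the signal plus λ times its difference from a Gaussian-blurred copy of width σ. The 1-D sketch below is an illustration of that operation only, not the scene-space algorithm of the paper.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius ~3*sigma."""
    radius = max(1, int(3 * sigma))
    ks = [math.exp(-(i * i) / (2 * sigma * sigma))
          for i in range(-radius, radius + 1)]
    s = sum(ks)
    return [k / s for k in ks]

def unsharp_1d(signal, strength, sigma):
    """Unsharp masking on a 1-D signal:
    enhanced = signal + strength * (signal - blurred),
    where `strength` plays the role of lambda and `sigma` the gradient
    size. Edges are handled by clamping indices."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    blurred = []
    for i in range(len(signal)):
        acc = sum(k[j + r] * signal[min(max(i + j, 0), len(signal) - 1)]
                  for j in range(-r, r + 1))
        blurred.append(acc)
    return [s + strength * (s - b) for s, b in zip(signal, blurred)]
```

Flat regions pass through unchanged, while edges overshoot on both sides — the local contrast boost whose perceptually preferred and objectionable strengths the experiment maps out.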
Influence of surround luminance upon perceived blackness
Tetsuya Eda, Yoshiki Koike, Sakurako Matsushima, et al.
In this study, we investigated how the ratio of the surround-field luminance (Ls) to the central-field luminance (Lc) influences the perceived blackness of the central field, both in a simple concentric-circle configuration (Experiment 1) and in digital images of masterpieces (Experiment 2). The results of Experiment 1 showed that the perceived blackness of the central field becomes deeper as the contrast between Lc and Ls increases. The results of Experiment 2 showed that the perceived blackness of a black area surrounded by a relatively bright area in artistic images is stronger than the perceived blackness produced by the same luminance contrast between center and surround in the concentric-circle configuration.
Preservation of edges: the mechanism for improvements in HDR imaging
There are a number of modern myths about High Dynamic Range (HDR) imaging. There have been claims that multiple-exposure techniques can accurately record scene luminances over a dynamic range of more than a million to one. There are assertions that human appearance tracks the same range. The most common myth is that HDR imaging accurately records and reproduces actual scene radiances. Regardless, there is no doubt that HDR imaging is superior to conventional imaging. We need to understand the basis of HDR image quality improvements. This paper shows that multiple exposure techniques can preserve spatial information, although they cannot record accurate scene luminances. Synthesizing HDR renditions from relative spatial records accounts for improved images.
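As a point of reference, a typical multiple-exposure pipeline estimates a relative radiance map by dividing each exposure by its exposure time and averaging with weights that discount clipped pixels. A minimal sketch assuming a linear sensor response in [0, 1] (function and variable names are illustrative); the paper's argument is that such estimates preserve relative spatial structure rather than accurate absolute luminances:

```python
import numpy as np

def merge_exposures(images, times):
    """Merge bracketed exposures into a relative radiance map.

    Assumes a linear sensor response normalized to [0, 1]. A hat-shaped
    weighting trusts mid-range pixel values and down-weights values near
    0 or 1, so each scene point is estimated mainly from the exposures
    that captured it without clipping."""
    images = np.asarray(images, dtype=float)   # shape (N, H, W)
    times = np.asarray(times, dtype=float)     # shape (N,)
    w = 1.0 - np.abs(2.0 * images - 1.0)       # hat weights in [0, 1]
    w = np.clip(w, 1e-4, None)                 # avoid division by zero
    per_exposure = images / times[:, None, None]
    return (w * per_exposure).sum(axis=0) / w.sum(axis=0)

# Two exposures of a uniform patch; both imply the same relative radiance.
radiance = merge_exposures([np.full((4, 4), 0.25), np.full((4, 4), 0.5)],
                           [1.0, 2.0])
```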
Video Perception and Quality
HVS-based quantization steps for validation of digital cinema extended bitrates
M.-C. Larabi, P. Pellegrin, G. Anciaux, et al.
In Digital Cinema, video compression must be as transparent as possible in order to provide the best image quality to the audience. The goal of compression is to simplify the transport, storage, distribution and projection of films. All of these tasks require equipment to be developed, so it is mandatory to reduce equipment complexity by imposing limitations in the specifications. In this sense, the DCI has fixed the maximum bitrate for a compressed stream at 250 Mbps, independently of the input format (4K/24fps, 2K/48fps or 2K/24fps). This parameter is discussed in this paper because it is not consistent to double or quadruple the input rate without increasing the output rate. The work presented here is intended to define quantization steps that ensure visually lossless compression. Two steps are followed: first, the effect of each subband is evaluated separately; then, the scaling ratio is determined. The obtained results show that it is necessary to increase the bitrate limit for cinema material in order to achieve visually lossless quality.
Statistics of natural image sequences: temporal motion smoothness by local phase correlations
Zhou Wang, Qiang Li
Statistical modeling of natural image sequences is of fundamental importance both to the understanding of biological visual systems and to the development of Bayesian approaches for solving a wide variety of machine vision and image processing problems. Previous methods are based on measuring spatiotemporal power spectra and on optimizing linear filters to achieve independent or sparse representations of the time-varying image signals. Here we propose a different approach, in which we investigate the temporal variations of local phase structures in the complex wavelet transform domain. We observe that natural image sequences exhibit a strong prior of temporal motion smoothness, by which the local phases of wavelet coefficients can be well predicted from their temporal neighbors. We study how such statistical regularity is disturbed by "unnatural" image distortions and demonstrate the potential of temporal motion smoothness measures for reduced-reference video quality assessment.
Motion-based perceptual quality assessment of video
There is a great deal of interest in methods to assess the perceptual quality of a video sequence in a full-reference framework. Motion plays an important role in human perception of video, and videos suffer from several artifacts that arise from inaccuracies in the representation of motion in the test video relative to the reference. However, existing algorithms to measure video quality focus primarily on capturing spatial artifacts in the video signal and are inadequate at modeling motion perception and capturing temporal artifacts. We present an objective, full-reference video quality index known as the MOtion-based Video Integrity Evaluation (MOVIE) index that integrates both spatial and temporal aspects of distortion assessment. MOVIE explicitly uses motion information from the reference video and evaluates the quality of the test video along the motion trajectories of the reference. The performance of MOVIE is evaluated using the VQEG FR-TV Phase I dataset, and MOVIE is shown to be competitive with, and even to outperform, existing video quality assessment systems.
No reference perceptual quality metrics: approaches and limitations
David Hands, Damien Bayart, Andrew Davis, et al.
Accurate objective prediction of subjective video quality requires models that are carefully developed and validated. For perceptual quality models, developers have implemented methods that utilise information from both the original and the processed signals (full reference and reduced reference methods). For many practical applications, no reference (NR) methods are required. It has been a major challenge for developers to produce no reference methods that attain the predictive performance necessary for deployment by industry. In this paper, we present a comparison between no reference methods operating either on the decoded picture information alone or using a hybrid bit-stream / decoded picture analysis approach. Two NR models are introduced: one using decoded picture information only; the other using a hybrid approach. Validation data obtained from subjective quality tests are used to examine the predictive performance of both models. The strengths and limitations of the two NR methods are discussed.
Subjective video quality assessment methods for recognition tasks
Carolyn G. Ford, Mark A. McFarland, Irena W. Stange
To develop accurate objective measurements (models) for video quality assessment, subjective data is traditionally collected via human subject testing. The ITU has a series of Recommendations that address methodology for performing subjective tests in a rigorous manner. These methods are targeted at the entertainment application of video. However, video is often used for many applications outside of the entertainment sector, and generally this class of video is used to perform a specific task. Examples of these applications include security, public safety, remote command and control, and sign language. For these applications, video is used to recognize objects, people or events. The existing methods, developed to assess a person's perceptual opinion of quality, are not appropriate for task-based video. The Institute for Telecommunication Sciences, under a program from the Department of Homeland Security and the National Institute for Standards and Technology's Office of Law Enforcement, has developed a subjective test method to determine a person's ability to perform recognition tasks using video, thereby rating the quality according to the usefulness of the video quality within its application. This new method is presented, along with a discussion of two examples of subjective tests using this method.
Image utility assessment and a relationship with image quality assessment
David M. Rouse, Romuald Pepion, Sheila S. Hemami, et al.
Present quality assessment (QA) algorithms aim to generate scores for natural images consistent with subjective scores for the quality assessment task. For the quality assessment task, human observers evaluate a natural image based on its perceptual resemblance to a reference. Natural images communicate useful information to humans, and this paper investigates the utility assessment task, where human observers evaluate the usefulness of a natural image as a surrogate for a reference. Current QA algorithms implicitly assess utility insofar as an image that exhibits strong perceptual resemblance to a reference is also of high utility. However, a perceived quality score is not a proxy for a perceived utility score: a decrease in perceived quality may not affect the perceived utility. Two experiments are conducted to investigate the relationship between the quality assessment and utility assessment tasks. The results from these experiments provide evidence that any algorithm optimized to predict perceived quality scores cannot immediately predict perceived utility scores. Several QA algorithms are evaluated in terms of their ability to predict subjective scores for the quality and utility assessment tasks. Among the QA algorithms evaluated, the visual information fidelity (VIF) criterion, which is frequently reported to provide the highest correlation with perceived quality, predicted both perceived quality and utility scores reasonably well. The consistent performance of VIF for both tasks raised suspicions in light of the evidence from the psychophysical experiments. A thorough analysis of VIF revealed that it artificially emphasizes evaluations at finer image scales (i.e., higher spatial frequencies) over those at coarser image scales (i.e., lower spatial frequencies).
A modified implementation of VIF, denoted VIF*, is presented that provides statistically significant improvement over VIF for the quality assessment task and statistically worse performance for the utility assessment task. A novel utility assessment algorithm, referred to as the natural image contour evaluation (NICE), is introduced that conducts a comparison of the contours of a test image to those of a reference image across multiple image scales to score the test image. NICE demonstrates a viable departure from traditional QA algorithms that incorporate energy-based approaches and is capable of predicting perceived utility scores.
Region of Interest, Sharpness and Blurring
Optimal region-of-interest based visual quality assessment
Ulrich Engelke, Hans-Jürgen Zepernick
Visual content typically exhibits regions that particularly attract the viewer's attention, usually referred to as regions-of-interest (ROI). In the context of visual quality one may expect that distortions occurring in the ROI are perceived as more annoying than distortions in the background (BG). This is especially true given that the human visual system is highly space variant in sampling visual signals. However, this phenomenon of visual attention is only seldom taken into account in visual quality metric design. In this paper, we thus provide a framework for incorporating visual attention into the design of an objective quality metric by means of region-based segmentation of the image. To support the metric design we conducted subjective experiments both to quantify the subjective quality of a set of distorted images and to identify ROI in a set of reference images. Multiobjective optimization is then applied to find the optimal weighting of the ROI and BG quality metrics. It is shown that the ROI-based metric design improves the quality prediction performance of the considered metric, as well as that of two other contemporary quality metrics.
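The combination step described above can be pictured as a convex weighting of ROI and background quality scores, with the weight chosen by the optimization. A minimal sketch with hypothetical names and an arbitrary illustrative weight (the paper finds its weight by multiobjective optimization):

```python
import numpy as np

def pooled_quality(quality_map, roi_mask, w=0.7):
    """Weighted combination of ROI and background (BG) quality.

    quality_map: per-pixel quality scores (e.g. from a local metric).
    roi_mask:    boolean mask marking the region-of-interest.
    w:           ROI weight in [0, 1]; illustrative, not the optimized value.
    """
    roi = quality_map[roi_mask]
    bg = quality_map[~roi_mask]
    q_roi = roi.mean() if roi.size else 0.0
    q_bg = bg.mean() if bg.size else 0.0
    return w * q_roi + (1.0 - w) * q_bg

qmap = np.random.rand(32, 32)
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True          # central ROI
score = pooled_quality(qmap, mask)
```

With w = 0.5 this reduces to treating ROI and BG equally; the optimization in the paper is what determines how far the weight should be shifted toward the ROI.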
Perceptually significant spatial pooling techniques for image quality assessment
Spatial pooling strategies used in recent Image Quality Assessment (IQA) algorithms have generally consisted of simply averaging the obtained scores across the image. Given that certain regions in an image are perceptually more important than others, it is not unreasonable to suspect that gains can be achieved by using an appropriate pooling strategy. In this paper, we explore two hypotheses about spatial pooling strategies for the popular SSIM metrics.1, 2 The first is visual attention and gaze direction - 'where' a human looks. The second is that humans tend to perceive 'poor' regions in an image more severely than 'good' ones - and hence penalize images with even a small number of 'poor' regions more heavily. The improvements in correlation between the objective metrics' scores and human perception are demonstrated by evaluating the performance of these pooling strategies on the LIVE database3 of images.
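The second hypothesis, that the worst regions dominate quality judgments, can be realized as percentile pooling: instead of averaging the whole local-quality map, average only its lowest-scoring fraction. A sketch (the percentage is illustrative, not a value from the paper):

```python
import numpy as np

def percentile_pooling(ssim_map, p=6.0):
    """Pool a local-quality map (e.g. an SSIM map) by averaging only
    the worst p percent of local scores, so that a few badly distorted
    regions pull the overall score down sharply."""
    scores = np.sort(ssim_map.ravel())          # ascending: worst first
    k = max(1, int(scores.size * p / 100.0))    # number of worst scores kept
    return scores[:k].mean()
```

Compared with plain averaging, an image that is mostly pristine but has a small badly distorted patch receives a much lower pooled score, matching the hypothesized severity of 'poor' regions.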
A methodology for coupling a visual enhancement device to human visual attention
The Human Variation Model views disability as simply "an extension of the natural physical, social, and cultural variability of mankind." Given this human variation, it can be difficult to distinguish between a prosthetic device such as a pair of glasses (which extends limited visual abilities into the "normal" range) and a visual enhancement device such as a pair of binoculars (which extends visual abilities beyond the "normal" range). Indeed, there is no inherent reason why the design of visual prosthetic devices should be limited to just providing "normal" vision. One obvious enhancement to human vision would be the ability to visually "zoom" in on objects that are of particular interest to the viewer. Indeed, it could be argued that humans already have a limited zoom capability, which is provided by their high-resolution foveal vision. However, humans still find additional zooming useful, as evidenced by their purchases of binoculars equipped with mechanized zoom features. The fact that these zoom features are manually controlled raises two questions: (1) Could a visual enhancement device be developed to monitor attention and control visual zoom automatically? (2) If such a device were developed, would its use be experienced by users as a simple extension of their natural vision? This paper details the results of work with two research platforms called the Remote Visual Explorer (ReVEx) and the Interactive Visual Explorer (InVEx) that were developed specifically to answer these two questions.
Analysis of sharpness increase by image noise
Takehito Kurihara, Naokazu Aoki, Hiroyuki Kobayashi
Motivated by the reported increase in sharpness by image noise, we investigated how noise affects sharpness perception. We first used natural images of tree bark with different amounts of noise to see whether noise enhances sharpness. Although the result showed that sharpness decreased as the noise amount increased, some observers seemed to perceive more sharpness with increasing noise, while others did not. We next used 1D and 2D uni-frequency patterns as stimuli in an attempt to reduce such variability in the judgment. The result showed that, for higher-frequency stimuli, sharpness decreased as the noise amount increased, while the sharpness of the lower-frequency stimuli increased at certain noise levels. From this result, we hypothesized that image noise might reduce sharpness at edges but improve the sharpness of lower-frequency components or texture in an image. To test this prediction, we experimented again with the natural image used in the first experiment, with stimuli made by applying noise separately to the edge or texture regions of the image. The result showed that noise added to the edge region only decreased sharpness, whereas noise added to texture could improve sharpness. We think it is the interaction between noise and texture that sharpens the image.
Psychophysical study of LCD motion-blur perception
Sylvain Tourancheau, Patrick Le Callet, Kjell Brunnström, et al.
Motion blur is still an important issue on liquid crystal displays (LCD). In recent years, effort has gone into the characterization and measurement of this artifact. These methods make it possible to picture the blurred profile of a moving edge according to the scrolling speed and the gray-to-gray transition considered. However, other aspects should be taken into account in order to understand the way LCD motion blur is perceived. A couple of works have addressed the problem of LCD motion-blur perception, but only a few speeds and transitions have been tested. In this paper, we explore motion-blur perception over 20 gray-to-gray transitions and several scrolling speeds. Moreover, we use three different displays in order to explore the influence of the luminance range as well as the blur shape on motion-blur perception. A blur-matching experiment was set up to obtain the relation between objective measurements and perception. In this experiment, observers adjust a stationary test blur (simulated from measurements) until it matches their perception of the blur occurring on a moving edge. Results show that the adjusted perceived blur is always lower than the objectively measured blur. This effect is greater for low-contrast edges than for high-contrast edges, which could be related to the motion sharpening phenomenon.
Image Analysis and Perception
Pattern masking investigations of the second-order visual mechanisms
The human visual system is sensitive to both first-order and second-order variations in an image. The latter are especially important for digital image processing, as they allow human observers to perceive the envelope of pixel intensities as a smooth surface rather than as discrete pixels. Here we used a pattern masking paradigm to measure the detection threshold of contrast-modulated (CM) stimuli, in which the contrast of a horizontal grating is modulated by a vertical Gabor function, at different pedestal modulation depths. The threshold function showed a typical dipper shape: the threshold decreased with modulation depth (facilitation) at low pedestal modulation depths and then increased (suppression) at high pedestal modulations. The data were well explained by a modified divisive inhibition model that operates on both the depth modulation and the carrier contrast of the input images. Hence, divisive inhibition, determined by both the first- and second-order information in the stimuli, is necessary to explain discrimination between two second-order stimuli.
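The dipper shape falls out of a generic divisive-inhibition (contrast gain control) transducer: an accelerating numerator over a divisive denominator makes a fixed response criterion cheap to reach at small pedestals (facilitation) and expensive at large ones (suppression). A numerical sketch with illustrative exponents, not the model or fitted values from the paper:

```python
import numpy as np

def response(c, p=2.4, q=2.0, z=1.0):
    """Divisive-inhibition transducer: excitation c**p divided by
    inhibition c**q plus a saturation constant z."""
    return c**p / (z + c**q)

def increment_threshold(pedestal, criterion=1.0):
    """Smallest increment whose response change reaches the criterion,
    found by a brute-force scan over candidate increments."""
    deltas = np.linspace(1e-3, 20.0, 20001)
    rise = response(pedestal + deltas) - response(pedestal)
    return deltas[np.argmax(rise >= criterion)]

# Dipper: facilitation at a small pedestal, suppression at a large one.
t_zero = increment_threshold(0.0)
t_small = increment_threshold(0.5)
t_large = increment_threshold(2.0)
```

With these parameters the threshold first dips below the no-pedestal value and then rises well above it, reproducing the qualitative dipper described in the abstract.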
Parsed and fixed block representations of visual information for image retrieval
Soo Hyun Bae, Biing-Hwang Juang
The theory of linguistics teaches us the existence of a hierarchical structure in linguistic expressions, from letters to word roots, and on to words and sentences. By applying syntax and semantics beyond words, one can further recognize the grammatical relationships among words and the meaning of a sequence of words. This layered view of a spoken language is useful for effective analysis and automated processing. It is thus interesting to ask whether a similar hierarchy of representation exists for visual information. A class of techniques similar in nature to linguistic parsing is found in the Lempel-Ziv incremental parsing scheme. Based on a new class of multidimensional incremental parsing algorithms extended from Lempel-Ziv incremental parsing, a framework for image retrieval that takes advantage of the source-characterization property of the incremental parsing algorithm was proposed recently. With the incremental parsing technique, a given image is decomposed into a number of patches, called a parsed representation. This representation can be thought of as a morphological interface between elementary pixels and a higher-level representation. In this work, we examine the properties of the two-dimensional parsed representation in the context of imagery information retrieval and in contrast to vector quantization, i.e., fixed square-block representations with minimum-average-distortion criteria. We implemented four image retrieval systems for the comparative study: three, called IPSILON image retrieval systems, use parsed representations with different perceptual distortion thresholds, and one uses conventional vector quantization for visual pattern analysis. We observe that different perceptual distortion thresholds in visual pattern matching do not have serious effects on retrieval precision, although allowing looser perceptual thresholds in image compression results in poor reconstruction fidelity.
We compare the effectiveness of the use of the parsed representations, as constructed under the latent semantic analysis (LSA) paradigm so as to investigate their varying capabilities in capturing semantic concepts. The result clearly demonstrates the superiority of the parsed representation.
Efficient construction of saliency map
Wen-Fu Lee, Tai-Hsiang Huang, Yi-Hsin Huang, et al.
The saliency map is useful for many applications such as image compression, display, and visualization. However, the bottom-up model used in most saliency map construction methods is computationally expensive. The purpose of this paper is to improve the efficiency of the model for automatic construction of the saliency map of an image while preserving its accuracy. In particular, we remove the contrast sensitivity function and the visual masking component of the bottom-up visual attention model and retain the components related to perceptual decomposition and center-surround interaction, which are critical properties of the human visual system. The simplified model is verified by performance comparison with the ground truth. In addition, a salient-region enhancement technique is adopted to enhance the connectivity of the saliency map, and the saliency maps of three color channels are fused to enhance the prediction accuracy. Experimental results show that the average correlation between our algorithm and the ground truth is close to that between the original model and the ground truth, while the computational complexity is reduced by 98%.
Unsupervised image segmentation by automatic gradient thresholding for dynamic region growth in the CIE L*a*b* color space
Sreenath Rao Vantaram, Eli Saber, Vincent Amuso, et al.
In this paper, we propose a novel unsupervised color image segmentation algorithm named GSEG. This Gradient-based SEGmentation method is initialized by a vector gradient calculation in the CIE L*a*b* color space. The obtained gradient map is utilized for initially clustering low gradient content, as well as automatically generating thresholds for a computationally efficient dynamic region growth procedure, to segment regions of subsequent higher gradient densities in the image. The resultant segmentation is combined with an entropy-based texture model in a statistical merging procedure to obtain the final result. Qualitative and quantitative evaluation of our results on several hundred images, utilizing a recently proposed evaluation metric called the Normalized Probabilistic Rand index shows that the GSEG algorithm is robust to various image scenarios and performs favorably against published segmentation techniques.
Harmonic analysis for cognitive vision: perisaccadic perception
The data model for image representation in terms of the projective Fourier transform (PFT) is well adapted both to image perspective transformations and to the retinotopic mappings of the brain's visual pathways. Here we model the first aspects of the human visual process in which the understanding of a scene is built up in a sequence of fixations for visual information acquisition, followed by fast saccadic eye movements that reposition the fovea on the next target. We make about three saccades per second, with an eyeball maximum speed of 700 deg/sec. Visual sensitivity is markedly reduced during saccadic eye movements, such that, three times per second, there are instantaneous large changes in the retinal image with almost no information consciously carried across fixations. The inverse projective Fourier transform is computable by FFT in coordinates given by a complex logarithm that also approximates the retinotopy. Thus, it gives the cortical image representation, and a simple translation in log coordinates brings the presaccadic scene into the postsaccadic reference frame, eliminating the need to start processing anew three times per second at each fixation.
Improved colour to greyscale via integrability correction
Mark S. Drew, David Connah, Graham D. Finlayson, et al.
The classical approach to converting colour to greyscale is to code the luminance signal as a grey value image. However, the problem with this approach is that the detail at equiluminant edges vanishes, and in the worst case the greyscale reproduction of an equiluminant image is a single uniform grey value. The solution to this problem, adopted by all algorithms in the field, is to try to code colour difference (or contrast) in the greyscale image. In this paper we reconsider the Socolinsky and Wolff algorithm for colour to greyscale conversion. This algorithm, which is the most mathematically elegant, often scores well in preference experiments but can introduce artefacts which spoil the appearance of the final image. These artefacts are intrinsic to the method and stem from the underlying approach which computes a greyscale image by a) calculating approximate luminance-type derivatives for the colour image and b) re-integrating these to obtain a greyscale image. Unfortunately, the sign of the derivative vector is sometimes unknown on an equiluminant edge and, in the current theory, is set arbitrarily. However, choosing the wrong sign can lead to unnatural contrast gradients (not apparent in the colour original). Our contribution is to show how this sign problem can be ameliorated using a generalised definition of luminance and a Markov relaxation.
Preserving visual saliency in image to sound substitution systems
Codruta O. Ancuti, Cosmin Ancuti, Philippe Bekaert
Color plays a significant role in scene interpretation in terms of visual perception. Numerous visual substitution systems work with grayscale images, disregarding this information from the original image. Perceptible color-based details often fade due to the grayscale conversion, which can mislead the overall comprehension of the scene. We present a decolorization method that considers color contrast and preserves color saliency after the transformation. We exploit this model to enhance visually disabled persons' perception of the images interpreted by the substitution system. The results demonstrate that our enhanced system improves overall scene interpretation in comparison with a similar substitution system.
3D Perception, Environments, and Applications
Model based evaluation of human perception of stereoscopically visualized semi-transparent surfaces
Michael Kleiber, Carsten Winkelholz, Verena Kinder
Depicting three-dimensional surfaces in such a way that distances between these surfaces can be estimated quickly and accurately is a challenging task. A promising approach is the use of semi-transparent textures, i.e., only some parts of the surface are colored. We conducted an experiment to determine the performance of subjects in perceiving distances between an opaque ground surface and specific points on an overlaid surface visualized using isolines and curvature-oriented strokes. The results show that response times for curvature-oriented strokes were faster than for isolines. For a trusted interpretation of these results, a plausible explanation has to be given. We hypothesize that users visually integrate the available three-dimensional positions and thereby arrive at an estimate. Further experiments were carried out in order to formulate a model that describes the involved perceptual process as several attention shifts between three-dimensional positions. The results of these experiments are reported here.
Influence of chroma variations on naturalness and image quality of stereoscopic images
Andre Kuijsters, Wijnand A. Ijsselsteijn, Marc T. M. Lambooij, et al.
The computational view on image quality of Janssen and Blommaert states that the quality of an image is determined by the degree to which the image is both useful (discriminability) and natural (identifiability). This theory is tested by creating two manipulations. Firstly, multiplication of the chroma values of each pixel with a constant in the CIELab color space, i.e., chroma manipulation, is expected to increase only the usefulness by increasing the distances between the individual color points, enhancing the contrast. Secondly, introducing stereoscopic depth by varying the screen disparity, i.e., depth manipulation, is expected to increase both the usefulness and the naturalness. Twenty participants assessed perceived image quality, perceived naturalness and perceived depth of the manipulated versions of two natural scenes. The results revealed a small, yet significant shift between image quality and naturalness as a function of the chroma manipulation. In line with previous research, preference in quality was shifted to higher chroma values in comparison to preference in naturalness. Introducing depth enhanced the naturalness scores, however, in contrast to our expectations, not the image quality scores. It is argued that image quality is not sufficient to evaluate the full experience of 3D. Image quality appears to be only one of the attributes underlying the naturalness of stereoscopic images.
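The chroma manipulation described above has a simple form in CIELAB: since chroma is C* = sqrt(a*² + b*²) and the hue angle is h = atan2(b*, a*), multiplying a* and b* jointly by a constant k scales chroma by k while leaving lightness L* and hue untouched. A minimal sketch (function name and the scale factor are illustrative):

```python
import numpy as np

def scale_chroma(lab, k):
    """Multiply CIELAB chroma by k, leaving L* and the hue angle unchanged.

    lab: array of shape (..., 3) holding (L*, a*, b*) per pixel.
    Scaling a* and b* together by k > 0 scales C* = hypot(a*, b*) by k
    and preserves h = atan2(b*, a*)."""
    out = np.array(lab, dtype=float, copy=True)
    out[..., 1] *= k   # a*
    out[..., 2] *= k   # b*
    return out

pixel = np.array([[50.0, 20.0, -10.0]])   # one (L*, a*, b*) sample
boosted = scale_chroma(pixel, 1.3)        # 30% chroma increase
```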
Color rendering indices in global illumination methods
Human perception of material colors depends heavily on the nature of the light sources used for illumination. One and the same object can cause highly different color impressions when lit by a vapor lamp or by daylight, respectively. Based on state-of-the-art colorimetric methods we present a modern approach for calculating color rendering indices (CRI), which were defined by the International Commission on Illumination (CIE) to characterize color reproduction properties of illuminants. We update the standard CIE method in three main points: firstly, we use the CIELAB color space, secondly, we apply a Bradford transformation for chromatic adaptation, and finally, we evaluate color differences using the CIEDE2000 total color difference formula. Moreover, within a real-world scene, light incident on a measurement surface is composed of a direct and an indirect part. Neumann and Schanda1 have shown for the cube model that interreflections can influence the CRI of an illuminant. We analyze how color rendering indices vary in a real-world scene with mixed direct and indirect illumination and recommend the usage of a spectral rendering engine instead of an RGB based renderer for reasons of accuracy of CRI calculations.
Luminance, disparity, and range statistics in 3D natural scenes
Yang Liu, Lawrence K. Cormack, Alan C. Bovik
Range maps have been actively studied in the last few years in the context of depth perception in natural scenes. With the availability of co-registered luminance information, we have the ability to examine and model the statistical relationships between luminance, range and disparity. In this study, we find that a one-sided generalized Gaussian distribution closely fits the prior of the range gradient. This finding sheds new light on the statistical modeling of 2D and 3D image features in natural scenes.
Three-dimensional visualization of geographical terrain data using temporal parallax difference induction
Vision III Imaging, Inc. (the Company) has developed Parallax Image Display (PID™) software tools to critically align and display aerial images with parallax differences. Terrain features are rendered obvious to the viewer when critically aligned images are presented alternately at 4.3 Hz. The recent inclusion of digital elevation models in geographic data browsers now allows true three-dimensional parallax to be acquired from virtual globe programs like Google Earth. The authors have successfully developed PID methods and code that allow three-dimensional geographical terrain data to be visualized using temporal parallax differences.
Measuring hand, head, and vehicle motions in commuting environments
Viewing video on mobile devices is becoming increasingly common. The small field of view and the vibrations of common commuting environments present hardware and software challenges for the imaging community. By monitoring the vibration of the display, it could be possible to stabilize an image on the display by shifting a portion of a larger image with the display (a field-of-view expansion approach). However, the image should not be shifted to exactly counter the display motion: eye movements have a 'self-adjustment' ability to partially or completely compensate for external motions, which can make a perfect compensation appear to overshoot. In this work, accelerometers were used to measure the motion of a range of vehicles, and of observers' heads and hands as they rode in those vehicles, to support the development of display motion compensation algorithms.
Art and Perception
icon_mobile_dropdown
Visually representing reality: aesthetics and accessibility aspects
This paper gives an overview of the visual representation of reality with three imaging technologies: painting, photography, and electronic imaging. The contribution of the important image aspects (called dimensions hereafter), such as color, fine detail, and total image size, to the degree of reality and aesthetic value of the rendered image is described for each of these technologies. Whereas quite a few of these dimensions, or approximations or even mere suggestions thereof, were already present in prehistoric paintings, apparent motion and true stereoscopic vision were added only recently, unfortunately also introducing accessibility and image safety issues. Efforts are being made to reduce the incidence of undesirable biomedical effects such as photosensitive seizures (PSS), visually induced motion sickness (VIMS), and visual fatigue from stereoscopic images (VFSI) through international standardization of the image parameters to be avoided by image providers and display manufacturers. The history of this type of standardization, from an International Workshop Agreement to a strategy for accomplishing effective international standardization by ISO, is treated at some length. One of the difficulties to be mastered in this process is reconciling the sometimes opposing interests of vulnerable persons, thrill-seeking viewers, creative video designers, and the game industry.
Estimating the position of illuminants in paintings under weak model assumptions: an application to the works of two Baroque masters
David Kale, David G. Stork
The problems of estimating the position of an illuminant and the direction of illumination in realist paintings have been addressed using algorithms from computer vision. These algorithms fall into two general categories: in model-independent methods (cast-shadow analysis, occluding-contour analysis, ...), one does not need to know or assume the three-dimensional shapes of the objects in the scene; in model-dependent methods (shape-from-shading, full computer graphics synthesis, ...), one does need to know or assume the three-dimensional shapes. We explore the intermediate, or weak-model, condition, where the three-dimensional object rendered is so simple that one can very confidently assume its three-dimensional shape and, further, that this shape admits an analytic derivation of the appearance model. Specifically, we can assume that floors and walls are flat and that they are horizontal and vertical, respectively. We derived the maximum-likelihood estimator for the two-dimensional spatial location of a point source in an image as a function of the pattern of brightness (or grayscale value) over such a planar surface. We applied our methods to two Baroque paintings for which the question of the illuminant position is of interest to art historians: Georges de la Tour's Christ in the carpenter's studio (1645) and Caravaggio's The calling of St. Matthew (1599-1600). Our analyses show that a single point source (somewhat near the depicted candle) is a slightly better explanation of the pattern of brightness on the floor in Christ than are two point sources, one in place of each of the figures. The luminance pattern on the rear wall in The calling implies that the source is local, a few meters outside the picture frame, not the infinitely distant sun. Both results are consistent with previous rebuttals of the recent art historical claim that these paintings were executed by means of tracing optically projected images.
Our method is the first application of such weak-model methods for inferring the location of illuminants in realist paintings and should find use in other questions in the history of art.
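Under the flat-floor assumption above, a point source at position (x0, y0) and height h produces floor irradiance proportional to h/r³, with r the source-to-surface distance; with i.i.d. Gaussian noise, maximum-likelihood estimation reduces to least squares. The following is a toy grid-search sketch of that idea, not the authors' estimator; the geometry, grid, and source parameters are all illustrative:

```python
def irradiance(x, y, src, strength=1.0):
    """Point source over a horizontal Lambertian floor: E = s * h / r^3."""
    x0, y0, h = src
    r2 = (x - x0) ** 2 + (y - y0) ** 2 + h ** 2
    return strength * h / r2 ** 1.5

def fit_source(samples, h, grid):
    """Least-squares (ML under i.i.d. Gaussian noise) grid search over
    candidate source positions (x0, y0); the source height h is assumed
    known here for simplicity.  The source strength is fitted in closed
    form at each candidate position."""
    best, best_err = None, float("inf")
    obs = [b for _, b in samples]
    for x0 in grid:
        for y0 in grid:
            basis = [irradiance(x, y, (x0, y0, h)) for (x, y), _ in samples]
            s = sum(o * b for o, b in zip(obs, basis)) / sum(b * b for b in basis)
            err = sum((o - s * b) ** 2 for o, b in zip(obs, basis))
            if err < best_err:
                best, best_err = (x0, y0), err
    return best

# Synthetic check: brightness samples on a 2 m x 2 m floor patch,
# lit by a source at (0.6, 0.4) at height 1.0 m.
true_src = (0.6, 0.4, 1.0)
pts = [(0.2 * i, 0.2 * j) for i in range(11) for j in range(11)]
samples = [((x, y), irradiance(x, y, true_src, strength=3.0)) for x, y in pts]
est = fit_source(samples, h=1.0, grid=[0.1 * k for k in range(21)])
```

With noise-free synthetic data the search recovers the true position; on a painting, the same residual comparison lets one ask whether one source or two explains the floor brightness better.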
Efficient visual system processing of spatial and luminance statistics in representational and non-representational art
Daniel J. Graham, Jay D. Friedenberg, Daniel N. Rockmore
An emerging body of research suggests that artists consistently seek modes of representation that are efficiently processed by the human visual system, and that these shared properties could leave statistical signatures. In earlier work, we showed evidence that perceived similarity of representational art could be predicted using intensity statistics to which the early visual system is attuned, though semantic content was also found to be an important factor. Here we report two studies that examine the visual perception of similarity. We test a collection of non-representational art, which we argue possesses useful statistical and semantic properties, in terms of the relationship between image statistics and basic perceptual responses. We find two simple statistics, both expressed as single values, that predict nearly a third of the overall variance in similarity judgments of abstract art. An efficient visual system could make a quick and reasonable guess as to the relationship of a given image to others (i.e., its context) by extracting these basic statistics early in the visual stream, and this may hold for natural scenes as well as art. But a major component of many types of art is representational content. In a second study, we present findings related to efficient representation of natural scene luminances in landscapes by a well-known painter. We show empirically that elements of contemporary approaches to high-dynamic-range tone mapping, which are themselves deeply rooted in an understanding of early visual system coding, are present in the way Vincent Van Gogh transforms scene luminances into painting luminances. We argue that global tone mapping functions are a useful descriptor of an artist's perceptual goals with respect to global illumination, and we present evidence that mapping the scene to a painting with different implied lighting properties produces a less efficient mapping.
Together, these studies suggest that statistical regularities in art can shed light on visual processing.
Painted or printed? Correlation analysis of the brickwork in Jan van der Heyden's View of Oudezijds Voorburgwal with the Oude Kerke in Amsterdam
David G. Stork, Sean Meador, Petria Nobel
The title painting, in the Royal Picture Gallery Mauritshuis in The Hague, is remarkable in that every figure and every part of every building is clearly discernible in the minutest detail: decorations, weathercock, bells in the church tower, and so on. Thousands of individual bricks are visible in the buildings at the left, and art scholars have asked whether these bricks were laboriously painted individually or instead more efficiently pressed onto the painting by some form of template, for instance by pressing a wet print against the painting. Close inspection of the painting in raking light reveals that the mortar work is rendered in thick, protruding paint, but such visual analysis, while highly suggestive, does not prove that van der Heyden employed counterproofing; further evidence must therefore be sought to corroborate this hypothesis. If some form of counterproofing was employed by the artist, there might be at least some repeated patterns of bricks, as the master print was shifted from place to place in the painting. Visual search for candidate repeated passages of bricks by art scholars has proven tedious and unreliable. For this reason, we instead used a method based on computer forensics for detecting nearly identical repeated patterns within an image: discrete cross-correlation. Specifically, we preprocessed a high-resolution photograph of the painting and used thresholding and image processing to enhance the brickwork. Then we convolved small portions of this processed image of the brickwork with all areas of brickwork throughout the painting. Our results reveal only small regions of moderate cross-correlation. Most importantly, the limited spatial extent of the matching regions shows that the peaks found are not significantly higher than would occur by chance in a hand-executed work or in one created using a single counterproof.
To our knowledge, ours is the first use of cross-correlation to search for repeated patterns in a realist painting to answer a question in the history of art.
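The search described above can be sketched as zero-mean normalized cross-correlation (NCC) of a small patch against every position in an image. The toy arrays below stand in for the thresholded brickwork photograph; a match score near 1.0 flags a candidate repeated passage:

```python
def ncc(patch_a, patch_b):
    """Pearson correlation of two equally sized 2-D patches."""
    flat_a = [v for row in patch_a for v in row]
    flat_b = [v for row in patch_b for v in row]
    n = len(flat_a)
    ma = sum(flat_a) / n
    mb = sum(flat_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(flat_a, flat_b))
    va = sum((a - ma) ** 2 for a in flat_a)
    vb = sum((b - mb) ** 2 for b in flat_b)
    return cov / (va * vb) ** 0.5

def best_match(image, template):
    """Slide the template over the image; return (row, col, score) of the peak."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best[2]:
                best = (r, c, score)
    return best

# Toy example: a 6x6 "brickwork" image; the template is the patch at (2, 3).
image = [
    [3, 1, 4, 1, 5, 9],
    [2, 6, 5, 3, 5, 8],
    [9, 7, 9, 3, 2, 3],
    [8, 4, 6, 2, 6, 4],
    [3, 3, 8, 3, 2, 7],
    [9, 5, 0, 2, 8, 8],
]
template = [row[3:5] for row in image[2:4]]
r, c, score = best_match(image, template)
# An exact copy exists, so the peak correlation is 1.0.
```

In practice the correlation map is thresholded and the spatial extent of high-scoring regions is examined, since small isolated peaks also arise by chance in hand-executed brickwork.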
Chiasmus
Chiasmus is a responsive and dynamically reflective two-sided volumetric surface that embodies phenomenological issues such as the formation of images, observer and machine perception, and the dynamics of the screen as a space of image reception. It consists of a square grid of 64 individually motorized cube elements engineered to move linearly. Each cube is controlled by custom software that analyzes video imagery for luminance values and sends these values to the motor control mechanisms to coordinate the individual movements. The individual movements resolve into a sculptural screen whose volume alters dynamically, offering an observer novel and unique perspectives on its mobile form.
Interactive Paper Session
icon_mobile_dropdown
Model validation of channel zapping quality
Robert Kooij, Floris Nicolai, Kamal Ahmed, et al.
In an earlier paper we showed that the perceived quality of channel zapping is related to the perceived quality of download time in web browsing, as suggested by ITU-T Rec. G.1030. We showed this by performing subjective tests, resulting in an excellent fit with a 0.99 correlation. This was what we call a lean-forward experiment, and it gave the rule-of-thumb result that the zapping time must be less than 0.43 sec to be good (> 3.5 on the MOS scale). To validate the model, we have performed new subjective experiments. These experiments included lean-backward zapping, i.e., sitting on a sofa with a remote control. The subjects are more forgiving in this case, and the requirement could be relaxed to 0.67 sec. We also conducted subjective experiments in which the zapping times vary. We found that the MOS rating decreases if zapping delay times vary. In our experiments we assumed uniformly distributed delays, where the variance cannot be larger than the mean delay. We found that, in order to obtain a MOS rating of at least 3.5, the maximum allowed variance, and thus also the maximum allowed mean zapping delay, is 0.46 sec.
Application of a visual model to the design of an ultra-high definition upscaler
A Visual Model (VM) is used to aid in the design of an Ultra-high Definition (UHD) upscaling algorithm that renders High Definition legacy content on a UHD display. The costly development of such algorithms is due, in part, to the time spent subjectively evaluating the adjustment of algorithm structural variations and parameters. The VM provides an image map that gives feedback to the design engineer about visual differences between algorithm variations, or about whether a costly algorithm improvement will be visible at expected viewing distances. Such visual feedback reduces the need for subjective evaluation. This paper presents the results of experimentally verifying the VM against subjective tests of visibility improvement versus viewing distance for three upscaling algorithms. Observers evaluated image differences for upscaled versions of high-resolution stills and HD (Blu-ray) images, viewing a reference and test image, and controlled a linear blending weight to determine the image discrimination threshold. The required thresholds vs. viewing distance varied as expected, with larger amounts of the test image required at further distances. We verify the VM by comparison of predicted discrimination thresholds versus the subjective data. After verification, VM visible difference maps are presented to illustrate the practical use of the VM during design.
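The observers' control in the experiment above is a per-pixel linear blend between the reference and test images. A minimal sketch (grayscale row-lists stand in for images; the function name is ours, not the paper's):

```python
def blend(ref, test, w):
    """Linear blend used for threshold measurement: (1-w)*ref + w*test.
    w = 0 shows the pure reference image, w = 1 the full test image;
    the discrimination threshold is the smallest w an observer can detect."""
    return [[(1.0 - w) * r + w * t for r, t in zip(rrow, trow)]
            for rrow, trow in zip(ref, test)]

# Illustrative 2x2 grayscale images.
ref  = [[10, 20], [30, 40]]
test = [[20, 20], [10, 80]]
mid  = blend(ref, test, 0.25)   # each pixel moves 25% of the way toward test
```

Sweeping w and recording the detection point at each viewing distance yields the threshold-versus-distance data against which the visual model's predictions were verified.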
Hyperbolic modeling for metaphorical processing and visual computations
In this paper, we lay the groundwork for a model of extended dendritic processing based on temporal signalling, using a model in hyperbolic space. The intended goal is to create a processing environment in which metaphorical and analogical processing is natural to the components. A secondary goal is to create a processing model that is naturally complex, naturally based in fractal and complex flows, and that creates communication based on a compatibility rather than a duplication model. This is still a work in progress, but some gains have been made in creating the background model.
Facilitation of listening comprehension by visual information under noisy listening condition
Chiho Kashimada, Takumi Ito, Kazuki Ogita, et al.
Comprehension of a sentence under a wide range of delay conditions between auditory and visual stimuli was measured in environments with low auditory clarity, with pink noise at levels of -10 dB and -15 dB. Results showed that the image was helpful for comprehension of the noise-obscured voice stimulus when the delay between the auditory and visual stimuli was 4 frames (= 132 ms) or less; the image was not helpful for comprehension when the delay was 8 frames (= 264 ms) or more; and in some cases of the largest delay (32 frames), the video image interfered with comprehension.
Quantifying the image-sticking phenomenon for the checkerboard stimuli: contrast, spatial frequency, edge effect, and noise interference
This study aimed to determine the relationship between the image-sticking phenomenon and human visual perception. The contrast sensitivity of various checkerboard patterns was measured over a wide range of spatial frequencies. Four subjective tests were designed to determine the contrast threshold as a function of spatial frequency, edge effect, and noise interference for the checkerboard stimuli. The experimental results were divided into four parts: (1) Spatial frequency: the contrast sensitivity remains steady at 45 dB (dB = 20 log10(1/contrast)) at low spatial frequencies. The contrast sensitivity dropped drastically when the spatial frequency was increased from 0.5 to 1.3 (log C/deg); the spatial frequency of 0.5 (log C/deg) had the maximum contrast sensitivity. (2) Edge effect: the original checkerboard pattern was convolved with mean filters of different sizes to produce blurred edges at a variety of scales and to estimate the influence of the edge effect on human perception. The results showed that sharper edges of the checkerboard stimuli can affect the contrast threshold. (3) Gaussian noise: Gaussian-distributed noise was added to the checkerboard stimuli to evaluate its effect. The low-contrast checkerboard stimuli were affected: as σ became greater, the level of contrast sensitivity dropped. (4) Simultaneous edge effect and Gaussian noise interference: the level of contrast sensitivity is lower than in the other conditions, and the curve of contrast sensitivity is similar to that for Gaussian noise. According to these experimental results, the contributions of spatial frequency, edge effect, and noise interference to human visual perception could be determined.
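The decibel convention used above converts directly between contrast threshold and sensitivity; a small sketch of both directions (the function names are ours):

```python
import math

def sensitivity_db(contrast):
    """Contrast sensitivity in dB, per the study's convention:
    dB = 20 * log10(1 / contrast)."""
    return 20.0 * math.log10(1.0 / contrast)

def threshold_contrast(db):
    """Inverse mapping: contrast threshold for a sensitivity given in dB."""
    return 10.0 ** (-db / 20.0)

# The reported low-spatial-frequency plateau of 45 dB corresponds to a
# contrast threshold of roughly 0.0056.
```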