Proceedings Volume 7527

Human Vision and Electronic Imaging XV

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 2 February 2010
Contents: 13 Sessions, 47 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2010
Volume Number: 7527

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 7527
  • Keynote Session
  • Artificial Retina
  • Advanced Brain Imaging, Perception, and Cognition
  • Advances in Image Quality
  • New Directions in Image Quality
  • Visual, Auditory, and Tactile Perception
  • Art and Science Render the High-Dynamic-Range World I
  • Art and Science Render the High-Dynamic-Range World II
  • Perceptual and Cognitive Experiments in Virtual Environments
  • Cognition, Attention, and Eye Movements in Image Analysis
  • Art, Aesthetics, and Perception
  • Interactive Paper Session
Front Matter: Volume 7527
This PDF file contains the front matter associated with SPIE Proceedings volume 7527, including the Title Page, Copyright information, Table of Contents, Introduction, and the Conference Committee listing.
Keynote Session
Music in film and animation: experimental semiotics applied to visual, sound and musical structures
Roger A. Kendall
The relationship of music to film has only recently received the attention of experimental psychologists and quantificational musicologists. This paper outlines theory, semiotic analysis, and experimental results using relations among variables of temporally organized visuals and music. 1. A comparison and contrast is developed between the ideas in semiotics and experimental research, including historical and recent developments. 2. Musicological Exploration: The resulting multidimensional structures of associative meanings, iconic meanings, and embodied meanings are applied to the analysis and interpretation of a range of film with music. 3. Experimental Verification: A series of experiments testing the perceptual fit of musical and visual patterns layered together in animations determined goodness of fit between all pattern combinations, the results of which confirmed aspects of the theory. However, exceptions were found when the complexity of the stratified stimuli resulted in cognitive overload.
Artificial Retina
In vivo operation of the Boston 15-channel wireless subretinal visual prosthesis
Douglas B. Shire, Patrick Doyle, Shawn K. Kelly, et al.
This presentation concerns the engineering development of the Boston visual prosthesis for restoring useful vision to patients blind with degenerative retinal disease. A miniaturized, hermetically-encased, 15-channel wirelessly-operated retinal prosthetic was developed for implantation and pre-clinical studies in Yucatan mini-pig animal models. The prosthesis conforms to the eye and drives a microfabricated polyimide stimulating electrode array having sputtered iridium oxide electrodes. This array is implanted into the subretinal space using a specially-designed ab externo surgical technique; the bulk of the prosthesis is on the surface of the sclera. The implanted device includes a hermetic titanium case containing a 15-channel stimulator chip; secondary power/data receiving coils surround the cornea. Long-term in vitro pulse testing was also performed on the electrodes to ensure their stability over years of operation. Assemblies were first tested in vitro to verify wireless operation of the system in biological saline using a custom RF transmitter circuit and primary coils. Stimulation pulse strength, duration and frequency were programmed wirelessly using a computer with a custom graphical user interface. Operation of the retinal implant was verified in vivo in 3 minipigs for more than three months by measuring stimulus artifacts on the eye surface using contact lens electrodes.
Advanced Brain Imaging, Perception, and Cognition
Human brain imaging during controlled and natural viewing
Stanley A. Klein, Thom Carney, David Kim, et al.
Assorted technologies such as EEG, MEG, fMRI, BEM, MRI, TMS, and BCI are being integrated to understand how human visual cortical areas interact during controlled laboratory and natural viewing conditions. Our focus is on the problem of separating signals from the spatially close early visual areas. The solution involves taking advantage of known functional anatomy to guide stimulus selection and employing principles of spatial and temporal response properties that simplify analysis. The method also unifies MEG and EEG recordings and provides a means for improving existing boundary element head models. Going beyond carefully controlled stimuli to natural viewing with scanning eye movements, assessing brain states with BCI is a most challenging task. Frequent eye movements contribute artifacts to the recordings. A linear regression method is introduced that is shown to effectively characterize these frequent artifacts and could be used to remove them. In free viewing, saccadic landings initiate visual processing epochs and could be used to trigger strictly time-based analysis methods. However, temporal instabilities indicate that frequency-based analysis would be an important adjunct. A class of Cauchy filter functions is introduced that has narrow time and frequency properties well matched to the EEG/MEG spectrum for avoiding channel leakage.
Drawing in the blind and the sighted as a probe of cortical reorganization
In contrast to other arts, such as music, there is very little neuroimaging research on visual art, and in particular on drawing. Drawing, from artistic to technical, involves diverse aspects of spatial cognition, precise sensorimotor planning and control, as well as a rich set of higher cognitive functions. A new method we have developed for learning the drawing skill in the blind, together with the technological advances of a multisensory MR-compatible drawing system, allowed us to run for the first time a comparative fMRI study on drawing in the blind and the sighted. In each population, we identified widely distributed cortical networks, extending from the occipital and temporal cortices, through the parietal cortex, to the frontal lobe. This is the first neuroimaging study of drawing in blind novices, as well as the first study of learning to draw in either population. We sought to determine the cortical reorganization taking place as a result of learning to draw, despite the lack of visual input to the brains of the blind. Remarkably, we found massive recruitment of the visual cortex upon learning to draw, although our subjects had no previous experience, only a short training with our new drawing method. This finding implies a rapid, learning-based plasticity mechanism. We further proposed that the functional level of brain reorganization in the blind may still differ from that in the sighted, even in areas that overlap between the two populations, such as the visual cortex. We tested this idea in the framework of saccadic suppression. A methodological innovation allowed us to estimate the locations of retinotopic regions in the blind brain. Although the visual cortex of both groups was greatly recruited, only the sighted showed dramatic suppression in hMT+ and V1, while there was no sign of an analogous process in the blind. This finding has important implications and suggests that the recruitment of the visual cortex in the blind does not assure a full functional parallel.
Advances in electromagnetic brain imaging
Non-invasive and dynamic imaging of brain activity on the sub-millisecond time-scale is enabled by measurements on or near the scalp surface using an array of sensors that measure magnetic fields (magnetoencephalography, MEG) or electric potentials (electroencephalography, EEG). Algorithmic reconstruction of brain activity from MEG and EEG data is referred to as electromagnetic brain imaging (EBI). Reconstructing the actual brain response to external events and distinguishing it from unrelated brain activity has been a challenge for many existing algorithms in this field. Furthermore, even under conditions with very little interference, accurately determining the spatial locations and timing of brain sources from MEG and EEG data is a challenging problem because it involves solving for unknown brain activity across thousands of voxels from only a few hundred sensors (~300). In recent years, my research group has developed a suite of novel and powerful algorithms for EBI that we have shown to be considerably superior to existing benchmark algorithms. Specifically, these algorithms can solve for many brain sources, including sources located far from the sensors, in the presence of large interference from unrelated brain sources. Our algorithms efficiently model interference contributions to the sensors and accurately estimate sparse brain source activity using fast and robust probabilistic inference techniques. Here, we review some of these algorithms and illustrate their performance on simulations and real MEG/EEG data.
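The abstract describes these algorithms only at a high level. For orientation, the sketch below shows the classical Tikhonov-regularized minimum-norm inverse, the kind of textbook baseline that sparse probabilistic EBI methods aim to outperform; the function name, regularization weight, and toy dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a baseline EBI inverse: a Tikhonov-regularized
# minimum-norm estimate (NOT the sparse probabilistic algorithms above).
import numpy as np

def minimum_norm_estimate(L, b, lam=1e-2):
    """Estimate voxel amplitudes s from sensor data b = L s + noise.

    L   : (n_sensors, n_voxels) lead-field matrix (assumed known)
    b   : (n_sensors,) measured MEG/EEG sensor vector
    lam : regularization weight (illustrative value)
    """
    n_sensors = L.shape[0]
    gram = L @ L.T + lam * np.eye(n_sensors)   # (n_sensors, n_sensors)
    return L.T @ np.linalg.solve(gram, b)      # (n_voxels,)

# Toy usage: ~300 sensors, thousands of voxels, a few active sources.
rng = np.random.default_rng(0)
L = rng.standard_normal((300, 5000))
s_true = np.zeros(5000)
s_true[[100, 2500, 4000]] = 1.0
b = L @ s_true + 0.1 * rng.standard_normal(300)
s_hat = minimum_norm_estimate(L, b)
print("largest reconstructed voxels:", np.argsort(-np.abs(s_hat))[:5])
```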
Top-down modulation: the crossroads of perception, attention and memory
Adam Gazzaley
Research in our laboratory focuses on understanding the neural mechanisms that serve at the crossroads of perception, memory and attention, specifically exploring how brain region interactions underlie these abilities. To accomplish this, we study top-down modulation, the process by which we enhance neural activity associated with relevant information and suppress activity for irrelevant information, thus establishing a neural basis for all higher-order cognitive operations. We also study alterations in top-down modulation that occur with normal aging. Our experiments are performed on human participants, using a multimodal approach that integrates functional MRI (fMRI), transcranial magnetic stimulation (TMS) and electroencephalography (EEG).
Isolating human brain functional connectivity associated with a specific cognitive process
Michael A. Silver, Ayelet N. Landau, Thomas Z. Lauritzen, et al.
The use of functional magnetic resonance imaging (fMRI) to measure functional connectivity among brain areas has the potential to identify neural networks associated with particular cognitive processes. However, fMRI signals are not a direct measure of neural activity but rather represent blood oxygenation level-dependent (BOLD) signals. Correlated BOLD signals between two brain regions are therefore a combination of neural, neurovascular, and vascular coupling. Here, we describe a procedure for isolating brain functional connectivity associated with a specific cognitive process. Coherency magnitude (measuring the strength of coupling between two time series) and phase (measuring the temporal latency differences between two time series) are computed during performance of a particular cognitive task and also for a control condition. Subtraction of the coherency magnitude and phase differences for the two conditions removes sources of correlated BOLD signals that do not modulate as a function of cognitive task, resulting in a more direct measure of functional connectivity associated with changes in neuronal activity. We present two applications of this task subtraction procedure, one to measure changes in strength of coupling associated with sustained visual spatial attention, and one to measure changes in temporal latencies between brain areas associated with voluntary visual spatial attention.
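As a concrete illustration of the coherency quantities and the task-subtraction step described above, here is a minimal sketch; the hypothetical BOLD time series, the nominal sampling rate, and the use of SciPy's spectral estimators are assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: coherency magnitude/phase between two BOLD time series,
# with a task-minus-control subtraction.
import numpy as np
from scipy.signal import csd, welch

def coherency(x, y, fs=0.5, nperseg=64):
    """Complex coherency C_xy(f) = S_xy / sqrt(S_xx * S_yy)."""
    f, sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, sxx = welch(x, fs=fs, nperseg=nperseg)
    _, syy = welch(y, fs=fs, nperseg=nperseg)
    return f, sxy / np.sqrt(sxx * syy)

def task_subtraction(x_task, y_task, x_ctrl, y_ctrl, fs=0.5):
    f, c_task = coherency(x_task, y_task, fs)
    _, c_ctrl = coherency(x_ctrl, y_ctrl, fs)
    d_mag = np.abs(c_task) - np.abs(c_ctrl)        # change in coupling strength
    d_phase = np.angle(c_task) - np.angle(c_ctrl)  # change in temporal latency
    return f, d_mag, d_phase

# Toy usage: correlated noise during "task", independent noise during "control".
rng = np.random.default_rng(0)
shared = rng.standard_normal(256)
x_t = shared + 0.5 * rng.standard_normal(256)
y_t = shared + 0.5 * rng.standard_normal(256)
x_c, y_c = rng.standard_normal(256), rng.standard_normal(256)
f, d_mag, d_phase = task_subtraction(x_t, y_t, x_c, y_c)
print("mean coupling increase during task:", d_mag.mean().round(2))
```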
Advances in Image Quality
Statistical analysis of subjective preferences for video enhancement
Russell L. Woods, PremNandhini Satgunam, P. Matthew Bronstad, et al.
Measuring preferences for moving video quality is harder than for static images due to the fleeting and variable nature of moving video. Subjective preferences for image quality can be tested by observers indicating their preference for one image over another. Such pairwise comparisons can be analyzed using Thurstone scaling (Farrell, 1999). Thurstone (1927) scaling is widely used in applied psychology, marketing, food tasting and advertising research. Thurstone analysis constructs an arbitrary perceptual scale for the items that are compared (e.g. enhancement levels). However, Thurstone scaling does not determine the statistical significance of the differences between items on that perceptual scale. Recent papers have provided inferential statistical methods that produce an outcome similar to Thurstone scaling (Lipovetsky and Conklin, 2004). Here, we demonstrate that binary logistic regression can analyze preferences for enhanced video.
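As a rough illustration of how pairwise preferences can be analyzed with binary logistic regression to recover a Thurstone-like perceptual scale, here is a minimal sketch; the trial data, the number of enhancement levels, and the use of scikit-learn are illustrative assumptions (a statsmodels fit would additionally provide the significance tests mentioned above).

```python
# Sketch: Bradley-Terry-style scale for enhancement levels via logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

n_levels = 4                       # hypothetical enhancement levels 0..3
# Each trial: (level_a, level_b, preferred_a) with toy outcomes.
trials = [(0, 1, 0), (1, 2, 0), (2, 3, 1), (0, 3, 0), (1, 3, 0), (0, 2, 0)]

X = np.zeros((len(trials), n_levels))
y = np.zeros(len(trials))
for i, (a, b, pref_a) in enumerate(trials):
    X[i, a], X[i, b] = 1.0, -1.0   # +1 for the first item, -1 for the second
    y[i] = pref_a                  # 1 if the first item was preferred

# No intercept: only differences of level "strengths" matter.
model = LogisticRegression(fit_intercept=False, C=10.0).fit(X, y)
scale = model.coef_.ravel() - model.coef_.ravel()[0]   # anchor level 0 at zero
print("estimated perceptual scale:", scale)
```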
Tradeoffs in subjective testing methods for image and video quality assessment
David M. Rouse, Romuald Pépion, Patrick Le Callet, et al.
The merit of an objective quality estimator for either still images or video is gauged by its ability to accurately estimate the perceived quality scores of a collection of stimuli. Encounters with radically different distortion types that arise in novel media representations require that researchers collect perceived quality scores representative of these new distortions to confidently evaluate a candidate objective quality estimator. Two common methods used to collect perceived quality scores are absolute categorical rating (ACR) [1] and subjective assessment for video quality (SAMVIQ) [2, 3]. The choice of a particular test method affects the accuracy and reliability of the data collected. An awareness of the potential benefits and/or costs attributed to the ACR and SAMVIQ test methods can guide researchers to choose the more suitable method for a particular application. This paper investigates the tradeoffs of these two subjective testing methods using three different subjective databases that have scores corresponding to each method. The subjective databases contain either still images or video sequences. This paper has the following organization: Section 2 summarizes the two test methods compared in this paper, ACR and SAMVIQ. Section 3 summarizes the content of the three subjective databases used to evaluate the two test methods. An analysis of the ACR and SAMVIQ test methods is presented in Section 4. Section 5 concludes this paper.
Calibration of the visual difference predictor for estimating visibility of JPEG2000 compression artifacts in CT images
Many visual difference predictors (VDPs) have used basic psychophysical data (such as ModelFest) to calibrate the algorithm parameters and to validate their performance. However, the basic psychophysical data often do not contain a sufficient number of stimuli and variations to test the more complex components of a VDP. In this paper we calibrate the Visual Difference Predictor for High Dynamic Range images (HDR-VDP) using radiologists' experimental data for JPEG2000-compressed CT images, which contain complex structures. We then validate the HDR-VDP in predicting the presence of perceptible compression artifacts. 240 CT-scan images were encoded and decoded using JPEG2000 compression at four compression ratios (CRs). Five radiologists independently determined whether each image pair (original and compressed image) was indistinguishable or distinguishable. A threshold CR for each image, at which 50% of radiologists would detect compression artifacts, was estimated by fitting a psychometric function. The CT images compressed at the threshold CRs were used to calibrate the HDR-VDP parameters and to validate its prediction accuracy. Our results showed that the HDR-VDP calibrated on the CT image data gave much better predictions than the HDR-VDP calibrated to the basic psychophysical data (ModelFest + contrast masking data for sine gratings).
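A minimal sketch of the threshold-estimation step is given below: a logistic psychometric function is fit to the proportion of radiologists detecting artifacts at each compression ratio, and the 50% point is read off as the threshold CR. The data values and the logistic form are illustrative assumptions.

```python
# Sketch: estimate the 50%-detection threshold compression ratio (CR).
import numpy as np
from scipy.optimize import curve_fit

def psychometric(cr, cr50, slope):
    """Probability of detecting artifacts as a function of compression ratio."""
    return 1.0 / (1.0 + np.exp(-slope * (cr - cr50)))

crs = np.array([8.0, 12.0, 16.0, 24.0])        # toy compression ratios
p_detect = np.array([0.1, 0.3, 0.6, 0.9])      # toy detection proportions

(cr50, slope), _ = curve_fit(psychometric, crs, p_detect, p0=[15.0, 0.5])
print(f"threshold CR (50% detection): {cr50:.1f}")
```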
A subjective study to evaluate video quality assessment algorithms
Automatic methods to evaluate the perceptual quality of a digital video sequence have widespread applications wherever the end-user is a human. Several objective video quality assessment (VQA) algorithms exist, whose performance is typically evaluated using the results of a subjective study performed by the video quality experts group (VQEG) in 2000. There is a great need for a free, publicly available subjective study of video quality that embodies state-of-the-art in video processing technology and that is effective in challenging and benchmarking objective VQA algorithms. In this paper, we present a study and a resulting database, known as the LIVE Video Quality Database, where 150 distorted video sequences obtained from 10 different source video content were subjectively evaluated by 38 human observers. Our study includes videos that have been compressed by MPEG-2 and H.264, as well as videos obtained by simulated transmission of H.264 compressed streams through error prone IP and wireless networks. The subjective evaluation was performed using a single stimulus paradigm with hidden reference removal, where the observers were asked to provide their opinion of video quality on a continuous scale. We also present the performance of several freely available objective, full reference (FR) VQA algorithms on the LIVE Video Quality Database. The recent MOtion-based Video Integrity Evaluation (MOVIE) index emerges as the leading objective VQA algorithm in our study, while the performance of the Video Quality Metric (VQM) and the Multi-Scale Structural SIMilarity (MS-SSIM) index is noteworthy. The LIVE Video Quality Database is freely available for download [1] and we hope that our study provides researchers with a valuable tool to benchmark and improve the performance of objective VQA algorithms.
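The abstract reports relative algorithm performance without detailing the evaluation criterion; performance on such databases is conventionally summarized by rank-order (Spearman) and linear (Pearson) correlation between objective scores and subjective opinion scores, as in the hedged sketch below with invented toy values.

```python
# Sketch: benchmarking an objective VQA algorithm against subjective scores.
import numpy as np
from scipy.stats import spearmanr, pearsonr

dmos = np.array([30.1, 45.7, 52.3, 61.0, 72.8])    # toy subjective (DMOS) scores
movie = np.array([0.91, 0.78, 0.70, 0.58, 0.41])   # hypothetical objective scores

# The magnitude of the correlation is what matters; sign depends on score polarity.
srocc, _ = spearmanr(movie, dmos)
plcc, _ = pearsonr(movie, dmos)
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```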
Subjective assessment of HDTV content: comparison of quality across HDTV formats
Joshan Meenowa, David S. Hands, Rhea Young, et al.
This paper reports on a series of experiments designed to examine the subjective quality of HDTV encoded video. A common set of video sequences varying in spatial and temporal complexity will be encoded at different H.264 encoding profiles. Each video clip used in the test will be available in 720p, 1080i and 1080p HDTV formats. Each video clip will then be encoded in CBR mode using H.264 at bitrates ranging from 1 Mbit/s up to 15 Mbit/s with the other encoding parameters identical (e.g., identical key frame intervals and CABAC entropy mode). This approach has been chosen to enable direct comparison across bit-rates and formats most relevant to HDTV broadcast services. Three subjective tests will be performed, one test for each HDTV format. The single-stimulus subjective test method with the ACR rating scale will be employed. A total of 15 non-expert subjects will participate in each test. A different sample of subjects will participate in each test, making 45 subjects in total. The test results will be available by end of the summer 2009.
New Directions in Image Quality
The medium and the message: a revisionist view of image quality
In his book "Understanding Media" social theorist Marshall McLuhan declared: "The medium is the message." The thesis of this paper is that with respect to image quality, imaging system developers have taken McLuhan's dictum too much to heart. Efforts focus on improving the technical specifications of the media (e.g. dynamic range, color gamut, resolution, temporal response) with little regard for the visual messages the media will be used to communicate. We present a series of psychophysical studies that investigate the visual system's ability to "see through" the limitations of imaging media to perceive the messages (object and scene properties) the images represent. The purpose of these studies is to understand the relationships between the signal characteristics of an image and the fidelity of the visual information the image conveys. The results of these studies provide a new perspective on image quality that shows that images that may be very different in "quality", can be visually equivalent as realistic representations of objects and scenes.
Quantifying the relationship between visual salience and visual importance
This paper presents the results of two psychophysical experiments and an associated computational analysis designed to quantify the relationship between visual salience and visual importance. In the first experiment, importance maps were collected by asking human subjects to rate the relative visual importance of each object within a database of hand-segmented images. In the second experiment, experimental saliency maps were computed from visual gaze patterns measured for these same images by using an eye-tracker and task-free viewing. By comparing the importance maps with the saliency maps, we found that the maps are related, but perhaps less than one might expect. When coupled with the segmentation information, the saliency maps were shown to be effective at predicting the main subjects. However, the saliency maps were less effective at predicting the objects of secondary importance and the unimportant objects. We also found that the vast majority of early gaze position samples (0-2000 ms) were made on the main subjects, suggesting that a possible strategy of early visual coding might be to quickly locate the main subject(s) in the scene.
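As an illustration of the map comparison described above, the sketch below computes a per-object mean saliency from a saliency map and a segmentation, and correlates it with rated importance; the arrays and ratings are toy stand-ins for the experimental data.

```python
# Sketch: compare object-level saliency with rated visual importance.
import numpy as np
from scipy.stats import spearmanr

def object_saliency(saliency_map, segmentation, n_objects):
    """Mean saliency inside each labeled object region."""
    return np.array([saliency_map[segmentation == k].mean()
                     for k in range(n_objects)])

rng = np.random.default_rng(5)
seg = rng.integers(0, 4, (48, 48))            # 4 hand-segmented "objects" (toy)
sal = rng.random((48, 48))                    # experimental saliency map (toy)
importance = np.array([0.9, 0.4, 0.2, 0.1])   # rated visual importance (toy)

rho, _ = spearmanr(object_saliency(sal, seg, 4), importance)
print(f"saliency-importance rank correlation: {rho:.2f}")
```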
Synchronization mismatch: vernier acuity and perception evaluation for large ultra high resolution tiled displays
We define experiments to measure vernier acuity caused by synchronization mismatch for moving images. The experiments are used to obtain synchronization mismatch acuity threshold as a function of object velocity and as a function of occlusion or gap width. Our main motivation for measuring the synchronization mismatch vernier acuity is its relevance in the application of tiled display systems which create a single contiguous image using individual discrete panels arranged in a matrix with each panel utilizing a distributed synchronization algorithm to display parts of the overall image. We also propose a subjective assessment method for perception evaluation of synchronization mismatch for large ultra high resolution tiled displays. For this we design a synchronization mismatch measurement test video set for various tile configurations for various inter-panel synchronization mismatch values. The proposed method for synchronization mismatch perception can evaluate tiled displays with or without tile bezels. The results from this work can help during design of low cost tiled display systems which utilize distributed synchronization mechanisms for a contiguous or bezeled image display.
A perceptual similarity measure based on smoothing filters and the normalized compression distance
We present an empirical study of properties of the remarkable Normalized Compression Distance (NCD), a mathematical formulation by M. Li et al. that quantifies similarity between two binary strings as the additional amount of algorithmic information required to transform the description of one to the other. In particular, we are interested in the NCD values between an image and its modified versions by common image processing techniques. Experimental data obtained indicate that the NCD is symmetric and transitive, and that it can be used as a reasonable perceptual measure of similarity, but only in the spatial domain. Further, the NCD clusters the common image processing techniques into three groups in a manner consistent with the human perception of similarity. We also introduce two independent modifications to the NCD and study their properties. The first modification calls for applying a median filter and then thresholding to the input images before the NCD is computed, which results in a more uniform distribution of the NCD values. The second modification modifies the NCD formula to reflect the running time as well as the size of the shortest program that transforms one input string to the other. Obtained data show that using this modified NCD, it is possible to classify subtle changes (e.g., watermarking) to a font character image as similar to the original and drastic changes (e.g., rotation by 90 degrees) as different.
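For reference, the NCD of two strings x and y under a compressor C is commonly written NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below computes it with zlib and also shows the median-filter-and-threshold preprocessing of the first modification; the filter size and threshold are illustrative assumptions.

```python
# Hedged sketch: NCD with zlib, plus median-filter-and-threshold preprocessing.
import zlib
import numpy as np
from scipy.ndimage import median_filter

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def preprocess(img: np.ndarray, size=3, thresh=128) -> bytes:
    """Median filter then threshold, as in the first NCD modification."""
    smoothed = median_filter(img, size=size)
    return (smoothed > thresh).astype(np.uint8).tobytes()

# Toy usage: a random grayscale image and a subtly modified copy.
rng = np.random.default_rng(1)
a = rng.integers(0, 256, (64, 64), dtype=np.uint8)
b = np.clip(a.astype(int) + 5, 0, 255).astype(np.uint8)
print("NCD:", ncd(preprocess(a), preprocess(b)))
```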
Visual, Auditory, and Tactile Perception
Subband analysis and synthesis of real-world textures for objective and subjective determination of roughness
In a previous study we investigated the roughness of real world textures taken from the CUReT database. We showed that people could systematically judge the subjective roughness of these textures. However, we did not determine which objective factors relate to these perceptual judgments of roughness. In the present study we take the first step in this direction using a subband decomposition of the CUReT textures. This subband decomposition is used to predict the subjective roughness judgments of the previous study. We also generated synthetic textures with uniformly distributed white noise of the same variance in each subband, and conducted a perceptual experiment to determine the perceived roughness of both the original and synthesized texture images. The participants were asked to rank-order the images based on the degree of perceived roughness. It was found that the synthesis method produces images that are similar in roughness to the original ones except for a small but systematic deviation.
Psychoacoustic and cognitive aspects of auditory roughness: definitions, models, and applications
Pantelis N. Vassilakis, Roger A. Kendall
The term "auditory roughness" was first introduced in the 19th century to describe the buzzing, rattling auditory sensation accompanying narrow harmonic intervals (i.e. two tones with frequency difference in the range of ~15-150Hz, presented simultaneously). A broader definition and an overview of the psychoacoustic correlates of the auditory roughness sensation, also referred to as sensory dissonance, is followed by an examination of efforts to quantify it over the past one hundred and fifty years and leads to the introduction of a new roughness calculation model and an application that automates spectral and roughness analysis of sound signals. Implementation of spectral and roughness analysis is briefly discussed in the context of two pilot perceptual experiments, designed to assess the relationship among cultural background, music performance practice, and aesthetic attitudes towards the auditory roughness sensation.
Audio-visual interactions in product sound design
Elif Özcan, René van Egmond
Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral part of the main product concept. Because the visual aspects of a product are considered to dominate the communication of the desired product concept, sound is usually expected to fit the visual character of a product. We argue that this can be accomplished successfully only on the basis of a thorough understanding of the impact of audio-visual interactions on product sounds. Two experimental studies are reviewed to show audio-visual interactions on both perceptual and cognitive levels influencing the way people encode, recall, and attribute meaning to product sounds. Implications for sound design are discussed, defying the natural tendency of product designers to analyze the "sound problem" in isolation from the other product properties.
Tangible display systems: direct interfaces for computer-based studies of surface appearance
Benjamin A. Darling, James A. Ferwerda
When evaluating the surface appearance of real objects, observers engage in complex behaviors involving active manipulation and dynamic viewpoint changes that allow them to observe the changing patterns of surface reflections. We are developing a class of tangible display systems to provide these natural modes of interaction in computer-based studies of material perception. A first-generation tangible display was created from an off-the-shelf laptop computer containing an accelerometer and webcam as standard components. Using these devices, custom software estimated the orientation of the display and the user's viewing position. This information was integrated with a 3D rendering module so that rotating the display or moving in front of the screen would produce realistic changes in the appearance of virtual objects. In this paper, we consider the design of a second-generation system to improve the fidelity of the virtual surfaces rendered to the screen. With a high-quality display screen and enhanced tracking and rendering capabilities, a second-generation system will be better able to support a range of appearance perception applications.
Art and Science Render the High-Dynamic-Range World I
The Ansel Adams zone system: HDR capture and range compression by chemical processing
We tend to think of digital imaging and the tools of Photoshop™ as a new phenomenon in imaging. We are also familiar with multiple-exposure HDR techniques intended to capture a wider range of scene information than conventional film photography. We know about tone-scale adjustments to make better pictures. We tend to think of everyday, consumer, silver-halide photography as a fixed window of scene capture with a limited, standard range of response. This description of photography is certainly true, between 1950 and 2000, for instant films and negatives processed at the drugstore. These systems had a fixed dynamic range and a fixed tone-scale response to light. All pixels in the film have the same response to light, so the same light exposure from different pixels was rendered as the same film density. Ansel Adams, along with Fred Archer, formulated the Zone System starting in 1940. It predates the trillions of consumer photos in the second half of the 20th century, yet it was much more sophisticated than today's digital techniques. This talk will describe the chemical mechanisms of the Zone System in the parlance of digital image processing. It will describe the Zone System's chemical techniques for image synthesis. It also discusses dodging and burning techniques to fit the HDR scene into the LDR print. Although current HDR imaging shares some of the Zone System's achievements, it usually does not achieve all of them.
Ansel Adams: early works
Jodi Throckmorton
Ansel Adams (1902-1984), photographer, musician, naturalist, explorer, critic, and teacher, was a giant in the field of landscape photography. In his images of the unspoiled Western landscape, he strove to capture the sublime: the transcendentalist concept that nature can generate the experience of awe for the viewer. Many viewers are familiar with the heroic, high-contrast prints on high-gloss paper that Adams made to order beginning in the 1970s; much less well known are the intimate prints that the artist crafted earlier in his career. This exhibition focuses on these masterful small prints from the 1920s into the 1950s. During this time period, Adams's printing style changed dramatically. The painterly, soft-focus, warm-toned style of the Parmelian Prints of the High Sierras from the 1920s evolved into the sharp-focus style of the f/64 school of photography that Adams co-founded in the 1930s with Edward Weston and Imogen Cunningham. After World War II, Adams opted for a cooler, higher-contrast look for his prints. Throughout the various styles in which he chose to work, Adams explored the power of nature and succeeded in establishing landscape photography as a legitimate form of modern art.
Art and Science Render the High-Dynamic-Range World II
The drama of illumination: artist's approaches to the creation of HDR in paintings and prints
For many centuries artists have considered and depicted illumination in art, from the effect of sunlight on objects at different times of the day, to shadows and highlights cast by the moon, to indirect light such as that through an open window, or the artificial light of the candle or firelight. The presentation will consider artists who were fascinated by the phenomena of natural and artificial illumination and how they were able to render the natural world as a form of dynamic range through pigment. Artists have long been aware of the psychological aspects of the juxtaposition of colour, exploiting optical qualities and arranging visual effects in painting and prints. Artists in the 16th century were attempting to develop an extended dynamic range through multi-colour, wood-block printing. Artists working at the height of naturalist realism in the 17th through the 19th centuries were fascinated by the illusory nature of light on objects. The presentation will also consider the interpretation of dynamic range through the medium of mezzotint, possibly the most subtle of printing methods, which was used by printers to copy paintings and to create highly original works of art containing a dynamic range of tones.
Darkness and depth in early Renaissance painting
Contrast has always been appreciated as a significant factor in image quality, but it is less widely recognized that it is a key factor in the representation of depth, solidity and three-dimensionality in images in general, and in paintings in particular. This aspect of contrast was a key factor in the introduction of oil paint as a painting medium at the beginning of the fifteenth century, as a practical means of contrast enhancement. However, recent conservatorship efforts have established that the first oil paintings were not, as commonly supposed, by van Eyck in Flanders in the 1430s, but by Masolino da Panicale in Italy in the 1420s. These developments led to the use of chiaroscuro technique in various forms, all of which are techniques for enhanced shadowing.
The luminance of pure black: exploring the effect of surround in the context of electronic displays
Rafal Mantiuk, Scott Daly, Louis Kerofsky
Overall image quality benefits substantially from good reproduction of black tones. Modern displays feature relatively low black levels, making them capable of rendering good dark tones. However, it is not clear whether the black level of those displays is sufficient to produce an "absolute black" color, one that appears no brighter than an arbitrary dark surface. To find the luminance necessary to invoke the perception of absolute black, we conduct an experiment in which we measure the highest luminance that cannot be discriminated from the lowest luminance achievable in our laboratory conditions (0.003 cd/m2). We measure these thresholds under varying surround luminance (up to 900 cd/m2), which simulates a range of ambient illumination conditions. We also analyze our results in the context of actual display devices. We conclude that the black level of an LCD display with no backlight dimming is not only insufficient for producing an absolute black color, but may also appear grayish under low ambient light levels.
Object size, spatial-frequency content and retinal contrast
Recent interest in HDR scene capture and display has stimulated measurements of the usable range of contrast information for human vision. These experiments have led to a model that calculates the retinal contrast image. A fraction of the light from each scene pixel is scattered to all retinal pixels, and the amount of scattered light decreases with distance from the source pixel. By summing the light falling on each retinal pixel from all the scene pixels, we can calculate the retinal image contrast. As objects, such as text letters, get smaller, their retinal contrast gets lower, even though the scene contrast is constant. This paper studies the Landolt C, a commonly used test target for measuring visual acuity, using three frameworks. First, it compares visual acuity measurements with the dimensions of the receptor mosaic. Second, it discusses Campbell and Robson's experiments and the limits of the Contrast Sensitivity Function (CSF). Third, it reports the calculated retinal stimulus after intraocular scatter for both the Landolt C and Campbell and Robson's stimuli. These three frameworks are useful in understanding the limits of human vision; each approach gives only one piece of the puzzle. Retinal contrast, the CSF, and retinal cone spacing all influence our understanding of the limits of human vision. We have analyzed the Landolt C and the CSF using retinal contrast. The glare effect on the Landolt C shows that retinal images are significantly different from target images. Veiling glare of the sine-wave stimuli used by Campbell and Robson to measure the CSF results in a decrease in retinal contrast; above 3-4 cpd, this decrease correlates well with their reported data.
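The summation described above amounts to convolving the scene with a glare spread function whose weight falls off with distance. The sketch below illustrates the idea with a power-law fall-off; the kernel shape, exponent, and toy target are illustrative assumptions, not the paper's calibrated scatter model.

```python
# Hedged sketch: retinal image as scene convolved with a glare spread function.
import numpy as np
from scipy.signal import fftconvolve

def glare_spread_function(size=65, exponent=2.0, eps=1.0):
    """Rotationally symmetric kernel that decreases with distance from center."""
    r = np.hypot(*np.meshgrid(np.arange(size) - size // 2,
                              np.arange(size) - size // 2))
    psf = 1.0 / (r**exponent + eps)
    return psf / psf.sum()                      # conserve total light

def retinal_image(scene):
    return fftconvolve(scene, glare_spread_function(), mode="same")

# A small high-contrast target loses contrast on the retina even though the
# scene contrast is unchanged.
scene = np.ones((128, 128)) * 0.9
scene[60:68, 60:68] = 0.01                      # small dark target (toy)
ret = retinal_image(scene)
print("scene min/max:", scene.min(), scene.max())
print("retinal min/max:", ret.min().round(3), ret.max().round(3))
```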
The effect of exposure on MaxRGB color constancy
The performance of the MaxRGB illumination-estimation method for color constancy and automatic white balancing has been reported in the literature as being mediocre at best; however, MaxRGB has usually been tested on images of only 8-bits per channel. The question arises as to whether the method itself is inadequate, or rather whether it has simply been tested on data of inadequate dynamic range. To address this question, a database of sets of exposure-bracketed images was created. The image sets include exposures ranging from very underexposed to slightly overexposed. The color of the scene illumination was determined by taking an extra image of the scene containing 4 Gretag Macbeth mini Colorcheckers placed at an angle to one another. MaxRGB was then run on the images of increasing exposure. The results clearly show that its performance drops dramatically when the 14-bit exposure range of the Nikon D700 camera is exceeded, thereby resulting in clipping of high values. For those images exposed such that no clipping occurs, the median error in MaxRGB's estimate of the color of the scene illumination is found to be relatively small.
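For concreteness, a minimal sketch of the MaxRGB estimate follows: the illuminant color is taken as the per-channel maximum of the linear image, optionally excluding clipped pixels, which is the failure mode identified above. The clipping threshold and toy image are illustrative assumptions.

```python
# Sketch: MaxRGB illuminant estimation with optional exclusion of clipped pixels.
import numpy as np

def maxrgb_illuminant(img: np.ndarray, clip_level: float = None):
    """img: HxWx3 linear RGB. Returns a normalized illuminant estimate."""
    if clip_level is not None:
        # Ignore pixels where any channel is at/near the sensor's clipping point.
        valid = np.all(img < clip_level, axis=-1)
        img = img[valid]
    est = img.reshape(-1, 3).max(axis=0)
    return est / np.linalg.norm(est)

# Toy usage with a 14-bit-style linear image.
rng = np.random.default_rng(2)
img = rng.uniform(0, 12000, (100, 100, 3))
print("illuminant estimate:", maxrgb_illuminant(img, clip_level=16000))
```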
Gloss discrimination and eye movements
Human observers are able to make fine discriminations of surface gloss. What cues are they using to perform this task? In previous studies, we identified two reflection-related cues, the contrast of the reflected image (c, contrast gloss) and the sharpness of the reflected image (d, distinctness-of-image gloss), but these were for objects rendered in standard dynamic range (SDR) images with compressed highlights. In ongoing work, we are studying the effects of image dynamic range on perceived gloss, comparing high dynamic range (HDR) images with accurate reflections and SDR images with compressed reflections. In this paper, we first present the basic findings of this gloss discrimination study and then present an analysis of eye movement recordings that show where observers were looking during the gloss discrimination task. The results indicate that: 1) image dynamic range has a significant influence on perceived gloss, with surfaces presented in HDR images being seen as glossier and more discriminable than their SDR counterparts; 2) observers look at both light source highlights and environmental interreflections when judging gloss; and 3) both of these results are modulated by surface geometry and scene illumination.
Visual maladaptation in contrast domain
Dawid Pajak, Martin Cadík, Tunç O. Aydin, et al.
In this work we simulate the effect of the human eye's maladaptation to visual perception over time through a supra-threshold contrast perception model that comprises adaptation mechanisms. Specifically, we attempt to visualize maladapted vision on a display device. Given the scene luminance, the model computes a measure of perceived multi-scale contrast by taking into account spatially and temporally varying contrast sensitivity in a maladapted state, which is then processed by the inverse model and mapped to a desired display's luminance assuming perfect adaptation. Our system simulates the effect of maladaptation locally, and models the shifting of peak spatial frequency sensitivity in maladapted vision in addition to the uniform decrease in contrast sensitivity among all frequencies. Through our GPU implementation we demonstrate the visibility loss of scene details due to maladaptation over time at an interactive speed.
Perceptual and Cognitive Experiments in Virtual Environments
Synthetic environments as visualization method for product design
Frank Meijer, Egon L. van den Broek, Theo E. Schouten, et al.
In this paper, we explored the use of low-fidelity Synthetic Environments (SE; i.e., a combination of simulation techniques) for product design, in particular their usefulness for making design problems explicit. We were especially interested in the influence of interactivity on user experience. For this purpose, an industrial design case was taken: the innovation of an airplane galley. A virtual airplane was created in which an interactive model of the galley was placed. Three groups of participants explored the SE in different conditions: participants explored the SE interactively (Interactive condition), watched a recording (Passive Dynamic condition), or watched static images (Passive Static condition). Afterwards, participants were tested with a questionnaire on how accurately they had memorized the spatial layout of the SE. The results revealed that an interactive SE does not necessarily provoke participants to memorize spatial layouts more accurately. However, the effect of interactive learning is dependent on the participants' Visual Spatial Ability (VSA). Consequently, this finding supports the use of interactive exploration of prototypes through low-fidelity SE in the product design cycle when the individual's characteristics are taken into account.
Cognition, Attention, and Eye Movements in Image Analysis
Interest of perceptive vision for document structure analysis
This work addresses the problem of document image analysis, and more particularly the topic of document structure recognition in old, damaged, and handwritten documents. The goal of this paper is to present the relevance of human perceptive vision for document analysis. We focus on two aspects of the model of perceptive vision: the perceptive cycle and visual attention. We present the key elements of perceptive vision that can be used for document analysis. We then introduce perceptive vision into an existing method for document structure recognition, which enables us both to show how we used the properties of perceptive vision and to compare the results obtained with and without it. We apply our method to the analysis of several kinds of documents (archive registers, old newspapers, incoming mail, etc.) and show that perceptive vision significantly improves their recognition. Moreover, the use of perceptive vision simplifies the description of complex documents. Finally, the running time is often reduced.
Visual recognition of permuted words
In the current study we examine how letter permutation affects visual recognition of words in two orthographically dissimilar languages, Urdu and German. We present the hypothesis that recognition or reading of permuted and non-permuted words are two distinct mental processes, and that people use different strategies in handling permuted words as compared to normal words. A comparison between the reading behavior of people in these languages is also presented. We frame our study in the context of dual-route theories of reading, and we observe that the dual-route theory is consistent with our hypothesis of a distinction in the underlying cognitive behavior for reading permuted and non-permuted words. We conducted three lexical decision experiments to analyze how reading is degraded or affected by letter permutation. We performed analysis of variance (ANOVA), a distribution-free rank test, and t-tests to determine the significant differences in response time latencies for the two classes of data. Results showed that recognition accuracy for permuted words decreased by 31% for Urdu and 11% for German. We also found a considerable difference in reading behavior between cursive and alphabetic languages, and observed that reading Urdu is comparatively slower than reading German due to the characteristics of its cursive script.
Effects of stimulus size and velocity on steady-state smooth pursuit induced by realistic images
Smooth pursuit eye movements align the retina with moving targets, ideally stabilizing the retinal image. At steady state, eye movements typically reach an approximately constant velocity that depends on, and is usually lower than, the target velocity. Experiment 1 investigated the effect of target size and velocity on smooth pursuit induced by realistic images (color photographs of an apple and a flower subtending 2° and 17°, respectively), in comparison with a small dot subtending a fraction of a degree. The extended stimuli were found to enhance smooth pursuit gain. Experiment 2 examined the absolute velocity limit of smooth pursuit elicited by the small dot and the effect of the extended targets on this velocity limit. The eye velocity for tracking the dot was found to saturate at about 63 deg/sec, while saturation occurred at higher velocities for the extended images. The difference in gain due to target size was significant between the dot and the two extended stimuli, while no statistical difference was found between the apple and the flower stimuli of wider angular extent. Detailed knowledge of smooth pursuit eye movements is important for several areas of electronic imaging, in particular for assessing the perceived motion blur of displayed objects.
Art, Aesthetics, and Perception
Human preference for individual colors
Stephen E. Palmer, Karen B. Schloss
Color preference is an important aspect of human behavior, but little is known about why people like some colors more than others. Recent results from the Berkeley Color Project (BCP) provide detailed measurements of preferences among 32 chromatic colors as well as other relevant aspects of color perception. We describe the fit of several color preference models, including ones based on cone outputs, color-emotion associations, and Palmer and Schloss's ecological valence theory. The ecological valence theory postulates that color serves an adaptive "steering" function, analogous to taste preferences, biasing organisms to approach advantageous objects and avoid disadvantageous ones. It predicts that people will tend to like colors to the extent that they like the objects that are characteristically that color, averaged over all such objects. The ecological valence theory predicts 80% of the variance in average color preference ratings from the Weighted Affective Valence Estimates (WAVEs) of correspondingly colored objects, much more variance than any of the other models. We also describe how hue preferences for single colors differ as a function of gender, expertise, culture, social institutions, and perceptual experience.
Aesthetics of color combinations
Karen B. Schloss, Stephen E. Palmer
The previous literature on the aesthetics of color combinations has produced confusing and conflicting claims. For example, some researchers suggest that color harmony increases with increasing hue similarity whereas others say it increases with hue contrast. We argue that this confusion is best resolved by considering three distinct judgments about color pairs: (a) preference for the pair as a whole, (b) perceived harmony of the two colors, and (c) preference for the figural color when viewed against the background color. Empirical support for this distinction shows that pair preference and harmony ratings both increase as hue similarity increases, but preference correlates more strongly with component color preferences and lightness contrast than does harmony. Although ratings of both pair preference and harmony decrease as hue contrast increases, ratings of figural color preference increase as hue contrast with the background increases. Our results refine and clarify well-known and often contradictory claims of artistic color theory.
Preference for art: similarity, statistics, and selling price
Daniel J. Graham, Jay D. Friedenberg, Cyrus H. McCandless, et al.
Factors governing human preference for artwork have long been studied but there remain many holes in our understanding. Bearing in mind contextual factors (both the conditions under which the art is viewed, and the state of knowledge viewers have regarding art) that play some role in preference, we assess in this paper three questions. First, what is the relationship between perceived similarity and preference for different types of art? Second, are we naturally drawn to certain qualities, and perhaps to certain image statistics, in art? And third, do social and economic forces tend to select preferred stimuli, or are these forces governed by non-aesthetic factors such as age, rarity, or artist notoriety? To address the first question, we tested the notion that perceived similarity predicts preference for three classes of paintings: landscape, portrait/still-life, and abstract works. We find that preference is significantly correlated with (a) the first principal component of similarity in abstract works; and (b) the second principal component for landscapes. However, portrait/still-life images did not show a significant correlation between similarity and preference, perhaps due to effects related to face perception. The preference data were then compared to a wide variety of image statistics relevant to early visual system coding. For landscapes and abstract works, nonlinear spatial and intensity statistics relevant to visual processing explained surprisingly large portions of the variance of preference. For abstract works, a quarter of the variance of preference rankings could be explained by a statistic gauging pixel sparseness. For landscape paintings, spatial frequency amplitude spectrum statistics explained one fifth of the variance of preference data. Consistent with results for similarity, image statistics for portrait/still-life works did not correlate significantly with preference. Finally, we addressed the role of value. If there are shared "rules" of preference, one might expect "free markets" to value art in proportion to its aesthetic appeal, at least to some extent. To assess the role of value, a further test of preference was performed on a separate set of paintings recently sold at auction. Results showed that the selling price of these works had no correlation with preference, while basic statistics were significantly correlated with preference. We conclude that selling price, which could be seen as a proxy for a painting's "value," is not predictive of preference, while shared preferences may to some extent be predictable based on image statistics. We also suggest that contextual and semantic factors play an important role in preference given that image content appears to lead to greater divergence between similarity and preference ratings for representational works, and especially for artwork that prominently depicts faces. The present paper paves the way for a more complete understanding of the relationship between shared human preferences and image statistical regularities, and it outlines the basic geometry of perceptual spaces for artwork.
The death of the object: perceiving non-physical art
Leigh Markopoulos
This paper examines the modernist phenomenon of seeking subject material outside the realm of the purely physical. Starting with the Impressionist project to paint light or at least its effects, the paper traces a trajectory of related experimentation through Man Ray's "rayographs" or solarization images of the 20s and 30s, to the work of conceptual artists of the sixties and seventies who focused their attention on ephemeral media, for example Michael Asher's site-specific pressured-air works, or Robert Barry's pioneering use of unorthodox materials such as inert gases or carrier waves. More recently Dan Flavin and James Turrell have focused their artistic endeavors on light and its color components, and the Belgian artist Ann Veronica Janssens has used light, color and fog to create interactive sculptures. Today what remains invisible to the human eye has taken on a more sinister, and often political, connotation of deeper forces at work, as evinced most clearly by Trevor Paglen's The Other Night Sky (2008) series, for which he tracked and photographed the movements of classified aircraft and geostationary satellites in the California night skies. All these works take the subject of art to be aesthetic perception, and consequently favor the means by which the experience of viewing is directed by the artist, rather than the traditional art object.
Rendering nothingness: reality and aesthetics in Haboku landscape for understanding cognition and computer interfaces
The Haboku Landscape of Sesshu Toyo is perhaps one of the finest examples of Japanese and Chinese monk landscapes in existence. We analyze the factors going into this painting from an artistic and aesthetic perspective, and we model the painting using MPEG-7 description. We examine the work done in rendering ink landscapes using computer-generated NPR. Finally we make some observations about measuring aesthetics in Chinese and Japanese ink painting.
How did Leonardo perceive himself? Metric iconography of da Vinci's self-portraits
Some eighteen portraits of Leonardo in old age are now recognized, consolidating the impression from his best-established self-portrait of an old man with long white hair and beard. However, his appearance when younger is generally regarded as unknown, although he was described as very beautiful as a youth. Application of the principles of metric iconography, the quantitative analysis of painted images, provides an avenue for the identification of other portraits that may be proposed as valid portraits of Leonardo during various stages of his life, by himself and by his contemporaries. Overall, this approach identifies portraits of Leonardo by Verrocchio, Raphael, Botticelli, and others. Beyond this physiognomic analysis, Leonardo's first known drawing provides further insight into his core motivations. Topographic considerations make clear that the drawing is of the hills behind Vinci, with a view overlooking the rocky promontory of the town and the plain stretching out before it. The outcroppings in the foreground bear a striking resemblance to those of his unique composition, 'The Virgin of the Rocks', suggesting a deep childhood appreciation of this wild terrain and an identification with that religious man of the mountains, John the Baptist, who was also the topic of Leonardo's last known painting. Following this trail leads to a line of possible self-portraits continuing the age-regression concept back to a self view at about two years of age.
Interactive Paper Session
No-reference image quality assessment based on localized gradient statistics: application to JPEG and JPEG2000
This paper presents a novel system that employs an adaptive neural network for the no-reference assessment of perceived quality of JPEG/JPEG2000 coded images. The adaptive neural network simulates the human visual system as a black box, avoiding its explicit modeling. It uses image features and the corresponding subjective quality score to learn the unknown relationship between an image and its perceived quality. Related approaches in literature extract a considerable number of features to form the input to the neural network. This potentially increases the system's complexity, and consequently, may affect its prediction accuracy. Our proposed method optimizes the feature-extraction stage by selecting the most relevant features. It shows that one can largely reduce the number of features needed for the neural network when using gradient-based information. Additionally, the proposed method demonstrates that a common adaptive framework can be used to support the quality estimation for both compression methods. The performance of the method is evaluated with a publicly available database of images and their quality score. The results show that our proposed no-reference method for the quality prediction of JPEG and JPEG2000 coded images has a comparable performance to the leading metrics available in literature, but at a considerably lower complexity.
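To make the pipeline concrete, here is a hedged sketch of the general approach: a handful of localized gradient statistics are extracted from an image and passed to a small learned regressor that predicts a subjective quality score. The specific features, network size, and toy training data are placeholder assumptions, not the configuration used in the paper.

```python
# Sketch: gradient-based features feeding a small neural-network regressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

def gradient_features(img: np.ndarray) -> np.ndarray:
    """img: 2-D grayscale array in [0, 1]; returns a few gradient summaries."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return np.array([mag.mean(), mag.std(),
                     np.percentile(mag, 90), (mag < 1e-3).mean()])

# Toy training: random "images" with made-up quality scores.
rng = np.random.default_rng(3)
X = np.stack([gradient_features(rng.random((64, 64))) for _ in range(50)])
y = rng.uniform(1, 5, 50)                      # placeholder subjective scores
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                     random_state=0).fit(X, y)
print("predicted quality:", model.predict(X[:3]))
```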
Automated videography for residential communications
Andrew F. Kurtz, Carman Neustaedter, Andrew C. Blose
The current widespread use of webcams for personal video communication over the Internet suggests that opportunities exist to develop video communication systems optimized for domestic use. We discuss both prior and existing technologies, and the results of user studies that indicate people's needs and expectations for personal video communication. In particular, users anticipate an easy-to-use, high-image-quality video system that enables multitasking communication during the course of real-world activities and provides appropriate privacy controls. To address these needs, we propose an approach premised on automated capture of user activity. We then describe a method that adapts cinematography principles, with a dual-camera videography system, to automatically control image capture relative to user activity, using semantic or activity-based cues to determine user position and motion. In particular, we discuss an approach to automatically manage shot framing, shot selection, and shot transitions with respect to one or more local users engaged in real-time, unscripted events, while transmitting the resulting video to a remote viewer. The goal is to frame subjects tightly (to provide more detail) while minimizing subject loss and repeated abrupt changes in shot framing as perceived by the remote viewer. We also discuss aspects of the system and related technologies that we have experimented with thus far. In summary, the method enables users to participate in interactive video-mediated communication while engaged in other activities.
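The following minimal sketch illustrates one way such framing logic could work: keep a safety margin around the tracked subject and only reframe when the subject has drifted past a threshold, so the remote viewer does not see repeated abrupt shot changes. The tracker, margins, and thresholds are illustrative assumptions, not the authors' system.

```python
# Illustrative sketch only: hysteresis-based reframing around a tracked subject.
def update_shot(prev_shot, subject_box, frame_w, frame_h,
                margin=0.25, reframe_thresh=0.15):
    """prev_shot and subject_box are (x, y, w, h) tuples in pixels.
    Returns the crop region to use for the next frame."""
    x, y, w, h = subject_box
    # Desired shot: subject tightly framed with a safety margin around it.
    dw, dh = w * (1 + 2 * margin), h * (1 + 2 * margin)
    dx = min(max(x - w * margin, 0), frame_w - dw)
    dy = min(max(y - h * margin, 0), frame_h - dh)
    desired = (dx, dy, dw, dh)
    if prev_shot is None:
        return desired
    # Only cut to the new framing once the subject has drifted far enough,
    # avoiding repeated abrupt shot changes for the remote viewer.
    px, py, pw, ph = prev_shot
    drift = max(abs(dx - px) / frame_w, abs(dy - py) / frame_h)
    return desired if drift > reframe_thresh else prev_shot
```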
Efficient motion weighted spatio-temporal video SSIM index
Recently, Seshadrinathan and Bovik proposed the Motion-based Video Integrity Evaluation (MOVIE) index for video quality assessment (VQA) [1, 2]. MOVIE uses a multi-scale spatio-temporal Gabor filter bank to decompose the videos and to compute motion vectors. Apart from its psychovisual inspiration, MOVIE is an attractive option for VQA owing to its performance. However, using MOVIE in a practical setting may prove difficult owing to its multi-scale optical flow computation. To bridge the gap between the conceptual elegance of MOVIE and a practical VQA algorithm, we propose a new VQA algorithm, spatio-temporal video SSIM, based on the essence of MOVIE. Spatio-temporal video SSIM uses motion information computed from a block-based motion-estimation algorithm and computes quality measures using a localized set of oriented spatio-temporal filters. In this paper we explain the algorithm and demonstrate its conceptual similarity to MOVIE, explore its computational complexity, and evaluate its performance on the popular VQEG dataset. We show that the proposed algorithm allows efficient full-reference (FR) VQA without compromising performance, while retaining the conceptual elegance of MOVIE.
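The sketch below conveys the general flavor of a motion-weighted, block-based quality computation: estimate block motion on the reference video and use it to weight per-block spatial SSIM. It is only a simplified stand-in under stated assumptions (block size, full-search range, weighting rule, 8-bit data range), not the proposed spatio-temporal video SSIM itself, which also applies oriented spatio-temporal filters.

```python
# Simplified motion-weighted SSIM sketch; not the authors' algorithm.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def block_motion(prev, curr, bsize=16, search=8):
    """Full-search block motion estimation; returns per-block motion magnitudes."""
    prev, curr = prev.astype(np.float64), curr.astype(np.float64)
    H, W = curr.shape
    mags = []
    for by in range(0, H - bsize + 1, bsize):
        for bx in range(0, W - bsize + 1, bsize):
            blk = curr[by:by+bsize, bx:bx+bsize]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    if 0 <= y0 <= H - bsize and 0 <= x0 <= W - bsize:
                        sad = np.abs(blk - prev[y0:y0+bsize, x0:x0+bsize]).sum()
                        if sad < best:
                            best, best_mv = sad, (dy, dx)
            mags.append(np.hypot(*best_mv))
    return np.array(mags)

def motion_weighted_ssim(ref_prev, ref_curr, dist_curr, bsize=16):
    """Weight per-block SSIM by reference motion: moving regions count more."""
    H, W = ref_curr.shape
    mags = block_motion(ref_prev, ref_curr, bsize)
    weights = 1.0 + mags / (mags.max() + 1e-8)
    scores = []
    for by in range(0, H - bsize + 1, bsize):
        for bx in range(0, W - bsize + 1, bsize):
            scores.append(ssim(ref_curr[by:by+bsize, bx:bx+bsize],
                               dist_curr[by:by+bsize, bx:bx+bsize],
                               data_range=255))   # assumes 8-bit frames
    scores = np.array(scores)
    return float((weights * scores).sum() / weights.sum())
```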
The impact of transmission errors on progressive 720 lines HDTV coded with H.264
Kjell Brunnström, Daniel Stålenbring, Martin Pettersson, et al.
TV delivered over networks based on the Internet Protocol, i.e. IPTV, is moving towards high definition (HDTV). There has been considerable work on how HDTV is affected by different codecs and bitrates, but the impact of transmission errors over IP networks has been studied less. This study focuses on the H.264-encoded 1280x720 progressive HDTV format and compares three concealment methods at different packet loss rates: one included in a proprietary decoder, one that is part of FFMPEG, and freezing of different durations. The aim is to simulate what typical IPTV set-top boxes do when encountering packet loss. Another aim is to study whether presenting the video upscaled to the full HDTV screen, or pixel-mapped in a smaller area at the center of the screen, has an effect on quality. The results show differences between the packet loss concealment methods in FFMPEG and in the proprietary decoder. Freezing had an effect similar to that reported previously. At low transmission error rates the coding impairments affect quality, but at higher error rates they do not, since they are overshadowed by the transmission errors. An interesting effect was discovered in which the higher-bitrate videos go from having higher quality at low packet loss rates to having lower quality than the lower-bitrate videos at higher packet loss rates. The presentation mode, i.e. upscaled or not upscaled, was significant at the 95% level, but only marginally so.
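To make the "freezing" concealment concrete, the toy simulation below repeats the last correctly received frame for a fixed number of frames whenever a loss occurs. The Bernoulli frame-loss model and fixed freeze length are simplifying assumptions for illustration; the study itself used packet-level losses in H.264 streams and set-top-box-style decoding.

```python
# Toy sketch of freeze-based concealment under a Bernoulli frame-loss model.
import random

def simulate_freezing(frames, loss_rate=0.01, freeze_len=10, seed=0):
    """frames: list of decoded frames. Returns the displayed sequence, where a
    loss causes the last good frame to be repeated for freeze_len frames."""
    rng = random.Random(seed)
    shown, freeze_left, last_good = [], 0, frames[0]
    for f in frames:
        if freeze_left > 0:
            shown.append(last_good)       # keep displaying the frozen frame
            freeze_left -= 1
            continue
        if rng.random() < loss_rate:      # a loss hits this frame
            freeze_left = freeze_len - 1
            shown.append(last_good)
        else:
            last_good = f
            shown.append(f)
    return shown
```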
Divisive normalization in channelized Hotelling observer
Channelized Hotelling observers have been used to evaluate signal detection in medical images. However, these model observers tend to overestimate detection performance compared with human observers. Here, we present a modified channelized Hotelling observer with a divisive normalization mechanism. In this model, images first pass through a set of Gabor filters of different orientations, phases, and spatial frequencies. The filter outputs then pass through a nonlinearity and are pooled into several channels. Finally, the channel responses are normalized by a divisive operation. Human performance in nodule detection in chest radiographs is measured with two-alternative forced-choice (2AFC) experiments, and the same detection tasks are performed by the model observers. The results show that the modified channelized Hotelling observer predicts human performance well.
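The sketch below shows the overall structure of such a model observer under simplifying assumptions of our own: frequency-domain Gabor-like channels, a pointwise nonlinearity, divisive normalization across channels, and a Hotelling template computed from channel responses. Filter parameters, the nonlinearity exponent, and the normalization constant are illustrative, not the paper's values.

```python
# Sketch of a channelized observer with divisive normalization (illustrative).
import numpy as np

def gabor_bank(shape, freqs=(0.05, 0.1, 0.2),
               thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """Frequency-domain Gabor-like channels (Gaussian bumps on oriented frequencies)."""
    H, W = shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    bank = []
    for f in freqs:
        for t in thetas:
            cy, cx = f * np.sin(t), f * np.cos(t)
            bank.append(np.exp(-((fx - cx)**2 + (fy - cy)**2) / (2 * (f / 2)**2)))
    return bank

def channel_responses(img, bank, sigma=1.0, exponent=2.0):
    """Filter, apply a pointwise nonlinearity, then divisively normalize."""
    F = np.fft.fft2(img)
    raw = np.array([np.abs(np.fft.ifft2(F * g)).sum() for g in bank])
    raw = raw ** exponent                           # pointwise nonlinearity
    return raw / (sigma ** exponent + raw.sum())    # divisive normalization

def hotelling_template(resp_signal, resp_noise):
    """resp_*: (n_images, n_channels) normalized channel responses.
    Apply the returned template as t @ response to get a decision variable."""
    S = 0.5 * (np.cov(resp_signal, rowvar=False) + np.cov(resp_noise, rowvar=False))
    dmean = resp_signal.mean(0) - resp_noise.mean(0)
    return np.linalg.solve(S, dmean)
```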
Perceptual quality assessment of color images using adaptive signal representation
Perceptual image distortion measures can play a fundamental role in evaluating and optimizing imaging systems and image processing algorithms. Many existing measures are formulated to represent "just noticeable differences" (JNDs), as measured in psychophysical experiments on human subjects. But some image distortions, such as those arising from small changes in the intensity of the ambient illumination, are far more tolerable to human observers than those that disrupt the spatial structure of intensities and colors. Here, we introduce a framework in which we quantify these perceptual distortions in terms of "just intolerable differences" (JIDs). As in (Wang & Simoncelli, Proc. ICIP 2005), we first construct a set of spatio-chromatic basis functions to approximate (as a first-order Taylor series) a set of "non-structural" distortions that result from changes in lighting/imaging/viewing conditions. These basis functions are defined on local image patches, and are adaptive, in that they are computed as functions of the undistorted reference image. This set is then augmented with a complete basis arising from a linear approximation of the CIELAB color space. Each basis function is weighted by a scale factor to convert it into units corresponding to JIDs. Each patch of the error image is represented using this weighted overcomplete basis, and the overall distortion metric is computed by summing the squared coefficients over all such (overlapping) patches. We implement an example of this metric, incorporating invariance to small changes in the viewing and lighting conditions, and demonstrate that the resulting distortion values are more consistent with human perception than those produced by CIELAB or S-CIELAB.
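A minimal sketch of the final pooling step is given below, assuming the per-patch adaptive basis and the JID scale factors are already available: each error patch is represented in the weighted basis by least squares, and the squared coefficients are summed over overlapping patches. The basis construction, the weighting convention, and the patch geometry here are placeholders, not the authors' calibration.

```python
# Sketch of JID-weighted overcomplete pooling; placeholder basis and weights.
import numpy as np

def patch_distortion(error_patch, basis, weights):
    """error_patch: flattened patch of (distorted - reference).
    basis: (n_pixels, n_functions) columns; weights: JID scale per function."""
    B = basis / weights[None, :]     # weighting converts coefficients to JID units
    coeffs, *_ = np.linalg.lstsq(B, error_patch, rcond=None)  # min-norm solution
    return float((coeffs ** 2).sum())

def total_distortion(error_img, basis_fn, weights, patch=8, step=4):
    """Sum squared coefficients over overlapping patches; basis_fn(y, x)
    returns the (adaptive) basis computed from the reference patch at (y, x)."""
    H, W = error_img.shape
    total = 0.0
    for y in range(0, H - patch + 1, step):
        for x in range(0, W - patch + 1, step):
            p = error_img[y:y+patch, x:x+patch].reshape(-1)
            total += patch_distortion(p, basis_fn(y, x), weights)
    return total
```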
Study of recognizing multiple persons' complicated hand gestures from the video sequence acquired by a moving camera
Luo Dan, Jun Ohya
Recognizing hand gestures from video sequences acquired by a moving camera could provide a useful interface between humans and mobile robots. We develop a state-based approach to extract and recognize hand gestures from moving-camera images. We improve the Human-Following Local Coordinate (HFLC) system, a simple and stable method for extracting hand motion trajectories, which is computed from the located human face, body parts, and changes in the hand blob. A Condensation algorithm and a PCA-based algorithm are used to recognize the extracted hand trajectories. In previous work, the Condensation-based method was applied only to one person's hand gestures. In this paper, we propose a principal component analysis (PCA) based approach to improve the recognition accuracy. For further improvement, temporal changes in the observed hand area are used as new image features, analyzed by PCA, and stored in the database. Each hand gesture trajectory in the database is classified into one-hand gesture categories, two-hand gesture categories, or categories of temporal changes in the hand blob. We demonstrate the effectiveness of the proposed method through experiments on 45 kinds of Japanese and American Sign Language gestures obtained from five people. Our experimental results show that the PCA-based approach performs better than the Condensation-based method.
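As a rough sketch of PCA-based trajectory recognition (not the authors' implementation), the code below projects resampled hand trajectories into a PCA subspace learned from a gesture database and classifies a query by nearest neighbor. The trajectory resampling, the number of components, and the nearest-neighbor rule are assumptions made for illustration.

```python
# Illustrative PCA-subspace trajectory classifier.
import numpy as np

def fit_pca(X, n_components=10):
    """X: (n_samples, n_features) resampled trajectories (x1, y1, x2, y2, ...)."""
    mean = X.mean(0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(x, mean, components):
    """Project a trajectory vector into the learned subspace."""
    return (x - mean) @ components.T

def classify(trajectory, mean, components, db_proj, db_labels):
    """Nearest neighbor in the PCA subspace over the stored gesture database."""
    q = project(trajectory, mean, components)
    dists = np.linalg.norm(db_proj - q, axis=1)
    return db_labels[int(np.argmin(dists))]
```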