- Front Matter: Volume 8291
- Keynote Session
- Computational Photography
- Material Perception
- Perceptual Image Quality
- Multisensory Integration and Brain Plasticity
- Stereoscopic 3D Image Quality: Quantifying Perception and Comfort: Joint Session with Conference 8288
- Medical Image Quality: Features, Tasks and Semantics
- Visual Attention: Task and Image Quality: Joint Session with Conference 8293
- Art Theory, Perception, and Rendering
- Computer Vision and Image Analysis of Art
- Interactive Paper Session: Art and Perception
- Interactive Paper Session: Perception and Image Quality
Front Matter: Volume 8291
This PDF file contains the front matter associated with SPIE Proceedings Volume 8291, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Keynote Session
The general solution to HDR rendering
Our High-Dynamic-Range (HDR) world is the result of nonuniform illumination. We like to believe that 21st century
technology makes it possible to accurately reproduce any scene. On further study, we find that scene rendition remains a
best compromise. Despite all the remarkable accomplishments in digital imaging, we cannot capture and reproduce the
light in the world exactly. With still further study, we find that accurate reproduction is not necessary. We need an
interdisciplinary study of image making - painting, photography and image processing - to find the general solution.
HDR imaging would be very confusing without two observations that resolve many paradoxes. First, optical veiling glare, which depends on scene content, severely limits the range of light on camera sensors and on retinas. Second,
the neural spatial image processing in human vision counteracts glare with variable scene dependent responses. The
counteractions of these optical and neural processes shape the goals of HDR imaging. Successful HDR images increase the apparent contrast of details lost in the shadows and highlights of conventional images. They change the spatial
relationships by altering the local contrast of edges and gradients. The goal of HDR imaging is displaying calculated
appearance, rather than accurate light reproduction. By using this strategy we can develop universal algorithms that
process all images, LDR and HDR, achromatic and color, by mimicking human vision. The study of the general solution
for HDR imaging incorporates painting, photography, vision research, color constancy, and digital image processing.
Computational Photography
Application of non-linear transform coding to image processing
Sparse coding learns its basis non-linearly, but the basis elements are still linearly combined to form an image.
Is this linear combination of basis elements a good model for natural images? We here use a non-linear synthesis
rule, such that at each location in the image the point-wise maximum over all basis elements is used to synthesize
the image. We present algorithms for image approximation and basis learning using this synthesis rule. With
these algorithms we explore the pixel-wise maximum over the basis elements as an alternative image model
and thus contribute to the problem of finding a proper representation of natural images.
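The difference between the two synthesis rules can be made concrete. The sketch below is an illustration of the idea, not the authors' implementation; the toy basis and coefficients are invented for the example:

```python
import numpy as np

def linear_synthesis(coeffs, basis):
    """Standard sparse-coding synthesis: weighted sum of basis elements."""
    # coeffs: (K,), basis: (K, H, W) -> image (H, W)
    return np.tensordot(coeffs, basis, axes=1)

def max_synthesis(coeffs, basis):
    """Non-linear synthesis: at each pixel, take the point-wise maximum
    over all weighted basis elements instead of their sum."""
    return np.max(coeffs[:, None, None] * basis, axis=0)

# Toy example: two overlapping rectangular basis elements on a 4x4 grid.
basis = np.zeros((2, 4, 4))
basis[0, :, :2] = 1.0   # left half (columns 0-1)
basis[1, :, 1:] = 1.0   # right part (columns 1-3), overlapping column 1
coeffs = np.array([0.5, 1.0])

lin = linear_synthesis(coeffs, basis)
mx = max_synthesis(coeffs, basis)
# Where both elements are active (column 1), the linear model sums the
# contributions (0.5 + 1.0), while the max model keeps only the larger one.
```

In the overlap region the two models genuinely disagree, which is what makes the pixel-wise maximum a distinct image model rather than a reparameterization of the linear one.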
How to make a small phone camera shoot like a big DSLR: creating and fusing multi-modal exposure series
Thomas Binder,
Florian Kriener,
Christian Wichner,
et al.
The present work aims at improving the image quality of low-cost cameras based on multiple exposures, machine
learning, and a perceptual quality measure. The particular implementation consists of two cameras, one being
a high-quality DSLR, the other part of a cell phone. The cameras are connected via USB. Since the system is
designed to take many exposures of the same scene, a stable mechanical coupling of the cameras and the use of
a tripod are required. Details on the following issues are presented: design aspects of the mechanical coupling of
the cameras, camera control via FCam and the Picture Transfer Protocol (PTP), further aspects of the design of
the control software, and post processing of the exposures from both cameras. The cell phone images are taken
with different exposure times and different focus settings and are simultaneously fused. By using the DSLR
image as a reference, the parameters of the fusion scheme are learned from examples and can be used to optimize
the design of the cell phone. First results show that the depth of field can be extended, the dynamic range can
be improved and the noise can be reduced.
Single lens 3D-camera with extended depth-of-field
Christian Perwaß,
Lennart Wietzke
Placing a micro lens array in front of an image sensor transforms a normal camera into a single lens 3D camera,
which also allows the user to change the focus and the point of view after a picture has been taken. While
the concept of such plenoptic cameras has been known since 1908, only recently have the increased computing power of low-cost hardware and advances in micro lens array production made the application of plenoptic cameras feasible. This text presents a detailed analysis of plenoptic cameras and introduces a new type
of plenoptic camera with an extended depth of field and a maximal effective resolution of up to a quarter of the
sensor resolution.
3D holoscopic video imaging system
For many years, integral imaging has been discussed as a technique to overcome the limitations of standard still
photography imaging systems where a three-dimensional scene is irrevocably projected onto two dimensions. With the
success of 3D stereoscopic movies, a huge interest in capturing three-dimensional motion picture scenes has been
generated. In this paper, we present a test bench integral imaging camera system aiming to tailor the methods of light
field imaging towards capturing integral 3D motion picture content. We estimate the hardware requirements needed to
generate high quality 3D holoscopic images and show a prototype camera setup that allows us to study these
requirements using existing technology. The necessary steps that are involved in the calibration of the system as well as
the technique of generating human readable holoscopic images from the recorded data are discussed.
Material Perception
Predictive rendering for accurate material perception: modeling and rendering fabrics
In computer graphics, rendering algorithms are used to simulate the appearance of objects and materials in a wide
range of applications. Designers and manufacturers rely entirely on these rendered images to previsualize scenes and
products before manufacturing them. They need to differentiate between different types of fabrics, paint finishes,
plastics, and metals, often with subtle differences, for example, between silk and nylon, or Formica and wood. Thus,
these applications need predictive algorithms that can produce high-fidelity images that enable such subtle material
discrimination.
From color to appearance in the real world
Marc S. Ellens,
Francis Lamy
X-Rite's declared ambition is to create a digital ecosystem for appearance; a daunting challenge that has many
dimensions and has proven so massive that all previous attempts have failed so far. After having invested three
years in exploring the problem, we can now deliver the first elements of answers and the practical path to tackle this
massive undertaking.
We will explore the practical implications of interposing two stages between color and full appearance, extended color and augmented color, and how these steps, rooted in the realities of the ecosystem they serve, constitute vectors and enablers of a more effective transition.
We will survey the road map implications for the design of measurement and capture instrumentation, packaging, digital formats, and delivery infrastructure, as well as rendering and display devices that will enable true value creation built on appearance attributes.
Mixing material modes
We present a novel setup in which real objects made of different materials can be mixed optically. We chose mutually very different materials, which we assume to represent canonical modes. The appearance of 3D objects consisting of any material can be described as a linear superposition of 3D objects of different canonical materials, as in
"painterly mixes". In this paper we studied mixtures of matte, glossy and velvety objects, representing diffuse, forward
and asperity scattering modes.
Observers rated optical mixtures on four scales: matte-glossy, hard-soft, cold-warm, light-heavy. The ratings were done
for the three combinations of glossy, matte, and velvety green birds. For each combination we tested 7 weightings.
Matte-glossy ratings varied most over the stimuli and showed highest (most glossy) scores for the rather glossy bird and
lowest (most matte) for the rather velvety bird. Hard-soft and cold-warm were rated highest (most soft and warm) for
rather velvety and lowest (most hard and cold) for rather glossy birds. Light-heavy was rated only somewhat higher
(heavier) for rather glossy birds. The ratings varied systematically with the weights of the contributions, corresponding to
gradually changing mixtures of material modes. We discuss a range of possibilities for our novel setup.
Tangible display systems: bringing virtual surfaces into the real world
We are developing tangible display systems that enable natural interaction with virtual surfaces. Tangible display systems are based on modern mobile devices that incorporate electronic image displays, graphics hardware, tracking systems, and digital cameras. Custom software allows the orientation of a device and the position of the observer to be tracked in real-time. Using this information, realistic images of surfaces with complex textures and material properties illuminated by environment-mapped lighting can be rendered to the screen at interactive rates. Tilting or moving in front of the device produces realistic changes in surface lighting and material appearance. In this way, tangible displays allow virtual surfaces to be observed and manipulated as naturally as real ones, with the added benefit that surface geometry and material properties can be modified in real-time. We demonstrate the utility of tangible display systems in four application areas: material appearance research; computer-aided appearance design; enhanced access to digital library and museum collections; and new tools for digital artists.
Perceptual Image Quality
Quality estimation for images and video with different spatial resolutions
A. Murat Demirtas,
Hamid Jafarkhani,
Amy R. Reibman
Full-reference (FR) quality estimators (QEs) for images and video are typically designed assuming that the
displayed, degraded image has the same spatial resolution as the original, reference image. No-reference (NR)
QEs use no knowledge about the reference image to assess quality of the displayed image. However, in many
practical systems, a reference image may be available that has a different spatial resolution than the displayed
image.
In this paper, we explore objective quality estimation when the displayed image to be evaluated has a different
spatial resolution than the reference image. We begin by identifying a range of potential weaknesses that might
be present in a QE designed for this situation. Then, we create pairs of images with potential False Ties, in which
a QE estimates that two images have equal quality while viewers disagree. Our subjective tests demonstrate that existing QEs do not accurately assess quality in a variety of scenarios for which images have different spatial resolutions.
Automatic parameter prediction for image denoising algorithms using perceptual quality features
A natural scene statistics (NSS) based blind image denoising approach is proposed, where denoising is performed
without knowledge of the noise variance present in the image. We show how such a parameter estimation can
be used to perform blind denoising by combining blind parameter estimation with a state-of-the-art denoising
algorithm [1]. Our experiments show that for all noise variances simulated on varied image content, our approach
is almost always statistically superior to the reference BM3D implementation in terms of perceived visual quality
at the 95% confidence level.
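The paper's NSS-based parameter estimator is not given in the abstract; as an illustration of what blind noise-level estimation involves, the sketch below uses a standard stand-in, the median-absolute-deviation rule applied to a Haar-like diagonal detail band, to recover the noise sigma that a denoiser such as BM3D would normally be handed:

```python
import numpy as np

def estimate_noise_sigma(img):
    """Robust estimate of additive Gaussian noise standard deviation:
    median absolute deviation of a one-level Haar diagonal detail band,
    rescaled by the Gaussian consistency constant 0.6745."""
    img = np.asarray(img, dtype=float)
    # Diagonal detail coefficients over non-overlapping 2x2 blocks;
    # with the 1/2 normalization their variance equals the noise variance.
    d = (img[0::2, 0::2] - img[0::2, 1::2]
         - img[1::2, 0::2] + img[1::2, 1::2]) / 2.0
    return np.median(np.abs(d)) / 0.6745

# Synthetic check: pure Gaussian noise with a known sigma of 10.
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 10.0, (256, 256))
sigma_hat = estimate_noise_sigma(noisy)   # close to the true sigma
```

On natural images the smooth content leaks only weakly into the diagonal detail band, which is why such estimators remain serviceable without a reference image.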
Viewer preferences for classes of noise removal algorithms for high definition content
Perceived video quality studies were performed on a number of key classes of noise removal algorithms to determine
viewer preference. The noise removal algorithm classes represent increase in complexity from linear filter to nonlinear
filter to adaptive filter to spatio-temporal filter. The subjective results quantify the perceived quality improvements that
can be obtained with increasing complexity. The specific algorithm classes tested include: linear spatial one-channel filter, nonlinear spatial two-channel filter, adaptive nonlinear spatial filter, and multi-frame spatio-temporal adaptive filter.
All algorithms were applied on full HD (1080P) content. Our subjective results show that spatio-temporal (multi-frame)
noise removal algorithm performs best amongst the various algorithm classes. The spatio-temporal algorithm
improvement compared to original video sequences is statistically significant. On average, noise-removed video sequences are preferred over original (noisy) video sequences. The adaptive bilateral and non-adaptive bilateral two-channel noise removal algorithms perform similarly on average, thus suggesting that a non-adaptive, parameter-tuned algorithm may be adequate.
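As a concrete instance of the nonlinear filter classes compared above, a naive (non-adaptive) bilateral filter can be sketched as follows; this is an illustration of the filter family, not the tested implementations, and the parameter values are invented:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Non-adaptive bilateral filter: each output pixel is a weighted
    mean of its neighbours, with weights falling off with both spatial
    distance (sigma_s) and intensity difference (sigma_r)."""
    img = np.asarray(img, dtype=float)
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2.0 * sigma_s**2))
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range weight: penalize neighbours with different intensity.
            w = spatial * np.exp(-(patch - img[i, j])**2
                                 / (2.0 * sigma_r**2))
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out

# A noisy step edge: the filter smooths the flat regions but, unlike a
# linear Gaussian blur, largely preserves the edge itself.
rng = np.random.default_rng(2)
step = np.zeros((8, 16))
step[:, 8:] = 1.0
noisy = step + rng.normal(0.0, 0.05, step.shape)
smoothed = bilateral_filter(noisy)
```

The edge-preserving behaviour comes entirely from the range weight; setting sigma_r very large recovers a plain (linear) Gaussian filter, which is the simplest class in the comparison.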
Image quality assessment in the low quality regime
Guilherme O. Pinto,
Sheila S. Hemami
Traditionally, image quality estimators have been designed and optimized to operate over the entire quality range
of images in a database, from very low quality to visually lossless. However, if quality estimation is limited to a
smaller quality range, their performances drop dramatically, and many image applications only operate over such
a smaller range. This paper is concerned with one such range, the low-quality regime (LQR), which is defined as the
interval of perceived quality scores where there exists a linear relationship between the perceived quality scores
and the perceived utility scores and exists at the low-quality end of image databases. Using this definition, this
paper describes a subjective experiment to determine the low-quality regime for databases of distorted images
that include perceived quality scores but not perceived utility scores, such as CSIQ and LIVE. The performances
of several image utility and quality estimators are evaluated in the low-quality regime, indicating that utility
estimators can be successfully applied to estimate perceived quality in this regime. Omission of the lowest-frequency image content is shown to be crucial to the performances of both kinds of estimators. Additionally,
this paper establishes an upper-bound for the performances of quality estimators in the LQR, using a family
of quality estimators based on VIF. The resulting optimal quality estimator indicates that estimating quality
in the low-quality regime is robust to the exact frequency pooling weights, and that near-optimal performance can be achieved by a variety of estimators provided that they substantially emphasize the appropriate frequency
content.
Multisensory Integration and Brain Plasticity
The question of simultaneity in multisensory integration
Lynnette Leone,
Mark E. McCourt
Early reports of audiovisual (AV) multisensory integration (MI) indicated that unisensory stimuli must evoke
simultaneous physiological responses to produce decreases in reaction time (RT) such that for unisensory stimuli with
unequal RTs the stimulus eliciting the faster RT had to be delayed relative to the stimulus eliciting the slower RT. The
"temporal rule" states that MI depends on the temporal proximity of unisensory stimuli, the neural responses to which
must fall within a window of integration. Ecological validity demands that MI should occur only for simultaneous events
(which may give rise to non-simultaneous neural activations). However, spurious neural response simultaneities which
are unrelated to singular environmental multisensory occurrences must somehow be rejected. Using an RT/race model
paradigm we measured AV MI as a function of stimulus onset asynchrony (SOA: ±200 ms, 50 ms intervals) under fully
dark adapted conditions for visual (V) stimuli that were either weak (scotopic 525 nm flashes; 511 ms mean RT) or
strong (photopic 630 nm flashes; 356 ms mean RT). Auditory (A) stimulus (1000 Hz pure tone) intensity was constant.
Despite the 155 ms slower mean RT to the scotopic versus photopic stimulus, facilitative AV MI in both conditions
nevertheless occurred exclusively at an SOA of 0 ms. Thus, facilitative MI demands both physical and physiological
simultaneity. We consider the mechanisms by which the nervous system may take account of variations in response
latency arising from changes in stimulus intensity in order to selectively integrate only those physiological simultaneities
that arise from physical simultaneities.
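The RT/race-model paradigm mentioned above is conventionally analysed with Miller's race-model inequality; the sketch below shows that standard test on invented reaction-time samples (the study's actual data and analysis code are not reproduced here):

```python
import numpy as np

def ecdf(samples, t):
    """Empirical cumulative distribution of RT samples evaluated at t."""
    samples = np.sort(samples)
    return np.searchsorted(samples, t, side='right') / len(samples)

def race_model_violation(rt_av, rt_a, rt_v, t):
    """Miller's race-model inequality: under a race (no integration),
    P(RT_AV <= t) <= P(RT_A <= t) + P(RT_V <= t).  A positive return
    value indicates facilitation beyond statistical summation, the usual
    operational evidence for multisensory integration."""
    bound = np.minimum(ecdf(rt_a, t) + ecdf(rt_v, t), 1.0)
    return ecdf(rt_av, t) - bound

# Hypothetical RT samples (ms): the bimodal condition is markedly
# faster than either unisensory condition.
rng = np.random.default_rng(1)
rt_a = rng.normal(380, 40, 500)   # auditory alone
rt_v = rng.normal(360, 40, 500)   # visual alone
rt_av = rng.normal(280, 30, 500)  # audiovisual
t = np.linspace(200, 320, 7)
violation = race_model_violation(rt_av, rt_a, rt_v, t)
```

Evaluating the inequality across a grid of t values, as done here, is what allows facilitation to be localized to particular SOAs in designs like the one described.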
The spatiotopic 'visual' cortex of the blind
Visual cortex activity in the blind has been shown in sensory tasks. Can it be activated in memory tasks? If so, are
inherent features of its organization meaningfully employed? Our recent results in short-term blindfolded subjects
imply that human primary visual cortex (V1) may operate as a modality-independent 'sketchpad' for working
memory (Likova, 2010a). Interestingly, the spread of the V1 activation approximately corresponded to the spatial
extent of the images in terms of their angle of projection to the subject. We now raise the questions of whether under
long-term visual deprivation V1 is also employed in non-visual memory tasks, in particular in congenitally blind
individuals, who have never had visual stimulation to guide the development of the visual area organization, and
whether such spatial organization is still valid for the same paradigm that was used in blindfolded individuals. The
outcome has implications for an emerging reconceptualization of the principles of brain architecture and its
reorganization under sensory deprivation. Methods: We used a novel fMRI drawing paradigm in congenitally and
late-onset blind, compared with sighted and blindfolded subjects, in three conditions of 20 s duration, separated by 20 s rest intervals: (i) Tactile Exploration: raised-line images explored and memorized; (ii) Tactile Memory Drawing:
drawing the explored image from memory; (iii) Scribble: mindless drawing movements with no memory component.
Results and Conclusions: V1 was strongly activated for Tactile Memory Drawing and Tactile Exploration in these
totally blind subjects. Remarkably, after training, even in the memory task, the mapping of V1 activation largely
corresponded to the angular projection of the tactile stimuli relative to the ego-center (i.e., the effective visual angle
at the head); beyond this projective boundary, peripheral V1 signals were dramatically reduced or even suppressed.
The matching extent of the activation in the congenitally blind rules out vision-based explanatory mechanisms, and
supports the more radical idea of V1 as a modality-independent 'projection screen' or a 'sketchpad', whose mapping
scales to the projective dimensions of objects explored in the peri-personal space.
Acoustic-tactile rendering of visual information
In previous work, we have proposed a dynamic, interactive system for conveying visual information via hearing
and touch. The system is implemented with a touch screen that allows the user to interrogate a two-dimensional
(2-D) object layout by active finger scanning while listening to spatialized auditory feedback. Sound is used as the
primary source of information for object localization and identification, while touch is used both for pointing and
for kinesthetic feedback. Our previous work considered shape and size perception of simple objects via hearing
and touch. The focus of this paper is on the perception of a 2-D layout of simple objects with identical size and
shape. We consider the selection and rendition of sounds for object identification and localization. We rely on
the head-related transfer function for rendering sound directionality, and consider variations of sound intensity
and tempo as two alternative approaches for rendering proximity. Subjective experiments with visually-blocked
subjects are used to evaluate the effectiveness of the proposed approaches. Our results indicate that intensity
outperforms tempo as a proximity cue, and that the overall system for conveying a 2-D layout is quite promising.
Stereoscopic 3D Image Quality: Quantifying Perception and Comfort: Joint Session with Conference 8288
Apparent stereo: the Cornsweet illusion can enhance perceived depth
Piotr Didyk,
Tobias Ritschel,
Elmar Eisemann,
et al.
It is both a technical and an artistic challenge to depict three-dimensional content using stereo equipment and
a flat two-dimensional screen. On the one hand, the content needs to fit within the limits of a given display
technology and at the same time achieve a comfortable viewing experience. Given the technological advances of 3D equipment, the latter especially is increasing in importance. Modifications to stereo content become
necessary that aim at flattening or even removing binocular disparity to adjust the 3D content to match the
comfort zone in which the clash between accommodation and vergence stays acceptable. However, applying
such modifications can lead to a reduction of crucial depth details. One promising direction is backward-compatible stereo, for which the disparity is low enough that overlaid stereo pairs seem almost identical. It
builds upon the Craik-O'Brien-Cornsweet effect, a visual illusion, which uses so-called Cornsweet profiles to
produce a local contrast that leads to a perceived brightness increase. Similarly, Cornsweet profiles in disparity
can lead to an illusion of depth. Applying them skilfully at depth discontinuities allows for a reduction of the
overall disparity range to ensure a comfortable yet convincing stereo experience. The present work extends
the previous idea by showing that Cornsweet profiles can also be used to enhance the 3D impression. This
operation can help in regions where the disparity range was compressed, but also to emphasize parts of a scene.
A user study measures the performance of backward-compatible stereo and our disparity enhancement.
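The shape of a Cornsweet profile, whether in luminance or in disparity, can be sketched with a simple antisymmetric pair of decaying lobes; this is an illustration of the profile's form, with invented parameters, not the authors' construction:

```python
import numpy as np

def cornsweet_profile(x, amplitude=1.0, tau=0.2):
    """Cornsweet profile: antisymmetric exponentially decaying lobes
    around x = 0.  Added to a disparity map at a depth discontinuity,
    it creates local disparity contrast that reads as a depth step
    while leaving the overall disparity range nearly unchanged."""
    return amplitude * np.sign(x) * np.exp(-np.abs(x) / tau)

x = np.linspace(-1.0, 1.0, 201)   # position across the depth edge
profile = cornsweet_profile(x)
# The profile decays to ~0 away from the edge, so the disparity budget
# is spent only near depth discontinuities.
```

Because the profile carries no net offset far from the edge, it can enhance apparent depth locally without re-expanding a disparity range that was compressed for viewing comfort.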
Perceived depth of multi parallel-overlapping-transparent-stereoscopic-surfaces
The geometric relational expression of horizontal disparity, viewing distance, and depth magnitude between objects in
stereopsis suggests that, for a given viewing distance, the magnitude of perceived depth of objects would be the same as
long as the disparity magnitudes are the same. However, we found that this is not necessarily the case for random dot
stereograms that depict parallel-overlapping-transparent-stereoscopic-surfaces (POTS). Data from three experiments
indicated that, when the stimulus size is relatively large (e.g., 13 x 20 arc deg), the magnitude of reproduced depth
between two POTS is larger by 6% than that for an identical pair of stereo-surfaces but with an additional stereo-surface
located between the pair. The results are discussed in terms of global stereopsis which "operates" for relatively large
stimulus sizes.
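The geometric relation referred to above is, in its usual small-angle form, depth ≈ disparity × D² / IPD; the sketch below evaluates it with an assumed 63 mm interocular distance (illustrative, not the experiments' parameters):

```python
import math

def predicted_depth(disparity_arcmin, viewing_distance_m, ipd_m=0.063):
    """Small-angle geometric prediction of depth between two objects
    from their relative horizontal disparity:
        depth ~= disparity_rad * D**2 / IPD.
    By this relation, equal disparities at a fixed viewing distance
    should yield equal perceived depths -- the assumption the POTS
    experiments show is violated by about 6%."""
    eta = math.radians(disparity_arcmin / 60.0)  # disparity in radians
    return eta * viewing_distance_m**2 / ipd_m

# e.g. 10 arcmin of disparity viewed at 1 m: roughly 4.6 cm of depth.
depth_m = predicted_depth(10.0, 1.0)
```

The point of the experiments is that this purely geometric prediction is modulated by scene structure (the number of overlapping transparent surfaces), not that the formula itself is wrong.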
Diagnosing perceptual distortion present in group stereoscopic viewing
Melissa Burton,
Brice Pollock,
Jonathan W. Kelly,
et al.
Stereoscopic displays are an increasingly prevalent tool for experiencing virtual environments, and the inclusion of
stereo has the potential to improve distance perception within the virtual environment. When multiple users
simultaneously view the same stereoscopic display, only one user experiences the projectively correct view of the virtual
environment, and all other users view the same stereoscopic images while standing at locations displaced from the center
of projection (CoP). This study was designed to evaluate the perceptual distortions caused by displacement from the
CoP when viewing virtual objects in the context of a virtual scene containing stereo depth cues. Judgments of angles
were distorted after leftward and rightward displacement from the CoP. Judgments of object depth were distorted after
forward and backward displacement from the CoP. However, perceptual distortions of angle and depth were smaller
than predicted by a ray-intersection model based on stereo viewing geometry. Furthermore, perceptual distortions were
asymmetric, leading to different patterns of distortion depending on the direction of displacement. This asymmetry also
conflicts with the predictions of the ray-intersection model. The presence of monocular depth cues might account for
departures from model predictions.
3D discomfort from vertical and torsional disparities in natural images
The two major aspects of camera misalignment that cause visual discomfort when viewing images on a 3D display
are vertical and torsional disparities. While vertical disparities are uniform throughout the image, torsional rotations
introduce a range of disparities that depend on the location in the image. The goal of this study was to determine
the discomfort ranges for the kinds of natural image that people are likely to take with 3D cameras rather than the
artificial line and dot stimuli typically used for laboratory studies. We therefore assessed visual discomfort on a
five-point scale from 'none' to 'severe' for artificial misalignment disparities applied to a set of full-resolution
images of indoor scenes.
For viewing times of 2 s, discomfort ratings for vertical disparity in both 2D and 3D images rose rapidly toward
the discomfort level of 4 ('severe') by about 60 arcmin of vertical disparity. Discomfort ratings for torsional
disparity in the same image rose only gradually, reaching only the discomfort level of 3 ('strong') by about 50 deg
of torsional disparity. These data were modeled with a second-order hyperbolic compression function incorporating
a term for the basic discomfort of the 3D display in the absence of any misalignments through a Minkowski norm.
These fits showed that, at a criterion discomfort level of 2 ('moderate'), acceptable levels of vertical disparity were
about 15 arcmin. The corresponding values for the torsional disparity were about 30 deg of relative orientation.
Medical Image Quality: Features, Tasks and Semantics
On the development of expertise in interpreting medical images
Medical images represent a core portion of the information clinicians utilize to render diagnostic and treatment decisions.
Fundamentally, viewing a medical image involves two basic processes - visually inspecting the image (visual
perception) and rendering an interpretation (cognition). The interpretation is often followed by a recommendation. The
likelihood of error in the interpretation of medical images is unfortunately not negligible. Errors occur and patients' lives
are impacted. Thus we need to understand how clinicians interact with the information in an image during the
interpretation process. We also need to understand how clinicians develop expertise throughout their careers and why
some people are better at interpreting medical images than others. If we can better understand how expertise develops,
perhaps we can develop better training programs, incorporate more effective ways of teaching image interpretation into
the medical school and residency curriculums, and create new tools that would enhance and perhaps speed up the
learning process. With improved understanding we can also develop ways to further improve decision-making in general
and at every level of the medical imaging profession, thus improving patient care. The science of medical image
perception is dedicated to understanding and improving the clinical interpretation process.
Modeling observer performance for optimizing medical image acquisition and processing
Medical images involve several steps of acquisition, processing, display, and reading to convey information
from the body of a patient to a clinician who needs to decide on a course of action. The effect of all these steps is a large
number of potentially coupled parameters that may need to be adjusted in order to optimize diagnostic task performance.
Model observers were developed to assist this process by providing an objective computational tool that is predictive of
visual performance. The defining characteristic of model observers is the generation of a decision variable from which
the task is readily performed. Various models for generating the decision variable are reviewed, and various challenges
and opportunities for model observers are described.
Evaluation of HVS models in the application of medical image quality assessment
In this study, four of the most widely used Human Visual System (HVS) models are applied on Magnetic Resonance
(MR) images for signal detection task. Their performances are evaluated against gold standard derived from radiologists'
majority decision. The task-based image quality assessment requires taking into account the human perception
specificities, for which various HVS models have been proposed. However, to our knowledge, no work has been conducted to evaluate and compare the suitability of these models for the assessment of medical image quality.
This pioneering study investigates the performances of different HVS models on medical images in terms of
approximation to radiologist performance. We propose to score the performance of each HVS model using the AUC
(Area Under the receiver operating characteristic Curve) and its variance estimate as the figure of merit. The
radiologists' majority decision is used as gold standard so that the estimated AUC measures the distance between the
HVS model and the radiologist perception. To calculate the variance estimate of AUC, we adopted the one-shot method
that is independent of the HVS model's output range. The results of this study will help justify the application of an HVS model in our future medical image quality assessment metric.
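The AUC figure of merit described above has a standard nonparametric form (the Mann-Whitney statistic); the sketch below computes it on invented model-observer scores. The one-shot variance method mentioned in the abstract is not reproduced here:

```python
import numpy as np

def auc_mann_whitney(neg_scores, pos_scores):
    """Nonparametric AUC: the probability that a randomly chosen
    signal-present score exceeds a randomly chosen signal-absent score,
    with ties counting one half (Mann-Whitney U normalized by the
    number of score pairs)."""
    neg = np.asarray(neg_scores, dtype=float)
    pos = np.asarray(pos_scores, dtype=float)
    diff = pos[:, None] - neg[None, :]   # all pos/neg score pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Hypothetical HVS-model scores on signal-absent vs signal-present MR trials,
# scored against the radiologists' majority decision as ground truth.
neg = [0.10, 0.40, 0.35, 0.80]
pos = [0.90, 0.70, 0.60, 0.55]
auc = auc_mann_whitney(neg, pos)
```

Scoring each HVS model this way, with the radiologists' majority decision defining the positive and negative trial sets, makes the AUC a direct measure of how closely the model tracks radiologist perception.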
Perceptual challenges to computer-aided diagnosis
We review the motivation and development of computer-aided diagnosis (CAD) in diagnostic medical imaging,
particularly in breast cancer screening. After briefly describing typical CAD methods in generic terms, we focus on the
question of whether CAD helps improve diagnostic accuracy. We review both studies that support the notion that CAD
helps improve diagnostic accuracy and studies that do not. We further identify difficulties in conducting this type of evaluation study and suggest areas of perceptual challenge in applications of CAD.
Satisfaction of search experiments in advanced imaging
The objective of our research is to understand the perception of multiple abnormalities in an imaging examination
and to develop strategies for improved diagnosis. We are one of the few laboratories in the world pursuing the goal of
reducing detection errors through a better understanding of the underlying perceptual processes involved. Failure to
detect an abnormality is the most common class of error in diagnostic imaging and generally is considered the most
serious by the medical community. Many of these errors have been attributed to "satisfaction of search," which occurs
when a lesion is not reported because discovery of another abnormality has "satisfied" the goal of the search. We have
gained some understanding of the mechanisms of satisfaction of search (SOS) traditional radiographic modalities.
Currently, there are few interventions to remedy SOS error. For example, patient history that the prompts specific
abnormalities, protects the radiologist from missing them even when other abnormalities are present. The knowledge
gained from this programmatic research will lead to reduction of observer error.
Integrating human- and computer-based approaches to feature extraction and analysis
A major goal of imaging systems is to help doctors, scientists, engineers, and analysts identify patterns and features
in complex data. There is a wide range of imaging, visualization, and graphics systems, ranging from fully
automatic systems that extract features algorithmically to interactive systems that allow the analyst to manipulate
visual representations directly to discover features. Although automatic feature-extraction algorithms are often
directed by human observation, and human pattern recognition is often supported by algorithmic tools, very little
work has been done to explore how to capitalize on the interaction between human and machine pattern
recognition. This paper introduces a preliminary roadmap for guiding research in this space. One key concept is the
explicit consideration of the task, which determines which methods and tools will be most effective. The second is
the explicit inclusion of a "human-in-the-loop," who interacts with the data, the algorithms, and representations, to
identify meaningful features. The third is the inclusion of a process for creating a mathematical representation of
the features that have been "carved out" by the human analyst, for use in comparison, database query or analysis.
Visual Attention: Task and Image Quality: Joint Session with Conference 8293
Examining the effect of task on viewing behavior in videos using saliency maps
Research has shown that when viewing still images, people will look at these images in a different manner if instructed
to evaluate their quality. They will tend to focus less on the main features of the image and, instead, scan the entire image
area looking for clues for its level of quality. It is questionable, however, whether this finding can be extended to videos
considering their dynamic nature. One can argue that when watching a video the viewer will always focus on the
dynamically changing features of the video regardless of the given task. To test whether this is true, an experiment was
conducted where half of the participants viewed videos with the task of quality evaluation while the other half were
simply told to watch the videos as if they were watching a movie on TV or a video downloaded from the internet. The
videos contained content which was degraded with compression artifacts over a wide range of quality. An eye tracking
device was used to record the viewing behavior in both conditions. By comparing the behavior during each task, it was
possible to observe a systematic difference in the viewing behavior which seemed to correlate with the quality of the
videos.
Characterizing eye movements during temporal and global quality assessment of H.264 compressed video sequences
Claire Mantel,
Nathalie Guyader,
Patricia Ladret,
et al.
Studies have shown that the deployment of visual attention is closely linked to the assessment of image or video quality, though this link is not yet fully understood. The influence of rating the temporal quality of compressed videos on the way an observer deploys attention is investigated in this paper.
We set up a subjective experiment in which the eye movements of observers are recorded during three different tasks: a free-viewing task (FT), a global quality assessment task and a temporal quality assessment task. The FT acts as a reference to which we compare the eye movements during the two other tasks.
As previously shown, observers assessing global quality gaze at locations dissimilar to those fixated during the FT. For temporal quality assessment, it seems that the fixated locations are closer to FT than the global quality assessment fixated locations.
Our results suggest that the locations observers look at do not depend on the displayed video quality level. Quality however influences the way participants look at videos: the lower the quality, the longer they gaze at a precise location. The area fixated seems to be much smaller during the quality assessment tasks than during the FT for either perfect or poor quality level.
The evolution over time of all indicators suggests that, during the first 1 or 2 seconds, the signal properties of the videos are the main attractors for the participants' eye movements. Instructions only seem to play a role afterwards on the deployment of the participants' visual attention.
A compressed sensing model of crowding in peripheral vision
Here we model peripheral vision in a compressed sensing framework as a strategy of optimally guessing what
stimulus corresponds to a sparsely encoded peripheral representation, and find that typical letter-crowding effects
naturally arise from this strategy. The model is simple as it consists of only two convergence stages. We apply
the model to the problem of crowding effects in reading. First, we show a few instructive examples of letter
images that were reconstructed from encodings with different convergence rates. Then, we present an initial
analysis of how the choice of model parameters affects the distortion of isolated and flanked letters.
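The core idea of "optimally guessing" the stimulus behind a sparse peripheral code can be illustrated with a generic greedy sparse-recovery routine (orthogonal matching pursuit); this is an illustrative stand-in, not the authors' two-stage model:

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse recovery: select k dictionary atoms (columns of D)
    whose linear combination best approximates the observation y."""
    residual = y.astype(float).copy()
    idx = []
    coef = np.zeros(0)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        # re-fit the coefficients over all atoms selected so far
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x
```

In the crowding setting, an ambiguous sparse code admits several plausible reconstructions, and the guess can blend features of a letter and its flankers.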
Foveated self-similarity in nonlocal image filtering
Nonlocal image filters suppress noise and other distortions by searching for similar patches at different locations
within the image, thus exploiting the self-similarity present in natural images. This similarity is typically assessed
by a windowed distance between the patches' pixels. Inspired by the human visual system, we introduce a patch foveation
operator and measure patch similarity through a foveated distance, where each patch is blurred with spatially
variant point-spread functions having standard deviation increasing with the spatial distance from the patch
center. In this way, we install a different form of self-similarity in images: the foveated self-similarity.
We consider the Nonlocal Means algorithm (NL-means) for the removal of additive white Gaussian noise as
a simple prototype of nonlocal image filtering and derive an explicit construction of its corresponding foveation
operator, thus yielding the Foveated NL-means algorithm.
Our analysis and experimental study show that, to the purpose of image denoising, the foveated self-similarity
can be a far more effective regularity assumption than the conventional windowed self-similarity. In the comparison
with NL-means, the proposed foveated algorithm achieves a substantial improvement in denoising quality,
according to both objective criteria and visual appearance, particularly due to better contrast and sharpness.
Moreover, foveation is introduced at a negligible cost in terms of computational complexity.
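The explicit construction of the foveation operator is derived in the paper; as a rough illustration of the principle (blur standard deviation growing with distance from the patch centre), one might write, with made-up parameter values:

```python
import numpy as np

def foveation_sigmas(patch_size, sigma0=0.3, gain=0.5):
    # blur std. dev. increases with distance from the patch centre
    c = (patch_size - 1) / 2.0
    yy, xx = np.mgrid[0:patch_size, 0:patch_size]
    return sigma0 + gain * np.hypot(yy - c, xx - c)

def foveate(patch, sigmas):
    # apply a spatially variant (normalized) Gaussian PSF to every pixel
    n = patch.shape[0]
    yy, xx = np.mgrid[0:n, 0:n]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            w = np.exp(-((yy - i) ** 2 + (xx - j) ** 2) / (2 * sigmas[i, j] ** 2))
            out[i, j] = np.sum(w * patch) / np.sum(w)
    return out

def foveated_distance(p1, p2, sigmas):
    # squared Euclidean distance between the two foveated patches
    return float(np.sum((foveate(p1, sigmas) - foveate(p2, sigmas)) ** 2))
```

Replacing the windowed distance in NL-means with such a foveated distance yields the Foveated NL-means variant described above.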
A statistical study of the correlation between interest points and gaze points
This study intends to measure the degree of correlation/similarity between the subjective gaze points (obtained
by eye tracking experiments) and the objective interest points of several well-known detectors such as Harris and
SURF. For each of the latter, we look for the best setting in terms of maximizing likeness with the gaze points.
For this task, the Earth Mover's Distance (EMD) is used to compare two data-sets with different cardinalities.
We also used ANOVA to measure the influence of each parameter involved in the detectors' settings as well
as the possible introduced bias. The conclusions of this study are related to the suitability of each detector to
estimate the subjective gaze points.
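For intuition, the Earth Mover's Distance between two sample sets of different cardinalities reduces, in one dimension, to the area between their empirical CDFs; a minimal numpy sketch (the study itself compares 2-D point sets, where EMD requires a transport solver):

```python
import numpy as np

def emd_1d(x, y):
    """EMD (Wasserstein-1) between two 1-D empirical distributions,
    possibly with different numbers of samples."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    # integrate |CDF_x - CDF_y| over the merged sample grid
    return float(np.sum(np.abs(cdf_x - cdf_y)[:-1] * np.diff(grid)))
```

Because it operates on whole distributions rather than matched pairs, EMD sidesteps the cardinality mismatch between detector outputs and recorded gaze points.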
Interest point analysis as a model for the Poggendorff illusion
This paper describes a recognition mechanism based on the relationships between interest points and their properties that is applied to the problem of modelling the Poggendorff illusion. The recognition mechanism is shown to perform in the same manner as human
vision on the standard illusion and reduced effects are modelled on a variant without parallels. The model shows that the effect can
be explained as a perceptual compromise between alignment of the elements in the oblique axis and their displacement from each
other in the vertical. In addition, an explanation is offered of how obtuse-angled variants of the Poggendorff figures yield stronger effects than the acute-angled variants.
Art Theory, Perception, and Rendering
The perception of art and the science of perception
Robert Pepperell
For many centuries, artists have studied the nature of visual experience and how to convincingly render what we see. The
results of these investigations can be found in all the countless artworks deposited in museums and galleries around the
world. Works of art represent a rich source of ideas and understanding about how the world appears to us, and only
relatively recently have those interested in the science of vision started to appreciate the many discoveries made by
artists in this field. In this paper I will discuss some key insights into vision and perception revealed by artists, and show
how they can help current thinking in science and technology about how best to understand the process of seeing. In
particular, I will suggest some artistic ideas continue to present fundamental challenges to conventional ideas about the
nature of visual experience and how it is represented.
Paintings, photographs, and computer graphics are calculated appearances
Painters reproduce the appearances they see, or visualize. The entire human visual system is the first part of that process,
providing extensive spatial processing. Painters have used spatial techniques since the Renaissance to render HDR
scenes. Silver halide photography responds to the light falling on single film pixels. Film can only mimic the retinal
response of the cones at the start of the visual process. Film cannot mimic the spatial processing in humans. Digital
image processing can. This talk studies three dramatic visual illusions and uses the spatial mechanisms found in human
vision to interpret their appearances.
Image integrity and aesthetics: towards a more encompassing definition of visual quality
Visual quality is a multifaceted quantity that depends on multiple attributes of the image/video. According to Keelan's
definition, artifactual attributes concern features of the image that when visible, are annoying and compromise the
integrity of the image. Aesthetic attributes instead depend on the observer's personal taste. Both types of attributes have
been studied in the literature in relation to visual quality, but never in conjunction with each other. In this paper we
perform a psychometric experiment to investigate how artifactual and aesthetic attributes interact, and how they affect
the viewing behavior. In particular, we studied to what extent the appearance of artifacts impacts the aesthetic quality of
images. Our results indicate that indeed image integrity somehow influences the aesthetic quality scores. By means of an
eye-tracker, we also recorded and analyzed the viewing behavior of our participants while scoring aesthetic quality.
Results reveal that, when scoring aesthetic quality, viewing behavior significantly departs from the natural free looking,
as well as from the viewing behavior observed for integrity scoring.
Depicting 3D shape using lines
Doug DeCarlo
Over the last few years, researchers in computer graphics have developed sophisticated mathematical descriptions of lines
on 3D shapes that can be rendered convincingly as strokes in drawings. These innovations highlight fundamental questions
about how human perception takes strokes in drawings as evidence of 3D structure. Answering these questions will lead to
a greater scientific understanding of the flexibility and richness of human perception, as well as to practical techniques
for synthesizing clearer and more compelling drawings. This paper reviews what is known about the mathematics
and perception of computer-generated line drawings of shape and motivates an ongoing program of research to better
characterize the shapes people see when they look at such drawings.
Box spaces in pictorial space: linear perspective versus templates
In the past decades perceptual (or perceived) image quality has been one of the most important criteria for evaluating
digitally processed image and video content. With the growing popularity of new media like stereoscopic displays there
is a tendency to replace image quality with viewing experience as the ultimate criterion. Adopting such a high-level
psychological criterion calls for a rethinking of the premises underlying human judgment. One premise is that perception
is about accurately reconstructing the physical world in front of you ("inverse optics"). That is, human vision is striving
for veridicality. The present study investigated one of its consequences, namely, that linear perspective will always yield
the correct description of the perceived 3D geometry in 2D images. To this end, human observers adjusted the frontal
view of a wireframe box on a television screen so as to look equally deep and wide (i.e. to look like a cube) or twice as
deep as wide. In a number of stimulus configurations, the results showed huge deviations from veridicality suggesting
that the inverse optics model fails. Instead, the results seem to be more in line with a model of "vision as optical interface".
Sound meets image: freedom of expression in texture description
The use of sound was explored as means for expressing perceptual attributes of visual textures. Two sets of 17 visual
textures were prepared: one set taken from the CUReT database, and one set synthesized to replicate the former set.
Participants were instructed to match a sound texture with a visual texture displayed onscreen. A modified version of a
Product Sound Sketching Tool was provided, in which an interactive physical interface was coupled to a frequency
modulation synthesizer. Rather than selecting from a pre-defined set of sound samples, continuous exploration of the
auditory space allowed for an increased freedom of expression. While doing so, participants were asked to describe what
auditory and visual qualities they were paying attention to. It was found that participants were able to create sounds that
matched visual textures. Based on differences in diversity of descriptions, synthetic textures were found to have less
salient perceptual attributes than their original counterparts. Finally, three interesting sound synthesis clusters were
found, corresponding with mutually exclusive description vocabularies.
Computer Vision and Image Analysis of Art
Dynamics of aesthetic appreciation
Aesthetic appreciation is a complex cognitive process with inherent aspects of cold as well as hot cognition. Empirical
research from the last decades has shown that evaluations of aesthetic appreciation are highly reliable. Most
frequently, facial attractiveness was used as the corner case for investigating aesthetic appreciation. Evaluating facial
attractiveness shows indeed high internal consistencies and impressively high inter-rater reliabilities, even across
cultures. Although this indicates general and stable mechanisms underlying aesthetic appreciation, it is also obvious that
our taste for specific objects changes dynamically. Aesthetic appreciation of artificial object categories, such as fashion,
design or art is inherently very dynamic. Gaining insights into the cognitive mechanisms that trigger and enable
corresponding changes of aesthetic appreciation is of particular interest for research as this will provide possibilities to
modeling aesthetic appreciation for longer durations and from a dynamic perspective. The present paper refers to a recent
two-step model ("the dynamical two-step-model of aesthetic appreciation"), dynamically adapting itself, which accounts
for typical dynamics of aesthetic appreciation found in different research areas such as art history, philosophy and
psychology. The first step assumes singular creative sources creating and establishing innovative material towards
which, in a second step, people adapt by integrating it into their visual habits. This inherently leads to dynamic changes
of the beholders' aesthetic appreciation.
Museum as an integrated imaging device: visualization of ancient Kyoto cityscape from folding screen artifact
Kimiyoshi Miyata,
Umi Oyabu,
Michihiro Kojima
Museums hold cultural resources such as artworks, historical artifacts, and folklore materials. The National Museum of
Japanese History holds over 200,000 such cultural resources. One role of a museum is to exhibit its cultural resources;
a museum could therefore be regarded as a visualization device for the information society. In this research, the visualization
of a historical image from cultural resources with an interactive user interface is described. The material focused on is the
oldest extant version of a genre of folding screen paintings that depict the thriving city of Kyoto in the four seasons,
Rekihaku's "Scenes In and Around Kyoto," designated a nationally important cultural property in Japan. Over 1,400
people and many residences, temples, and houses are drawn, and these are also an information resource telling us about city
scenes and people's lives in Kyoto at that time. Historical research was conducted using a high-resolution digital image
obtained with a large-scale scanner, and the scanned images are used by computer programs to visualize a historical image of
ancient Kyoto. Combinations of real materials and information provided by the computer programs are also
described in this research.
In search of Leonardo: computer-based facial image analysis of Renaissance artworks for identifying Leonardo as subject
One of the enduring mysteries in the history of the Renaissance is the adult appearance of the archetypical "Renaissance Man," Leonardo da Vinci. His only acknowledged self-portrait is from an advanced age, and various candidate images of younger men are difficult to assess given the absence of documentary evidence. One clue about Leonardo's appearance comes from the remark of the contemporary historian, Vasari, that the sculpture of David by Leonardo's master, Andrea del Verrocchio, was based on the appearance of Leonardo
when he was an apprentice. Taking a cue from this statement, we suggest that the more mature sculpture of St. Thomas, also by Verrocchio, might also have been a portrait of Leonardo. We tested the possibility Leonardo was the subject for Verrocchio's sculpture by a novel computational technique for the comparison of
three-dimensional facial configurations. Based on quantitative measures of similarity, we also assess whether
another pair of candidate two-dimensional images is plausibly attributable as portraits of Leonardo as
a young adult. Our results are consistent with the claim that Leonardo is indeed the subject of these works, but
we need comparisons with a larger corpus of candidate artworks before our results achieve statistical significance.
Non-destructive analytical imaging of metallic surfaces using spectral measurements and ultrahigh-resolution scanning for cultural heritage investigation
This paper presents a new approach for analyzing the spectroscopic characteristics of metallic surfaces using spectroscopic
and image analysis. The method is useful for contactless and non-destructive analysis of cultural heritage. The spectral
luminance, CIELAB, and CIEXYZ values of more than 80 metallic surfaces were measured with a spectrometer and
scanned to examine the spectroscopic characteristics of the foils using multiband images. This analysis through imaging can
improve the method for extracting differences related to types of metallic foils. First, the spectral reflectance of each foil
was measured from 220 to 850 nm in steps of 1 nm. The images were captured with color and monochromatic
cameras using color filters in order to analyze the targets with a multispectral approach. Then, principal component analysis
(PCA) was conducted on the image pixel values of each target. The results show that the peak and rate of change of the
spectral reflectance at particular wavelength regions differ among foils, and that the multispectral images
extract the differences in spectral characteristics related to different types of metallic foils and Japanese papers. This
could be useful in distinguishing among foils, and provides some promise that unknown metallic foils can be identified
through the measurement of their spectroscopic features.
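The PCA step on the scanned pixel values can be sketched with a plain SVD; this is illustrative only, with hypothetical data shapes (pixels as rows, spectral bands as columns):

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of an (n_samples, n_bands) matrix via SVD on centred data.
    Returns component scores, loadings, and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T       # projections onto the components
    explained = (S ** 2) / np.sum(S ** 2)   # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]
```

Foil types that differ in spectral shape then separate as clusters in the low-dimensional score space.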
Mapping colors from paintings to tapestries: rejuvenating the faded colors in tapestries based on colors in reference paintings
We addressed the problem of recovering or "rejuvenating" colors in a faded tapestry (the target image) by
automatically mapping colors from digital images of the reference painting or cartoon (the source image). We
divided the overall computational challenge into several subproblems, and implemented their solutions in Matlab:
1) quantizing the colors in both the tapestry and the reference painting, 2) matching corresponding regions in
the works based on spatial location, area and color, 3) mapping colors from regions in the source image to
corresponding regions in the target image, and 4) estimating the fading of colors and excluding color mapping of
areas where color has not faded. There are a number of semi-global design parameters in our algorithm, which we currently set by hand, for instance the parameters that balance the effects of 1) matching color areas in the two images and 2) matching the spatial locations between the two images. We demonstrated our method on synthetic data and on small portions of 16th-century European tapestries of cultural patrimony. Our first steps in this proof-of-concept work suggest that future refinements to our method will lead to a software tool of value to art scholars.
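Subproblem 1, color quantization, could be implemented with a basic k-means in RGB space (the paper's Matlab implementation may differ; the parameter values here are arbitrary choices of ours):

```python
import numpy as np

def quantize_colors(pixels, k, iters=20, seed=0):
    """Cluster an (N, 3) array of RGB pixels into k representative colors."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest center
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned pixels
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(axis=0)
    return centers, labels
```

The resulting palette and label map give the color regions that the later matching and mapping steps operate on.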
Interactive Paper Session: Art and Perception
Tracking the aging process by multiple 3D scans analysis
Currently, many different 3D scanning devices are used for 3D acquisition of the surface shape and color of art artifacts.
Each has different technical parameters, starting from the measurement principle (structured light, laser
triangulation, interferometry, holography) and ending with parameters like measurement volume size, spatial resolution
and precision of output data, and color information. Some 3D scanners can capture additional information such as
surface normal vectors, BRDF distributions, and multispectral color. In this paper, we present the results of
measurements with selected sampling densities, together with a discussion of the problem of recognizing and assessing
the aging process. We focus on features that are important for art conservators to define the state of
preservation of an object as well as to assess changes to the surface between the last and previous measurements.
Different materials and finishing techniques also require different algorithms for the detection and localization of aging
changes. We consider exemplary stone samples to visualize which object features can be detected and
tracked during the aging process. The changes in sandstone surface shape caused by salt weathering are presented, as
well as possibilities for identifying surface degradation on a real object (a garden relief made of sandstone).
Aesthetics and entropy: optimization of the brightness distribution
The purpose of this work is to suggest a way to utilize image statistics to guide optimization of brightness distributions,
towards a goal of complete systematization of image processing to achieve a purely aesthetic objective, with entropy as a
response metric. We start with a survey of classic pictorial photographs, proceed to a heuristic theoretical treatment of
the brightness distribution function, and follow with several pictorial illustrations of the proposed concept.
PHOG analysis of self-similarity in aesthetic images
In recent years, there have been efforts in defining the statistical properties of aesthetic photographs and artworks using
computer vision techniques. However, it is still an open question how to distinguish aesthetic from non-aesthetic images
with a high recognition rate. This is possibly because aesthetic perception is influenced also by a large number of cultural
variables. Nevertheless, the search for statistical properties of aesthetic images has not been futile. For example, we have
shown that the radially averaged power spectrum of monochrome artworks of Western and Eastern provenance falls off
according to a power law with increasing spatial frequency (1/f² characteristics). This finding implies that this particular
subset of artworks possesses a Fourier power spectrum that is self-similar across different scales of spatial resolution.
Other types of aesthetic images, such as cartoons, comics and mangas also display this type of self-similarity, as do
photographs of complex natural scenes. Since the human visual system is adapted to encode images of natural scenes in a
particular efficient way, we have argued that artists imitate these statistics in their artworks. In support of this notion, we
presented results showing that artists portray human faces with the self-similar Fourier statistics of complex natural scenes
although real-world photographs of faces are not self-similar.
In view of these previous findings, we investigated other statistical measures of self-similarity to characterize aesthetic
and non-aesthetic images. In the present work, we propose a novel measure of self-similarity that is based on the
Pyramid Histogram of Oriented Gradients (PHOG). For every image, we first calculate PHOG up to pyramid level 3.
The similarity between the histograms of each section at a particular level is then calculated to the parent section at the
previous level (or to the histogram at the ground level).
The proposed approach is tested on datasets of aesthetic and non-aesthetic categories of monochrome images. The
aesthetic image datasets comprise a large variety of artworks of Western provenance. Other man-made aesthetically
pleasing images, such as comics, cartoons and mangas, were also studied. For comparison, a database of natural scene
photographs is used, as well as datasets of photographs of plants, simple objects and faces that are in general of low
aesthetic value.
As expected, natural scenes exhibit the highest degree of PHOG self-similarity. Images of artworks also show high
self-similarity values, followed by cartoons, comics and mangas. On average, other (non-aesthetic) image categories are less
self-similar in the PHOG analysis. A measure of scale-invariant self-similarity (PHOG) allows a good separation of the
different aesthetic and non-aesthetic image categories.
Our results provide further support for the notion that, like complex natural scenes, images of artworks display a higher
degree of self-similarity across different scales of resolution than other image categories. Whether the high degree of
self-similarity is the basis for the perception of beauty in both complex natural scenery and artworks remains to be
investigated.
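The self-similarity measure described above, comparing each pyramid section's orientation histogram to its parent's, can be sketched as follows (the bin count, similarity kernel, and gradient operator are our illustrative choices, not necessarily those of the authors):

```python
import numpy as np

def orientation_histogram(img, bins=8):
    # gradient-orientation histogram weighted by gradient magnitude
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def phog_self_similarity(img, levels=3, bins=8):
    """Mean histogram-intersection similarity between each section's HOG
    and its parent section's HOG, over pyramid levels 1..levels."""
    sims = []
    for lev in range(1, levels + 1):
        n = 2 ** lev
        h, w = img.shape[0] // n, img.shape[1] // n
        ph, pw = 2 * h, 2 * w                 # parent section size
        for i in range(n):
            for j in range(n):
                child = img[i * h:(i + 1) * h, j * w:(j + 1) * w]
                parent = img[(i // 2) * ph:((i // 2) + 1) * ph,
                             (j // 2) * pw:((j // 2) + 1) * pw]
                hc = orientation_histogram(child, bins)
                hp = orientation_histogram(parent, bins)
                sims.append(np.minimum(hc, hp).sum())   # histogram intersection
    return float(np.mean(sims))
```

A value near 1 indicates that local orientation statistics repeat across scales, the property the study associates with aesthetic images.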
Interactive Paper Session: Perception and Image Quality
Influence of the source content and encoding configuration on the perceived quality for scalable video coding
In video coding, it is commonly accepted that the encoding parameters such as the quantization step-size have
an influence on the perceived quality. It is also sometimes accepted that using given encoding parameters, the perceived quality does not change significantly according to the encoded source content. In this paper, we present the outcomes of two video subjective quality assessment experiments in the context of Scalable Video
Coding. We encoded a large set of video sequences under a group of constant quality scenarios based on two spatially scalable layers. A first experiment explores the relation between a wide range of quantization parameters for each layer and the perceived quality, while the second experiment uses a subset of the encoding
scenarios on a large number of video sequences. The two experiments are aligned on a common scale using a set of shared processed video sequences, resulting in a database containing the subjective scores for 60 different sources combined with 20 SVC scenarios. We propose a detailed analysis of the experimental results of the two
experiments, bringing a clear insight of the relation between the encoding parameters combination of the scalable
layers and the perceived quality, as well as shedding light on the differences in quality depending on the
encoded source content. To analyse these differences, we propose a classification of the sources
with regard to their relative behaviour when compared to the average of the other source contents. We use this
classification to identify potential factors to explain the differences between source contents.
Virtual displays for 360-degree video
In this paper we describe a novel approach for comparing users' spatial cognition when using different depictions of 360-
degree video on a traditional 2D display. By using virtual cameras within a game engine and texture mapping of these
camera feeds to an arbitrary shape, we were able to offer users a 360-degree interface composed of four 90-degree views,
two 180-degree views, or one 360-degree view of the same interactive environment. An example experiment is described
using these interfaces. This technique for creating alternative displays of wide-angle video facilitates the exploration of
how compressed or fish-eye distortions affect spatial perception of the environment and can benefit the creation of
interfaces for surveillance and remote system teleoperation.
An evaluation of different setups for simulating lighting characteristics
The advance of technology continuously enables new luminaire designs and concepts. Evaluating such designs has
traditionally been done using actual prototypes, in a real environment. The iterations needed to build, verify, and
improve luminaire designs incur substantial costs and slow down the design process. A more attractive way is to evaluate
designs using simulations, as they can be made cheaper and quicker for a wider variety of prototypes. However, the
value of such simulations is determined by how closely they predict the outcome of actual perception experiments.
In this paper, we discuss an actual perception experiment including several lighting settings in a normal office
environment. The same office environment also has been modeled using different software tools, and photo-realistic
renderings have been created of these models. These renderings were subsequently processed using various tone-mapping
operators in preparation for display. The total imaging chain can be considered a simulation setup, and we have
executed several perception experiments on different setups. Our real interest is in finding which imaging chain gives us
the best result, or in other words, which of them yields the closest match between virtual and real experiment.
To answer this question, we must first determine which simulation setup matches the real
world best. As there is no unique, widely accepted measure to describe the performance of a given setup, we consider
a number of options and discuss the reasoning behind them along with their advantages and disadvantages.
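One fixed link in such an imaging chain is the tone-mapping step that prepares renderings for display. As a hedged sketch, Reinhard's global operator is one widely used choice of the kind the abstract's "various tonemapping operators" could include; the operator shown here is illustrative, not necessarily one the authors tested:

```python
import numpy as np

def reinhard_global(hdr, key=0.18, eps=1e-6):
    """Reinhard's global tone-mapping operator: rescale by the log-average
    luminance, then compress with L / (1 + L) into [0, 1)."""
    log_avg = np.exp(np.mean(np.log(hdr + eps)))  # log-average scene luminance
    scaled = key * hdr / log_avg                  # map the scene to the chosen "key"
    return scaled / (1.0 + scaled)                # compressive response, output < 1

# Luminance values spanning several orders of magnitude map into [0, 1).
ldr = reinhard_global(np.array([0.05, 1.0, 50.0, 2000.0]))
```

Comparing several such operators on the same rendering is exactly the kind of setup variation the perception experiments would then have to rank.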
Biological visual attention guided automatic image segmentation with application in satellite imaging
M. I. Sina,
A.-M. Cretu,
P. Payeur
Taking inspiration from humans' significantly superior ability to extract and interpret visual information, the
exploitation of biological visual mechanisms can contribute to the improvement of the performance of computational
image processing systems. Computational models of visual attention have already been shown to significantly improve
the speed of scene understanding by attending only the regions of interest, while distributing the resources where they
are required. However, only a few attention-based computational systems have been used in practical
applications dealing with real data, and up to now none of the computational attention models has been demonstrated to work
across a wide range of image content, characteristics, and scales such as those encountered in satellite imaging. This paper
outlines some of the difficulties that the current generation of visual attention-inspired models encounter when dealing
with satellite images. It then proposes a novel algorithm for automatic image segmentation and regions of interest search
that combines elements of human visual attention with Legendre moments applied on the probability density function of
color histograms. The experimental results demonstrate that the proposed approach outperforms one of the
most advanced computational attention models currently proposed in the literature.
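One ingredient of the proposed pipeline, Legendre moments computed on a histogram-based probability density, can be sketched roughly as follows; the bin count, moment order, and the 1-D simplification are assumptions made for illustration, not the paper's configuration:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(hist, order=4):
    """Legendre moments of a 1-D probability density estimated from a
    color histogram.

    Bins are mapped onto [-1, 1], where the Legendre polynomials P_n are
    orthogonal, and each moment is (2n + 1)/2 * sum(P_n(x) * p(x)).
    """
    p = hist / hist.sum()                     # normalize counts into a PDF
    x = np.linspace(-1, 1, len(p))            # bin centers on the Legendre domain
    moments = []
    for n in range(order + 1):
        Pn = legendre.Legendre.basis(n)(x)    # P_n evaluated at the bin centers
        moments.append((2 * n + 1) / 2.0 * np.sum(Pn * p))
    return np.array(moments)

hist, _ = np.histogram(np.random.default_rng(0).normal(size=10000), bins=64)
m = legendre_moments(hist)  # m[0] is always 0.5, since P_0 = 1 and p sums to 1
```

Such low-order moment vectors give a compact signature of the histogram's shape that regions of interest can be compared on.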
A neurobiologically based two-stage model for human color vision
Charles Q. Wu
Presently there are two dominant theories for human color vision: Young-Helmholtz-Maxwell's Trichromatic Theory
and Hering's Opponent-Color Theory. It has been widely accepted that the Trichromatic Theory holds true for retinal
color processing whereas the Opponent-Color Theory works for cortical color processing; this conception has become
the "Standard Model" for human color vision. My purposes in the present paper are threefold: first, to demonstrate that
the Opponent-Color Theory is fundamentally untenable, based on both theoretical and empirical grounds; second, to
resurrect a two-stage trichromatic model, in which both retinal and cortical color processing are trichromatic, proposed
by W. McDougall more than a century ago; and third, to map the cortical color processing stage (which directly
correlates with color consciousness) in this model to layer 4 within the primary visual cortex (V1) of the human brain.
The oscillatory activities and its synchronization in auditory-visual integration as revealed by event-related potentials to bimodal stimuli
The neural mechanism of auditory-visual speech integration is an active topic in the study of multimodal perception.
Articulation conveys speech information that helps detect and disambiguate the auditory speech. Oscillations and their
synchronization, as important characteristics of the EEG, are being applied to cognition research more and more. This
study analyzed EEG data acquired under unimodal and bimodal stimulation using time-frequency and phase-synchrony
approaches, and investigated the oscillatory activities and synchrony modes underlying the evoked potentials during
auditory-visual integration, in order to reveal the neural integration mechanisms behind these modes. It was found that
differences in beta activity and its synchronization were related to the gesture N1-P2, which occurred in the earlier stage
of speech coding for the pronouncing action. Alpha oscillation and its synchronization, related to the auditory N1-P2, might be mainly responsible for auditory speech processing driven by anticipation from gesture to sound features. The changing visual gesture enhanced the interaction of auditory brain regions. These results provide explanations for the changes in power and connectivity of event-evoked oscillatory activities that matched the ERPs during auditory-visual speech integration.
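Phase synchrony between band-passed EEG signals is commonly quantified with a phase-locking value (PLV) computed from Hilbert-transform phases. The sketch below is a minimal, generic version of that measure, not the authors' exact pipeline:

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """Phase-locking value between two band-passed signals:
    1 means a constant phase relation, 0 means random relative phase."""
    phase_x = np.angle(hilbert(x))   # instantaneous phase via the analytic signal
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Two 10 Hz oscillations with a fixed lag are (near-)perfectly phase-locked.
t = np.linspace(0, 1, 500, endpoint=False)
x = np.sin(2 * np.pi * 10 * t)
y = np.sin(2 * np.pi * 10 * t + 0.5)
plv = phase_locking_value(x, y)  # close to 1.0
```

Applied per frequency band (e.g. alpha, beta) and per electrode pair, such values yield the connectivity changes the abstract reports.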
Quality assessment of images illuminated by dim LCD backlight
We consider the quality assessment of images displayed on a liquid crystal display (LCD) with dim backlight, a
situation where the power consumption of the LCD is set to a low level. This energy-saving mode of the LCD decreases the
perceived image quality. In particular, some image regions may appear so dark that they become non-perceptible to the
human eye. The problem becomes more severe when the image is illuminated with very dim backlight. Ignoring the
effect of dim backlight on image quality assessment and directly applying an image quality assessment metric to the
entire image may produce results inconsistent with human evaluation. We propose a method to fix the problem. The
proposed method works as a precursor of image quality assessment. Specifically, given an image and the backlight
intensity level of the LCD on which the image is to be displayed, the method automatically classifies the pixels of an
image into perceptible and non-perceptible pixels according to the backlight intensity level and excludes the non-perceptible
pixels from quality assessment. Experimental results are shown to demonstrate the performance of the proposed method.
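In its simplest form, the perceptible/non-perceptible classification could look like the sketch below; the linear backlight model and the threshold value are assumptions for illustration, not the paper's calibrated procedure:

```python
import numpy as np

def perceptible_mask(image, backlight_level, visibility_threshold=2.0):
    """Flag pixels as perceptible under a given backlight level.

    Assumed model: displayed luminance = pixel value * backlight level
    (backlight_level in 0..1); pixels whose displayed luminance falls
    below a visibility threshold are marked non-perceptible.
    """
    displayed = image.astype(float) * backlight_level
    return displayed >= visibility_threshold

img = np.array([[0, 10, 128, 255]], dtype=np.uint8)
mask = perceptible_mask(img, backlight_level=0.1)
print(mask)  # [[False False  True  True]]
```

An IQA metric would then be evaluated only on the `True` pixels, as the abstract describes.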
Parallax scanning methods for stereoscopic three-dimensional imaging
Under certain circumstances, conventional stereoscopic imagery is subject to being misinterpreted. Stereo
perception created from two static horizontally separated views can create a "cut out" 2D appearance for objects
at various planes of depth. The subject volume looks three-dimensional, but the objects themselves appear flat.
This is especially true if the images are captured using small disparities.
One potential explanation for this effect is that, although three-dimensional perception comes primarily from
binocular vision, a human's gaze (the direction and orientation of a person's eyes with respect to their
environment) and head motion also contribute additional sub-process information. The absence of this
information may be the reason that certain stereoscopic imagery appears "odd" and unrealistic. Another
contributing factor may be the absence of vertical disparity information in a traditional stereoscopy display.
Recently, Parallax Scanning technologies have been introduced, which (1) provide a scanning methodology,
(2) incorporate vertical disparity, and (3) produce stereo images with substantially smaller disparities than the
human interocular distance.1 To test whether these three features would improve the realism and reduce the
cardboard cutout effect of stereo images, we have applied Parallax Scanning (PS) technologies to commercial
stereoscopic digital cinema productions and have tested the results with a panel of stereo experts.
These informal experiments show that the addition of PS information into the left and right image capture
improves the overall perception of three-dimensionality for most viewers. Parallax scanning significantly
increases the set of tools available for 3D storytelling while at the same time presenting imagery that is easy and
pleasant to view.
Reduced reference image quality assessment via sub-image similarity based redundancy measurement
The reduced reference (RR) image quality assessment (IQA) has been attracting much attention from researchers for its
fidelity to human perception and flexibility in practice. A promising RR metric should be able to predict the perceptual
quality of an image accurately while using as few features as possible. In this paper, a novel RR metric is presented,
whose novelty lies in two aspects. Firstly, it measures the image redundancy by calculating the so-called Sub-image
Similarity (SIS), and the image quality is measured by comparing the SIS between the reference image and the test
image. Secondly, the SIS is computed by the ratios of NSE (Non-shift Edge) between pairs of sub-images. Experiments
on two IQA databases (i.e. LIVE and CSIQ databases) show that by using only 6 features, the proposed metric can work
very well with high correlations between the subjective and objective scores. In particular, it works consistently well
across all the distortion types.
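A rough stand-in for the Sub-image Similarity idea, using a crude gradient edge detector in place of the paper's Non-shift Edge (NSE) detection, might look like the following sketch; the downsampling scheme and threshold are illustrative assumptions:

```python
import numpy as np

def edge_map(img, thresh=20):
    """Crude gradient-magnitude edge detector (stand-in for NSE detection)."""
    gx = np.abs(np.diff(img.astype(float), axis=1, prepend=img[:, :1]))
    gy = np.abs(np.diff(img.astype(float), axis=0, prepend=img[:1, :]))
    return (gx + gy) > thresh

def sub_image_similarity(img):
    """Similarity between an image and its 2x-downsampled sub-image,
    measured as the fraction of edge pixels that survive downsampling."""
    sub = img[::2, ::2]
    e_full = edge_map(img)[::2, ::2]  # full-image edges sampled on the sub grid
    e_sub = edge_map(sub)
    if e_full.sum() == 0:
        return 1.0
    return np.logical_and(e_full, e_sub).sum() / e_full.sum()

# A clean vertical step edge survives downsampling completely.
img = np.zeros((16, 16))
img[:, 8:] = 255.0
s = sub_image_similarity(img)  # 1.0
```

Distortions that destroy or displace edges lower this ratio, which is the kind of change an RR metric can detect from a handful of transmitted features.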
Color impact in visual attention deployment considering emotional images
C. Chamaret
Color is a predominant factor in the human visual attention system. Even if it is not sufficient for a global or
complete understanding of a scene, it may impact the deployment of visual attention. We propose to study the impact of
color, as well as the emotional aspect of pictures, on the deployment of visual attention. An eye-tracking campaign was
conducted with twenty people watching half of the database's pictures in full color and the other half in
greyscale.
The eye fixations on color and black-and-white images were highly correlated, raising the question of integrating
such cues into the design of visual attention models. Indeed, the predictions of two state-of-the-art computational models
show similar results for the two color categories. Similarly, the study of saccade amplitude and fixation duration versus
viewing time did not reveal any significant differences between the two categories. In addition, the spatial
coordinates of eye fixations provide an interesting indicator for investigating differences in visual attention
deployment over time and fixation number.
The second factor, related to emotion categories, shows evidence of inter-category differences between color
and grey eye fixations for the passive and positive emotions. The particular aspect associated with this category induces a
specific behavior, based more on high frequencies, in which the color components influence the deployment of visual attention.
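The fixation comparison underlying such findings is typically a Pearson correlation between fixation-density maps; below is a minimal sketch with hypothetical data (the map sizes and the synthetic "color vs. grey" maps are invented for illustration):

```python
import numpy as np

def map_correlation(map_a, map_b):
    """Pearson correlation between two fixation-density maps:
    standardize each map, then average the elementwise product."""
    a = (map_a - map_a.mean()) / map_a.std()
    b = (map_b - map_b.mean()) / map_b.std()
    return float(np.mean(a * b))

# Hypothetical density maps: the "grey" map is mostly the "color" map plus noise,
# mimicking the strongly correlated fixations reported above.
rng = np.random.default_rng(0)
color_map = rng.random((48, 64))
grey_map = 0.9 * color_map + 0.1 * rng.random((48, 64))
r = map_correlation(color_map, grey_map)  # high positive correlation
```

A value of `r` near 1 for most images is what motivates the question of whether color cues need a dedicated place in attention models at all.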