Proceedings Volume 8291

Human Vision and Electronic Imaging XVII


View the digital version of this volume at SPIE Digital Library.

Volume Details

Date Published: 9 March 2012
Contents: 13 Sessions, 57 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2012
Volume Number: 8291

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8291
  • Keynote Session
  • Computational Photography
  • Material Perception
  • Perceptual Image Quality
  • Multisensory Integration and Brain Plasticity
  • Stereoscopic 3D Image Quality: Quantifying Perception and Comfort: Joint Session with Conference 8288
  • Medical Image Quality: Features, Tasks and Semantics
  • Visual Attention: Task and Image Quality: Joint Session with Conference 8293
  • Art Theory, Perception, and Rendering
  • Computer Vision and Image Analysis of Art
  • Interactive Paper Session: Art and Perception
  • Interactive Paper Session: Perception and Image Quality
Front Matter: Volume 8291
Front Matter: Volume 8291
This PDF file contains the front matter associated with SPIE Proceedings Volume 8291, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Keynote Session
The general solution to HDR rendering
Our High-Dynamic-Range (HDR) world is the result of nonuniform illumination. We like to believe that 21st-century technology makes it possible to reproduce any scene accurately. On further study, we find that scene rendition remains a best compromise. Despite all the remarkable accomplishments in digital imaging, we cannot capture and reproduce the light in the world exactly. With still further study, we find that accurate reproduction is not necessary. We need an interdisciplinary study of image making - painting, photography and image processing - to find the general solution. HDR imaging would be very confusing without two observations that resolve many paradoxes. First, optical veiling glare, which depends on the scene content, severely limits the range of light on cameras' sensors and on retinas. Second, the neural spatial image processing in human vision counteracts glare with variable, scene-dependent responses. The counteractions of these optical and neural processes shape the goals of HDR imaging. Successful HDR images increase the apparent contrast of details lost in the shadows and highlights of conventional images. They change the spatial relationships by altering the local contrast of edges and gradients. The goal of HDR imaging is displaying calculated appearance, rather than accurate light reproduction. By using this strategy we can develop universal algorithms that process all images - LDR and HDR, achromatic and color - by mimicking human vision. The study of the general solution for HDR imaging incorporates painting, photography, vision research, color constancy and digital image processing.
Computational Photography
Application of non-linear transform coding to image processing
Sparse coding learns its basis non-linearly, but the basis elements are still combined linearly to form an image. Is this linear combination of basis elements a good model for natural images? Here we use a non-linear synthesis rule, such that at each location in the image the point-wise maximum over all basis elements is used to synthesize the image. We present algorithms for image approximation and basis learning using this synthesis rule. With these algorithms we explore the pixel-wise maximum over the basis elements as an alternative image model and thus contribute to the problem of finding a proper representation of natural images.
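The pixel-wise maximum synthesis rule described in this abstract can be sketched in a few lines; the array shapes, random basis, and coefficient values below are illustrative assumptions, not the authors' learned dictionary:

```python
import numpy as np

def max_synthesize(coeffs, basis):
    """Non-linear rule: synthesize a patch as the pixel-wise maximum
    over the weighted basis elements, instead of their linear sum."""
    weighted = coeffs[:, None] * basis   # (K, P): row k is a_k * w_k
    return weighted.max(axis=0)          # point-wise maximum over elements

def linear_synthesize(coeffs, basis):
    """Conventional linear synthesis, for comparison."""
    return coeffs @ basis

rng = np.random.default_rng(0)
basis = rng.random((4, 16))              # 4 basis elements, 16 pixels each
coeffs = np.array([1.0, 0.5, 0.0, 2.0])  # non-negative coefficients
patch_max = max_synthesize(coeffs, basis)
patch_lin = linear_synthesize(coeffs, basis)
```

With non-negative coefficients, the max-synthesized patch is always bounded above by the linear sum, which is one way the two image models differ.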
How to make a small phone camera shoot like a big DSLR: creating and fusing multi-modal exposure series
Thomas Binder, Florian Kriener, Christian Wichner, et al.
The present work aims at improving the image quality of low-cost cameras based on multiple exposures, machine learning, and a perceptual quality measure. The particular implementation consists of two cameras, one being a high-quality DSLR, the other part of a cell phone. The cameras are connected via USB. Since the system is designed to take many exposures of the same scene, a stable mechanical coupling of the cameras and the use of a tripod are required. Details on the following issues are presented: design aspects of the mechanical coupling of the cameras, camera control via FCam and the Picture Transfer Protocol (PTP), further aspects of the design of the control software, and post-processing of the exposures from both cameras. The cell phone images are taken with different exposure times and different focus settings and are simultaneously fused. By using the DSLR image as a reference, the parameters of the fusion scheme are learned from examples and can be used to optimize the design of the cell phone. First results show that the depth of field can be extended, the dynamic range can be improved, and the noise can be reduced.
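As a rough illustration of fusing an exposure series, here is a minimal single-scale fusion sketch using a "well-exposedness" weight; the Gaussian weighting around mid-grey and the parameter values are assumptions for illustration, not the learned fusion scheme of the paper:

```python
import numpy as np

def fuse_exposures(stack, target=0.5, sigma=0.2):
    """Fuse an exposure stack (N, H, W) with pixel values in [0, 1].
    Each pixel of each exposure is weighted by its 'well-exposedness'
    (a Gaussian centred on mid-grey); the fused image is the
    weight-normalized average across exposures."""
    stack = np.asarray(stack, dtype=float)
    weights = np.exp(-((stack - target) ** 2) / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=0, keepdims=True)  # normalize per pixel
    return (weights * stack).sum(axis=0)

# Toy example: a dark and a bright rendition of the same gradient
dark = np.tile(np.linspace(0.0, 0.4, 8), (8, 1))
bright = np.tile(np.linspace(0.3, 1.0, 8), (8, 1))
fused = fuse_exposures(np.stack([dark, bright]))
```

Because the fusion is a per-pixel convex combination, the fused values always lie between the darkest and brightest input at each pixel.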
Single lens 3D-camera with extended depth-of-field
Christian Perwaß, Lennart Wietzke
Placing a micro lens array in front of an image sensor transforms a normal camera into a single lens 3D camera, which also allows the user to change the focus and the point of view after a picture has been taken. While the concept of such plenoptic cameras has been known since 1908, only recently have the increased computing power of low-cost hardware and the advances in micro lens array production made the application of plenoptic cameras feasible. This text presents a detailed analysis of plenoptic cameras and introduces a new type of plenoptic camera with an extended depth of field and a maximal effective resolution of up to a quarter of the sensor resolution.
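Refocusing after capture, as this abstract mentions, is commonly done by shifting and averaging the sub-aperture views of the recorded light field; the sketch below assumes integer-pixel shifts and toy data, and is not the authors' reconstruction pipeline:

```python
import numpy as np

def refocus(views, uv, alpha):
    """Shift-and-average refocusing of sub-aperture views of a light field.
    views: (N, H, W) array of sub-aperture images
    uv:    (N, 2) angular coordinates of each view
    alpha: pixels of shift per unit of angular coordinate; this selects
           the synthetic focal plane (alpha = 0 keeps the captured focus)."""
    acc = np.zeros(views.shape[1:], dtype=float)
    for img, (u, v) in zip(views, uv):
        dy = int(round(alpha * v))
        dx = int(round(alpha * u))
        acc += np.roll(img, shift=(dy, dx), axis=(0, 1))
    return acc / len(views)

views = np.stack([np.eye(8)] * 4)                   # 4 identical toy views
uv = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
infocus = refocus(views, uv, alpha=0.0)
```

With alpha = 0 and identical views the average reproduces the input exactly; non-zero alpha blurs scene content away from the chosen synthetic focal plane.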
3D holoscopic video imaging system
Johannes H Steurer, Matthias Pesch, Christopher Hahne
For many years, integral imaging has been discussed as a technique to overcome the limitations of standard still photography imaging systems, in which a three-dimensional scene is irrevocably projected onto two dimensions. With the success of 3D stereoscopic movies, a huge interest in capturing three-dimensional motion picture scenes has been generated. In this paper, we present a test bench integral imaging camera system aiming to tailor the methods of light field imaging towards capturing integral 3D motion picture content. We estimate the hardware requirements needed to generate high quality 3D holoscopic images and show a prototype camera setup that allows us to study these requirements using existing technology. The necessary steps that are involved in the calibration of the system as well as the technique of generating human readable holoscopic images from the recorded data are discussed.
Material Perception
Predictive rendering for accurate material perception: modeling and rendering fabrics
In computer graphics, rendering algorithms are used to simulate the appearance of objects and materials in a wide range of applications. Designers and manufacturers rely entirely on these rendered images to previsualize scenes and products before manufacturing them. They need to differentiate between different types of fabrics, paint finishes, plastics, and metals, often with subtle differences, for example, between silk and nylon, or Formica and wood. Thus, these applications need predictive algorithms that can produce high-fidelity images that enable such subtle material discrimination.
From color to appearance in the real world
Marc S. Ellens, Francis Lamy
X-Rite's declared ambition is to create a digital ecosystem for appearance; a daunting challenge that has many dimensions and has proven so massive that all previous attempts have failed. After having invested three years in exploring the problem, we can now deliver the first elements of an answer and a practical path for tackling this massive undertaking. We will explore the practical implications of interposing two stages between color and full appearance, extended color and augmented color, and how these steps, rooted in the realities of the ecosystem they serve, constitute vectors and enablers of a more effective transition. We will survey the roadmap implications for the design of measurement and capture instrumentation, packaging, digital formats and delivery infrastructure, as well as rendering and display devices that will enable true value creation built on appearance attributes.
Mixing material modes
S. C. Pont, J. J. Koenderink, A. J. van Doorn, et al.
We present a novel setup in which real objects made of different materials can be mixed optically. For the materials we chose mutually very different materials, which we assume to represent canonical modes. The appearance of 3D objects consisting of any material can be described as linear superposition of 3D objects of different canonical materials, as in "painterly mixes". In this paper we studied mixtures of matte, glossy and velvety objects, representing diffuse, forward and asperity scattering modes. Observers rated optical mixtures on four scales: matte-glossy, hard-soft, cold-warm, light-heavy. The ratings were done for the three combinations of glossy, matte, and velvety green birds. For each combination we tested 7 weightings. Matte-glossy ratings varied most over the stimuli and showed highest (most glossy) scores for the rather glossy bird and lowest (most matte) for the rather velvety bird. Hard-soft and cold-warm were rated highest (most soft and warm) for rather velvety and lowest (most hard and cold) for rather glossy birds. Light-heavy was rated only somewhat higher (heavier) for rather glossy birds. The ratings varied systematically with the weights of the contributions, corresponding to gradually changing mixtures of material modes. We discuss a range of possibilities for our novel setup.
Tangible display systems: bringing virtual surfaces into the real world
We are developing tangible display systems that enable natural interaction with virtual surfaces. Tangible display systems are based on modern mobile devices that incorporate electronic image displays, graphics hardware, tracking systems, and digital cameras. Custom software allows the orientation of a device and the position of the observer to be tracked in real-time. Using this information, realistic images of surfaces with complex textures and material properties illuminated by environment-mapped lighting, can be rendered to the screen at interactive rates. Tilting or moving in front of the device produces realistic changes in surface lighting and material appearance. In this way, tangible displays allow virtual surfaces to be observed and manipulated as naturally as real ones, with the added benefit that surface geometry and material properties can be modified in real-time. We demonstrate the utility of tangible display systems in four application areas: material appearance research; computer-aided appearance design; enhanced access to digital library and museum collections; and new tools for digital artists.
Perceptual Image Quality
Quality estimation for images and video with different spatial resolutions
A. Murat Demirtas, Hamid Jafarkhani, Amy R. Reibman
Full-reference (FR) quality estimators (QEs) for images and video are typically designed assuming that the displayed, degraded image has the same spatial resolution as the original, reference image. No-reference (NR) QEs use no knowledge about the reference image to assess quality of the displayed image. However, in many practical systems, a reference image may be available that has a different spatial resolution than the displayed image. In this paper, we explore objective quality estimation when the displayed image to be evaluated has a different spatial resolution than the reference image. We begin by identifying a range of potential weaknesses that might be present in a QE designed for this situation. Then, we create pairs of images with potential False Ties, in which a QE estimates that two images have equal quality while viewers disagree. Our subjective tests demonstrate that existing QEs do not accurately assess quality in a variety of scenarios for which images have different spatial resolutions.
Automatic parameter prediction for image denoising algorithms using perceptual quality features
A natural scene statistics (NSS) based blind image denoising approach is proposed, in which denoising is performed without knowledge of the noise variance present in the image. We show how such a parameter estimate can be used to perform blind denoising by combining blind parameter estimation with a state-of-the-art denoising algorithm. Our experiments show that, for all noise variances simulated on varied image content, our approach is almost always statistically superior to the reference BM3D implementation in terms of perceived visual quality at the 95% confidence level.
Viewer preferences for classes of noise removal algorithms for high definition content
Perceived video quality studies were performed on a number of key classes of noise removal algorithms to determine viewer preference. The noise removal algorithm classes represent an increase in complexity, from linear filters to nonlinear filters to adaptive filters to spatio-temporal filters. The subjective results quantify the perceived quality improvements that can be obtained with increasing complexity. The specific algorithm classes tested include: a linear spatial one-channel filter, a nonlinear spatial two-channel filter, an adaptive nonlinear spatial filter, and a multi-frame spatio-temporal adaptive filter. All algorithms were applied to full HD (1080p) content. Our subjective results show that the spatio-temporal (multi-frame) noise removal algorithm performs best amongst the various algorithm classes. The spatio-temporal algorithm's improvement compared to the original video sequences is statistically significant. On average, noise-removed video sequences are preferred over the original (noisy) video sequences. The adaptive bilateral and non-adaptive bilateral two-channel noise removal algorithms perform similarly on average, suggesting that a non-adaptive, parameter-tuned algorithm may be adequate.
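The bilateral filter mentioned in this abstract weights neighbourhood pixels by both spatial distance and intensity similarity; a brute-force single-channel sketch (parameter values assumed for illustration, not those of the tested algorithms) looks like:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Brute-force bilateral filter on a 2D grayscale image.
    Each output pixel is a weighted mean of its (2*radius+1)^2
    neighbourhood, weighted by a spatial Gaussian (sigma_s) times a
    range Gaussian on intensity difference (sigma_r), so edges with
    large intensity steps are preserved while flat regions smooth."""
    H, W = img.shape
    pad = np.pad(img, radius, mode='edge')
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            range_w = np.exp(-(win - img[i, j]) ** 2 / (2.0 * sigma_r ** 2))
            w = spatial * range_w
            out[i, j] = (w * win).sum() / w.sum()
    return out

flat = np.full((6, 6), 0.7)
smoothed = bilateral_filter(flat)
```

A constant region passes through unchanged, since every range weight is 1 there; an adaptive variant would additionally adjust sigma_r per pixel.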
Image quality assessment in the low quality regime
Guilherme O. Pinto, Sheila S. Hemami
Traditionally, image quality estimators have been designed and optimized to operate over the entire quality range of images in a database, from very low quality to visually lossless. However, if quality estimation is limited to a smaller quality range, their performances drop dramatically, and many image applications only operate over such a smaller range. This paper is concerned with one such range, the low-quality regime (LQR), which is defined as the interval of perceived quality scores over which there exists a linear relationship between the perceived quality scores and the perceived utility scores, and which lies at the low-quality end of image databases. Using this definition, this paper describes a subjective experiment to determine the low-quality regime for databases of distorted images that include perceived quality scores but not perceived utility scores, such as CSIQ and LIVE. The performances of several image utility and quality estimators are evaluated in the low-quality regime, indicating that utility estimators can be successfully applied to estimate perceived quality in this regime. Omission of the lowest-frequency image content is shown to be crucial to the performances of both kinds of estimators. Additionally, this paper establishes an upper bound for the performances of quality estimators in the LQR, using a family of quality estimators based on VIF. The resulting optimal quality estimator indicates that estimating quality in the low-quality regime is robust to the exact frequency pooling weights, and that near-optimal performance can be achieved by a variety of estimators provided that they substantially emphasize the appropriate frequency content.
Multisensory Integration and Brain Plasticity
The question of simultaneity in multisensory integration
Lynnette Leone, Mark E. McCourt
Early reports of audiovisual (AV) multisensory integration (MI) indicated that unisensory stimuli must evoke simultaneous physiological responses to produce decreases in reaction time (RT) such that for unisensory stimuli with unequal RTs the stimulus eliciting the faster RT had to be delayed relative to the stimulus eliciting the slower RT. The "temporal rule" states that MI depends on the temporal proximity of unisensory stimuli, the neural responses to which must fall within a window of integration. Ecological validity demands that MI should occur only for simultaneous events (which may give rise to non-simultaneous neural activations). However, spurious neural response simultaneities which are unrelated to singular environmental multisensory occurrences must somehow be rejected. Using an RT/race model paradigm we measured AV MI as a function of stimulus onset asynchrony (SOA: ±200 ms, 50 ms intervals) under fully dark adapted conditions for visual (V) stimuli that were either weak (scotopic 525 nm flashes; 511 ms mean RT) or strong (photopic 630 nm flashes; 356 ms mean RT). Auditory (A) stimulus (1000 Hz pure tone) intensity was constant. Despite the 155 ms slower mean RT to the scotopic versus photopic stimulus, facilitative AV MI in both conditions nevertheless occurred exclusively at an SOA of 0 ms. Thus, facilitative MI demands both physical and physiological simultaneity. We consider the mechanisms by which the nervous system may take account of variations in response latency arising from changes in stimulus intensity in order to selectively integrate only those physiological simultaneities that arise from physical simultaneities.
The spatiotopic 'visual' cortex of the blind
Visual cortex activity in the blind has been shown in sensory tasks. Can it be activated in memory tasks? If so, are inherent features of its organization meaningfully employed? Our recent results in short-term blindfolded subjects imply that human primary visual cortex (V1) may operate as a modality-independent 'sketchpad' for working memory (Likova, 2010a). Interestingly, the spread of the V1 activation approximately corresponded to the spatial extent of the images in terms of their angle of projection to the subject. We now raise the questions of whether, under long-term visual deprivation, V1 is also employed in a non-visual memory task, in particular in congenitally blind individuals, who have never had visual stimulation to guide the development of the visual area organization, and whether such spatial organization is still valid for the same paradigm that was used in blindfolded individuals. The outcome has implications for an emerging reconceptualization of the principles of brain architecture and its reorganization under sensory deprivation. Methods: We used a novel fMRI drawing paradigm in congenitally and late-onset blind subjects, compared with sighted and blindfolded subjects, in three conditions of 20s duration, separated by 20s rest intervals: (i) Tactile Exploration: raised-line images explored and memorized; (ii) Tactile Memory Drawing: drawing the explored image from memory; (iii) Scribble: mindless drawing movements with no memory component. Results and Conclusions: V1 was strongly activated for Tactile Memory Drawing and Tactile Exploration in these totally blind subjects. Remarkably, after training, even in the memory task, the mapping of V1 activation largely corresponded to the angular projection of the tactile stimuli relative to the ego-center (i.e., the effective visual angle at the head); beyond this projective boundary, peripheral V1 signals were dramatically reduced or even suppressed.
The matching extent of the activation in the congenitally blind rules out vision-based explanatory mechanisms, and supports the more radical idea of V1 as a modality-independent 'projection screen' or a 'sketchpad', whose mapping scales to the projective dimensions of objects explored in the peri-personal space.
Acoustic-tactile rendering of visual information
Pubudu Madhawa Silva, Thrasyvoulos N. Pappas, Joshua Atkins, et al.
In previous work, we have proposed a dynamic, interactive system for conveying visual information via hearing and touch. The system is implemented with a touch screen that allows the user to interrogate a two-dimensional (2-D) object layout by active finger scanning while listening to spatialized auditory feedback. Sound is used as the primary source of information for object localization and identification, while touch is used both for pointing and for kinesthetic feedback. Our previous work considered shape and size perception of simple objects via hearing and touch. The focus of this paper is on the perception of a 2-D layout of simple objects with identical size and shape. We consider the selection and rendition of sounds for object identification and localization. We rely on the head-related transfer function for rendering sound directionality, and consider variations of sound intensity and tempo as two alternative approaches for rendering proximity. Subjective experiments with visually-blocked subjects are used to evaluate the effectiveness of the proposed approaches. Our results indicate that intensity outperforms tempo as a proximity cue, and that the overall system for conveying a 2-D layout is quite promising.
Stereoscopic 3D Image Quality: Quantifying Perception and Comfort: Joint Session with Conference 8288
Apparent stereo: the Cornsweet illusion can enhance perceived depth
Piotr Didyk, Tobias Ritschel, Elmar Eisemann, et al.
It is both a technical and an artistic challenge to depict three-dimensional content using stereo equipment and a flat two-dimensional screen. On the one hand, the content needs to fit within the limits of a given display technology and at the same time achieve a comfortable viewing experience. Given the technological advances of 3D equipment, especially the latter increases in importance. Modifications to stereo content become necessary that aim at flattening or even removing binocular disparity to adjust the 3D content to match the comfort zone in which the clash between accommodation and vergence stays acceptable. However, applying such modifications can lead to a reduction of crucial depth details. One promising direction is backward-compatible stereo, for which the disparity is low enough that overlaid stereo pairs seem almost identical. It builds upon the Craik-O'Brien-Cornsweet effect, a visual illusion, which uses so-called Cornsweet profiles to produce a local contrast that leads to a perceived brightness increase. Similarly, Cornsweet profiles in disparity can lead to an illusion of depth. Applying them skilfully at depth discontinuities allows for a reduction of the overall disparity range to ensure a comfortable yet convincing stereo experience. The present work extends the previous idea by showing that Cornsweet profiles can also be used to enhance the 3D impression. This operation can help in regions where the disparity range was compressed, but also to emphasize parts of a scene. A user study measures the performance of backward-compatible stereo and our disparity enhancement.
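A Cornsweet profile such as the one this abstract applies to disparity can be generated and added to a disparity map at a depth discontinuity; the exponential-lobe form and constants below are one common parameterization, assumed here for illustration:

```python
import numpy as np

def cornsweet_profile(n, amplitude=1.0, decay=8.0):
    """1-D Cornsweet profile: an abrupt step at the centre whose two
    opposite-sign lobes decay exponentially back to zero, so regions
    far from the edge carry no net offset. Added to a disparity map,
    this creates local disparity contrast at the edge without widening
    the overall disparity range away from it."""
    x = np.arange(n) - n // 2
    return np.where(x >= 0, amplitude, -amplitude) * np.exp(-np.abs(x) / decay)

profile = cornsweet_profile(64, amplitude=1.0, decay=8.0)
```

The profile peaks at the step, flips sign across it, and is nearly zero a few decay constants away, which is what keeps the global disparity budget small.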
Perceived depth of multi parallel-overlapping-transparent-stereoscopic-surfaces
The geometric relational expression of horizontal disparity, viewing distance, and depth magnitude between objects in stereopsis suggests that, for a given viewing distance, the magnitude of perceived depth of objects would be the same as long as the disparity magnitudes are the same. However, we found that this is not necessarily the case for random dot stereograms that depict parallel-overlapping-transparent-stereoscopic-surfaces (POTS). Data from three experiments indicated that, when the stimulus size is relatively large (e.g., 13 x 20 arc deg), the magnitude of reproduced depth between two POTS is 6% larger than that for an identical pair of stereo-surfaces with an additional stereo-surface located between the pair. The results are discussed in terms of global stereopsis, which "operates" for relatively large stimulus sizes.
Diagnosing perceptual distortion present in group stereoscopic viewing
Melissa Burton, Brice Pollock, Jonathan W. Kelly, et al.
Stereoscopic displays are an increasingly prevalent tool for experiencing virtual environments, and the inclusion of stereo has the potential to improve distance perception within the virtual environment. When multiple users simultaneously view the same stereoscopic display, only one user experiences the projectively correct view of the virtual environment, and all other users view the same stereoscopic images while standing at locations displaced from the center of projection (CoP). This study was designed to evaluate the perceptual distortions caused by displacement from the CoP when viewing virtual objects in the context of a virtual scene containing stereo depth cues. Judgments of angles were distorted after leftward and rightward displacement from the CoP. Judgments of object depth were distorted after forward and backward displacement from the CoP. However, perceptual distortions of angle and depth were smaller than predicted by a ray-intersection model based on stereo viewing geometry. Furthermore, perceptual distortions were asymmetric, leading to different patterns of distortion depending on the direction of displacement. This asymmetry also conflicts with the predictions of the ray-intersection model. The presence of monocular depth cues might account for departures from model predictions.
3D discomfort from vertical and torsional disparities in natural images
Christopher W. Tyler, Lora T. Likova, Kalin Atanassov, et al.
The two major aspects of camera misalignment that cause visual discomfort when viewing images on a 3D display are vertical and torsional disparities. While vertical disparities are uniform throughout the image, torsional rotations introduce a range of disparities that depend on the location in the image. The goal of this study was to determine the discomfort ranges for the kinds of natural image that people are likely to take with 3D cameras rather than the artificial line and dot stimuli typically used for laboratory studies. We therefore assessed visual discomfort on a five-point scale from 'none' to 'severe' for artificial misalignment disparities applied to a set of full-resolution images of indoor scenes. For viewing times of 2 s, discomfort ratings for vertical disparity in both 2D and 3D images rose rapidly toward the discomfort level of 4 ('severe') by about 60 arcmin of vertical disparity. Discomfort ratings for torsional disparity in the same image rose only gradually, reaching only the discomfort level of 3 ('strong') by about 50 deg of torsional disparity. These data were modeled with a second-order hyperbolic compression function incorporating a term for the basic discomfort of the 3D display in the absence of any misalignments through a Minkowski norm. These fits showed that, at a criterion discomfort level of 2 ('moderate'), acceptable levels of vertical disparity were about 15 arcmin. The corresponding values for the torsional disparity were about 30 deg of relative orientation.
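The fitted model itself is not reproduced in the abstract, but a saturating second-order compression combined with a Minkowski norm for the baseline display discomfort might be sketched as follows; the functional form, exponent, and constants are illustrative assumptions only, not the paper's fitted parameters:

```python
def hyperbolic_compression(x, d_max=4.0, semi_sat=30.0):
    """Second-order hyperbolic (saturating) compression of a misalignment
    magnitude x into a discomfort score: 0 at x = 0, d_max/2 at
    x = semi_sat, and approaching d_max for large x."""
    return d_max * x ** 2 / (x ** 2 + semi_sat ** 2)

def total_discomfort(misalign, base=1.0, m=3.0):
    """Combine the display's baseline discomfort with misalignment-induced
    discomfort through a Minkowski norm of (assumed) exponent m, so the
    larger component dominates the overall rating."""
    return (base ** m + misalign ** m) ** (1.0 / m)

score = total_discomfort(hyperbolic_compression(30.0))
```

With these toy constants, a misalignment at the semi-saturation point contributes a discomfort of 2, and a zero misalignment leaves only the baseline display discomfort.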
Medical Image Quality: Features, Tasks and Semantics
On the development of expertise in interpreting medical images
Medical images represent a core portion of the information clinicians utilize to render diagnostic and treatment decisions. Fundamentally, viewing a medical image involves two basic processes - visually inspecting the image (visual perception) and rendering an interpretation (cognition). The interpretation is often followed by a recommendation. The likelihood of error in the interpretation of medical images is unfortunately not negligible. Errors occur and patients' lives are impacted. Thus we need to understand how clinicians interact with the information in an image during the interpretation process. We also need to understand how clinicians develop expertise throughout their careers and why some people are better at interpreting medical images than others. If we can better understand how expertise develops, perhaps we can develop better training programs, incorporate more effective ways of teaching image interpretation into the medical school and residency curriculums, and create new tools that would enhance and perhaps speed up the learning process. With improved understanding we can also develop ways to further improve decision-making in general and at every level of the medical imaging profession, thus improving patient care. The science of medical image perception is dedicated to understanding and improving the clinical interpretation process.
Modeling observer performance for optimizing medical image acquisition and processing
Medical images involve several steps of acquisition, processing, display, and reading to convey information from the body of a patient to a clinician who needs to decide on a course of action. The effect of all these steps is a large number of potentially coupled parameters that may need to be adjusted in order to optimize diagnostic task performance. Model observers were developed to assist this process by providing an objective computational tool that is predictive of visual performance. The defining characteristic of model observers is the generation of a decision variable from which the task is readily performed. Various models for generating the decision variable are reviewed, and various challenges and opportunities for model observers are described.
Evaluation of HVS models in the application of medical image quality assessment
In this study, four of the most widely used Human Visual System (HVS) models are applied on Magnetic Resonance (MR) images for signal detection task. Their performances are evaluated against gold standard derived from radiologists' majority decision. The task-based image quality assessment requires taking into account the human perception specificities, for which various HVS models have been proposed. However to our knowledge, no work was conducted to evaluate and compare the suitability of these models with respect to the assessment of medical image qualities. This pioneering study investigates the performances of different HVS models on medical images in terms of approximation to radiologist performance. We propose to score the performance of each HVS model using the AUC (Area Under the receiver operating characteristic Curve) and its variance estimate as the figure of merit. The radiologists' majority decision is used as gold standard so that the estimated AUC measures the distance between the HVS model and the radiologist perception. To calculate the variance estimate of AUC, we adopted the one-shot method that is independent of the HVS model's output range. The results of this study will help to provide arguments to the application of some HVS model on our future medical image quality assessment metric.
Perceptual challenges to computer-aided diagnosis
We review the motivation and development of computer-aided diagnosis (CAD) in diagnostic medical imaging, particularly in breast cancer screening. After briefly describing typical CAD methods in generic terms, we focus on the question of whether CAD helps improve diagnostic accuracy. We review both studies that support the notion that CAD helps improve diagnostic accuracy and studies that do not. We further identify difficulties in conducting this type of evaluation study and suggest areas of perceptual challenge in applications of CAD.
Satisfaction of search experiments in advanced imaging
The objective of our research is to understand the perception of multiple abnormalities in an imaging examination and to develop strategies for improved diagnosis. We are one of the few laboratories in the world pursuing the goal of reducing detection errors through a better understanding of the underlying perceptual processes involved. Failure to detect an abnormality is the most common class of error in diagnostic imaging and is generally considered the most serious by the medical community. Many of these errors have been attributed to "satisfaction of search," which occurs when a lesion is not reported because discovery of another abnormality has "satisfied" the goal of the search. We have gained some understanding of the mechanisms of satisfaction of search (SOS) in traditional radiographic modalities. Currently, there are few interventions to remedy SOS error. For example, a patient history that prompts for specific abnormalities protects the radiologist from missing them even when other abnormalities are present. The knowledge gained from this programmatic research will lead to a reduction in observer error.
Integrating human- and computer-based approaches to feature extraction and analysis
Bernice E. Rogowitz, Alyssa Goodman
A major goal of imaging systems is to help doctors, scientists, engineers, and analysts identify patterns and features in complex data. There is a wide range of imaging, visualization, and graphics systems, ranging from fully automatic systems that extract features algorithmically to interactive systems that allow the analyst to manipulate visual representations directly to discover features. Although automatic feature-extraction algorithms are often directed by human observation, and human pattern recognition is often supported by algorithmic tools, very little work has been done to explore how to capitalize on the interaction between human and machine pattern recognition. This paper introduces a preliminary roadmap for guiding research in this space. One key concept is the explicit consideration of the task, which determines which methods and tools will be most effective. The second is the explicit inclusion of a "human-in-the-loop," who interacts with the data, the algorithms, and representations, to identify meaningful features. The third is the inclusion of a process for creating a mathematical representation of the features that have been "carved out" by the human analyst, for use in comparison, database query or analysis.
Visual Attention: Task and Image Quality: Joint Session with Conference 8293
Examining the effect of task on viewing behavior in videos using saliency maps
Research has shown that, when viewing still images, people will look at the images in a different manner if instructed to evaluate their quality. They tend to focus less on the main features of the image and instead scan the entire image area looking for clues to its level of quality. It is questionable, however, whether this finding extends to videos, given their dynamic nature. One can argue that when watching a video the viewer will always focus on the dynamically changing features of the video regardless of the given task. To test whether this is true, an experiment was conducted in which half of the participants viewed videos with the task of quality evaluation, while the other half were simply told to watch the videos as if they were watching a movie on TV or a video downloaded from the internet. The videos contained content degraded with compression artifacts over a wide range of quality. An eye tracking device was used to record the viewing behavior in both conditions. By comparing the behavior during each task, it was possible to observe a systematic difference in viewing behavior that correlated with the quality of the videos.
Characterizing eye movements during temporal and global quality assessment of H.264 compressed video sequences
Claire Mantel, Nathalie Guyader, Patricia Ladret, et al.
Studies have shown that the deployment of visual attention is closely linked to the assessment of image or video quality, though this link is not yet fully understood. This paper investigates how rating the temporal quality of compressed videos influences the way observers deploy their attention. We set up a subjective experiment in which the eye movements of observers were recorded during three different tasks: a free-viewing task (FT), a global quality assessment task, and a temporal quality assessment task. The FT acts as a reference to which we compare the eye movements during the two other tasks. As previously shown, observers assessing global quality gaze at locations dissimilar to those fixated during the FT. For temporal quality assessment, the fixated locations appear closer to those of the FT than to those of the global quality assessment task. Our results suggest that the locations observers look at do not depend on the displayed video quality level. Quality does, however, influence the way participants look at videos: the lower the quality, the longer they gaze at a given location. The fixated area appears much smaller during the quality assessment tasks than during the FT, at both perfect and poor quality levels. The evolution of all indicators over time suggests that, during the first one or two seconds, the signal properties of the videos are the main attractors of the participants' eye movements. Instructions only seem to play a role afterwards in the deployment of visual attention.
A compressed sensing model of crowding in peripheral vision
Jens Hocke, Michael Dorr, Erhardt Barth
Here we model peripheral vision in a compressed sensing framework as a strategy of optimally guessing which stimulus corresponds to a sparsely encoded peripheral representation, and find that typical letter-crowding effects arise naturally from this strategy. The model is simple, consisting of only two convergence stages. We apply the model to the problem of crowding effects in reading. First, we show a few instructive examples of letter images reconstructed from encodings with different convergence rates. Then, we present an initial analysis of how the choice of model parameters affects the distortion of isolated and flanked letters.
Foveated self-similarity in nonlocal image filtering
Nonlocal image filters suppress noise and other distortions by searching for similar patches at different locations within the image, thus exploiting the self-similarity present in natural images. This similarity is typically assessed by a windowed distance between the patches' pixels. Inspired by the human visual system, we introduce a patch foveation operator and measure patch similarity through a foveated distance, in which each patch is blurred with spatially variant point-spread functions whose standard deviation increases with distance from the patch center. In this way, we introduce a different form of self-similarity in images: foveated self-similarity. We consider the Nonlocal Means algorithm (NL-means) for the removal of additive white Gaussian noise as a simple prototype of nonlocal image filtering and derive an explicit construction of its corresponding foveation operator, yielding the Foveated NL-means algorithm. Our analysis and experimental study show that, for the purpose of image denoising, foveated self-similarity can be a far more effective regularity assumption than conventional windowed self-similarity. Compared with NL-means, the proposed foveated algorithm achieves a substantial improvement in denoising quality, according to both objective criteria and visual appearance, particularly due to better contrast and sharpness. Moreover, foveation is introduced at negligible cost in terms of computational complexity.
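The foveated distance idea can be sketched as follows. This is an illustration of the principle, not the authors' exact operator construction: each pixel in a patch is blurred with a Gaussian whose standard deviation grows with its distance from the patch center, and patches are compared in this foveated form. The function names and the `blur_scale` parameter are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 2D Gaussian kernel of given std and half-width."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def foveated_patch(img, cy, cx, half=3, blur_scale=0.6):
    """Extract a (2*half+1)^2 patch around (cy, cx); each pixel is
    blurred with a Gaussian whose std grows with its distance from the
    patch center (assumes all blur windows lie inside the image)."""
    out = np.zeros((2 * half + 1, 2 * half + 1))
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            sigma = blur_scale * np.hypot(dy, dx)
            if sigma == 0:
                out[dy + half, dx + half] = img[y, x]  # fovea: no blur
                continue
            r = int(np.ceil(2 * sigma))
            k = gaussian_kernel(sigma, r)
            win = img[y - r:y + r + 1, x - r:x + r + 1]
            out[dy + half, dx + half] = (win * k).sum()
    return out

def foveated_distance(img, p, q, half=3):
    """RMS difference between the foveated forms of two patches."""
    a = foveated_patch(img, p[0], p[1], half)
    b = foveated_patch(img, q[0], q[1], half)
    return np.sqrt(((a - b) ** 2).mean())
```

In a foveated NL-means variant, this distance would replace the conventional windowed distance when weighting candidate patches.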
A statistical study of the correlation between interest points and gaze points
This study measures the degree of correlation/similarity between subjective gaze points (obtained from eye tracking experiments) and the objective interest points of several well-known detectors such as Harris and SURF. For each detector, we look for the best setting in terms of maximizing similarity with the gaze points. For this task, the Earth Mover's Distance (EMD) is used, as it can compare two data sets with different cardinalities. We also used ANOVA to measure the influence of each parameter involved in the detectors' settings, as well as any bias introduced. The conclusions of this study concern the suitability of each detector for estimating the subjective gaze points.
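As a sketch of how EMD can compare point sets of different cardinalities, the transportation linear program below treats each set as a uniform discrete distribution over its points. This is a minimal illustration using SciPy's `linprog`, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def emd(points_a, points_b):
    """Earth Mover's Distance between two 2D point sets of possibly
    different cardinalities, each taken as a uniform distribution,
    solved as a transportation linear program over flows f_ij."""
    A = np.asarray(points_a, dtype=float)
    B = np.asarray(points_b, dtype=float)
    m, n = len(A), len(B)
    # Ground cost: Euclidean distance between every pair of points
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).ravel()
    wa = np.full(m, 1.0 / m)   # mass at each point of set A
    wb = np.full(n, 1.0 / n)   # mass at each point of set B
    # Marginal constraints: row i sums to wa[i], column j sums to wb[j]
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([wa, wb])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None),
                  method="highs")
    return res.fun
```

In this setting one point set would hold the detector's interest points and the other the recorded gaze points; a lower EMD indicates a closer spatial match.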
Interest point analysis as a model for the Poggendorff illusion
This paper describes a recognition mechanism based on the relationships between interest points and their properties, applied to the problem of modelling the Poggendorff illusion. The recognition mechanism is shown to perform in the same manner as human vision on the standard illusion, and reduced effects are modelled on a variant without parallels. The model shows that the effect can be explained as a perceptual compromise between alignment of the elements along the oblique axis and their vertical displacement from each other. In addition, an explanation is offered of how obtuse-angled variants of the Poggendorff figure yield stronger effects than the acute-angled variants.
Art Theory, Perception, and Rendering
The perception of art and the science of perception
Robert Pepperell
For many centuries, artists have studied the nature of visual experience and how to convincingly render what we see. The results of these investigations can be found in the countless artworks deposited in museums and galleries around the world. Works of art represent a rich source of ideas and understanding about how the world appears to us, and only relatively recently have those interested in the science of vision started to appreciate the many discoveries made by artists in this field. In this paper I discuss some key insights into vision and perception revealed by artists, and show how they can inform current thinking in science and technology about how best to understand the process of seeing. In particular, I suggest that some artistic ideas continue to present fundamental challenges to conventional ideas about the nature of visual experience and how it is represented.
Paintings, photographs, and computer graphics are calculated appearances
Painters reproduce the appearances they see, or visualize. The entire human visual system is the first part of that process, providing extensive spatial processing. Painters have used spatial techniques since the Renaissance to render HDR scenes. Silver halide photography responds to the light falling on single film pixels. Film can only mimic the retinal response of the cones at the start of the visual process. Film cannot mimic the spatial processing in humans. Digital image processing can. This talk studies three dramatic visual illusions and uses the spatial mechanisms found in human vision to interpret their appearances.
Image integrity and aesthetics: towards a more encompassing definition of visual quality
Visual quality is a multifaceted quantity that depends on multiple attributes of the image or video. According to Keelan's definition, artifactual attributes concern features of the image that, when visible, are annoying and compromise the integrity of the image. Aesthetic attributes instead depend on the observer's personal taste. Both types of attributes have been studied in the literature in relation to visual quality, but never in conjunction with each other. In this paper we perform a psychometric experiment to investigate how artifactual and aesthetic attributes interact, and how they affect viewing behavior. In particular, we studied to what extent the appearance of artifacts impacts the aesthetic quality of images. Our results indicate that image integrity does influence aesthetic quality scores. By means of an eye tracker, we also recorded and analyzed the viewing behavior of our participants while they scored aesthetic quality. The results reveal that, when scoring aesthetic quality, viewing behavior departs significantly from natural free looking, as well as from the viewing behavior observed for integrity scoring.
Depicting 3D shape using lines
Doug DeCarlo
Over the last few years, researchers in computer graphics have developed sophisticated mathematical descriptions of lines on 3D shapes that can be rendered convincingly as strokes in drawings. These innovations highlight fundamental questions about how human perception takes strokes in drawings as evidence of 3D structure. Answering these questions will lead to a greater scientific understanding of the flexibility and richness of human perception, as well as to practical techniques for synthesizing clearer and more compelling drawings. This paper reviews what is known about the mathematics and perception of computer-generated line drawings of shape and motivates an ongoing program of research to better characterize the shapes people see when they look at such drawings.
Box spaces in pictorial space: linear perspective versus templates
In the past decades perceptual (or perceived) image quality has been one of the most important criteria for evaluating digitally processed image and video content. With the growing popularity of new media like stereoscopic displays there is a tendency to replace image quality with viewing experience as the ultimate criterion. Adopting such a high-level psychological criterion calls for a rethinking of the premises underlying human judgment. One premise is that perception is about accurately reconstructing the physical world in front of you ("inverse optics"). That is, human vision is striving for veridicality. The present study investigated one of its consequences, namely, that linear perspective will always yield the correct description of the perceived 3D geometry in 2D images. To this end, human observers adjusted the frontal view of a wireframe box on a television screen so as to look equally deep and wide (i.e. to look like a cube) or twice as deep as wide. In a number of stimulus configurations, the results showed huge deviations from veridicality suggesting that the inverse optics model fails. Instead, the results seem to be more in line with a model of "vision as optical interface".
Sound meets image: freedom of expression in texture description
Reinier J. Jansen, René van Egmond, Huib de Ridder
The use of sound was explored as means for expressing perceptual attributes of visual textures. Two sets of 17 visual textures were prepared: one set taken from the CUReT database, and one set synthesized to replicate the former set. Participants were instructed to match a sound texture with a visual texture displayed onscreen. A modified version of a Product Sound Sketching Tool was provided, in which an interactive physical interface was coupled to a frequency modulation synthesizer. Rather than selecting from a pre-defined set of sound samples, continuous exploration of the auditory space allowed for an increased freedom of expression. While doing so, participants were asked to describe what auditory and visual qualities they were paying attention to. It was found that participants were able to create sounds that matched visual textures. Based on differences in diversity of descriptions, synthetic textures were found to have less salient perceptual attributes than their original counterparts. Finally, three interesting sound synthesis clusters were found, corresponding with mutually exclusive description vocabularies.
Computer Vision and Image Analysis of Art
Dynamics of aesthetic appreciation
Aesthetic appreciation is a complex cognitive process with aspects of both cold and hot cognition. Empirical research from the last decades has shown that evaluations of aesthetic appreciation are highly reliable. Most frequently, facial attractiveness has been used as the test case for investigating aesthetic appreciation. Evaluating facial attractiveness indeed shows high internal consistency and impressively high inter-rater reliability, even across cultures. Although this indicates general and stable mechanisms underlying aesthetic appreciation, it is also obvious that our taste for specific objects changes dynamically. Aesthetic appreciation of artificial object categories, such as fashion, design, or art, is inherently very dynamic. Gaining insight into the cognitive mechanisms that trigger and enable such changes of aesthetic appreciation is of particular interest for research, as it opens the possibility of modeling aesthetic appreciation over longer durations and from a dynamic perspective. The present paper refers to a recent, dynamically self-adapting two-step model ("the dynamical two-step-model of aesthetic appreciation") that accounts for typical dynamics of aesthetic appreciation found in research areas such as art history, philosophy, and psychology. In the first step, singular creative sources create and establish innovative material; in the second step, people adapt to it by integrating it into their visual habits. This inherently leads to dynamic changes in beholders' aesthetic appreciation.
Museum as an integrated imaging device: visualization of ancient Kyoto cityscape from folding screen artifact
Kimiyoshi Miyata, Umi Oyabu, Michihiro Kojima
Museums hold cultural resources such as artworks, historical artifacts, and folklore materials. The National Museum of Japanese History holds over 200,000 such cultural resources. One role of a museum is to exhibit its cultural resources, so a museum can be regarded as a visualization device for the information society. This research describes the visualization of a historical scene from cultural resources with an interactive user interface. The material in focus is the oldest extant version of a genre of folding screen paintings depicting the thriving city of Kyoto in the four seasons, Rekihaku's "Scenes In and Around Kyoto," designated a nationally important cultural property in Japan. Over 1,400 people and many residences, temples, and houses are drawn, making the screens an information resource on city scenes and everyday life in Kyoto at that time. Historical research was carried out using a high-resolution digital image obtained with a large-scale scanner, and the scanned images are used by computer programs to visualize a historical image of ancient Kyoto. Combinations of real materials and information provided by the computer programs are also described in this research.
In search of Leonardo: computer-based facial image analysis of Renaissance artworks for identifying Leonardo as subject
Christopher W. Tyler, William A. P. Smith, David G. Stork
One of the enduring mysteries in the history of the Renaissance is the adult appearance of the archetypal "Renaissance Man," Leonardo da Vinci. His only acknowledged self-portrait is from an advanced age, and various candidate images of younger men are difficult to assess given the absence of documentary evidence. One clue about Leonardo's appearance comes from the remark of the contemporary historian Vasari that the sculpture of David by Leonardo's master, Andrea del Verrocchio, was based on the appearance of Leonardo when he was an apprentice. Taking a cue from this statement, we suggest that the more mature sculpture of St. Thomas, also by Verrocchio, might also have been a portrait of Leonardo. We tested the possibility that Leonardo was the subject of Verrocchio's sculpture using a novel computational technique for the comparison of three-dimensional facial configurations. Based on quantitative measures of similarity, we also assess whether another pair of candidate two-dimensional images are plausibly attributable as portraits of Leonardo as a young adult. Our results are consistent with the claim that Leonardo is indeed the subject of these works, but comparisons with a larger corpus of candidate artworks are needed before our results achieve statistical significance.
Non-destructive analytical imaging of metallic surfaces using spectral measurements and ultrahigh-resolution scanning for cultural heritage investigation
Jun Kaneko, Jay Arre Toque, Yusuke Murayama, et al.
This paper presents a new approach for analyzing the spectroscopic characteristics of metallic surfaces using spectral measurements and image analysis. The method is useful for contactless, non-destructive analysis of cultural heritage. Spectral luminance, CIELAB, and CIEXYZ values of more than 80 metallic surfaces were measured with a spectrometer and scanned to examine the spectroscopic characteristics of foils using multiband images. This image-based analysis improves the extraction of differences among types of metallic foils. First, the spectral reflectance of each foil was measured from 220 to 850 nm in steps of 1 nm. Images were captured with color and monochromatic cameras using color filters for a multispectral analysis of the targets. Principal component analysis (PCA) was then conducted on the image pixel values of each target. The results show that the peak and rate of change of the spectral reflectance at particular wavelength regions differ among foils, and that the multispectral images capture the differences in spectral characteristics among types of metallic foils and Japanese papers. This could be useful in distinguishing among foils, and offers some promise that unknown metallic foils can be identified through measurement of their spectroscopic features.
Mapping colors from paintings to tapestries: rejuvenating the faded colors in tapestries based on colors in reference paintings
Marie Ström, Eija Johansson, David G. Stork
We addressed the problem of recovering or "rejuvenating" colors in a faded tapestry (the target image) by automatically mapping colors from digital images of the reference painting or cartoon (the source image). We divided the overall computational challenge into several subproblems, and implemented their solutions in Matlab: 1) quantizing the colors in both the tapestry and the reference painting, 2) matching corresponding regions in the works based on spatial location, area, and color, 3) mapping colors from regions in the source image to corresponding regions in the target image, and 4) estimating the fading of colors and excluding color mapping in areas where the color has not faded. There are a number of semi-global design parameters in our algorithm, which we currently set by hand, for instance the parameter that balances the effects of 1) matching color areas in the two images and 2) matching spatial locations between the two images. We demonstrated our method on synthetic data and on small portions of 16th-century European tapestries of cultural patrimony. Our first steps in this proof-of-concept work suggest that future refinements to our method will lead to a software tool of value to art scholars.
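Steps 3 and 4 of the pipeline described above can be sketched as follows (a Python analogue of the Matlab pipeline; the region labels, mean-color dictionary, and the hand-set fade threshold are hypothetical, mirroring the semi-global parameters mentioned in the abstract):

```python
import numpy as np

def map_region_colors(target, labels, source_means, fade_threshold=30.0):
    """For each matched region (shared integer label), shift the target
    region's pixels so its mean color equals the source region's mean,
    but skip regions whose color difference falls below fade_threshold,
    i.e. regions judged not to have faded."""
    out = target.astype(float).copy()
    for lab, src_mean in source_means.items():
        mask = labels == lab
        if not mask.any():
            continue
        tgt_mean = out[mask].mean(axis=0)
        if np.linalg.norm(src_mean - tgt_mean) < fade_threshold:
            continue  # color has not faded; leave region untouched
        out[mask] += src_mean - tgt_mean  # translate region toward source
    return np.clip(out, 0, 255)
```

A mean-shift per region is a deliberately simple mapping; it preserves within-region texture (weave detail) while restoring the region's overall hue and lightness.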
Interactive Paper Session: Art and Perception
Tracking the aging process by multiple 3D scans analysis
Currently, many different 3D scanning devices are used for 3D acquisition of the surface shape and color of art artifacts. Each has different technical parameters, starting from the measurement principle (structured light, laser triangulation, interferometry, holography) and extending to parameters such as measurement volume size, spatial resolution, precision of output data, and color information. Some 3D scanners can capture additional information such as surface normal vectors, BRDF distribution, and multispectral color. In this paper, we present results of measurements at selected sampling densities, together with a discussion of the problem of recognizing and assessing the aging process. We focus on features that are important for art conservators in defining the state of preservation of an object and in assessing changes on the surface between the latest and previous measurements. Different materials and finishing techniques also require different algorithms for the detection and localization of aging changes. We consider exemplary stone samples to visualize which object features can be detected and tracked during the aging process. Changes in sandstone surface shape caused by salt weathering are presented, as well as possibilities for identifying surface degradation on a real object (a garden relief made of sandstone).
Aesthetics and entropy: optimization of the brightness distribution
The purpose of this work is to suggest a way to utilize image statistics to guide optimization of brightness distributions, towards a goal of complete systematization of image processing to achieve a purely aesthetic objective, with entropy as a response metric. We start with a survey of classic pictorial photographs, proceed to a heuristic theoretical treatment of the brightness distribution function, and follow with several pictorial illustrations of the proposed concept.
PHOG analysis of self-similarity in aesthetic images
Seyed Ali Amirshahi, Michael Koch, Joachim Denzler, et al.
In recent years, there have been efforts to define the statistical properties of aesthetic photographs and artworks using computer vision techniques. However, it is still an open question how to distinguish aesthetic from non-aesthetic images with a high recognition rate, possibly because aesthetic perception is also influenced by a large number of cultural variables. Nevertheless, the search for statistical properties of aesthetic images has not been futile. For example, we have shown that the radially averaged power spectrum of monochrome artworks of Western and Eastern provenance falls off according to a power law with increasing spatial frequency (1/f2 characteristics). This finding implies that this particular subset of artworks possesses a Fourier power spectrum that is self-similar across different scales of spatial resolution. Other types of aesthetic images, such as cartoons, comics, and mangas, also display this type of self-similarity, as do photographs of complex natural scenes. Since the human visual system is adapted to encode images of natural scenes in a particularly efficient way, we have argued that artists imitate these statistics in their artworks. In support of this notion, we presented results showing that artists portray human faces with the self-similar Fourier statistics of complex natural scenes, although real-world photographs of faces are not self-similar. In view of these previous findings, we investigated other statistical measures of self-similarity to characterize aesthetic and non-aesthetic images. In the present work, we propose a novel measure of self-similarity based on the Pyramid Histogram of Oriented Gradients (PHOG). For every image, we first calculate the PHOG up to pyramid level 3. The similarity between the histogram of each section at a particular level and that of its parent section at the previous level (or the histogram of the whole image at the ground level) is then calculated.
The proposed approach is tested on datasets of aesthetic and non-aesthetic categories of monochrome images. The aesthetic image datasets comprise a large variety of artworks of Western provenance. Other man-made, aesthetically pleasing images, such as comics, cartoons, and mangas, were also studied. For comparison, a database of natural scene photographs is used, as well as datasets of photographs of plants, simple objects, and faces, which are in general of low aesthetic value. As expected, natural scenes exhibit the highest degree of PHOG self-similarity. Images of artworks also show high self-similarity values, followed by cartoons, comics, and mangas. On average, other (non-aesthetic) image categories are less self-similar in the PHOG analysis. This measure of scale-invariant self-similarity allows a good separation of the different aesthetic and non-aesthetic image categories. Our results provide further support for the notion that, like complex natural scenes, images of artworks display a higher degree of self-similarity across different scales of resolution than other image categories. Whether this high degree of self-similarity is the basis for the perception of beauty in both complex natural scenery and artworks remains to be investigated.
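The level-to-parent comparison at the heart of this measure can be sketched as follows. This is an illustration of the idea rather than the authors' exact implementation; the histogram intersection kernel and the bin count are assumptions.

```python
import numpy as np

def orientation_histogram(gy, gx, bins=16):
    """Gradient-orientation histogram (unsigned), magnitude-weighted."""
    ang = np.arctan2(gy, gx) % np.pi
    mag = np.hypot(gy, gx)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist

def hik(h1, h2):
    """Histogram intersection kernel on L1-normalized histograms."""
    a = h1 / (h1.sum() + 1e-12)
    b = h2 / (h2.sum() + 1e-12)
    return np.minimum(a, b).sum()

def phog_self_similarity(img, levels=3, bins=16):
    """Mean similarity between each subregion's orientation histogram
    at level L and that of its parent region at level L-1 (level 0
    being the whole image)."""
    gy, gx = np.gradient(img.astype(float))
    H, W = img.shape
    sims = []
    for level in range(1, levels + 1):
        n = 2 ** level   # n x n grid of sections at this level
        p = n // 2       # grid size at the parent level
        for i in range(n):
            for j in range(n):
                child = orientation_histogram(
                    gy[i * H // n:(i + 1) * H // n, j * W // n:(j + 1) * W // n],
                    gx[i * H // n:(i + 1) * H // n, j * W // n:(j + 1) * W // n],
                    bins)
                parent = orientation_histogram(
                    gy[(i // 2) * H // p:((i // 2) + 1) * H // p,
                       (j // 2) * W // p:((j // 2) + 1) * W // p],
                    gx[(i // 2) * H // p:((i // 2) + 1) * H // p,
                       (j // 2) * W // p:((j // 2) + 1) * W // p],
                    bins)
                sims.append(hik(child, parent))
    return float(np.mean(sims))
```

Values near 1 indicate that local orientation statistics resemble those of the enclosing region at every scale, which is the sense of self-similarity measured above.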
Interactive Paper Session: Perception and Image Quality
Influence of the source content and encoding configuration on the perceived quality for scalable video coding
Yohann Pitrey, Marcus Barkowsky, Romuald Pépion, et al.
In video coding, it is commonly accepted that encoding parameters such as the quantization step size influence perceived quality. It is also sometimes assumed that, for given encoding parameters, perceived quality does not change significantly with the encoded source content. In this paper, we present the outcomes of two subjective video quality assessment experiments in the context of Scalable Video Coding (SVC). We encoded a large set of video sequences under a group of constant-quality scenarios based on two spatially scalable layers. The first experiment explores the relation between a wide range of quantization parameters for each layer and the perceived quality, while the second experiment applies a subset of the encoding scenarios to a large number of video sequences. The two experiments are aligned on a common scale using a set of shared processed video sequences, resulting in a database containing subjective scores for 60 different sources combined with 20 SVC scenarios. We propose a detailed analysis of the results of the two experiments, giving clear insight into the relation between the combination of encoding parameters of the scalable layers and the perceived quality, as well as shedding light on the quality differences across source contents. To analyse these differences, we propose a classification of the sources with regard to their behaviour relative to the average of the other source contents. We use this classification to identify potential factors explaining the differences between source contents.
Virtual displays for 360-degree video
Stephen Gilbert, Wutthigrai Boonsuk, Jonathan W. Kelly
In this paper we describe a novel approach for comparing users' spatial cognition when using different depictions of 360-degree video on a traditional 2D display. By using virtual cameras within a game engine and texture mapping these camera feeds onto an arbitrary shape, we were able to offer users a 360-degree interface composed of four 90-degree views, two 180-degree views, or one 360-degree view of the same interactive environment. An example experiment using these interfaces is described. This technique for creating alternative displays of wide-angle video facilitates the exploration of how compressed or fish-eye distortions affect spatial perception of the environment, and can benefit the creation of interfaces for surveillance and remote system teleoperation.
An evaluation of different setups for simulating lighting characteristics
Bart Salters, Michael Murdoch, Dragan Sekulovksi, et al.
The advance of technology continuously enables new luminaire designs and concepts. Evaluating such designs has traditionally been done using actual prototypes in a real environment. The iterations needed to build, verify, and improve luminaire designs incur substantial costs and slow down the design process. A more attractive way is to evaluate designs using simulations, which can be produced more cheaply and quickly for a wider variety of prototypes. However, the value of such simulations is determined by how closely they predict the outcome of actual perception experiments. In this paper, we discuss an actual perception experiment covering several lighting settings in a normal office environment. The same office environment was also modeled using different software tools, and photo-realistic renderings were created from these models. These renderings were subsequently processed using various tone-mapping operators in preparation for display. The total imaging chain can be considered a simulation setup, and we have executed several perception experiments on different setups. Our real interest is in finding which imaging chain gives the best result, or in other words, which of them yields the closest match between the virtual and real experiments. To answer this, we must first answer the question, "which simulation setup matches the real world best?" As there is no unique, widely accepted measure of the performance of a given setup, we consider a number of options and discuss the reasoning behind them, along with their advantages and disadvantages.
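As one concrete example of the kind of tone-mapping operator used in such an imaging chain, the sketch below implements a global photographic operator in the style of Reinhard et al.; the key value `a = 0.18` is a common default, and this is an illustration rather than one of the operators actually used in the experiment.

```python
import numpy as np

def reinhard_global_tonemap(lum, a=0.18, eps=1e-6):
    """Global photographic tone mapping: scale scene luminance by the
    key value 'a' over the log-average luminance, then compress with
    L / (1 + L) so output luminance lies in [0, 1)."""
    log_avg = np.exp(np.mean(np.log(eps + lum)))  # log-average luminance
    L = a * lum / log_avg                          # scaled luminance
    return L / (1.0 + L)                           # displayable range
```

Because perceived brightness and contrast in the rendered office scene depend heavily on this compression step, the choice of operator is one of the main degrees of freedom distinguishing the simulation setups compared in the paper.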
Biological visual attention guided automatic image segmentation with application in satellite imaging
M. I. Sina, A.-M. Cretu, P. Payeur
Taking inspiration from the significantly superior ability of humans to extract and interpret visual information, the exploitation of biological visual mechanisms can improve the performance of computational image processing systems. Computational models of visual attention have already been shown to significantly speed up scene understanding by attending only to regions of interest and distributing resources where they are required. However, only a few attention-based computational systems have been used in practical applications dealing with real data, and to date none of the computational attention models has been demonstrated to work over a range of image content, characteristics, and scales as wide as that encountered in satellite imaging. This paper outlines some of the difficulties that the current generation of visual-attention-inspired models encounters when dealing with satellite images. It then proposes a novel algorithm for automatic image segmentation and region-of-interest search that combines elements of human visual attention with Legendre moments applied to the probability density function of color histograms. The experimental results demonstrate that the proposed approach obtains better results than one of the most advanced computational attention models in the literature.
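As a rough sketch of one ingredient of such a method, Legendre moments of a histogram-derived PDF can be computed by mapping the histogram bins onto [-1, 1] and projecting onto Legendre polynomials (an illustrative reading; the paper's exact formulation may differ, and the function name is hypothetical):

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(hist, max_order):
    """Legendre moments of a normalized histogram treated as a PDF on [-1, 1]."""
    pdf = hist / hist.sum()                  # normalize counts to a probability mass
    x = np.linspace(-1.0, 1.0, len(pdf))     # place bin centers on the Legendre domain
    moments = []
    for n in range(max_order + 1):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0                      # select the degree-n polynomial P_n
        p_n = legendre.legval(x, coeffs)
        moments.append((2 * n + 1) / 2.0 * np.sum(p_n * pdf))
    return np.array(moments)

# Moments of a synthetic color-channel histogram; the zeroth moment is always 0.5.
hist = np.histogram(np.random.default_rng(0).normal(size=1000), bins=64)[0]
m = legendre_moments(hist, max_order=4)
```

A handful of low-order moments summarizes the shape of the color distribution compactly, which is what makes them usable as features for segmentation.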
A neurobiologically based two-stage model for human color vision
Charles Q. Wu
Presently there are two dominant theories of human color vision: the Young-Helmholtz-Maxwell Trichromatic Theory and Hering's Opponent-Color Theory. It has been widely accepted that the Trichromatic Theory holds for retinal color processing whereas the Opponent-Color Theory holds for cortical color processing; this conception has become the "Standard Model" of human color vision. My purposes in the present paper are threefold: first, to demonstrate that the Opponent-Color Theory is fundamentally untenable, on both theoretical and empirical grounds; second, to resurrect a two-stage trichromatic model, proposed by W. McDougall more than a century ago, in which both retinal and cortical color processing are trichromatic; and third, to map the cortical color processing stage of this model (which directly correlates with color consciousness) to layer 4 of the primary visual cortex (V1) of the human brain.
The oscillatory activities and its synchronization in auditory-visual integration as revealed by event-related potentials to bimodal stimuli
Jia Guo, Peng Xu, Li Yao, et al.
The neural mechanism of auditory-visual speech integration is a long-standing topic in multimodal perception research. Articulatory gestures convey speech information that helps detect and disambiguate the auditory speech signal. Oscillations and their synchronization, as important characteristics of the EEG, are increasingly applied in cognition research. This study analyzed EEG data acquired under unimodal and bimodal stimulation using time-frequency and phase-synchrony approaches, investigating the oscillatory activities and synchrony modes underlying the evoked potentials during auditory-visual integration, in order to reveal the neural integration mechanisms behind these modes. Beta activity and its synchronization differences were found to be related to the gesture N1-P2 complex, which arises in the earlier stage of coding speech from articulatory action. Alpha oscillations and their synchronization, related to the auditory N1-P2, may be mainly responsible for auditory speech processing driven by anticipation from gesture to sound features. Changing visual gestures enhanced the interaction of auditory brain regions. These results explain the changes in power and connectivity of event-evoked oscillatory activities that accompany ERPs during auditory-visual speech integration.
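Phase synchrony between channels is commonly quantified by the phase-locking value (PLV), the magnitude of the mean phase-difference vector; a minimal sketch using Hilbert-transform phases (a standard technique, not necessarily the authors' exact pipeline):

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV between two equal-length signals: 1.0 means a perfectly constant phase lag."""
    phase_x = np.angle(hilbert(x))   # instantaneous phase via the analytic signal
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

t = np.linspace(0.0, 1.0, 500, endpoint=False)
alpha = np.sin(2 * np.pi * 10 * t)            # a clean 10 Hz "alpha" rhythm
lagged = np.sin(2 * np.pi * 10 * t + 0.5)     # same rhythm with a constant phase lag
noise = np.random.default_rng(1).normal(size=500)
# PLV is near 1 for the lagged pair and much lower against unrelated noise.
```

In practice the signals would first be band-pass filtered into the band of interest (e.g. alpha or beta) before the phases are extracted.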
Quality assessment of images illuminated by dim LCD backlight
Tai-Hsiang Huang, Chen-Tai Kao, Homer H. Chen
We consider the quality assessment of images displayed on a liquid crystal display (LCD) with dim backlight, a situation in which the power consumption of the LCD is set to a low level. This energy-saving mode decreases perceived image quality. In particular, some image regions may appear so dark that they become non-perceptible to the human eye. The problem becomes more severe as the backlight grows dimmer. Ignoring the effect of dim backlight and directly applying an image quality assessment metric to the entire image may produce results inconsistent with human evaluation. We propose a method to fix this problem. The proposed method works as a precursor to image quality assessment. Specifically, given an image and the backlight intensity level of the LCD on which the image is to be displayed, the method automatically classifies the pixels of the image into perceptible and non-perceptible pixels according to the backlight intensity level and excludes the non-perceptible pixels from quality assessment. Experimental results demonstrate the performance of the proposed method.
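A minimal sketch of such a perceptibility classification, under a deliberately simplified multiplicative backlight model and an illustrative visibility threshold (both assumptions of this sketch, not the authors' calibrated model):

```python
import numpy as np

def perceptible_mask(image, backlight, threshold=0.02):
    """Flag pixels whose displayed luminance exceeds a visibility threshold.

    image: grayscale array in [0, 1], read as per-pixel LCD transmittance.
    backlight: backlight intensity in [0, 1]; dimming scales every pixel down.
    threshold: illustrative just-noticeable display luminance.
    """
    displayed = backlight * image       # simple multiplicative backlight model
    return displayed >= threshold

img = np.array([[0.01, 0.10],
                [0.50, 0.90]])
full = perceptible_mask(img, backlight=1.0)    # 3 of 4 pixels visible
dim = perceptible_mask(img, backlight=0.05)    # only the 2 brightest remain
```

A quality metric would then be evaluated only where the mask is True, matching the precursor role described above.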
Parallax scanning methods for stereoscopic three-dimensional imaging
Under certain circumstances, conventional stereoscopic imagery can be misinterpreted. Stereo perception created from two static, horizontally separated views can produce a "cut-out" 2D appearance for objects at various planes of depth: the subject volume looks three-dimensional, but the objects themselves appear flat. This is especially true if the images are captured using small disparities. One potential explanation is that, although three-dimensional perception comes primarily from binocular vision, a person's gaze (the direction and orientation of the eyes with respect to the environment) and head motion also contribute additional sub-process information. The absence of this information may be why certain stereoscopic imagery appears "odd" and unrealistic. Another contributing factor may be the absence of vertical disparity information in a traditional stereoscopic display. Recently, Parallax Scanning technologies have been introduced that (1) provide a scanning methodology, (2) incorporate vertical disparity, and (3) produce stereo images with substantially smaller disparities than the human interocular distance.1 To test whether these three features would improve realism and reduce the cardboard-cutout effect of stereo images, we have applied Parallax Scanning (PS) technologies to commercial stereoscopic digital cinema productions and tested the results with a panel of stereo experts. These informal experiments show that adding PS information to the left and right image capture improves the overall perception of three-dimensionality for most viewers. Parallax scanning significantly expands the set of tools available for 3D storytelling while presenting imagery that is easy and pleasant to view.
Reduced reference image quality assessment via sub-image similarity based redundancy measurement
Reduced-reference (RR) image quality assessment (IQA) has been attracting much attention from researchers for its consistency with human perception and its flexibility in practice. A promising RR metric should predict the perceptual quality of an image accurately while using as few features as possible. In this paper, a novel RR metric is presented whose novelty lies in two aspects. First, it measures image redundancy by calculating the so-called Sub-image Similarity (SIS), and image quality is measured by comparing the SIS between the reference image and the test image. Second, the SIS is computed from the ratios of Non-Shift Edges (NSE) between pairs of sub-images. Experiments on two IQA databases (the LIVE and CSIQ databases) show that, using only six features, the proposed metric achieves high correlations between subjective and objective scores. In particular, it works consistently well across all distortion types.
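One simplified reading of the non-shift-edge idea counts edge pixels that stay in place between two slightly offset sub-images (illustrative only; the paper defines SIS and NSE through its own procedure, and every name below is hypothetical):

```python
import numpy as np

def edge_map(img, thresh=0.1):
    """Binary edge map from gradient magnitude (plain finite differences)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > thresh

def nse_ratio(img, offset=1, thresh=0.1):
    """Share of edges common to two overlapping sub-images of the same image.

    The sub-images are the image cropped at opposite corners by `offset`
    pixels; edges found at the same position in both count as "non-shift" edges.
    """
    a = edge_map(img[:-offset, :-offset], thresh)
    b = edge_map(img[offset:, offset:], thresh)
    shared = np.logical_and(a, b).sum()
    total = max(np.logical_or(a, b).sum(), 1)   # guard against an edge-free image
    return shared / total

step = np.zeros((32, 32))
step[:, 16:] = 1.0          # a single vertical step edge
r = nse_ratio(step)         # a fraction in [0, 1]
```

Comparing a few such ratios between the reference and the test image would then give a reduced-reference score from very few numbers per image, in the spirit of the six-feature metric described above.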
Color impact in visual attention deployment considering emotional images
C. Chamaret
Color is a predominant factor in the human visual attention system. Although color alone is not sufficient for a complete understanding of a scene, it may influence the deployment of visual attention. We propose to study the impact of color, as well as the emotional content of pictures, on visual attention deployment. An eye-tracking experiment was conducted with twenty participants, each viewing half of the image database in full color and the other half in grayscale. Eye fixations on color and grayscale images were highly correlated, raising the question of whether such cues need to be integrated into the design of visual attention models at all. Indeed, the predictions of two state-of-the-art computational models show similar results for the two color categories. Similarly, analyses of saccade amplitude and fixation duration versus viewing time did not reveal any significant differences between the two categories. The spatial coordinates of eye fixations, however, provide an interesting indicator for investigating differences in visual attention deployment over time and fixation number. The second factor, emotion category, shows evidence of inter-category differences between color and grayscale fixations for passive and positive emotions. The particular character of this category induces a specific viewing behavior, based largely on high spatial frequencies, in which the color components influence the deployment of visual attention.