Proceedings Volume 9014

Human Vision and Electronic Imaging XIX


View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 18 March 2014
Contents: 12 Sessions, 43 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2014
Volume Number: 9014

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9014
  • HVEI Keynote Session I
  • Auditory/Visual Interactions: From Detection to Cognition
  • Perception and Appearance of Materials: Texture, Luminance, and Noise: Joint Session with Conferences 9014 and 9018
  • Real-World and Natural Environments
  • Methodologies for Quantifying Perceptual Quality: Joint Session with Conferences 9014 and 9016
  • Perceptual Issues for Video
  • Quality of Experience: Cognition, Emotion, and Aesthetics
  • Quality of Experience: User Experience in a Social Context
  • Color Perception and Applications: The Bright Side of Color
  • Art, Perception, and Pictorial Space
  • Interactive Paper Session
Front Matter: Volume 9014
This PDF file contains the front matter associated with SPIE Proceedings Volume 9014, including the Title Page, Copyright information, Table of Contents, Invited Panel Discussion, and Conference Committee listing.
HVEI Keynote Session I
Seven challenges for image quality research
Image quality assessment has been a topic of intense recent research due to its usefulness in a wide variety of applications. Owing in large part to efforts within the HVEI community, image-quality research has particularly benefited from improved models of visual perception. However, over the last decade, research in image quality has largely shifted from the previous broader objective of gaining a better understanding of human vision to the current, more limited objective of better fitting the available ground-truth data. In this paper, we discuss seven open challenges in image quality research. These challenges stem from the lack of complete perceptual models for: natural images; suprathreshold distortions; interactions between distortions and images; images containing multiple and nontraditional distortions; and images containing enhancements. We also discuss challenges related to computational efficiency. The objective of this paper is not only to highlight the limitations in our current knowledge of image quality, but also to emphasize the need for additional fundamental research in quality perception.
Auditory/Visual Interactions: From Detection to Cognition
Audiovisual focus of attention and its application to Ultra High Definition video compression
Martin Rerabek, Hiromi Nemoto, Jong-Seok Lee, et al.
Using Focus of Attention (FoA) as a perceptual process in image and video compression is a well-known approach to increasing coding efficiency. It has been shown that foveated coding, in which compression quality varies across the image according to regions of interest, is more efficient than coding in which all regions are compressed in the same way. However, widespread use of such foveated compression has been prevented by two main conflicting factors: the complexity and the efficiency of algorithms for FoA detection. One way around these is to use as much information as possible from the scene. Since most video sequences have associated audio, and in many cases the audio is correlated with the visual content, audiovisual FoA can improve the efficiency of the detection algorithm while keeping its complexity low. This paper discusses a simple yet efficient audiovisual FoA algorithm based on the correlation of dynamics between audio and video signal components. The results of the audiovisual FoA detection algorithm are then used for foveated coding and compression. This approach is implemented in an H.265/HEVC encoder, producing a bitstream that is fully compliant with any H.265/HEVC decoder. The influence of audiovisual FoA on the perceived quality of high and ultra-high definition audiovisual sequences is explored, and the gain in compression efficiency is analyzed.
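As an illustration of the abstract's core idea (not the authors' implementation), the sketch below scores image blocks by correlating their motion energy with the audio energy envelope; the block size, the normalization, and the correlation-based scoring rule are all our own assumptions.

```python
import numpy as np

def audio_envelope(audio, sr, hop):
    """Short-term audio energy, one value per video frame hop."""
    n = len(audio) // hop
    return np.array([np.sum(audio[i*hop:(i+1)*hop]**2) for i in range(n)])

def motion_energy(frames, block=32):
    """Per-block motion energy from frame differences; frames: (T, H, W) grayscale."""
    diff = np.abs(np.diff(frames.astype(float), axis=0))   # (T-1, H, W)
    t, h, w = diff.shape
    bh, bw = h // block, w // block
    e = diff[:, :bh*block, :bw*block].reshape(t, bh, block, bw, block)
    return e.sum(axis=(2, 4))                               # (T-1, bh, bw)

def audiovisual_foa(frames, audio, sr, fps):
    """Score each block by correlation of its motion energy with the audio envelope."""
    hop = int(sr / fps)
    env = audio_envelope(audio, sr, hop)
    mot = motion_energy(frames)
    T = min(len(env) - 1, mot.shape[0])
    env = env[1:T+1]                                        # align with frame differences
    env = (env - env.mean()) / (env.std() + 1e-9)
    scores = np.zeros(mot.shape[1:])
    for i in range(mot.shape[1]):
        for j in range(mot.shape[2]):
            m = mot[:T, i, j]
            m = (m - m.mean()) / (m.std() + 1e-9)
            scores[i, j] = np.dot(m, env) / T               # high score = likely FoA
    return scores
```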
Influence of audio triggered emotional attention on video perception
Freddy Torres, Hari Kalva
Perceptual video coding methods attempt to improve compression efficiency by discarding visual information not perceived by end users. Most current approaches to perceptual video coding use only visual features, ignoring the auditory component. Many psychophysical studies have demonstrated that auditory stimuli affect our visual perception. In this paper we present our study of audio-triggered emotional attention and its applicability to perceptual video coding. Experiments with movie clips show that the reaction time to detect video compression artifacts was longer when the video was presented with audio. The results reported are statistically significant (p = 0.024).
3D sound and 3D image interactions: a review of audio-visual depth perception
Jonathan S. Berry, David A. T. Roberts, Nicolas S. Holliman
There has been much research concerning visual depth perception in 3D stereoscopic displays and, to a lesser extent, auditory depth perception in 3D spatial sound systems. With 3D sound systems now available in a number of different forms, there is increasing interest in the integration of 3D sound systems with 3D displays. It therefore seems timely to review key concepts and results concerning depth perception in such display systems. We first present overviews of both visual and auditory depth perception, before focussing on cross-modal effects in audio-visual depth perception, which may be of direct interest to display and content designers.
Perception and Appearance of Materials: Texture, Luminance, and Noise: Joint Session with Conferences 9014 and 9018
Roughness vs. contrast in natural textures
We investigate the effect of contrast enhancement on the subjective roughness of visual textures. Our analysis is based on subjective experiments with seventeen images from the CUReT database in three variants: original, synthesized textures, and contrast-enhanced synthesized textures. In Experiment 1, participants were asked to adjust the contrast of a synthesized image so that it became similar in roughness to the original image. A new adaptive procedure that extends the staircase paradigm was used for efficient placement of the stimuli. In Experiment 2, the subjective roughness and the subjective contrast of the original, synthesized, and contrast-enhanced synthesized images were determined using a pairwise comparison paradigm. The results of the two experiments show that although contrast enhancement of a synthesized image results in a subjective roughness similar to the original, the subjective contrast of that image is considerably higher than that of the original image. Future research should give more insight into the interaction between roughness and contrast.
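The paper's adaptive extension of the staircase paradigm is not detailed in the abstract; purely as a generic illustration of the staircase idea behind Experiment 1, here is a simple adaptive matching procedure with a simulated observer.

```python
import random

def staircase_match(respond, start=0.5, step=0.08, min_step=0.005, reversals_needed=8):
    """Generic adaptive staircase: adjust the contrast of a synthesized texture
    until its apparent roughness matches the original.
    respond(contrast) -> True if the synthesized image looks rougher than the
    original (in a real experiment, the observer's forced-choice response)."""
    contrast, direction = start, -1
    reversals, history = [], []
    while len(reversals) < reversals_needed:
        rougher = respond(contrast)
        new_dir = -1 if rougher else +1        # looks rougher -> decrease contrast
        if new_dir != direction:               # a reversal: halve the step size
            reversals.append(contrast)
            step = max(step / 2, min_step)
        direction = new_dir
        contrast = min(max(contrast + direction * step, 0.0), 1.0)
        history.append(contrast)
    return sum(reversals[-4:]) / 4, history    # average of the last reversals

# Simulated observer whose match point is at contrast 0.63, with response noise:
match, _ = staircase_match(lambda c: c + random.gauss(0, 0.05) > 0.63)
```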
An investigation of visual selection priority of objects with texture and crossed and uncrossed disparities
Dar'ya Khaustova, Jérôme Fournier, Emmanuel Wyckens, et al.
The aim of this research is to understand the difference in visual attention to 2D and 3D content depending on texture and amount of depth. Two experiments were conducted using an eye-tracker and a 3DTV display. Collected fixation data were used to build saliency maps and to analyze the differences between 2D and 3D conditions. In the first experiment, 51 observers participated in the test. Using scenes that contained objects with crossed disparity, it was discovered that such objects are the most salient, even if observers experience discomfort due to the high level of disparity. The goal of the second experiment was to determine whether depth is a determinative factor for visual attention. During this experiment, 28 observers watched scenes that contained objects with crossed and uncrossed disparities. We evaluated features influencing the saliency of objects in stereoscopic conditions by using content with low-level visual features. Univariate tests of significance within a MANOVA indicated that texture is more important than depth for the selection of objects. Objects with crossed disparity are significantly more important for selection processes when compared to 2D. However, objects with uncrossed disparity have the same influence on visual attention as 2D objects. Analysis of eye movements indicated that there is no difference in saccade length. Fixation durations were significantly higher in stereoscopic conditions for low-level stimuli than in 2D. We believe that these experiments can help refine existing models of visual attention for 3D content.
Memory texture as a mechanism of improvement in preference by adding noise
Yinzhu Zhao, Naokazu Aoki, Hiroyuki Kobayashi
According to color research, people have memory colors for familiar objects, which correlate with high color preference. As an analogous concept, we propose memory texture as a mechanism of texture preference when image noise (1/f noise or white noise) is added to photographs of seven familiar objects. Our results showed that (1) memory texture differed from real-life texture; (2) no consistency was found between memory texture and real-life texture; (3) a correlation existed between memory texture and preferred texture; and (4) the type of image noise more appropriate for texture reproduction differed by object.
Real-World and Natural Environments
Computer vision enhances mobile eye-tracking to expose expert cognition in natural-scene visual-search tasks
Tommy P. Keane, Nathan D. Cahill, John A. Tarduno, et al.
Mobile eye-tracking provides a fairly unique opportunity to record and elucidate cognition in action. In our research, we are searching for patterns in, and distinctions between, the visual-search performance of experts and novices in the geosciences. Traveling to regions formed by various geological processes as part of an introductory field studies course in geology, we record the prima facie gaze patterns of experts and novices when they are asked to determine the modes of geological activity that have formed the scene-view presented to them. Recording eye video and scene video in natural settings generates complex imagery that requires advanced applications of computer vision research to generate registrations and mappings between the views of separate observers. By developing such mappings, we can place many observers into a single mathematical space where we can spatio-temporally analyze inter- and intra-subject fixations, saccades, and head motions. While working towards perfecting these mappings, we developed an updated experimental setup that allowed us to statistically analyze intra-subject eye-movement events without the need for a common domain. Through such analyses we are finding statistical differences between novices and experts in these visual-search tasks. In the course of this research we have developed a unified, open-source software framework for processing, visualizing, and interacting with mobile eye-tracking data and high-resolution panoramic imagery.
An adaptive hierarchical sensing scheme for sparse signals
Henry Schütze, Erhardt Barth, Thomas Martinetz
In this paper, we present Adaptive Hierarchical Sensing (AHS), a novel adaptive hierarchical sensing algorithm for sparse signals. For a given but unknown signal with a sparse representation in an orthogonal basis, the sensing task is to identify its non-zero transform coefficients by performing only a few measurements. A measurement is simply the inner product of the signal and a particular measurement vector. During sensing, AHS partially traverses a binary tree and performs one measurement per visited node. AHS is adaptive in the sense that after each measurement, a decision is made, depending on the measurement value, whether the entire subtree of the current node is further traversed or omitted. In order to acquire an N-dimensional signal that is K-sparse, AHS performs O(K log(N/K)) measurements. With AHS, the signal is easily reconstructed by a basis transform without the need to solve an optimization problem. When sensing full-size images, AHS can compete with a state-of-the-art compressed sensing approach in terms of reconstruction performance versus number of measurements. Additionally, we simulate the sensing of image patches by AHS and investigate the impact of the choice of the sparse coding basis as well as the impact of the tree composition.
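A minimal sketch of the traversal the abstract describes, assuming (our assumption, not necessarily the paper's construction) that a node's measurement vector is the normalized sum of the basis vectors in its subtree and that subtrees with small measurement magnitudes are pruned.

```python
import numpy as np

def ahs(signal, basis, threshold):
    """Adaptive Hierarchical Sensing sketch.
    basis: (N, N) orthogonal matrix, rows are basis vectors.
    Subtrees whose aggregate measurement falls below `threshold` are pruned.
    Note: aggregate measurements can cancel; the paper's tree composition
    governs how coefficients are grouped to mitigate this."""
    N = basis.shape[0]
    coeffs = np.zeros(N)
    n_measurements = 0

    def visit(lo, hi):                 # node covers basis indices [lo, hi)
        nonlocal n_measurements
        v = basis[lo:hi].sum(axis=0) / np.sqrt(hi - lo)   # aggregate vector
        m = float(signal @ v)                             # one measurement
        n_measurements += 1
        if abs(m) < threshold:                            # prune subtree
            return
        if hi - lo == 1:
            coeffs[lo] = m                                # leaf: coefficient
        else:
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)

    visit(0, N)
    return coeffs, n_measurements      # reconstruct with basis.T @ coeffs

# K-sparse test signal in a random orthogonal basis:
rng = np.random.default_rng(0)
N, K = 256, 8
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
c = np.zeros(N); c[rng.choice(N, K, replace=False)] = rng.standard_normal(K) * 5
x = Q.T @ c
rec, n = ahs(x, Q, threshold=0.5)
```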
Referenceless perceptual fog density prediction model
We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and "fog aware" statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without supplementary geographic or camera information, without training on human-rated judgments, and without dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor makes use only of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog-aware collection of statistical features is derived from a corpus of foggy and fog-free images by using a space-domain NSS model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each patch. The predicted fog density correlates well with human judgments of visibility obtained in a subjective study on a large foggy image database. As one application, the proposed model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.
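Space-domain NSS models of this kind are conventionally built on mean-subtracted contrast-normalized (MSCN) coefficients. The sketch below shows that standard transform plus three of the cues the abstract names (low contrast, shifted intensity, flattened local structure); the paper's actual fog-aware feature set and fitted statistical model are richer and are not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7/6):
    """Mean-subtracted contrast-normalized coefficients: the standard
    space-domain NSS representation."""
    img = image.astype(float)
    mu = gaussian_filter(img, sigma)
    var = gaussian_filter(img * img, sigma) - mu * mu
    sd = np.sqrt(np.maximum(var, 0))
    return (img - mu) / (sd + 1.0)

def fog_aware_features(gray):
    """Three illustrative fog cues drawn from a grayscale image."""
    m = mscn(gray)
    return {
        "mscn_variance": float(m.var()),      # low in dense fog (flattened structure)
        "global_contrast": float(gray.std()), # fog reduces contrast
        "mean_intensity": float(gray.mean()), # fog shifts intensity upward
    }
```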
Dynamics of backlight luminance for using smartphone in dark environment
Nooree Na, Jiho Jang, Hyeon-Jeong Suk
This study developed a dynamic backlight luminance scheme that gradually changes over time for comfortable use of a smartphone display in a dark environment. The study was carried out in two stages. In the first stage, a user test was conducted to identify the optimal luminance by assessing facial squint level, subjective glare evaluation, eye blink frequency, and users' subjective preferences. Based on the results of the user test, the dynamics of the backlight luminance were designed around two luminance levels: an optimal level for initial viewing, which avoids sudden glare or fatigue to users' eyes, and an optimal level for constant viewing, which is comfortable yet bright enough for continuous reading of the displayed material. The luminance for initial viewing starts at 10 cd/m2 and gradually increases to 40 cd/m2 over 20 seconds for users' visual comfort during constant viewing. In the second stage, a validation test was conducted to verify the effectiveness of the developed dynamics, involving users' subjective preferences, eye blink frequency, and brainwave analysis using electroencephalography (EEG). The results confirm that the proposed dynamic backlighting enhances users' visual comfort and visual cognition, particularly when using smartphones in a dark environment.
Effects of image size and interactivity in lighting visualization
Michael J. Murdoch, Mariska G. M. Stokkermans
Rendered images of varied lighting conditions in a virtual environment have been shown to provide a perceptually accurate visual impression of those in a real environment, providing a valuable tool set for the development and communication of new lighting solutions. In order to further improve this tool set, an experiment was conducted to assess the impact of image size and viewing interactivity on perceptual accuracy. It was found that a high-quality TV-sized display outperforms a smaller laptop screen and a larger projected image on most measures, and that the expected value of the interactive panoramic format was masked by the fatigue of using it repeatedly.
On the delights of being an ex-cataract patient: Visual experiences before and after cataract operations; what they indicate
This paper is about changes in the author's visual perception over most of his lifetime, but in particular in the period before and after cataract operations. The author was myopic (-3D) until the operations and emmetropic afterwards, with mild astigmatic aberrations that can be compensated with cylindrical spectacles but in his case rarely are, because of the convenience of no longer needing to wear distance glasses in daily life. The perceptual changes concern color vision, stereopsis, and visual acuity. The post-cataract changes were partly expected, for example less yellow and more blue in images, but partly wholly unexpected, and accompanied by feelings of excitement and pleasure, even delight. These unexpected changes were a sudden, strongly increased depth vision and the sensation of seeing more sharply than ever before, mainly at intermediate viewing distances. The visual acuity changes occur after, exceptionally, his distance glasses are put on. All these sensations lasted or last only a short time. Those concerning stereopsis were dubbed 'super depth' and were confined to the first two months after the second cataract operation. Those concerning acuity were termed 'super-sharpness impression' (SSI). These can be elicited more or less at will by putting on the spectacles described, but then disappear again even though the spectacles are kept on. Ten other ex-cataract patients were interviewed about their post-operation experiences. The 'super depth' and SSI experiences may be linked to assumed neurophysiological mechanisms such as the concept of Bayesian reweighting of perceptual criteria.
Methodologies for Quantifying Perceptual Quality: Joint Session with Conferences 9014 and 9016
X-Eye: A reference format for eye tracking data to facilitate analyses across databases
Stefan Winkler, Florian M. Savoy, Ramanathan Subramanian
Datasets of images annotated with eye tracking data constitute important ground truth for the development of saliency models, which have applications in many areas of electronic imaging. While comparisons and reviews of saliency models abound, similar comparisons among the eye tracking databases themselves are rare. In an earlier paper, we reviewed the content and purpose of over two dozen databases available in the public domain and discussed their commonalities and differences. A major issue is that the formats of the various datasets vary considerably owing to the nature of the tools used for eye movement recordings, and specialized code is often required to use the data for further analysis. In this paper, we therefore propose a common reference format for eye tracking data, together with conversion routines from 16 existing image eye tracking databases to that format. Furthermore, we conduct a few analyses on these datasets as examples of what X-Eye facilitates.
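The X-Eye field layout itself is defined by the authors; purely as an illustration of what a common reference format and a per-database conversion routine might look like, consider this sketch (all field names are our own assumptions):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Fixation:
    x: float            # pixels, image coordinate frame
    y: float
    duration_ms: float
    onset_ms: float

@dataclass
class Trial:
    image_id: str
    subject_id: str
    fixations: List[Fixation] = field(default_factory=list)

def convert_raw(rows, image_id, subject_id):
    """Convert one hypothetical raw export (x, y, onset, duration per row)
    into the common record; each source database would get such a routine."""
    return Trial(image_id, subject_id,
                 [Fixation(x, y, dur, onset) for x, y, onset, dur in rows])
```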
Modeling the leakage of LCD displays with local backlight for quality assessment
Claire Mantel, Jari Korhonen, Jesper Melgaard Pedersen, et al.
The recent technique of local backlight dimming has a significant impact on the quality of images displayed on LCD screens with LED local dimming. It therefore represents a necessary step in the quality assessment chain, independent of the other processes applied to images. This paper investigates the modeling of one of the major spatial artifacts produced by local dimming: leakage. Leakage appears in dark areas when the backlight level is too high for the LC cells to block sufficiently, so the displayed brightness is higher than it should be. A subjective quality experiment was run on videos displayed on an LCD TV with local backlight dimming, viewed from 0° and 15° angles. The subjective results are then compared with objective data using different leakage models (constant over the whole display or horizontally varying) and three leakage factors (no leakage, leakage measured at 0°, and leakage measured at 15°). Results show that for dark sequences, accounting for the leakage artifact in the display model is a definite improvement. Approximating leakage as constant over the screen appears valid when viewing from a 15° angle, while a horizontally varying model may prove useful for 0° viewing.
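A minimal sketch of a local-dimming display model with leakage, under our own assumption that the displayed luminance cannot fall below a leakage floor proportional to the local backlight; a scalar leak_factor gives the constant model, a column-varying array the horizontally varying one.

```python
import numpy as np

def displayed_luminance(target, backlight, leak_factor):
    """Simple local-dimming display model with leakage.
    target:      desired luminance, normalized to [0, 1]
    backlight:   local backlight level per pixel, [0, 1] (upsampled LED map)
    leak_factor: fraction of backlight the LC cells cannot block (0 = ideal)."""
    transmittance = np.clip(target / np.maximum(backlight, 1e-6), 0, 1)
    leakage = leak_factor * backlight
    return np.maximum(backlight * transmittance, leakage)

# Dark patch (target 0.01) under a high local backlight shows the artifact:
patch = displayed_luminance(np.full((8, 8), 0.01), np.full((8, 8), 0.6), 0.03)
# leakage floor 0.018 > 0.01: the displayed black is brighter than intended.
```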
On improving the pooling in HDR-VDP-2 towards better HDR perceptual quality assessment
Manish Narwaria, Matthieu Perreira Da Silva, Patrick Le Callet, et al.
High Dynamic Range (HDR) signals capture much higher contrasts than traditional 8-bit low dynamic range (LDR) signals. This is achieved by representing the visual signal with values that are related to real-world luminance, instead of the gamma-encoded pixel values used for LDR. HDR signals therefore cover a larger luminance range and tend to have more visual appeal. However, due to the higher luminance conditions, existing methods cannot be directly employed for objective quality assessment of HDR signals. For that reason, the HDR Visual Difference Predictor (HDR-VDP-2) has been proposed. HDR-VDP-2 is primarily a visibility prediction metric, i.e., it predicts whether a signal distortion is visible to the eye and to what extent. Nevertheless, it also employs a pooling function to compute an overall quality score. This paper focuses on the pooling aspect of HDR-VDP-2 and employs a comprehensive database of HDR images (with their corresponding subjective ratings) to improve the prediction accuracy of HDR-VDP-2. We also discuss and evaluate the existing objective methods and provide a perspective towards better HDR quality assessment.
Perceptual Issues for Video
Theory and practice of perceptual video processing in broadcast encoders for cable, IPTV, satellite, and internet distribution
This paper describes the theory and application of a perceptually-inspired video processing technology that was recently incorporated into professional video encoders now being used by major cable, IPTV, satellite, and internet video service providers. We present data showing that this perceptual video processing (PVP) technology can improve video compression efficiency by up to 50% for MPEG-2, H.264, and High Efficiency Video Coding (HEVC). The PVP technology described in this paper works by forming predicted eye-tracking attractor maps that indicate how likely it is that a free-viewing person would look at a particular area of an image or video. We introduce the novel model and supporting theory used to calculate the eye-tracking attractor maps. We show how the underlying perceptual model was inspired by electrophysiological studies of the vertebrate retina, and explain how the model incorporates statistical expectations about natural scenes as well as a novel method for predicting error in signal estimation tasks. Finally, we describe how the eye-tracking attractor maps are created in real time and used to modify video prior to encoding so that it is more compressible but not noticeably different from the original unmodified video.
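The retina-inspired PVP model itself is not published in code form; the generic sketch below only illustrates the final step the abstract describes, pre-filtering a frame so that low-attention regions become easier to encode while likely-fixated regions are left untouched.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def prefilter_by_attention(frame, attractor_map, max_sigma=2.5):
    """Blend the original frame with a blurred copy, weighted by a normalized
    attention (attractor) map: w=1 keeps full detail, w=0 uses the blur."""
    frame = frame.astype(float)
    w = (attractor_map - attractor_map.min()) / (np.ptp(attractor_map) + 1e-9)
    blurred = gaussian_filter(frame, max_sigma)
    return w * frame + (1.0 - w) * blurred
```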
Temporal perceptual coding using a visual acuity model
Velibor Adzic, Robert A. Cohen, Anthony Vetro
This paper describes research and results in which a visual acuity (VA) model of the human visual system (HVS) is used to reduce the bitrate of coded video sequences by eliminating the need to signal transform coefficients whose corresponding frequencies will not be detected by the HVS. The VA model is integrated into the state-of-the-art HEVC HM codec. Compared to the unmodified codec, up to 45% bitrate savings are achieved while maintaining the same subjective quality of the video sequences. Encoding times are reduced as well.
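A hedged sketch of the general mechanism, with a fixed frequency cutoff standing in for the paper's VA model (which would lower the cutoff where retinal motion reduces acuity); the integration into HEVC HM is not shown.

```python
import numpy as np
from scipy.fft import dctn, idctn

def prune_by_acuity(block, cutoff):
    """Zero DCT coefficients of a square block whose radial spatial frequency
    (in cycles per block) exceeds an acuity cutoff; in a codec, those
    coefficients would then not need to be signaled."""
    coeffs = dctn(block.astype(float), norm="ortho")
    n = block.shape[0]
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    coeffs[np.hypot(u, v) > cutoff] = 0.0
    return idctn(coeffs, norm="ortho")
```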
Characterizing perceptual artifacts in compressed video streams
Kai Zeng, Tiesong Zhao, Abdul Rehman, et al.
To achieve optimal video quality under bandwidth and power constraints, modern video coding techniques employ lossy coding schemes, which often create compression artifacts that may lead to degradation of perceptual video quality. Understanding and quantifying such perceptual artifacts play important roles in the development of effective video compression, streaming, and quality enhancement systems. Moreover, the characteristics of compression artifacts evolve over time due to the continuous adoption of novel coding structures and strategies during the development of new video compression standards. In this paper, we reexamine the perceptual artifacts created by standard video compression, summarizing commonly observed spatial and temporal perceptual distortions in compressed video, with emphasis on the perceptual temporal artifacts that have not been well identified or accounted for in previous studies. Furthermore, a floating effect detection method is proposed that not only detects the existence of floating, but also segments the spatial regions where floating occurs.
Zero shot prediction of video quality using intrinsic video statistics
We propose a no-reference (NR) video quality assessment (VQA) model. Recently, 'completely blind' still picture quality analyzers have been proposed that do not require any prior training on, or exposure to, distorted images or human opinions of them. We have been trying to bridge an important but difficult gap by creating a 'completely blind' VQA model. The principle of this new approach is founded on intrinsic statistical regularities that are observed in natural videos. This results in a video 'quality analyzer' that can predict the quality of distorted videos without any external knowledge about the pristine source, anticipated distortions, or human judgments; hence, the model is zero shot. Experimental results show that, even with such paucity of information, the new VQA algorithm performs better than the full-reference (FR) quality measure PSNR on the LIVE VQA database. It is also fast and efficient. We envision the proposed method as an important step towards making real-time monitoring of 'completely blind' video quality feasible.
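'Completely blind' analyzers of this family typically score content by the distance between its NSS feature statistics and those of pristine natural material. The sketch below shows one such distance; the formulation is ours, not necessarily the paper's.

```python
import numpy as np

def naturalness_distance(feats, mu_pristine, cov_pristine):
    """Distance between the statistics of a test video and those of pristine
    natural videos; larger = presumably worse quality.
    feats: (n_patches, d) NSS feature vectors pooled over frames."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    d = mu_pristine - mu
    pooled = np.linalg.pinv((cov_pristine + cov) / 2.0)
    return float(np.sqrt(d @ pooled @ d))
```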
Quality of Experience: Cognition, Emotion, and Aesthetics
Personalized visual aesthetics
Edward A. Vessel, Jonathan Stahl, Natalia Maurer, et al.
How is visual information linked to aesthetic experience, and what factors determine whether an individual finds a particular visual experience pleasing? We have previously shown that individuals' aesthetic responses are not determined by objective image features but are instead a function of internal, subjective factors that are shaped by a viewer's personal experience. Yet for many classes of stimuli, culturally shared semantic associations give rise to similar aesthetic taste across people. In this paper, we investigated factors that govern whether a set of observers will agree on which images are preferred, or will instead exhibit more "personalized" aesthetic preferences. In a series of experiments, observers were asked to make aesthetic judgments for different categories of visual stimuli that are commonly evaluated in an aesthetic manner (faces, natural landscapes, architecture, or artwork). By measuring agreement across observers, this method was able to reveal instances of highly individualistic preferences. We found that observers showed high agreement in their preferences for images of faces and landscapes, but much lower agreement for images of artwork and architecture. In addition, we found higher agreement for heterosexual males making judgments of beautiful female faces than of beautiful male faces. These results suggest that preferences for stimulus categories that carry evolutionary significance (landscapes and faces) come to rely on similar information across individuals, whereas preferences for artifacts of human culture such as architecture and artwork, which have fewer basic-level category distinctions and reduced behavioral relevance, rely on a more personalized set of attributes.
Identifying image preferences based on demographic attributes
Elena A. Fedorovskaya, Daniel R. Lawrence
The intent of this study is to determine what sorts of images are considered more interesting by which demographic groups. Specifically, we attempt to identify images whose interestingness ratings are influenced by the demographic attribute of the viewer's gender. To that end, we use data from an experiment in which 18 participants (9 women and 9 men) rated several hundred images based on "visual interest", or preferences in viewing images. The images were selected to represent the consumer "photo-space", the typical categories of subject matter found in consumer photo collections, and were annotated using perceptual and semantic descriptors. In analyzing the image interestingness ratings, we apply a multivariate procedure known as forced classification, a feature of dual scaling, a discrete analogue of principal components analysis (similar to correspondence analysis). This analysis of ratings (i.e., ordered-choice or Likert) data enables the investigator to emphasize the effect of a specific item or collection of items. We focus on the influence of the demographic item of gender, so that the solutions are essentially confined to subspaces spanned by the emphasized item. Using this technique, we can determine definitively which images' ratings have been influenced by the demographic item of choice. Subsequently, images can be evaluated and linked, on one hand, to their perceptual and semantic descriptors, and, on the other hand, to the preferences associated with viewers' demographic attributes.
Chamber QoE: a multi-instrumental approach to explore affective aspects in relation to quality of experience
Katrien De Moor, Filippo Mazza, Isabelle Hupont, et al.
Evaluating (audio)visual quality and Quality of Experience (QoE) from the user's perspective has become a key element in optimizing users' experiences and their quality. Traditionally, the focus lies on how multi-level quality features are perceived by a human user. The interest has, however, gradually expanded towards human cognitive, affective, and behavioral processes that may impact on, be an element of, or be influenced by QoE, and which have been underinvestigated so far. In addition, there is a major discrepancy between the new, broadly supported, and more holistic conceptualization of QoE proposed by Le Callet et al. (2012) and traditional, standardized QoE assessment. This paper explores ways to tackle this discrepancy by means of a multi-instrumental approach. More concretely, it presents results from a lab study on video quality (N=27) aimed at going beyond the dominant QoE assessment paradigm and at exploring affective aspects in relation to QoE and to perceived overall quality. Four types of data were collected: 'traditional' QoE self-report measures were complemented with 'alternative' emotional-state- and user-engagement-related self-report measures to evaluate QoE. In addition, we collected EEG (physiological) data, gaze-tracking data, and facial expression (behavioral) data. The video samples used in the test were longer in duration than is common in standard tests, allowing us to study, e.g., more realistic experiences and deeper user engagement. Our findings support the claim that the traditional QoE measures need to be reconsidered and extended with additional, affective-state-related measures.
Quality of Experience: User Experience in a Social Context
Alone or together: measuring users' viewing experience in different social contexts
In the past decades, a lot of effort has been invested in predicting users' Quality of Visual Experience (QoVE) in order to optimize online video delivery. So far, objective approaches to measuring QoVE have been mainly based on an estimation of the visibility of artifacts generated by signal impairments at the moment of delivery and on a prediction of how annoying these artifacts are to the end user. Recently, it has been shown that other aspects, such as user interest or viewing context, also have a crucial influence on QoVE. Social context is one of these aspects, but it has so far been poorly investigated in relation to QoVE. In this paper, we report the outcomes of an experiment that aims at unveiling the role that social context, and in particular co-located co-viewing, plays in the visual experience and the annoyance of coding artifacts. The results show that social context significantly influences users' QoVE, whereas the appearance of artifacts does not impact the viewing experience, although users can still notice them. The results suggest that quantifying the impact of social context on user experience is of major importance for accurately predicting QoVE towards video delivery optimization.
Would you hire me? Selfie portrait images perception in a recruitment context
F. Mazza, M. P. Da Silva, P. Le Callet
Human content perception has been shown to be important in multimedia quality evaluation. Recently, aesthetic considerations have become a subject of research in this field. First attempts at aesthetics took into account perceived low-level features, especially those taken from photography theory. However, they proved insufficient to characterize human content perception. More recently, image psychology has started to be considered as a higher-level cognitive feature impacting user perception. In this paper we follow this idea by introducing social cognitive elements. Our experiments focus on the influence of different versions of portrait pictures in contexts where they are shown alongside completely unrelated information; this can happen, for example, in social network interactions, where profile pictures appear alongside almost every user action. In particular, we tested this impact on resumes, comparing professional portraits and self-shot pictures. Moreover, as we ran the tests via crowdsourcing, we discuss the use of this methodology for such tests. Our final aim is to analyze the impact of social biases on multimedia aesthetics evaluation and how this bias influences the messages that accompany pictures, as on public online platforms and social networks.
Assessing the impact of image manipulation on users' perceptions of deception
Valentina Conotter, Duc-Tien Dang-Nguyen, Giulia Boato, et al.
Generally, we expect images to be an honest reflection of reality. However, this assumption is undermined by new image editing technology, which allows for easy manipulation and distortion of digital content. Our understanding of the implications related to the use of manipulated data is lagging behind. In this paper we propose to exploit crowdsourcing tools in order to analyze the impact of different types of manipulation on users' perceptions of deception. Our goal is to gain significant insights into how different types of manipulation impact users' perceptions and how the context in which a modified image is used influences human perception of image deceptiveness. Through an extensive crowdsourcing user study, we aim to demonstrate that the problem of predicting user-perceived deception can be approached by automatic methods. Analysis of results collected on the Amazon Mechanical Turk platform highlights how deception is related to the level of modification applied to the image and to the context within which modified pictures are used. To the best of our knowledge, this work represents the first attempt to address the image editing debate using automatic approaches and going beyond the investigation of forgeries.
Color Perception and Applications: The Bright Side of Color
Spectral compression: Weighted principal component analysis versus weighted least squares
Farnaz Agahian, Brian Funt, Seyed Hossein Amirshahi
Two weighted compression schemes, Weighted Least Squares (wLS) and Weighted Principal Component Analysis (wPCA), are compared by considering their performance in minimizing both spectral and colorimetric errors of reconstructed reflectance spectra. A comparison is also made among seven different weighting functions incorporated into ordinary PCA/LS to give selectively more importance to the wavelengths that correspond to higher sensitivity in the human visual system. Weighted compression is performed on reflectance spectra of 3219 colored samples (including Munsell and NCS data) and spectral and colorimetric errors are calculated in terms of CIEDE2000 and root mean square errors. The results obtained indicate that wLS outperforms wPCA in weighted compression with more than three basis vectors. Weighting functions based on the diagonal of Cohen’s R matrix lead to the best reproduction of color information under both A and D65 illuminants particularly when using a low number of basis vectors.
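A minimal weighted-PCA compression sketch under our own conventions (weights applied as square roots in a least-squares sense, basis from an SVD of the weighted, centered data); the paper's seven weighting functions and the wLS variant are not reproduced here.

```python
import numpy as np

def wpca_compress(R, w, k):
    """Weighted-PCA compression of reflectance spectra (a sketch).
    R: (n_samples, n_wavelengths) reflectances; w: per-wavelength weights
    (e.g., derived from the diagonal of Cohen's R matrix); k: basis vectors."""
    W = np.sqrt(w)                              # weight in the least-squares sense
    Rw = (R - R.mean(axis=0)) * W               # center, then weight wavelengths
    _, _, Vt = np.linalg.svd(Rw, full_matrices=False)
    B = Vt[:k]                                  # principal directions, weighted space
    coeffs = Rw @ B.T                           # k coefficients per sample
    recon = (coeffs @ B) / W + R.mean(axis=0)   # undo weighting and centering
    return recon, coeffs

# Spectral error over the sample set would then be compared against a wLS fit:
# rmse = np.sqrt(((R - recon) ** 2).mean())
```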
Creating experimental color harmony map
Christel Chamaret, Fabrice Urban, Josselin Lepinel
Starting in the 17th century with Newton, color harmony is a topic that has so far reached no consensus on definition, representation, or modeling. Previous work highlighted specific characteristics of color harmony for combinations of color doublets or triplets by means of human ratings on a harmony scale. However, there has been no investigation involving complex stimuli or pointing out how harmony is spatially located within a picture. The modeling of such a concept, as well as a reliable ground truth, would be of high value, since the applications are wide and concern several communities, from psychology to computer graphics. We propose a protocol for creating color harmony maps from a controlled experiment. Through an eye-tracking protocol, we focus on the identification of disharmonious colors in pictures. The experiment was composed of a free-viewing pass, in order to let the observer become familiar with the content, followed by a second pass in which we asked observers "to search for the most disharmonious areas in the picture". Twenty-seven observers participated in the experiment, which comprised a total of 30 different stimuli. The high inter-observer agreement, as well as a cross-validation, confirms the validity of the proposed ground truth.
Exploring the use of memory colors for image enhancement
Su Xue, Minghui Tan, Ann McNamara, et al.
Memory colors refer to those colors recalled in association with familiar objects. While some previous work has introduced this concept to assist digital image enhancement, its basis, i.e., on-screen memory colors, has not been appropriately investigated. In addition, the resulting adjustment methods have not been evaluated from a perceptual point of view. In this paper, we first perform a context-free perceptual experiment to establish the overall distributions of on-screen memory colors for three pervasive objects. Then, we use a context-based experiment to locate the most representative memory colors; at the same time, we investigate the interactions of memory colors between different objects. Finally, we show a simple yet effective application using representative memory colors to enhance digital images. A user study is performed to evaluate the performance of our technique.
Perceptual evaluation of colorized nighttime imagery
Alexander Toet, Michael J. de Jong, Maarten A. Hogervorst, et al.
We recently presented a color transform that produces fused nighttime imagery with a realistic color appearance (Hogervorst and Toet, 2010, Information Fusion, 11-2, 69-77). To assess the practical value of this transform we performed two experiments in which we compared human scene recognition for monochrome intensified (II) and longwave infrared (IR) imagery, and color daylight (REF) and fused multispectral (CF) imagery. First we investigated the amount of detail observers can perceive in a short time span (the gist of the scene). Participants watched brief image presentations and provided a full report of what they had seen. Our results show that REF and CF imagery yielded the highest precision and recall measures, while both II and IR imagery yielded significantly lower values. This suggests that observers have more difficulty extracting information from monochrome than from color imagery. Next, we measured eye fixations of participants who freely explored the images. Although the overall fixation behavior was similar across image modalities, the order in which certain details were fixated varied. Persons and vehicles were typically fixated first in REF, CF and IR imagery, while they were fixated later in II imagery. In some cases, color remapping II imagery and fusion with IR imagery restored the fixation order of these image details. We conclude that color remapping can yield enhanced scene perception compared to conventional monochrome nighttime imagery, and may be deployed to tune multispectral image representation such that the resulting fixation behavior resembles the fixation behavior for daylight color imagery.
Art, Perception, and Pictorial Space
Reaching into Pictorial Spaces
While binocular viewing of 2D pictures generates an impression of 3D objects and space, viewing a picture monocularly through an aperture produces a more compelling impression of depth and the feeling that the objects are "out there", almost touchable. Here, we asked observers to actually reach into pictorial space under both binocular and monocular-aperture viewing. Images of natural scenes were presented at different physical distances via a mirror system, and their retinal size was kept constant. Targets that observers had to reach for in physical space were marked on the image plane, but at different pictorial depths. We measured the 3D position of the index finger at the end of each reach-to-point movement. Observers found the task intuitive. Reaching responses varied as a function of both pictorial depth and physical distance. Under binocular viewing, responses were mainly modulated by the different physical distances. Under monocular viewing, by contrast, responses were modulated by the different pictorial depths. Importantly, individual variations over time were minor; that is, observers conformed to a consistent pictorial space. Monocular viewing of 2D pictures thus produces a compelling experience of an immersive space and tangible solid objects that can be easily explored through motor actions.
A framework for the study of vision in active observers
Carlo Nicolini, Carlo Fantoni, Giovanni Mancuso, et al.
We present a framework for the study of active vision, i.e., the functioning of the visual system during actively self-generated body movements. In laboratory settings, human vision is usually studied with a static observer looking at static or, at best, dynamic stimuli. In the real world, however, humans constantly move within dynamic environments, and the resulting visual inputs are an intertwined mixture of self- and externally-generated movements. To fill this gap, we developed a virtual environment integrated with a head-tracking system in which the influence of self- and externally-generated movements can be manipulated independently. As a proof of principle, we studied the perceptual stationarity of the visual world during lateral translation or rotation of the head. The movement of the visual stimulus was parametrically tethered to self-generated movements. We found that estimates of object stationarity were less biased and more precise during head rotation than translation. In both cases the visual stimulus had to partially follow the head movement to be perceived as immobile. We discuss a range of possibilities for our setup, among which is the study of shape perception in active and passive conditions, where the same optic flow is replayed to stationary observers.
Shading and shadowing on Canaletto's Piazza San Marco
Whereas the 18th-century painter Canaletto was a master of the linear perspective of architectural elements, he seems to have had considerable difficulty with the linear perspective of shadows. A common trick to avoid shadow-perspective problems is to set the (solar) illumination direction parallel to the projection screen. For one painting in which Canaletto clearly used this trick, we investigated whether he followed this choice of light direction consistently in how he shaded the persons. We approached this question with a perceptual experiment in which we measured perceived light directions in isolated details of the painting. Specifically, we controlled whether observers could see only the (cast) shadow, only the shading, or both. We found different trends in all three conditions. The results indicate that Canaletto probably used different shading than the parallel light direction would predict. We interpret the results as a form of artistic freedom that Canaletto used to shade the persons individually.
3D space perception as embodied cognition in the history of art images
Embodied cognition is a concept that provides a deeper understanding of the aesthetics of art images. This study considers the role of embodied cognition in the appreciation of 3D pictorial space, 4D action space, its extension through mirror reflection to embodied self-cognition, and its relation to the neuroanatomical organization of the aesthetic response.
Interactive Paper Session
Color visualization of cyclic magnitudes
Alfredo Restrepo, Viviana Estupiñán
We exploit the perceptual, circular ordering of the hues in a technique for the visualization of cyclic variables. The hue is thus meaningfully used to indicate variables such as azimuth and the units of the measurement of time. Cyclic (or circular) variables may be either continuous or discrete; among the former is azimuth, and among the latter are the musical notes and the days of the week. A correspondence between the values of a cyclic variable and the chromatic hues that respects the natural circular ordering of the variable is called a color code for the variable. We base such a choice of hues on an assignment of the unique hues red, yellow, green, and blue (or one of the 8 even permutations of this ordered list) to 4 suitably ordered cardinal values of the cyclic variable; color codes based on only 3 cardinal points are also possible. Color codes, being intuitive, are easy to remember. The possibly low accuracy of readings from instruments that use this technique is compensated by fast, ludic, and intuitive reading; also, the use of a referential frame makes readings precise. An achromatic version of the technique, usable by dichromatic people, is proposed.
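Purely as an illustration, the snippet below builds a 4-cardinal color code by interpolating around the hue circle; the HSV hue values assigned to the unique hues are our assumption, not the authors' calibrated choices.

```python
import colorsys

def color_code(value, period, cardinal_hues=(0.0, 1/6, 1/3, 2/3)):
    """Map a cyclic value (e.g., hour of day, azimuth) to an RGB color.
    cardinal_hues places the unique hues red, yellow, green, blue (as HSV hue
    fractions) at the quarter points of the cycle; intermediate values are
    interpolated around the hue circle, respecting the circular ordering."""
    t = (value % period) / period * 4          # position among the 4 cardinals
    i = int(t) % 4
    h0, h1 = cardinal_hues[i], cardinal_hues[(i + 1) % 4]
    if h1 < h0:                                # wrap around the hue circle
        h1 += 1.0
    h = (h0 + (t - int(t)) * (h1 - h0)) % 1.0
    return colorsys.hsv_to_rgb(h, 1.0, 1.0)

# Days of the week as a discrete cyclic variable:
weekday_colors = [color_code(d, 7) for d in range(7)]
```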
Quality evaluation of stereo 3DTV systems with open profiling of quality
Sara Kepplinger, Nikolaus Hottong
This work describes two evaluations, conducted at two different locations, investigating possible differences in the experienced quality of stereo 3DTV systems. The work uses the Open Profiling of Quality method, which allows going beyond the distinctive features considered so far (e.g., glasses wear comfort, brightness). In the first evaluation, standardized display settings were used for each tested system. In the second study, all systems were tested with their factory settings. Other factors, such as test stimuli, playout technology, laboratory settings, and viewing position, were strictly standardized. Additionally, influencing factors such as spectacle frames and display design were minimized by using the same eyeglass frames (but different technology) and hiding the display chassis. The results of both evaluations show distinct influences of display technology on quality perception. This is affirmed by the quality-describing attributes derived from the Open Profiling of Quality method, beyond the quantitative quality rating. This influence has to be considered in subjective quality evaluation in order to support test-retest reliability and user-centered approaches to the quality evaluation of stereo 3D visualization. Different quality perception across display technologies was confirmed even under different TV settings.
MPEG-4 AVC saliency map computation
M. Ammar, M. Mitrea, M. Hasnaoui
A saliency map provides information about the regions inside some visual content (image, video, ...) at which a human observer will spontaneously look. For saliency map computation, current research studies consider the uncompressed (pixel) representation of the visual content and extract various types of information (intensity, color, orientation, motion energy), which are then fused. This paper goes one step further and computes the saliency map directly from the MPEG-4 AVC stream syntax elements, with minimal decoding operations. In this respect, an a priori in-depth study of the MPEG-4 AVC syntax elements is first carried out to identify the entities that appeal to visual attention. Second, the MPEG-4 AVC reference software is extended with software tools allowing the parsing of these elements and their subsequent usage in objective benchmarking experiments. It is thus demonstrated that an MPEG-4 saliency map can be given by a combination of static saliency and motion maps. This saliency map is experimentally validated in a robust watermarking framework. When included in an m-QIM (multiple-symbol Quantization Index Modulation) insertion method, average PSNR gains of 2.43 dB, 2.15 dB, and 2.37 dB are obtained for data payloads of 10, 20, and 30 watermarked blocks per I frame, i.e., about 30, 60, and 90 bits/second, respectively. These quantitative results are obtained from processing 2 hours of heterogeneous video content.
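The combination of static and motion maps that the abstract mentions could, in its simplest form, be a normalized weighted sum; the sketch below shows that form, with a placeholder fusion weight rather than a value from the paper.

```python
import numpy as np

def fuse_saliency(static_map, motion_map, alpha=0.5):
    """Combine a static map with a motion map, each computed from MPEG-4 AVC
    syntax elements (e.g., residual energy for the static part, motion vectors
    for the dynamic part). alpha is a tunable fusion weight (our placeholder)."""
    s = static_map / (static_map.max() + 1e-9)
    m = motion_map / (motion_map.max() + 1e-9)
    return alpha * s + (1 - alpha) * m
```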
Visual manifold sensing
Irina Burciu, Adrian Ion-Mărgineanu, Thomas Martinetz, et al.
We present a novel method, Manifold Sensing, for the adaptive sampling of the visual world based on manifolds of increasing but low dimensionality that have been learned with representative data. Because the data set is adapted during sampling, every new measurement (sample) depends on the previously acquired measurements. This leads to an efficient sampling strategy that requires a low total number of measurements. We apply Manifold Sensing to object recognition on the UMIST, Robotics Laboratory, and ALOI benchmarks. For face recognition, with only 30 measurements (a compression ratio greater than 2000), an unknown face can be localized such that its nearest neighbor in the low-dimensional manifold is almost always the actual nearest image. Moreover, the recognition rate obtained by assigning the class of the nearest neighbor is 100%. For a different benchmark with everyday objects, with only 38 measurements (a compression ratio greater than 700), we obtain similar localization results and, again, a 100% recognition rate.
Visually lossless coding based on temporal masking in human vision
Velibor Adzic, Howard S. Hock, Hari Kalva
This paper presents a method for perceptual video compression that exploits the phenomenon of backward temporal masking. We present an overview of visual temporal masking and discuss models to identify the portions of a video sequence masked due to this phenomenon exhibited by the human visual system. A quantization control model based on the psychophysical model of backward visual temporal masking was developed. We conducted two types of subjective evaluations and demonstrated that the proposed method achieves up to 10% bitrate savings on top of a state-of-the-art encoder while producing visually identical video. The proposed methods were evaluated using an HEVC encoder.
Face detection on distorted images using perceptual quality-aware features
Suriya Gunasekar, Joydeep Ghosh, Alan C. Bovik
We quantify the degradation in performance of a popular and effective face detector when human-perceived image quality is degraded by distortions due to additive white Gaussian noise, Gaussian blur, or JPEG compression. It is observed that, within a certain range of perceived image quality, a modest increase in image quality can drastically improve face detection performance. These results can be used to guide resource or bandwidth allocation in a communication/delivery system that is associated with face detection tasks. A new face detector based on QualHOG features is also proposed that augments face-indicative HOG features with perceptual quality-aware spatial Natural Scene Statistics (NSS) features, yielding improved tolerance against image distortions. The new detector provides statistically significant improvements over a strong baseline on a large database of face images representing a wide range of distortions. To facilitate this study, we created a new Distorted Face Database, containing face and non-face patches from images impaired by a variety of common distortion types and levels. This new dataset is available for download and further experimentation at www.ideal.ece.utexas.edu/~suriya/DFD/.
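A sketch of the QualHOG idea (HOG features concatenated with quality-aware NSS features); the NSS features here are just four moments of the MSCN coefficients (see the MSCN sketch given earlier for the fog-density paper), whereas the paper's feature set is richer.

```python
import numpy as np
from skimage.feature import hog

def qualhog(patch, mscn_fn):
    """Concatenate face-indicative HOG features with quality-aware NSS
    features for a grayscale patch, in the spirit of QualHOG."""
    h = hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    m = mscn_fn(patch).ravel()
    nss = np.array([m.mean(), m.var(),
                    ((m - m.mean()) ** 3).mean(),   # skewness (unnormalized)
                    ((m - m.mean()) ** 4).mean()])  # kurtosis (unnormalized)
    return np.concatenate([h, nss])
```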
Consciousness and stereoscopic environmental imaging
The question of human consciousness has intrigued philosophers and scientists for centuries: its nature, how we perceive our environment, how we think, our very awareness of thought and self. It has been suggested that stereoscopic vision is "a paradigm of how the mind works" [1]. In depth perception, the laws of perspective are known, reasoned, and committed to memory from an early age; stereopsis, on the other hand, is a 3D experience governed by strict laws but actively joined within the brain: one sees it without explanation. How do we, in fact, process two different images into one 3D model within the mind, and does an awareness of this process give us insight into the workings of our own consciousness? To translate this idea to imaging, I employed ChromaDepth™ 3D glasses, which rely on light being refracted in a different direction for each eye, so that colors of differing wavelengths appear at varying distances from the viewer, resulting in a 3D space. This involves neither the calculation nor the manufacture of two images or views. Environmental spatial imaging was developed: a 3D image was generated that literally surrounds the viewer. The image was printed and adhered to a semicircular mount; the viewer then entered the interior to experience colored shapes suspended in a 3D space with an apparent loss of the surface, or picture plane, upon which the image is rendered. By focusing our awareness through perception-based imaging, we are able to gain a deeper understanding of how the brain works and how we see.
Bivariate statistical modeling of color and range in natural scenes
Che-Chun Su, Lawrence K. Cormack, Alan C. Bovik
The statistical properties embedded in visual stimuli from the surrounding environment guide and affect the evolutionary processes of human vision systems. There are strong statistical relationships between co-located luminance/chrominance and disparity bandpass coefficients in natural scenes. However, these statistical rela- tionships have only been deeply developed to create point-wise statistical models, although there exist spatial dependencies between adjacent pixels in both 2D color images and range maps. Here we study the bivariate statistics of the joint and conditional distributions of spatially adjacent bandpass responses on both luminance/chrominance and range data of naturalistic scenes. We deploy bivariate generalized Gaussian distributions to model the underlying statistics. The analysis and modeling results show that there exist important and useful statistical properties of both joint and conditional distributions, which can be reliably described by the corresponding bivariate generalized Gaussian models. Furthermore, by utilizing these robust bivariate models, we are able to incorporate measurements of bivariate statistics between spatially adjacent luminance/chrominance and range information into various 3D image/video and computer vision applications, e.g., quality assessment, 2D-to-3D conversion, etc.