Proceedings Volume 6057

Human Vision and Electronic Imaging XI

View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 2 February 2006
Contents: 10 Sessions, 46 Papers, 0 Presentations
Conference: Electronic Imaging 2006
Volume Number: 6057

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Keynote Session
  • Mechanisms of Luminance, Color, and Temporal Sensitivity
  • Eye Movements, Visual Search, and Attention: A Tribute to Larry Stark
  • Perceptual Image Quality and Applications
  • Visually Tuned Algorithms for the Design and Analysis of Flat-Panel Displays
  • Perceptual Issues in Video Quality
  • Perceptual Approaches to Image Analysis
  • Detection, Recognition, and Navigation in Complex Environments
  • Imaging of Cortical Digital Processes and Structures
  • Poster Session
Keynote Session
Computational neuroimaging: maps and tracts in the human brain
During the last decade, a number of remarkable magnetic resonance imaging (MRI) techniques have been developed for measuring human brain activity and structure. These MRI techniques have been accompanied by the development of signal processing, statistical, and visualization methodologies. We review several examples of these methods, drawn mainly from work on the human visual pathways. We provide examples of how two methods, functional MRI (fMRI) and diffusion tensor imaging (DTI), are used. First, we explain how fMRI enables us to identify and measure several distinct visual field maps and to measure how these maps reorganize following disease or injury. Second, we explain how DTI enables us to visualize neural structures within the brain's wiring (white matter) and measure the patterns of connectivity in individual brains. Throughout, we identify signal processing, statistical, and visualization topics in need of further methodological development.
Mechanisms of Luminance, Color, and Temporal Sensitivity
Local luminance effect on spatial summation in the foveal vision and its implication on image artifact classification
Chien-Chung Chen, San-Yuan Lin, Hui-Ya G. Han, et al.
We investigated the spatial summation effect on pedestals with different luminance. The targets were luminance modulations defined by Gaussian functions. The size of the Gaussian spot was determined by the scale parameter (standard deviation, σ), which ranged from 0.13° to 1.04°. The local luminance pedestal (2° radius) had a mean luminance ranging from 2.9 to 29 cd/m². The no-pedestal condition had a mean luminance of 58 cd/m². We used a QUEST adaptive threshold-seeking procedure and a 2AFC paradigm to measure the target contrast threshold at different target sizes (spatial summation curve) and pedestal luminances. The target threshold decreased as the target spatial extent increased, with a slope of -0.5 on log-log coordinates. However, if the target size was large enough (σ > 0.3°), there was little, if any, threshold reduction as the target size further increased. The spatial summation curve had the same shape at all pedestal luminance levels. The effect of the pedestal was to shift the summation curve vertically on log-log coordinates. Hence, the size and luminance effects on target detection are separable. The visibility of the Gaussian spot can be modeled by a function of the form f(L)*g(σ), where f(L) is a function of local luminance and g(σ) is a function of size.
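A minimal sketch of the separable form described above, assuming a piecewise power-law summation term with slope -0.5 up to a critical size and an illustrative luminance term; the parameter values are placeholders, not the fitted values from the paper.

```python
import numpy as np

def log_threshold(sigma, pedestal_luminance, k=1.0, sigma_c=0.3, alpha=0.5):
    """Hypothetical separable model of Gaussian-spot contrast threshold.

    Threshold falls with size sigma at slope -alpha (log-log) up to a
    critical size sigma_c, then flattens; the pedestal luminance L only
    shifts the whole summation curve vertically: T(L, sigma) = f(L) * g(sigma).
    """
    g = np.where(sigma < sigma_c,
                 (sigma / sigma_c) ** (-alpha),   # slope -0.5 on log-log axes
                 1.0)                              # little further summation
    f = k / np.sqrt(pedestal_luminance)            # assumed luminance term
    return np.log10(f * g)

sizes = np.array([0.13, 0.26, 0.52, 1.04])         # sigma in degrees
print(log_threshold(sizes, pedestal_luminance=29.0))
```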
Evaluating contrast sensitivity
The problem for proper rendering of spatial frequencies in digital imaging applications is to establish the relative contrast sensitivity of observers at suprathreshold contrast levels in typical viewing environments. In an experimental study, two methods of evaluating spatial contrast sensitivity were investigated, using targets of graded tonal modulation in which observers were asked to point to the perceived threshold locations. The results produced by these two methods were rather different from those of the classical methods of vision science, showing a much lower sensitivity over a broader range of spatial frequencies. They may be regarded as complementary to CSF data derived from single-frequency Gabor stimuli and prove to be better suited to the needs of practical imaging applications.
Spatio-velocity CSF as a function of retinal velocity using unstabilized stimuli
Justin Laird, Mitchell Rosen, Jeff Pelz, et al.
LCD televisions have LC response times and hold-type data cycles that contribute to the appearance of blur when objects are in motion on the screen. New algorithms based on studies of the human visual system's sensitivity to motion are being developed to compensate for these artifacts. This paper describes a series of experiments that incorporate eyetracking in the psychophysical determination of spatio-velocity contrast sensitivity in order to build on the 2D spatio-velocity contrast sensitivity function (CSF) model first described by Kelly and later refined by Daly. We explore whether the velocity of the eye has an additional effect on sensitivity and whether the model can be used to predict sensitivity to more complex stimuli. A total of five experiments were performed in this research. The first four experiments utilized Gabor patterns with three different spatial and temporal frequencies and were used to investigate and/or populate the 2D spatio-velocity CSF. The fifth experiment utilized a disembodied edge and was used to validate the model. All experiments used a two-interval forced choice (2IFC) method of constant stimuli guided by a QUEST routine to determine thresholds. The results showed that sensitivity to motion was determined by the retinal velocity produced by the Gabor patterns regardless of the type of eye motion. Based on the results of these experiments, the parameters of the spatio-velocity CSF model were optimized to our experimental conditions.
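For reference, a commonly quoted parameterization of the Kelly spatio-velocity CSF as refined by Daly can be sketched as below; the constants are the values usually cited in the literature and may differ from those fitted in this study.

```python
import numpy as np

def spatio_velocity_csf(rho, v, c0=1.14, c1=0.67, c2=1.7):
    """Kelly/Daly-style spatio-velocity CSF (a sketch, not the fitted model).

    rho : spatial frequency in cycles/degree
    v   : retinal velocity in degrees/second
    """
    v = np.maximum(v, 0.1)                       # avoid log(0) for static stimuli
    k = 6.1 + 7.3 * np.abs(np.log10(c2 * v / 3.0)) ** 3
    rho_max = 45.9 / (c2 * v + 2.0)
    return k * c0 * c2 * v * (2 * np.pi * c1 * rho) ** 2 \
           * np.exp(-4 * np.pi * c1 * rho / rho_max)

print(spatio_velocity_csf(rho=4.0, v=2.0))
```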
A basis for cones
Why do the human cones have the spectral sensitivities they do? We hypothesize that they may have evolved to their present form because their sensitivities are optimal in terms of their ability to recover the spectrum of incident light. As evidence in favor of this hypothesis, we measure the accuracy with which the incoming spectrum can be approximated by a three-dimensional linear model based on the cone responses and compare this to the optimal approximations defined by models based on principal components analysis, independent component analysis, non-negative matrix factorization, and non-negative independent component analysis. We introduce a new method of reconstructing spectra from the cone responses and show that the cones are almost as good as these optimal methods in estimating the spectrum.
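A minimal sketch of the kind of linear reconstruction involved: a spectrum is estimated from the three cone responses by choosing weights for a three-dimensional linear model so that the model spectrum produces the same cone responses. The cone sensitivities and basis here are random placeholders, not the data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths = 31                       # e.g. 400-700 nm in 10 nm steps
S = rng.random((3, n_wavelengths))       # placeholder L, M, S cone sensitivities
B = rng.random((n_wavelengths, 3))       # placeholder 3D linear basis for spectra

def reconstruct(spectrum, S, B):
    """Estimate a spectrum from cone responses using a 3D linear model."""
    r = S @ spectrum                     # cone responses to the true spectrum
    # choose basis weights w so that the model spectrum B @ w reproduces r
    w = np.linalg.solve(S @ B, r)
    return B @ w

true_spectrum = rng.random(n_wavelengths)
estimate = reconstruct(true_spectrum, S, B)
print(np.linalg.norm(true_spectrum - estimate))
```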
High-dynamic-range scene compression in humans
Single-pixel dynamic-range compression alters a particular input value to a unique output value - a look-up table. It is used in chemical and most digital photographic systems, which have S-shaped transforms, to render high-range scenes onto low-range media. Post-receptor neural processing is spatial, as shown by the physiological experiments of Dowling, Barlow, Kuffler, and Hubel & Wiesel. Human vision does not render a particular receptor-quanta catch as a unique response. Instead, because of spatial processing, the response to a particular quanta catch can be any color. Visual response is scene dependent. Stockham proposed an approach to model human range compression using low-spatial-frequency filters. Campbell, Ginsberg, Wilson, Watson, Daly and many others have developed spatial-frequency channel models. This paper describes experiments measuring the properties of desirable spatial-frequency filters for a variety of scenes. Given the radiances of each pixel in the scene and the observed appearances of objects in the image, one can calculate the visual mask for that individual image. Here, the visual mask is the spatial pattern of changes made by the visual system in processing the input image. It is the spatial signature of human vision. Low-dynamic-range images with many white areas need no spatial filtering. High-dynamic-range images with many blacks, or deep shadows, require strong spatial filtering. Sun on the right and shade on the left requires directional filters. These experiments show that variable, scene-dependent filters are necessary to mimic human vision. Although spatial-frequency filters can model these scene-dependent appearances, the problem remains that an analysis of the scene is still needed to calculate the scene-dependent strengths of each of the filters for each frequency.
Computational model of lightness perception in high dynamic range imaging
Grzegorz Krawczyk, Karol Myszkowski, Hans-Peter Seidel
An anchoring theory of lightness perception by Gilchrist et al. [1999] explains many characteristics of the human visual system, such as lightness constancy and its spectacular failures, which are important in the perception of images. The principal concept of this theory is the perception of complex scenes in terms of groups of consistent areas (frameworks). Such areas, following the gestalt theorists, are defined by regions of common illumination. The key aspect of image perception is the estimation of lightness within each framework through anchoring to the luminance perceived as white, followed by the computation of global lightness. In this paper we provide a computational model for the automatic decomposition of HDR images into frameworks. We derive a tone mapping operator which predicts the lightness perception of real-world scenes and aims at its accurate reproduction on low dynamic range displays. Furthermore, such a decomposition into frameworks opens new grounds for local image analysis in view of human perception.
Eye Movements, Visual Search, and Attention: A Tribute to Larry Stark
Remembrance of scanpaths past: experiments with Larry Stark
A search for statistically defensible evidence for repetitive visual scanning of pictorial stimuli is reviewed using experimental approaches developed in Professor Stark's Berkeley laboratory. Examples of visual scanning at the instant of perceptual organization and of scanning during information seeking will be presented, including photographs from Stark's laboratory.
The scanpath theory: its definition and later developments
The scanpath theory was defined in 1971 by David Noton and Lawrence Stark in two articles that appeared in Science [1] and Scientific American [2], and since then it has been considered one of the most influential theories of vision and eye movements. The scanpath theory explains the vision process in a top-down fashion by proposing that an internal cognitive representation controls not only visual perception but also the related mechanism of active-looking eye movements. Evidence supporting the scanpath theory comes from experiments showing the repetitive and idiosyncratic nature of eye movements during experiments with ambiguous figures, visual imagery, and dynamic scenes. Similarity metrics were defined in our analysis procedures to quantitatively compare and measure the sequence of eye fixations in different experimental conditions. More recent scanpath experiments performed using different motor read-out systems have served to better understand the structure of the visual image representation in the brain and the presence of several levels of binding. A special emphasis must be given to the role of bottom-up conspicuity elaboration in the control of the scanpath sequence and the interconnection of conspicuity with such higher-level cognitive representations.
A new metric for definition of gaze area from the geometrical structures of picture composition
M. Yamazaki, M. Kameda
When a human observes a picture, the gaze area, which changes over time, is defined as the dynamic gaze flow. The dynamic gaze flow is used effectively in picture coding, image evaluation, and computer vision. On the other hand, the technique of composition is well known in photography and the fine arts. Composition is effective for attracting human attention to specific areas. We propose a new method to analyze the scanpath of the dynamic gaze flow based on image analysis. Our proposed method analyzes the composition of the input image as the primary factor of gazing, and the obtained dynamic gaze flow is evaluated by comparing it with the scanpaths of observers measured with an eye-mark recorder.
Visual search and eye movements in novel and familiar contexts
Adapting to the visual characteristics of a specific environment may facilitate detecting novel stimuli within that environment. We monitored eye movements while subjects searched for a color target on familiar or unfamiliar color backgrounds, in order to test for these performance changes and to explore whether they reflect changes in salience from adaptation vs. changes in search strategies or perceptual learning. The target was an ellipse of variable color presented at a random location on a dense background of ellipses. In one condition, the colors of the background varied along either the LvsM or SvsLM cardinal axes. Observers adapted by viewing a rapid succession of backgrounds drawn from one color axis, and then searched for a target on a background from the same or different color axis. Searches were monitored with a Cambridge Research Systems Video Eyetracker. Targets were located more quickly on the background axis that observers were pre-exposed to, confirming that this exposure can improve search efficiency for stimuli that differ from the background. However, eye movement patterns (e.g. fixation durations and saccade magnitudes) did not clearly differ across the two backgrounds, suggesting that how the novel and familiar backgrounds were sampled remained similar. In a second condition, we compared search on a nonselective color background drawn from a circle of hues at fixed contrast. Prior exposure to this background did not facilitate search compared to an achromatic adapting field, suggesting that subjects were not simply learning the specific colors defining the background distributions. Instead, results for both conditions are consistent with a selective adaptation effect that enhances the salience of novel stimuli by partially discounting the background.
Guiding the mind’s eye: improving communication and vision by external control of the scanpath
Erhardt Barth, Michael Dorr, Martin Böhme, et al.
Larry Stark has emphasised that what we visually perceive is very much determined by the scanpath, i.e. the pattern of eye movements. Inspired by his view, we have studied the implications of the scanpath for visual communication and came up with the idea to not only sense and analyse eye movements, but also guide them by using a special kind of gaze-contingent information display. Our goal is to integrate gaze into visual communication systems by measuring and guiding eye movements. For guidance, we first predict a set of about 10 salient locations. We then change the probability for one of these candidates to be attended: for one candidate the probability is increased, for the others it is decreased. To increase saliency, for example, we add red dots that are displayed very briefly such that they are hardly perceived consciously. To decrease the probability, for example, we locally reduce the temporal frequency content. Again, if performed in a gaze-contingent fashion with low latencies, these manipulations remain unnoticed. Overall, the goal is to find the real-time video transformation minimising the difference between the actual and the desired scanpath without being obtrusive. Applications are in the area of vision-based communication (better control of what information is conveyed) and augmented vision and learning (guide a person's gaze by the gaze of an expert or a computer-vision system). We believe that our research is very much in the spirit of Larry Stark's views on visual perception and the close link between vision research and engineering.
Combining bottom-up and top-down attentional influences
Vidhya Navalpakkam, Laurent Itti
Visual attention to salient and relevant scene regions is crucial for an animal's survival in the natural world. It is guided by a complex interplay of at least two factors: image-driven, bottom-up salience [1] and knowledge-driven, top-down guidance [2, 3]. For instance, a ripe red fruit among green leaves captures visual attention due to its bottom-up salience, while a non-salient camouflaged predator is detected through top-down guidance to known predator locations and features. Although both bottom-up and top-down factors are important for guiding visual attention, most existing models and theories are either purely top-down [4] or bottom-up [5, 6]. Here, we present a combined model of bottom-up and top-down visual attention.
Perceptual Image Quality and Applications
Effects of spatial correlations and global precedence on the visual fidelity of distorted images
Damon M. Chandler, Kenny H. Lim, Sheila S. Hemami
This paper presents the results of three psychophysical experiments designed to investigate the effects of spatial correlations and the disruption of global precedence on the visual fidelity of natural images. The first two experiments used a psychophysical scaling paradigm in which subjects placed distorted images along a linear, one-dimensional "distortion axis" where physical distance corresponded to perceived distortion. In the first experiment, images were distorted at fixed levels of total distortion contrast in various fashions chosen to provide a reasonable representation of commonly encountered distortions. In the second experiment, images were simultaneously distorted with both structural distortion and additive white noise. Additionally, a third experiment was performed in which visual dissimilarities between all pairs of images were measured for the images used in second experiment. Results revealed that structural distortion generally gave rise to the greatest perceived distortion among the types of distortions tested; and, at highly suprathreshold distortion contrasts, additive white noise gave rise to the least perceived distortion. Furthermore, although structural distortion and noise appear to correspond to two separate perceptual dimensions, the addition of low-contrast white noise to an image which already contained structural distortion decreased the perceived distortion of the image, despite the increase in the total contrast of the distortions. These findings suggest that a measure of visual fidelity must take into account the spatial correlation between the distortions and the image, masking imposed by the images, the perceived contrast of the distortions, and the cross masking effects which occur between multiple types of distortions.
Pseudo no reference image quality metric using perceptual data hiding
Given the practical constraints of subjective quality assessment, objective image quality assessment has recently been extensively studied. Such metrics are usually of three kinds: Full Reference (FR), Reduced Reference (RR), or No Reference (NR) metrics. We focus here on a new technique which has recently appeared in the quality assessment context: data-hiding-based image quality metrics. Regarding the amount of data to be transmitted for quality assessment purposes, watermarking-based techniques are considered pseudo no-reference metrics: a small overhead due to the embedded watermark is added to the image. Unlike most existing techniques, the proposed embedding method exploits an advanced perceptual model in order to optimize both data embedding and extraction. A perceptually weighted watermark is embedded into the host image, and an evaluation of this watermark allows one to assess the host image's quality. In this context, the watermark robustness is crucial; it must be sufficiently robust to be detected after very strong distortions, but it must also be sufficiently fragile to be degraded along with the host image. In other words, the watermark distortion must be proportional to the image's distortion. Our work is compared to existing standard RR and NR metrics in terms of both correlation with subjective assessment and the data overhead induced by the mark.
Attention-based color correction
Fred W. M. Stentiford, Matt D. Walker
This paper proposes a new algorithm that extracts color correction parameters from pairs of images and enables the perceived illumination of one image to be imposed on the other. The algorithm does not rely upon prior assumptions regarding illumination constancy and operates between images that can be significantly different in content. The work derives from related research on visual attention and similarity in which the performance distributions of large numbers of randomly generated features reveal characteristics of images being analysed. A proposed color correction service to be offered over a mobile network is described.
Contrast enhancement of medical images using multiscale decomposition
Each image acquired from a medical imaging system is often part of a two-dimensional (2-D) image set which together represents a three-dimensional (3-D) object for diagnosis. Unfortunately, these images are sometimes of poor quality. Such distortions cause an inadequate presentation of the object of interest, which can result in inaccurate image analysis. Blurring is considered a serious problem. Therefore, "deblurring" an image to obtain better quality is an important issue in medical image processing. In our research, the image is initially decomposed. Contrast improvement is achieved by modifying the coefficients obtained from the decomposed image. Small coefficient values represent subtle details and are amplified to improve the visibility of the corresponding details. The stronger image density variations make a major contribution to the overall dynamic range and have large coefficient values. These values can be reduced without much information loss.
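A minimal sketch of the coefficient-modification idea using a simple Gaussian-pyramid-style decomposition; the gain function (a power law that boosts small coefficients relative to large ones) and all parameter values are illustrative assumptions, not the scheme used in the paper.

```python
import numpy as np
from scipy import ndimage

def enhance(image, levels=3, gain=2.0, p=0.7):
    """Amplify small (subtle-detail) coefficients, compress large ones."""
    pyramid, current = [], image.astype(float)
    for _ in range(levels):
        low = ndimage.gaussian_filter(current, sigma=2)
        pyramid.append(current - low)          # band-pass detail coefficients
        current = low
    out = current                              # low-pass residual
    for band in reversed(pyramid):
        m = np.abs(band).max() + 1e-6
        # power p < 1 boosts small |coefficients| relative to large ones
        out = out + gain * m * np.sign(band) * (np.abs(band) / m) ** p
    return out

img = np.random.rand(64, 64)
print(enhance(img).shape)
```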
Alpha stable human visual system models for digital halftoning
A. J. González, J. Bacca, G. R. Arce, et al.
Human visual system (HVS) modeling has become a critical component in the design of digital halftoning algorithms. Methods that exploit the characteristics of the HVS include the direct binary search (DBS) and optimized tone-dependent halftoning approaches. The spatial sensitivity of the HVS is lowpass in nature, reflecting the physiological characteristics of the eye. Several HVS models have been proposed in the literature, among them the widely used exponential model of Nasanen. As shown experimentally by Kim and Allebach [1], Nasanen's model is constrained in shape, and richer models are needed in order to attain better halftone attributes and to control the appearance of undesired patterns. As an alternative, they proposed a class of HVS models based on mixtures of bivariate Gaussian density functions. The mathematical characteristics of the HVS model thus play a key role in the synthesis of model-based halftoning. In this work, alpha stable functions, an elegant class of models richer than mixed Gaussians, are exploited. These are more efficient than Gaussian mixtures as they use fewer parameters to characterize the tails and bandwidth of the model. It is shown that a decrease in the model's bandwidth leads to homogeneous halftone patterns and, conversely, models with heavier tails yield smoother textures. These characteristics, added to their simplicity, make alpha stable models a powerful tool for HVS characterization.
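The frequency-domain form of an isotropic symmetric alpha-stable model can be sketched as follows; the characteristic-function form exp(-(γf)^α) is standard, but the two parameter values below are placeholders rather than values fitted for halftoning.

```python
import numpy as np

def alpha_stable_hvs(fx, fy, alpha=1.3, gamma=0.05):
    """Isotropic alpha-stable low-pass HVS model in the frequency domain.

    alpha in (0, 2] controls the heaviness of the tails (alpha = 2 is Gaussian),
    gamma controls the bandwidth; fx, fy are in cycles/degree.
    """
    f = np.hypot(fx, fy)
    return np.exp(-(gamma * f) ** alpha)

fx, fy = np.meshgrid(np.linspace(-30, 30, 61), np.linspace(-30, 30, 61))
H = alpha_stable_hvs(fx, fy)
print(H.shape, H.max())
```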
Study of asthenopia caused by the viewing of stereoscopic images: measurement by MEG and other devices
Three-dimensional (hereafter, 3D) imaging is a powerful tool for helping people understand the spatial relationships of objects. Various glassless 3D imaging technologies for 3D TV, personal computers, PDAs, and cellular phones have been developed. These devices are often viewed for long periods. Most people who watch 3D images for a long time experience asthenopia, or eye fatigue. This paper concerns a preliminary study that attempted to find the basic cause of the problem by using MEG and other devices. Plans call for further neurophysiological study on this subject. The purpose of my study is to design a standard or guidelines for shooting, image processing, and displaying 3D images to create suitable images with higher quality and less or no asthenopia. Although it is difficult to completely avoid asthenopia when viewing 3D images, it would be useful if guidelines for the production of such images could be established that reduce its severity. The final goal of my research is to formulate such guidelines on an objective basis derived from measurement results from MEG and other devices. In addition to this study, I was in charge of installing the world's largest glasses-free 3D display in the Japan Pavilion Nagakute at the 2005 World Exposition in Aichi, Japan, from March 25 to September 25, 2005. Several types of large screens for 3D movies were available for testing, and the results of those tests are added to this report.
Visually Tuned Algorithms for the Design and Analysis of Flat-Panel Displays
Perceptual image quality improvement for large screen displays
Fritz Lebowsky, Yong Huang, Haiyun Wang
To display a low-resolution digital image on a large screen display with a native resolution that reaches or exceeds the well-known HDTV standard, it is desirable to re-scale the image to full screen resolution. However, many well-known re-scaling algorithms, such as cubic spline interpolation or bi-cubic interpolation filters, reach only limited performance in terms of perceived sharpness and leave contour artifacts like jaggedness or haloing, which easily appear above the threshold of visual perception at reduced viewing distances. Since human vision demonstrates a non-linear behavior towards the detection of these artifacts, we elaborate some simple image operations that better adapt to the image quality constraints than most of the linear methods known to us today. In particular, we propose a nonlinear method that can reduce or hide the above-mentioned contour artifacts in the spatial frequency domain of the display raster and increase perceived sharpness to a very noticeable degree. As a result, low-resolution images look better on large screen displays due to enhanced local contrast and show fewer annoying artifacts than some of the classical approaches reviewed in this paper.
LCD motion-blur analysis, perception, and reduction using synchronized backlight flashing
One of the image quality issues of LCD TV is motion blur. In this paper, LCD motion blur is modeled using a frequency domain analysis, where the motion of an object causes a temporal component in the spatial/temporal spectrum. The combination of display temporal low-pass filtering and eye tracking causes the perception of motion blur. One way to reduce motion blur is to use backlight flashing, where the shorter "on" duration reduces the display temporal aperture function and thus improves the temporal transfer function of the display. Backlight flashing was implemented on an LCD with a backlight system consisting of an array of light-emitting diodes (LEDs). The LEDs can be flashed on for a short duration after the LCD reaches the target level. The effect of motion blur reduction was evaluated both objectively and subjectively. In the objective experiment, the retinal image is derived from a sequence of images captured using a high-speed camera. The subjective study compares the motion blur of an edge with a simulated edge blur. The comparison of objective and subjective experiments shows good agreement. Both the objective measurement and the subjective experiment show clear improvement in motion blur reduction with synchronized backlight flashing.
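The basic relation behind the backlight-flashing idea (for a tracked object, perceived blur extent scales with the product of object velocity and the display's temporal aperture, i.e. the hold time) can be sketched as follows; the numbers are purely illustrative.

```python
def perceived_blur_width(velocity_px_per_frame, duty_cycle):
    """Approximate blur extent (pixels) for a hold-type display.

    A full-frame hold (duty_cycle = 1.0) smears a tracked edge over roughly
    one frame of motion; flashing the backlight for a fraction of the frame
    (duty_cycle < 1.0) shortens the temporal aperture proportionally.
    """
    return velocity_px_per_frame * duty_cycle

for duty in (1.0, 0.5, 0.25):
    print(duty, perceived_blur_width(velocity_px_per_frame=12, duty_cycle=duty))
```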
Human vision-based algorithm to hide defective pixels in LCDs
Tom Kimpe, Stefaan Coulier, Gert Van Hoey
Producing displays without pixel defects, or repairing defective pixels, is technically not possible at this moment. This paper presents a new approach to solve this problem: defects are made invisible to the user by using image processing algorithms based on characteristics of the human eye. The performance of this new algorithm has been evaluated using two different methods. First, the theoretical response of the human eye was analyzed on a series of images, both before and after applying the defective pixel compensation algorithm. These results show that it is indeed possible to mask a defective pixel. The second method was a psycho-visual test in which users were asked whether or not a defective pixel could be perceived. The results of these user tests also confirm the value of the new algorithm. Our "defective pixel correction" algorithm can be implemented very efficiently and cost-effectively as a pixel-data-processing algorithm inside the display, for instance in an FPGA, a DSP, or a microprocessor. The described techniques are also valid for both monochrome and color displays, ranging from high-quality medical displays to consumer LCD TV applications.
Using optimal rendering to visually mask defective subpixels
Dean S. Messing, Louis J. Kerofsky
Visually Optimal Rendering is a subpixel-based method of rendering imagery on a colour matrix display that jointly maximises displayed resolution and minimises attendant colour aliasing. This paper first outlines the Constrained Optimisation framework we have developed for the design of Optimal Rendering Filters. This framework takes into account the subpixel geometry and colour primaries of the panel, and the luminance and chrominance Contrast Sensitivity Functions of the visual system. The resulting Optimal Rendering Filter Matrix can be designed for any regular 2D lattice of subpixels, including multi-primary lattices. The mathematical machinery of Visually Optimal Rendering is then applied to the problem of visually masking defective subpixels on the display. The rendering filters that result are able to reduce black subpixel defects for any primary colour to the point of near invisibility at normal viewing distances. These filters have the interesting property of being shift-varying. This property allows the Optimal Filter Array to intelligently modify the values surrounding a defect in a way that takes advantage of the visual system's differing sensitivities to luminance and chrominance detail in order to best mask a defect.
Perceptual Issues in Video Quality
Perceptual study of the impact of varying frame rate on motion imagery interpretability
The development of a motion imagery (MI) quality scale, akin to the National Imagery Interpretability Rating Scale (NIIRS) for still imagery, would have great value to designers and users of surveillance and other MI systems. A multiphase study has adopted a perceptual approach to identifying the main MI attributes that affect interpretability. The current perceptual study measured frame rate effects for simple motion imagery interpretation tasks of detecting and identifying a known target. By using synthetic imagery, there was full control of the contrast and speed of moving objects, motion complexity, the number of confusers, and the noise structure. To explore the detectability threshold, the contrast between the darker moving objects and the background was set at 5%, 2%, and 1%. Nine viewers were to detect or identify a moving synthetic "bug" in each of 288 10-second clips. We found that frame rate, contrast, and confusers had a statistically significant effect on image interpretability (at the 95% level), while speed and background showed no significant effect. Generally, there was a significant loss in correct detection and identification for frame rates below 10 F/s. Increasing the contrast improved detection, and at high contrast, confusers did not affect detection. Confusers reduced detection of higher-speed objects. Higher speed improved detection but complicated identification, although this effect was small. Higher speed made detection harder at 1 F/s, but improved detection at 30 F/s. The low loss of quality at moderately lower frame rates may have implications for bandwidth-limited systems. A study is underway to confirm, with live-action imagery, the results reported here with synthetic imagery.
Color preference and perceived color naturalness of digital videos
Chin Chye Koh, John M. Foley, Sanjit K. Mitra
In this study, we examined the subjective attributes of color preference and perceived naturalness of digital videos when the chroma of the color content was varied. More specifically, the objectives were: (1) to determine how the scaling of color chroma affected the mean color preference rating and the mean naturalness rating across subjects and (2) to determine how preference and naturalness were related. To this end, psychophysical experiments were carried out in which naive subjects provided us with their subjective assessments of these two attributes. Data from the experiments indicated that for both preference and naturalness, the mean opinion scores (MOS) increased to a maximum and then decreased as the mean chroma (MC) of the videos was varied. Naturalness MOS (NMOS) tended to reach a maximum at lower MC than preference MOS (PMOS). The maxima for both PMOS and NMOS occurred at relatively close MC levels for most of the videos. The individual functions relating MOS to chroma were modeled relatively well by simple Gaussian functions. However, the mean, amplitude, and standard deviation varied with the video. It was possible to reduce the total number of parameters from 36 to 15 by making use of relationships between the model parameters. In this model, the chroma corresponding to maximum preference is about 1.1 times the chroma corresponding to maximum naturalness.
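A minimal sketch of fitting the Gaussian MOS-versus-chroma model described above; the (chroma, MOS) data points and starting parameters are hypothetical, for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def mos_model(mc, amplitude, mu, sigma):
    """Gaussian model of mean opinion score as a function of mean chroma."""
    return amplitude * np.exp(-((mc - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical (mean chroma, MOS) data for one video
mc = np.array([20, 30, 40, 50, 60, 70], dtype=float)
mos = np.array([2.1, 3.0, 3.9, 4.2, 3.8, 3.1])

params, _ = curve_fit(mos_model, mc, mos, p0=(4.0, 50.0, 20.0))
print("amplitude=%.2f mu=%.1f sigma=%.1f" % tuple(params))
```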
Stabilising viewing distances in subjective assessment of mobile video
The perceptual quality of mobile video is affected by many factors, including codec selection, compression rate, network performance and device characteristics. Given the options associated with generating and transmitting mobile video, there is an industry requirement for video quality measurement tools to ensure that the best achievable quality is delivered to customers. International standards bodies are now considering alternative multimedia perceptual quality methods for mobile video. In order to fairly evaluate the performance of objective perceptual quality metrics, it is important that the subjective testing provides stable and reliable subjective data. This paper describes subjective studies examining the effect of viewing distance in subjective quality assessment of low resolution video.
Predicting subjective video quality from separated spatial and temporal assessment
Ricardo R. Pastrana-Vidal, Jean C. Gicquel, Jean L. Blin, et al.
During real-time video communications over packet networks, various degradations can occur along the spatial or temporal signal axes. The end-user may perceive loss of image clearness-sharpness and fluidity impairments in the visual information. The overall perceived degradation may indeed be seen as a combined contribution of both perceptual axes. A significant perceptual interaction between spatial and temporal quality has been highlighted by a set of subjective quality assessment tests. We show that, at least in our experimental conditions, the overall visual quality can be estimated from independent spatial (clearness-sharpness) and temporal (fluidity) quality assessments. Four visual quality prediction models are presented. The models' predictions show significant correlation with mean opinion scores from a group of observers. The model showing the highest performance takes into account nonlinear human assessment characteristics. Our results lead to a better understanding of spatiotemporal interactions, and they could be useful for the design of automatic video quality metrics.
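One generic way such a combination of separate spatial and temporal scores could be expressed is sketched below; this weighted Minkowski pooling is an illustrative assumption, not one of the four models evaluated in the paper.

```python
import numpy as np

def combined_quality(q_spatial, q_temporal, w=0.5, p=2.0):
    """Illustrative pooling of separate spatial (clearness-sharpness) and
    temporal (fluidity) quality scores on a common 1-5 scale: degradations
    on the two axes are combined with a weighted Minkowski sum."""
    q_s, q_t = np.asarray(q_spatial), np.asarray(q_temporal)
    degradation = (w * (5 - q_s) ** p + (1 - w) * (5 - q_t) ** p) ** (1 / p)
    return 5 - degradation

print(combined_quality(4.2, 3.1))
```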
Handling of annoying variations of performances in video algorithm optimization
Evaluation and optimization, with an ever-increasing variety of material, are becoming more and more time-consuming tasks in video algorithm development. An additional difficulty with moving video is that frame-by-frame perceived performance can differ significantly from real-time perceived performance. This paper proposes a way to handle this difficulty in a more systematic and objective way than with the usual long tuning procedures. We take the example of interpolation algorithms, where variations of sharpness or contrast look annoying in real time whereas the frame-by-frame performance looks quite acceptable. These variations are analyzed to obtain an objective measure of the real-time annoyance. We show that the reason for the problem is that most interpolation algorithms are optimized against intraframe criteria, ignoring that the achievable intrinsic performance may vary from frame to frame. Our method is thus based on interframe optimization taking into account the measured annoyance. The optimization criteria are steered frame by frame depending on the achievable performance of the current interpolation and the achieved performance in previous frames. Our policy can be described as "better to be good all the time than very good from time to time." The advantage is that it is automatically controlled by the compromise desired in the given application.
Structural similarity quality metrics in a coding context: exploring the space of realistic distortions
Perceptual image quality metrics have explicitly accounted for human visual system (HVS) sensitivity to subband noise by estimating thresholds above which distortion is just-noticeable. A recently proposed class of quality metrics, known as structural similarity (SSIM), models perception implicitly by taking into account the fact that the HVS is adapted for extracting structural information (relative spatial covariance) from images. We compare specific SSIM implementations both in the image space and the wavelet domain. We also evaluate the effectiveness of the complex wavelet SSIM (CWSSIM), a translation-insensitive SSIM implementation, in the context of realistic distortions that arise from compression and error concealment in video transmission applications. In order to better explore the space of distortions, we propose models for typical distortions encountered in video compression/transmission applications. We also derive a multi-scale weighted variant of the complex wavelet SSIM (WCWSSIM), with weights based on the human contrast sensitivity function to handle local mean shift distortions.
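For reference, the per-window SSIM index that these image-space and wavelet-domain implementations build on can be sketched as follows; the two stabilizing constants follow the commonly quoted defaults for 8-bit images.

```python
import numpy as np

def ssim_window(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Structural similarity between two equally sized image windows."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.random.rand(8, 8) * 255
print(ssim_window(a, a))            # identical windows give SSIM = 1.0
```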
Lossy compression of high dynamic range images and video
Rafał Mantiuk, Karol Myszkowski, Hans-Peter Seidel
Most common image and video formats have been designed to work with existing output devices, like LCD or CRT monitors. As display technology makes huge progress, these formats can no longer represent the data that the new devices can display. Therefore a shift towards higher-precision image and video formats seems to be imminent. To overcome the limitations of common image and video formats, such as JPEG, PNG or MPEG, we propose a novel color space, which can accommodate an extended dynamic range and guarantees precision below the visibility threshold. The proposed color space, which is derived from contrast detection data, can represent the full range of luminance values and the complete color gamut that is visible to the human eye. We show that only minor changes are required to the existing encoding algorithms to accommodate the new color space and therefore greatly enhance the information content of the visual data. We demonstrate this on two compression algorithms for High Dynamic Range (HDR) visual data: for static images and for video. We argue that the proposed HDR representation is a simple and universal way to encode visual data independently of the display or capture technology.
Perceptual Approaches to Image Analysis
A closer look at texture metrics for visualization
This paper presents some insights into perceptual metrics for texture pattern categorization. An increasing number of researchers in the field of visualization are trying to exploit texture patterns to overcome the innate limitations of three-dimensional color spaces. However, a comprehensive understanding of the most important features by which people group textures is essential for effective texture utilization in visualization. There have been a number of studies aiming at finding the perceptual dimensions of texture. However, in order to use texture for multivariate visualization, we need to first establish the circumstances under which each of these classifications holds. In this paper we discuss the results of three recent studies intended to gain greater insight into perceptual texture metrics. The first and second experiments investigate the roles that orientation, scale, and contrast play in characterizing a texture pattern. The third experiment is designed to understand the perceptual rules people use in arranging texture patterns based on perceived directionality. Finally, in the last section we present our current effort in designing a computational method which orders input textures based on directionality, and we explain its correlation with the human study.
M-HinTS: mimicking humans in texture sorting
Various texture analysis algorithms have been developed over the last decades. However, no computational model has arisen that mimics human texture perception adequately. In 2000, Payne, Hepplewhite, and Stoneham and, in 2005, Van Rikxoort, Van den Broek, and Schouten achieved mappings between humans and artificial classifiers of around 29% and 50%, respectively. In the current research, the work of Van Rikxoort et al. was replicated, using the newly developed, online card sorting experimentation platform M-HinTS: http://eidetic.ai.ru.nl/M-HinTS/. In two separate experiments, color and gray scale versions of 180 textures, drawn from the OuTex and VisTex texture databases, were clustered by 34 subjects. The mutual agreement among these subjects was 51% and 52% for the experiments with color and gray scale textures, respectively. The average agreement between the k-means algorithm and the participants was 36%, where k-means approximated some participants up to 60%. Since last year's results were not replicated, an additional data analysis was developed, which uses the semantic labels available in the database. This analysis shows that semantics play an important role in human texture clustering and once more illustrates the complexity of texture recognition. The current findings, the introduction of M-HinTS, and the set of analyses discussed are the start of a next phase in unraveling human texture recognition.
Inference and segmentation in cortical processing
We present a modelling framework for cortical processing aimed at understanding how, maintaining biological plausibility, neural network models can: (a) approximate general inference algorithms like belief propagation, combining bottom-up and top-down information, (b) solve Rosenblatt's classical superposition problem, which we link to the binding problem, and (c) do so based on an unsupervised learning approach. The framework leads to two related models: the first model shows that the use of top-down feedback significantly improves the network's ability to perform inference of corrupted inputs; the second model, including oscillatory behavior in the processing units, shows that the superposition problem can be efficiently solved based on the unit's phases.
Perceptually based techniques for semantic image classification and retrieval
Dejan Depalov, Thrasyvoulos Pappas, Dongge Li, et al.
The accumulation of large collections of digital images has created the need for efficient and intelligent schemes for content-based image retrieval. Our goal is to organize the contents semantically, according to meaningful categories. We present a new approach for semantic classification that utilizes a recently proposed color-texture segmentation algorithm (by Chen et al.), which combines knowledge of human perception and signal characteristics to segment natural scenes into perceptually uniform regions. The color and texture features of these regions are used as medium level descriptors, based on which we extract semantic labels, first at the segment and then at the scene level. The segment features consist of spatial texture orientation information and color composition in terms of a limited number of locally adapted dominant colors. The focus of this paper is on region classification. We use a hierarchical vocabulary of segment labels that is consistent with those used in the NIST TRECVID 2003 development set. We test the approach on a database of 9000 segments obtained from 2500 photographs of natural scenes. For training and classification we use the Linear Discriminant Analysis (LDA) technique. We examine the performance of the algorithm (precision and recall rates) when different sets of features (e.g., one or two most dominant colors versus four quantized dominant colors) are used. Our results indicate that the proposed approach offers significant performance improvements over existing approaches.
Is Wölfflin's system for characterizing art possible to validate by methods used in cognitive-based image-retrieval (CBIR)?
The hypothesis of this study was that it is possible to capture Wölfflin's basic concepts using methods within CBIR for estimating the global characteristics of art works. If results from regression analysis of behavioral data can be linked to global spectral and spatial characteristics of the same art works, then this would substantiate our hypothesis. From a regression analysis assuming a linear relationship between trained observers' ratings of art works representing Wölfflin's concepts and three global image processing features commonly used in CBIR and assumed to be significantly related to the same concepts, we found results that support our hypothesis: it seems possible to grasp some of the art concepts by CBIR methods.
Detection, Recognition, and Navigation in Complex Environments
Symbol discriminability models for improved flight displays
Aviation display system designers and evaluators need to know how discriminable displayed symbols will be over a wide range of conditions to assess the adequacy and effectiveness of flight display systems. If flight display symbols are to be safely recognized by pilots, it is necessary that they can be easily discriminated from each other. Sometimes psychophysical measurements can answer this question, but computational modeling may be required to assess the numerous conditions and even help design the empirical experiments that may be needed. Here we present an image discrimination model that includes position compensation. The model takes as input the luminance values for the pixels of two symbol images and the effective viewing distance, and gives as output the discriminability in just-noticeable differences (d') and the x and y offsets in pixels needed to minimize the discriminability. The model predictions are shown to be a useful upper bound for human symbol identification performance.
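A minimal sketch of the position-compensation step: search over small x/y offsets for the shift that minimizes the discriminability between the two symbol images. The simple RMS-difference stand-in for the full d' computation is an assumption for illustration.

```python
import numpy as np

def min_discriminability(img_a, img_b, max_offset=3):
    """Find the (dx, dy) shift of img_b that minimizes a simple
    discriminability proxy (RMS luminance difference) against img_a."""
    best = (np.inf, 0, 0)
    for dx in range(-max_offset, max_offset + 1):
        for dy in range(-max_offset, max_offset + 1):
            shifted = np.roll(np.roll(img_b, dx, axis=1), dy, axis=0)
            d = np.sqrt(np.mean((img_a - shifted) ** 2))
            if d < best[0]:
                best = (d, dx, dy)
    return best        # (minimum discriminability, x offset, y offset)

a = np.random.rand(32, 32)
b = np.roll(a, 2, axis=1)            # same symbol shifted by two pixels
print(min_discriminability(a, b))
```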
Is haptic watermarking worth it?
S. Belloni, A. Formaglio, G. Menegaz, et al.
The usage of 3D data and models is receiving growing interest for several applications, such as training, museum displays, multimodal interfaces, and aids for impaired people. In such a framework, the need to protect 3D data from misuse will soon arise. Among the technologies that can be used to this aim, digital watermarking has a prominent role, due to its versatility and non-obtrusiveness. A basic requirement of any watermarking scheme is that the embedded code is invisible, or non-perceivable, by the end user of the data. This requirement also holds for 3D objects; it is therefore necessary to study the human ability to perceive a signal hidden in a 3D object. In this paper we present a preliminary analysis aimed at comparing the perceptibility of the hidden signal when the 3D model is sensed through different senses, namely vision (through common rendering techniques and subsequent display on a monitor) and touch (through a haptic interface). Specifically, our investigation aimed at assessing whether ensuring watermark invisibility is sufficient to ensure that the watermark's presence cannot be felt haptically. The answer stemming from our preliminary analysis seems to be a clear no, even if further studies will be necessary before a definitive answer can be given.
Display conditions that influence wayfinding in virtual environments
As virtual environments may be used in training and evaluation for critical real navigation tasks, it is important to investigate the factors influencing navigational performance in virtual environments. We have carried out controlled experiments involving two visual factors known to induce or sustain vection, the illusory perception of self-motion. The first experiment had subjects navigate mazes with either a narrow or wide field of view. We measured the percentage of wrong turns and the total time taken for each attempt, and we examined subjects' drawings of the mazes. We found that a wide field of view can have a substantial effect on navigational abilities, even when the wide field of view does not offer any additional clues to the task and really only provides a larger view of blank walls on the sides. The second experiment evaluated the effect of perspective accuracy in the scene by comparing the use of displays that were corrected for changing head position against those that were not corrected. The perspective corrections available through headtracking did not appear to have any influence on navigational abilities. Another component of our study suggests that during navigation in a virtual environment, memory for directions may not be as effective as it could be with supplemental symbolic representations.
Imaging of Cortical Digital Processes and Structures
Modeling the time course of attention signals in human primary visual cortex
Previous neuroimaging studies have documented the existence of attention signals in human visual cortex, but little is known about the time course of these signals. A recent study reported persistent activity in early visual cortex whose duration was correlated with the duration of sustained attention [1]. The present study extends these findings by modeling the time course of sustained attention signals with a linear function whose duration equals the period of sustained attention but with variable amplitude and slope. Subjects performed a visual detection task in which a variable-duration delay period occurred before every target presentation. This design required the subjects to allocate visuospatial attention throughout the delay period. Functional magnetic resonance imaging (fMRI) was used to record activity in primary visual cortex (cortical area V1) during performance of the task. There were significant individual differences in the time course of attention signals, with some subjects displaying time courses consistent with constant-amplitude attention signals, while others showed decreasing amplitude of attention-related activity during the delay period. These individual differences in the time course of attention signals were correlated with behavioral response bias, suggesting that they may reflect differences in the types of attention used by the subjects to perform the detection task. In particular, those subjects who had constant-amplitude sustained attention signals may have been employing relatively more endogenous, or top-down, attention, while the subjects who exhibited attention signals that decreased over time may have been using relatively more exogenous, or bottom-up, attention.
Advances in multifocal methods for imaging human brain activity
The typical multifocal stimulus used in visual evoked potential (VEP) studies consists of about 60 checkerboard stimulus patches, each independently contrast-reversed according to an m-sequence. Cross-correlation of the response (EEG, MEG, ERG, or fMRI) with the m-sequence results in a series of response kernels for each response channel and each stimulus patch. In the past, the number and complexity of stimulus patches have been constrained by graphics hardware, namely the use of look-up-table (LUT) animation methods. To avoid such limitations, we replaced the LUTs with true color graphic sprites to present arbitrary spatial patterns. To demonstrate the utility of the method, we have recorded simultaneously from 192 cortically scaled stimulus patches, each of which activates about 12 mm² of cortex in area V1. Because of the sparseness of cortical folding, very small stimulus patches, and robust estimation of dipole source orientation, the method opens a new window on precise spatio-temporal mapping of early visual areas. The use of sprites also enables multiplexing stimuli such that at each patch location multiple stimuli can be presented. We have presented patterns with different orientations (or spatial frequencies) at the same patch locations but independently temporally modulated, effectively doubling the number of stimulus patches, to explore cell population interactions at the same cortical locus. We have also measured nonlinear responses to adjacent pairs of patches, thereby obtaining an edge response that doubles the spatial sampling density to about 1.8 mm on cortex.
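A simplified sketch of the kernel-extraction step: each patch's first-order response kernel is obtained by cross-correlating one response channel with the binary sequence driving that patch. The placeholder sequences below are random rather than true m-sequences, and kernel slicing and higher-order kernels are ignored.

```python
import numpy as np

def multifocal_kernels(response, sequences):
    """First-order response kernels by circular cross-correlation of one
    response channel with the +/-1 sequence driving each stimulus patch."""
    n = len(response)
    kernels = []
    for seq in sequences:
        seq = np.asarray(seq, dtype=float)
        # circular cross-correlation via the FFT
        k = np.fft.ifft(np.fft.fft(response) * np.conj(np.fft.fft(seq))).real / n
        kernels.append(k)
    return np.array(kernels)

rng = np.random.default_rng(0)
seqs = rng.choice([-1.0, 1.0], size=(4, 1024))     # placeholder sequences
resp = 0.5 * np.roll(seqs[0], 5) + rng.normal(0, 0.1, 1024)
print(multifocal_kernels(resp, seqs).shape)        # (4, 1024)
```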
Poster Session
Psychophysical measurement for perceptual image brightness enhancement based on image classification
Inji Kim, Wonhee Choe, Seong-Deok Lee
The purpose of this study is to examine the difference in perceptual brightness enhancement per image category through perceptual brightness measurement. Perceptual brightness is measured via a psychophysical experiment, and brightness enhancement is performed by a TMF (Tone Mapping Function). The classification process consists of two steps, and it is possible to classify histograms into six groups. Three different TMFs were selected for each category using these criteria and TMF application strengths. A psychophysical experiment to measure perceptual brightness enhancement was carried out. The experiment was to determine the point of equal perceptual brightness between an original image and the corresponding set of TMF images. The results showed that the mean luminance for each category is significantly different. The results from brightness matching indicate that perceptual brightness enhancement is dependent on image category. We propose that image category should be considered for advanced brightness enhancement methods.
Simple color conversion method to perceptible images for color vision deficiencies
Mitsuhiko Meguro, Chihiro Takahashi, Toshio Koga
In this paper, we propose a color conversion method for realizing barrier-free systems for color-defective vision. Human beings perceive colors through the ratio of responses of the three kinds of cones on the retina, which have different sensitivities to the wavelength of light. Dichromats, who lack one of the three cones, have difficulty discriminating colors of certain combinations. The proposed technique creates new images by converting colors into perceptible color combinations. The proposed method has three processing steps. First, we perform image segmentation in the L*a*b* color space. Second, we judge whether the mean colors of the segmented regions are likely to be confused, using confusion color loci and color vision models of persons with color-defective vision. Finally, the method produces images perceptible to dichromats by changing the confusing colors in several regions of the image. We show the effectiveness of the method through several application results.
Toward a taxonomy of textures for image retrieval
Janet S. Payne, T. John Stonham
Image retrieval remains a difficult task, in spite of the many research efforts applied over the past decade or more, from IBM's QBIC onwards. Colour, texture and shape have all been used for content-based image retrieval (CBIR); texture is particularly effective, alone or with colour. Many researchers have expressed the hope that textures can be organised and classified in the way that colour can; however, it seems likely that such an ambition is unrealisable. While the goal of content-based retrieval is to retrieve "more images like this one," there is the difficulty of judging what is meant by similarity for images. It seems appropriate to search on what the images actually look like to potential users of such systems. No single computational method for textural classification matches human perceptual similarity judgements. However, since different methods are effective for different kinds of textures, a way of identifying or grouping such classes should lead to more effective retrievals. In this research, working with the Brodatz texture images, participants were asked to select up to four other textures which they considered similar to each of the Brodatz textures in turn. A principal components analysis was performed upon the correlations between their rankings, which was then used to derive a 'mental map' of the composite similarity ranking for each texture. These similarity measures can be considered as a matrix of distances in similarity space; hierarchical cluster analysis produces a perceptually appropriate dendrogram with eight distinct clusters.
Using words as lexical basis functions for automatically indexing face images in a manner that correlates with human perception of similarity
To facilitate collaboration between computers and people, computers should be able to perceive the world in a manner that correlates well with human perception. A good example of this is face image retrieval. Mathematically-based face indexing methods that are not based primarily on how humans perceive faces can produce retrievals that are disappointing to human users. This raises the question "Can human faces be automatically indexed in a manner that correlates well with human perception of similarity?" Humans use words to describe faces - words such as braided, graybearded, bearded, bespectacled, bald, blondish, blond, freckled, blue eyed, mustached, pale, Caucasian, brown eyed, dark skinned, or black eyed. Such words represent dimensions that span a shared concept space for faces. Therefore they might provide a useful guide to indexing faces in an intuitive manner. This paper describes research that uses descriptive words such as these to index faces. Each word guides the design of one feature detector that produces a scalar coefficient, and those coefficients collectively define a feature vector for each face. Given these feature vectors, it is possible to compute a similarity measure between pairs of faces, and to compare that computed similarity to the similarity, as perceived by humans.
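A minimal sketch of the indexing idea: each descriptive word contributes one scalar feature, and similarity between faces is computed from the resulting vectors. The word list, the random-projection detectors, and the cosine similarity measure below are placeholders for illustration, not the detectors or measure developed in the paper.

```python
import numpy as np

WORDS = ["bearded", "bespectacled", "bald", "freckled", "pale"]

def make_placeholder_detector(seed):
    """Stand-in for a per-word feature detector; a real detector would
    analyse facial structure, not apply a random projection."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(64 * 64)
    return lambda img: float(img.ravel() @ w)

detectors = {word: make_placeholder_detector(i) for i, word in enumerate(WORDS)}

def feature_vector(face_image):
    """One scalar coefficient per descriptive word."""
    return np.array([detectors[w](face_image) for w in WORDS])

def similarity(v1, v2):
    """Cosine similarity between two word-based feature vectors."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

rng = np.random.default_rng(0)
face_a, face_b = rng.random((64, 64)), rng.random((64, 64))
print(similarity(feature_vector(face_a), feature_vector(face_b)))
```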
Subjective video quality evaluation for multimedia applications
Quan Huynh-Thu, Mohammed Ghanbari, David Hands, et al.
Video quality can be measured using both subjective and objective assessment methods. Subjective experiments are crucially important as they constitute a benchmark for evaluating the performance of objective quality metrics. Subjective quality assessment of television pictures has received extensive attention from experts over the past decades. On the other hand, emerging applications such as PC-based video streaming and mobile video streaming require new subjective test methodologies. Although some recent studies have compared different test methodologies and procedures, most concerned television pictures. No studies of multimedia-type video have really validated the repeatability and reliability of the assessment method and the experimental procedure. This paper outlines a methodology for conducting subjective evaluation of video quality for multimedia applications in a repeatable and reliable manner across different laboratories. Using video material at low resolution, low bit rate, and low frame rate, the same experiment was conducted by two different laboratories, i.e., the test material was identical in both experiments. The laboratory set-up was different, i.e., different computers and display panels were used, and the viewing distance was not fixed. Results show that the quality ratings obtained in the two experiments are statistically identical. This is an important validation step for the Video Quality Experts Group, which will conduct an ambitious campaign of subjective experiments using many different test laboratories.
Texture segmentation using adaptive Gabor filters based on HVS
Sheng Bi, Dequn Liang
A texture segmentation algorithm based on the HVS (Human Visual System) is proposed in this paper. Psychophysical and neurophysiological conclusions have supported the hypothesis that the processing of afferent pictorial information in the HVS (the visual cortex in particular) involves two stages: the preattentive stage and the focused attention stage. To simulate the preattentive stage of the HVS, ring and wedge filtering methods are used for coarse segmentation, and the number of textures in the input image is obtained. Since texture consists of repeating patterns of local variations in image intensity, a part of the texture can be used to represent the whole region. The inscribed squares in the coarse regions are transformed to the frequency domain and each spectrum is analyzed in detail. New texture measurements based on the Fourier spectra are given. By analyzing these measurements, including repeatability, directionality, and regularity, we can extract the features and determine the parameters of the Gabor filter bank. Then, to simulate the focused attention stage of the HVS, the determined Gabor filter bank is used to filter the original input image to produce fine segmentation regions. This approach performs better in computational complexity and feature extraction than fixed-parameter, fixed-stage Gabor filter-bank approaches.
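A minimal sketch of the focused-attention stage: filter the image with a small Gabor bank and smooth the response magnitudes into per-pixel texture features. In the proposed method the bank parameters would come from the Fourier-spectrum analysis of the coarse regions; the fixed frequencies and orientations below are placeholders.

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(frequency, theta, sigma=4.0, size=21):
    """Real part of a 2D Gabor filter at the given frequency and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * frequency * xr)

def gabor_features(image, frequencies=(0.1, 0.25), n_orientations=4):
    """Per-pixel feature vector: smoothed magnitude of each Gabor response."""
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            kernel = gabor_kernel(f, theta=k * np.pi / n_orientations)
            response = ndimage.convolve(image, kernel, mode="reflect")
            feats.append(ndimage.gaussian_filter(np.abs(response), sigma=3))
    return np.stack(feats, axis=-1)

img = np.random.rand(64, 64)
print(gabor_features(img).shape)     # (64, 64, 8): 2 frequencies x 4 orientations
```

The fine segmentation step would then cluster these per-pixel feature vectors (for example with k-means) into the number of texture classes found in the preattentive stage.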