Proceedings Volume 10817

Optoelectronic Imaging and Multimedia Technology V

Qionghai Dai, Tsutomu Shimura

Volume Details

Date Published: 21 December 2018
Contents: 7 Sessions, 45 Papers, 0 Presentations
Conference: SPIE/COS Photonics Asia 2018
Volume Number: 10817

Table of Contents


  • Front Matter: Volume 10817
  • Virtual Reality and 3D Display
  • Depth and Light Field
  • Computational Acquisition and Analysis I
  • Computational Acquisition and Analysis II
  • Computational Optics
  • Poster Session
Front Matter: Volume 10817
This PDF file contains the front matter associated with SPIE Proceedings Volume 10817 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Virtual Reality and 3D Display
Fast stereo matching using image pyramid for lunar rover
Haichao Li, Feng Li, Liang Chen
Dense stereo matching is widely used on lunar rovers yet remains a challenging problem; the main task is to calculate the disparity map given two rectified images of one scene. Most algorithms assume a maximal possible disparity and search all disparities from the minimum up to this maximum. For large images and a wide disparity search range, this is computationally costly. To solve these problems, we propose a novel hierarchical stereo matching scheme that reconstructs the disparity map of the scene based on an image pyramid and the more global matching (MGM) method. The strategy first generates an image pyramid from the original images. For the coarsest level of the pyramid, the disparity map is computed over the full disparity search range at that level. The coarse disparity map is then used as a prior to restrict the disparity search space when matching the finer layers. We conduct a number of experiments with lunar rover images to evaluate the performance of the method. The experimental results show that the total amount of computation of the novel MGM method is only 10% of that of the previous method, so stereo matching is greatly accelerated, and the obtained dense disparity maps show it is also more accurate on lunar scenes.
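A minimal sketch of the coarse-to-fine idea this abstract describes, reduced to 1D block matching with hypothetical window and range parameters (not the authors' MGM implementation):

```python
import numpy as np

def sad_disparity_row(left, right, lo, hi, win=2):
    """Per-pixel disparity for one row pair via windowed SAD search.
    lo/hi may be scalars or per-pixel arrays bounding the search range."""
    n = len(left)
    lo = np.broadcast_to(np.asarray(lo), (n,))
    hi = np.broadcast_to(np.asarray(hi), (n,))
    disp = np.zeros(n, dtype=int)
    for x in range(n):
        best, best_d = np.inf, 0
        for d in range(int(lo[x]), int(hi[x]) + 1):
            if x - d - win < 0 or x + win >= n:
                continue  # window falls outside the images
            cost = np.abs(left[x-win:x+win+1] - right[x-d-win:x-d+win+1]).sum()
            if cost < best:
                best, best_d = cost, d
        disp[x] = best_d
    return disp

def coarse_to_fine_row(left, right, max_disp):
    # Coarse level: average-pool by 2 and search the full (halved) range.
    lc = left.reshape(-1, 2).mean(1)
    rc = right.reshape(-1, 2).mean(1)
    dc = sad_disparity_row(lc, rc, 0, max_disp // 2)
    # Fine level: upsample coarse disparities, search only +/-2 around them.
    guess = 2 * np.repeat(dc, 2)
    return sad_disparity_row(left, right, np.maximum(guess - 2, 0), guess + 2)
```

The fine pass touches only five disparity candidates per pixel instead of the full range, which is where this style of pyramid restriction gets its speed-up.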
Tunable liquid lens equipped virtual reality adapter for scientific, medical, and therapeutic goals
We adapted virtual reality headsets for scientific, medical-diagnostic, and therapeutic purposes by incorporating into the ocular system of the device two electrically tunable Optotune liquid lenses, controllable over a 20 D range via a computer USB port. At present we use the device to investigate mono- and/or binocular dominance of illusory perception (Shapiro et al., 2008) and nonlinearities caused by conflicting Gestalt recognition of motion. The back layer of the stimulus contained an internal motion source: a clockwise-spinning disk on a neutral background. The disk consisted of centrosymmetric radial rays whose luminance was sinusoidally modulated along the disc circumference. The spinning speed was 0.15 rps, and the spatial frequency of modulation was in the range 1.8-3.7 cpd. The frontal stimulus layer contained six circular apertures concentrically allocated in a neutral background and rotating at the same speed but counterclockwise; the apertures lay concentrically, with the symmetry center collocated with the hidden center of internal motion. Observers made judgements (Tobserv = 2 s) at different viewing eccentricities: the movement of the disks seemed 1) jumbled, or the spinning was perceived as continuous but 2) counterclockwise or 3) clockwise. The illusory percept, clockwise movement, sets in when viewing peripherally; conditions favoring it are equality of the frontal-layer luminance to the mean luminance of the modulations, increasing modulation contrast, and increasing viewing eccentricity, while increasing the modulation spatial frequency diminishes the probability of illusion onset. Virtual reality devices completed with tunable liquid lenses that electrically control defocus make it possible to perform such experiments and to examine patients with unilateral eye cataract.
Depth-aware interactive display method for vision-tangible mixed reality
Zhenliang Zhang, Yue Li, Jie Guo, et al.
Vision-tangible mixed reality (VTMR) is a further development of the traditional mixed reality. It provides an experience of directly manipulating virtual objects at the perceptual level of vision. In this paper, we propose a mixed reality system called “VTouch”. VTouch is composed of an optical see-through head-mounted display (OST-HMD) and a depth camera, supporting a direct 6 degree-of-freedom transformation and a detailed manipulation of 6 sides of the Rubik’s cube. All operations can be performed based on the spatial physical detection between virtual and real objects. We have not only implemented a qualitative analysis of the effectiveness of the system by a functional test, but also performed quantitative experiments to test the effects of depth occlusion. In this way, we put forward basic design principles and give suggestions for future development of similar systems. This kind of mixed reality system is significant for promoting the development of the intelligent environment with state-of-the-art interaction techniques.
Depth and Light Field
Innovative hole-filling method for depth-image-based rendering (DIBR) based on context learning
A new convolutional neural network is proposed for hole filling in the synthesized virtual view generated by depth-image-based rendering (DIBR). A context encoder in the network is trained to make predictions for the hole region based on the rendered virtual view, with an adversarial discriminator reducing the errors and producing sharper and more precise results. A texture network at the end of the framework extracts the style of the image and achieves a natural output which is closer to reality. The experimental results demonstrate both subjectively and objectively that the proposed method obtains better 3D video quality than previous methods; the average peak signal-to-noise ratio (PSNR) increases by 0.36 dB.
Curvature feature extraction based ICP points cloud registration method
3D reconstruction of objects has been an important topic in the field of computer vision. Limited by optical measurement methods such as structured light, time of flight and binocular imaging, the data measured at multiple viewpoints have to be registered in order to obtain complete information about the object. The Iterative Closest Point (ICP) algorithm is the classical method in the point cloud registration field. However, ICP uses only the Euclidean distance to determine corresponding point pairs, which is unstable, and it is unnecessary to perform a nearest-neighbor search over all points of the target and source point clouds. Therefore, we propose an improved ICP registration method based on curvature feature extraction. First, statistical outlier removal and a voxel grid filter are applied to denoise and streamline the large-scale scattered point cloud. Then, corresponding points are extracted according to the curvature feature. In each correspondence search, points are matched using both the local surface feature and the point distance, which not only reflects the basic geometric features but also gives the ICP algorithm a good iterative initial value. Next, we formulate a least squares problem and apply singular value decomposition to the covariance matrix to obtain the coordinate transformation matrix. During the iteration, a kd-tree is used to accelerate the pair search, and the iteration repeats until the distance error function falls below its limit. We configure PCL on Visual Studio for testing. The experimental results show that the proposed algorithm is more effective than traditional ICP in terms of run time and accuracy.
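The SVD step at the core of ICP (cross-covariance of matched pairs, then singular value decomposition for the rigid transform) can be sketched as follows, with brute-force nearest-neighbour pairing standing in for the paper's kd-tree and curvature matching:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src -> dst via SVD of
    the cross-covariance matrix (the Kabsch solution used inside ICP)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Fix a possible reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: brute-force pairing + SVD update."""
    cur = src.copy()
    for _ in range(iters):
        # Pair each source point with its nearest target point.
        idx = np.argmin(((cur[:, None, :] - dst[None, :, :])**2).sum(-1), axis=1)
        R, t = best_rigid_transform(cur, dst[idx])
        cur = cur @ R.T + t
    return cur
```

With exact correspondences the SVD update recovers the transform in one step; the iterations exist to correct the pairing.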
Influence of textures on the robustness of the depth estimation algorithm for light field
Shuo Peng, Yao Hu, Qun Hao
3D reconstruction of scenes has been widely used in many fields, including computer vision, virtual reality and mapping. The light field camera, which can easily be built by inserting a microlens array into a traditional camera, has a simple structure as well as the ability to capture 4D information in one shot and implement 3D reconstruction by follow-up data processing. Thanks to these features, light field 3D reconstruction has a promising prospect in scenes with confined space or a need for real-time imaging. In the light field 3D reconstruction process, accurate matching between pixels and image planes is key to exact recovery of the depth information. Using defocus and correspondence information simultaneously yields good results in many scenes by combining the merits of both methods. Defocus information is based on differences between special points, while correspondence information is based on the angular invariance of one point; hence, both are significantly influenced by the textures of targets. This paper focuses on the influence of textures on the robustness of the depth estimation algorithm. First, we make a theoretical analysis of the relationship between these two kinds of information and the texture of the target, based on the information extraction methods. Next, we simulate information extraction and depth reconstruction based on the inverse of the light field refocusing algorithm. After analyzing the simulation, we give some artificial textures for reference. Our work can guide many applications, such as adding a specific texture to the surface of a target to improve the accuracy of 3D reconstruction, or improving the algorithm for a particular texture.
Computational Acquisition and Analysis I
Superresolution imaging through scattering media by spectrum correlation
Zhouping Wang, Xin Jin, Qionghai Dai
High-resolution imaging through strongly scattering media is a long-standing challenge with broad applications including biomedical imaging, remote sensing, navigation and so on. In the past decade, several techniques have been proposed to solve this problem. Among these techniques, speckle-correlation-based method can non-invasively extract the spectral amplitude of imaging target from captured scrambled image by conducting autocorrelation and Fourier transform operations. The imaging target can be reconstructed by iterative phase retrieval, while the imaging resolution is restricted by the low-pass property of scattering imaging system. Disproportionality of different frequency components of extracted imaging target spectral amplitude blurs the reconstruction results. In this paper, we propose the spectrum correlation approach to correct the proportional relations among low and medium frequency components of extracted imaging target spectral amplitude. By modeling the propagation of scattering wavefront and calculating the autocorrelation expectation of point spread function (PSF) at different depths, statistical spectral amplitude distribution of PSF is generated to be the weighting factor for imaging target spectrum correction. Nonlinear translation of speckle pattern spectrum expectation is introduced for error reduction in high-frequency components of extracted imaging target spectral amplitude. Finally, corrected spectral amplitude is utilized as the input of phase-retrieval algorithm for imaging target reconstruction. Simulated experiments are presented to demonstrate the effectiveness of the proposed method. After the spectrum correction of extracted spectral amplitude, imaging resolution of speckle-correlation-based scattering imaging system is improved, which can be applied to reconstruct target with smaller size or located deeper in the scattering media.
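The autocorrelation step that speckle-correlation imaging relies on can be illustrated with the Wiener-Khinchin theorem, which is also how the spectral amplitude is extracted from the autocorrelation (a generic identity, not the paper's spectrum-correction procedure):

```python
import numpy as np

def autocorrelation(img):
    """Autocorrelation via the Wiener-Khinchin theorem: it equals the
    inverse Fourier transform of the power spectrum |F|^2."""
    F = np.fft.fft2(img)
    return np.fft.ifft2(np.abs(F)**2).real

def spectral_amplitude_from_autocorr(ac):
    """Spectral amplitude |F| of the hidden object, recovered from its
    autocorrelation; this is the input handed to phase retrieval."""
    return np.sqrt(np.abs(np.fft.fft2(ac)))
```

The round trip is exact: the Fourier transform of the autocorrelation is the power spectrum, whose square root is the spectral amplitude that the phase-retrieval stage consumes.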
Monocular vision avoidance method based on fully convolutional networks
Ming Chang, Ming Liu, Yuejin Zhao, et al.
Visual obstacle avoidance is a practical application of machine vision technology. With the development of unmanned systems and artificial intelligence, visual obstacle avoidance has become a research hotspot, because avoiding obstacles is an indispensable ability for robots exploring the unknown world. Traditional methods often rely on edge detection or feature point extraction, which have poor robustness and are difficult to apply in practice. Convolutional neural networks (CNNs) shine in a variety of machine vision problems (image classification, target detection, image segmentation, image generation, etc.), showing obviously better robustness than traditional algorithms. Based on this, this paper proposes a method that solves the obstacle avoidance task by using fully convolutional networks (FCNs) to extract the accessible area. The paper also proves the robustness and effectiveness of the method through a series of experiments.
A denoising method of medical ultrasound image based on guided image filtering and fractional derivative
The speckle noise in the medical ultrasound imaging process mixes with the effective information, reducing image quality and affecting the doctor's diagnosis; studying denoising methods for medical ultrasound images is therefore of great significance. Guided image filtering is an edge-preserving algorithm that can smooth the image while preserving its edges. However, because guided image filtering is insensitive to texture details, it can lose detailed information in the medical ultrasound image, and the fractional differential method can compensate for exactly this disadvantage. In order to preserve the edge and texture features of medical images while removing noise, we propose a denoising method for medical ultrasound images based on guided image filtering and the fractional derivative. Firstly, we logarithmically transform the medical ultrasound image so that the multiplicative noise is converted into additive noise. Then, to retain the detailed information of the image, the guided filter's sensitivity to texture details must be enhanced: the image is processed with a fractional differential mask to obtain enhanced texture information, which is then imported into the guided image filter. Next, the medical ultrasound image is processed with the guided image filter containing the texture information, and finally an exponential transformation is performed to obtain the denoised image. The experiments show that the proposed algorithm not only effectively enhances the visual quality of ultrasound images while removing noise, but also effectively preserves edge and texture information.
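A minimal sketch of the log-transform-plus-guided-filter half of the pipeline (the fractional differential mask is omitted, and the radius/eps values are illustrative, not the paper's):

```python
import numpy as np

def box(img, r):
    """Mean filter over a (2r+1)x(2r+1) window, via integral images."""
    k = 2 * r + 1
    pad = np.pad(img, r, mode='edge')
    c = np.pad(pad.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / k**2

def guided_filter(I, p, r=4, eps=0.5):
    """Edge-preserving smoothing of p guided by I (He et al.'s filter):
    locally q = a*I + b with a, b fit per window, then averaged."""
    mI, mp = box(I, r), box(p, r)
    a = (box(I * p, r) - mI * mp) / (box(I * I, r) - mI * mI + eps)
    b = mp - a * mI
    return box(a, r) * I + box(b, r)

def despeckle(img, r=4, eps=0.5):
    """Log-transform turns multiplicative speckle into additive noise,
    which the (self-)guided filter smooths; exp transforms back."""
    logI = np.log(img)
    return np.exp(guided_filter(logI, logI, r, eps))
```

Using the noisy log-image as its own guide gives the plain edge-preserving smoother; the paper's contribution is feeding texture-enhanced guidance into this step instead.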
Programmable compressed sensing using simple deterministic sensing matrices
This paper presents a methodology for programmable on-sensor compressed sensing (CS) of images using pixel binning techniques. The work uses simple sparse deterministic matrices for ease of implementation. The binning technique allows pixels to switch between CS and non-CS modes under control signals, and offers the flexibility to perform CS on selected regions of the image. Operating in CS mode has the potential to cut the raw data rate and power consumption by more than half, double the frame rate, and increase low-light sensitivity, while achieving reconstruction PSNR close to 34 dB. The work also shows that CS has very good potential for the reconstruction of binned images.
Infrared and visible light image registration algorithm based on clustering and mutual information
Feiyan Cheng, Junsheng Shi, Lijun Yun, et al.
Image registration has always been a hot topic in the image research field, and mutual information registration has become a commonly used method because of its high precision and good robustness. Unfortunately, it has a problem with infrared and visible image registration: the visible band usually provides rich background detail, while the infrared image locates objects (heat sources) with higher temperatures and often cannot capture the background. The large difference in background information between the two images both interferes with the accuracy of the registration algorithm and adds a great deal of computation. In this paper, fuzzy c-means clustering is used to separate foreground from background, reducing background interference during registration; it exploits the fact that the infrared and visible images are highly consistent in the target area and differ greatly in the background area. Then, the mutual information of the foreground regions marked by the clustering algorithm is computed as the similarity measure to achieve registration. Finally, the algorithm is tested on actually acquired infrared and visible images. The results show that the two images are registered accurately, verifying the effectiveness of the method.
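The mutual-information similarity measure itself can be sketched from the joint grey-level histogram (the fuzzy c-means foreground masking is omitted, and `bins` is an illustrative choice):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information of two aligned images from their joint
    grey-level histogram: sum p(x,y) log( p(x,y) / (p(x) p(y)) )."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = h / h.sum()
    px, py = pxy.sum(1), pxy.sum(0)   # marginals
    nz = pxy > 0                       # avoid log(0) terms
    return (pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum()
```

Registration then amounts to searching over transforms for the one that maximizes this score between the warped and reference images.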
Computational Acquisition and Analysis II
Novel 3D mesh quality assessment method based on curvature analysis
Yaoyao Lin, Mei Yu, Gangyi Jiang, et al.
With the wide applications of three-dimensional (3D) mesh model in digital entertainment, animation, virtual reality and other fields, there are more and more processing techniques for 3D mesh models, including watermarking, compression, and simplification. These processing techniques will inevitably lead to various distortions in 3D mesh. Thus, it is necessary to design effective tools for 3D mesh quality assessment. In this work, considering that the curvature can measure concavity and convexity of surface well, and the human eyes are also very sensitive to the change of curvature, we propose a new objective 3D mesh quality assessment method. Curvature features are used to evaluate the visual difference between the reference and distorted meshes. Firstly, the Gaussian curvature and the mean curvature on the vertices of the reference and distorted meshes are calculated, and then the correlation function is used to measure the correlation coefficient of these meshes. In this case, the degree of degradation of the distorted mesh can be well represented. Finally, the Support Vector Regression model is used to fuse the two features and the objective quality score could be obtained. The proposed method is compared with seven existing 3D mesh quality assessment methods. Experimental results on the LIRIS_EPFL_GenPurpose Database show that the PLCC and SROCC of the proposed method are increased by 13.60% and 6.23%, compared with the best results of the seven representative methods. It implies that the proposed model has stronger consistency with the subjective visual perception of human eyes.
Color characterization for displays based on color appearance matching
The complexity of cross-media color reproduction is that even when the device dependence of the color space is resolved, color distortion still exists under different backgrounds and viewing conditions. In this study, color characterization of a computer monitor is established through visual matching experiments based on the color appearance model CIECAM02 and a back-propagation neural network (BPNN). After analyzing the prediction results and the influence of the training method, transfer function, and the number of hidden layers and nodes of the BPNN, 'log-sigmoid' is selected as the transfer function and the structure of the BPNN is set to 3-6-6-6-3. The average prediction color differences for the training and test samples are 1.016 and 1.726 respectively, within the acceptable range of color difference for human vision.
Moving target detection based on edge optical flow from satellite videos
Haichao Li, Liang Chen, Feng Li
Moving target tracking from high-resolution satellite videos has tremendous potential applications, such as visual surveillance and traffic monitoring. This paper proposes a moving target detection algorithm based on the optical flow of edge points for video sequences of dynamic scenes, without image registration. Given two consecutive frames, both are first processed with the Canny edge operator. Then, the displacement of each pair of edges is computed with a bidirectional optical flow method, which provides high speed and precision. Thirdly, the displacement histogram of all matched edges is established and used to eliminate the influence of the background in unregistered images. Finally, the edge points of the moving target are matched under an edge constraint, and the moving target region is determined. Experimental results show that our method performs excellently at target tracking in high-resolution satellite videos without image registration.
Transform coefficients distribution of the future versatile video coding (VVC) standard
The future video coding standard Versatile Video Coding (VVC) achieves better encoding performance than its predecessor standards by employing a set of tools, including a multi-type tree block partition structure, more intra prediction directions, multiple transform functions, and larger transforms with high-frequency zeroing. Therefore, VVC removes redundancy more effectively. Based on our experiments, the transform coefficient distribution (TCD) produced by the VVC encoder has a sharper peak; in particular, the previously widely used Laplacian and Cauchy distributions cannot fit the sharper TCD well. Note that the Laplacian distribution is the generalized Gaussian distribution (GGD) with shape parameter equal to one, and a smaller shape parameter leads to a sharper peak. With this motivation, we propose to model the sharper TCD of VVC using the GGD with shape parameter equal to 1/2, denoted the S/2 distribution. The experimental results show that the proposed S/2 distribution outperforms the widely used Laplacian and Cauchy distributions in fitting both the main body and the tails of the TCD. It also performs competitively with the general GGD, though the S/2 distribution has only one parameter. We further propose a rate estimation model based on the S/2 distribution; the results show it is more accurate in rate estimation than models based on the Laplacian or Cauchy distribution.
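The generalized Gaussian density and the sharper-peak argument behind the S/2 choice can be written down directly (a generic sketch of the distribution family, not the paper's fitting code):

```python
import numpy as np
from math import gamma

def ggd_pdf(x, alpha, beta):
    """Generalized Gaussian density: beta=2 is Gaussian, beta=1 Laplacian,
    and beta=1/2 the 'S/2' model for VVC transform coefficients."""
    x = np.asarray(x, dtype=float)
    return beta / (2 * alpha * gamma(1 / beta)) * np.exp(-(np.abs(x) / alpha) ** beta)

def unit_variance_alpha(beta):
    """Scale giving unit variance, from Var = alpha^2 Gamma(3/b)/Gamma(1/b)."""
    return (gamma(1 / beta) / gamma(3 / beta)) ** 0.5
```

At equal variance the beta=1/2 member concentrates far more mass near zero than the Laplacian, which is why it fits the peaky TCD better.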
Computational Optics
The new-type silicon photomultiplier for ToF LiDAR and other pulse detecting applications
Dmitry A. Shushakov, Sergey V. Bogdanov, Nikolay A. Kolobov, et al.
The DEPHAN detector is a prototype of a new type of silicon photomultiplier (HD-SiPM) with amplifying channels (cells) integrated into a solid photosensitive area. The newly proposed design increases the cell density (4.5×10^4 per sq. mm) and significantly widens the dynamic range of the detector while preserving or even improving its fill factor (>80%) and cross-talk probability (<2%). The threshold sensitivity of a "LiDAR-emulating" system at red and NIR wavelengths, its dependence on background illumination, and other key characteristics were measured with the DEPHAN detector and reference detectors (APD, SiPM). The results confirm the prospects of the new type of detector for ToF-LiDAR and 3D-imaging systems.
Optimization algorithms for wavefront sensorless adaptive optics
Acquiring the phase information of a light field is the key technology of adaptive optics, and deriving the phase distribution from the intensity of the light field has become a common technique for phase recovery. Research shows that iterative algorithms are effective for phase recovery, but some have the disadvantages of being sensitive to initial values, falling easily into local extrema, and converging slowly. Here we mainly focus on two iterative optimization algorithms for wavefront distortion correction in wavefront-sensorless adaptive optics. The first is the Gerchberg-Saxton (GS) algorithm, which combines two complex amplitude distributions on planes perpendicular to the optical axis of propagation and recovers the phase from the intensity distribution. The second is a genetic algorithm that reaches an optimal solution of the evaluation function through a series of crossover, mutation, and selection operations; to improve its convergence rate, we take the Zernike polynomial coefficients required for wavefront reconstruction as the optimization variables instead of the corrector voltages used traditionally. We numerically simulate the performance of the two algorithms, use Zernike polynomials to fit the static aberration, and study a series of parameters, especially single-order aberrations and random multi-order aberrations as the initial phase, for their effect on correction. The correction performance of the two algorithms is evaluated using two evaluation functions, Sum-Square Error (SSE) and Strehl Ratio (SR), and time consumption is also reported.
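A minimal sketch of the GS alternating projections between an aperture plane and its Fourier (far-field) plane, with illustrative sizes and iteration counts:

```python
import numpy as np

def gerchberg_saxton(amp_near, amp_far, iters=200, seed=0):
    """Alternating projections: enforce the measured amplitude in each
    plane while keeping the phase produced by propagating (FFT) between
    them; returns the recovered near-plane phase."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, amp_near.shape)
    for _ in range(iters):
        far = np.fft.fft2(amp_near * np.exp(1j * phase))
        far = amp_far * np.exp(1j * np.angle(far))   # impose far-field amplitude
        near = np.fft.ifft2(far)
        phase = np.angle(near)                        # keep only the phase
    return phase
```

The far-field amplitude error of this error-reduction scheme is non-increasing, which is the property the abstract's comparison with the genetic algorithm rests on.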
Poster Session
Self-augmented deep generative network for blind image deblurring
Ke Peng, Zhiguo Jiang, Haopeng Zhang
Image deblurring is a challenging ill-posed problem in computer vision. In this paper, we propose two end-to-end generative networks to solve the problems of blind image deblurring and blurring. We chain them together so that they constantly enhance each other: the output of one generator is delivered to the other, and a more realistic and relevant output is expected. We propose the deblur generator to generate sharp images from blurred ones, which is exactly what we want in blind image deblurring, and a self-augmented block to enhance the performance of the generative network. Each generator is also associated with its own discriminator, composing a conditional GAN that promotes the result of the generator. Additionally, to emphasize the edges of the image in the deblur generator, we use a reconstruction loss to constrain the generator. Experiments on benchmark datasets prove the effectiveness of the deblur generator against state-of-the-art algorithms both quantitatively and qualitatively.
Color clustering for ocean surface over Chinese surrounding sea areas based on octree color quantization
This research studies the color distribution of the ocean surface over the sea areas surrounding China in the CIELAB color space. We measured the spectral reflectance of the East China Sea, the South China Sea and the Philippine Sea using an underwater vertical-profile spectrometer. Based on the standard formulas for CIE 1931 XYZ tristimulus values, the tristimulus values of the ocean surface color of each sea area were calculated and the ocean surface colors were reproduced. Octree color quantization was used to quantify the chromaticity values of each sea area, and the distribution of the main chroma values of the ocean surface was obtained in the CIELAB uniform color space. The results are encouraging in that the chroma information of the ocean surface shows differences between sea areas, each with its own characteristics.
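Octree quantization groups colours by their per-channel high bits; a fixed-depth sketch follows (real octree quantizers grow and prune the tree adaptively, so this is only an illustration of the indexing and clustering idea):

```python
import numpy as np

def octree_index(rgb, depth):
    """Octree path (one child index 0-7 per level) for an 8-bit RGB
    colour: level i's child is built from bit (7-i) of R, G and B."""
    r, g, b = rgb
    return [((r >> (7-i) & 1) << 2) | ((g >> (7-i) & 1) << 1) | (b >> (7-i) & 1)
            for i in range(depth)]

def quantize(pixels, depth=3):
    """Group pixels sharing an octree prefix of the given depth and
    replace each group with its mean colour (at most 8**depth leaves)."""
    pixels = np.asarray(pixels, dtype=np.uint64)
    shift = 8 - depth
    keys = ((pixels[:, 0] >> shift) << (2 * depth)) | \
           ((pixels[:, 1] >> shift) << depth) | (pixels[:, 2] >> shift)
    out = np.empty_like(pixels, dtype=float)
    for k in np.unique(keys):
        out[keys == k] = pixels[keys == k].mean(0)
    return out
```

Each leaf at depth d is a colour cube of side 2^(8-d), so the quantization error per channel is bounded by that side length.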
An optimization and visualization method of cutting lines for lettering images clipped from adhesive tape (Withdrawal Notice)
Publisher’s Note: This manuscript, originally published on 2 November 2018, has been withdrawn by the publisher for editorial reasons.
Non-electrical reflective seven-segment numeric display using image splitter (Withdrawal Notice)
Publisher’s Note: This manuscript, originally published on 2 November 2018, has been withdrawn by the publisher for editorial reasons.
Iterative optimization procedure based design of phase masks suitable for computational depth of field extension
Jingjing Xia, Ling Zhang, Hui Zhao, et al.
In a wave-front coded imaging system, the phase mask placed in the pupil plane reshapes the PSF (point spread function) or OTF (optical transfer function) to extend the DOF (depth of field). Designing a suitable phase mask that provides a highly controlled system PSF or OTF response is crucial for computational imaging. Traditionally, the AF (ambiguity function) is a powerful tool for assessing the DOF extension produced by a phase mask with a known phase function; in this paper, however, we investigate an iterative optimization procedure that recovers an unknown phase mask using the AF in a backward way. First, a set of desired PSFs or OTFs at different defocus planes is combined to construct an initial estimate of the AF. Second, the corresponding mutual function is calculated through a Fourier transform. Third, SVD (singular value decomposition) is applied to the mutual function. Fourth, only the term corresponding to the largest eigenvalue is kept, and an inverse Fourier transform generates a new estimate of the AF. Fifth, the input desired OTFs are used to update the newly estimated AF. This procedure iterates until the OTFs extracted from the estimated AF are highly consistent with the input ones under the MSE (mean-square error) criterion. We systematically study this procedure in numerical simulation and investigate the possibility of recovering rectangular non-separable phase masks. After that, experiments are carried out to justify the effectiveness of the procedure.
A super-resolution imaging system based on sub-pixel camera shifting
In sub-pixel-shift super-resolution (SR) imaging, accurate sub-pixel image registration is a key issue. Traditional super-resolution methods use a motion estimation algorithm to estimate the shift and then adopt different methods for SR image reconstruction. In this paper, we focus on designing an SR imaging system in which, instead of moving the camera only, the imaging lens in front of the camera is also moved; doing so reduces the shifting resolution requirement, since when the camera and lens move 13 μm, the image moves 1 μm. A set of 16 or 9 low-resolution (LR) images of a scene is captured with the system, with sub-pixel shifts between the LR images of 1 μm and 2 μm, respectively. The Projection onto Convex Sets (POCS) algorithm is then used to reconstruct the SR image. The results show much higher spatial resolution compared to the LR images.
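The role of the known sub-pixel shifts can be illustrated with the simplest possible baseline, interpolation-free shift-and-add onto the HR grid (the paper itself reconstructs with POCS; this sketch assumes shifts that land exactly on HR pixels):

```python
import numpy as np

def shift_and_add(lrs, shifts, scale):
    """Place each LR image's samples onto the HR grid according to its
    known shift (in HR pixels). With a complete set of shifts the HR
    grid is exactly tiled; POCS generalizes this to inexact shifts."""
    H, W = lrs[0].shape
    hr = np.zeros((H * scale, W * scale))
    for img, (dy, dx) in zip(lrs, shifts):
        hr[dy::scale, dx::scale] = img
    return hr
```

With scale 2, the four shifts (0,0), (0,1), (1,0), (1,1) contribute the four interleaved sub-grids of the HR image, which is why a set of 4 (or 16, for scale 4) LR frames is captured.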
An optical flow network for enhancing the edge information
Deep convolutional neural networks have been widely applied to optical flow estimation in recent works. Thanks to their efficiency and ability to extract abstract features, the accuracy of CNN-based optical flow estimation has improved steadily; however, the edge information in most flow predictions is vague. Here, two methods are presented that add extra useful information when training our optical flow network, for the purpose of enhancing the edge information of the result: an edge map is added to the input, and the motion boundary is also considered as input. Experimental results show that the accuracy with both methods is higher than in the control experiment, improving by 3.71% and 7.54% respectively over using just a pair of frames as input.
Image super-resolution reconstruction based on residual dictionary learning by support vector regression
Jianfei Li, Xiaoping Yang, Zhihong Chen, et al.
Traditional algorithms for image super-resolution reconstruction are not effective at reconstructing the high-frequency information of an image. In order to improve reconstruction quality and restore more high-frequency information, a residual dictionary is introduced that captures high-frequency image content such as edges, angles and corners. The common dictionary is generated by training on pairs of low-resolution and high-resolution images. Combining the common dictionary with the residual dictionary yields a dictionary that restores more of the high-frequency information of the images while preserving their spatial structure well. Dictionary training and testing are conducted via Support Vector Regression (SVR). Compared with other algorithms in experiments, the proposed method improves PSNR and SSIM by 3%-4% and 2%-3% respectively on various images.
A scale adaptive tracker based on point feature
Hao Sun, Xiaoping Yang, Zhihong Chen, et al.
In this paper, we propose an algorithm for tracking a target using point features. The point features are extracted from the pixels in the first frame and used to label the pixels in the next frame as belonging to either the target or the background. Positive and negative samples are extracted from the pixels of the target and the surrounding background and used to train several weak classifiers, which are combined into a strong classifier using the AdaBoost algorithm. The negative samples are given greater weights than the positive samples, to avoid a large number of background pixels being labeled incorrectly. To learn a large number of samples efficiently, the adopted weak classifier is a linear perceptron model, trained and updated using stochastic gradient descent; only the dot product between matrices and the sum of matrix elements need to be computed. To distinguish similar targets, the histogram-based mean-shift algorithm is applied to eliminate wrong image patches, and the histogram of the target is updated over time. The experimental results show that the proposed algorithm estimates scale better when scale changes, posture changes, and occlusions occur.
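The weak classifier described above can be sketched as a sample-weighted linear perceptron trained by stochastic gradient descent. The feature vectors, learning rate, and sample weights below are illustrative assumptions, not the paper's values.

```python
# Hypothetical sketch of the weak classifier: a linear perceptron over
# pixel features, trained by SGD, with background (negative) samples
# allowed heavier weights than target (positive) samples.

import random

def train_perceptron(samples, labels, weights, epochs=20, lr=0.1, seed=0):
    """samples: feature vectors; labels: +1 target / -1 background."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    idx = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            score = sum(wj * xj for wj, xj in zip(w, samples[i])) + b
            if labels[i] * score <= 0:          # misclassified: update
                step = lr * weights[i] * labels[i]
                w = [wj + step * xj for wj, xj in zip(w, samples[i])]
                b += step
    return w, b

def predict(w, b, x):
    """Label a pixel's feature vector with the trained perceptron."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

Only dot products and sums are needed per update, which matches the efficiency argument in the abstract.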
A structure decomposition-based method for restoring images corrupted by impulse noise
Huasong Chen, Jiawei Liu, Qiansheng Feng, et al.
Images are often corrupted by impulse noise due to transmission errors, malfunctioning pixel elements in camera sensors, or faulty memory locations in the imaging process. This paper proposes a two-phase method for removing impulse noise. In the first phase, a suitable noise detector is applied to identify the image pixels contaminated by noise. Then, in the second phase, based on the information on the location of noise-free pixels, images are recovered using a structure-decomposition denoising method. To solve the denoising model, split Bregman iteration combined with an alternating minimization algorithm is utilized. Numerical results demonstrate that the proposed method is a significant advance over several state-of-the-art techniques in restoration performance.
Free-view walkthrough for the light field display
Light field displays require a large number of views to achieve an ideal three-dimensional display. Techniques have been proposed for generating virtual views between cameras, using depth information and feature matching between multiple images. However, these methods cannot generate views in front of or behind the cameras to implement free-view walkthrough for the light field display. Here a simple and robust method is presented to synthesize virtual views. The key to this technique lies in interpreting the input images as a 4D optical function, the light field, so that new views can be generated in real time by ray tracing in appropriate directions. The 4D optical function completely describes the state of the flow of light in an unobstructed space. Once a light field is created, new views from arbitrary camera positions can be constructed by combining and resampling the pre-acquired images. The pixel information constructing a new view is obtained through an interpolation algorithm, and the weighting factor varies with the position of the corresponding pixels based on ray tracing.
High-efficiency generating of photorealistic super-multiview image for glasses-free three-dimensional display based on distributed rendering
Glasses-free three-dimensional (3D) displays can provide 3D perception without special glasses or headgear. Compared with two-dimensional (2D) images, super-multi-view (SMV) images for glasses-free 3D displays can provide more realistic 3D scenes for viewers. Among the existing methods, ray tracing is a main rendering approach for generating a virtual image. However, it is often very time-consuming to produce photorealistic 3D images because of backward ray-tracing computation with high sampling density. Here, a highly efficient distributed rendering method based on cloud computing is presented to reduce temporal redundancy in the generation of SMV images. Experimental results show the effectiveness of the proposed method: compared with the primary methods working on a single computer, the computation time to reconstruct a 50-viewpoint 3D image with a sampling count of 256 at a resolution of 3840×2160 is decreased from 1418 s to 512 s.
The enhancement of depth estimation based on multi-scale convolution kernels
Depth prediction is essential for three-dimensional optical displays, and the accuracy of the depth map influences the quality of virtual viewpoint synthesis. Due to the relatively simple end-to-end structures of CNNs, their performance on poor and repetitive texture is barely satisfactory. In view of the shortcomings of existing network structures, two main structures are proposed to optimize the depth map. (i) Inspired by GoogLeNet, an inception module is added at the beginning of the network. (ii) Assuming that the disparity map has only horizontal disparity, two sizes of rectangular convolution kernels are introduced into the network structure. Experimental results demonstrate that our CNN structures reduce the error rate from 19.23% to 14.08%.
A parallel matching algorithm for archaeological fragment object designs based on Hausdorff distance
Cultural relic objects recovered in archaeology are often incomplete fragments. Old and classical designs are preserved on these objects' surfaces, but the objects are generally incomplete, and each carries only a part of the full design. Since the sherds or fragment objects suffer from serious corrosion, the fragment designs are often obscure and insufficient to cover the complete designs. Manual matching is clearly inefficient and practically impossible for some complicated partial-to-global matches. This paper presents a feature matching algorithm to overcome these difficulties. First, the color image of the sherd object is converted to a gray image. We then detect the edges and the feature curves of the design to obtain one-pixel-wide edges and curves; the gray-scale image is enhanced and denoised, and evident missing curves are added and incorrect curves removed manually. A fast matching algorithm is used to exclude impossible matching designs from the design image database. For any remaining candidate design, we use a parallel image matching algorithm based on the Hausdorff distance. The matching process consists of a translation and a rotation transform: we divide the set of translations t into a number of subsets, which are assigned to different processors to compute the Hausdorff distance at each rotation and match the image of the fragment object. All processors stop when any processor obtains a successful matching result. Experiments on a set of fragment designs show that the algorithm is efficient and outperforms the traditional matching method.
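The core distance computation can be sketched as follows. This is a minimal serial version over edge-point sets; the paper distributes the candidate translations across processors, and the candidate set here is an illustrative assumption.

```python
# Sketch of Hausdorff-distance matching between two edge-point sets.
# For a partial fragment, the directed distance (fragment -> design)
# is the natural score: every fragment point should lie near the design.

import math

def directed_hausdorff(A, B):
    """max over a in A of the distance from a to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A and B."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def best_translation(frag, design, candidates):
    """Try candidate translations; keep the one minimizing the directed
    Hausdorff distance from the (partial) fragment to the full design."""
    best, best_d = None, float("inf")
    for tx, ty in candidates:
        shifted = [(x + tx, y + ty) for x, y in frag]
        d = directed_hausdorff(shifted, design)
        if d < best_d:
            best, best_d = (tx, ty), d
    return best, best_d
```

In the parallel version, `candidates` would be split into subsets, one per processor, with an early stop once any processor reports a distance below the match threshold.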
Effects of rescaling bilinear interpolant on image interpolation quality
Rescaling the bilinear (RB) interpolant's pixels is a novel image interpolation scheme. In the current study, we investigate its effects on the quality of interpolated images. RB determines lower and upper bounds using the standard deviation of the four nearest pixels to find the new interval, or range, used to rescale the bilinear interpolant's pixels. The products of the rescaled pixels and the corresponding distance-based weights are summed to estimate the new pixel value to be assigned at the empty locations of the destination image. The effects of RB on image interpolation quality were investigated using standard full-reference and no-reference objective image quality metrics, particularly those focusing on interpolated image features and distortion similarities. Furthermore, variance- and mean-based metrics were also employed to further investigate the effects in terms of contrast and intensity increment or decrement. Matlab-based simulations demonstrated generally superior performance of RB compared to the traditional bilinear (TB) interpolation algorithm. The scheme's major drawbacks were a higher processing time and a tendency to rely on the image type and/or a specific interpolation scaling ratio to achieve superior performance. Potential applications of rescaling-based bilinear interpolation may include ultrasound scan conversion in cardiac ultrasound, endoscopic ultrasound, etc.
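A hedged sketch of the rescaling step for a single output pixel follows. The exact bound formula, rescaling the four neighbors linearly into [mean - std, mean + std], is our reading of the description above, not the authors' code.

```python
# Sketch of one rescaled-bilinear output pixel (assumed bound formula).
# p00..p11 are the four nearest pixels; (dx, dy) in [0, 1] is the
# fractional position of the new pixel inside the 2x2 neighborhood.

import statistics

def rescaled_bilinear(p00, p01, p10, p11, dx, dy):
    px = [p00, p01, p10, p11]
    mean = statistics.fmean(px)
    std = statistics.pstdev(px)
    lo, hi = mean - std, mean + std        # new interval from the std
    old_lo, old_hi = min(px), max(px)
    if old_hi == old_lo:                   # flat patch: nothing to rescale
        rescaled = px
    else:
        scale = (hi - lo) / (old_hi - old_lo)
        rescaled = [lo + (p - old_lo) * scale for p in px]
    # usual bilinear distance-based weights applied to the rescaled pixels
    w = [(1 - dx) * (1 - dy), dx * (1 - dy), (1 - dx) * dy, dx * dy]
    return sum(wi * pi for wi, pi in zip(w, rescaled))
```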
Luminance regionalization-based saliency detection for high dynamic range image
Junjun Zhang, Mei Yu, Yang Song, et al.
The existing saliency detection methods are not suitable for high dynamic range (HDR) images. In this work, based on the human visual system, we propose a new method for detecting the saliency of HDR images via luminance regionalization. First, considering the wider luminance range characteristic of HDR images, luminance information of the HDR image is extracted, and the image is divided into high, medium, and low luminance regions by luminance thresholding. A saliency map is then detected for each luminance region: color and texture features are extracted for the high-luminance region, luminance and texture features for the low-luminance region, and an existing LDR image saliency detection method is used for the medium-luminance region. Finally, the three saliency maps are linearly fused to obtain the final HDR image saliency map. Experimental results on two public databases (the EPFL HDR eye-tracking database and the TMID database) demonstrate that the proposed method performs well against five state-of-the-art methods in detecting the salient regions of HDR images.
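The luminance-regionalization step can be sketched as a two-threshold split of the luminance map; the threshold values themselves are placeholders, not the paper's.

```python
# Minimal sketch of luminance regionalization: label each pixel as
# low (0), medium (1), or high (2) luminance by two thresholds.

def regionalize(luminance, t_low, t_high):
    """Return a label map: 0 = low, 1 = medium, 2 = high luminance."""
    return [[0 if v < t_low else (2 if v > t_high else 1) for v in row]
            for row in luminance]
```

Each labeled region would then be handed to its region-specific feature extractor before the final linear fusion.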
New quality assessment method for dense light fields
Zhijiao Huang, Mei Yu, Haiyong Xu, et al.
Light fields carry richer scene information than traditional images, including not only spatial but also directional information. Aiming at the multiple-distortion problem of dense light fields, and combining spatial- and angular-domain information, a light field image quality assessment method based on dense distortion curve analysis and scene information statistics is proposed in this paper. First, the mean difference between all multi-view images in the angular domain of the dense light field is extracted, and a corresponding distortion curve is drawn. Three statistical features are obtained by fitting the curve: the slope, median, and peak, which respectively represent the distortion deviation, the interpolation period, and the maximum distortion. Then, the mean information entropy and mean gradient magnitude of the light field are extracted as the global and local features of the spatial domain. Finally, the extracted features are trained and tested by Support Vector Regression. The experiment is conducted on the public MPI dense light field database. Experimental results show that the PLCC of the proposed method is 0.89, better than the existing methods, especially for different types of distorted content.
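The three curve statistics can be sketched as follows; reading "slope" as a least-squares line fit to the per-view mean-difference curve is our assumption.

```python
# Sketch of the distortion-curve features: slope (least-squares line fit),
# median, and peak of the per-view mean-difference values.

import statistics

def curve_features(diffs):
    """diffs: mean difference per angular view index 0..n-1."""
    n = len(diffs)
    xs = list(range(n))
    mx, my = sum(xs) / n, sum(diffs) / n
    slope = (sum((x - mx) * (d - my) for x, d in zip(xs, diffs)) /
             sum((x - mx) ** 2 for x in xs))
    return slope, statistics.median(diffs), max(diffs)
```

These three values, together with the spatial-domain entropy and gradient statistics, would form the feature vector fed to the SVR.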
Comparison of restoration methods for turbulence-degraded images
Tingxiang Yang, Anton A. Maraev
All optical imaging systems that work through atmospheric turbulence are affected by refractive index fluctuations, which often degrade the beam. We present a comparison of four methods (Wiener Filter (WF), Regularized Filter (RF), Lucy-Richardson Method (LRM), and Blind Deconvolution Method (BDM)) used to improve the quality of an image observed through horizontal-path atmospheric turbulence. We use simulation and real image acquisition in both weak and strong turbulence conditions to assess the performance of the chosen methods. The simulation generates an image distorted along the horizontal optical path by convolving the turbulence-degraded point spread function (PSF) with the original image, the result then being restored by the four methods. A modified von Karman phase power spectral density (PSD) is used to generate phase screens simulating weak and strong Kolmogorov turbulence with different Fried parameters. Restoration results are compared by the MTF (Modulation Transfer Function) on edges and by running time; LRM produces the best image quality. To study how the methods perform on real images, an experiment was carried out to capture real images under varying turbulence strengths, and the turbulence-degraded images were restored by the same methods. The experimental results, compared by the above criteria with the modeling results, confirm that LRM yields the best image quality, with the highest MTF values and relatively short processing time.
High-level semantic information extraction of remote-sensing images based on deep-learning image classification
In recent years, remote sensing imaging technology has developed rapidly. A growing number of high-resolution remote sensing images have become available, which greatly facilitates research on and applications of remote sensing imagery. Landcover classification is one of the most important tasks in remote sensing image applications [1]. However, traditional classification methods rely on manual feature design, which is time-consuming, requires expertise, and is difficult to apply to massive data. Compared with traditional classification methods, deep learning [2] can automatically acquire the most intrinsic and discriminative features of an image. Based on deep-learning image classification, this paper designs a high-level semantic information extraction system with high efficiency and robustness. A deep fully convolutional network (FCN) is designed to extract features from remote sensing images and to predict the landcover classes in each image, which include building, tree, road, and grass. On the basis of the classification results, we use binarization to highlight the building objects. The noise in the binarized image is then removed by Gaussian filtering and morphological image processing, after which a threshold is set to delete small misclassified areas. Finally, a connected-domain algorithm is applied to detect the buildings and count them in each image. The forest coverage is then obtained by computing the proportion of pixels with the 'tree' class label to the total number of pixels in each image. Unlike the traditional image interpretation method, this systematic high-level semantic information extraction framework not only detects the number of buildings in the scene but also extracts forest coverage. Moreover, further high-level information extraction, such as road localization or detection of objects of interest, can easily be added to this framework.
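The final counting step (connected-domain analysis of the binarized building mask) can be sketched with a flood fill; 4-connectivity is an illustrative choice.

```python
# Sketch of the connected-domain counting step: label 4-connected
# components in the binarized building mask and return their count.

def count_buildings(mask):
    """mask: 2-D list of 0/1 after binarization, filtering and cleanup."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1                     # new building component
                stack = [(y, x)]
                seen[y][x] = True
                while stack:                   # iterative flood fill
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count
```

Forest coverage would be computed analogously, as the fraction of pixels carrying the 'tree' label.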
Streak tube imaging system based on compressive sensing
Jingya Cao, Shaokun Han, Fei Liu, et al.
Although the streak tube imaging lidar (STIL) is widely applied in target recognition and imaging, combining compressive sensing (CS) theory with it has only just begun. To the best of our knowledge, most studies on this combination concern ultra-fast imaging. We harness the advantages of the streak tube and CS to provide a novel approach to three-dimensional imaging. The imaging system model is built, and its main structures, such as the fiber array and the digital micromirror device (DMD), are introduced. Simulation experiments are conducted: in the process of reconstructing the intensity image and range image of the target, the extraction methods for the measurement matrix required by the CS algorithm are given respectively.
A novel adaptive Gaussian-weighted mean filter for removal of high-density salt-and-pepper noise from images
Salt-and-pepper (SAP) noise is one of the common impulse noises. It is generated mostly during image capture and storage, due to faulty memory locations and damaged image sensors. SAP noise seriously degrades image quality and affects the performance of subsequent image processing, such as edge detection, image segmentation, and object recognition; it is therefore necessary to remove SAP noise from corrupted images efficiently. In this paper, we propose a fast adaptive Gaussian-weighted mean filter (FAGWMF) for removing salt-and-pepper noise. Our denoising filter consists of four stages. At the first stage, we preprocess the image by enlarging and flipping it. Then, we detect noisy pixels by comparing each pixel value with the maximum (255) and minimum (0) values: a pixel is regarded as noise if its value equals the maximum or minimum, and as noise-free otherwise. At the third stage, we determine the working window size by enlarging the filter window continuously until the number of noise-free pixels it includes reaches a predetermined threshold or the window radius reaches a predetermined maximum. At the last stage, we replace each noise candidate by the Gaussian-weighted mean of the noise-free pixels in its window. With a Gaussian-weighted template, pixels near the window center get larger weights than those at the window edges, which helps preserve edge information efficiently. Simulation results show that, compared with some state-of-the-art algorithms, our proposed filter has faster execution and better restoration quality.
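The adaptive-window stage for one noisy pixel might look like the sketch below; the Gaussian sigma, noise-free-count threshold, and maximum radius are illustrative assumptions.

```python
# Sketch of the adaptive Gaussian-weighted mean restoration for a single
# pixel: grow the window radius until enough noise-free neighbors exist,
# then take their Gaussian-weighted mean.

import math

def is_noise(v):
    return v == 0 or v == 255          # salt-and-pepper extremes

def restore_pixel(img, y, x, min_free=3, max_radius=5, sigma=1.5):
    h, w = len(img), len(img[0])
    for r in range(1, max_radius + 1):
        num = den = 0.0
        free = 0
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not is_noise(img[ny][nx]):
                    # closer pixels get larger Gaussian weights
                    g = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                    num += g * img[ny][nx]
                    den += g
                    free += 1
        if free >= min_free:
            return num / den
    return img[y][x]                   # give up: keep the original value
```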
No-reference video quality assessment based on perceptual features extracted from multi-directional video spatiotemporal slices images
As video applications become more popular, no-reference video quality assessment (NR-VQA) has become a focus of research. In many existing NR-VQA methods, perceptual feature extraction is the key to success. We therefore design methods to extract perceptual features containing a wider range of spatiotemporal information from multi-directional video spatiotemporal slice (STS) images (images generated by cutting video data parallel to the temporal dimension in multiple directions) and use a support vector machine (SVM) to perform no-reference video quality evaluation in this paper. In the proposed NR-VQA design, we first extract the multi-directional video STS images to obtain as complete a representation of the overall video motion as possible. Second, perceptual features of the multi-directional video STS images, such as the moments of feature maps, joint distribution features of the gradient magnitude and the Laplacian-of-Gaussian filter response, and motion energy characteristics, are extracted to characterize the motion statistics of videos. Finally, the extracted perceptual features are fed to an SVM or a multilayer perceptron (MLP) for training and testing. Experimental results show that the proposed method achieves state-of-the-art quality prediction performance on the largest existing annotated video database.
Aliasing artefact index for image interpolation quality assessment
A preliminary study of a no-reference aliasing artefact index (AAI) metric is presented in this paper. We focus on the effects of combining a full-reference metric with an interpolation algorithm. The nearest-neighbor (NN) algorithm is used as the gold standard against which test algorithms are judged in terms of aliased structures: the structural similarity index (SSIM) metric evaluates a test image (a test algorithm's output) against a reference image (the NN output). Preliminary experiments demonstrated promising behavior of the AAI metric compared with state-of-the-art no-reference metrics. A further study may develop the metric for potential applications in image quality adaptation and/or monitoring in medical imaging.
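The AAI idea can be sketched with a single-window (global) SSIM between the test interpolation and the NN result; a practical implementation would use the usual local sliding window, and the constants below are the standard SSIM defaults for 8-bit images.

```python
# Sketch of an SSIM-based aliasing artefact score: similarity between a
# test-algorithm image and the (heavily aliased) nearest-neighbor image.
# Images are flattened pixel lists; one global window for simplicity.

def ssim_global(x, y, L=255.0):
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard SSIM constants
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2) /
            ((mx * mx + my * my + c1) * (vx + vy + c2)))

def aliasing_artefact_index(test_pixels, nn_pixels):
    """Higher similarity to the aliased NN image = more aliasing retained."""
    return ssim_global(test_pixels, nn_pixels)
```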
Extrapolation for image interpolation
Unlike traditional linear interpolation algorithms, which compute all kernel pixel locations, a novel image interpolation algorithm that uses a preliminary pixel kernel and extrapolated pixel adjustments is proposed for interpolation operations. The proposed algorithm is mainly based on the weighting functions of the preliminary interpolation kernel and linearly extrapolated pixel adjustments. Experimentally, the proposed method demonstrated generally higher performance than the state-of-the-art algorithms considered under objective evaluation, as well as comparable performance under subjective evaluation. Potential applications may include ultrasound scan conversion for displaying the sectored image.
Remote sensing image classification method based on active-learning deep neural network
Deep neural network algorithms have been widely used in remote sensing image classification. However, training the classifiers requires a large number of labeled samples, which are costly to obtain. We propose a method based on an active-learning deep neural network. First, the deep neural network uses the training samples to obtain an initial classifier; active learning then chooses the most informative samples from the unlabeled pool to be labeled by experts, and the newly labeled samples rejoin the training set, updating the classifier iteratively. This method requires only a small number of training samples to reach, or even exceed, the classification accuracy achievable with a large training set.
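The iterative loop can be sketched with a toy 1-D threshold classifier standing in for the deep network; taking "most informative" as closest to the current decision boundary (uncertainty sampling) is our assumption.

```python
# Toy sketch of the active-learning loop: train, query the most uncertain
# unlabeled sample, have the oracle (expert) label it, retrain.

def fit_threshold(X, y):
    """Best split point for 1-D features with labels 0/1 (brute force)."""
    best_t, best_acc = sorted(X)[0], -1.0
    for t in sorted(X):
        acc = sum((x >= t) == bool(lbl) for x, lbl in zip(X, y)) / len(X)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def active_learning(X_l, y_l, X_u, oracle, rounds=3):
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    for _ in range(min(rounds, len(X_u))):
        t = fit_threshold(X_l, y_l)
        # most informative = closest to the current decision boundary
        i = min(range(len(X_u)), key=lambda j: abs(X_u[j] - t))
        x = X_u.pop(i)
        X_l.append(x)
        y_l.append(oracle(x))          # expert marks the queried sample
    return fit_threshold(X_l, y_l)
```

Each round spends one expert label on the sample the current classifier is least sure about, which is why few labels suffice.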
Spectrum restoration methods of a polarimetric-spectro imager based on light field architecture
Lijuan Su, Yujian Liu, Yan Yuan
The light field polarimetric-spectro imager can capture the polarization, spectral, and intensity characteristics of targets in a single shot. Due to diffraction and misalignment of components, there is aliasing among the spectral and polarization channels, so inversion methods are needed to process the captured data and obtain accurate target spectra. We study three inversion algorithms: the Least Squares algorithm, the Truncated Generalized Singular Value Decomposition algorithm, and the Tikhonov regularization algorithm. First, we simulated light field spectral-polarization data with different noise levels and reconstructed spectra using the three methods. The results show that the Tikhonov algorithm reconstructs spectra with better accuracy and is more robust than the other two algorithms. The algorithms were also used to process data captured by a prototype system; these results likewise show the superiority of the Tikhonov algorithm.
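Tikhonov-regularized inversion reduces to solving the normal equations; the sketch below uses a naive small-system solver, and the mixing matrix in the test is purely illustrative.

```python
# Minimal sketch of Tikhonov-regularized inversion:
#   x = argmin ||A x - b||^2 + lam ||x||^2
# solved via (A^T A + lam I) x = A^T b with Gaussian elimination.
# A would model the channel-aliasing mixing of the imager.

def solve(M, v):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(M)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]    # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def tikhonov(A, b, lam):
    n = len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A)))
            + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(n)]
    return solve(AtA, Atb)
```

With lam = 0 this is plain least squares; increasing lam trades fidelity for robustness to the noise in the captured data.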
The properties of opaque CsI photocathode
Yong-an Liu, Zhe Liu, Li-zhi Sheng, et al.
Cesium iodide (CsI) photocathodes are widely used in various UV (ultraviolet) detection devices because of their high quantum efficiency (QE) and good stability under short exposure to humid air. In this paper, the performance of the opaque CsI photocathode is studied, including imaging performance, the influence of humidity on the quantum efficiency, and the stability of the CsI photocathode under FUV irradiation. In the experiment, the input surface of the MCP was evenly divided into four parts, and different thicknesses of CsI photocathode were deposited directly on the front surface of the micro-channel plates (opaque photocathode). The response of the different thicknesses and the stability of the UV quantum efficiency of the CsI photocathode under FUV illumination were studied using a UV monochromator. The influence of humid-air exposure on the quantum efficiency of the CsI photocathode was also tested. Based on the experimental results, an FUV detector (vacuum tube) with an opaque CsI photocathode was fabricated and its quantum efficiency tested; the absolute quantum efficiency of the FUV detector is over 15.5% at 121 nm.