Proceedings Volume 9026

Video Surveillance and Transportation Imaging Applications 2014

Robert P. Loce, Eli Saber, Ned Lecky
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 5 March 2014
Contents: 9 Sessions, 31 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2014
Volume Number: 9026

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9026
  • Video Surveillance
  • Event Detection and Classification
  • Person and Action Detection
  • Human Body Action
  • Transportation Imaging I
  • Transportation Imaging II
  • Applications of Video Surveillance
  • Interactive Paper Session
Front Matter: Volume 9026
Front Matter: Volume 9026
This PDF file contains the front matter associated with SPIE Proceedings Volume 9026, including the Title Page, Copyright information, Table of Contents, and Conference Committee listing.
Video Surveillance
PHACT: Parallel HOG and Correlation Tracking
Histogram of Oriented Gradients (HOG) based methods for the detection of humans have become one of the most reliable methods of detecting pedestrians with a single passive imaging camera. However, they are not 100 percent reliable. This paper presents an improved tracker for the monitoring of pedestrians within images. The Parallel HOG and Correlation Tracking (PHACT) algorithm utilises self learning to overcome the drifting problem. A detection algorithm that utilises HOG features runs in parallel to an adaptive and stateful correlator. The combination of both acting in a cascade provides a much more robust tracker than the two components separately could produce.
Improved Edge Directed Super-Resolution (EDSR) with hardware realization for surveillance, transportation, and multimedia applications
Yue Wang, Osborn de Lima, Eli Saber, et al.
In this paper, we present an improved Edge Directed Super Resolution (EDSR) technique to produce enhanced edge definition and improved image quality in the resulting high resolution image. The basic premise behind this algorithm remains, like its predecessor, to utilize gradient and spatial information and interpolate along the edge direction in a multiple pass iterative fashion. The edge direction map, generated from horizontal and vertical gradients and resized to the target resolution, is quantized into eight directions over a 5 × 5 block, compared to four directions over a 3 × 3 block in the previous algorithm. This helps reduce the noise caused in part by quantization error, and the super resolved results are significantly improved. In addition, an appropriate weighting encompassing the degree of similarity between the quantized edge direction and the actual edge direction is also introduced. In an attempt to determine the optimal super resolution parameters for the case of still image capture, a hardware setup was utilized to investigate and evaluate those factors. In particular, the number of images captured as well as the amount of sub pixel displacement that yield a high quality result were studied. This was done by utilizing an XY stage capable of sub-pixel movement. Finally, an edge preserving smoothing algorithm contributes to improved results by reducing the high frequency noise introduced by the super resolution process. The algorithm showed favorable results on a wide variety of datasets, ranging from transportation to multimedia based print/scan applications, in addition to images captured with the aforementioned hardware setup.
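As a rough illustration of the direction-quantization step mentioned in this abstract, the sketch below maps a horizontal and vertical gradient pair to one of eight edge-direction bins. The function name, the choice of unsigned directions (modulo 180 degrees), and the bin layout are assumptions made for illustration; the abstract does not specify them.

```python
import math

def quantize_edge_direction(gx, gy, bins=8):
    """Map a gradient (gx, gy) to one of `bins` edge-direction indices.

    Assumptions (not from the abstract): the edge runs perpendicular to the
    gradient, and directions are unsigned, so 0 and 180 degrees coincide.
    """
    # atan2 gives the gradient angle; adding 90 degrees turns it into the
    # edge direction, which is then folded into the range [0, 180).
    edge_angle = (math.degrees(math.atan2(gy, gx)) + 90.0) % 180.0
    width = 180.0 / bins
    # Round to the nearest bin centre and wrap so ~180 degrees lands in bin 0.
    return int(edge_angle / width + 0.5) % bins
```

With eight bins each bin spans 22.5 degrees, so a purely horizontal gradient (a vertical edge) falls into the bin centred on 90 degrees, regardless of the gradient's sign.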
Rotation-invariant histogram features for threat object detection on pipeline right-of-way
Alex Mathew, Vijayan K. Asari
We present a novel algorithm that automatically detects anomalies in pipeline right-of-way (ROW) regions in aerial surveillance videos. Early detection of anomalies on pipeline ROWs avoids failures or leaks. Vehicles that are potential threats vary in size, shape and color. The detection algorithm should be fast to enable detection in real-time. In this paper, we propose a rotation-invariant gradient histogram based descriptor built in CIELab color space. An SVM with radial basis kernel is used as the classifier. We use only the a and b components since they represent the color values. The region of interest is divided into concentric circular regions. The number of such concentric regions is based on the size of the target. The inner regions capture the local characteristics and finer details of the image. Larger regions capture the global characteristics. A noise reducing differentiation kernel is used to compute the gradient of the region to cope with motion blur and noise introduced by atmospheric aberrations. A gradient orientation histogram is constructed in each region by voting the magnitude of gradients. The final descriptor is built by concatenating the magnitudes of the DFT of the orientation histograms collected from the a and b components. The magnitude of the Discrete Fourier Transform (DFT) of the histogram is invariant to rotations. The DFT can be efficiently computed as a Fast Fourier Transform (FFT). Since the algorithm uses a sliding window detector, it can easily be parallelized.
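The rotation invariance claimed for the DFT magnitude can be checked on a toy histogram: rotating the image content by a whole number of orientation bins circularly shifts the histogram, which changes only the phase of its DFT. The sketch below uses a plain DFT from the Python standard library; the helper names and the sample histogram are illustrative assumptions, not the authors' code.

```python
import cmath

def dft_magnitude(hist):
    """Magnitude of the discrete Fourier transform of an orientation histogram."""
    n = len(hist)
    return [
        abs(sum(h * cmath.exp(-2j * cmath.pi * k * i / n) for i, h in enumerate(hist)))
        for k in range(n)
    ]

def rotate_bins(hist, shift):
    """Circularly shift a histogram, as a rotation by `shift` bins would."""
    shift %= len(hist)
    return hist[-shift:] + hist[:-shift]

# Hypothetical 8-bin gradient orientation histogram.
histogram = [4.0, 1.0, 0.0, 2.0, 7.0, 3.0, 0.0, 1.0]
original = dft_magnitude(histogram)
rotated = dft_magnitude(rotate_bins(histogram, 3))
max_difference = max(abs(a - b) for a, b in zip(original, rotated))
```

Here `max_difference` is zero up to floating-point rounding. The invariance is exact only for rotations by a multiple of the bin width; intermediate angles redistribute votes between neighbouring bins.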
Development of a multispectral active stereo vision system for video surveillance applications
In this paper, a multispectral active stereo vision system is developed for tracking the motion of moving objects in different environments. The multispectral surveillance system is built by combining visible (color) and infrared sensors. The aim behind proposing such a network of visible and thermal sensors is to give optimal performance in various weather conditions (foggy, snowing, dark and rainy) and to increase the detection performance. The detection of moving objects is performed by means of the optical flow between the images of two different spectra. The optical flow is computed by formulating an energy minimization problem and subsequently solving it with a numerical optimization algorithm. The convergence of the numerical scheme is given for different images. Finally, a number of experimental results (optical flow from stereo images) are given to demonstrate the applicability of such a system in various applications and situations where a robust surveillance and tracking system is needed.
Extrinsic self-calibration of multiple cameras with non-overlapping views in vehicles
Due to decreasing sensor prices and increasing processing performance, the use of multiple cameras in vehicles is becoming an attractive possibility for environment perception. This contribution focuses on non-overlapping multi-camera configurations on a mobile platform and their purely vision-based self-calibration, as well as its restrictions. The usage of corresponding features between the cameras is very difficult to realize and likely to fail due to different appearances in different views and motion-dependent time delays. Instead, the hand-eye calibration (HEC) technique based on visual odometry is considered to solve this problem by exploiting the cameras' motions. For that purpose, this contribution presents an approach to continuously calibrate cameras by making use of the so-called motion adjustment (MA) and an IEKF. Visual odometry in driving vehicles often struggles in estimating the relative magnitudes of the translational motion, which is crucial for the HEC. Therefore, MA simultaneously estimates the extrinsic parameters up to scale as well as the relative motion magnitudes. Furthermore, the estimation process is embedded into a global fusion framework to benefit from the redundant information resulting from multiple cameras in order to yield more robust results. This paper presents results with simulated and real data.
Event Detection and Classification
Real-time anomaly detection in dense crowded scenes
Habib Ullah, Mohib Ullah, Nicola Conci
In this paper we propose a novel approach to detect anomalies in crowded scenes. This is achieved by analyzing the crowd behavior through extracted corner features. For each corner feature we collect a set of motion features. The motion features are used to train an MLP neural network during the training stage, and the behavior of the crowd is inferred on the test samples. Considering the difficulty of tracking individuals in dense crowds due to multiple occlusions and clutter, in this work we extract corner features and consider them as an approximate representation of people's motion. Corner features are then advected over a temporal window through optical flow tracking. Corner features match the motion of individuals well, and their consistency and accuracy are higher in both structured and unstructured crowded scenes compared to other detectors. In the current work, corner features are exploited to extract motion information, which is used as input to train the neural network. The MLP neural network is subsequently used to highlight the dominant corner features that can reveal an anomaly in the crowded scenes. The experimental evaluation is conducted on a set of benchmark video sequences commonly used for crowd motion analysis. In addition, we show that our approach outperforms a state-of-the-art technique.
Enhancing event detection in video using robust background and quality modeling
Richard J. Wood, David Reed, Brian Collins, et al.
Automated event recognition in video data has numerous practical applications for security and transportation. The ability to recognize events in practice depends on precisely detecting and tracking objects of interest in the video data. Numerous factors, such as lighting, weather, camera placement, scene complexity, and data compression can degrade the performance of automated algorithms. As a preprocessing step, developing a set of robust background models can substantially improve system performance. Our object detection and tracking algorithms estimate the object position and attributes within the context of this model to provide more reliable event recognition under challenging conditions. We present an approach to robustly modeling the background as a function of the data acquisition conditions. One element of this approach is automated assessment of the image quality which informs the choice of which background model to use for a given video stream. The video quality model rests on a suite of image metrics computed in real-time from the video, whereas the background models are constructed from historical data collected over a range of conditions. We will describe the formulation of both models. Results from a recent experiment will quantify the empirical performance for recognition of events of interest.
Person and Action Detection
Real-time detection of small faces in HD video
In this paper, we propose a novel approach for real-time face detection in HD videos. The major goal is to detect small faces at a distance in HD videos while preserving a low false-alarm rate. The proposed method first quickly scans for face candidates in the down-sampled chrominance channels by using skin color features. Given the candidates, whose location and size are mapped into the original luminance channel, we precisely filter the true facial objects by using binary texture features. In our experiment, we showed that the proposed face detection method achieved about 20 ms/frame in HD videos. This means that the proposed method is more than 30 times faster than the baseline Viola-Jones method in processing time for 720P-sized video. The gain in processing time will be higher for larger-sized video such as 1080P. We also see that the false-alarm rate was reduced to about 1/20 of that of the baseline approach.
Human behavior understanding for assisted living by means of hierarchical context free grammars
A. Rosani, N. Conci, F. G. B. De Natale
Human behavior understanding has attracted the attention of researchers in various fields over the last years. Recognizing behaviors with sufficient accuracy from sensor analysis is still an unsolved problem for many reasons, including the low accuracy of the data, differences in human behaviors, and the gap between low-level sensor data and high-level scene semantics. In this context, an application that is attracting the interest of both public and industrial entities is the possibility of allowing elderly or physically impaired people to conduct a normal life at home. Ambient intelligence (AmI) technologies, intended as the possibility of automatically detecting and reacting to the status of the environment and of the persons, are probably the major enabling factor for the achievement of such an ambitious objective. AmI technologies require suitable networks of sensors and actuators, as well as adequate processing and communication technologies. In this paper we propose a solution based on context free grammars for human behavior understanding with an application to assisted living. First, the grammars of the different actions performed by a person in his/her daily life are discovered. Then, a long-term analysis of the behavior is used to generate a control grammar, taking care of the context when an action is performed, and adding semantics. The proposed framework is tested on a dataset acquired in a real environment and compared with state of the art methods already available for the problem considered.
Human interaction recognition through two-phase sparse coding
B. Zhang, N. Conci, Francesco G. B. De Natale
In this paper, we propose a novel method to recognize two-person interactions through a two-phase sparse coding approach. In the first phase, we apply non-negative sparse coding to the spatio-temporal interest points (STIPs) extracted from videos, and then construct the feature vector for each video by sum-pooling and l2-normalization. In the second phase, we apply the label-consistent KSVD (LC-KSVD) algorithm to the video feature vectors to train a new dictionary. The algorithm has been validated on the TV human interaction dataset, and the experimental results show that the classification performance is considerably improved compared with the standard bag-of-words approach and the single layer non-negative sparse coding.
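The pooling step at the end of the first phase can be sketched as follows: the sparse codes of all interest points in a video are summed per dictionary atom and the result is scaled to unit Euclidean length. The function names and the toy codes are assumptions for illustration only; the sparse coding itself is not shown.

```python
import math

def sum_pool(codes):
    """Sum per-point sparse codes (one list per interest point) into one vector."""
    return [sum(column) for column in zip(*codes)]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean norm; eps guards against division by zero."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]

def video_feature(codes):
    """Video-level feature: sum-pooling followed by l2-normalization."""
    return l2_normalize(sum_pool(codes))

# Hypothetical non-negative sparse codes for three interest points, three atoms.
toy_codes = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
feature = video_feature(toy_codes)
```

Normalizing after pooling makes the feature independent of how many interest points a video happens to contain, which is one common motivation for this pairing.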
Human Body Action
Representing activities with layers of velocity statistics for multiple human action recognition in surveillance applications
Fabio Martínez, Antoine Manzanera, Eduardo Romero
A novel action recognition strategy in a video-surveillance context is herein presented. The method starts by computing a multiscale dense optical flow, from which spatial apparent movement regions are clustered as Regions of Interest (RoIs). Each RoI is summarized at each time by an orientation histogram. Then, a multilayer structure dynamically stores the orientation histograms associated with each RoI found in the scene, and a set of cumulated temporal statistics is used to label that RoI using a previously trained support vector machine model. The method is evaluated using classic human action and public surveillance datasets, with two different tasks: (1) classification of short sequences containing individual actions, and (2) frame-level recognition of human action in long sequences containing simultaneous actions. The accuracy measurements are 96.7% (sequence rate) for the classification task and 95.3% (frame rate) for recognition in surveillance scenes.
Optical flow based Kalman filter for body joint prediction and tracking using HOG-LBP matching
Binu M. Nair, Kimberley D. Kendricks, Vijayan K. Asari, et al.
We propose a novel real-time framework for tracking specific joints in the human body on low resolution imagery using an optical flow based Kalman tracker, without the need for a depth sensor. Body joint tracking is necessary for a variety of surveillance based applications such as recognizing gait signatures of individuals, and identifying the motion patterns associated with a particular action and the corresponding interactions with objects in the scene to classify a certain activity. The proposed framework consists of two stages: the initialization stage and the tracking stage. In the initialization stage, the joints to be tracked are either manually marked or automatically obtained from other joint detection algorithms in the first few frames within a window of interest, and appropriate image descriptions of each joint are computed. We employ a well-known image coding scheme, Local Binary Patterns (LBP), to represent the local joint region; this coding removes the variance due to non-uniform lighting conditions and enhances the underlying edges and corners. The image descriptions of the joint region then include a histogram computed from the LBP-coded ROI and a HOG (Histogram of Oriented Gradients) descriptor to represent the edge information. The tracking stage can be divided into two phases: optical flow based detection of joints in corresponding frames of the sequence, and the prediction/correction phases of the Kalman tracker with respect to the joint coordinates. Lucas-Kanade optical flow is used to locate the individual joints in consecutive frames of the video based on their location in the previous frame. However, mismatches can often occur due to the rotation of the joint region and the rotation variance of the optical flow matching technique. A mismatch is determined by comparing the joint region descriptors between a pair of frames using the Chi-squared metric, and depending on this statistic, either the prediction phase or the correction phase of the corresponding Kalman filter is called. The Kalman filter for each joint is modeled and designed based on a linear approximation of the joint trajectory, whose true form is mostly sinusoidal. The framework is tested on a private dataset provided by the Air Force Institute of Technology. This dataset consists of a total of 21 video sequences, with each sequence containing an individual walking across the face of the building and climbing up/down a flight of stairs. The challenges associated with this dataset are the very low-resolution imagery along with some interlacing effects. The algorithm has been successfully tested on some sequences of this dataset, and three joints, namely the shoulder, the hip and the elbow, are tracked successfully within a window of interest. Future work will involve using these three perfectly trackable joints to estimate positions of other joints which are difficult to track due to their small size and occlusions.
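The descriptor-gated use of the Kalman filter described in this abstract can be sketched in one dimension. Everything concrete below is an assumption made for illustration: a constant-velocity model for a single joint coordinate, the noise parameters, the mismatch threshold `max_dist`, and the choice to always run the prediction and apply the correction only when the descriptors match. It is not the authors' implementation.

```python
def chi_squared(h1, h2, eps=1e-12):
    """Chi-squared distance between two histograms of equal length."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

class ConstantVelocityKalman1D:
    """Kalman filter over state [position, velocity] with position measurements."""

    def __init__(self, pos=0.0, vel=0.0, var=100.0, q=0.01, r=1.0):
        self.p, self.v = pos, vel
        self.P = [[var, 0.0], [0.0, var]]  # state covariance
        self.q, self.r = q, r              # process / measurement noise (assumed)

    def predict(self, dt=1.0):
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def correct(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r               # innovation variance
        k0, k1 = a / s, c / s        # Kalman gain for position and velocity
        y = z - self.p               # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

def track_step(kf, measured_pos, ref_hist, cur_hist, max_dist=0.25):
    """Predict, then correct only if the joint descriptors still match."""
    kf.predict()
    if chi_squared(ref_hist, cur_hist) <= max_dist:
        kf.correct(measured_pos)     # optical-flow match accepted
    return kf.p                      # otherwise the prediction is kept
```

In use, `measured_pos` would come from the optical flow match and the two histograms from the joint descriptors of the reference and current frames; a frame with a large descriptor distance simply coasts on the predicted position.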
Application-driven merging and analysis of person trajectories for distributed smart camera networks
Jürgen Metzler, Eduardo Monari, Colin Kuntzsch
Tracking of persons and analysis of their trajectories are important tasks of surveillance systems as they support the monitoring personnel. However, this trend is accompanied by an increasing demand for smarter camera networks that carry out surveillance tasks autonomously. Thus, system complexity is higher, so that requirements on the video analysis algorithms are increasing as well. In this paper, we present a system concept and application for anonymously gathering, processing and analyzing trajectories in distributed smart camera networks. It allows a multitude of analysis techniques such as inspecting individual properties of the observed movement in real-time. Additionally, the anonymous movement data allows long-term storage and big data analyses for statistical purposes. The system described in this paper has been implemented as a prototype system and deployed for proof of concept under real conditions at the entrance hall of the Leibniz University Hannover. It shows an overall stable performance, particularly with respect to significant illumination changes over hours, as well as regarding the reduction of false positives by post processing and trajectory merging performed on top of a panorama based person detection module.
Real-time human versus animal classification using pyro-electric sensor array and Hidden Markov Model
In this paper, we propose a real-time human versus animal classification technique using a pyro-electric sensor array and Hidden Markov Model. The technique starts with the variational energy functional level set segmentation technique to separate the object from the background. After segmentation, we convert the segmented object to a signal by considering column-wise pixel values and then finding the wavelet coefficients of the signal. HMMs are trained to statistically model the wavelet features of individuals through an expectation-maximization learning process. Human versus animal classifications are made by evaluating a set of new wavelet feature data against the trained HMMs using the maximum-likelihood criterion. Human and animal data acquired using a pyro-electric sensor in different terrains are used for performance evaluation of the algorithms. Failures of the computationally effective SURF feature based approach that we developed in our previous research are caused by distorted images produced when the object runs very fast or when the temperature difference between target and background is not sufficient to accurately profile the object. We show that wavelet based HMMs work well for handling some of the distorted profiles in the data set. Further, the HMM achieves an improved classification rate over the SURF algorithm with almost the same computational time.
Transportation Imaging I
Extended image differencing for change detection in UAV video mosaics
Günter Saur, Wolfgang Krüger, Arne Schumann
Change detection is one of the most important tasks when using unmanned aerial vehicles (UAV) for video reconnaissance and surveillance. We address changes on a short time scale, i.e. the observations are taken at time intervals from several minutes up to a few hours. Each observation is a short video sequence acquired by the UAV in near-nadir view and the relevant changes are, e.g., recently parked or moved vehicles. In this paper we extend our previous approach of image differencing for single video frames to video mosaics. A precise image-to-image registration combined with a robust matching approach is needed to stitch the video frames to a mosaic. Additionally, this matching algorithm is applied to mosaic pairs in order to align them to a common geometry. The resulting registered video mosaic pairs are the input of the change detection procedure based on extended image differencing. A change mask is generated by an adaptive threshold applied to a linear combination of difference images of intensity and gradient magnitude. The change detection algorithm has to distinguish between relevant and non-relevant changes. Examples of non-relevant changes are stereo disparity at 3D structures of the scene, changed size of shadows, and compression or transmission artifacts. The special effects of video mosaicking such as geometric distortions and artifacts at moving objects have to be considered, too. In our experiments we analyze the influence of these effects on the change detection results by considering several scenes. The results show that for video mosaics this task is more difficult than for single video frames. Therefore, we extended the image registration by estimating an elastic transformation using a thin plate spline approach. The results for mosaics are comparable to those of single video frames and are useful for interactive image exploitation due to a larger scene coverage.
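The change-mask construction named in this abstract (an adaptive threshold on a linear combination of intensity and gradient-magnitude differences) can be illustrated on small grayscale images stored as nested lists. The weight `alpha`, the central-difference gradient, and the threshold rule (mean plus `k` standard deviations of the combined score) are assumptions chosen for the sketch; the abstract does not state the actual weights or threshold rule, and registration is assumed to have been done already.

```python
import math

def gradient_magnitude(img):
    """Central-difference gradient magnitude with clamped image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = math.hypot(gx, gy)
    return out

def change_mask(ref, cur, alpha=0.5, k=2.0):
    """Boolean mask of pixels whose combined difference exceeds mean + k * std."""
    g_ref, g_cur = gradient_magnitude(ref), gradient_magnitude(cur)
    h, w = len(ref), len(ref[0])
    # Linear combination of intensity difference and gradient-magnitude difference.
    score = [[alpha * abs(cur[y][x] - ref[y][x])
              + (1.0 - alpha) * abs(g_cur[y][x] - g_ref[y][x])
              for x in range(w)] for y in range(h)]
    flat = [s for row in score for s in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((s - mean) ** 2 for s in flat) / len(flat))
    threshold = mean + k * std       # adapts to the overall difference level
    return [[s > threshold for s in row] for row in score]
```

Because the threshold follows the statistics of the score image, a global brightness offset between the two acquisitions raises the threshold along with the scores instead of flagging every pixel.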
Real-time traffic jam detection and localization running on a smart camera
Yuriy Lipetski, Gernot Loibner, Michael Ulm, et al.
Reliable automatic detection of traffic jam occurrences is of great significance for traffic flow analysis related applications. We present our work aimed at the application of video based real-time traffic jam detection. Our method can handle both calibrated and un-calibrated scenarios, operating in world and in image coordinate systems respectively. The method is designed to be operated on a smart camera, but is also suitable for a standard personal computer. The combination of state-of-the-art algorithms for vehicle detection and velocity estimation allows robust long-term system operation due to the high recall rate and very low false alarm rate. The proposed method not only detects traffic jam events in real-time, but also precisely localizes traffic jams by their start and end positions per road lane. We also describe our strategy for making computationally heavy algorithms real-time capable even on hardware with limited computing power.
Transportation Imaging II
Automatic parking lot occupancy computation using motion tracking
Francisco Justo, Hari Kalva, Daniel Raviv
Nowadays it is very hard to find available spots in public parking lots and even harder in public facilities such as universities and sports venues. A system that provides drivers with parking availability and parking lot occupancy will allow users to find a parking space much more easily and quickly. This paper presents a system for automatic parking lot occupancy computation using motion tracking. Methods for complexity reduction are presented. The system showed approximately 96% accuracy in determining parking lot occupancy. We showed that by optimizing the resolution and bitrate of the input video, we can reduce the complexity by 70% and still achieve over 90% accuracy. The results showed that high quality video is not necessary for the proposed algorithm to obtain accurate results.
Methods for vehicle detection and vehicle presence analysis for traffic applications
This paper presents our work towards robust vehicle detection in dynamic and static scenes from a brief historical perspective up to our current state-of-the-art. We cover several methods (PCA, basic HOG, texture analysis, 3D measurement) which have been developed for, tested, and used in real-world scenarios. The second part of this work presents a new HOG cascade training algorithm which is based on evolutionary optimization principles: HOG features for a low stage count cascade are learned using genetic feature selection methods. We show that with this approach it is possible to create a HOG cascade which has comparable performance to an AdaBoost trained cascade, but is much faster to evaluate.
Object instance recognition using motion cues and instance specific appearance models
Arne Schumann
In this paper we present an object instance retrieval approach. The baseline approach consists of a pool of image features which are computed on the bounding boxes of a query object track and compared to a database of tracks in order to find additional appearances of the same object instance. We improve over this simple baseline approach in multiple ways: 1) we include motion cues to achieve improved robustness to viewpoint and rotation changes, 2) we include operator feedback to iteratively re-rank the resulting retrieval lists and 3) we use operator feedback and location constraints to train classifiers and learn an instance specific appearance model. We use these classifiers to further improve the retrieval results. The approach is evaluated on two popular public datasets for two different applications. We evaluate person re-identification on the CAVIAR shopping mall surveillance dataset and vehicle instance recognition on the VIVID aerial dataset and achieve significant improvements over our baseline results.
Applications of Video Surveillance
Real-time change detection for countering improvised explosive devices
Dennis W. J. M. van de Wouw, Kris van Rens, Hugo van Lint, et al.
We explore an automatic real-time change detection system to assist military personnel during transport and surveillance, by detecting changes in the environment with respect to a previous operation. Such changes may indicate the presence of Improvised Explosive Devices (IEDs), which can then be bypassed. While driving, images of the scenes are acquired by the camera and stored with their GPS positions. At the same time, the best matching reference image (from a previous patrol) is retrieved and registered to the live image. Next, a change mask is generated by differencing the reference and live image, followed by an adaptive thresholding technique. Post-processing steps such as Markov Random Fields, local texture comparisons and change tracking further improve time- and space-consistency of changes and suppress noise. The resulting changes are visualized as an overlay on the live video content. The system has been extensively tested on 28 videos, containing over 10,000 manually annotated objects. The system is capable of detecting small test objects of 10 cm³ at a range of 40 meters. Although the system shows an acceptable performance in multiple cases, the performance degrades under certain circumstances for which extensions are discussed.
Use of automated video analysis for the evaluation of bicycle movement and interaction
Heather Twaddle, Tobias Schendzielorz, Oliver Fakler, et al.
With the purpose of developing valid models of microscopic bicycle behavior, a large quantity of video data is collected at three busy urban intersections in Munich, Germany. Due to the volume of data, the manual processing of this data is infeasible and an automated or semi-automated analysis method must be implemented. An open source software, “Traffic Intelligence”, is used and extended to analyze the collected video data with regard to research questions concerning the tactical behavior of bicyclists. In a first step, the feature detection parameters, the tracking parameters and the object grouping parameters are calibrated, making it possible to accurately track and group the objects at intersections used by large volumes of motor vehicles, bicycles and pedestrians. The resulting parameters for the three intersections are presented. A methodology for the classification of road users as cars, bicycles or pedestrians is presented and evaluated. This is achieved by making hypotheses about which features belong to cars, or bicycles and pedestrians, and using grouping parameters specified for that road user group to cluster the features into objects. These objects are then classified based on their dynamic characteristics. A classification structure for the maneuvers of different road users is presented and future applications are discussed.
Mutation detection for inventories of traffic signs from street-level panoramic images
Lykele Hazelhoff, Ivo Creusen, Peter H. N. De With
Road safety is positively influenced by both adequate placement and optimal visibility of traffic signs. As their visibility degrades over time due to e.g. aging, vandalism, accidents and vegetation coverage, up-to-date inventories of traffic signs are highly attractive for maintaining a high level of road safety. These inventories are performed in a semi-automatic fashion from street-level panoramic images, exploiting object detection and classification techniques. Next to performing inventories from scratch, these systems are also exploited for the efficient retrieval of situation changes by comparing the outcome of the automated system to a baseline inventory (e.g. performed in a previous year). This allows for specific manual interactions on the found changes, while skipping all unchanged situations, thereby resulting in a large efficiency gain. This work describes such a mutation detection approach, with special attention to re-identifying previously found signs. Preliminary results on a geographical area containing about 425 km of road show that 91.3% of the unchanged signs are re-identified, while the number of found differences equals about 35% of the number of baseline signs. Of these differences, about 50% correspond to physically changed traffic signs, next to false detections, misclassifications and missed signs. As a bonus, our approach directly results in the changed situations, which is beneficial for road sign maintenance.
Interactive Paper Session
Downhill simplex approach for vehicle headlights detection
Ho-Joong Kang, Ho-Kun Kim, Il-Whan Oh, et al.
Nighttime vehicle detection is an essential problem to be solved in the development of highway surveillance systems that provide information such as vehicle speed, traffic volume, and traffic jams. In this paper, a novel downhill simplex approach for vehicle headlights detection is presented. In the proposed approach, a rough position of vehicle headlights is detected first. Then, a downhill simplex optimization approach is adopted to find the accurate location of vehicle headlights. For the optimization process, a novel cost function is designed and various headlights are evaluated for possible headlight positions on the detected vehicles, locating an optimal headlight position. Simulation results are provided to show the robustness of the proposed approach for headlights detection.
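The two-stage idea (rough position first, then downhill simplex refinement) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not describe the cost function, so `headlight_cost` is a hypothetical stand-in (the negative brightness of a synthetic Gaussian blob), and `nelder_mead` is a simplified downhill simplex variant with reflection, expansion, inside contraction and shrink steps.

```python
from math import exp

# Hypothetical ground truth: centre of a synthetic headlight blob (x, y).
TRUE_CENTRE = (12.3, 7.6)

def headlight_cost(p):
    """Assumed cost: negative brightness of a Gaussian blob (lower is better)."""
    dx, dy = p[0] - TRUE_CENTRE[0], p[1] - TRUE_CENTRE[1]
    return -exp(-(dx * dx + dy * dy) / 18.0)

def nelder_mead(cost, start, step=2.0, iters=500, tol=1e-6):
    """Simplified downhill simplex search starting from a rough position."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=cost)
        best, worst = simplex[0], simplex[-1]
        # Stop once the simplex has collapsed to a very small size.
        size = max(abs(p[i] - best[i]) for p in simplex for i in range(n))
        if size < tol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[i] + t * (worst[i] - centroid[i]) for i in range(n)]

        reflected = along(-1.0)
        if cost(reflected) < cost(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if cost(expanded) < cost(reflected) else reflected
        elif cost(reflected) < cost(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if cost(contracted) < cost(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [(best[i] + p[i]) / 2.0 for i in range(n)] for p in simplex[1:]
                ]
    return min(simplex, key=cost)

rough_position = [10.0, 6.0]  # assumed output of the rough detection stage
refined = nelder_mead(headlight_cost, rough_position)
```

The search needs only cost evaluations, not gradients, which is why a downhill simplex method suits image-based cost functions that are awkward to differentiate.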
Template matching based people tracking using a smart camera network
Junzhi Guan, Peter Van Hese, Jorge Oswaldo Niño-Castañeda, et al.
In this paper, we propose a people tracking system composed of multiple calibrated smart cameras and one fusion server which fuses the information from all cameras. Each smart camera estimates the ground plane positions of people based on the current frame and feedback from the server from the previous time. Correlation coefficient based template matching, which is invariant to illumination changes, is proposed to estimate the position of people in each smart camera. Only the estimated position and the corresponding correlation coefficient are sent to the server. This minimal amount of information exchange makes the system highly scalable with the number of cameras. The paper focuses on creating and updating a good template for the tracked person using feedback from the server. Additionally, a static background image of the empty room is used to improve the results of template matching. We evaluated the performance of the tracker in scenarios where persons are often occluded by other persons or furniture, and illumination changes occur frequently, e.g., due to switching the light on or off. For two sequences (one minute each, one with a table in the room and one without) with frequent illumination changes, the proposed tracker never loses track of the persons. We compare the performance of our tracking system to a state-of-the-art tracking system. Our approach outperforms it in terms of tracking accuracy and people loss.
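The claim that correlation coefficient based matching is invariant to illumination changes can be illustrated with a small sketch. It assumes an illumination change can be modelled as a positive gain plus an offset on pixel values; the image, the template, and the function names are made up for illustration and are not taken from the paper.

```python
from math import sqrt

def correlation_coefficient(a, b):
    """Pearson correlation between two equally sized flat pixel lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treated as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(image, template):
    """Slide the template over a 2-D image; return ((row, col), score)."""
    th, tw = len(template), len(template[0])
    flat_template = [v for row in template for v in row]
    best_pos, best_score = None, -2.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = correlation_coefficient(patch, flat_template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

# Synthetic 5x7 image with the template embedded at row 2, column 3.
template = [[10, 50], [90, 30]]
image = [[(7 * r + 3 * c) % 11 for c in range(7)] for r in range(5)]
for i in range(2):
    for j in range(2):
        image[2 + i][3 + j] = template[i][j]

# Simulated illumination change: halve the gain and add an offset.
relit = [[0.5 * v + 20 for v in row] for row in image]

position, score = best_match(image, template)
relit_position, relit_score = best_match(relit, template)
```

Because the coefficient subtracts the mean and divides by the standard deviations, both calls find the same position with a score close to 1, even though the raw pixel values differ.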
Embedded image enhancement for high-throughput cameras
Stan J. C. Geerts, Dion Cornelissen, Peter H. N. de With
This paper presents image enhancement for a novel Ultra-High-Definition (UHD) video camera offering 4K images and higher. Conventional image enhancement techniques need to be reconsidered for the high-resolution images and the low-light sensitivity of the new sensor. We study two image enhancement functions and evaluate and optimize the algorithms for embedded implementation in programmable logic (FPGA). The enhancement study involves high-quality Auto White Balancing (AWB) and Local Contrast Enhancement (LCE). We have compared multiple algorithms from literature, both with objective and subjective metrics. In order to objectively compare Local Contrast (LC), an existing LC metric is modified for LC measurement in UHD images. For AWB, we have found that color histogram stretching offers subjectively high image quality and is among the algorithms with the lowest complexity, while giving only a small balancing error. We impose a color-to-color gain constraint, which improves robustness for low-light images. For local contrast enhancement, a combination of contrast preserving gamma and single-scale Retinex is selected. A modified bilateral filter is designed to prevent halo artifacts, while significantly reducing the complexity and simultaneously preserving quality. We show that by cascading contrast preserving gamma and single-scale Retinex, the visibility of details is improved towards the level appropriate for high-quality surveillance applications. The user is offered control over the amount of enhancement. Also, we discuss the mapping of those functions on a heterogeneous platform to come to an effective implementation while preserving quality and robustness.
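As a rough illustration of why color histogram stretching is a low-complexity white balancing method, the sketch below stretches each color channel independently to the full 8-bit range. The percentile limits and function names are assumptions. The color-to-color gain constraint mentioned in the abstract is not implemented, since its form is not given there.

```python
def stretch_channel(values, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low, high] percentile range of one channel to 0..255."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int((n - 1) * low_pct / 100.0)]
    hi = ordered[int((n - 1) * high_pct / 100.0)]
    if hi == lo:
        return list(values)  # flat channel: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((v - lo) * scale))) for v in values]

def white_balance(pixels):
    """Apply histogram stretching per channel to a list of (r, g, b) tuples."""
    channels = [stretch_channel([p[c] for p in pixels]) for c in range(3)]
    return list(zip(*channels))

# Synthetic gray ramp with a color cast: red is compressed and offset.
ramp = range(0, 256, 5)
cast = [(v // 2 + 40, v, v) for v in ramp]
balanced = white_balance(cast)
```

Each channel needs only one sort (or a histogram in a hardware setting) and one multiply-add per pixel, which fits the embedded setting the abstract describes. On the synthetic ramp, the stretched red channel ends up close to the green and blue channels again.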
On-road anomaly detection by multimodal sensor analysis and multimedia processing
Fatih Orhan, P. Erhan Eren
The use of smartphones in Intelligent Transportation Systems is gaining popularity, yet many challenges exist in developing functional applications. Due to the dynamic nature of transportation, vehicular social applications face complexities such as developing robust sensor management, performing signal and image processing tasks, and sharing information among users. This study utilizes a multimodal sensor analysis framework that enables sensors to be analyzed across multiple modalities. It also provides plugin-based analysis interfaces for developing sensor and image processing based applications, and connects its users via a centralized application as well as to social networks to facilitate communication and socialization. Using this framework, an on-road anomaly detector is being developed and tested. The detector utilizes the sensors of a mobile device and is able to identify anomalies such as hard braking, pothole crossing, and speed bump crossing. Upon such detection, the video portion containing the anomaly is automatically extracted in order to enable further image processing analysis. The detection results are shared on a central portal application for online traffic condition monitoring.
Modelling dynamics with context-free grammars
Juan-M. García-Huerta, Hugo Jiménez-Hernández, Ana-M. Herrera-Navarro, et al.
This article presents a strategy to model the dynamics performed by vehicles on a freeway. The proposal consists of encoding the movement as a set of finite states. A watershed-based segmentation is used to localize regions with a high probability of motion. Each state represents a proportion of a camera projection in a two-dimensional space, where each state is associated with a symbol, such that any combination of symbols is expressed as a language. Starting from a sequence of symbols, a context-free grammar is inferred through a linear algorithm. This grammar represents a hierarchical view of common sequences observed in the scene. The most probable grammar rules express common rules associated with normal movement behavior. Less probable rules provide a way to quantify uncommon behaviors, which might need more attention. Finally, any sequence of symbols that does not match the grammar rules may indicate uncommon (abnormal) behavior. The grammar inference is built with several sequences of images taken from a freeway. The testing process uses the sequence of symbols emitted by the scenario, matching the grammar rules with common freeway behaviors. Detecting abnormal or normal behaviors is treated as the task of verifying whether any word generated by the scenario is recognized by the grammar.
Overtaking vehicles detection and localization for driver assistance
Jiun-Yann Lin, Chung-Lin Huang
This paper proposes an effective overtaking vehicle detection and localization system based on wheel detection. Using the windshield-mounted camera, it can detect the vehicles in rear-view and side-view. It extracts the MB-LBP feature for the cascaded Adaboost classifiers. Differing from traditional approaches, our method can overcome the perspective variation of the wheels in the images. It can detect wheels of different aspect ratios at varying scales. After wheel detection, it can locate the overtaking vehicle by pairing the front/rear wheels. The experiments show that our system can identify and locate the vehicle appearing in side-view as well as in rear-view. The wheel detector achieves a precision of 0.91 and a recall of 0.96.
An integrated framework for detecting suspicious behaviors in video surveillance
Thi Thi Zin, Pyke Tin, Hiromitsu Hama, et al.
In this paper, we propose an integrated framework for detecting suspicious behaviors in video surveillance systems which are established in public places such as railway stations, airports, and shopping malls. In particular, suspicious loitering, unattended objects left behind, and the exchange of suspicious objects between persons are common security concerns in airports and other transit scenarios. These involve understanding the scene and event, analyzing human movements, recognizing controllable objects, and observing the effect of the human movement on those objects. In the proposed framework, a multiple background modeling technique, a high level motion feature extraction method and embedded Markov chain models are integrated for detecting suspicious behaviors in real time video surveillance systems. Specifically, the proposed framework employs a probability based multiple background modeling technique to detect moving objects. Then the velocity and distance measures are computed as the high level motion features of interest. By integrating the computed features with the first passage time probabilities of the embedded Markov chain, the suspicious behaviors in video surveillance are analyzed for detecting loitering persons, objects left behind and human interactions such as fighting. The proposed framework has been tested using standard public datasets and our own video surveillance scenarios.
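For readers unfamiliar with first passage time probabilities, the sketch below computes them for a generic discrete-time Markov chain using the standard recursion f(1) = P[i][j] and f(n) = sum over k != j of P[i][k] * f_kj(n-1). The three states and the transition matrix are hypothetical. How the paper defines its states and combines these probabilities with the motion features is not stated in the abstract.

```python
def first_passage_probabilities(P, start, target, max_steps):
    """Return [f(1), ..., f(max_steps)], where f(n) is the probability that a
    chain started in `start` reaches `target` for the first time at step n."""
    states = range(len(P))
    # f_prev[i] holds the first-passage probability from state i for the current n.
    f_prev = [P[i][target] for i in states]
    result = [f_prev[start]]
    for _ in range(max_steps - 1):
        # Take one step to any state other than the target, then finish from there.
        f_next = [
            sum(P[i][k] * f_prev[k] for k in states if k != target)
            for i in states
        ]
        result.append(f_next[start])
        f_prev = f_next
    return result

# Hypothetical states: 0 = walking, 1 = standing, 2 = left the scene (absorbing).
P = [
    [0.80, 0.15, 0.05],
    [0.30, 0.60, 0.10],
    [0.00, 0.00, 1.00],
]
f = first_passage_probabilities(P, start=0, target=2, max_steps=50)
```

In this toy chain, `f[n - 1]` is the probability that a person who starts out walking first leaves the scene after exactly `n` steps. A distribution with a long tail corresponds to a long expected stay, which is the kind of quantity a loitering detector could threshold.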
A novel approach to extract closed foreground object contours in video surveillance
Giounona Tzanidou, Eran A. Edirisinghe
In this paper we present a novel approach for the detection of closed contours of foreground objects in videos. The proposed methodology begins with an initial localization of contours that is achieved via a background subtraction technique that uses a mixture of Gaussian distributions to model the background. The features that are used to realize an approximate foreground contour segmentation consist of the magnitude of the gradient at multiple orientations and phase congruency. In the next stage, Canny edges of the incoming frames are computed at multiple scales and thresholds using the saturation and value components of the HSV image. The approximate foreground contour is refined by reflecting it on the detected edges. A color ratio based noise and shadow line removal technique has been devised to remove the falsely segmented noise and strong shadow edges. Ultimately, to ensure closed contours, an edge completion algorithm based on anisotropic diffusion is applied. Once the contour is completed, it undergoes flood fill to define the foreground areas. Detailed experimental results on a benchmark dataset show that the proposed framework performs well in most of the different background scenarios. It effectively tackles the presence of shadows, illumination changes, some cases of dynamic background and thermal videos.