Proceedings Volume 8663

Video Surveillance and Transportation Imaging Applications

Robert Paul Loce, Eli Saber
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 22 March 2013
Contents: 10 Sessions, 29 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2013
Volume Number: 8663

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8663
  • Video Surveillance I
  • Video Surveillance II
  • Video Surveillance III
  • Transportation Imaging I
  • Transportation Imaging II
  • Transportation Imaging III
  • Transportation Imaging IV
  • Transportation Imaging V
  • Video Analytics for Retail
Front Matter: Volume 8663
Front Matter: Volume 8663
This PDF file contains the front matter associated with SPIE Proceedings Volume 8663 including the Title Page, Copyright Information, Table of Contents, Introduction, and Conference Committee listing.
Video Surveillance I
Group localisation and unsupervised detection and classification of basic crowd behaviour events for surveillance applications
Nadejda S. Roubtsova, Peter H. N. de With
Technology for monitoring crowd behaviour is in demand for surveillance and security applications. The trend in research is to tackle detection of complex crowd behaviour events (panic, fight, evacuation, etc.) directly using machine learning techniques. In this paper, we present a contrary, bottom-up approach seeking basic group information: (1) instantaneous location and (2) the merge, split and lateral slide-by events - the three basic motion patterns comprising any crowd behaviour. The focus on such generic group information makes our algorithm suitable as a building block in a variety of surveillance systems, possibly integrated with static content analysis solutions. Our feature extraction framework has optical flow at its core. The framework is universal, being motion-based rather than object-detection-based, and generates a large variety of motion-blob-characterising features useful for an array of classification problems. Motion-based characterisation is performed on a group as an atomic whole and not by means of superposition of individual human motions. Within that feature space, our classification system makes decisions based on heuristic rules and thresholds, without machine learning. Our system performs well on group localisation, consistently generating contours around both moving and halted groups. The visual output of our periodical group localisation is equivalent to tracking, and the group contour accuracy ranges from adequate to exceptionally good. The system successfully detects and classifies within our merge/split/slide-by event space in surveillance-type video sequences differing in resolution, scale, quality and motion content. Quantitatively, its performance is characterised by a good recall: 83% on detection and 71% on combined detection and classification.
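As a rough illustration of the optical-flow core described above, the following sketch localises moving groups as contours around coherent motion blobs. This is a minimal OpenCV-based sketch only: the function name, magnitude threshold and morphology step are our own assumptions, and the paper's full blob-characterising feature set and heuristic event rules are not reproduced.

```python
import cv2
import numpy as np

def group_motion_blobs(prev_gray, curr_gray, mag_thresh=1.0):
    """Localise moving groups as contours around coherent flow blobs."""
    # Dense optical flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Threshold flow magnitude and close small gaps to form blobs.
    mask = (mag > mag_thresh).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours  # one contour per candidate group
```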
Gaussian mixtures for anomaly detection in crowded scenes
Habib Ullah, Lorenza Tenuti, Nicola Conci
In this paper, we propose a fast and robust framework for anomaly detection in crowded scenes. In our method, anomaly is adaptively modeled as a deviation from the normal behavior of the crowd observed in the scene. For this purpose, we extract motion features by repeatedly initializing a grid of particles over a temporal window. These features are exploited in a real-time anomaly detection system. In order to model the ordinary behavior of the people moving in the crowd, we use the Gaussian mixture model (GMM) technique, which is robust enough to capture the scene dynamics. As opposed to explicitly modeling the values of all the pixels as a mixture of Gaussians, we adopt the GMM to learn the behavior of the motion features extracted from the particles. Based on the persistence and the variance of each Gaussian distribution, we determine which Gaussians can be associated with the normal behavior of the crowd. Particles with motion features that do not fit the distributions representing normal behavior are signaled as anomalous, until there is a Gaussian able to include them with sufficient evidence supporting it. Experiments are extensively conducted on a publicly available benchmark dataset, as well as on a challenging dataset of video sequences we captured. The experimental results reveal that the proposed method performs effectively for anomaly detection.
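A minimal sketch of the GMM idea, assuming scikit-learn and synthetic (n_samples, n_dims) motion features; the component count and the likelihood threshold are illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_normal_behaviour(features, n_components=5):
    """Learn ordinary crowd behaviour as a mixture of Gaussians."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(features)

def anomaly_threshold(gmm, normal_features, quantile=1.0):
    """Set the alert level at a low quantile of normal log-likelihood."""
    return np.percentile(gmm.score_samples(normal_features), quantile)

# Illustrative run on synthetic motion features.
rng = np.random.default_rng(0)
normal = rng.normal(size=(1000, 4))
gmm = fit_normal_behaviour(normal)
tau = anomaly_threshold(gmm, normal)
alerts = gmm.score_samples(rng.normal(size=(200, 4), scale=3.0)) < tau
```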
Space-time correlation filters for human action detection
To automate video surveillance systems, algorithms must be developed to automatically detect and classify basic human actions. Many traditional approaches focus on the classification of actions, which usually assumes prior detection, tracking, and segmentation of the human figure from the original video. On the other hand, action detection is a more desirable paradigm, as it is capable of simultaneous localization and classification of the action. This means that no prior segmentation or tracking is required, and multiple action instances may be detected in the same video. Correlation filters have been traditionally applied for object detection in images. In this paper, we report the results of our investigation using correlation filters for human action detection in videos. Correlation filters have previously been explored for action classification, but this is the first time they are evaluated for the more difficult task of action detection. In addition, we investigate several practical implementation issues, including parameter selection, reducing computational time, and exploring the effects of preprocessing and temporal occlusion (i.e., loss of video frames) on performance.
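The core operation, correlating a space-time filter with a video volume, can be sketched as a plain FFT-based cross-correlation; the paper's filter design, preprocessing and evaluation protocol are not shown, and the array shapes here are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def spacetime_correlate(video, template):
    """Cross-correlate a (T, H, W) space-time filter with a video volume.

    Correlation equals convolution with the template flipped along every
    axis; peaks in the response localise the action in space and time.
    """
    return fftconvolve(video, template[::-1, ::-1, ::-1], mode="valid")

# Hypothetical usage: threshold the response map to detect instances.
rng = np.random.default_rng(0)
video = rng.normal(size=(60, 120, 160))
template = rng.normal(size=(16, 32, 32))
response = spacetime_correlate(video, template)
detections = np.argwhere(response > response.mean() + 4 * response.std())
```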
Recognition of two-person interaction in multi-view surveillance video via proxemics cues and spatio-temporal interest points
Bo Zhang, Paolo Rota, Nicola Conci
In this paper we propose a novel method to recognize different types of two-person interactions through multi-view surveillance cameras. From the bird's-eye view, proxemics cues are exploited to segment the duration of the interaction, while from the lateral view the corresponding interaction intervals are extracted. The classification is achieved by applying a visual bag-of-words approach, which is used to train a linear multi-class SVM classifier. We test our method on the UNITN social interaction dataset. Experimental results show that using temporal segmentation improves the classification performance.
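A minimal sketch of the bag-of-words plus linear SVM stage, with synthetic descriptors standing in for the spatio-temporal interest points; the codebook size, descriptor dimensions and class count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bow_histogram(descriptors, codebook):
    """Quantise a clip's local descriptors into a normalised word histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Illustrative pipeline: one descriptor set per interaction interval.
rng = np.random.default_rng(0)
clips = [rng.normal(size=(80, 64)) for _ in range(20)]
labels = rng.integers(0, 3, size=20)              # 3 interaction classes

codebook = KMeans(n_clusters=32, n_init=10).fit(np.vstack(clips))
X = np.array([bow_histogram(c, codebook) for c in clips])
classifier = LinearSVC().fit(X, labels)           # linear multi-class SVM
```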
Weighted symbolic analysis of human behavior for event detection
A. Rosani, G. Boato, F. G. B. De Natale
Automatic video analysis and understanding has become a high-interest research topic, with applications to video browsing, content-based video indexing, and visual surveillance. However, the automation of this process is still a challenging task, due to the clutter produced by low-level processing operations. This common problem can be solved by embedding significant contextual information into the data, as well as by using simple syntactic approaches to perform the matching between actual sequences and models. In this context we propose a novel framework that employs a symbolic representation of complex activities through sequences of atomic actions based on a weighted Context-Free Grammar.
Video Surveillance II
Collaborative real-time scheduling of multiple PTZ cameras for multiple object tracking in video surveillance
Yu-Che Liu, Chung-Lin Huang
This paper proposes a multi-PTZ-camera control mechanism to acquire close-up imagery of human objects in a surveillance system. The control algorithm is based on the output of multi-camera, multi-target tracking. The three main concerns of the algorithm are (1) capturing imagery of the human object's face for biometric purposes, (2) the optimal video quality of the human objects, and (3) minimum hand-off time. Here, we define an objective function based on the expected capture conditions such as the camera-subject distance, pan-tilt angles of capture, face visibility and others. Such an objective function serves to effectively balance the number of captures per subject and the quality of captures. In the experiments, we demonstrate the performance of the system, which operates in real time under real-world conditions on three PTZ cameras.
Tracking small targets in wide area motion imagery data
Alex Mathew, Vijayan K. Asari
Object tracking in aerial imagery is of immense interest to the wide area surveillance community. In this paper, we propose a method to track very small targets such as pedestrians in AFRL Columbus Large Image Format (CLIF) Wide Area Motion Imagery (WAMI) data. Extremely small target sizes, combined with low frame rates and significant view changes, make tracking a very challenging task in WAMI data. Two problems must be tackled for object tracking: frame registration and feature extraction. We employ SURF for frame registration. Although there are several feature extraction methods that work reasonably well when the scene is of high resolution, most methods fail when the resolution is very low. In our approach, we represent the target as a collection of intensity histograms and use a robust statistical distance to distinguish between the target and the background. We divide the object into m × n regions and compute the normalized intensity histogram in each region to build a histogram matrix. The features can then be compared using histogram comparison techniques. For tracking, we use a combination of a bearing-only Kalman filter and the proposed feature extraction technique. The problem of template drift is solved by further localizing the target with a blob detection algorithm, taking the detected blob as the new template. We show the robustness of the algorithm by comparing the feature extraction part of our method with other feature extraction methods such as SURF, SIFT and HoG, and the tracking part with mean-shift tracking.
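The histogram-matrix representation described above can be sketched as follows; the grid size, bin count, and the Bhattacharyya coefficient used for comparison are our own illustrative choices (the abstract does not name the paper's robust statistical distance).

```python
import numpy as np

def histogram_matrix(patch, m=3, n=3, bins=16):
    """Represent a grayscale target patch as an m x n grid of
    normalised intensity histograms (one row per grid cell)."""
    h, w = patch.shape
    feats = []
    for i in range(m):
        for j in range(n):
            cell = patch[i * h // m:(i + 1) * h // m,
                         j * w // n:(j + 1) * w // n]
            hist, _ = np.histogram(cell, bins=bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1.0))
    return np.array(feats)            # shape (m*n, bins)

def bhattacharyya(h1, h2):
    """One plausible cell-wise histogram similarity (stand-in distance)."""
    return np.sum(np.sqrt(h1 * h2), axis=1).mean()
```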
Feature descriptors for object matching in real-time tracking applications
Gernot Loibner, Oliver Sidla
This paper addresses the identity-switching problem of a tracking-by-detection approach. When matching a newly found detection to a known trajectory, in certain circumstances (e.g., occlusions) the trajectory can switch from one object to another, decreasing the overall tracking quality. To overcome this problem, four keypoint-based approaches are investigated. These approaches rely on finding interest points on and around the tracked object and describing them in a discriminant way. In the verification step, the known trajectory's keypoint descriptions are matched to the newly calculated detection's keypoint descriptions, and the match is accepted or rejected according to a distance measure. Two further approaches, which provide abstract representations of the objects, are also reviewed. Finally, the verification approaches are tested on a publicly available pedestrian dataset showing a crowded town centre.
Video Surveillance III
Ship detection in port surveillance based on context and motion saliency analysis
This paper presents an automatic ship detection approach for video-based port surveillance systems. Our approach combines context and motion saliency analysis. The context is represented by the assumption that ships only travel inside a water region. We perform motion saliency analysis since we expect ships to move with higher speed compared to the water flow and static environment. A robust water detection is first employed to extract the water region as contextual information in the video frame, which is achieved by graph-based segmentation and region-based classification. After the water detection, the segments labeled as non-water are merged to form the regions containing candidate ships, based on the spatial adjacency. Finally, ships are detected by checking motion saliency for each candidate ship according to a set of criteria. Experiments are carried out with real-life surveillance videos, where the obtained results prove the accuracy and robustness of the proposed ship detection approach. The proposed algorithm outperforms a state-of-the-art algorithm when applied to the same sets of surveillance videos.
An edge directed super resolution technique for multimedia applications
In this paper, we present an Edge Directed Super Resolution (EDSR) technique for grayscale and color images. The proposed algorithm is a multiple-pass iterative algorithm aimed at producing better-defined images with sharper edges. The basic premise behind this algorithm is interpolating along the edge direction, thus reducing the blur introduced by traditional interpolation techniques, which in some instances operate across the edge. To this effect, horizontal and vertical gradients, derived from the input reference image resized to the target resolution, are utilized to generate an edge direction map, which in turn is quantized into four discrete directions. The process then utilizes the multiple input images, shifted by a sub-pixel amount, to yield a single higher-resolution image. A cross-correlation-based registration approach determines the relative shifts between the frames. In the case of color images, the edge directed super resolution algorithm is applied to the L channel in the L*a*b* color space, since most of the edge information is concentrated in that channel. The two color-difference channels a* and b* are resized to a higher resolution using a conventional bicubic interpolation approach. The algorithm was applied to grayscale and color images and showed favorable results on a wide variety of datasets ranging from printing to surveillance to regular consumer photography.
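The direction-map step can be sketched as below: gradients from the upsampled reference image are converted to an edge angle and quantised into the four discrete directions. Only that one step is shown; registration, multi-frame fusion and the iterative refinement are omitted, and the Sobel-based gradients are our own assumption.

```python
import numpy as np
import cv2

def quantised_edge_directions(image):
    """Quantise local edge direction into four discrete directions
    (0, 45, 90, 135 degrees), used to steer interpolation along edges."""
    gx = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
    # Edge direction folded into [0, 180) degrees, then binned by 45.
    angle = (np.degrees(np.arctan2(gy, gx)) + 180.0) % 180.0
    return np.round(angle / 45.0).astype(int) % 4   # 0..3 per pixel
```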
Transportation Imaging I
A smart camera based traffic enforcement system: experiences from the field
Oliver Sidla, Gernot Loibner
The observation and monitoring of traffic with smart vision systems for the purpose of improving traffic safety has great potential. Embedded vision systems can count vehicles and estimate the state of traffic along the road; they can supplement or replace loop sensors, with their limited local scope, and radar, which measures the speed, presence and number of vehicles. This work presents a vision system built to detect and report traffic rule violations at unsecured railway crossings, which pose a threat to drivers day and night. Our system is designed to detect and record vehicles passing over the railway crossing after the red light has been activated. Sparse optical flow in conjunction with motion clustering is used for real-time motion detection in order to capture these safety-critical events. The cameras are activated by an electrical signal from the railway when the red light turns on. If they detect a vehicle moving over the stopping line, and it is well over this limit, an image sequence is recorded and stored onboard for later evaluation. The system has been designed to be operational in all weather conditions, delivering human-readable license plate images even under the worst illumination conditions, such as direct incident sunlight or a direct view into vehicle headlights. After several months of operation in the field, we report on the performance of the system, its hardware implementation, and the implementation of algorithms which have proven to be usable in this real-world application.
Algorithm design for automated transportation photo enforcement camera image and video quality diagnostic check modules
Ajay Raghavan, Bhaskar Saha
Photo enforcement devices for traffic rules such as red lights, tolls, stops, and speed limits are increasingly being deployed in cities and counties around the world to ensure smooth traffic flow and public safety. These are typically unattended fielded systems, so it is important to periodically check them for potential image/video quality problems that might interfere with their intended functionality. There is interest in automating such checks to reduce the operational overhead and human error involved in manually checking large camera device fleets. Examples of problems affecting such camera devices include exposure issues, focus drifts, obstructions, misalignment, download errors, and motion blur. Furthermore, in addition to the sub-algorithms for individual problems, one also has to carefully design the overall algorithm and logic to check for and accurately classify these individual problems. Some of these issues can occur in tandem or can be confused for each other by automated algorithms. Examples include camera misalignment that can cause some scene elements to go out of focus in wide-area scenes, or download errors that can be misinterpreted as an obstruction. Therefore, the sequence in which the sub-algorithms are applied is also important. This paper presents an overview of these problems along with no-reference and reduced-reference image and video quality solutions to detect and classify such faults.
Transportation Imaging II
Using visual analytics model for pattern matching in surveillance data
In a persistent surveillance system, a huge amount of data is collected continuously and significant details are labeled for future reference. In this paper, a method to summarize video data by identifying events based on this tagged information is explained, leading to a concise description of behavior within a section of extended recordings. Efficient retrieval of various events thus becomes the foundation for determining patterns in surveillance system observations, both in their extended and fragmented versions. Patterns consisting of spatiotemporal semantic content are extracted and classified by applying video data mining to the generated ontology, and can be matched based on the analyst's interests and the rules set forth for decision making. The extraction and classification method proposed in this paper uses query by example for retrieving similar events containing relevant features, and is carried out by data aggregation. Since structured data forms the majority of surveillance information, this Visual Analytics model employs a KD-tree approach to group patterns across space and time, making it convenient to identify and match any abnormal burst of pattern detected in a surveillance video. Several experimental videos were presented to viewers for independent analysis, and their findings were compared with the results obtained in this paper to demonstrate the efficiency and effectiveness of the proposed technique.
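A minimal sketch of the KD-tree grouping idea, assuming SciPy and a simple numeric descriptor per tagged event; the descriptor layout and the search radius are illustrative stand-ins, not the paper's ontology-derived features.

```python
import numpy as np
from scipy.spatial import cKDTree

# One row per tagged event: e.g. (x, y, t, plus semantic features).
rng = np.random.default_rng(0)
events = rng.normal(size=(5000, 5))
tree = cKDTree(events)

# "Query by example": retrieve events similar to a chosen one.
query = events[42]
similar = tree.query_ball_point(query, r=0.8)
print(f"{len(similar)} events match the example pattern")
```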
Situation exploration in a persistent surveillance system with multidimensional data
There is an emerging need for fusing hard and soft sensor data in an efficient surveillance system to provide accurate estimation of situation awareness. These mostly abstract, multi-dimensional and multi-sensor data pose a great challenge to the user in performing analysis of multi-threaded events efficiently and cohesively. To address this concern, an interactive Visual Analytics (VA) application is developed for rapid assessment and evaluation of different hypotheses based on context-sensitive ontology spawned from taxonomies describing human/human and human/vehicle/object interactions. A methodology is described here for generating relevant ontology in a Persistent Surveillance System (PSS), demonstrating how it can be utilized in the context of PSS to track and identify group activities pertaining to potential threats. The proposed VA system allows for visual analysis of raw data as well as metadata that have spatiotemporal representation and content-based implications. Additionally, a technique for rapid search of tagged information, contingent on ranking and confidence, is explained for the analysis of multi-dimensional data. Lastly, the issue of uncertainty associated with processing and interpretation of heterogeneous data is also addressed.
Vehicle classification for road tunnel surveillance
Andrés Frías-Velázquez, Peter Van Hese, Aleksandra Pižurica, et al.
Vehicle classification for tunnel surveillance aims to not only retrieve vehicle class statistics, but also prevent accidents by recognizing vehicles carrying dangerous goods. In this paper, we describe a method to classify vehicle images that experience different geometrical variations and challenging photometrical conditions such as those found in road tunnels. Unlike previous approaches, we propose a classification method that does not rely on the length and height estimation of the vehicles. Alternatively, we propose a novel descriptor based on trace transform signatures to extract salient and non-correlated information of the vehicle images. Also, we propose a metric that measures the complexity of the vehicles’ shape based on corner point detection. As a result, these features describe the vehicle’s appearance and shape complexity independently of the scale, pose, and illumination conditions. Experiments with vehicles captured from three different cameras confirm the saliency and robustness of the features proposed, achieving an overall accuracy of 97.5% for the classification of four different vehicle classes. For vehicles transporting dangerous goods, our classification scheme achieves an average recall of 97.6% at a precision of 98.6% for the combination of lorries and tankers, which is a very good result considering the scene conditions.
Transportation Imaging III
Vehicle presence analysis for law enforcement applications and parking lot management
The efficient and robust detection of the presence of vehicles in restricted parking areas is important for applications in law enforcement as well as for the enforcement of parking rules on private property. We present our work towards this goal, aimed at the application of vehicle detection in urban environments. The method is designed for smart cameras, which have to operate autonomously over extended periods of time. Our system is developed as part of a larger research effort which combines on-site vehicle presence detection with an associated web management system intended to monitor, steer and reroute delivery vehicles.
Video-based parking occupancy detection
Michael Deruytter, Kevin Anckaert
Outdoor vehicle detection is one of the most challenging tasks in the world of Intelligent Transportation Systems (ITS). Today, a substantial amount of different technologies such as video detection, microwave radar detection and electromagnetic loop detection are used worldwide to detect vehicles for ITS applications such as traffic signal control, traffic flow measurement, automatic incident detection and many more. A growth in parking management applications has resulted in technologies being adapted to respond to different environments with different needs. This paper will present a video-based approach to detect vehicles at parking areas.
Motorcycle detection and counting using stereo camera, IR camera, and microphone array
Bo Ling, David R. P. Gibson, Dan Middleton
Detection, classification, and characterization are the key to enhancing motorcycle safety, motorcycle operations and motorcycle travel estimation. Average motorcycle fatalities per Vehicle Mile Traveled (VMT) are currently estimated at 30 times those of auto fatalities. Although it has been an active research area for many years, motorcycle detection still remains a challenging task. Working with FHWA, we have developed a hybrid motorcycle detection and counting system using a suite of sensors including a stereo camera, a thermal IR camera and a unidirectional microphone array. The IR thermal camera can capture the unique thermal signatures associated with the motorcycle's exhaust pipes, which often show up as bright elongated blobs in IR images. The stereo camera in the system is used to detect the motorcyclist, who can be easily windowed out in the stereo disparity map: if the motorcyclist is detected through 3D body recognition, the motorcycle is detected. Microphones are used to detect motorcycles, which often produce low-frequency acoustic signals. All three microphones in the microphone array are placed in strategic locations on the sensor platform to minimize interference from background noise sources such as rain and wind. Field test results show that this hybrid motorcycle detection and counting system has excellent performance.
Vehicle-triggered video compression/decompression for fast and efficient searching in large video databases
Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. Performing a rapid search for a specific vehicle within a large database of compressed videos is often required and can be time-critical in life-or-death situations. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured on a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic conditions.
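The triggering logic can be sketched as follows; the minimum-gap rule and the `decode_frame` and `match` callables are hypothetical stand-ins for the codec and the vehicle matcher, not the authors' implementation.

```python
def select_reference_frames(detections, min_gap=30):
    """Choose I-frame indices at vehicle-detection events.

    `detections` is a per-frame boolean sequence from a trigger-zone
    detector. A minimum gap bounds the GOP size so the stream stays
    decodable; the gap value here is an illustrative assumption.
    """
    refs, last = [], -min_gap
    for i, hit in enumerate(detections):
        if hit and i - last >= min_gap:
            refs.append(i)
            last = i
    return refs

def search_compressed(refs, decode_frame, match):
    """Search only the reference frames, never the full sequence."""
    return [i for i in refs if match(decode_frame(i))]
```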
Transportation Imaging IV
Vehicle occupancy detection camera position optimization using design of experiments and standard image references
Peter Paul, Martin Hoover, Mojgan Rabbani
Camera positioning and orientation is important to applications in domains such as transportation, since the objects to be imaged vary greatly in shape and size. In a typical transportation application that requires capturing still images, inductive loops buried in the ground or laser trigger sensors are used to trigger the image capture system when a vehicle reaches the image capture zone. The camera in such a system is in a fixed position pointed at the roadway and at a fixed orientation. Thus the problem is to determine the optimal location and orientation of the camera when capturing images from a wide variety of vehicles. Methods from Design for Six Sigma, including identifying important parameters and noise sources and performing systematically designed experiments (DOE), can be used to determine an effective set of parameter settings for the camera position and orientation under these conditions. In the transportation application of high occupancy vehicle lane enforcement, the number of passengers in the vehicle is to be counted. Past work has described front-seat vehicle occupant counting using a camera mounted on an overhead gantry looking through the front windshield in order to capture images of vehicle occupants. However, viewing rear-seat passengers is more problematic due to obstructions including the vehicle body frame structures and seats. One approach is to view the rear seats through the side window. In this situation, the problem of optimally positioning and orienting the camera to adequately capture the rear seats through the side window can be addressed through a designed experiment. In any automated traffic enforcement system it is necessary for humans to be able to review any automatically captured digital imagery in order to verify detected infractions. Thus, to define an output to be optimized for the designed experiment, a human-defined standard image reference (SIR) was used to quantify the quality of the line-of-sight to the rear seats of the vehicle. The DOE-SIR method was exercised for determining the optimal camera position and orientation for viewing vehicle rear seats over a variety of vehicle types. The resulting camera geometry was used on public roadway image capture, resulting in over 95% acceptable rear seat images for human viewing.
Detection of vehicle occupants in HOV lanes: exploration of image sensing for detection of vehicle occupants
Wayne Daley, Colin Usher, Omar Arif, et al.
One technique to better utilize existing roadway infrastructure is the use of HOV and HOT lanes. Technology to monitor the use of these lanes would assist managers and planners in efficient roadway operation. There are no available occupancy detection systems that perform at acceptable levels of accuracy in permanent field installations. The main goal of this research effort is to assess the possibility of determining passenger use with imaging technology. This is especially challenging because of recent changes in the glass types used by car manufacturers to reduce the solar heat load on the vehicles. We describe in this research a system to use multi-plane imaging with appropriate wavelength selection for sensing passengers in the front and rear seats of vehicles travelling in HOV/HOT lanes. The process of determining the geometric relationships needed, the choice of illumination wavelengths, and the appropriate sensors are described, taking into account driver safety considerations. The paper will also cover the design and implementation of the software for performing the window detection and people counting utilizing both image processing and machine learning techniques. The integration of the final system prototype will be described along with the performance of the system operating at a representative location.
Joint histogram between color and local extrema patterns for object tracking
Subrahmanyam Murala, Q.M. Jonathan Wu, R. Balasubramanian, et al.
In this paper, a new algorithm for object tracking is proposed using local extrema patterns (LEP) and color features. The standard local binary pattern (LBP) encodes the relationship between a reference pixel and its surrounding neighbors by comparing gray-level values. The proposed method differs from the existing LBP in that it extracts edge information based on local extrema between the center pixel and its neighbors in an image. Further, a joint histogram between the RGB color channels and the LEP patterns is built and used as the feature vector for object tracking. The performance of the proposed method is compared with that of Ning et al. on three benchmark video sequences. The results show that the proposed method achieves a significant improvement in object tracking compared to the method of Ning et al.
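A simplified sketch of the two building blocks: a local-extrema code and its joint histogram with a quantised colour channel. The exact LEP encoding and the RGB binning in the paper may differ; this only illustrates the general construction.

```python
import numpy as np

def local_extrema_pattern(gray):
    """Simplified LEP: for the four neighbour pairs (0, 45, 90, 135 deg),
    set a bit when the centre pixel is a local extremum along that line."""
    c = gray[1:-1, 1:-1]
    pairs = [(gray[1:-1, 2:], gray[1:-1, :-2]),    # 0 degrees
             (gray[:-2, 2:], gray[2:, :-2]),       # 45 degrees
             (gray[:-2, 1:-1], gray[2:, 1:-1]),    # 90 degrees
             (gray[:-2, :-2], gray[2:, 2:])]       # 135 degrees
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (a, b) in enumerate(pairs):
        extremum = ((c > a) & (c > b)) | ((c < a) & (c < b))
        code |= extremum.astype(np.uint8) << bit
    return code                                    # values 0..15

def joint_histogram(code, channel, color_bins=8):
    """Joint histogram of LEP codes and one quantised colour channel."""
    q = (channel[1:-1, 1:-1].astype(int) * color_bins) // 256
    hist, _, _ = np.histogram2d(code.ravel(), q.ravel(),
                                bins=[16, color_bins])
    return hist / max(hist.sum(), 1.0)
```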
Transportation Imaging V
Adaptive real-time road detection using VRay and A-MSRG in complex environments
SunHee Weon, SungIl Joo, HyungIl Choi
This paper proposes an adaptive detection method for detecting road regions that have ambiguous boundaries within natural images. The proposed method achieves reliable partitioning of the road region within a natural environment where noise is present through the following two stages. In the first stage, we separate out candidate regions of the road by detecting the road's boundary through the radial region split method using VRay (vanishing-point-constrained rays). In the second stage, we apply the so-called Adaptive Multiple Seed Region Growing (A-MSRG) approach to the separated candidate region in order to identify the road region in real time. The A-MSRG is an enhanced version of Seed Region Growing (SRG). For performance evaluation, this study assessed efficiency based on the results of region detection achieved through the proposed combination of the radial region split method and A-MSRG. We also conducted comparisons against the existing SRG and MSRG methods to confirm the validity of the proposed method.
Intensity estimation method of LED array for visible light communication
Takanori Ito, Tomohiro Yendo, Shintaro Arai, et al.
This paper focuses on a road-to-vehicle visible light communication (VLC) system using an LED traffic light as the transmitter and a camera as the receiver. The traffic light is composed of a hundred LEDs on a two-dimensional plane. In this system, data is sent as two-dimensional brightness patterns by controlling each LED of the traffic light individually, and the patterns are received as images by the camera. A problem arises in that neighboring LEDs in the received image are merged, either because there are too few pixels when the receiver is distant from the transmitter, or because of blurring from defocus of the camera. As a result, the bit error rate (BER) increases due to errors in recognizing the intensity of the LEDs. To solve this problem, we propose a method that estimates the intensity of the LEDs by solving the inverse problem of the communication channel characteristic from the transmitter to the receiver. The proposed method is evaluated by BER characteristics obtained by computer simulation and experiments. The results show that the proposed method estimates intensities more accurately than conventional methods, especially when the received image is heavily blurred and the number of pixels is small.
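The estimation step can be sketched as a regularised linear inverse problem, assuming a channel matrix H that maps LED intensities to observed pixels; H is random here purely for illustration, whereas in practice it would be measured or modelled from the camera's point-spread function.

```python
import numpy as np

def estimate_led_intensities(observed, H, reg=1e-3):
    """Estimate transmitted LED intensities x from the blurred,
    low-resolution observation y by inverting the model y = H x.

    Tikhonov regularisation keeps the inversion stable when the
    received image is heavily blurred (an ill-conditioned H)."""
    A = H.T @ H + reg * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ observed)

# Illustrative run: 100 LEDs observed through a random "channel".
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100).astype(float)   # on/off LED pattern
H = rng.uniform(size=(400, 100))                 # pixels x LEDs
x_hat = estimate_led_intensities(H @ x, H)
```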
An improved background segmentation method for ghost removals
Video surveillance has become common for the maintenance of security in a wide variety of applications. However, the increasingly large amounts of data produced from multiple video camera feeds are making it increasingly difficult for human operators to monitor the imagery for activities likely to give rise to threats. This has led to the development of automated surveillance systems that can detect, track and analyze video sequences both online and offline and report potential security risks. Segmentation of objects is an important part of such systems, and numerous background segmentation techniques have been reported in the literature. One common challenge faced by these techniques is adaptation to different lighting environments. A new improved background segmentation technique is presented in this paper, where the main focus is to accurately segment potentially important objects by reducing the overall false detection rate. Historic edge maps and tracking results are analyzed for this purpose. The idea is to obtain an up-to-date edge map of the segmented region highlighted as foreground and compare it with the stored results. The edge maps are obtained using a novel adaptive edge-orientation-based technique. Experimental results show that the discussed technique achieves over 85% matching accuracy even under severe lighting changes.
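One simple proxy for the edge-map check is sketched below: a foreground blob with no fresh edges relative to the stored background edge map is rejected as a likely ghost. Canny and the ratio threshold are used here purely for illustration; the paper's technique is adaptive and orientation-based rather than plain edge overlap.

```python
import cv2
import numpy as np

def is_ghost(frame_gray, bg_gray, fg_mask, min_edge_ratio=0.2):
    """Reject 'ghost' foreground blobs whose region contains almost no
    edges that are absent from the stored background edge map."""
    cur_edges = cv2.Canny(frame_gray, 50, 150) > 0
    bg_edges = cv2.Canny(bg_gray, 50, 150) > 0
    region = fg_mask > 0
    fresh = cur_edges & ~bg_edges & region        # edges new to the scene
    denom = max(int(np.count_nonzero(cur_edges & region)), 1)
    return np.count_nonzero(fresh) / denom < min_edge_ratio
```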
Video Analytics for Retail
Retail video analytics: an overview and survey
Jonathan Connell, Quanfu Fan, Prasad Gabbur, et al.
Today retail video analytics has gone beyond the traditional domain of security and loss prevention by providing retailers insightful business intelligence such as store traffic statistics and queue data. Such information allows for enhanced customer experience, optimized store performance, reduced operational costs, and ultimately higher profitability. This paper gives an overview of various camera-based applications in retail as well as the state-of-the-art computer vision techniques behind them. It also presents some of the promising technical directions for exploration in retail video analytics.
Video-CRM: understanding customer behaviors in stores
Ismail Haritaoglu, Myron Flickner, David Beymer
This paper describes two real-time computer vision systems created 10 years ago that detect and track people in stores to obtain insights into customer behavior while shopping. The first system uses a single color camera to identify shopping groups in the checkout line. Shopping groups are identified by analyzing inter-body distances coupled with the cashier's activities to detect checkout transaction start and end times. The second system uses multiple overhead narrow-baseline stereo cameras to detect and track people, their body posture and body parts to understand customer interactions with products, such as a customer picking a product from a shelf. In pilot studies, both systems demonstrated real-time performance and sufficient accuracy to enable more detailed understanding of customer behavior and to extract actionable real-time retail analytics.
Human object articulation for CCTV video forensics
In this paper we present a system which is focused on developing algorithms for automatic annotation/articulation of humans passing through a surveillance camera in a way useful for describing a person/criminal by a crime scene witness. Each human is articulated/annotated based on two appearance features: 1. primary colors of clothes in the head, body and legs region. 2. presence of text/logo on the clothes. The annotation occurs after a robust foreground extraction based on a modified approach to Gaussian Mixture model and detection of human from segmented foreground images. The proposed pipeline consists of a preprocessing stage where we improve color quality of images using a basic color constancy algorithm and further improve the results using a proposed post-processing method. The results show a significant improvement to the illumination of the video frames. In order to annotate color information for human clothes, we apply 3D Histogram analysis (with respect to Hue, Saturation and Value) on HSV converted image regions of human body parts along with extrema detection and thresholding to decide the dominant color of the region. In order to detect text/logo on the clothes as another feature to articulate humans, we begin with the extraction of connected components of enhanced horizontal, vertical and diagonal edges in the frames. These candidate regions are classified as text or non-text on the bases of their Local Energy based Shape Histogram (LESH) features combined with KL divergence as classification criteria. To detect humans, a novel technique has been proposed that uses a combination of Histogram of Oriented Gradients (HOG) and Contourlet transform based Local Binary Patterns (LBP) with Adaboost as classifier. Initial screening of foreground objects is performed by using HOG features. To further eliminate the false positives due to noise form background and improve results, we apply Contourlet-LBP feature extraction on the images. In the proposed method, we extract the LBP feature descriptor for Contourlet transformed high pass sub-images from vertical and diagonal directional bands. In the final stage, extracted Contourlet-LBP descriptors are applied to Adaboost for classification. The proposed frame work showed fairly fine performance when tested on a CCTV test dataset.