Proceedings Volume 9407

Video Surveillance and Transportation Imaging Applications 2015

Robert P. Loce, Eli Saber

Volume Details

Date Published: 12 March 2015
Contents: 9 Sessions, 32 Papers, 0 Presentations
Conference: SPIE/IS&T Electronic Imaging 2015
Volume Number: 9407

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 9407
  • Transportation Imaging I
  • Transportation Imaging II
  • Transportation Imaging Applications for Pedestrian Detection and Automotive Safety
  • Surveillance Imaging I
  • Surveillance Imaging II
  • Surveillance Imaging Applications
  • Interaction Models, Surveillance Systems, and Colorization Applications
  • Interactive Paper Session
Front Matter: Volume 9407
This PDF file contains the front matter associated with SPIE Proceedings Volume 9407 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
Transportation Imaging I
Road user tracker based on robust regression with GNC and preconditioning
Andreas Leich, Marek Junghans, Karsten Kozempel, et al.
This paper presents an early-vision tracking algorithm particularly adapted to tracking road users in video image sequences. The algorithm is an enhanced version of the Lucas-Kanade-style regression-based motion estimator. Robust regression algorithms work in the presence of outliers; a distinct property of the proposed algorithm is that it can handle datasets containing up to 90% outliers. Robust regression involves finding the global minimum of a cost function, where the cost function measures how well the motion model conforms to the measured data. The minimization task can be addressed with the graduated non-convexity (GNC) heuristic, a scale-space analysis of the cost function in parameter space. Although the approach is elegant and reasonable, several past attempts reported in the literature to use GNC for solving robust regression tasks have failed. The main improvement of the proposed method over prior approaches is the use of a preconditioning technique that prevents GNC from getting stuck in a local minimum.
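The GNC idea can be illustrated with a small numerical sketch. The following is not the authors' tracker (which operates on image motion and adds preconditioning), but a generic robust line fit under a Lorentzian (Cauchy) cost: the scale parameter `mu` starts large, making the cost nearly convex, and is gradually reduced while a weighted least-squares problem is re-solved at each step.

```python
# Generic GNC sketch: fit y = a*x + b despite 70% gross outliers by
# annealing the scale of a Lorentzian robust cost via IRLS.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
y = 2.0 * x + 0.5 + rng.normal(0, 0.05, n)      # true line: slope 2, offset 0.5
outliers = rng.random(n) < 0.7                   # 70% gross outliers
y[outliers] = rng.uniform(-4, 4, outliers.sum())

A = np.column_stack([x, np.ones(n)])
theta = np.linalg.lstsq(A, y, rcond=None)[0]     # non-robust initialization
for mu in [64.0, 16.0, 4.0, 1.0, 0.25, 0.05, 0.01]:   # graduated schedule
    for _ in range(20):
        r = A @ theta - y
        w = np.sqrt(mu / (mu + r**2))            # sqrt of Lorentzian weight
        theta = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)[0]

slope, intercept = theta
```

With unstructured outliers, the early near-convex stages keep the estimate in the right basin, so the final small-`mu` stage locks onto the inlier line.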
Vehicle speed estimation using a monocular camera
Wencheng Wu, Vladimir Kozitsky, Martin E. Hoover, et al.
In this paper, we describe a speed estimation method for individual vehicles using a monocular camera. The system includes the following: (1) object detection, which detects an object of interest based on a combination of motion detection and object classification and initializes tracking of the object if detected, (2) object tracking, which tracks the object over time based on template matching and reports its frame-to-frame displacement in pixels, (3) speed estimation, which estimates vehicle speed by converting pixel displacements to distances traveled along the road, (4) object height estimation, which estimates the distance from tracked point(s) of the object to the road plane, and (5) speed estimation with height-correction, which adjusts previously estimated vehicle speed based on estimated object and camera heights. We demonstrate the effectiveness of our algorithm on 30/60 fps videos of 300 vehicles travelling at speeds ranging from 30 to 60 mph. The 95th-percentile speed estimation error was within ±3% when compared to a lidar-based reference instrument. Key contributions of our method include (1) tracking a specific set of feature points of a vehicle to ensure a consistent measure of speed, (2) a high-accuracy camera calibration/characterization method, which does not interrupt regular traffic at the site, and (3) a license plate and camera height estimation method for improving the accuracy of individual vehicle speed estimation. Additionally, we examine the impact of spatial resolution on the accuracy of speed estimation and utilize that knowledge to improve computational efficiency. We also improve the accuracy and efficiency of tracking over standard methods via dynamic update of templates and predictive local search.
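The displacement-to-speed conversion and height correction can be sketched with a simplified similar-triangles model (all numbers below are hypothetical, and this is not the paper's calibration procedure): a tracked point such as a license plate sits above the road plane, so its apparent ground speed must be scaled down by the ratio of camera-to-point and camera-to-road heights.

```python
# Simplified sketch of pixel-displacement -> road-speed conversion with
# a height correction for a tracked point above the road plane.
def speed_mph(disp_px, px_per_meter, fps, cam_h_m, pt_h_m):
    v_mps = disp_px / px_per_meter * fps       # apparent speed in m/s
    v_mps *= (cam_h_m - pt_h_m) / cam_h_m      # project onto the road plane
    return v_mps * 2.23694                     # m/s -> mph

# Hypothetical numbers: 18 px/frame at 25 px/m, 30 fps, 6 m camera,
# license plate 0.5 m above the road.
v = speed_mph(disp_px=18.0, px_per_meter=25.0, fps=30, cam_h_m=6.0, pt_h_m=0.5)
# v is about 44.3 mph
```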
Detecting and extracting identifiable information from vehicles in videos
This paper presents a system to detect and extract identifiable information such as license plates, make, model, color, and bumper stickers present on vehicles. The goal of this work is to develop a system that automatically describes a vehicle just as a person would. This information can be used to improve traffic surveillance systems. The presented solution relies on efficient segmentation and the structure of license plates to identify and extract information from vehicles. The system was evaluated on videos captured on Florida highways and is expected to work in other regions with little or no modification. Results show that the license plate was successfully segmented in 92% of the cases, the make and model of the car were segmented in 93% of the cases, and bumper stickers were segmented in 92.5% of the cases. Overall recognition accuracy was 87%.
Efficient integration of spectral features for vehicle tracking utilizing an adaptive sensor
Burak Uzkent, Matthew J. Hoffman, Anthony Vodacek
Object tracking in urban environments is an important and challenging problem that is traditionally tackled using visible and near-infrared wavelengths. By adding extended data such as spectral features of the objects, one can improve the reliability of the identification process. However, the huge increase in data created by hyperspectral imaging is usually prohibitive. To overcome this complexity problem, we propose a persistent air-to-ground target tracking system inspired by a state-of-the-art adaptive, multi-modal sensor. The adaptive sensor is capable of providing panchromatic images as well as the spectra of desired pixels, which addresses the data challenge of hyperspectral tracking by recording spectral data only as needed. Spectral likelihoods are integrated into a data association algorithm in a Bayesian fashion to minimize the likelihood of misidentification. A framework for controlling spectral data collection is developed by incorporating motion segmentation information and prior information from Gaussian Sum filter (GSF) movement predictions drawn from a multi-model forecasting set. An intersection mask of the surveillance area is extracted from OpenStreetMap data and incorporated into the tracking algorithm to perform online refinement of the multiple-model set. The proposed system is tested using challenging and realistic scenarios generated in an adverse environment.
Transportation Imaging II
Detection and recognition of road markings in panoramic images
Cheng Li, Ivo M. Creusen, Lykele Hazelhoff, et al.
Detection of road lane markings is attractive for practical applications such as advanced driver assistance systems and road maintenance. This paper proposes a system to detect and recognize road lane markings in panoramic images. The system can be divided into four stages. First, an inverse perspective mapping is applied to the original panoramic image to generate a top-view road image, in which the potential road markings are segmented based on their intensity difference compared to the surrounding pixels. Second, a feature vector of each potential road marking segment is extracted by calculating the Euclidean distance between the center and the boundary at regular angular steps. Third, the shape of each segment is classified using a Support Vector Machine (SVM). Finally, by modeling the lane markings, previously falsely detected segments can be rejected based on their orientation and position relative to the lane markings. Our experiments show that the system is promising and capable of recognizing 93%, 95% and 91% of striped line segments, blocks and arrows respectively, as well as 94% of the lane markings.
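The second stage's shape descriptor can be sketched as follows. This is our own simplification, not the paper's code: the distance from a segment's centroid to its boundary is sampled at regular angular steps by marching outward over a binary mask, producing a radial signature that a classifier such as an SVM can consume.

```python
# Radial-signature sketch: centroid-to-boundary distance at n_angles
# regular angular steps over a binary segment mask.
import numpy as np

def radial_signature(mask, n_angles=32):
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                 # segment centroid
    sig = []
    for a in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        r = 0.0
        while True:                               # march outward until leaving the mask
            y = int(round(cy + r * np.sin(a)))
            x = int(round(cx + r * np.cos(a)))
            if not (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]) or not mask[y, x]:
                break
            r += 0.5
        sig.append(r)
    return np.array(sig)

# Sanity check on a synthetic disk of radius 20: the signature is
# approximately constant, as rotation-symmetric shapes should be.
yy, xx = np.mgrid[0:61, 0:61]
disk = (yy - 30) ** 2 + (xx - 30) ** 2 <= 20 ** 2
sig = radial_signature(disk)
```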
Topview stereo: combining vehicle-mounted wide-angle cameras to a distance sensor array
Sebastian Houben
The variety of vehicle-mounted sensors required to fulfill a growing number of driver-assistance tasks has become a substantial factor in automobile manufacturing cost. We present a stereo distance method exploiting the overlapping fields of view of a multi-camera fisheye surround-view system, such as those used for near-range vehicle surveillance tasks, e.g. in parking maneuvers. Hence, we aim at creating a new input signal from sensors that are already installed. Particular properties of wide-angle cameras (e.g. strongly varying resolution) demand an adaptation of the image processing pipeline to several problems that do not arise in classical stereo vision performed with cameras carefully designed for this purpose. We introduce the algorithms for rectification, correspondence analysis, and regularization of the disparity image, discuss the causes of the caveats shown and how to avoid them, and present first results on a prototype topview setup.
A machine learning approach for detecting cell phone usage
Cell phone usage while driving is common but widely considered dangerous due to the distraction it causes the driver. Because of the high number of accidents related to cell phone usage while driving, several states have enacted regulations that prohibit driver cell phone usage while driving. However, enforcing these regulations currently requires dispatching law enforcement officers at the roadside to visually examine incoming cars, or having human operators manually examine image/video records to identify violators. Both of these practices are expensive, difficult, and ultimately ineffective. Therefore, there is a need for a semi-automatic or automatic solution to detect driver cell phone usage. In this paper, we propose a machine-learning-based method for detecting driver cell phone usage using a camera system directed at the vehicle's front windshield. The developed method consists of two stages: first, the frontal windshield region is localized using a deformable part model (DPM); next, a Fisher vector (FV) representation is used to classify the driver's side of the windshield into cell-phone-usage violation and non-violation classes. The proposed method achieved about 95% accuracy on a data set of more than 100 images with drivers in a variety of challenging poses, with or without cell phones.
Driver alertness detection using Google Glasses
Kuang-Yu Liu, Chung-Lin Huang
This paper proposes an intelligent vehicle system to monitor driver behavior. Based on first-person vision (FPV) technology (Google Glass), our system can observe the vehicle's exterior/interior scene from the driver's viewpoint and estimate the driver's gazing direction. First, we use a "bag of words" image classification approach, applying the FAST detector and BRIEF feature descriptor to the dataset. Then, we use a vocabulary dictionary to encode an input image as a feature vector. Finally, we apply an SVM classifier to identify whether the input image is a vehicle-interior scene, in order to monitor the driver's attention. Second, we find correspondences between the images from the Google Glass and a camera mounted on the windshield of the vehicle to estimate the driver's gazing direction. Experiments illustrate the effectiveness of our system.
Transportation Imaging Applications for Pedestrian Detection and Automotive Safety
Close to real-time robust pedestrian detection and tracking
Y. Lipetski, G. Loibner, O. Sidla
Fully automated video-based pedestrian detection and tracking is a challenging task with many practical and important applications. We present our work aimed at robust and simultaneously close-to-real-time tracking of pedestrians. The presented approach is robust to occlusions and varying lighting conditions and is generalized to arbitrary video data. The core tracking approach is built upon the tracking-by-detection principle. We describe our cascaded HOG detector with successive CNN verification in detail. For the tracking and re-identification tasks, we performed an extensive analysis of appearance-based features as well as their combinations. The tracker was tested on many hours of video data covering different scenarios; the results are presented and discussed.
Development of a portable bicycle/pedestrian monitoring system for safety enhancement
Pedestrians involved in roadway accidents account for nearly 12 percent of all traffic fatalities and 59,000 injuries each year. Most injuries occur when pedestrians attempt to cross roads, and there are noted differences in accident rates midblock vs. at intersections. Collecting data on pedestrian behavior is a time-consuming manual process that is prone to error. This leads to a lack of quality information to guide the proper design of lane markings and traffic signals for enhancing pedestrian safety. Researchers at the Georgia Tech Research Institute are developing and testing an automated system that can be rapidly deployed for data collection to support the analysis of pedestrian behavior at intersections and midblock crossings with and without traffic signals. This system analyzes the collected video data to automatically identify and characterize the number of pedestrians and their behavior. It consists of a mobile trailer with four high-definition pan-tilt cameras for data collection. The software is custom designed and uses state-of-the-art commercial pedestrian detection algorithms. We present the system hardware and software design, challenges, and results from preliminary system testing. Preliminary results indicate the ability to provide representative quantitative data on pedestrian motion more efficiently than current techniques.
Active gated imaging for automotive safety applications
Yoav Grauer, Ezri Sonn
The paper presents the Active Gated Imaging System (AGIS) in relation to the automotive field. AGIS is based on a fast gated camera equipped with a unique gated-CMOS sensor and a pulsed illuminator, synchronized in the time domain to record images of a certain range of interest, which are then processed by real-time computer vision algorithms. In recent years we have learned which system parameters are most beneficial to night-time driving in terms of field of view, illumination profile, resolution, and processing power. AGIS also provides day-time imaging with additional capabilities that enhance computer vision safety applications. AGIS is an excellent candidate for camera-based Advanced Driver Assistance Systems (ADAS) and, in the future, a path toward autonomous driving, based on its outstanding low/high light-level performance, harsh-weather capabilities, and 3D growth potential.
Surveillance Imaging I
Arbitrary object localization and tracking via multiple-camera surveillance system embedded in a parking garage
André Ibisch, Sebastian Houben, Matthias Michael, et al.
We present a multiple-camera surveillance system installed in a parking garage to detect arbitrary moving objects. Our system is real-time capable and computes precise and reliable object positions. These objects are tracked to warn of collisions, e.g. between vehicles and pedestrians or between vehicles. The proposed system is based on multiple grayscale cameras connected by a local area network.

Each camera shares its field of view with other cameras to handle occlusions and to enable multi-view vision. We aim at using already installed hardware found in many modern public parking garages. The system's pipeline starts with the synchronized image capturing process, performed separately for each camera. In the next step, moving objects are selected by a foreground segmentation approach. Subsequently, the foreground objects from a single camera are transformed into view rays in a common world coordinate system and are combined to obtain plausible object hypotheses. This transformation requires a one-time initial intrinsic and extrinsic calibration beforehand. Afterwards, these view rays are filtered temporally to arrive at continuous object tracks. In our experiments we used a precise LIDAR-based reference system to evaluate and quantify the proposed system's precision, obtaining a mean localization accuracy of 0.24 m across different scenarios.
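The view-ray fusion step can be sketched with generic geometry (not the authors' code): each calibrated camera contributes a ground-plane ray (origin and bearing toward the object), and a least-squares intersection is obtained from the perpendicular-projection normal equations.

```python
# Least-squares intersection of 2-D ground-plane view rays from
# multiple cameras: solve sum_i (I - d_i d_i^T)(p - o_i) = 0 for p.
import numpy as np

def intersect_rays(origins, dirs):
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for o, d in zip(origins, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(2) - np.outer(d, d)   # projector onto the ray's normal
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Two cameras at (0,0) and (10,0) looking at the same object: the rays
# (t, t) and (10 - s, s) intersect at (5, 5).
p = intersect_rays([np.array([0.0, 0.0]), np.array([10.0, 0.0])],
                   [np.array([1.0, 1.0]), np.array([-1.0, 1.0])])
```

With more than two cameras the same formula returns the point minimizing the summed squared distances to all rays, which is what makes multi-view fusion robust to a single noisy bearing.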
Unsupervised classification and visual representation of situations in surveillance videos using slow feature analysis for situation retrieval applications
Today, video surveillance systems produce thousands of terabytes of data. This source of information can be very valuable, as it contains spatio-temporal information about abnormal, similar or periodic activities. However, a search for certain situations or activities in unstructured large-scale video footage can be exhausting or even pointless. Searching surveillance video footage is extremely difficult due to the apparent similarity of situations, especially for human observers. In order to keep this amount of data manageable and hence usable, this paper aims at clustering situations regarding their visual content as well as their motion patterns. Besides standard image content descriptors like HOG, we present and investigate novel descriptors, called Franklets, which explicitly encode motion patterns for certain image regions. Slow feature analysis (SFA) is performed for dimension reduction based on the temporal variance of the features. By reducing the dimension with SFA, a higher feature discrimination can be reached compared to standard PCA dimension reduction. The effects of dimension reduction via SFA are investigated in this paper. Cluster results on real data from the Hamburg Harbour Anniversary 2014 are presented with both HOG feature descriptors and Franklets. Furthermore, we show that using SFA achieves an improvement over standard PCA techniques. Finally, an application to visual clustering with self-organizing maps is introduced.
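Linear SFA itself is compact enough to sketch. The following is the textbook linear variant as we understand it from the literature, not this paper's pipeline: whiten the input signals, then take the direction in whitened space whose temporal derivative has minimal variance.

```python
# Linear slow feature analysis: recover a slowly varying source from a
# linear mixture with a fast source, by minimizing the variance of the
# temporal difference in whitened coordinates.
import numpy as np

t = np.linspace(0, 4 * np.pi, 2000)
slow = np.sin(t)                       # hidden slow source
fast = np.sin(29 * t)                  # hidden fast source
X = np.column_stack([slow + 0.5 * fast, 0.5 * slow - fast])  # observations

X = X - X.mean(0)
C = X.T @ X / len(X)
evals, evecs = np.linalg.eigh(C)
W = evecs / np.sqrt(evals)             # whitening: Cov(X @ W) = I
Z = X @ W

dZ = np.diff(Z, axis=0)                # temporal derivative
dC = dZ.T @ dZ / len(dZ)
dvals, dvecs = np.linalg.eigh(dC)      # ascending eigenvalues
sf = Z @ dvecs[:, 0]                   # slowest feature

corr = abs(np.corrcoef(sf, slow)[0, 1])
```

The slowest feature is recovered up to sign and scale, which is why the correlation (not equality) with the hidden slow source is the natural check.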
An intelligent crowdsourcing system for forensic analysis of surveillance video
Khalid Tahboub, Neeraj Gadgil, Javier Ribera, et al.
Video surveillance systems are of great value for public safety. With an exponential increase in the number of cameras, videos obtained from surveillance systems are often archived for forensic purposes. Many automatic methods have been proposed for video analytics such as anomaly detection and human activity recognition. However, such methods face significant challenges due to object occlusions, shadows and scene illumination changes. In recent years, crowdsourcing has become an effective tool that utilizes human intelligence to perform tasks that are challenging for machines. In this paper, we present an intelligent crowdsourcing system for forensic analysis of surveillance video, including video recorded as part of search and rescue missions and large-scale investigation tasks. We describe a method to enhance crowdsourcing by incorporating human detection, re-identification and tracking. At the core of our system, we use a hierarchical pyramid model to distinguish crowd members based on their ability, experience and performance record. Our proposed system operates autonomously and produces a final output of the crowdsourcing analysis consisting of a set of video segments detailing the events of interest as one storyline.
Hierarchical video surveillance architecture: a chassis for video big data analytics and exploration
Sola O. Ajiboye, Philip Birch, Christopher Chatwin, et al.
There is increasing reliance on video surveillance systems for systematic derivation, analysis and interpretation of the data needed for predicting, planning, evaluating and implementing public safety. This is evident from the massive number of surveillance cameras deployed across public locations. For example, in July 2013, the British Security Industry Association (BSIA) reported that over 4 million CCTV cameras had been installed in Britain alone. The BSIA also revealed that only 1.5% of these are state-owned. In this paper, we propose a framework that allows access to data from privately owned cameras, with the aim of increasing the efficiency and accuracy of public safety planning, security activities, and decision support systems that are based on video-integrated surveillance systems. The accuracy of results obtained from government-owned public safety infrastructure would improve greatly if privately owned surveillance systems 'exposed' relevant video-generated metadata events, such as triggered alerts, and also permitted queries of a metadata repository. Subsequently, a police officer, for example, with an appropriate level of system permission could query unified video systems across a large geographical area such as a city or a country to predict the location of an entity of interest, such as a pedestrian or a vehicle. This becomes possible with our proposed novel hierarchical architecture, the Fused Video Surveillance Architecture (FVSA). At the high level, FVSA comprises a hardware framework supported by a multi-layer abstraction software interface. It presents video surveillance systems as an adapted computational grid of intelligent services, which is integration-enabled to communicate with other compatible systems in the Internet of Things (IoT).
Surveillance Imaging II
Gender classification in low-resolution surveillance video: in-depth comparison of random forests and SVMs
Christopher D. Geelen, Rob G. J. Wijnhoven, Gijs Dubbelman, et al.
This research considers gender classification in surveillance environments, which typically involve low-resolution images and a large amount of viewpoint variation and occlusion. Gender classification is inherently difficult due to the large intra-class variation and inter-class correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets that realistically reflect the above conditions, typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, demonstrating a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9 ± 0.2%, while classification computation time is negligible compared to the detection time of pedestrians.
Detection and handling of occlusion in an object detection system
Ron M. G. op het Veld, R. G. J. Wijnhoven, Y. Bondarev, et al.
Object detection is an important technique for video surveillance applications. Although different detection algorithms have been proposed, they all have problems detecting occluded objects. In this paper, we propose a novel system for occlusion handling and integrate it into a sliding-window detection framework using HOG features and linear classification. Occlusion handling is obtained by applying multiple classifiers, each covering a different level of occlusion and focusing on the non-occluded object parts. Experiments show that our approach, based on 17 classifiers, obtains an increase of 8% in detection performance. To limit computational complexity, we propose a cascaded implementation that increases the computational cost by only 3.4%. Although the paper presents results for pedestrian detection, our approach is not limited to this object class. Finally, our system does not need an additional training dataset covering all possible types of occlusion.
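The multiple-classifier idea can be sketched in a toy form (the weights below are made up for illustration, and the real system uses trained HOG templates): each linear classifier scores only the upper, visible fraction of a feature grid, so a detector covering, say, 50% occlusion simply ignores the hidden lower cells.

```python
# Toy occlusion-level classifiers: zero out the weights of the lower
# (assumed occluded) rows of a cell grid before taking the dot product.
import numpy as np

def make_classifier(weights, visible_rows):
    w = weights.copy()
    w[visible_rows:] = 0.0               # ignore occluded lower cells
    return lambda feat: float((w * feat).sum())

rng = np.random.default_rng(3)
w_full = rng.normal(size=(8, 4))         # 8x4 grid of cell weights
# Classifiers for 0%, 25% and 50% occlusion of an 8-row object window.
clfs = [make_classifier(w_full, rows) for rows in (8, 6, 4)]

feat = rng.normal(size=(8, 4))           # one candidate window's features
scores = [c(feat) for c in clfs]
```

In a cascade, the cheap full-object classifier runs first, and the partial classifiers are consulted only when its score is inconclusive, which is how the extra cost stays small.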
Spatio-temporal action localization for human action recognition in large dataset
Sameh Megrhi, Marwa Jmal, Azeddine Beghdadi, et al.
Human action recognition has drawn much attention in the field of video analysis. In this paper, we develop a human action detection and recognition process based on tracking the trajectories of interest points (IPs). A pre-processing step that performs spatio-temporal action detection is proposed. This step uses optical flow along with dense Speeded-Up Robust Features (SURF) in order to detect and track moving humans in moving fields of view. The video description step is based on a fusion process that combines displacement and spatio-temporal descriptors. Experiments are carried out on the large UCF-101 dataset. Experimental results reveal that the proposed techniques achieve better performance than many existing state-of-the-art action recognition approaches.
Person identification from streaming surveillance video using mid-level features from joint action-pose distribution
We propose a real-time person identification algorithm for surveillance scenarios with low-resolution streaming video, based on mid-level features extracted from the joint distribution of various types of human actions and human poses. The proposed algorithm combines an auto-encoder-based action association framework, which produces per-frame probability estimates of the action being performed, and a pose recognition framework, which gives per-frame body part locations. The main focus of this manuscript is to effectively combine these per-frame action probability estimates and pose trajectories over a short temporal window to obtain mid-level features. We demonstrate that these mid-level features capture the variation in how an action is performed by an individual and can be used to distinguish one person from the next. Preliminary analysis on the KTH action dataset, where each sequence is annotated with a specific person and a specific action, shows interesting results that verify this concept.
Scene projection by non-linear transforms to a geo-referenced map for situational awareness
Kevin C. Krucki, Vijay K. Asari
There are many transportation and surveillance cameras currently in use in major cities that are close to the ground and show scenes from a perspective point of view. It can be difficult to follow an object of interest across multiple cameras if many of these cameras are in the same area, due to their different orientations. This is especially true when compared to wide-area aerial surveillance (WAAS). To correct this problem, this research provides a method to non-linearly transform current camera perspective views into real-world coordinates that can be placed on a map. Using a perspective transformation, perspective views are transformed into approximate WAAS views and placed on a map. All images are then on the same plane, allowing a user to follow an object of interest across several camera views on a map. While these transformed images will not fit every feature of the map as WAAS images would, the most important aspects of a scene (i.e. roads, cars, people, sidewalks, etc.) are accurate enough to give the user situational awareness. Our algorithm proves successful when tested on cameras in the downtown area of Dayton, Ohio.
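A perspective transformation between an image plane and a ground-plane map is a homography, and the standard way to estimate it is the direct linear transform (DLT). The sketch below is the generic DLT, not the paper's specific calibration: four image/map point pairs determine the 3×3 matrix, which is then applied to project camera pixels onto map coordinates.

```python
# DLT homography estimation from 4 point pairs, plus projective warp.
import numpy as np

def fit_homography(src, dst):
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 3)          # null-space vector as 3x3 H

def warp(H, pts):
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:]           # homogeneous -> Cartesian

# Hypothetical correspondences: image corners -> map coordinates.
src = np.array([[0, 0], [640, 0], [640, 480], [0, 480]], float)
dst = np.array([[10, 20], [60, 18], [70, 90], [5, 85]], float)
H = fit_homography(src, dst)
mapped = warp(H, src)                    # reproduces dst exactly
```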
Surveillance Imaging Applications
A vision-based approach for tramway rail extraction
The growing traffic density in cities fuels the desire for collision assessment systems on public transportation. For this application, video analysis is broadly accepted as a cornerstone. For trams, the localization of tramway tracks is an essential ingredient of such a system, in order to estimate a safety margin for crossing traffic participants. Tramway-track detection is a challenging task due to the urban environment with clutter, sharp curves and occlusions of the track. In this paper, we present a novel and generic system to detect the tramway track ahead of the tram position. The system incorporates an inverse perspective mapping and a priori geometry knowledge of the rails to find possible track segments. The contribution of this paper is a new track reconstruction algorithm based on graph theory. To this end, we define track segments as vertices in a graph, in which edges represent feasible connections. This graph is then converted to a max-cost arborescence graph, and the best path is selected according to its location and additional temporal information based on a maximum a posteriori estimate. The proposed system clearly outperforms a railway-track detector. Furthermore, the system performance is validated on 3,600 manually annotated frames. The obtained results are promising: straight tracks are found in more than 90% of the images and complete curves are still detected in 35% of the cases.
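The graph-based reconstruction idea can be sketched in a reduced form. The toy below simplifies the paper's max-cost arborescence to a max-cost path in a DAG of track segments (vertices) with feasibility-weighted connections (edges), found by dynamic programming:

```python
# Max-cost chain of track segments in a DAG via dynamic programming.
def best_path(n, edges):
    # edges: list of (u, v, weight) with u < v, i.e. topological order.
    score = [0.0] * n
    prev = [None] * n
    for u, v, w in sorted(edges, key=lambda e: e[1]):  # process by target
        if score[u] + w > score[v]:
            score[v] = score[u] + w
            prev[v] = u
    end = max(range(n), key=lambda i: score[i])
    path = [end]
    while prev[path[-1]] is not None:                  # backtrack
        path.append(prev[path[-1]])
    return path[::-1], score[end]

# Five hypothetical segments; the chain 0-1-2-4 beats the 0-3-4 detour.
path, cost = best_path(5, [(0, 1, 2.0), (1, 2, 2.0), (0, 3, 1.0),
                           (3, 4, 1.5), (2, 4, 0.5)])
```

Sorting edges by their target vertex guarantees each vertex's score is final before it is used, the same invariant a topological order provides.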
Exploration towards the modeling of gable-roofed buildings using a combination of aerial and street-level imagery
Ivo Creusen, Lykele Hazelhoff, Peter H. N. de With
Extraction of residential building properties is helpful for numerous applications, such as computer-guided feasibility analysis for solar panel placement, determination of real-estate taxes and assessment of real-estate insurance policies. Therefore, this work explores the automated modeling of buildings with a gable roof (the most common roof type within Western Europe), based on a combination of aerial imagery and street-level panoramic images. This is a challenging task, since buildings show large variations in shape, dimensions and building extensions, and may additionally be captured under non-ideal lighting conditions. The aerial images provide a coarse overview of the building due to the large capturing distance. The building footprint and an initial estimate of the building height are extracted based on the analysis of stereo aerial images. The estimated model is then refined using street-level images, which feature higher resolution and enable more accurate measurements, although they display only a single side of the building. Initial experiments indicate that the footprint dimensions of the main building can be accurately extracted from aerial images, while the building height is extracted with slightly less accuracy. By combining aerial and street-level images, we found that the accuracy of these height measurements is significantly increased, thereby improving the overall quality of the extracted building model and resulting in an average inaccuracy of the estimated volume below 10%.
On improving IED object detection by exploiting scene geometry using stereo processing
Detecting changes in the environment with respect to an earlier data acquisition is important for several applications, such as finding Improvised Explosive Devices (IEDs). We explore and evaluate the benefit of depth sensing in the context of automatic change detection, where an existing monocular system is extended with a second camera in a fixed stereo setup. We then propose an alternative frame registration that exploits scene geometry, in particular the ground plane. Furthermore, change characterization is applied to localized depth maps to distinguish between 3D physical changes and shadows, which solves one of the main challenges of a monocular system. The proposed system is evaluated on real-world acquisitions containing geo-tagged test objects of 18 × 18 × 9 cm up to a distance of 60 meters. The proposed extensions lead to a significant reduction of the false-alarm rate by a factor of 3, while simultaneously improving the detection score by 5%.
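The depth-based change characterization rests on a simple observation that a toy example can make concrete (arrays and thresholds below are illustrative only, not the system's values): a shadow changes appearance but not depth, while a placed object changes both, so gating appearance changes by a depth difference suppresses shadow false alarms.

```python
# Toy depth-gated change characterization: appearance change AND depth
# change = 3D physical change; appearance change alone = shadow.
import numpy as np

ref_depth = np.full((4, 4), 10.0)        # reference: flat ground at 10 m
cur_depth = ref_depth.copy()
cur_depth[1:3, 1:3] = 9.7                # new object 30 cm proud of ground

appearance_change = np.ones((4, 4), bool)          # whole patch looks different
depth_change = np.abs(cur_depth - ref_depth) > 0.15  # 15 cm depth gate
physical = appearance_change & depth_change        # only the object survives
```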
Visual analysis of trash bin processing on garbage trucks in low resolution video
Oliver Sidla, Gernot Loibner
We present a system for trash can detection and counting from a camera mounted on a garbage collection truck. A working prototype has been successfully implemented and tested on several hours of real-world video. The detection pipeline consists of HOG detectors for two trash can sizes, plus mean-shift tracking and low-level image processing for the analysis of the garbage disposal process. Considering the harsh environment and unfavorable imaging conditions, the process already works well enough that very useful measurements can be extracted from the video data. The false positive/false negative rate of the full processing pipeline is about 5-6% in fully automatic operation. Video data of a full day (about 8 hrs) can be processed in about 30 minutes on a standard PC.
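Mean-shift, the tracking ingredient named above, is mode seeking on a kernel density estimate. The sketch below shows the generic 1-D version on made-up samples, not the authors' tracker (which applies the same idea to image feature distributions): iterate the kernel-weighted mean until it settles on a local mode.

```python
# Generic 1-D mean-shift mode seeking with a Gaussian kernel.
import math

def mean_shift(samples, x, bandwidth=1.0, iters=50):
    for _ in range(iters):
        w = [math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for s in samples]
        x = sum(wi * s for wi, s in zip(w, samples)) / sum(w)  # shift to mean
    return x

# Two clusters near 1 and 5; starting at 1.5 converges to the near mode.
samples = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9]
mode = mean_shift(samples, x=1.5)
```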
Interaction Models, Surveillance Systems, and Colorization Applications
icon_mobile_dropdown
Toward creation of interaction models: simple objects-interaction approach
Teresa Hernández-Díaz, Juan Manuel García-Huerta, Alberto Vazquez-Cervantes, et al.
This paper presents a proposal to manage simple-object interactions in a video surveillance system. The proposal consists of locating a set of features in each video frame. Maxima regions of the second eigenvalue of the tensor matrix are used as features. Afterwards, static features are discarded (labeled as background) and dynamic features are used to represent objects in motion (foreground). The dynamic features are clustered with a k-neighborhood and the EM algorithm. The centroid of each cluster locally represents a moving object, and its displacement through time is given by the displacement of the cluster over several frames. The behavior of clusters over time helps us to model simple object interactions. These primitives can be combined with causal dependencies across time; i.e., cluster division, cluster fusion, and cluster motion relative to other clusters offer information about local dynamics, referred to as local interactions. Based on causal dependency theory, a dependence graph of local centroid behavior can be built; this graph represents the local interaction model. In the experimental section, the approach is tested in several scenarios, extracting simple object interactions in both controlled and uncontrolled scenarios.
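The static/dynamic split and the clustering of moving features can be sketched with plain k-means standing in for the EM clustering described above. This is a simplified illustration; the threshold, cluster count, and seed are hypothetical:

```python
import numpy as np

def moving_feature_centroids(prev_pts, curr_pts, k=2, min_disp=1.0,
                             iters=10, seed=0):
    """Discard static features and cluster the moving ones with k-means;
    returns cluster centroids and the per-cluster mean displacement."""
    disp = curr_pts - prev_pts
    moving = np.linalg.norm(disp, axis=1) >= min_disp  # static -> background
    pts, d = curr_pts[moving], disp[moving]
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):                # plain k-means (EM-style updates)
        labels = np.argmin(((pts[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([pts[labels == j].mean(0) for j in range(k)])
    motion = np.array([d[labels == j].mean(0) for j in range(k)])
    return centers, motion
```

Tracking the returned centroids across frames gives the displacement of each cluster, from which division, fusion, and relative motion events can be read off.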
Compressive sensing based video object compression schemes for surveillance systems
Sathiya Narayanan, Anamitra Makur
In some surveillance videos, successive frames exhibit correlation in the sense that only a small portion changes (object motion). If the foreground moving objects are segmented from the background, they can be coded independently, requiring far fewer bits than frame-based coding. Huang et al. proposed a Compressive Sensing (CS) based Video Object Error Coding (CS-VOEC) scheme in which the objects are segmented and coded via motion estimation and compensation. Since motion estimation may be computationally intensive, the encoder can be kept simple by performing motion estimation at the decoder rather than at the encoder. We propose a novel CS-based Video Object Compression (CS-VOC) technique with a simple encoder, in which the sensing mechanism is applied directly to the segmented moving objects using a CS matrix. At the decoder, the object motion is first estimated so that a CS reconstruction algorithm can efficiently recover the sparse motion-compensated video object error. In addition to simple encoding, simulation results show that our coding scheme performs on par with the state-of-the-art CS-based video object error coding scheme. If the object segmentation itself requires too much computation, we propose a distributed CS framework, Distributed Compressive Video Sensing based Video Object Compression (DCVS-VOC), in which object segmentation is performed only for key frames.
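The encoder/decoder asymmetry described above — cheap random sensing at the encoder, sparse recovery at the decoder — can be sketched with a generic Orthogonal Matching Pursuit (OMP) reconstruction. This is an illustrative stand-in: the paper's actual reconstruction algorithm and sensing matrices are not specified here, and all dimensions are hypothetical:

```python
import numpy as np

def omp(Phi, y, n_iter):
    """Orthogonal Matching Pursuit: greedy recovery of sparse x from y = Phi @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        idx = int(np.argmax(np.abs(Phi.T @ residual)))  # most correlated atom
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(1)
n, m, k = 64, 48, 3                      # signal length, measurements, sparsity
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[5, 20, 40]] = [1.5, -2.0, 1.0]   # sparse motion-compensated object error
y = Phi @ x_true                         # compressive measurement at the encoder
x_hat = omp(Phi, y, 2 * k)               # recovery at the decoder (extra iterations for robustness)
```

The encoder's only job is the matrix-vector product `Phi @ x_true`; all the expensive work (greedy support search, least-squares fits) happens at the decoder, which matches the asymmetry the scheme relies on.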
Improved colorization for night vision system based on image splitting
E. Ali, S. P. Kozaitis
The success of a color night navigation system often depends on the accuracy of the colors in the resulting image. Often, small regions incorrectly adopt the color of large regions simply because of the relative sizes of the regions. We present a method to improve the color accuracy of a night navigation system by splitting a fused image into two distinct sections before colorization. We split the fused image into two sections, generally the road and sky regions, and process them separately to obtain improved color accuracy in each region. Using this approach, small regions were colored correctly compared with the case where regions were not separated.
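The splitting idea can be illustrated with a toy horizon split followed by per-region tinting. This is a crude stand-in for the actual colorization method; the horizon heuristic and tint values are hypothetical:

```python
import numpy as np

def split_and_colorize(gray, sky_tint=(0.35, 0.45, 0.85),
                       road_tint=(0.55, 0.5, 0.4)):
    """Split a fused grayscale night image at an estimated horizon row and
    tint sky and road regions independently (toy stand-in for colorization)."""
    row_means = gray.mean(axis=1)
    # Largest brightness jump between consecutive rows marks the horizon.
    horizon = int(np.argmax(np.abs(np.diff(row_means)))) + 1
    rgb = np.empty(gray.shape + (3,))
    rgb[:horizon] = gray[:horizon, :, None] * np.asarray(sky_tint)
    rgb[horizon:] = gray[horizon:, :, None] * np.asarray(road_tint)
    return rgb, horizon
```

Because each region is colorized with its own statistics, a small bright patch in the sky can no longer inherit the dominant road color, which is the failure mode the abstract describes.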
Evaluation of maritime object detection methods for full motion video applications using the PASCAL VOC Challenge framework
Martin Jaszewski, Shibin Parameswaran, Eric Hallenborg, et al.
We present an initial target detection performance evaluation system for the RAPid Image Exploitation Resource (RAPIER) Full Motion Video (RFMV) maritime target tracking software. We test and evaluate four statistical target detection methods using 30 Hz full motion video from aerial platforms. Using appropriate algorithm performance criteria inspired by the PASCAL Visual Object Classes (VOC) Challenge, we address the tradeoffs between detection fidelity and computational speed/throughput.
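The PASCAL VOC criterion referenced above scores a detection as correct when its intersection-over-union (IoU) with the ground-truth box reaches 0.5. A minimal implementation of the overlap measure (box format assumed to be corner coordinates):

```python
def iou(a, b):
    """Intersection-over-union of axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# PASCAL VOC counts a detection as a true positive when IoU >= 0.5.
```

Scoring with IoU rather than center distance makes the evaluation invariant to target scale, which matters when comparing detectors across aerial platforms at different altitudes.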
Interactive Paper Session
icon_mobile_dropdown
Person re-identification in UAV videos using relevance feedback
Arne Schumann, Tobias Schuchert
In this paper we present an approach to recognizing individual persons in aerial video data. We start from a baseline approach that matches a number of color and texture image features to find possible matches to a query person track. This approach is improved by incorporating operator feedback in three ways. First, features are weighted based on their degree of agreement (correlation) with the operator feedback. Second, a classifier is trained on the operator feedback; this classifier learns to distinguish the query person from others. Third, we use feedback from past queries to further restrict the search space, improve feature selection, and provide additional classifier training data. Finally, we also investigate active learning strategies to select the results that the operator should label in order to gain the highest possible accuracy improvement in the next iteration. We evaluate our approach on a publicly available dataset and demonstrate improvements over the baseline as well as over previous work.
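The first feedback mechanism — weighting feature channels by their agreement with operator labels — can be sketched with a simple correlation-based weighting. This is not necessarily the authors' exact formula; the function and its inputs are illustrative:

```python
import numpy as np

def feedback_weights(dist_per_feature, labels):
    """Weight each feature channel by how well its distances separate
    operator-confirmed matches (label 1) from rejected results (label 0).

    dist_per_feature: (n_results, n_features) distances to the query track
    labels          : operator feedback for each returned result
    """
    labels = np.asarray(labels, float)
    d = np.asarray(dist_per_feature, float)
    # Negative correlation (small distance for true matches) -> high weight.
    corr = np.array([np.corrcoef(d[:, j], labels)[0, 1]
                     for j in range(d.shape[1])])
    w = np.clip(-corr, 0.0, None)       # keep only informative channels
    return w / w.sum() if w.sum() > 0 else np.full(d.shape[1], 1.0 / d.shape[1])
```

Channels whose distances do not discriminate the confirmed matches receive zero weight, so subsequent ranking relies only on features the operator's feedback has validated.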
Aerial surveillance based on hierarchical object classification for ground target detection
Alberto Vázquez-Cervantes, Juan-Manuel García-Huerta, Teresa Hernández-Díaz, et al.
Unmanned aerial vehicles have become important in surveillance applications due to their flexibility and their ability to inspect and move between different regions of interest. The instrumentation and autonomy of these vehicles have increased; for example, a camera sensor is now integrated. Mounted cameras provide the flexibility to monitor several regions of interest by displacing and changing the camera view. A common task performed by this kind of vehicle is object localization and tracking. This work presents a novel hierarchical algorithm to detect and locate objects. The algorithm follows a detection-by-example approach; that is, evidence of the target is provided at the beginning of the vehicle's route. Afterwards, the vehicle inspects the scenario, detecting all similar objects through UTM-GPS coordinate references. The detection process consists of sampling information from the target object. The sampling process is encoded in a hierarchical tree with different sampling densities, and the coding space corresponds to a high-dimensional binary space. Properties such as independence and associative operators are defined in this space to construct a relation between the target object and a set of selected features. Different sampling densities are used to discriminate from general to particular features of the target. The hierarchy is used as a way to adapt the complexity of the algorithm to the optimized battery duty cycle of the aerial device. Finally, this approach is tested in several outdoor scenarios, showing that the hierarchical algorithm works efficiently under a variety of conditions.
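The coarse-to-fine use of binary codes with different sampling densities can be illustrated with a two-level Hamming-distance matcher. This is a schematic sketch of the hierarchical idea, not the paper's coding scheme; code lengths and thresholds are hypothetical:

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def hierarchical_match(query_codes, candidates, thresholds):
    """Coarse-to-fine matching of binary descriptors.

    query_codes: (coarse_code, fine_code) for the target object
    candidates : list of (coarse_code, fine_code, object_id)
    thresholds : (coarse_thr, fine_thr) maximum Hamming distances
    Cheap coarse comparisons prune most candidates; only survivors
    pay for the fine comparison, saving on-board computation.
    """
    q_coarse, q_fine = query_codes
    t_coarse, t_fine = thresholds
    hits = []
    for c_coarse, c_fine, oid in candidates:
        if hamming(q_coarse, c_coarse) > t_coarse:
            continue                      # rejected at the coarse level
        if hamming(q_fine, c_fine) <= t_fine:
            hits.append(oid)
    return hits
```

Because most candidates are rejected by the short coarse code, the expensive fine comparison runs rarely — the same complexity-versus-battery trade-off the hierarchy is designed to exploit.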
A novel synchronous multi-intensity IR illuminator hardware implementation for nighttime surveillance
For nighttime surveillance, it is difficult for an ordinary IR illuminator (which has a fixed light intensity) to generate a one-size-fits-all light condition for objects at different distances. Often, we obtain an overexposed (underexposed) image for an object that is too close to (too far from) the illuminator/camera at night. To partly resolve such problems, a multi-intensity IR illuminator was proposed that can emit periodically varying intensities of IR light toward an object. However, two major problems remain when using such an illuminator: the “synchronization issue” for background modeling and the “viewing issue” for manual monitoring. In this paper, we propose a novel synchronous multi-intensity infrared (SMIIR) illuminator hardware design to help address these two issues. We hope this innovation will open the door to a new era of nighttime surveillance.
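The benefit of a synchronized cycle of known intensities can be illustrated by selecting, for each object, the best-exposed frame from one intensity cycle. This is a toy sketch of why synchronization helps, not the proposed hardware design; the mid-gray target value is a hypothetical parameter:

```python
import numpy as np

def best_exposure(cycle_frames, target=0.5):
    """From one cycle of frames captured under varying IR intensities,
    pick the frame whose mean brightness is closest to mid-gray, i.e.
    neither overexposed nor underexposed for the object of interest."""
    means = [float(np.mean(f)) for f in cycle_frames]
    idx = int(np.argmin([abs(m - target) for m in means]))
    return idx, means[idx]
```

Such a selection is only possible when the camera knows which intensity produced which frame — precisely the synchronization that the SMIIR hardware provides.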