Detection and tracking of objects from multiple airborne cameras
Video sensor networks play a vital role in unattended wide area surveillance. Most of the computer vision research in this area deals with networks of stationary electro-optical sensors. But recently there has been increased interest in networks using sensors on mobile platforms such as mobile robots, all-terrain vehicles, and unmanned aerial vehicles (UAVs). Here, we present novel computer-vision techniques for automatic object detection and tracking in mobile sensor networks. We use multiple commercial off-the-shelf (COTS) sensors that enable monitoring over large areas. The effectiveness of the object detection and tracking framework is demonstrated using aerial videos from multiple mobile sensors aboard UAVs (a sample UAV is shown in Figure 1).
The goal of an effective video surveillance system is to detect objects in an area of interest and to find their correspondence across many frames. Several issues are inherent to the problem, such as rapidly changing lighting conditions (e.g., due to cloud cover), shadows, occlusion, and the entry and exit of objects. Objects in aerial imagery are also small, which makes them difficult to detect and to track through varying appearance and frequent occlusions. To address these problems, we use object appearance, shape, and motion models to detect and track objects from a single UAV. Once the objects are tracked successfully, we apply geometric similarity between object trajectories across UAVs to obtain consistent labeling (global object correspondence).
Conventional tracking methods that exploit appearance or position similarity are not suitable because of significant camera motion and object positional variation.
We have therefore developed the COCOA system,1 which uses three modules to achieve object detection and tracking in a single UAV. The first module applies ego-motion compensation to all video frames so that the global motion of the UAV is removed, which allows moving objects to be detected in the area under surveillance. In this module, a Harris corner detector finds feature points in each image, and a small neighborhood around each point is matched by correlation against regions around feature points in other frames. The tentative matches are then filtered using RANSAC (the random sample consensus algorithm) to find the feature correspondences that give the best-fit homography (the geometric relationship between frames). After the global motion is compensated for, the second module detects independently moving objects such as cars, trucks, and tanks using a frame-differencing scheme.
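The RANSAC filtering step of the first module can be sketched as follows. This is a minimal illustration in Python with NumPy, not the COCOA implementation: the function names, the 2-pixel inlier tolerance, the iteration count, and the choice to fix the last homography entry to 1 are all assumptions made for the example. The tentative matches are synthetic stand-ins for the output of Harris corner detection and correlation matching.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through the 3x3 homography H."""
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return mapped[:, :2] / mapped[:, 2:3]

def fit_homography(src, dst):
    """Least-squares homography from >= 4 pairs, with H[2, 2] fixed to 1 (assumption)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h, *_ = np.linalg.lstsq(np.array(rows, dtype=float),
                            np.array(rhs, dtype=float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iter=500, tol=2.0, seed=0):
    """Return (H, inlier_mask) for tentative matches src[i] -> dst[i]."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Fit a candidate model to a random minimal sample of 4 matches.
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        # Matches whose transfer error is below tol (pixels) support the model.
        err = np.linalg.norm(apply_homography(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all supporting matches of the best candidate.
    return fit_homography(src[best], dst[best]), best

# Synthetic tentative matches: 24 consistent with H_true, 6 gross mismatches.
rng = np.random.default_rng(1)
src = rng.uniform(0, 640, size=(30, 2))
H_true = np.array([[1.0, 0.02, 5.0], [-0.01, 1.0, -3.0], [1e-5, 2e-5, 1.0]])
dst = apply_homography(H_true, src)
dst[:6] += rng.uniform(50, 150, size=(6, 2))

H_est, mask = ransac_homography(src, dst)
```

In this sketch, `mask` flags the matches kept as inliers, and warping a frame with `H_est` would align it with its neighbor before frame differencing.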
The object evidence is accumulated by computing the frame difference of each frame with respect to its p neighboring frames as shown in Figure 2. The log-evidence is accumulated for each frame and used to distinguish foreground pixels from background. Due to the motion of the objects, the boxes that show object position do not fit tightly.
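A rough sketch of this accumulation is given below in Python with NumPy. The exact form of the log-evidence is not specified here, so the example assumes `log(1 + |difference|)` summed over the p neighbors on each side. The function name, the toy frames, and the threshold are likewise illustrative assumptions, and the frames are taken to be already motion-compensated.

```python
import numpy as np

def accumulate_evidence(frames, t, p):
    """Sum an assumed log-evidence term between frame t and its p neighbors per side."""
    ref = frames[t].astype(np.float64)
    evidence = np.zeros_like(ref)
    first, last = max(0, t - p), min(len(frames) - 1, t + p)
    for k in range(first, last + 1):
        if k == t:
            continue
        diff = np.abs(ref - frames[k].astype(np.float64))
        evidence += np.log1p(diff)  # assumed form: log(1 + |difference|)
    return evidence

# Toy sequence: a 2x2 bright object moving one pixel to the right per frame.
frames = []
for k in range(5):
    frame = np.zeros((8, 8))
    frame[3:5, k:k + 2] = 100.0
    frames.append(frame)

evidence = accumulate_evidence(frames, t=2, p=2)
mask = evidence > 2.5 * np.log(101.0)  # hypothetical threshold chosen for this toy data
```

In this toy sequence, pixels that the object occupied in neighboring frames also collect some evidence. That is consistent with the loosely fitting boxes noted above and is why a later refinement step is useful.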
The frame differencing provides a good initialization for object detection and tracking, the third module of the COCOA system. The initial object boundaries are refined using a level-set-based segmentation approach that produces a contour fitting tightly around each independently moving object. Each object is then tracked across frames until it exits the UAV's field of view. To keep the mean-shift tracker adaptive, the object template is updated and maintained using the results of motion detection and level-set segmentation. For occlusion handling, we maintain shape and velocity models as well as object appearance models.
We tested object detection on 40 videos from a UAV, and combined object detection and tracking on 20 videos. Across all frames, detection rates were over 90% and false alarm rates were 1–3%. The result for one video is shown in Figure 3.
The COCOA system is run separately on each UAV video to generate the respective object trajectories. Each trajectory from a given UAV is then matched to the trajectories from the others. This is achieved using a maximum likelihood estimate of a trajectory-similarity measure, based on a cost function that uses algebraic and geometric distances between trajectories. Since trajectories of the same object acquired by two different UAVs are generated from a common 3D trajectory on the ground plane, we can compute a homography between these two trajectories using a direct linear transform algorithm. The likelihood of correspondence between two objects is based on the re-projection distance between them (itself based on the inter-trajectory homography).
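The idea can be illustrated with a short Python and NumPy sketch, under several assumptions that go beyond the text: the two trajectories are time-synchronized and of equal length, each is expressed in a fixed reference frame of its own UAV, only a one-way geometric re-projection distance is used (the full cost function also involves algebraic distances), and the Gaussian form of the likelihood and its `sigma` are hypothetical. The function names are chosen for the example.

```python
import numpy as np

def project(H, pts):
    """Map an (N, 2) array of points through the 3x3 homography H."""
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def dlt_homography(src, dst):
    """Direct linear transform: H with dst ~ H @ src, from N >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def reprojection_distance(traj_a, traj_b):
    """Mean distance between traj_b and traj_a mapped by the fitted homography."""
    H = dlt_homography(traj_a, traj_b)
    return float(np.mean(np.linalg.norm(project(H, traj_a) - traj_b, axis=1)))

def correspondence_likelihood(distance, sigma=1.0):
    """Hypothetical Gaussian likelihood; sigma is an assumed noise scale."""
    return float(np.exp(-0.5 * (distance / sigma) ** 2))

# Two ground-plane trajectories (synthetic), each seen through made-up homographies.
t = np.arange(8.0)
ground_1 = np.stack([t, 0.1 * t ** 2], axis=1)
ground_2 = np.stack([t, 3.0 * np.sin(t)], axis=1)
H_a = np.array([[1.0, 0.1, 2.0], [-0.05, 1.1, 1.0], [0.001, 0.002, 1.0]])
H_b = np.array([[0.9, -0.2, 5.0], [0.15, 0.8, -3.0], [0.002, -0.001, 1.0]])

d_same = reprojection_distance(project(H_a, ground_1), project(H_b, ground_1))
d_diff = reprojection_distance(project(H_a, ground_1), project(H_b, ground_2))
```

Here `d_same` compares two views of the same ground trajectory and `d_diff` compares views of different ones, so the former should be much smaller. Note that points lying close to a single line leave the homography underdetermined, so nearly collinear trajectories give a less reliable distance.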
In the case of two moving UAVs, we can find the global correspondence using maximum matching of a complete bipartite graph. In such a graph, all the nodes in one partition represent the trajectories from a particular UAV, and the edge weights between the two partitions are the correspondence likelihood estimates. A more complex scenario can have multiple UAVs detecting several objects simultaneously. In that case, obtaining a globally optimal trajectory correspondence is equivalent to finding a maximum matching of the split G* of the directed acyclic weighted graph D, as detailed elsewhere.2
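For the two-UAV case, the matching can be illustrated with a small Python sketch. It searches all assignments exhaustively, which is only practical for a handful of trajectories (a polynomial-time method such as the Hungarian algorithm would be the usual choice for larger sets). The function name and the likelihood values are made up for the example, and the first UAV is assumed to have no more trajectories than the second.

```python
from itertools import permutations

def best_assignment(weights):
    """Maximum-weight matching by exhaustive search.

    weights[i][j] is the correspondence likelihood between trajectory i of
    UAV A and trajectory j of UAV B; assumes len(weights) <= len(weights[0]).
    """
    n_a, n_b = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), None
    for cols in permutations(range(n_b), n_a):
        score = sum(weights[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(cols))
    return best_pairs, best_score

# Hypothetical likelihoods: 2 trajectories in UAV A, 3 in UAV B.
weights = [
    [0.90, 0.80, 0.10],
    [0.85, 0.10, 0.20],
]
pairs, score = best_assignment(weights)
```

With these values, greedily taking the largest edge first (0.90) would force a poor second match, whereas the exhaustive search pairs trajectory 0 with 1 and trajectory 1 with 0. This is why the matching is solved jointly rather than one trajectory at a time.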
Experiments were performed on UAV videos, and the effectiveness of the likelihood maximization estimate for the global correspondence hypotheses is shown in Figure 4. The chart shows that the correct hypothesis is identified when the objects exhibit more non-collinear motion. Figure 5 shows sample frames from two different UAVs detecting the same three objects on the ground. The videos had only a short temporal overlap, but the estimated object correspondence was correct.
We have presented an approach for the detection and tracking of objects across multiple airborne electro-optical sensors. The approach currently requires that some UAVs have overlapping fields of view for some part of the object trajectory. We are looking into ways to remove this constraint for global object correspondence across UAVs.