Image-based analysis enables aerial target detection
Detecting and tracking both moving and stationary targets in Forward-Looking Infrared (FLIR) imagery is a challenging research area in computer vision. Compared with visible-band images, images from an infrared sensor have extremely low signal-to-noise ratios (SNRs), which limits the information available for detection and tracking. In addition, techniques for detecting moving targets are often based on the static-camera hypothesis,1 whereas the sensors used in automatic target recognition applications are typically mounted on moving vehicles such as airplanes, which makes image acquisition unstable.1–7
The development of automatic target detection systems has often been application driven and case specific, and has mainly focused on terrestrial sequences, with only minor efforts for aerial and maritime environments, as in the work of Meier.8 Furthermore, the high complexity of the algorithms developed so far forces users to accept very high computational costs even to detect targets in a scene off-line.
Many techniques exist for digitally estimating camera motion and stabilizing an image sequence, but most of them fall into two main categories.9,10 The first, flow-based algorithms,4–6 entail very high computational costs, making them impractical for real-time applications. The second, feature-based methods,3 reduce the computational burden, but at the expense of narrower applicability. Once a sequence is stabilized, there are many different approaches to detecting and tracking the targets in the scene. However, all of them remain highly case specific and have typically been applied only to terrestrial FLIR sequences. To overcome these limitations, we propose an innovative and efficient strategy that combines block-based motion estimation with an affine transformation to compensate for ego-motion.
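Block matching is the basic operation behind block-based motion estimation. The sketch below is a generic exhaustive-search matcher that uses the sum of absolute differences (SAD) as its matching cost; the block size, search radius, SAD criterion, and function names are assumptions chosen for illustration, not necessarily what this work uses. Frames are plain nested lists of floats so the example needs only the Python standard library.

```python
from typing import List, Tuple

Image = List[List[float]]  # frame as rows of pixel intensities

def sad(prev: Image, curr: Image, top: int, left: int,
        dy: int, dx: int, size: int) -> float:
    """Sum of absolute differences between the size x size block of `prev`
    at (top, left) and the block of `curr` displaced by (dy, dx)."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c] - curr[top + r + dy][left + c + dx])
    return total

def block_matching(prev: Image, curr: Image, size: int = 8,
                   search: int = 4) -> List[Tuple[int, int, int, int]]:
    """Exhaustive-search block matching with an assumed SAD cost.

    For each block of `prev`, tests every displacement within +/- `search`
    pixels and keeps the one with the lowest SAD in `curr`.
    Returns one (top, left, dy, dx) tuple per block, where (dy, dx) is the
    block's motion from `prev` to `curr` in rows and columns.
    """
    h, w = len(prev), len(prev[0])
    vectors = []
    # Keep a `search`-pixel margin so every candidate block stays inside `curr`.
    for top in range(search, h - size - search + 1, size):
        for left in range(search, w - size - search + 1, size):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cost = sad(prev, curr, top, left, dy, dx, size)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            vectors.append((top, left, best[1], best[2]))
    return vectors
```

Exhaustive search is the simplest correct baseline; its cost grows with the square of the search radius, which is one reason to run it on reduced-resolution images.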
This strategy follows a multi-resolution approach and improves on the work of Seok,3 reducing the limitations of its oversimplified rotation model, which always combines rotational and translational motion. The novelty of our strategy lies in relaxing this hypothesis: by no longer imposing a rotational component on the camera motion, it broadens the method's applicability. In addition, we reduce computational cost with a multi-resolution algorithm in which stabilization is performed on the lowest-resolution image. After the images have been compensated at the highest resolution level and refined to avoid distortions introduced by the sampling process, a dynamic difference-based segmentation is applied, followed by morphological filtering.
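The computational saving of the multi-resolution approach comes from searching for motion on a much smaller image. A minimal sketch, assuming a pyramid built by 2x2 averaging and a purely translational exhaustive search at the coarsest level: a shift found at level L is multiplied by 2**L to obtain the full-resolution shift, whereas a rotation angle would carry over unchanged. The refinement step against sampling distortions is not shown, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]

def downsample(img: Image) -> Image:
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Level 0 is the original image; the last entry is the coarsest level."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def global_shift(prev: Image, curr: Image, search: int) -> Tuple[int, int]:
    """Exhaustive search for the (dy, dx) such that curr[y + dy][x + dx]
    best matches prev[y][x], scored by mean absolute difference on the overlap."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total, count = 0.0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(prev[y][x] - curr[y + dy][x + dx])
                    count += 1
            cost = total / count
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def coarse_to_full(prev: Image, curr: Image, levels: int = 2,
                   search: int = 3) -> Tuple[int, int]:
    """Estimate translation on the coarsest level only, then scale it up."""
    coarse_prev = build_pyramid(prev, levels)[-1]
    coarse_curr = build_pyramid(curr, levels)[-1]
    dy, dx = global_shift(coarse_prev, coarse_curr, search)
    scale = 2 ** levels  # each level halves the image size
    return dy * scale, dx * scale
```

With two levels, a +/-3 pixel search on the coarsest image covers +/-12 pixels at full resolution while touching only 1/16 of the pixels per candidate shift; the price is that the scaled estimate is only accurate to a multiple of 2**L pixels until it is refined.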
The system described in this paper is composed of three subsystems, as shown in Figure 1. First, a multi-resolution algorithm is applied to the sequence to obtain lower-resolution versions of the FLIR images. Next, the digital image stabilization (DIS) system is applied. It consists of two main modules: motion estimation and motion compensation. The motion estimation module is itself divided into three parts: local motion estimation calculates the displacements between two consecutive images with a block-matching algorithm; motion type estimation determines whether these displacements correspond to a pure translation, a pure rotation, or both at once; and global motion estimation computes the overall camera motion. The second DIS module, motion compensation, removes the undesired ego-motion estimated in the previous step. Finally, after compensation at the highest resolution level and refinement, the detection system is applied. It consists of an image differences module, which segments the targets, and a morphological filtering module, which determines their final shape and location within the image.
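The motion type and global motion estimation steps can be illustrated by fitting one camera motion to the local block displacements. The sketch below assumes a rotation-plus-translation model (a special case of the affine transformation used in this work), expresses block centers relative to the image center, fits the parameters in closed form by least squares, and labels the motion as pure translation, pure rotation, or both using hypothetical tolerances. It also shows the inverse mapping that motion compensation would apply. The function names and thresholds are illustrative and do not come from the original system.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) relative to the image center

def fit_rigid(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Closed-form least-squares fit of dst_i ~= R(theta) src_i + t.

    Returns (theta, tx, ty), with theta in radians. This rotation-plus-
    translation model is an assumed special case of a full affine model.
    """
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    mx, my = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    num = den = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - mx, by - my
        num += ax * by - ay * bx  # sine component of the cross-covariance
        den += ax * bx + ay * by  # cosine component
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the destination centroid.
    return theta, mx - (c * sx - s * sy), my - (s * sx + c * sy)

def motion_type(theta: float, tx: float, ty: float,
                angle_tol: float = math.radians(0.1),
                shift_tol: float = 0.5) -> str:
    """Label the global motion; both tolerances are hypothetical values."""
    rotates = abs(theta) > angle_tol
    shifts = math.hypot(tx, ty) > shift_tol
    if rotates and shifts:
        return "both"
    if rotates:
        return "rotation"
    return "translation" if shifts else "none"

def compensate(p: Point, theta: float, tx: float, ty: float) -> Point:
    """Undo the estimated motion for one point: p0 = R(-theta) (p - t)."""
    x, y = p[0] - tx, p[1] - ty
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * y, -s * x + c * y)
```

In use, each source point would be a block center and each destination point that center plus its block-matching vector. Vectors belonging to moving targets act as outliers for a least-squares fit, so a practical system would typically need to reject them first; this sketch does not.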
The fidelity of the image stabilization technique was evaluated with the peak signal-to-noise ratio (PSNR) on both synthetically generated and real sequences. Several conclusions can be drawn from the results. First, the motion type estimation module correctly identifies the type of global motion, even for small transformations (e.g., small rotation angles). Second, the results for pure translations and pure rotations are accurate: the estimated values are very close to those used in the simulations. Finally, the DIS system has demonstrated that our strategy can accurately stabilize images from real aerial sequences for further processing.
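PSNR compares two images through their mean squared error (MSE): PSNR = 10 log10(peak^2 / MSE), in decibels. For stabilization it is commonly computed between consecutive frames after compensation, so a steadier sequence scores higher. The text does not state which frame pairs or which peak value were used in this evaluation, so the consecutive-frame pairing and the 8-bit peak of 255 below are assumptions.

```python
import math
from typing import List

Image = List[List[float]]

def psnr(ref: Image, test: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak**2 / MSE).

    `peak` = 255 assumes 8-bit intensities. Identical images give +inf.
    """
    h, w = len(ref), len(ref[0])
    mse = sum((ref[y][x] - test[y][x]) ** 2
              for y in range(h) for x in range(w)) / (h * w)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)

def interframe_psnr(frames: List[Image], peak: float = 255.0) -> List[float]:
    """PSNR between each pair of consecutive (stabilized) frames."""
    return [psnr(a, b, peak) for a, b in zip(frames, frames[1:])]
```

Comparing these values for the same sequence before and after compensation gives a rough indication of how much frame-to-frame jitter the stabilization removed.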
The detection system was tested on stabilized images from the evaluated sequences. Figure 2 shows some of the results obtained. The detected regions of interest (ROIs) containing the potential targets are shown in the top row of images, and the segmented targets in the bottom row. These results demonstrate the accuracy of the implemented approach. First, the ROIs were correctly detected in the sequences, including the synthetic sequence, in which detection is more difficult because of the characteristics of the selected image. Second, the targets were segmented accurately, even in Figure 2(a), which contains extreme contrasts between hot and cold spots: despite this, the shape of the aircraft was extracted.
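The ROIs and segmented targets described above come from the image differences and morphological filtering modules. A minimal sketch of that chain, under stated assumptions: the "dynamic" threshold is taken as the mean plus k standard deviations of the absolute difference between two stabilized frames, the morphological filter is a 3x3 binary opening, and each remaining 8-connected component is reported as a bounding-box ROI. The actual thresholding rule and structuring element of this work are not given here, and the function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]

def difference_mask(prev: Image, curr: Image, k: float = 3.0) -> Mask:
    """Threshold |curr - prev| at mean + k * std (assumed dynamic threshold)."""
    h, w = len(curr), len(curr[0])
    diff = [[abs(curr[y][x] - prev[y][x]) for x in range(w)] for y in range(h)]
    vals = [v for row in diff for v in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [[v > mean + k * std for v in row] for row in diff]

def _morph(mask: Mask, erode: bool) -> Mask:
    """3x3 binary erosion (erode=True) or dilation; outside pixels are background."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hits = [0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = all(hits) if erode else any(hits)
    return out

def opening(mask: Mask) -> Mask:
    """Erosion then dilation: removes specks smaller than the 3x3 element."""
    return _morph(_morph(mask, erode=True), erode=False)

def regions(mask: Mask) -> List[Tuple[int, int, int, int]]:
    """Inclusive (top, left, bottom, right) boxes of 8-connected components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, left, bottom, right = y0, x0, y0, x0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

The opening is what separates a compact target from isolated noise pixels: a single hot pixel survives thresholding but not erosion, while a block at least 3x3 in size is eroded and then restored by dilation.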
These results demonstrate the correct operation of our system, both in stabilizing images and in automatically detecting aerial targets in synthetic and real FLIR sequences. We are currently carrying out further evaluations, focused especially on the development of a complete real-time automatic target recognition system.