A face-tracking system to detect falls in the elderly

An automated surveillance method that uses multiple image processing techniques can detect, analyze, and track movements to identify emergency situations.
08 August 2013
Philippe Katz, Michael Aron and Ayman Alfalou

As life expectancy increases and birth rates fall, most industrialized countries anticipate a growing elderly population over the coming century. In Western Europe, for example, people aged over 60 represented 20% of the total population in 2000, and this share is projected to reach 42% by 2050.1 Given these projections, and the costs and logistics of caring for the elderly, it is generally recommended that the healthiest dependent people remain in their own homes rather than transferring to an institutionalized setting. To realize this aim, care applications, or 'smart homes,' have evolved in recent decades.2–12 These heterogeneous systems, designed to assist dependent people in everyday life, include automatic detection methods for falls, the primary cause of accidental death among isolated dependent people. Solutions currently available include wearable sensors (push-buttons8 or accelerometers5,8,9), but these technologies have major drawbacks. Carelessness or cognitive impairment can lead to a device being worn only intermittently, and the wearer must be conscious to press a push-button. Furthermore, a gradual loss of consciousness is undetectable by this kind of technology.


Figure 1. An overview of the fall detection system.

Consequently, we require a system that can interpret a situation and detect and analyze movement. We propose an automated, stand-alone surveillance method, fully integrated within the environment (see Figure 1). A large number of sensors set up in the home would collect different kinds of data: audio, video, infrared, or pressure (from sensors embedded in furniture). Information from these would pass to a local calculation unit for testing and analysis, making it possible to consider a large variety of situations such as falls, unusual inaction, or a sudden change in habits. Information about these events would go to the emergency services and would provide diagnostic information to health practitioners. Furthermore, an alert would go to relatives by Short Message Service (SMS) or email.

Based on the concept of a fall as a transition from standing to lying down, we track the position of a subject's face to extract temporal and spatial information. At present, our work focuses on this tracking stage.13 We use a joint transform correlator (JTC),14 an image processing technique that can compare several images in parallel and is particularly suitable for tracking situations. A Fourier transform is applied to an entry plane composed of a reference image and a target image containing the face to be recognized, and an inverse Fourier transform of the resulting power spectrum yields a correlation plane. Our system has the advantages of being relatively simple and able to simultaneously perform detection, identification, and localization.
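To illustrate the principle, the following is a minimal NumPy sketch of a classical JTC: the reference and target images share one entry plane, the squared magnitude of its Fourier transform (the joint power spectrum) is computed, and an inverse transform yields the correlation plane. The array sizes and separation margin are illustrative choices, not values from our implementation.

import numpy as np

# Classical JTC sketch: place reference and target side by side,
# take |FFT|^2 (the joint power spectrum), then inverse-FFT to
# obtain the correlation plane with its two cross-correlation peaks.
def jtc_correlation_plane(reference, target):
    h, w = reference.shape
    # Entry plane: reference on the left, target on the right,
    # separated so the cross-correlation peaks stay clear of the
    # central autocorrelation term.
    entry = np.zeros((h, 3 * w), dtype=np.float64)
    entry[:, :w] = reference
    entry[:, -w:] = target
    jps = np.abs(np.fft.fft2(entry)) ** 2      # joint power spectrum
    return np.abs(np.fft.ifft2(jps))           # correlation plane

# The brightest off-center peak gives the shift between reference
# and target, i.e., the location of the face in the scene.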


Figure 2. The joint transform correlator (JTC) algorithm with a synopsis of the histogram optimization. I0 refers to the initialization image. Ii is the ith image of the video sequence under consideration. IRef and IOut are the reference and output images, respectively, and X denotes the result of our histogram similarity criterion, which is compared with a given threshold (thresh).

Figure 3. The simulation room, set up as a hospital or retirement home for our experiment.

Figure 4. Simulated falls and occlusions used to test the algorithm.
Table 1. Analysis of 21,087 frames using the joint transform correlator algorithm, with and without histogram correction. The first data row (tracked pictures) gives the percentage of correctly detected faces.

Tracking                   Without histogram   With histogram
Tracked pictures (%)       58.11               81.46
Non-tracked pictures (%)   41.89               18.54

The correlation plane given by a JTC implementation contains two cross-correlation peaks whose locations depend on the relative positions of the reference and target images in the entry plane. This allows a target motif (here, a face) to be localized in a scene. An iterative algorithm, in which the reference image at each time t is replaced by the face detected in the target image at time t−1, makes face tracking possible in each video frame, accommodating variations in the tracked motif over time (see Figure 2). The algorithm is initialized (t=0) by means of the Viola-Jones object detection framework.15 To avoid false detections (correlation with the scene background, for example), we perform a histogram comparison between the target and reference image (the person's face). This lets us detect large inter-frame variations and re-initialize the algorithm if tracking is lost.
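For concreteness, the tracking loop might look like the following OpenCV sketch. Here locate_face_jtc stands for a hypothetical peak search built on the JTC listing above, and the similarity threshold is an illustrative value rather than the one used in our system.

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
SIM_THRESH = 0.5  # illustrative histogram-correlation threshold

def hist_similarity(a, b):
    # Correlation between normalized grayscale histograms.
    ha = cv2.calcHist([a], [0], None, [64], [0, 256])
    hb = cv2.calcHist([b], [0], None, [64], [0, 256])
    cv2.normalize(ha, ha)
    cv2.normalize(hb, hb)
    return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

def track(frames):
    reference = None
    for gray in frames:
        if reference is None:
            # (Re-)initialization (t = 0): Viola-Jones face detection.
            faces = cascade.detectMultiScale(gray, 1.1, 5)
            if len(faces) == 0:
                continue
            x, y, w, h = faces[0]
            reference = gray[y:y + h, x:x + w]
            continue
        # Locate the face via the JTC correlation peak (hypothetical
        # helper based on the previous listing).
        x, y = locate_face_jtc(reference, gray)
        h, w = reference.shape
        candidate = gray[y:y + h, x:x + w]
        if hist_similarity(reference, candidate) < SIM_THRESH:
            reference = None       # loss of tracking: re-initialize
        else:
            reference = candidate  # face at t becomes reference for t+1
            yield x, y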

We produced an experimental setup to test the reliability of our approach. First, we created a reproduction of a hospital or retirement home room (see Figure 3). Second, we devised a wide variety of scenarios (in which the subject faces away from the camera, rotates, or falls, or the face is hidden by another object), comprising 21,087 frames (see Figure 4). We manually localized and registered the head position in each frame, producing a 'ground truth' against which the algorithm can be evaluated. Table 1 compares the iterative JTC algorithm with and without the histogram similarity stage. The effect of the histogram correction is noticeable, giving an improvement of 23 percentage points.
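As an illustrative sketch (the pixel tolerance is our assumption, not a value from the study), percentages like those in Table 1 can be computed against such a ground truth:

def tracking_rate(predicted, ground_truth, tol=20):
    # predicted / ground_truth: lists of (x, y) head positions per
    # frame; predicted entries are None where tracking was lost.
    hits = sum(1 for p, g in zip(predicted, ground_truth)
               if p is not None
               and abs(p[0] - g[0]) <= tol
               and abs(p[1] - g[1]) <= tol)
    return 100.0 * hits / len(ground_truth)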

Finally, we experimented with fall detection using a naive method based on speed measurement. A fall is detected when the downward vertical speed of the face across successive video frames exceeds a given threshold. We also considered the horizontal speed, to account for elliptical falls, weighting it by a factor of 1/4. This yields the speed

vt = sqrt(Δy² + (Δx/4)²),

where Δx = xt − xt−1 and Δy = yt − yt−1, and xt and yt are the face coordinates at time t.
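A minimal sketch of this naive test follows; the frame rate and the speed threshold are illustrative assumptions, not values from our experiments.

import math

FPS = 25                  # assumed camera frame rate
FALL_SPEED_THRESH = 300   # assumed threshold, pixels per second

def is_fall(prev, curr):
    # prev, curr: (x, y) face positions in consecutive frames.
    dx = (curr[0] - prev[0]) / 4.0  # horizontal speed, 1/4-weighted
    dy = curr[1] - prev[1]          # positive = downward in image coords
    if dy <= 0:
        return False                # only downward motion counts
    return math.hypot(dx, dy) * FPS > FALL_SPEED_THRESH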

The results, obtained on a set of 60 falls (vertical, left, and right), are shown in Table 2. We correctly detected 58% of the falls overall. Several factors affected the result: the measured speed depends on the distance between the subject and the camera; if the face is obscured during a fall, the Viola-Jones detector may not be able to re-initialize the algorithm; and the naive method is unsuitable for slow falls and for falls in which the face follows an elliptical trajectory.

Table 2. Number and percentage of falls detected by our algorithm in the vertical, left, and right directions (20 falls simulated for each situation).

                       Vertical   Left   Right   Total
Recognized falls       13         10     12      35
Recognized falls (%)   65         50     60      58.33

Our method can simultaneously detect, localize, and identify a person, and it performs tracking accurately. Unfortunately, that process still suffers from some limitations, and the correlation approach should be considered a baseline method, to be improved in future work. Background subtraction (defining the background scene with a fixed camera and eliminating it from the results) may be an appropriate enhancement of our system, as would silhouette and skeleton detection for posture identification, which could be fused with our tracker. Finally, we need to compare our technique with other fall detection systems, using an extended experimental database.16
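As a speculative illustration of the background subtraction idea, OpenCV's MOG2 subtractor could supply a foreground mask against which correlation peaks are checked; its integration with the JTC tracker is an assumption on our part.

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16)

def foreground_mask(frame):
    # Binary mask of moving regions for a fixed camera; correlation
    # peaks falling outside the mask could be rejected as background.
    mask = subtractor.apply(frame)
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    return mask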

This work is supported by Project 0OR251, a collaboration between the Malakoff Médéric Group, Open Society, and ISEN-Brest.


Philippe Katz, Michael Aron, Ayman Alfalou
Vision Laboratory
Institut Supérieur d'Électronique et du Numérique (ISEN)
Brest, France

Philippe Katz received his engineering diploma from ISEN-Brest and his MSc in signals and images in biology and medicine from the University of Brest in 2011. Since then, he has been a PhD student at ISEN. His research interests include image and signal processing and smart homes.

Michael Aron received an engineering diploma in 2002 from Polytech-Sophia, University of Nice-Sophia Antipolis. He received his PhD in computer science from the University of Lorraine in 2009, and conducted his image processing post-doctoral research at The French Research Institute for Exploitation of the Sea. Since 2011, he has been an associate professor at ISEN-Brest. His research topics include computer vision and image processing.

Ayman Alfalou's research interests are in optical engineering, optical information processing, signal and image processing, telecommunications, and optoelectronics. He has published more than 110 refereed journal articles or conference papers, and is a senior member of SPIE, the Optical Society of America, and the Institute of Electrical and Electronics Engineers, and is a member of the Institute of Physics.


References:
1. W. Lutz, W. Sanderson, S. Scherbov, The coming acceleration of global population ageing, Nature 451, p. 716-719, 2008.
2. M. Chan, D. Estève, C. Escriba, E. Campo, A review of smart homes: present state and future challenges, Comput. Methods Programs Biomed. 91, p. 55-81, 2008.
3. L. C. De Silva, B. Darussalam, Audiovisual sensing of human movements for home-care and security in a smart environment, Int'l J. Smart Sensing Intell. Syst. 1, p. 220-245, 2008.
4. J. Demongeot, G. Virone, F. Duchene, G. Benchetrit, T. Herve, N. Noury, V. Rialle, Multi-sensors acquisition, data fusion, knowledge mining and alarm triggering in health smart homes for elderly people, Comptes Rendus Biologies 325(6), p. 673-682, 2002.
5. A. Keshavarz, A. M. Tabar, H. Aghajan, Distributed vision-based reasoning for smart home care, ACM Sensys Workshop Distributed Smart Cameras , 2006.
6. http://lifealert.com Life Alert: a medical alert system for home health emergencies. Accessed 4 July 2013.
7. S. G. Miaou, P. H. Sung, C. Y. Huang, A customized human fall detection system using omni-camera images and personal information, Proc. Transdisciplinary Conf. Distributed Diagnosis Home Healthcare D2H2, p. 39-42, 2006.
8. A. Särelä, I. Korhonen, J. Lötjönen, M. Sola, M. Myllymäki, IST Vivago – an intelligent social and remote wellness monitoring system for the elderly, IEEE EMBS Special Topic Conf. Inf. Technol. Appl. Biomed., p. 362-365, 2003.
9. A. M. Tabar, A. Keshavarz, H. Aghajan, Smart home care network using sensor fusion and distributed vision-based reasoning, ACM Int'l Workshop Video Surveillance Sensor Networks, p. 145-154, 2006.
10. http://www.tunstallap.com Tunstall Healthcare, a provider of technology and services to people with long-term health and care needs. Accessed 4 July 2013.
11. A. Williams, D. Ganesan, A. Hanson, Aging in place: fall detection and localization in a distributed smart camera network, ACM Int'l Conf. Multimedia, p. 892-901, 2007.
12. G. Demiris, B. K. Hensel, Technologies for an aging society: a systematic review of "smart home" applications, Yearbook Med. Inf. 31, p. 33-40, 2008.
13. P. Katz, M. Aron, A. Alfalou, Joint transform correlation for face tracking: elderly fall detection application, Proc. SPIE 8748, p. 87480I, 2013. doi:10.1117/12.2016413
14. C. S. Weaver, J. W. Goodman, A technique for optically convolving two functions, Appl. Opt. 5, p. 1248-1249, 1966.
15. P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, IEEE Computer Soc. Conf. Computer Vision Pattern Recognition 1, p. 511-518, 2001.
16. C. Rougier, A. St-Arnaud, J. Rousseau, J. Meunier, Video surveillance for fall detection, Int'l Conf. Innovative Technol., p. 357-382, 2011.