An Algorithm To Detect Driver’s Drowsiness Based On Nodding Behaviour
Lam Thanh Hien1 and Do Nang Toan2
1 Lac Hong University, Dong Nai, Vietnam; 2 Vietnam National University, Ha Noi, Vietnam
Driver drowsiness is one of the major causes of serious road accidents, so considerable effort has been devoted to developing better driver-assistance technology. However, several existing approaches fail to work effectively because the head of a drowsy driver is usually in a slanting state; moreover, the shaking of the vehicle or the driver's blinking makes the problem even more complicated. At the same time, a bent-head posture itself signifies a drowsy state. This paper therefore proposes a novel approach that takes head nodding behaviour as an input to the detection model. After a human face is detected, significant facial features are extracted; these are used to compute a set of predetermined optimal parameters; finally, drowsiness is evaluated against the resulting thresholds. In our empirical experiments, the proposed algorithm correctly detected 96.56% of cases.
Keywords: Driver drowsiness, Algorithm, Nodding behaviour, Facial normal
Driver drowsiness is one of the major causes of serious road accidents. It usually occurs when a driver has not had enough sleep or enough rest before or during a long trip, which degrades his or her observation and reaction abilities. After sitting still for a long time, vibration, noise, and shaking make a driver tired; if the driver tries to continue the trip without taking a proper rest, he or she may easily drift into drowsiness, and the resulting distraction and momentary unconsciousness cause a temporary loss of control of the vehicle. Even a few seconds of such loss of control can cause a disaster, because the driver no longer has enough time to react to obstacles or other vehicles.
The above problem has attracted special attention from researchers searching for optimal models to detect driver drowsiness and alert the driver. Grace et al. [1] established a non-parametric neural network model to estimate the PERCLOS level when monitoring and detecting the sleepy state of heavy-truck drivers. Vural et al. [2] used AdaBoost and a multiple-regression approach to classify 30 facial behaviours based on the FACS system, as shown in Figure 1, and achieved a success rate of more than 90%. Lin et al. [3] proposed a wireless real-time electroencephalogram (EEG) system based on a brain-computer interface (BCI) to detect drowsiness, while Liu et al. [4] suggested an algorithm based on the movement of the eyelids with acceptable efficiency.
One of the key characteristics of a drowsy person is the bent-head posture, which leads to a more difficult problem: detecting driver drowsiness in a slanting state, i.e. when the driver's face is not in line with the equipped camera. Moreover, the shaking of the vehicle or the driver's blinking makes the problem even more complicated. However, the bent-head posture itself also signifies a drowsy state. This paper therefore proposes a novel approach that takes head nodding behaviour as an input to the detection model. Key parameters describing nodding behaviour are optimized before further steps are carried out. Specifically, after a human face is detected, significant facial features are extracted; these are used to compute the predetermined optimal parameters; finally, drowsiness is evaluated against the resulting thresholds.
Figure 1. Detection system proposed by Vural et al. [2]
The rest of this paper is organized as follows: related studies are summarized in Section 2, and Section 3 presents our proposed approach. Our experiments and results are discussed in Section 4, and some concluding remarks make up the last section.
2.1. Active Appearance Model (AAM)
The Active Appearance Model (AAM) proposed by Cootes et al. [5] is an algorithm for detecting key facial features that carry specific, distinguishing characteristics. In AAM, a statistical model of the appearance of an object in an image is combined with an optimization algorithm that determines the parameters best reproducing the observed image. However, their approach is built on a rather complicated mapping function and depends on the size of the dataset. Baker & Matthews [6] found that a modification of AAM provides better results and real-time convergence in some specific use cases, and Xiao et al. [7] further extended the AAM by incorporating both 2D and 3D information.
Figure 2. Shape and image structure in AAM
In AAM, the object of interest is modelled with a set of shape-description features and its image structure, as shown in Figure 2; the latter is a sample of image intensities over regions constrained by a control point set. A statistical model of the object can effectively describe its shape variations, its image-structure variations, and the correlations between them. The main issues in this approach are establishing a statistical model of the image object and designing an efficient search algorithm. It should be noted that establishing a statistical model of an object involves one model for its shape and another for its structure; combining the two yields a model of the whole object.
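To illustrate the shape half of this statistical model, a shape instance can be written as the mean shape plus a weighted sum of variation modes. The sketch below uses a hypothetical four-point mean shape and a single hypothetical variation mode rather than modes learned from data:

```python
# Minimal sketch of AAM's linear shape model: a shape instance is the
# mean shape plus a weighted combination of learned variation modes.
# Shapes are flat [x0, y0, x1, y1, ...] landmark lists.

def synthesize_shape(mean_shape, modes, params):
    """Return mean_shape + sum_i params[i] * modes[i], element-wise."""
    shape = list(mean_shape)
    for p, mode in zip(params, modes):
        shape = [s + p * m for s, m in zip(shape, mode)]
    return shape

# Hypothetical mean shape: four landmarks forming a unit square.
mean = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
# One hypothetical variation mode that widens the shape horizontally.
widen = [-1.0, 0.0, 1.0, 0.0, 1.0, 0.0, -1.0, 0.0]

# A small positive weight on the mode produces a slightly wider shape.
wide_face = synthesize_shape(mean, [widen], [0.1])
```

In the full AAM the modes come from a principal component analysis of aligned training shapes, and an analogous linear model is built for the image structure.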
2.2. Facial Normal Model
Gee & Cipolla [8] proposed the facial normal model with five facial features: the two far corners of the eyes, the two mouth corners, and the tip of the nose. The four eye and mouth corner points define a plane called the facial plane, denoted Oxy. In 3D space, the facial normal is obtained as the normal of the facial plane Oxy at the nose tip, as shown in Figure 3 [8,9]. Figure 3 shows a coordinate system Oxyz assumed to be located at the centre of the camera: the horizontal and vertical directions in the image are denoted by the Ox and Oy axes, while the normal to the image plane is the Oz axis.
Figure 3. Facial Normal Model
The symmetric axis of the facial plane is first determined by joining the midpoint of the two far eye corners to the midpoint of the two mouth corners. Then two predetermined model ratios are needed, namely Rm = Lm/Lf and Rn = Ln/Lf, where Lm, Ln, and Lf are measured as plotted in Figure 4. From these, the direction of the facial normal in 3D space can be estimated.
Figure 4. Fundamental parameters Lm, Ln, and Lf
Because length ratios along the symmetric axis are preserved under projection, the nose base can easily be located along the axis using the model ratio Rm. Joining the nose base to the nose tip then gives the facial normal in the image. The angle t between the facial normal in the image and the Ox axis defines the tilt direction of the normal, while the slant angle s between the optical axis and the facial normal in 3D space completes its description; the slant angle s can be obtained from the model ratio Rn. Thus, in the coordinate system Oxyz, the facial normal is determined by n = (sin s cos t, sin s sin t, -cos s).
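The construction above can be sketched in code. The model ratios Rm and Rn below are hypothetical calibration values, and the recovery of the slant angle is a deliberate simplification of the full Gee & Cipolla derivation:

```python
import math

def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def facial_normal(eye_l, eye_r, mouth_l, mouth_r, nose_tip, Rm=0.5, Rn=0.6):
    """Sketch of the facial-normal construction. Rm locates the nose
    base along the symmetry axis; Rn relates the image-plane nose
    length to the slant angle. Both ratios here are hypothetical
    calibration values, not values from the paper."""
    eye_mid = midpoint(eye_l, eye_r)
    mouth_mid = midpoint(mouth_l, mouth_r)
    # Symmetry axis runs from the eye midpoint to the mouth midpoint.
    axis = (mouth_mid[0] - eye_mid[0], mouth_mid[1] - eye_mid[1])
    # Ratios along the axis are preserved, so the nose base sits at
    # fraction Rm of the way down the axis.
    nose_base = (eye_mid[0] + Rm * axis[0], eye_mid[1] + Rm * axis[1])
    # Facial normal in the image: nose base -> nose tip.
    nx, ny = nose_tip[0] - nose_base[0], nose_tip[1] - nose_base[1]
    tilt = math.atan2(ny, nx)                      # angle t with Ox
    # Simplified slant: compare the image nose length with the model
    # length Rn * |axis| (the paper's source [8] derives this exactly).
    ratio = min(1.0, math.hypot(nx, ny) / (Rn * math.hypot(*axis)))
    slant = math.asin(ratio)                       # angle s with Oz
    # Facial normal in 3D: n = (sin s cos t, sin s sin t, -cos s).
    return (math.sin(slant) * math.cos(tilt),
            math.sin(slant) * math.sin(tilt),
            -math.cos(slant))

# For a frontal face (nose tip projected onto the nose base), the
# normal points straight back down the optical axis.
n = facial_normal((-1.0, 0.0), (1.0, 0.0), (-0.6, 2.0), (0.6, 2.0), (0.0, 1.0))
```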
3. Detection Of Nodding Behaviour
3.1. Selection Of Shape Parameters
Several approaches have proposed shape parameters constructed from face-operated models, anthropological features, ratios of body parts, or the colour and shape type of each part; for example, willow-leaf eyebrows are usually long, tapering at the tail, rounded at the head, thick, and bright like curved leaves. In practice, searching for such features to construct an appropriate model takes considerable time and effort. To avoid this, for the problem of detecting driver drowsiness this paper does not rely on the shape features of each part but on the correlation among some key characteristics that reflect different attributes of nodding behaviour. Based on a control point set extracted from the driver's face, we construct some fundamental parameters computed directly from the set. Although several computational choices are possible, this paper uses parameters such as the point-to-point distance, the point-to-edge distance, and the area of the triangle formed by the two mouth corners and the nose tip, because of their easy computation and discriminative ability.
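The two less obvious parameters, the point-to-edge distance and the triangle area, reduce to simple planar geometry on landmark coordinates. A minimal sketch with hypothetical coordinates:

```python
import math

def point_line_distance(p, a, b):
    """Distance from point p to the line through a and b; used for the
    distance from the nose tip N to the mouth line M1M2."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    # |cross product| = twice the area of triangle (a, b, p);
    # dividing by the base length |ab| gives the height.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)

def triangle_area(p, a, b):
    """Area of the triangle formed by the nose tip and the two mouth
    corners (the parameter s3 used below)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    return abs((ax - px) * (by - py) - (bx - px) * (ay - py)) / 2.0

# Hypothetical landmark coordinates: nose tip N and mouth corners M1, M2.
N, M1, M2 = (0.0, 0.0), (-1.0, 2.0), (1.0, 2.0)
dm = point_line_distance(N, M1, M2)   # height of N over the mouth line
s3 = triangle_area(N, M1, M2)
```

When the head bends down, the projected nose-to-mouth distance and the projected triangle both shrink, which is what makes these parameters discriminative.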
This paper uses an image database of faces marked with a set of points and labelled with either head-up or head-down tags. For each feature, we compute the values of the selected parameters over the database and search for detaching thresholds, so that the problem becomes the equivalent one of constructing a one-level decision tree and evaluating its error. Features with high separating ability are selected to decide whether the head is bent down. The proposed algorithm automatically extracts the control point set with the AAM; from this set, the control points that can serve to estimate the head direction are selected as inputs of the facial normal model to compute the shape parameters.
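The one-level decision tree amounts to scanning candidate cuts on a single parameter and keeping the one with the fewest misclassifications. A minimal sketch, with a hypothetical toy sample standing in for the paper's labelled database:

```python
def best_threshold(values, labels):
    """One-level decision tree (decision stump) on a single parameter:
    predict 'head_down' when the value falls below the cut, and return
    the cut that minimizes the number of misclassified samples."""
    candidates = sorted(set(values))
    best_thr, best_err = None, len(labels) + 1
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2.0           # cut between adjacent values
        errors = sum(1 for v, lab in zip(values, labels)
                     if (v < thr) != (lab == "head_down"))
        if errors < best_err:
            best_thr, best_err = thr, errors
    return best_thr, best_err

# Hypothetical dm measurements: head-down frames shrink dm.
vals = [0.2, 0.3, 0.35, 0.8, 0.9, 1.0]
labs = ["head_down", "head_down", "head_down",
        "head_up", "head_up", "head_up"]
thr, err = best_threshold(vals, labs)
```

The error count from such a stump is also what lets features be ranked: a feature whose best stump makes few errors has high detaching ability.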
3.2. An Algorithm To Detect Nodding Head From Camera
From the above reviews, the following procedure is suggested to detect nodding behaviour:
Input: Frame flows from camera or video;
Output: Head state (Normal, Nodding);
@ Some basic denotations:
- N: Nose point;
- E1: Left eye corner;
- E2: Right eye corner;
- M1: Mouth left corner;
- M2: Mouth right corner;
- H: foot of the perpendicular from N to the line M1M2 (NH ⊥ M1M2);
- dm: distance from N to H;
- s3: area of triangle NM1M2;
@ Basic steps:
– Create initial values of:
- std_brect (bounding rectangle);
- status := HEAD_NORMAL;
– For each frame:
- calculate cur_dm, cur_s3, cur_brect;
- x := (std_brect.x-cur_brect.x)/std_brect.width;
- y := (std_brect.y-cur_brect.y)/std_brect.width;
- if (y > thres1 AND y < thres2)
    if (status = HEAD_NOD) return;
    else if (std_dm/cur_dm > thres3 AND std_s3/cur_s3 > thres4) status := HEAD_NOD;
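One iteration of the per-frame loop above can be sketched in Python. The threshold values here are hypothetical placeholders rather than the learned optima from the paper, and the sign convention for the vertical displacement simply follows the pseudocode:

```python
def update_head_state(status, std, cur,
                      thres1=0.05, thres2=0.5, thres3=1.2, thres4=1.2):
    """One iteration of the per-frame loop. `std` holds the reference
    (head-up) measurements and `cur` the current frame's; each is a
    dict with 'brect' as (x, y, w, h), plus 'dm' and 's3'. All four
    thresholds are hypothetical placeholders."""
    # Width-normalised vertical displacement of the face bounding box.
    y = (std["brect"][1] - cur["brect"][1]) / std["brect"][2]
    if thres1 < y < thres2:
        if status == "HEAD_NOD":
            return status                  # already in the nodding state
        if (std["dm"] / cur["dm"] > thres3 and
                std["s3"] / cur["s3"] > thres4):
            return "HEAD_NOD"              # nose-mouth geometry shrank
    return status

# Hypothetical frame measurements: the face box moved and dm/s3 shrank.
std = {"brect": (100, 100, 80, 80), "dm": 30.0, "s3": 400.0}
cur = {"brect": (100, 90, 80, 80), "dm": 20.0, "s3": 250.0}
state = update_head_state("HEAD_NORMAL", std, cur)
```

Combining the bounding-box displacement with the dm and s3 ratios is what makes the test robust: vehicle shaking can move the box alone, but a genuine nod also shrinks the projected nose-mouth geometry.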
This paper tests the proposed algorithm with two types of data: (1) virtual images created from a 3D model with reference parameters determined from a defined transformation model, where the model is rotated through different angles and aspects to test the performance of the algorithm; and (2) real images obtained from camera and video. The data are partitioned so that they can serve both the learning phase, in which the selected parameters and their evaluation thresholds are determined, and the detection phase. In particular, besides the 3D images, we conducted experiments with 11 videos of 11 different people recorded at Duy Tan University (Da Nang City, Vietnam) at 15 frames per second with a resolution of 640×480.
To assist the process of selecting appropriate parameters, we organized a database of 5,836 marked images, created from the 3D model as shown in Figure 5 and extracted from real camera images. This set is used to compute the values of the candidate parameters so that optimal ones can be selected. Our experimental results indicate that optimal values of the concerned parameters can be obtained, as shown in Figure 6 and Figure 7.
Moreover, from the above database we selected a set of 2,530 images to test the performance of the proposed algorithm in detecting nodding behaviour. The algorithm correctly detected 96.56% of cases, as shown in Figure 8. Some cases could not be correctly detected because the set of feature points failed to be detected, as shown in Figure 9.
Figure 5. 3D virtual image
Figure 6. Distribution of parameter dm
Figure 7. Distribution of another parameter
Figure 8. Samples of correct detection of feature points
Figure 9. Samples of incorrect detection of feature points
Furthermore, to develop an integrated system for monitoring driver drowsiness, a program implementing the proposed algorithm was tested for its ability to detect nodding behaviour and alert the driver. The testing program is written in Visual C++ 2008 with support from the open-source library OpenCV, and takes its input from video or a webcam. The program automatically determines the key feature points of the face by analysing its parts and monitors the face declination; when it detects a nodding action, the system raises an alarm on its monitor.
Empirical study shows that the program works at real-time speed and gives a high ratio of accurate results in the specified testing environment, as shown in Figure 10. However, there are still cases in which the program fails because the key facial feature points are not accurately detected.
Figure 10. Screen shots of our testing program
The human head in digital images has long been an interesting research topic, and several practical applications have been developed from such studies in face identification, human activity monitoring, and human-machine interaction, although many issues remain unsolved. This paper proposes a novel approach to detecting nodding behaviour based on optimal selection criteria for several shape-feature parameters. Future research will focus on combining techniques to better detect the key facial feature points and on integrating the method into a driver-monitoring system to improve traffic safety.
[1] Grace R., Byrne V.E., Bierman D.M., Legrand J.M., Gricourt D., Davis B.K., Staszewski J.J., Carnahan B. (1998), “A drowsy driver detection system for heavy vehicles”, Digital Avionics Systems Conference Proceedings, Vol. 2, I36/1-I36/8.
[2] Vural E., Cetin M., Ercil A., Littlewort G., Bartlett M., Movellan J. (2007), “Drowsy Driver Detection through Facial Movement Analysis”, Human-Computer Interaction, Vol. 4796 of LNCS, 6-18.
[3] Lin C.T., Chang C.J., Lin B.S., Hung S.H., Chao C.F., Wang I.J. (2010), “A Real-Time Wireless Brain-Computer Interface System for Drowsiness Detection”, IEEE Transactions on Biomedical Circuits and Systems, Vol. 4, No. 4, 214-222.
[4] Liu D., Sun P., Xiao Y.Q., Yin Y. (2010), “Drowsiness Detection Based on Eyelid Movement”, Second International Workshop on Education Technology and Computer Science, Vol. 2, 49-52.
[5] Cootes T.F., Edwards G.J., Taylor C.J. (1998), “Active appearance models”, In H. Burkhardt and B. Neumann, editors, 5th European Conference on Computer Vision, Vol. 2, 484-498.
[6] Baker S., Matthews I. (2001), “Equivalence and efficiency of image alignment algorithms”, Computer Vision and Pattern Recognition Conference 2001, Vol. 1, 1090-1097.
[7] Xiao J., Baker S., Matthews I., Kanade T. (2004), “Real-Time Combined 2D+3D Active Appearance Models”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 535-542.
[8] Gee A., Cipolla R. (1994), “Determining the Gaze of Faces in Images”, Image and Vision Computing, Vol. 12, No. 10, 639-647.
[9] Hien L.T., Toan D.N., Lang T.V. (2015), “Detection of Human Head Direction Based on Facial Normal Algorithm”, International Journal of Electronics Communication and Computer Engineering, Vol. 6, No. 1, 110-114.
Lam Thanh Hien received his MSc degree in Applied Informatics Technology in 2004 from the INNOTECH Institute, France. He is currently a Vice-Rector of Lac Hong University. His main research interests are Information Systems and Image Processing.
Do Nang Toan is an Associate Professor in Computer Science at Vietnam National University. He received his BSc degree in Applied Mathematics and Informatics in 1990 from Hanoi University and his PhD in Computer Science in 2001 from the Vietnam Academy of Science and Technology. He is currently Head of the Department of Virtual Reality Technology at the Institute of Information Technology, Vietnam Academy of Science and Technology, and Dean of the Faculty of Multimedia Communications, Thai Nguyen University of Information and Communication Technology. His main research interests are Pattern Recognition, Image Processing, and Virtual Reality.