Abstract
Effective assisted living environments must be able to infer how their occupants interact in a variety of scenarios. Gaze direction provides strong indications of how a person engages with the environment and its occupants. In this paper, we investigate the problem of gaze tracking in multi-camera assisted living environments. We propose a gaze tracking method based on predictions generated by a neural network regressor that relies only on the relative positions of facial keypoints to estimate gaze. For each gaze prediction, our regressor also provides an estimate of its own uncertainty, which is used to weigh the contribution of previously estimated gazes within a tracking framework based on an angular Kalman filter. Our gaze estimation neural network uses confidence-gated units to alleviate keypoint prediction uncertainties in scenarios involving partial occlusions or unfavorable views of the subjects. We evaluate our method using videos from the MoDiPro dataset, which we acquired in a real assisted living facility, and on the publicly available MPIIFaceGaze, GazeFollow, and Gaze360 datasets. Experimental results show that our gaze estimation network outperforms sophisticated state-of-the-art methods, while additionally providing uncertainty predictions that are highly correlated with the actual angular error of the corresponding estimates. Finally, an analysis of the temporal integration performance of our method demonstrates that it generates accurate and temporally stable gaze predictions.
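The abstract's idea of weighting each measurement by the regressor's predicted uncertainty inside an angular Kalman filter can be illustrated with a minimal sketch. This is not the paper's implementation: the class name, the 1-D constant-angle motion model, and the process-noise value are all illustrative assumptions; the only elements taken from the abstract are the angle wrapping and the use of a per-measurement variance to set the Kalman gain.

```python
import math

def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

class AngularKalman1D:
    """Hypothetical 1-D angular Kalman filter: each gaze measurement
    carries its own variance (the regressor's predicted uncertainty),
    so uncertain measurements receive a smaller Kalman gain."""

    def __init__(self, init_angle, init_var=1.0, process_var=0.01):
        self.x = wrap(init_angle)  # state: gaze angle in radians
        self.P = init_var          # state variance
        self.Q = process_var       # process noise added per step (assumed)

    def update(self, measured_angle, measured_var):
        # Predict step: constant-angle model, variance grows by Q.
        self.P += self.Q
        # Innovation computed on the circle (shortest angular difference).
        y = wrap(measured_angle - self.x)
        # Gain: large measured_var (low confidence) -> small correction.
        K = self.P / (self.P + measured_var)
        self.x = wrap(self.x + K * y)
        self.P *= (1.0 - K)
        return self.x
```

With this weighting, a confident measurement pulls the tracked gaze strongly toward the observation, while a high-variance one barely perturbs it, which is what yields the temporally stable predictions the abstract reports.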
Original language | English |
---|---|
Article number | A776 |
Pages (from-to) | 2335-2347 |
Number of pages | 13 |
Journal | IEEE Transactions on Image Processing |
Volume | 32 |
State | Published - 2023 |
Keywords
- Machine learning
- gaze tracking
- multi-camera assisted living scenario
- neural network regressor
- pose estimation
- uncertainty