Constrained Rotation Optimization: Revisiting Crop-Based Gaze Estimation

Abstract

Appearance-based gaze estimation typically relies on face normalization to reduce appearance variability, but normalization requires costly and error-prone landmark detection and head pose estimation. Crop-based alternatives exist, but their geometric properties and performance trade-offs relative to normalization have not been systematically studied. In this work, we provide the first systematic comparison of normalization and crop-based gaze estimation. To enable a fair comparison, we formalize the crop-based approach as Constrained Rotation Optimization (CROp), which makes its geometric transformation explicit and directly comparable to normalization. We further adopt multi-task learning to recover the head pose information lost during cropping. Through extensive experiments across datasets, head pose distributions, and preprocessing conditions, we identify the conditions under which each approach excels. CROp performs better under extreme head poses and noisy detections, whereas normalization benefits from landmark-based refinement under moderate conditions. Our analysis provides practical guidelines for choosing a preprocessing strategy in real-world gaze estimation systems.
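To show why normalization depends on head pose estimation, the sketch below builds the virtual-camera transform used in the widely adopted normalization formulation from prior gaze work. The virtual camera's z-axis points at the 3D face center, its y-axis is made orthogonal to the head's x-axis, and the z-axis is scaled so that the face appears at a fixed distance. This is a minimal illustration of standard normalization only, not the CROp formulation, whose details are not given here. The helper names (`normalization_rotation`, `normalization_scaling`, `matvec`), the example coordinates, and the assumption that the head x-axis is supplied as a 3-vector in camera coordinates are all hypothetical.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


def _norm(v: Vec3) -> float:
    return math.sqrt(sum(c * c for c in v))


def _unit(v: Vec3) -> Vec3:
    n = _norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normalization_rotation(face_center: Vec3, head_x_axis: Vec3) -> Mat3:
    """Hypothetical helper: rows are the virtual camera's right (x), down (y)
    and forward (z) axes, expressed in the original camera frame."""
    forward = _unit(face_center)                # z-axis looks at the face center
    down = _unit(_cross(forward, head_x_axis))  # y-axis orthogonal to the head x-axis
    right = _unit(_cross(down, forward))        # completes a right-handed frame
    return [list(right), list(down), list(forward)]


def normalization_scaling(face_center: Vec3, normalized_distance: float) -> Mat3:
    """Hypothetical helper: scales z so the face appears at a fixed distance."""
    scale = normalized_distance / _norm(face_center)
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, scale]]


def matvec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


if __name__ == "__main__":
    center = (100.0, -50.0, 500.0)  # hypothetical face center (mm, camera frame)
    R = normalization_rotation(center, (0.98, 0.0, 0.2))
    print(matvec(R, center))  # x and y components are ~0: the face is centered
```

The rotation needs the head's x-axis, so any error in head pose estimation propagates directly into the normalized image. This is the dependency the abstract contrasts with crop-based preprocessing.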

Publication
European Conference on Computer Vision (ECCV)