Sato Lab./Sugano Lab.
Recent Publications
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos …
Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
PDF · Cite
ActionVOS: Action as Prompts for Video Object Segmentation
Delving into the realm of egocentric vision, the advancement of referring video object segmentation (RVOS) stands as pivotal in …
Liangyang Ouyang, Ruicong Liu, Yifei Huang, Ryosuke Furuta, Yoichi Sato
PDF · Cite · Code
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such …
Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao
PDF · Cite
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being …
Mingfang Zhang, Yifei Huang, Ruicong Liu, Yoichi Sato
PDF · Cite
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
In this paper, we address the challenge of fine-grained video event understanding in traffic scenarios, vital for autonomous driving …
Quan Kong, Yuki Kawana, Rajat Saini, Ashutosh Kumar, Jingjing Pan, Ta Gu, Yohei Ozao, Balazs Opra, David C. Anastasiu, Yoichi Sato, Norimasa Kobori
PDF · Cite
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
The pursuit of accurate 3D hand pose estimation stands as a keystone for understanding human activity in the realm of egocentric …
Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato
PDF · Cite · Code
Gaze Scanpath Transformer: Predicting Visual Search Target by Spatiotemporal Semantic Modeling of Gaze Scanpath
We introduce a new method called the Gaze Scanpath Transformer for predicting a search target category during a visual search task. …
Takumi Nishiyasu, Yoichi Sato
PDF · Cite
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around …
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
PDF · Cite
Matching Compound Prototypes for Few-Shot Action Recognition
The task of few-shot action recognition aims to recognize novel action classes using only a small number of labeled training samples. …
Yifei Huang, Lijin Yang, Guo Chen, Hongjie Zhang, Feng Lu, Yoichi Sato
PDF · Cite · Code · DOI
See all publications