Sato Lab./Sugano Lab.
Recent Publications
Generative Modeling of Shape-Dependent Self-Contact Human Poses
One can hardly model self-contact of human poses without considering underlying body shapes. For example, the pose of rubbing a belly …
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito, Jason Saragih, Fabian Prada, Yichen Xu, Shoou-I Yu, Ryosuke Furuta, Yoichi Sato, Takaaki Shiratori
PDF · Cite · Code
AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
Bimanual human activities inherently involve coordinated movements of both hands and body. However, the impact of this coordination in …
Tatsuro Banno, Takehiko Ohkawa, Ruicong Liu, Ryosuke Furuta, Yoichi Sato
PDF · Cite · DOI
Leveraging RGB Images for Pre-Training of Event-Based Hand Pose Estimation
This paper presents RPEP, the first pre-training method for event-based 3D hand pose estimation using labeled RGB images and unpaired, …
Ruicong Liu, Takehiko Ohkawa, Tze Ho Elden Tse, Mingfang Zhang, Angela Yao, Yoichi Sato
PDF · Cite
EgoInstruct: An Egocentric Video Dataset of Face-to-face Instructional Interactions with Multi-modal LLM Benchmarking
Analyzing instructional interactions between an instructor and a learner who are co-present in the same physical space is a critical …
Yuki Sakai, Ryosuke Furuta, Juichun Yen, Yoichi Sato
PDF · Cite
Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
How can we reconstruct 3D hand poses when large portions of the hand are heavily occluded by itself or by objects? Humans often resolve …
Naru Suzuki, Takehiko Ohkawa, Tatsuro Banno, Jihyun Lee, Ryosuke Furuta, Yoichi Sato
PDF · Cite · DOI
Cross-View Correspondence Modeling for Joint Representation Learning Between Egocentric and Exocentric Videos
Joint analysis of human action videos from egocentric and exocentric views enables a more comprehensive understanding of human …
Zhehao Zhu, Yoichi Sato
PDF · Cite · Code · DOI
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
Transferring and integrating knowledge across first-person (egocentric) and third-person (exocentric) viewpoints is intrinsic to human …
Yuping He, Yifei Huang, Guo Chen, Baoqi Pei, Jilan Xu, Tong Lu, Jiangmiao Pang
PDF · Cite · Code · DOI
Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Egocentric video reasoning focuses on the unseen, egocentric agent who shapes the scene, demanding inference of hidden intentions and …
Baoqi Pei, Yifei Huang, Jilan Xu, Yuping He, Guo Chen, Fei Wu, Yu Qiao, Jiangmiao Pang
PDF · Cite · Code
Vinci: A Real-time Smart Assistant based on Egocentric Vision-language Model for Portable Devices
We present Vinci, a vision-language system designed to provide real-time, comprehensive AI assistance on portable devices. At its core, …
Yifei Huang, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Mingfang Zhang, Lijin Yang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Xinyuan Chen, Yaohui Wang, Yali Wang, Yu Qiao, Limin Wang
PDF · Cite · Code · DOI
Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
This paper presents a novel inertial localization framework named Egocentric Action-aware Inertial Localization (EAIL), which leverages …
Mingfang Zhang, Ryo Yonetani, Yifei Huang, Liangyang Ouyang, Ruicong Liu, Yoichi Sato
PDF · Cite · Code
See all publications