Recent Publications

» Full publication list

Audio-visual localization based on spatial relative sound order

Sound localization is one of the essential tasks in audio-visual learning. In particular, stereo sound localization methods have been …

Tomoya Sato, Yusuke Sugano, Yoichi Sato

Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities

We address the challenge of unsupervised mistake detection in egocentric video of skilled human activities through the analysis of gaze …

Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training

We present a framework for pre-training 3D hand pose estimation on in-the-wild hand images that share similar hand …

Nie Lin, Takehiko Ohkawa, Yifei Huang, Mingfang Zhang, Minjie Cai, Ming Li, Ryosuke Furuta, Yoichi Sato

A Multimodal LLM-based Assistant for User-Centric Interactive Machine Learning

This paper proposes a system based on a multimodal large language model (MLLM) to assist non-expert users without prior experience in …

Wataru Kawabe, Yusuke Sugano

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos …

Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

Learning Multiple Object States from Actions via Large Language Models

Recognizing the states of objects in a video is crucial in understanding the scene beyond actions and objects. For instance, an egg can …

Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato

Example-based Conditioning for Text-to-Image Generative Models

Recent progress in image generation has made it possible to create high-quality images. Techniques using diffusion models have shown …

Atsushi Takada, Wataru Kawabe, Yusuke Sugano

ActionVOS: Action as Prompts for Video Object Segmentation

Delving into the realm of egocentric vision, the advancement of referring video object segmentation (RVOS) stands as pivotal in …

Liangyang Ouyang, Ruicong Liu, Yifei Huang, Ryosuke Furuta, Yoichi Sato

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such …

Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition

Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being …

Mingfang Zhang, Yifei Huang, Ruicong Liu, Yoichi Sato
