Yoichi Sato

Latest

Audio-visual localization based on spatial relative sound order
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Learning Multiple Object States from Actions via Large Language Models
ActionVOS: Action as Prompts for Video Object Segmentation
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Gaze Scanpath Transformer: Predicting Visual Search Target by Spatiotemporal Semantic Modeling of Gaze Scanpath
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Matching Compound Prototypes for Few-Shot Action Recognition
Simultaneous control of head pose and expressions in 3D facial keypoint-based GAN
Image Cropping under Design Constraints
Proposal-based Temporal Action Localization with Point-level Supervision
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey
DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-to-Fine Contrastive Ranking
FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotations
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
Technical Report for EgoTracks in Ego4D Challenge 2023
Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training
Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos
Background Mixup Data Augmentation for Hand and Object-in-Contact Detection
CompNVS: Novel View Synthesis with Scene Completion
Compound Prototype Matching for Few-shot Action Recognition
Domain Adaptive Hand Keypoint and Pixel Localization in the Wild
Self-Supervised Learning for Audio-Visual Relationships of Videos with Stereo Sounds
Surgical Skill Assessment via Video Semantic Aggregation
Semantic Image Segmentation by Dynamic Discriminative Prototypes
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition
Object Instance Identification in Dynamic Environments
Precise Affordance Annotation for Egocentric Action Video Datasets
GO-Finder: A Registration-Free Wearable System for Assisting Users in Finding Lost Hand-Held Objects
Spatio-Temporal Perturbations for Video Attribution
Neural Routing by Memory
Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction
Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips
Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation
EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report
Unsupervised Common Particular Object Discovery and Localization by Analyzing a Match Graph
GO-Finder: A Registration-Free Wearable System for Assisting Users in Finding Lost Objects via Hand-Held Object Discovery
Toward Visually Explaining Video Understanding Networks by Perturbation
Light-sheet microscopy-based 3D single-cell tracking assay revealed a correlation between cell cycle and the beginning of endodermal cell internalization in zebrafish early development
Community Detection Using Restrained Random-Walk Similarity
Learning Context-dependent Personal Preferences for Adaptive Recommendation
Mutual Context Network for Jointly Estimating Egocentric Gaze and Action
Generalizing Hand Segmentation in Egocentric Videos With Uncertainty-Guided Model Adaptation
Improving Action Segmentation via Graph Based Temporal Reasoning
An ego-vision system for discovering human joint attention
Support Strategies for Remote Guides in Assisting People with Visual Impairments for Effective Indoor Navigation
Gaze Estimation by Exploring Two-Eye Asymmetry
Manipulation-Skill Assessment from Videos with Spatial Attention Network
BBeep: A Sonic Collision Avoidance System for Blind Travellers and Nearby Pedestrians
Assisting group activity analysis through hand detection and identification in multiple egocentric videos
CoSummary: adaptive fast-forwarding for surgical videos by detecting collaborative scenes using hand regions and gaze positions
Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions
Visualizing Gaze Direction to Support Video Coding of Social Attention for Children with Autism Spectrum Disorder
Browsing Group First-Person Videos with 3D Visualization
Continuous 3D Label Stereo Matching Using Local Expansion Moves
Dynamic Object Scanning: Object-Based Elastic Timeline for Quickly Browsing First-Person Videos
Ego-Surfing: Person Localization in First-Person Videos Using Ego-Motion Signatures
Exploring the Role of Tunnel Vision Simulation in the Design Cycle of Accessible Interfaces
Future Person Localization in First-Person Videos
Hyperspectral Image Super-Resolution With a Mosaic RGB Image
Object-wise 3D Gaze Mapping in Physical Workspace
Predicting Gaze in Egocentric Video by Learning Task-Dependent Attention Transition
SymPS: BRDF Symmetry Guided Photometric Stereo for Shape and Light Source Estimation
Understanding hand-object manipulation by modeling the contextual relationship between actions, grasp types and object attributes
Adaptive Spatial-Spectral Dictionary Learning for Hyperspectral Image Restoration
An Ego-Vision System for Hand Grasp Analysis
Appearance-Based Gaze Estimation via Uncalibrated Gaze Pattern Recovery
Cell tracking for cell image analysis
Community detection using random-walk similarity and application to image clustering
Egoscanning: quickly scanning first-person videos with egocentric elastic timelines
EgoScanning: Quickly Scanning First-Person Videos with Egocentric Elastic Timelines
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
From RGB to Spectrum for Natural Scenes via Manifold-Based Mapping
Hierarchical Gaussian Descriptors with Application to Person Re-Identification
Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption
Rapid Prototyping of Accessible Interfaces With Gaze-Contingent Tunnel Vision Simulation
Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos
Auto-Radiometric Calibration in Photometric Stereo
Can Eye Help You?: Effects of Visualizing Eye Fixations on Remote Collaboration Scenarios for Physical Tasks
Discovering Objects of Joint Attention via First-Person Sensing
Exploiting Spectral-Spatial Correlation for Coded Hyperspectral Image Restoration
Hierarchical Gaussian Descriptor for Person Re-identification
Joint Recovery of Dense Correspondence and Cosegmentation in Two Images
Recognizing Micro-Actions and Reactions from Paired Egocentric Videos
Reflectance and Fluorescence Spectral Recovery via Actively Lit RGB Images
Sensing and Controlling Human Gaze in Daily Living Space for Human-Harmonized Information Environments
Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain
Understanding Hand-Object Manipulation with Grasp Types and Object Attributes
Visual Guidance with Unnoticed Blur Effect
Visual Motif Discovery via First-Person Vision
A scalable approach for understanding the visual structures of hand grasps
Adaptive Spatial-Spectral Dictionary Learning for Hyperspectral Image Denoising
Analyzing human attention and behavior via collective visual sensing for the creation of life innovation
Appearance-Based Gaze Estimation With Online Calibration From Mouse Operations
Cell Detection From Redundant Candidate Regions Under Nonoverlapping Constraints
Ego-surfing first person videos
Fast sparse edge-based intrinsic image decomposition guided by chromaticity gradients
From Intensity Profile to Surface Normal: Photometric Stereo for Unknown Light Sources and Isotropic Reflectances
Gaze Estimation From Eye Appearance: A Head Pose-Free Method via Eye Image Synthesis
Illumination and reflectance spectra separation of a hyperspectral image meets low-rank matrix factorization
Separating Fluorescent and Reflective Components by Using a Single Hyperspectral Image
Uncalibrated photometric stereo based on elevation angle recovery from BRDF symmetry of isotropic materials
Image preference estimation with a data-driven approach: A comparative study between gaze and image features
Adaptive Linear Regression for Appearance-Based Gaze Estimation
Fast Spectral Reflectance Recovery Using DLP Projector
Influence of stimulus and viewing task types on a learning-based visual saliency model
Interreflection Removal Using Fluorescence
Learning Characteristic Driving Operations in Curve Sections that Reflect Drivers' Skill Levels
Learning gaze biases with head motion for head pose-free gaze estimation
Learning-by-Synthesis for Appearance-Based 3D Gaze Estimation
Person Re-identification via Discriminative Accumulation of Local Features
Reflectance and Fluorescent Spectra Recovery Based on Fluorescent Chromaticity Invariance under Varying Illumination
Sensing, predicting, and utilizing human visual attention
Shape-Preserving Half-Projective Warps for Image Stitching
Sparsity-based color quantization with preserved image details
Spectra Estimation of Fluorescent and Reflective Scenes by Using Ordinary Illuminants
Advanced Driver Assistant System Using Telematics
Appearance-Based Gaze Estimation Using Visual Saliency
Converting Near Infrared Facial Images to Visible Light Images using Skin Pigment Model
Early facial expression recognition using early RankBoost
Efficient Modeling of Objects BRDF with Planned Sampling
Graph-based joint clustering of fixations and visual entities
Head direction estimation from low resolution images with scene adaptation
Image Preference Estimation from Eye Movements with A Data-driven Approach
Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain
Social Group Discovery from Surveillance Videos: A Data-Driven Approach with Attention-Based Cues
Spectral Imaging Using Basis Lights
Uncalibrated Photometric Stereo for Unknown Isotropic Reflectances
Bispectral photometric stereo based on fluorescence
Camera spectral sensitivity estimation from a single image under unknown illumination by using fluorescence
Coupling eye-motion and ego-motion features for first-person activity recognition
Deblurring Vein Images and Removing Skin Wrinkle Patterns by Using Tri-band Illumination
Denoising hyperspectral images using spectral domain statistics
Dominant Driving Operations in Curve Sections Differentiating Skilled and Unskilled Drivers
Driving skill analysis using machine learning: The full curve and curve segmented cases
Head pose-free appearance-based gaze sensing via eye image synthesis
Illumination normalization of face images with cast shadows
Incorporating visual field characteristics into a saliency map
Touch-consistent perspective for direct interaction under motion parallax
Toward Efficient Acquisition of BRDFs with Fewer Samples
A Head Pose-free Approach for Appearance-based Gaze Estimation
Aesthetic quality classification of photographs based on color harmony
Appearance-based head pose estimation with scene-specific adaptation
Attention Prediction in Egocentric Video Using Motion and Visual Saliency
Early facial expression recognition with high-frame rate 3D sensing
Estimating change in head pose from low resolution video using LBP-based tracking
Fast unsupervised ego-action learning for first-person sports videos
Inferring human gaze from appearance via adaptive linear regression
Photometric stereo with auto-radiometric calibration
Surface Reconstruction in Photometric Stereo with Calibration Error
Calibration-free gaze sensing using saliency maps
Can Saliency Map Models Predict Human Egocentric Visual Attention?
Detecting Forgery From Static-Scene Video Based on Inconsistency in Noise Level Functions
Fast Spectral Reflectance Recovery Using DLP Projector
Image Enhancement of Low-light Scenes with Near-infrared Flash Images
In situ estimation of the surface acoustic impedance in realistic interiors by an acoustical inverse approach
Recognition of Blurred Faces via Facial Deblurring Combined with Blur-Tolerant Descriptors
Recognizing Multiple Objects Based on Co-occurrence of Categories
Recovery of audio-to-video synchronization through analysis of cross-modality correlation
Segmentation of the Speaker's Face Region with Audiovisual Correlation
Video Segmentation with Motion Smoothness
Video Temporal Super-Resolution Based on Self-similarity
Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions
Detecting Video Forgeries Based on Noise Characteristics
Image Enhancement of Low-Light Scenes with Near-Infrared Flash Images
On the in situ estimation of surface acoustic impedance in interiors of arbitrary shape by acoustical inverse methods
Pose-Invariant Facial Expression Recognition Using Variable-Intensity Templates
Quantitative Evaluation of Automatic Parts Delivery in "Attentive Workbench" Supporting Workers in Cell Production
Recognizing Multiple Objects via Regression Incorporating the Co-occurrence of Categories
Sensation-based photo cropping
Using individuality to track individuals: Clustering individual trajectories in crowds using local appearance and frequency trait
Video segmentation with motion smoothness
Visual localization of non-stationary sound sources
3-D Interaction with Wall-Sized Display and Information Transportation using Mobile Phones
An Incremental Learning Method for Unconstrained Gaze Estimation
An inverse method to estimate the acoustic impedance on the surfaces of complex-shaped interiors
Combining Stochastic and Deterministic Search for Pose-Invariant Facial Expression Recognition
Discovering Primitive Action Categories by Leveraging Relevant Visual Context
Finding Speaker Face Region by Audiovisual Correlation
Incorporating Long-Term Observations of Human Actions for Stable 3D People Tracking
Recognizing Overlapped Human Activities from a Sequence of Primitive Actions via Deleted Interpolation
Recovering audio-to-video synchronization by audiovisual correlation analysis
Recovering the Basic Structure of Human Activities from Noisy Video-Based Symbol Strings
Appearance Sampling of Real Objects for Variable Illumination
In situ estimation of acoustic impedance on the surfaces of realistic interiors: An inverse approach
Incorporating environment models for improving vision-based tracking of people
Information Layout and Interaction on Virtual and Real Rotary Tables
Learning motion patterns and anomaly detection by Human trajectory analysis
Person-Independent Monocular Tracking of Face and Facial Actions with Multilinear Models
Pose-Invariant Facial Expression Recognition Using Variable-Intensity Templates
Shape Reconstruction Based on Similarity in Radiance Changes under Varying Illumination
3D Head Tracking using the Particle Filter with Cascaded Classifiers
Effects of Image Segmentation for Approximating Object Appearance Under Near Lighting
Face Recognition Under Varying Illumination Based on MAP Estimation Incorporating Correlation Between Surface Points
Gaze Estimation from Low Resolution Images
Human Supporting Production Cell "Attentive Workbench"
Robust Content-Dependent Photometric Projector Compensation
Arrangement planning for multiple self-moving trays in human supporting production cell "attentive workbench"
Combining head tracking and mouse input for a GUI on multiple monitors
Creating photorealistic virtual model with polarization-based vision system
Deleted Interpolation Using a Hierarchical Bayesian Grammar Network for Recognizing Human Activity
Fast image synthesis of virtual objects in a real scene with natural shadings
Head Pose Estimation System Based on Particle Filtering with Adaptive Diffusion Control
Information Layout and Interaction Techniques on an Augmented Round Table
Motion Control of Self-Moving Trays for Human Supporting Production Cell "Attentive Workbench"
Radiometric Compensation in a Projector-Camera System Based on the Properties of Human Vision System
Real-Time Modeling of Face Deformation for 3D Head Pose Estimation
Steerable Projector Calibration
Using Extended Light Sources for Modeling Object Appearance under Varying Illumination
Attentive Workbench: An Intelligent Production Cell Supporting Human Workers
Distributed control of multiple self-moving trays for an intelligent cell production system
EnhancedTable: An Augmented Table System for Supporting Face-to-Face Meeting in Ubiquitous Environment
EnhancedTable: Supporting a Small Meeting in Ubiquitous and Augmented Environment
Reflectance Estimation from Motion under Complex Illumination
Spherical Harmonics vs. Haar Wavelets: Basis for Recovering Illumination from Cast Shadows
Support Vector Machines for Object Recognition under Varying Illumination Conditions
Video Content Manipulation by Means of Content Annotation and Nonsymbolic Gestural Interfaces
Video-Based Tracking of User's Motion for Augmented Desk Interface
Appearance Sampling for Obtaining A Set of Basis Images for Variable Illumination
Illumination from Shadows
Object Recognition Based on Photometric Alignment Using RANSAC
Ubiquitous display for dynamically changing environment
Determining surface orientations of transparent objects based on polarization degrees in visible and infrared wavelengths
Interaction for entertainment contents based on direct manipulation with bare hands
Real-Time Fingertip Tracking and Gesture Recognition
Real-Time Tracking of Multiple Fingertips and Gesture Recognition for Augmented Desk Interface Systems
Two-handed drawing on augmented desk
Two-handed drawing on augmented desk system
Vision-Based Face Tracking System for Large Displays
Eigen-Texture Method: Appearance Compression and Synthesis Based on a 3D Model
Integrating paper and digital information on EnhancedDesk: a method for realtime finger tracking on an augmented desk system
Interactive object registration and recognition for augmented desk interface
Measurement of surface orientations of transparent objects using polarization in highlight
Real-Time Input of 3D Pose and Gestures of a User's Hand and Its Applications for HCI
SnapLink: Interactive Object Registration and Recognition for Augmented Desk Interface
Stability Issues in Recovering Illumination Distribution from Brightness in Shadows
Vision-based face tracking system for window interface: prototype application and empirical studies
Appearance-based visual learning and object recognition with illumination invariance
Fast Tracking of Hands and Fingertips in Infrared Images for Augmented Desk Interface
Interactive textbook and interactive Venn diagram: natural and intuitive interfaces on augmented desk system
Modeling Cultural Heritage through Observation
Robust Localization for 3D Object Recognition Using Local EGI and 3D Template Matching with M-Estimators
Measurement of surface orientations of transparent objects by use of polarization in highlight
Acquiring a Radiance Distribution to Superimpose Virtual Objects onto a Real Scene
Appearance Compression and Synthesis based on 3D Model for Mixed Reality
Appearance modeling for mixed reality: photometric aspects
Eigen-Texture Method: Appearance Compression Based on 3D Model
Illumination Distribution from Brightness in Shadows: Adaptive Estimation of Illumination Distribution with Unknown Reflectance Properties in Shadow Regions
Illumination Distribution from Shadows
Measurement of Surface Orientations of Transparent Objects Using Polarization in Highlight
Object recognition using local EGI and 3D models with M-estimators
Photometric modeling for mixed reality
3D shape and reflectance morphing
A Method for Estimating Illumination Distribution of a Real Scene Based on Soft Shadows
Acquiring a Radiance Distribution to Superimpose Virtual Objects onto Real Scene
Appearance Based Visual Learning and Object Recognition with Illumination Invariance
Consensus Surfaces for Modeling 3D Objects from Multiple Range Images
Localization of insulators in electric distribution systems by using 3D template matching from multiple range images
Measuring Object Surface Shape and Reflectance Properties
3D shape and reflectance morphing
Object shape and reflectance modeling from observation
Object Shape Morphing with Intermediate Reflectance Properties
Visual learning and object verification with illumination invariance
Generating virtual worlds from real worlds using computer vision
Photorealistic object model generation from observation for virtual reality applications
Recovering shape and reflectance properties from a sequence of range and color images
Reflectance Analysis for 3D Computer Graphics Model Generation
Reflectance analysis under solar illumination
Temporal-color space analysis of reflection