Abstract: Audio-Visual Speech Recognition (AVSR) is a promising approach to improving the accuracy and robustness of speech recognition systems with the assistance of visual cues in challenging ...
Abstract: With only video-level event labels, this paper targets the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events that ...