David Harwath
Title
Cited by
Year
Unsupervised learning of spoken language with visual context
D Harwath, A Torralba, JR Glass
Neural Information Processing Systems Foundation, Inc., 2017
Cited by: 163; Year: 2017
Jointly discovering visual objects and spoken words from raw sensory input
D Harwath, A Recasens, D Surís, G Chuang, A Torralba, J Glass
Proceedings of the European conference on computer vision (ECCV), 649-665, 2018
Cited by: 100; Year: 2018
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition
A Jansen, E Dupoux, S Goldwater, M Johnson, S Khudanpur, K Church, ...
2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013
Cited by: 98; Year: 2013
Deep multimodal semantic embeddings for speech and images
D Harwath, J Glass
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU …, 2015
Cited by: 90; Year: 2015
Learning word-like units from joint audio-visual analysis
D Harwath, JR Glass
arXiv preprint arXiv:1701.07481, 2017
Cited by: 76; Year: 2017
Zero resource spoken audio corpus analysis
DF Harwath, TJ Hazen, JR Glass
2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013
Cited by: 32; Year: 2013
Vision as an interlingua: Learning multilingual semantic embeddings of untranscribed speech
D Harwath, G Chuang, J Glass
2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018
Cited by: 27; Year: 2018
Learning hierarchical discrete linguistic units from visually-grounded speech
D Harwath, WN Hsu, J Glass
arXiv preprint arXiv:1911.09602, 2019
Cited by: 26; Year: 2019
Look, Listen, and Decode: Multimodal Speech Recognition with Images
F Sun, D Harwath, J Glass
IEEE Workshop on Spoken Language Technology, 2016
Cited by: 21; Year: 2016
Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech
D Harwath, TJ Hazen
2012 IEEE International Conference on Acoustics, Speech and Signal …, 2012
Cited by: 21; Year: 2012
Learning modality-invariant representations for speech and images
K Leidal, D Harwath, J Glass
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2017
Cited by: 18; Year: 2017
Towards visually grounded sub-word speech unit discovery
D Harwath, J Glass
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 17; Year: 2019
Speech recognition without a lexicon—bridging the gap between graphemic and phonetic systems
D Harwath, JR Glass
Fifteenth Annual Conference of the International Speech Communication …, 2014
Cited by: 13; Year: 2014
AVLnet: Learning audio-visual language representations from instructional videos
A Rouditchenko, A Boggust, D Harwath, D Joshi, S Thomas, K Audhkhasi, ...
arXiv preprint arXiv:2006.09199, 2020
Cited by: 10; Year: 2020
On the use of acoustic unit discovery for language recognition
SH Shum, DF Harwath, N Dehak, JR Glass
IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (9), 1665 …, 2016
Cited by: 10; Year: 2016
Towards bilingual lexicon discovery from visually grounded speech audio
E Azuh, D Harwath, JR Glass
INTERSPEECH, 276-280, 2019
Cited by: 9; Year: 2019
Choosing useful word alternates for automatic speech recognition correction interfaces
D Harwath, A Gruenstein, I McGraw
Fifteenth Annual Conference of the International Speech Communication …, 2014
Cited by: 8; Year: 2014
Transfer learning from audio-visual grounding to speech recognition
WN Hsu, D Harwath, J Glass
arXiv preprint arXiv:1907.04355, 2019
Cited by: 7; Year: 2019
Learning words by drawing images
D Suris, A Recasens, D Bau, D Harwath, J Glass, A Torralba
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019
Cited by: 6; Year: 2019
Trilingual semantic embeddings of visually grounded speech with self-attention mechanisms
Y Ohishi, A Kimura, T Kawanishi, K Kashino, D Harwath, J Glass
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 5; Year: 2020