Antoine Miech

Cited by

	All	Since 2019
Citations	7302	7209
h-index	18	18
i10-index	21	21

Citations per year: 2018: 72; 2019: 104; 2020: 224; 2021: 523; 2022: 930; 2023: 2150; 2024: 3267

Public access

9 articles available
2 articles not available

Based on funding mandates

Co-authors

Ivan Laptev: Visiting professor at MBZUAI, on leave from INRIA. Verified email at inria.fr
Josef Sivic: Czech Technical University, CIIRC, ELLIS Unit Prague. Verified email at cvut.cz
Jean-Baptiste Alayrac: DeepMind, London. Verified email at google.com
Andrew Zisserman: University of Oxford. Verified email at robots.ox.ac.uk
Cordelia Schmid: Research director, INRIA. Verified email at inria.fr
Antoine Yang: Google DeepMind. Verified email at google.com
Makarand Tapaswi: IIIT Hyderabad, Wadhwani AI. Verified email at iiit.ac.in
Dimitri Zhukov: Tractable. Verified email at tractable.ai
Lorenzo Torresani: Meta, Fundamental AI Research (FAIR). Verified email at meta.com
Heng Wang: TikTok. Verified email at fb.com
Du Tran: Google. Verified email at google.com
Jeff Donahue: Research Scientist, DeepMind. Verified email at google.com
Piotr Bojanowski: Meta FAIR. Verified email at fb.com
Karen Simonyan: Chief Scientist, Microsoft AI. Verified email at microsoft.com

Antoine Miech

Google DeepMind

Verified email at google.com

Computer Vision


Title	Cited by	Year
Flamingo: a visual language model for few-shot learning. JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in Neural Information Processing Systems 35, 23716-23736, 2022	2432	2022
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. A Miech, D Zhukov, JB Alayrac, M Tapaswi, I Laptev, J Sivic. Proceedings of the IEEE International Conference on Computer Vision, 2630-2640, 2019	1121	2019
Gemini: a family of highly capable multimodal models. G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023	1042	2023
End-to-end learning of visual representations from uncurated instructional videos. A Miech, JB Alayrac, L Smaira, I Laptev, J Sivic, A Zisserman. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	737	2020
Learnable pooling with context gating for video classification. A Miech, I Laptev, J Sivic. arXiv preprint arXiv:1706.06905, 2017	389	2017
Just ask: Learning to answer questions from millions of narrated videos. A Yang, A Miech, J Sivic, I Laptev, C Schmid. Proceedings of the IEEE/CVF international conference on computer vision …, 2021	265	2021
Learning a text-video embedding from incomplete and heterogeneous data. A Miech, I Laptev, J Sivic. arXiv preprint arXiv:1804.02516, 2018	254	2018
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024	196	2024
Zero-shot video question answering via frozen bidirectional language models. A Yang, A Miech, J Sivic, I Laptev, C Schmid. Advances in Neural Information Processing Systems 35, 124-141, 2022	168	2022
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning. A Yang, A Nagrani, PH Seo, A Miech, J Pont-Tuset, I Laptev, J Sivic, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	142	2023
Thinking fast and slow: Efficient text-to-visual retrieval with transformers. A Miech, JB Alayrac, I Laptev, J Sivic, A Zisserman. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	140	2021
Tubedetr: Spatio-temporal video grounding with transformers. A Yang, A Miech, J Sivic, I Laptev, C Schmid. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	82	2022
Leveraging the present to anticipate the future in videos. A Miech, I Laptev, J Sivic, H Wang, L Torresani, D Tran. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019	79	2019
Mikołaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, and Karén Simonyan. Flamingo: a visual language model for few-shot learning. JB Alayrac, J Donahue, P Luc, A Miech, I Barr, Y Hasson, K Lenc, ... Advances in Neural Information Processing Systems 35, 23716-23736, 2022	52	2022
Learning from video and text via large-scale discriminative clustering. A Miech, JB Alayrac, P Bojanowski, I Laptev, J Sivic. Proceedings of the IEEE International Conference on Computer Vision, 5257-5266, 2017	48	2017
Learning to answer visual questions from web videos. A Yang, A Miech, J Sivic, I Laptev, C Schmid. arXiv preprint arXiv:2205.05019, 2022	28	2022
Look for the change: Learning object states and state-modifying actions from untrimmed web videos. T Souček, JB Alayrac, A Miech, I Laptev, J Sivic. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	25	2022
Rareact: A video dataset of unusual interactions. A Miech, JB Alayrac, I Laptev, J Sivic, A Zisserman. arXiv preprint arXiv:2008.01018, 2020	24	2020
Perception test: A diagnostic benchmark for multimodal video models. V Patraucean, L Smaira, A Gupta, A Recasens, L Markeeva, D Banarse, ... Advances in Neural Information Processing Systems 36, 2024	17	2024
Zorro: the masked multimodal transformer. A Recasens, J Lin, J Carreira, D Jaegle, L Wang, J Alayrac, P Luc, ... arXiv preprint arXiv:2301.09595, 2023	17	2023
