Title | Authors | Venue | Cited by | Year |
Gemini: a family of highly capable multimodal models | G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... | arXiv preprint arXiv:2312.11805, 2023 | 2344 | 2023 |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models | A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... | arXiv preprint arXiv:2206.04615, 2022 | 1240 | 2022 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... | arXiv preprint arXiv:2403.05530, 2024 | 878 | 2024 |
Transformer memory as a differentiable search index | Y Tay, VQ Tran, M Dehghani, J Ni, D Bahri, H Mehta, Z Qin, K Hui, Z Zhao, ... | Advances in Neural Information Processing Systems, 2022 | 241 | 2022 |
Long range language modeling via gated state spaces | H Mehta, A Gupta, A Cutkosky, B Neyshabur | International Conference on Learning Representations, 2022 | 217 | 2022 |
Momentum Improves Normalized SGD | A Cutkosky, H Mehta | International Conference on Machine Learning, 2020 | 130 | 2020 |
Transferable representation learning in vision-and-language navigation | H Huang, V Jain, H Mehta, A Ku, G Magalhaes, J Baldridge, E Ie | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019 | 98 | 2019 |
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails | A Cutkosky, H Mehta | Advances in Neural Information Processing Systems, 2021 | 60 | 2021 |
Large scale transfer learning for differentially private image classification | H Mehta, A Thakurta, A Kurakin, A Cutkosky | Transactions on Machine Learning Research, 2022 | 46 | 2022 |
Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion | A Cutkosky, H Mehta, F Orabona | International Conference on Machine Learning, 2023 | 40 | 2023 |
Retouchdown: Adding Touchdown to StreetLearn as a shareable resource for language grounding tasks in Street View | H Mehta, Y Artzi, J Baldridge, E Ie, P Mirowski | arXiv preprint arXiv:2001.03671, 2020 | 37 | 2020 |
Multi-modal discriminative model for vision-and-language navigation | H Huang, V Jain, H Mehta, J Baldridge, E Ie | arXiv preprint arXiv:1905.13358, 2019 | 31 | 2019 |
Extreme Memorization via Scale of Initialization | H Mehta, A Cutkosky, B Neyshabur | International Conference on Learning Representations, 2021 | 21 | 2021 |
The road less scheduled | A Defazio, XA Yang, H Mehta, K Mishchenko, A Khaled, A Cutkosky | arXiv preprint arXiv:2405.15682, 2024 | 20 | 2024 |
Simplifying and understanding state space models with diagonal linear RNNs | A Gupta, H Mehta, J Berant | arXiv preprint arXiv:2212.00768, 2022 | 20 | 2022 |
When, why and how much? Adaptive learning rate scheduling by refinement | A Defazio, A Cutkosky, H Mehta, K Mishchenko | arXiv preprint arXiv:2310.07831, 2023 | 13 | 2023 |
Mechanic: A Learning Rate Tuner | A Cutkosky, A Defazio, H Mehta | Advances in Neural Information Processing Systems, 2023 | 13 | 2023 |
Towards large scale transfer learning for differentially private image classification | H Mehta, AG Thakurta, A Kurakin, A Cutkosky | Transactions on Machine Learning Research, 2023 | 11 | 2023 |
Differentially Private Image Classification from Features | H Mehta, W Krichene, A Thakurta, A Kurakin, A Cutkosky | Transactions on Machine Learning Research, 2022 | 10 | 2022 |
VALAN: Vision and Language Agent Navigation | L Lansing, V Jain, H Mehta, H Huang, E Ie | arXiv preprint arXiv:1912.03241, 2019 | 8 | 2019 |