Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift Y Ovadia, E Fertig, J Ren, Z Nado, D Sculley, S Nowozin, J Dillon, ... Advances in Neural Information Processing Systems 32, 2019 | 1806 | 2019 |
Gemini: A family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 1361 | 2023 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1184 | 2023 |
Underspecification presents challenges for credibility in modern machine learning A D'Amour, K Heller, D Moldovan, B Adlam, B Alipanahi, A Beutel, ... Journal of Machine Learning Research 23 (226), 1-61, 2022 | 742 | 2022 |
On empirical comparisons of optimizers for deep learning D Choi arXiv preprint arXiv:1910.05446, 2019 | 368 | 2019 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 297 | 2024 |
Evaluating prediction-time batch normalization for robustness under covariate shift Z Nado, S Padhy, D Sculley, A D'Amour, B Lakshminarayanan, J Snoek arXiv preprint arXiv:2006.10963, 2020 | 215 | 2020 |
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model G Zhang, L Li, Z Nado, J Martens, S Sachdeva, G Dahl, C Shallue, ... Advances in Neural Information Processing Systems 32, 2019 | 140 | 2019 |
Plex: Towards reliability using pretrained large model extensions D Tran, J Liu, MW Dusenberry, D Phan, M Collier, J Ren, K Han, Z Wang, ... arXiv preprint arXiv:2207.07411, 2022 | 108 | 2022 |
Uncertainty baselines: Benchmarks for uncertainty & robustness in deep learning Z Nado, N Band, M Collier, J Djolonga, MW Dusenberry, S Farquhar, ... arXiv preprint arXiv:2106.04015, 2021 | 106 | 2021 |
A loss curvature perspective on training instabilities of deep learning models J Gilmer, B Ghorbani, A Garg, S Kudugunta, B Neyshabur, D Cardoze, ... International Conference on Learning Representations, 2022 | 64* | 2022 |
Benchmarking Bayesian deep learning on diabetic retinopathy detection tasks N Band, TGJ Rudner, Q Feng, A Filos, Z Nado, MW Dusenberry, G Jerfel, ... arXiv preprint arXiv:2211.12717, 2022 | 47 | 2022 |
Revisiting one-vs-all classifiers for predictive uncertainty and out-of-distribution detection in neural networks S Padhy, Z Nado, J Ren, J Liu, J Snoek, B Lakshminarayanan arXiv preprint arXiv:2007.05134, 2020 | 46 | 2020 |
AG: Imperative-style Coding with Graph-based Performance D Moldovan, J Decker, F Wang, A Johnson, B Lee, Z Nado, D Sculley, ... Proceedings of Machine Learning and Systems 1, 389-405, 2019 | 45 | 2019 |
Adaptive gradient methods at the edge of stability JM Cohen, B Ghorbani, S Krishnan, N Agarwal, S Medapati, M Badura, ... arXiv preprint arXiv:2207.14484, 2022 | 43 | 2022 |
A simple approach to improve single-model deep uncertainty via distance-awareness JZ Liu, S Padhy, J Ren, Z Lin, Y Wen, G Jerfel, Z Nado, J Snoek, D Tran, ... Journal of Machine Learning Research 24 (42), 1-63, 2023 | 39 | 2023 |
A large batch optimizer reality check: Traditional, generic optimizers suffice across batch sizes Z Nado, JM Gilmer, CJ Shallue, R Anil, GE Dahl arXiv preprint arXiv:2102.06356, 2021 | 39 | 2021 |
Pre-trained Gaussian processes for Bayesian optimization Z Wang, GE Dahl, K Swersky, C Lee, Z Nado, J Gilmer, J Snoek, ... Journal of Machine Learning Research 25 (212), 1-83, 2024 | 28 | 2024 |
Benchmarking neural network training algorithms GE Dahl, F Schneider, Z Nado, N Agarwal, CS Sastry, P Hennig, ... arXiv preprint arXiv:2306.07179, 2023 | 15 | 2023 |
Stochastic gradient Langevin dynamics that exploit neural network structure Z Nado, J Snoek, R Grosse, D Duvenaud, B Xu, J Martens | 11 | 2018 |