Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates. S Vaswani, A Mishkin, I Laradji, M Schmidt, G Gidel, S Lacoste-Julien. Advances in Neural Information Processing Systems 32, 2019. Cited by 113.
SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient. A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan. Advances in Neural Information Processing Systems 31, 2018. Cited by 41.
To Each Optimizer a Norm, to Each Norm Its Generalization. S Vaswani, R Babanezhad, J Gallego, A Mishkin, S Lacoste-Julien, ... arXiv preprint arXiv:2006.06821, 2020. Cited by 4.
Interpolation, Growth Conditions, and Stochastic Gradient Descent. A Mishkin. University of British Columbia, 2020. Cited by 1.
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions. A Mishkin, A Sahiner, M Pilanci. arXiv preprint arXiv:2202.01331, 2022.
How to Make Your Optimizer Generalize Better. S Vaswani, R Babanezhad, J Gallego, A Mishkin, S Lacoste-Julien, ...