Frederik Kunstner
Verified email at cs.ubc.ca - Homepage
Title
Cited by
Year
Limitations of the empirical Fisher approximation for natural gradient descent
F Kunstner, L Balles, P Hennig
Advances in Neural Information Processing Systems 32, 4158--4169, 2019
Cited by 137 · 2019
BackPACK: Packing more into Backprop
F Dangel, F Kunstner, P Hennig
International Conference on Learning Representations, 2020
Cited by 78 · 2020
SLANG: Fast structured covariance approximations for Bayesian deep learning with natural gradient
A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan
Advances in Neural Information Processing Systems 31, 6248--6258, 2018
Cited by 54 · 2018
Adaptive gradient methods converge faster with over-parameterization (but you should do a line-search)
S Vaswani, I Laradji, F Kunstner, SY Meng, M Schmidt, S Lacoste-Julien
arXiv preprint arXiv:2006.06835, 2020
Cited by 23* · 2020
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F Kunstner, R Kumar, M Schmidt
International Conference on Artificial Intelligence and Statistics 130, 3295 …, 2021
Cited by 16 · 2021
Fully Quantized Distributed Gradient Descent
F Künstner, SU Stich, M Jaggi
Technical report, EPFL, 2017
Cited by 8 · 2017
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
F Kunstner, J Chen, JW Lavington, M Schmidt
arXiv preprint arXiv:2304.13960, 2023
Cited by 2 · 2023
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent--an Open Problem
RL Priol, F Kunstner, D Scieur, S Lacoste-Julien
arXiv preprint arXiv:2111.06826, 2021
Cited by 1 · 2021
Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking
F Kunstner, VS Portella, M Schmidt, N Harvey
arXiv preprint arXiv:2306.02527, 2023
2023
Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers
J Chen, F Kunstner, M Schmidt
NeurIPS OPTML workshop, 2021
2021