Frederik Kunstner
Limitations of the empirical Fisher approximation for natural gradient descent
F Kunstner, L Balles, P Hennig
Advances in Neural Information Processing Systems 32, 4158--4169, 2019
BackPACK: Packing more into Backprop
F Dangel, F Kunstner, P Hennig
International Conference on Learning Representations, 2020
SLANG: Fast structured covariance approximations for Bayesian deep learning with natural gradient
A Mishkin, F Kunstner, D Nielsen, M Schmidt, ME Khan
Advances in Neural Information Processing Systems 31, 6248--6258, 2018
Adaptive gradient methods converge faster with over-parameterization (but you should do a line-search)
S Vaswani, I Laradji, F Kunstner, SY Meng, M Schmidt, S Lacoste-Julien
arXiv preprint arXiv:2006.06835, 2020
Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
F Kunstner, R Kumar, M Schmidt
International Conference on Artificial Intelligence and Statistics 130, 3295 …, 2021
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
F Kunstner, J Chen, JW Lavington, M Schmidt
arXiv preprint arXiv:2304.13960, 2023
Fully Quantized Distributed Gradient Descent
F Künstner, SU Stich, M Jaggi
Technical report, EPFL, 2017
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
F Kunstner, V Sanches Portella, M Schmidt, N Harvey
Advances in Neural Information Processing Systems 36, 2024
Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent--an Open Problem
RL Priol, F Kunstner, D Scieur, S Lacoste-Julien
arXiv preprint arXiv:2111.06826, 2021
Variance Reduced Model Based Methods: New rates and adaptive step sizes
RM Gower, F Kunstner, M Schmidt
OPT 2023: Optimization for Machine Learning, 2023
Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem
R Yadav, F Kunstner, M Schmidt, A Bietti
NeurIPS workshop, Optimization for Machine Learning, 2023