| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Learning where to learn: Gradient sparsity in meta and continual learning | J Von Oswald, D Zhao, S Kobayashi, S Schug, M Caccia, N Zucchet, ... | Advances in Neural Information Processing Systems 34, 5250-5263 | 56 | 2021 |
| Uncovering mesa-optimization algorithms in transformers | J Von Oswald, E Niklasson, M Schlegel, S Kobayashi, N Zucchet, ... | arXiv preprint arXiv:2309.05858 | 33* | 2023 |
| Random initialisations performing above chance and how to find them | F Benzing, S Schug, R Meier, J Von Oswald, Y Akram, N Zucchet, ... | arXiv preprint arXiv:2209.07509 | 22 | 2022 |
| Beyond backpropagation: bilevel optimization through implicit differentiation and equilibrium propagation | N Zucchet, J Sacramento | Neural Computation 34 (12) | 21 | 2022 |
| The least-control principle for local learning at equilibrium | A Meulemans, N Zucchet, S Kobayashi, J Von Oswald, J Sacramento | Advances in Neural Information Processing Systems | 20 | 2022 |
| A contrastive rule for meta-learning | N Zucchet, S Schug, J Von Oswald, D Zhao, J Sacramento | Advances in Neural Information Processing Systems | 20 | 2022 |
| Gated recurrent neural networks discover attention | N Zucchet, S Kobayashi, Y Akram, J Von Oswald, M Larcher, A Steger, ... | arXiv preprint arXiv:2309.01775 | 9 | 2023 |
| Online learning of long-range dependencies | N Zucchet, R Meier, S Schug, A Mujika, J Sacramento | Advances in Neural Information Processing Systems | 9 | 2023 |
| Recurrent neural networks: vanishing and exploding gradients are not the end of the story | N Zucchet, A Orvieto | arXiv preprint arXiv:2405.21064 |  | 2024 |