Amirkeivan Mohtashami
Title · Cited by · Year
Meditron-70b: Scaling medical pretraining for large language models
Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ...
arXiv preprint arXiv:2311.16079, 2023
Cited by 225 · 2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
A Mohtashami, M Jaggi
Advances in Neural Information Processing Systems (NeurIPS), 2023
Cited by 107* · 2023
QuaRot: Outlier-free 4-bit inference in rotated LLMs
S Ashkboos, A Mohtashami, ML Croci, B Li, P Cameron, M Jaggi, ...
arXiv preprint arXiv:2404.00456, 2024
Cited by 53 · 2024
Masked Training of Neural Networks with Partial Gradients
A Mohtashami, M Jaggi, SU Stich
The 25th International Conference on Artificial Intelligence and Statistics, 2021
Cited by 33* · 2021
Critical parameters for scalable distributed learning with large batches and asynchronous updates
S Stich, A Mohtashami, M Jaggi
International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021
Cited by 23 · 2021
The splay-list: A distribution-adaptive concurrent skip-list
V Aksenov, D Alistarh, A Drozdova, A Mohtashami
34th International Symposium on Distributed Computing 179, 2020
Cited by 14 · 2020
Characterizing & finding good data orderings for fast convergence of sequential gradient methods
A Mohtashami, S Stich, M Jaggi
arXiv preprint arXiv:2202.01838, 2022
Cited by 13 · 2022
Special Properties of Gradient Descent with Large Learning Rates
A Mohtashami, M Jaggi, S Stich
ICML 2023, 2022
Cited by 10* · 2022
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models
A Mohtashami, M Verzetti, PK Rubenstein
Practical ML for Developing Countries Workshop @ ICLR 2023, 2023
Cited by 6 · 2023
Social Learning: Towards Collaborative Learning with Large Language Models
A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ...
arXiv preprint arXiv:2312.11441, 2023
Cited by 5 · 2023
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
M Pagliardini, A Mohtashami, F Fleuret, M Jaggi
arXiv preprint arXiv:2402.02622, 2024
Cited by 3 · 2024
CoTFormer: More Tokens With Attention Make Up For Less Depth
A Mohtashami, M Pagliardini, M Jaggi
Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023
Cited by 1 · 2023
TPS (Task Preparation System): A Tool for Developing Tasks in Programming Contests
K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh
Cited by 1 · 2019
Reproducibility Report for "On Warm-Starting Neural Network Training"
A Mohtashami, E Pajouheshgar, K Kireev
ML Reproducibility Challenge 2020, 2021
2021
A Gradient-Based Approach to Neural Networks Structure Learning
AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi