Language models are few-shot learners. T Brown, B Mann, N Ryder, M Subbiah, JD Kaplan, P Dhariwal, ... Advances in Neural Information Processing Systems 33, 1877-1901, 2020. Cited by 23484.
Zero-shot text-to-image generation. A Ramesh, M Pavlov, G Goh, S Gray, C Voss, A Radford, M Chen, ... International Conference on Machine Learning, 8821-8831, 2021. Cited by 3454.
Evaluating large language models trained on code. M Chen, J Tworek, H Jun, Q Yuan, HPO Pinto, J Kaplan, H Edwards, ... arXiv preprint arXiv:2107.03374, 2021. Cited by 1863.
Dota 2 with large scale deep reinforcement learning. C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ... arXiv preprint arXiv:1912.06680, 2019. Cited by 1735.
Generating long sequences with sparse transformers. R Child, S Gray, A Radford, I Sutskever. arXiv preprint arXiv:1904.10509, 2019. Cited by 1498.
Fast algorithms for convolutional neural networks. A Lavin, S Gray. Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2016. Cited by 1126.
Scaling laws for neural language models. J Kaplan, S McCandlish, T Henighan, TB Brown, B Chess, R Child, ... arXiv preprint arXiv:2001.08361, 2020. Cited by 1052.
GPT-4 technical report. OpenAI. arXiv, 2023. Cited by 1037*.
Flexpoint: an adaptive numerical format for efficient training of deep neural networks. U Köster, T Webb, X Wang, M Nassar, AK Bansal, W Constable, O Elibol, ... Advances in Neural Information Processing Systems 30, 2017. Cited by 324.
Scaling laws for autoregressive generative modeling. T Henighan, J Kaplan, M Katz, M Chen, C Hesse, J Jackson, H Jun, ... arXiv preprint arXiv:2010.14701, 2020. Cited by 225.
GPU kernels for block-sparse weights. S Gray, A Radford, DP Kingma. arXiv preprint arXiv:1711.09224, 2017. Cited by 178.
DALL·E: creating images from text. A Ramesh, M Pavlov, G Goh, S Gray, M Chen, R Child, V Misra, P Mishkin, ... OpenAI blog, https://openai.com/blog/dall-e, 2021. Cited by 78.