Follow
Amanda Askell
Amanda Askell
Anthropic
Verified email at askell.io - Homepage
Title
Cited by
Cited by
Year
Language models are few-shot learners
T Brown, B Mann, N Ryder, M Subbiah, JD Kaplan, P Dhariwal, ...
Advances in neural information processing systems 33, 1877-1901, 2020
30310*2020
Learning transferable visual models from natural language supervision
A Radford, JW Kim, C Hallacy, A Ramesh, G Goh, S Agarwal, G Sastry, ...
International conference on machine learning, 8748-8763, 2021
154202021
Training language models to follow instructions with human feedback
L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ...
Advances in neural information processing systems 35, 27730-27744, 2022
61482022
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
7342022
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
6792022
Constitutional ai: Harmlessness from ai feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
5832022
Release strategies and the social impacts of language models
I Solaiman, M Brundage, J Clark, A Askell, A Herbert-Voss, J Wu, ...
arXiv preprint arXiv:1908.09203, 2019
4082019
Toward trustworthy AI development: mechanisms for supporting verifiable claims
M Brundage, S Avin, J Wang, H Belfield, G Krueger, G Hadfield, H Khlaaf, ...
arXiv preprint arXiv:2004.07213, 2020
3282020
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
2212022
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
2142021
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
2122022
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
1882022
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
1712022
Training language models to follow instructions with human feedback, 2022
L Ouyang, J Wu, X Jiang, D Almeida, CL Wainwright, P Mishkin, C Zhang, ...
URL https://arxiv. org/abs/2203.02155 13, 1, 2022
1522022
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1, 1, 2021
1492021
Language Models are Few-Shot Learners. 2020. doi: 10.48550
TB Brown, B Mann, N Ryder, M Subbiah, J Kaplan, P Dhariwal, ...
arxiv, 5-7, 2005
1482005
Language models are few-shot learners. arXiv
TB Brown, B Mann, N Ryder, M Subbiah, J Kaplan, P Dhariwal, ...
Computer Science, Computation and Language, 2005
1382005
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
1242022
Dawn Drain
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson …, 2021
1172021
Dawn Drain
C Olsson, N Elhage, NJ Neel Nanda, N DasSarma, T Henighan, B Mann, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy …, 2022
1072022
The system can't perform the operation now. Try again later.
Articles 1–20