Vladimir Mikulik

Cited by

	All	Since 2019
Citations	2790	2787
h-index	14	14
i10-index	15	15

Citations per year: 2020: 57, 2021: 422, 2022: 655, 2023: 919, 2024: 717

Public access

2 articles available, 0 articles not available

Based on funding requirements

DeepMind

Verified email at google.com

AI Safety, Interpretability, NLP


Title	Cited by	Year
Inferring the effectiveness of government interventions against COVID-19 JM Brauner, S Mindermann, M Sharma, D Johnston, J Salvatier, ... Science 371 (6531), eabd9338, 2021	1004	2021
Scaling language models: Methods, analysis & insights from training gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021	745	2021
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023	419	2023
Teaching language models to support answers with verified quotes J Menick, M Trebacz, V Mikulik, J Aslanides, F Song, M Chadwick, ... arXiv preprint arXiv:2203.11147, 2022	137	2022
Alignment of language agents Z Kenton, T Everitt, L Weidinger, I Gabriel, V Mikulik, G Irving arXiv preprint arXiv:2103.14659, 2021	111	2021
Risks from learned optimization in advanced machine learning systems E Hubinger, C van Merwijk, V Mikulik, J Skalse, S Garrabrant arXiv preprint arXiv:1906.01820, 2019	97	2019
Specification gaming: the flip side of AI ingenuity V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ... DeepMind Blog 3, 2020	87	2020
Meta-trained agents implement Bayes-optimal agents V Mikulik, G Delétang, T McGrath, T Genewein, M Martic, S Legg, ... Advances in Neural Information Processing Systems 33, 2020	36	2020
The effectiveness and perceived burden of nonpharmaceutical interventions against COVID-19 transmission: a modelling study with 41 countries JM Brauner, S Mindermann, M Sharma, AB Stephenson, T Gavenčiak, ... medRxiv, 2020.05.28.20116129, 2020	33	2020
Tracr: Compiled transformers as a laboratory for interpretability D Lindner, J Kramár, S Farquhar, M Rahtz, T McGrath, V Mikulik Advances in Neural Information Processing Systems 36, 2024	29	2024
Neural networks are a priori biased towards boolean functions with low entropy C Mingard, J Skalse, G Valle-Pérez, D Martínez-Rubio, V Mikulik, ... arXiv preprint arXiv:1909.11522, 2019	24	2019
Does circuit analysis interpretability scale? evidence from multiple choice capabilities in chinchilla T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik arXiv preprint arXiv:2307.09458, 2023	23	2023
Scaling Language Models: Methods, Analysis & Insights from Training Gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, HF Song, J Aslanides, ... arXiv, 2021	18	2021
The hydra effect: Emergent self-repair in language model computations T McGrath, M Rahtz, J Kramár, V Mikulik, S Legg arXiv preprint arXiv:2307.15771, 2023	15	2023
Causal analysis of agent behavior for AI safety G Delétang, J Grau-Moya, M Martic, T Genewein, T McGrath, V Mikulik, ... arXiv preprint arXiv:2103.03938, 2021	10	2021
Challenges with unsupervised LLM knowledge discovery S Farquhar, V Varma, Z Kenton, J Gasteiger, V Mikulik, R Shah arXiv preprint arXiv:2312.10029, 2023	2	2023

Articles 1–16
