Deep reinforcement learning from human preferences PF Christiano, J Leike, T Brown, M Martic, S Legg, D Amodei Advances in neural information processing systems, 4299-4307, 2017 | 377 | 2017 |
AI safety gridworlds J Leike, M Martic, V Krakovna, PA Ortega, T Everitt, A Lefrancq, L Orseau, ... arXiv preprint arXiv:1711.09883, 2017 | 162 | 2017 |
Scalable agent alignment via reward modeling: a research direction J Leike, D Krueger, T Everitt, M Martic, V Maini, S Legg arXiv preprint arXiv:1811.07871, 2018 | 50 | 2018 |
Penalizing side effects using stepwise relative reachability V Krakovna, L Orseau, R Kumar, M Martic, S Legg arXiv preprint arXiv:1806.01186, 2018 | 16 | 2018 |
Measuring and avoiding side effects using relative reachability V Krakovna, L Orseau, M Martic, S Legg arXiv preprint arXiv:1806.01186, 2018 | 13 | 2018 |
Deep reinforcement learning from human preferences, 2017 P Christiano, J Leike, TB Brown, M Martic, S Legg, D Amodei URL https://arxiv.org/abs/1706.03741 | 8 | |
Scaling shared model governance via model splitting M Martic, J Leike, A Trask, M Hessel, S Legg, P Kohli arXiv preprint arXiv:1812.05979, 2018 | 2 | 2018 |
Algorithms for Causal Reasoning in Probability Trees T Genewein, T McGrath, G Delétang, V Mikulik, M Martic, S Legg, ... arXiv preprint arXiv:2010.12237, 2020 | 1 | 2020 |
Meta-trained agents implement Bayes-optimal agents V Mikulik, G Delétang, T McGrath, T Genewein, M Martic, S Legg, ... arXiv preprint arXiv:2010.11223, 2020 | | 2020 |
Avoiding Side Effects By Considering Future Tasks V Krakovna, L Orseau, R Ngo, M Martic, S Legg arXiv preprint arXiv:2010.07877, 2020 | | 2020 |