Adversarial risk and the dangers of evaluating against weak attacks J Uesato, B O'Donoghue, A Oord, P Kohli ICML 2018, 2018 | 493 | 2018 |
Robustfill: Neural program learning under noisy I/O J Devlin, J Uesato, S Bhupatiraju, R Singh, A Mohamed, P Kohli Proceedings of the 34th International Conference on Machine Learning-Volume …, 2017 | 381 | 2017 |
On the effectiveness of interval bound propagation for training verifiably robust models S Gowal, K Dvijotham, R Stanforth, R Bunel, C Qin, J Uesato, ... arXiv preprint arXiv:1810.12715, 2018 | 370 | 2018 |
Scaling language models: Methods, analysis & insights from training gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 337 | 2021 |
Technical report on the cleverhans v2.1.0 adversarial examples library N Papernot, F Faghri, N Carlini, I Goodfellow, R Feinman, A Kurakin, ... arXiv preprint arXiv:1610.00768, 2016 | 327 | 2016 |
Are Labels Required for Improving Adversarial Robustness? J Uesato, JB Alayrac, PS Huang, R Stanforth, A Fawzi, P Kohli NeurIPS 2019, 2019 | 285* | 2019 |
Robustness via curvature regularization, and vice versa SM Moosavi-Dezfooli, A Fawzi, J Uesato, P Frossard Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 261 | 2019 |
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples S Gowal, C Qin, J Uesato, T Mann, P Kohli arXiv preprint arXiv:2010.03593, 2020 | 210 | 2020 |
Ethical and social risks of harm from Language Models L Weidinger, J Mellor, M Rauh, C Griffin, J Uesato, PS Huang, M Cheng, ... arXiv preprint arXiv:2112.04359, 2021 | 187 | 2021 |
Training verified learners with learned verifiers K Dvijotham, S Gowal, R Stanforth, R Arandjelovic, B O'Donoghue, ... arXiv preprint arXiv:1805.10265, 2018 | 160 | 2018 |
Scalable verified training for provably robust image classification S Gowal, KD Dvijotham, R Stanforth, R Bunel, C Qin, J Uesato, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019 | 116 | 2019 |
Rigorous agent evaluation: An adversarial approach to uncover catastrophic failures J Uesato, A Kumar, C Szepesvari, T Erez, A Ruderman, K Anderson, ... ICLR 2019, 2018 | 71 | 2018 |
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming S Dathathri, K Dvijotham, A Kurakin, A Raghunathan, J Uesato, RR Bunel, ... Advances in Neural Information Processing Systems 33, 5318-5331, 2020 | 68 | 2020 |
An Alternative Surrogate Loss for PGD-based Adversarial Testing S Gowal, J Uesato, C Qin, PS Huang, T Mann, P Kohli arXiv preprint arXiv:1910.09338, 2019 | 58 | 2019 |
Challenges in detoxifying language models J Welbl, A Glaese, J Uesato, S Dathathri, J Mellor, LA Hendricks, ... arXiv preprint arXiv:2109.07445, 2021 | 57 | 2021 |
Semantic code repair using neuro-symbolic transformation networks J Devlin, J Uesato, R Singh, P Kohli arXiv preprint arXiv:1710.11054, 2017 | 41 | 2017 |
Verification of non-linear specifications for neural networks C Qin, B O'Donoghue, R Bunel, R Stanforth, S Gowal, J Uesato, ... ICLR 2019, 2019 | 38 | 2019 |
Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications L Berrada, S Dathathri, K Dvijotham, R Stanforth, RR Bunel, J Uesato, ... Advances in Neural Information Processing Systems 34, 2021 | 12* | 2021 |
Uncovering Surprising Behaviors in Reinforcement Learning via Worst-case Analysis A Ruderman, R Everett, B Sikder, H Soyer, J Uesato, A Kumar, C Beattie, ... | 11 | 2018 |
REALab: An Embedded Perspective on Tampering R Kumar, J Uesato, R Ngo, T Everitt, V Krakovna, S Legg arXiv preprint arXiv:2011.08820, 2020 | 8 | 2020 |