Aengus Lynch
Verified email at ucl.ac.uk
Title
Cited by
Year
Towards automated circuit discovery for mechanistic interpretability
A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso
Advances in Neural Information Processing Systems 36, 16318-16352, 2023
Cited by: 176; Year: 2023
Causal machine learning: A survey and open problems
J Kaddour, A Lynch, Q Liu, MJ Kusner, R Silva
arXiv preprint arXiv:2206.15475, 2022
Cited by: 159; Year: 2022
Eight methods to evaluate robust unlearning in LLMs
A Lynch, P Guo, A Ewart, S Casper, D Hadfield-Menell
arXiv preprint arXiv:2402.16835, 2024
Cited by: 35; Year: 2024
Spawrious: A benchmark for fine control of spurious correlation biases
A Lynch, GJS Dovonon, J Kaddour, R Silva
arXiv preprint arXiv:2303.05470, 2023
Cited by: 25; Year: 2023
Targeted latent adversarial training improves robustness to persistent harmful behaviors in LLMs
A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ...
arXiv e-prints, arXiv:2407.15549, 2024
Cited by: 10; Year: 2024
Causal machine learning: A survey and open problems. arXiv 2022
J Kaddour, A Lynch, Q Liu, MJ Kusner, R Silva
arXiv preprint arXiv:2206.15475, 2022
Cited by: 9; Year: 2022
Causal machine learning: a survey and open problems (2022)
J Kaddour, A Lynch, Q Liu, MJ Kusner, R Silva
arXiv preprint arXiv:2206.15475
Cited by: 9
Analyzing the generalization and reliability of steering vectors
D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk
arXiv preprint arXiv:2407.12404, 2024
Cited by: 5; Year: 2024
Evaluating the impact of geometric and statistical skews on out-of-distribution generalization performance
A Lynch, J Kaddour, R Silva
NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and …, 2022
Cited by: 5; Year: 2022
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ...
arXiv preprint arXiv:2407.15549, 2024
Year: 2024
Analyzing the Generalization and Reliability of Steering Vectors--ICML 2024
D Tan, D Chanin, A Lynch, D Kanoulas, B Paige, A Garriga-Alonso, R Kirk
arXiv e-prints, arXiv:2407.12404, 2024
Year: 2024
H-Space Sparse Autoencoders
A Ijishakin, ML Ang, L Baljer, DCH Tan, HL Fry, A Abdulaal, A Lynch, ...
NeurIPS Safe Generative AI Workshop 2024