AI & ML interests
Benchmarks and Evaluation, Agentic AI for Science, AI Safety and Security, Human-AI Interaction
Oxford Reasoning with Machine Learning Lab
Combining theoretical rigour with empirical investigation to understand how AI models reason, solve complex problems, and collaborate with humans.
Research Areas
📐 Benchmarks & Evaluation
We study the science of LLM evaluation using systematic reviews, benchmark analysis, and statistical modelling. We also develop new benchmarks to probe the limits of LLM reasoning, especially in adversarial, interactive, and low-resource language settings.
🔬 Agentic AI for Science
We build agentic systems that automate and augment key stages of the scientific process: literature discovery, evidence synthesis, hypothesis generation, and decision support. We design these agents to be reliable, transparent, and grounded in domain expertise.
🛡️ AI Safety
From bias and toxicity to misalignment in agentic systems, we investigate the harms that advanced AI may pose to individuals and society, alongside technical mitigation methods and research on AI governance.
🤝 Human–AI Interaction
We run large-scale empirical studies of how people use and respond to AI systems in real-world decision-making contexts.
🏷️ Topics
llm-evaluation benchmarking ai-safety agentic-ai human-ai-interaction reasoning nlp alignment bias governance low-resource-nlp scientific-discovery