Logan
LoganResearch
AI & ML interests
I am interested in empirical AI safety research, with a focus on understanding and mitigating emergent goal-directed behavior in large language models.
My current work explores how planning, memory, and self-reflection mechanisms interact in moderately sized models, and how these interactions can lead to unexpected behaviors such as goal drift, self-referential reasoning, or persistent internal state changes. I approach these systems as model organisms: simplified, deliberately constrained setups that make it easier to observe and study alignment-relevant failure modes before they appear in more capable systems.
Concretely, my projects involve fine-tuning and instrumenting open-source language models, building agent loops with explicit state, memory, and planning components, and logging internal signals (goals, plans, reflections) over long-horizon runs. I am particularly interested in questions like:
• When and why do models begin to generate internally consistent long-term objectives?
• How do different memory or reflection mechanisms amplify or dampen harmful behaviors?
• What simple interventions (e.g., constraints, monitoring, or feedback) reliably prevent escalation without collapsing model capability?
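As a rough illustration of the kind of setup described above, here is a minimal sketch of an agent loop with explicit state, memory, and planning, logged at every step. It is not code from any specific project: `AgentState`, `stub_model`, `run_agent`, and `goal_drift_steps` are hypothetical names, and the stub stands in for a real language model call.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Explicit agent state that is snapshotted into the log at every step."""
    goal: str
    plan: List[str] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)
    reflection: str = ""


def stub_model(role: str, state: AgentState) -> str:
    """Hypothetical, deterministic stand-in for a language model call."""
    if role == "plan":
        return f"step {len(state.memory) + 1} toward: {state.goal}"
    if role == "reflect":
        return f"completed {len(state.memory)} step(s)"
    raise ValueError(f"unknown role: {role}")


def run_agent(
    goal: str,
    steps: int,
    model: Callable[[str, AgentState], str] = stub_model,
) -> List[Dict]:
    """Run a plan -> act -> reflect loop and return one log record per step."""
    state = AgentState(goal=goal)
    log: List[Dict] = []
    for t in range(steps):
        action = model("plan", state)               # planning component
        state.plan.append(action)
        state.memory.append(f"did: {action}")       # memory component
        state.reflection = model("reflect", state)  # self-reflection component
        # asdict() deep-copies the lists, so each record is an independent snapshot.
        log.append({"step": t, **asdict(state)})
    return log


def goal_drift_steps(log: List[Dict], initial_goal: str) -> List[int]:
    """Return the steps at which the logged goal differs from the initial goal."""
    return [record["step"] for record in log if record["goal"] != initial_goal]


if __name__ == "__main__":
    # Emit the run as JSON Lines, one record per step.
    for record in run_agent("translate a sentence", steps=3):
        print(json.dumps(record))
```

Keeping the goal, plan, memory, and reflection as explicit fields, rather than leaving them implicit in a prompt, is what makes simple checks like `goal_drift_steps` possible over long-horizon runs. With the stub, the goal never changes, so the check returns an empty list. A real model that is allowed to rewrite its own goal is where such a check would become informative.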
In parallel, I have worked on smaller-scale language models (e.g., custom machine translation systems) to develop strong intuitions around data curation, tokenizer design, training stability, and evaluation. This work informs my safety research by grounding it in practical model behavior rather than abstract assumptions.
I am motivated by reducing catastrophic risks from advanced AI systems and by developing concrete, testable methods for making model behavior more predictable, interpretable, and corrigible. I am especially excited to collaborate with researchers working on model organisms, mechanistic interpretability, and scalable oversight, and to transition into full-time empirical AI safety research.
Recent Activity
Updated a model 9 days ago: LoganResearch/cyborg-translator-en-ru
Updated a model 11 days ago: LoganResearch/UbermenschetienASI
Liked a model 15 days ago: LoganResearch/cyborg-translator-en-ru