name: interp-arena
version: 0.1.0
description: >
  Interpretability Arena: Red vs Blue, a multi-agent adversarial environment
  where agents learn to manipulate and defend LLM internal states using
  mechanistic interpretability techniques (TransformerLens).
author: ""
license: BSD-3-Clause

# Server entry point
server:
  module: server.app
  host: 0.0.0.0
  port: 8000

README: interp_arena/README.md

# Action / Observation / State types
action_type: models.InterpArenaAction
observation_type: models.InterpArenaObservation
state_type: models.InterpArenaState

# Container
docker:
  dockerfile: server/Dockerfile
  image: interp-arena:latest