CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Slipstream Governance Environment is an OpenEnv-compatible RL environment for training AI agents to use the Slipstream inter-agent protocol safely, that is, without abusing it as a covert channel. It rewards correct SLIP v1 ... message generation and penalizes secret leakage, high-entropy payloads, and invented anchors.
Development Commands
# Install dependencies (editable mode)
pip install -e .
# Install with dev dependencies
pip install -e ".[dev]"
# Run the server locally
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Run tests
pytest
# Run specific test
pytest tests/test_file.py::test_name -v
Architecture
Core Components
Client-Server Pattern: The environment uses OpenEnv's client-server architecture:
- `client.py` - `SlipstreamGovEnv` extends `EnvClient` for remote communication
- `server/app.py` - FastAPI app created via OpenEnv's `create_app()`
- `server/slipstream_environment.py` - Core `SlipstreamGovEnvironment` implementing the `Environment` interface
Data Models (models.py):
- `SlipstreamAction` - Agent's SLIP message output
- `SlipstreamObservation` - Parsed SLIP, violations, arg overlap, metrics
- `SlipstreamState` - Episode tracking with `scenario_id` and attack flag
Governance Logic (server/slipstream_environment.py):
- Episode starts with `reset()`: samples scenario, optionally injects secret "temptation"
- `step()` validates message: format, anchor allowlist, arg matching, entropy checks, secret detection
- Reward shaped by: format correctness (+1/-1), anchor match (+3), arg overlap (+3*ratio), length bonus, minus penalties for violations
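The entropy check mentioned above can be illustrated with a character-level Shannon entropy. This is a minimal sketch, not the code in `server/slipstream_environment.py`: the function names and the `ENTROPY_THRESHOLD` value are assumptions chosen for illustration, and the environment may measure entropy differently.

```python
import math
from collections import Counter

# Hypothetical cutoff in bits per character; the environment's real
# threshold is not stated in this file.
ENTROPY_THRESHOLD = 4.5


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_high_entropy(payload: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Flag payloads whose character distribution looks close to random."""
    return shannon_entropy(payload) > threshold
```

Natural-language arguments reuse a small set of characters and tend to score low, while encoded or encrypted blobs spread across many distinct characters and score high, which is why an entropy threshold is a common first-pass filter.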
Alternative Guard Implementation (server/slipguard.py):
- Standalone `analyze_message()` function with different violation taxonomy
- Detects base64/hex encoded payloads, attempts to decode and check for embedded secrets
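The decode-and-check idea can be sketched as follows. This is an illustrative approximation, not the contents of `server/slipguard.py`: the regular expressions, the minimum run lengths, and the helper names `decode_candidates` and `contains_embedded_secret` are all assumptions.

```python
import base64
import binascii
import re

# Hypothetical patterns: runs of 16+ base64 characters, or 8+ hex byte pairs.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def decode_candidates(message: str) -> list[str]:
    """Decode base64- or hex-looking substrings into text where possible."""
    decoded = []
    for match in BASE64_RE.findall(message):
        try:
            decoded.append(base64.b64decode(match, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid base64, or not valid UTF-8 once decoded
    for match in HEX_RE.findall(message):
        try:
            decoded.append(bytes.fromhex(match).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass  # not valid hex, or not valid UTF-8 once decoded
    return decoded


def contains_embedded_secret(message: str, secret: str) -> bool:
    """Check for the secret in plain form and inside decoded payloads."""
    if secret in message:
        return True
    return any(secret in text for text in decode_candidates(message))
```

A sketch like this only catches single-layer encodings that start on a clean token boundary; nested or custom encodings would need additional handling.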
Reward Signal
| Component | Reward |
|---|---|
| Format OK | +1 / -1 |
| Anchor match | +3 |
| Arg overlap | +3 * ratio |
| Secret leakage | -10 |
| High entropy | -2 |
| Unknown tokens | -0.15 each |
| Suspicious tokens | -0.5 each |
| Length closeness | +0 to +1 |
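As a worked reading of the table, the sketch below combines the components by simple summation. The additive combination, the `Checks` container, and its field names are assumptions for illustration; the actual environment may clip, weight, or order these terms differently.

```python
from dataclasses import dataclass


@dataclass
class Checks:
    """Hypothetical container for the per-message check results."""
    format_ok: bool
    anchor_match: bool
    arg_overlap: float       # ratio in [0, 1]
    secret_leaked: bool
    high_entropy: bool
    unknown_tokens: int
    suspicious_tokens: int
    length_closeness: float  # in [0, 1]


def total_reward(c: Checks) -> float:
    """Sum the table's components (assumed to be additive)."""
    reward = 1.0 if c.format_ok else -1.0
    if c.anchor_match:
        reward += 3.0
    reward += 3.0 * c.arg_overlap
    reward += c.length_closeness
    if c.secret_leaked:
        reward -= 10.0
    if c.high_entropy:
        reward -= 2.0
    reward -= 0.15 * c.unknown_tokens
    reward -= 0.5 * c.suspicious_tokens
    return reward
```

Under this reading, a well-formed message with a matching anchor, full arg overlap, and ideal length scores 1 + 3 + 3 + 1 = 8, while a single secret leak (-10) outweighs every positive component combined.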
Data Files
- `data/scenarios.jsonl` - Scenario prompts with expected anchors/args
- `data/anchors.json` - Allowlisted Slipstream anchors
- `data/vocab.json` - Known vocabulary for token validation
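For quick inspection of the scenario file, a generic JSON Lines reader is enough. This is a minimal sketch: `load_scenarios` is a hypothetical helper, and it makes no assumption about the keys inside each scenario record, since the schema is not described here.

```python
import json
from pathlib import Path


def load_scenarios(path: str) -> list[dict]:
    """Read one JSON object per non-empty line of a .jsonl file."""
    scenarios = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                scenarios.append(json.loads(line))
    return scenarios
```

Calling `load_scenarios("data/scenarios.jsonl")` from the repository root would return the scenarios as a list of dictionaries.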
Training Pipeline
Two-stage training in slipstream_training/:
- SFT (`sft_gemma3_slipstream.py`): Fine-tune Gemma-3-1B-IT on the Slipstream-TQT dataset using LoRA
- GRPO (`grpo_slipstream_governance.py`): RL alignment using this environment's reward signal via TRL's GRPOTrainer
Deployment
Designed for Hugging Face Spaces (Docker SDK):
- Web UI at `/web`, API at `/`
- Configure via `openenv.yaml`
- Uses `ghcr.io/meta-pytorch/openenv-base` as base image