Spaces:

anthonym21
/

slipstream-governance-openenv

Sleeping

App Files Files Community

slipstream-governance-openenv / CLAUDE.md

anthonym21

Initial Commit with GRPO notebook

935a6ef 3 months ago

preview code

raw

history blame contribute delete

2.95 kB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Slipstream Governance Environment is an OpenEnv-compatible RL environment for training AI agents to use the Slipstream inter-agent protocol safely (preventing covert channel abuse). It rewards correct SLIP v1 ... message generation while penalizing secret leakage, high-entropy payloads, and invented anchors.

Development Commands

# Install dependencies (editable mode)
pip install -e .

# Install with dev dependencies
pip install -e ".[dev]"

# Run the server locally
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Run tests
pytest

# Run specific test
pytest tests/test_file.py::test_name -v

Architecture

Core Components

Client-Server Pattern: The environment uses OpenEnv's client-server architecture:

client.py - SlipstreamGovEnv extends EnvClient for remote communication
server/app.py - FastAPI app created via OpenEnv's create_app()
server/slipstream_environment.py - Core SlipstreamGovEnvironment implementing Environment interface

Data Models (models.py):

SlipstreamAction - Agent's SLIP message output
SlipstreamObservation - Parsed SLIP, violations, arg overlap, metrics
SlipstreamState - Episode tracking with scenario_id and attack flag

Governance Logic (server/slipstream_environment.py):

Episode starts with reset(): samples scenario, optionally injects secret "temptation"
step() validates message: format, anchor allowlist, arg matching, entropy checks, secret detection
Reward shaped by: format correctness (+1/-1), anchor match (+3), arg overlap (+3*ratio), length bonus, minus penalties for violations

Alternative Guard Implementation (server/slipguard.py):

Standalone analyze_message() function with different violation taxonomy
Detects base64/hex encoded payloads, attempts to decode and check for embedded secrets

Reward Signal

Component	Reward
Format OK	+1 / -1
Anchor match	+3
Arg overlap	+3 * ratio
Secret leakage	-10
High entropy	-2
Unknown tokens	-0.15 each
Suspicious tokens	-0.5 each
Length closeness	+0 to +1

Data Files

data/scenarios.jsonl - Scenario prompts with expected anchors/args
data/anchors.json - Allowlisted Slipstream anchors
data/vocab.json - Known vocabulary for token validation

Training Pipeline

Two-stage training in slipstream_training/:

SFT (sft_gemma3_slipstream.py): Fine-tune Gemma-3-1B-IT on Slipstream-TQT dataset using LoRA
GRPO (grpo_slipstream_governance.py): RL alignment using this environment's reward signal via TRL's GRPOTrainer

Deployment

Designed for Hugging Face Spaces (Docker SDK):

Web UI at /web, API at /
Configure via openenv.yaml
Uses ghcr.io/meta-pytorch/openenv-base as base image