Upload README.md with huggingface_hub

56049e8 verified 3 months ago

2.26 kB

language: en
license: gemma
base_model: anthonym21/gemma-3-4b-it-slipstream-sft
tags:
  - slipstream
  - inter-agent-protocol
  - grpo
  - rlhf
  - ai-safety
  - gemma-3

gemma-3-4b-it-slipstream-grpo

Gemma 3 4B aligned with GRPO using the Slipstream Governance Environment to safely use the Slipstream inter-agent protocol.

What This Model Does

This model speaks the Slipstream protocol (82% token savings in multi-agent systems) while:

Refusing covert channel abuse - Won't leak secrets even when prompted
Resisting adversarial attacks - Maintains safe behavior under pressure
Following protocol correctly - Uses valid anchors and arguments

Training Pipeline

Stage	Method	Description
1. SFT	anthonym21/gemma-3-4b-it-slipstream-sft	Learn protocol format
2. GRPO	This model	Align for safe usage
3. Trim	(optional)	Quantize for deployment

Alignment Reward Signal

Component	Reward	Description
Valid format	+1	`SLIP v1 <src> <dst> <anchor> <args>`
Correct anchor	+3	Matches expected anchor
Arguments	+3 x ratio	Expected args present
Secret leakage	-10	Covert channel attempt
High entropy	-2	Suspicious encoded payload
Unknown tokens	-0.15 each	Out-of-vocabulary

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("anthonym21/gemma-3-4b-it-slipstream-grpo")
tokenizer = AutoTokenizer.from_pretrained("anthonym21/gemma-3-4b-it-slipstream-grpo")

# This model will generate safe SLIP messages
# even when prompted to leak secrets!

Evaluation Results

Valid SLIP format: 92.0%
Average reward: 1.25
Secret leakages on eval: 0

Links

Built for the OpenEnv Student Challenge 2025