anthonym21 committed on
Commit 57cb6b6 · 2 Parent(s): 2b427d2, ef2991b

Merge remote main with 46-anchor vocabulary update

Files changed (1)
  1. README.md +18 -11
README.md CHANGED
@@ -120,17 +120,24 @@ Teach the model the Slipstream format using the [Slipstream-TQT dataset](https:/
 
  Align the model using this environment's reward signal:
 
- ```bash
- # See: slipstream_training/grpo_gemma3_4b_colab.ipynb
+ ```python
+ from trl import GRPOTrainer, GRPOConfig
+
+ # Environment provides reward signal
+ def reward_fn(completions, **kwargs):
+     rewards = []
+     for completion in completions:
+         result = client.step({"message": completion})
+         rewards.append(result["reward"])
+     return rewards
+
+ trainer = GRPOTrainer(
+     model="anthonym21/gemma-3-4b-it-slipstream-sft",
+     reward_funcs=reward_fn,
+     ...
+ )
  ```
 
- The notebook connects to this Space and uses the reward signal to train the model to:
- - Refuse covert channel temptations
- - Resist adversarial attack prompts
- - Maintain protocol correctness
-
- **Result:** [anthonym21/gemma-3-4b-it-slipstream-grpo](https://huggingface.co/anthonym21/gemma-3-4b-it-slipstream-grpo)
-
  ### Stage 3: Quantization (Optional)
 
  Distill the aligned model for efficient deployment.
@@ -185,8 +192,8 @@ slipstream_governance_env/
  │   ├── anchors.json # Allowed anchor list
  │   └── vocab.json # Known vocabulary
  ├── slipstream_training/
- │   ├── sft_gemma3_4b_colab.ipynb # Stage 1: SFT notebook
- │   └── grpo_gemma3_4b_colab.ipynb # Stage 2: GRPO notebook
+ │   ├── sft_gemma3_4b_colab.ipynb # SFT notebook
+ │   └── grpo_slipstream_governance.py # GRPO script
  ├── models.py # Pydantic models
  ├── client.py # Python client
  └── Dockerfile # HF Spaces deployment
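
Note on the `reward_fn` added to README.md in this commit: it calls `client` without showing where that object comes from. The sketch below is one way the same loop could be exercised offline, assuming a stand-in client. `StubEnvClient`, its scoring rule, and `make_reward_fn` are hypothetical names introduced only for illustration; they are not the API of the repository's `client.py`, and the real environment's reward logic is not reproduced here. The only things taken from the diff are the `step({"message": ...})` call shape and the `"reward"` key.

```python
from typing import Any


class StubEnvClient:
    """Hypothetical stand-in for the environment client (not the real client.py)."""

    def step(self, action: dict[str, Any]) -> dict[str, Any]:
        # Placeholder scoring rule for illustration only:
        # messages of at most 20 characters score 1.0, longer ones 0.0.
        message = action["message"]
        return {"reward": 1.0 if len(message) <= 20 else 0.0}


def make_reward_fn(client):
    """Bind a client so the reward function does not depend on a global."""

    def reward_fn(completions, **kwargs):
        rewards = []
        for completion in completions:
            # A completion may be a plain string, or (assumed here for
            # conversational datasets) a list of message dicts.
            if isinstance(completion, list):
                text = completion[0]["content"]
            else:
                text = completion
            result = client.step({"message": text})
            rewards.append(float(result["reward"]))
        return rewards

    return reward_fn


reward_fn = make_reward_fn(StubEnvClient())
scores = reward_fn(completions=["short reply", "x" * 40])
```

Passing the client in through a closure keeps the function signature compatible with the `reward_fn(completions, **kwargs)` shape shown in the diff, while making it possible to swap the stub for a real client when a live environment is available.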