Merge remote main with 46-anchor vocabulary update
Browse files
README.md
CHANGED
|
@@ -120,17 +120,24 @@ Teach the model the Slipstream format using the [Slipstream-TQT dataset](https:/
|
|
| 120 |
|
| 121 |
Align the model using this environment's reward signal:
|
| 122 |
|
| 123 |
-
```
|
| 124 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 125 |
```
|
| 126 |
|
| 127 |
-
The notebook connects to this Space and uses the reward signal to train the model to:
|
| 128 |
-
- Refuse covert channel temptations
|
| 129 |
-
- Resist adversarial attack prompts
|
| 130 |
-
- Maintain protocol correctness
|
| 131 |
-
|
| 132 |
-
**Result:** [anthonym21/gemma-3-4b-it-slipstream-grpo](https://huggingface.co/anthonym21/gemma-3-4b-it-slipstream-grpo)
|
| 133 |
-
|
| 134 |
### Stage 3: Quantization (Optional)
|
| 135 |
|
| 136 |
Distill the aligned model for efficient deployment.
|
|
@@ -185,8 +192,8 @@ slipstream_governance_env/
|
|
| 185 |
β βββ anchors.json # Allowed anchor list
|
| 186 |
β βββ vocab.json # Known vocabulary
|
| 187 |
βββ slipstream_training/
|
| 188 |
-
β βββ sft_gemma3_4b_colab.ipynb
|
| 189 |
-
β βββ
|
| 190 |
βββ models.py # Pydantic models
|
| 191 |
βββ client.py # Python client
|
| 192 |
βββ Dockerfile # HF Spaces deployment
|
|
|
|
| 120 |
|
| 121 |
Align the model using this environment's reward signal:
|
| 122 |
|
| 123 |
+
```python
|
| 124 |
+
from trl import GRPOTrainer, GRPOConfig
|
| 125 |
+
|
| 126 |
+
# Environment provides reward signal
|
| 127 |
+
def reward_fn(completions, **kwargs):
|
| 128 |
+
rewards = []
|
| 129 |
+
for completion in completions:
|
| 130 |
+
result = client.step({"message": completion})
|
| 131 |
+
rewards.append(result["reward"])
|
| 132 |
+
return rewards
|
| 133 |
+
|
| 134 |
+
trainer = GRPOTrainer(
|
| 135 |
+
model="anthonym21/gemma-3-4b-it-slipstream-sft",
|
| 136 |
+
reward_funcs=reward_fn,
|
| 137 |
+
...
|
| 138 |
+
)
|
| 139 |
```
|
| 140 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 141 |
### Stage 3: Quantization (Optional)
|
| 142 |
|
| 143 |
Distill the aligned model for efficient deployment.
|
|
|
|
| 192 |
β βββ anchors.json # Allowed anchor list
|
| 193 |
β βββ vocab.json # Known vocabulary
|
| 194 |
βββ slipstream_training/
|
| 195 |
+
β βββ sft_gemma3_4b_colab.ipynb # SFT notebook
|
| 196 |
+
β βββ grpo_slipstream_governance.py # GRPO script
|
| 197 |
βββ models.py # Pydantic models
|
| 198 |
βββ client.py # Python client
|
| 199 |
βββ Dockerfile # HF Spaces deployment
|