---
language: en
license: gemma
base_model: google/gemma-3-4b-it
tags:
- slipstream
- inter-agent-protocol
- sft
- gemma-3
---
# gemma-3-4b-it-slipstream-sft

Gemma 3 4B IT fine-tuned on the [Slipstream-TQT dataset](https://huggingface.co/datasets/anthonym21/slipstream-tqt) to speak the Slipstream inter-agent protocol.
## Training

- **Base model**: `google/gemma-3-4b-it`
- **Method**: SFT with LoRA (r=8, alpha=16)
- **Dataset**: `anthonym21/slipstream-tqt`
- **Epochs**: 1
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("anthonym21/gemma-3-4b-it-slipstream-sft")
tokenizer = AutoTokenizer.from_pretrained("anthonym21/gemma-3-4b-it-slipstream-sft")

# Generate a SLIP message from a natural-language request
prompt = "Request a code review for PR #42"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Next Steps

This model is stage 1 of a 3-stage pipeline:

1. **SFT** (this model) - learn the protocol format
2. **GRPO** - reinforcement-learning alignment via [slipstream-gov-env](https://huggingface.co/spaces) for safe usage
3. **Trim** - quantize/distill the aligned model