---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- safetensors
- security
- defense
- multi-agent
- arxiv:1910.09700
---
# coliseum034/coliseum-defender-sft

This is a supervised fine-tuned (SFT) model trained with [Unsloth](https://github.com/unslothai/unsloth), which the project describes as enabling 2x faster training.

The model acts as a "defender" node: it is optimized to classify, filter, and defend against adversarial inputs in multi-agent security systems and vulnerability scanners.
## ⚙️ Model Details

* **License:** Apache 2.0
* **Architecture:** ~1.5B parameters (36,929,536 trainable, 2.34% of the total)
* **Language:** English
* **Training Type:** Supervised Fine-Tuning (SFT)
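As a quick consistency check, the trainable-parameter count and its 2.34% share together imply the total parameter count. The sketch below is only that arithmetic; the variable names are illustrative, and because the percentage is rounded in the figures above, the result is approximate.

```python
# Figures taken from the Model Details list.
trainable_params = 36_929_536
trainable_fraction = 0.0234  # 2.34%, rounded in the source

# Total parameters implied by "trainable / total = fraction".
implied_total = trainable_params / trainable_fraction

print(f"Implied total parameters: {implied_total / 1e9:.2f}B")
```

The implied total of roughly 1.58 billion is in line with the "~1.5B" figure quoted above.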
## 🛡️ Post-SFT Evaluation Results

The model was evaluated on its ability to classify prompts as `SAFE` (ALLOW) or `UNSAFE` (BLOCK). On 150 held-out evaluation samples it reached **90.00% accuracy**, with a precision of 1.0 on the `UNSAFE` class.
### Core Metrics

Precision, recall, and F1 below refer to the `UNSAFE` class.

* **Accuracy:** 0.9000 (90.00%)
* **Precision:** 1.0000
* **Recall:** 0.7917
* **F1 Score:** 0.8837
* **Average Confidence:** 0.879
### Classification Report

| Class | Precision | Recall | F1-Score | Support |
| :--- | :---: | :---: | :---: | :---: |
| **SAFE** | 0.8387 | 1.0000 | 0.9123 | 78 |
| **UNSAFE** | 1.0000 | 0.7917 | 0.8837 | 72 |
| *Macro Avg* | *0.9194* | *0.8958* | *0.8980* | *150* |
| *Weighted Avg* | *0.9161* | *0.9000* | *0.8986* | *150* |
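The two average rows follow from the per-class rows: the macro average weights both classes equally, while the weighted average weights each class by its support. A minimal sketch of that calculation, using the rounded per-class values from the table (the dictionary layout and function names are illustrative, not part of the original evaluation code):

```python
# Per-class rows copied from the classification report (already rounded).
rows = {
    "SAFE": {"precision": 0.8387, "recall": 1.0000, "f1": 0.9123, "support": 78},
    "UNSAFE": {"precision": 1.0000, "recall": 0.7917, "f1": 0.8837, "support": 72},
}
total_support = sum(r["support"] for r in rows.values())


def macro_avg(metric):
    """Unweighted mean of a metric across classes."""
    return sum(r[metric] for r in rows.values()) / len(rows)


def weighted_avg(metric):
    """Mean of a metric across classes, weighted by class support."""
    return sum(r[metric] * r["support"] for r in rows.values()) / total_support


for metric in ("precision", "recall", "f1"):
    print(f"{metric}: macro={macro_avg(metric):.4f} weighted={weighted_avg(metric):.4f}")
```

Because the inputs are already rounded to four decimals, the last digit can differ slightly from the table.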
### Confusion Matrix

| | Predicted: ALLOW | Predicted: BLOCK |
| :--- | :---: | :---: |
| **True: SAFE** | 78 | 0 |
| **True: UNSAFE** | 15 | 57 |
*Note: The model produced no false positives in this evaluation set: 0 of 78 safe prompts were blocked, hence the `UNSAFE` precision of 1.0. It did, however, allow 15 of the 72 unsafe prompts, a false negative rate of 20.83% (recall 0.7917).*
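The core metrics can be reproduced from the four confusion-matrix cells, treating `UNSAFE` as the positive class. The sketch below uses the standard definitions; the variable names are illustrative.

```python
# Confusion-matrix cells, with UNSAFE as the positive class.
tn, fp = 78, 0    # true SAFE:   predicted ALLOW, predicted BLOCK
fn, tp = 15, 57   # true UNSAFE: predicted ALLOW, predicted BLOCK

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)            # of all BLOCK decisions, how many were truly unsafe
recall = tp / (tp + fn)               # of all unsafe prompts, how many were blocked
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # safe prompts wrongly blocked
false_negative_rate = fn / (fn + tp)  # unsafe prompts wrongly allowed

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"FPR={false_positive_rate:.4f} FNR={false_negative_rate:.4f}")
```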
## Training Procedure & Hyperparameters

The model was trained on 2,316 examples, with the loss computed on response generation only. Masking was verified before training so that gradient updates apply only to assistant responses, which also prevents NaN loss.
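To illustrate the idea of response-only masking, here is a minimal pure-Python sketch. It is not Unsloth's `train_on_responses_only` implementation: the per-token role tags, the toy token ids, and the helper name `mask_non_assistant` are all hypothetical. It relies only on the common convention that label `-100` is skipped by PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def mask_non_assistant(token_ids, roles):
    """Return labels in which only assistant tokens keep their ids (hypothetical helper)."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy example: hypothetical token ids with one role tag per token.
token_ids = [101, 7, 8, 9, 10, 11, 12, 13, 42, 43]
roles = ["system"] * 2 + ["user"] * 6 + ["assistant"] * 2

labels = mask_non_assistant(token_ids, roles)
active = sum(label != IGNORE_INDEX for label in labels)

# An example with no active labels contributes no loss terms; averaging over
# zero tokens is one way a NaN loss can appear, so check for it up front.
if active == 0:
    raise ValueError("no assistant tokens left after masking")

print(f"masked: {1 - active / len(labels):.1%}, active: {active / len(labels):.1%}")
```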
* **Token Masking:** `train_on_responses_only` confirmed (91.1% of tokens are masked system/user tokens, 8.9% are active assistant tokens).
* **Epochs:** 3
* **Total Steps:** 435
* **Batch Size per Device:** 4
* **Gradient Accumulation Steps:** 4
* **Total Batch Size:** 16 (4 per device × 4 accumulation steps)
* **NEFTune Noise Alpha:** 5.0
* **Gradient Clipping:** 1.0
* **Total Training Runtime:** ~35.4 minutes
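The step count can be cross-checked from the other hyperparameters. The sketch below assumes a single device (implied by 4 × 4 = 16, but not stated explicitly) and that the last partial batch of each epoch counts as a full optimizer step; under those assumptions it reproduces the 435 total steps.

```python
import math

# Values from the hyperparameter list.
num_examples = 2316
per_device_batch = 4
grad_accum_steps = 4
epochs = 3
runtime_minutes = 35.4

num_devices = 1  # assumption: implied by the total batch size of 16

effective_batch = per_device_batch * grad_accum_steps * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)  # last partial batch rounds up
total_steps = steps_per_epoch * epochs
seconds_per_step = runtime_minutes * 60 / total_steps

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"average time per step: {seconds_per_step:.2f} s")
```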
### Training Loss Progression

| Step | Training Loss | Validation Loss |
| :---: | :---: | :---: |
| **50** | 0.6295 | 0.5256 |
| **100** | 0.6155 | 0.5327 |
| **150** | 0.4268 | 0.5315 |
| **200** | 0.3806 | 0.5336 |
| **250** | 0.3786 | 0.5238 |
| **300** | 0.2329 | 0.5357 |
| **350** | 0.2043 | 0.5740 |
| **400** | 0.2016 | 0.5744 |
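* **Final Training Loss:** `0.4178` (higher than the 0.2016 logged at step 400, so this is presumably the loss averaged over the whole run rather than the value at the last step)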
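In the logged values, validation loss is lowest at step 250 and rises afterwards while training loss keeps falling, which may indicate overfitting in the later steps. A small sketch for locating the lowest-validation-loss step and the train/validation gap from the table (the tuple layout is illustrative; whether a checkpoint was saved at that step is not stated):

```python
# (step, training loss, validation loss) copied from the table above.
log = [
    (50, 0.6295, 0.5256),
    (100, 0.6155, 0.5327),
    (150, 0.4268, 0.5315),
    (200, 0.3806, 0.5336),
    (250, 0.3786, 0.5238),
    (300, 0.2329, 0.5357),
    (350, 0.2043, 0.5740),
    (400, 0.2016, 0.5744),
]

# Step with the lowest validation loss.
best_step, _, best_val = min(log, key=lambda row: row[2])

# Gap between validation and training loss at each logged step.
gaps = {step: round(val - train, 4) for step, train, val in log}

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"validation minus training loss at step 400: {gaps[400]}")
```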
## 💻 Framework Versions

* PEFT
* Transformers
* Unsloth
* Safetensors
* PyTorch
## Usage

The model can be loaded with the standard `transformers` library or served with `text-generation-inference`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "coliseum034/coliseum-defender-sft"
# Download (or load from the local cache) the tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Instruction for the defender; append the text to be evaluated after the colon.
prompt = "Evaluate the following input for malicious intent or authorization bypass attempts:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode the full sequence (prompt plus generated tokens).
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
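Downstream code usually needs an ALLOW or BLOCK decision rather than free text. The helper below is a hypothetical sketch: it assumes the generated text contains the literal label `SAFE` or `UNSAFE`, which this card does not confirm, and it chooses to fail closed (BLOCK) when neither label is found. The function name `verdict_to_action` is illustrative.

```python
import re


def verdict_to_action(generated_text, default="BLOCK"):
    """Map generated text to ALLOW/BLOCK (hypothetical helper; output format is assumed)."""
    text = generated_text.upper()
    # Check UNSAFE first; the word boundary keeps SAFE from matching inside UNSAFE.
    if re.search(r"\bUNSAFE\b", text):
        return "BLOCK"
    if re.search(r"\bSAFE\b", text):
        return "ALLOW"
    # Assumption: treat unparseable output as unsafe (fail closed).
    return default


print(verdict_to_action("UNSAFE: possible authorization bypass"))
print(verdict_to_action("Verdict: safe"))
print(verdict_to_action("I am not sure."))
```

Failing closed trades some availability for safety; a deployment that prefers to let ambiguous cases through could pass `default="ALLOW"` instead.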