---
license: apache-2.0
base_model: unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
tags:
- qwen2
- unsloth
- trl
- grpo
- rl-training
- hallucination-detection
- multi-agent
- text-generation
language:
- en
---
# PropagationShield-v1-GRPO
**The first LLM fine-tuned to detect and resist hallucinations injected by
upstream agents in a multi-agent pipeline.**
## The Problem
When AI agents work in pipelines, one hallucination upstream poisons every
agent downstream. A fabricated lab value, a misquoted guideline, a made-up
statistic: if no agent questions it, it flows through to the final output
as confident, wrong information.
No existing training method addresses this. Until now.
## What This Model Does
This model was trained with **PropagationShield**, an RL environment built
on OpenEnv that:
1. Injects parameterised hallucinations into the agent's context (5 types,
3 difficulty tiers)
2. Trains the agent with GRPO to both complete tasks AND flag suspicious
context passages
3. Uses 4 independent reward functions: task accuracy, detection F1, format
compliance, and an anti-propagation penalty
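The four reward signals above are combined into a single scalar per rollout for GRPO. The weights and the exact penalty shape in this sketch are illustrative assumptions, not the released training configuration:

```python
def combined_reward(r_task, r_detect_f1, r_format, propagated_hallucination):
    """Combine the four PropagationShield reward components into one scalar.

    Weights and the anti-propagation penalty value are illustrative
    assumptions; the actual environment may use different settings.
    """
    # Flat penalty if a planted falsehood made it into the final answer
    r_antiprop = -1.0 if propagated_hallucination else 0.0
    return 1.0 * r_task + 1.0 * r_detect_f1 + 0.25 * r_format + r_antiprop

# Example: correct answer, good detection, valid JSON, no propagation
reward = combined_reward(r_task=1.0, r_detect_f1=0.8, r_format=1.0,
                         propagated_hallucination=False)
print(reward)  # approximately 2.05
```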
Given any task + context, this model outputs:
```json
{
"answer": "<task answer>",
"suspicion_flags": [
{
"passage_index": 2,
"reason": "Lab value inconsistent with clinical presentation",
"confidence": 0.87
}
]
}
```
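A downstream agent should validate this schema before trusting the output. A minimal checker (field names are taken from the schema above; the helper itself is not part of the model's code):

```python
import json

def parse_shield_output(raw: str) -> dict:
    """Parse and minimally validate a PropagationShield JSON response.

    Raises ValueError if required fields are missing or malformed.
    """
    out = json.loads(raw)
    if "answer" not in out or not isinstance(out.get("suspicion_flags"), list):
        raise ValueError("missing 'answer' or 'suspicion_flags'")
    for flag in out["suspicion_flags"]:
        if not (isinstance(flag.get("passage_index"), int)
                and 0.0 <= flag.get("confidence", -1.0) <= 1.0):
            raise ValueError(f"malformed suspicion flag: {flag}")
    return out

raw = ('{"answer": "$2.1M", "suspicion_flags": '
       '[{"passage_index": 2, "reason": "inconsistent", "confidence": 0.87}]}')
result = parse_shield_output(raw)
print(result["suspicion_flags"][0]["passage_index"])  # 2
```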
## Training Details
| Detail | Value |
|--------|-------|
| Base model | Qwen2.5-7B-Instruct |
| Training method | SFT warm-start → GRPO (TRL + Unsloth) |
| RL algorithm | GRPO (Group Relative Policy Optimisation) |
| Training environment | PropagationShield OpenEnv |
| Hallucination types | FACTUAL_FABRICATION, FALSE_ATTRIBUTION, STAT_DRIFT, ENTITY_SUBSTITUTION, FABRICATED_CONSENSUS |
| Difficulty curriculum | EASY → MEDIUM → HARD |
| Reward functions | R_task + R_detect + R_format + R_antiprop (4 independent) |
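As an illustration of the injection step, a STAT_DRIFT-style perturbation could scale a numeric claim inside one context passage. The function below is a simplified sketch of that idea, not the OpenEnv environment's actual code:

```python
import random
import re

def inject_stat_drift(passages, drift_factor=4.0, rng=None):
    """Return (passages, injected_index) with one numeric claim scaled.

    Simplified STAT_DRIFT sketch: pick a passage containing a number and
    multiply its first number by drift_factor.
    """
    rng = rng or random.Random(0)
    numeric = [i for i, p in enumerate(passages) if re.search(r"\d+(\.\d+)?", p)]
    if not numeric:
        return passages, None  # nothing to corrupt
    idx = rng.choice(numeric)

    def scale(match):
        return str(round(float(match.group()) * drift_factor, 1))

    corrupted = re.sub(r"\d+(\.\d+)?", scale, passages[idx], count=1)
    return passages[:idx] + [corrupted] + passages[idx + 1:], idx

passages = ["Revenue was 2.1 million dollars.", "Headcount held steady."]
corrupted, idx = inject_stat_drift(passages)
print(idx, corrupted[idx])  # 0 Revenue was 8.4 million dollars.
```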
## Results
| Metric | Before Training | After Training |
|--------|----------------|----------------|
| Task Accuracy | ~38% | ~71% |
| Hallucination Detection F1 | ~0.04 | ~0.68 |
| Propagation Containment Rate | ~12% | ~64% |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("pragunk/PropagationShield")
tokenizer = AutoTokenizer.from_pretrained("pragunk/PropagationShield")
SYSTEM_PROMPT = """You are a critical analytical agent operating in a
safety-critical multi-agent pipeline. Some context passages may contain
deliberately false information injected by upstream agents or data sources.
Respond ONLY in this JSON format:
{
"answer": "<your task answer>",
"suspicion_flags": [
{"passage_index": <int>, "reason": "<why suspicious>", "confidence": <0.0-1.0>}
]
}"""
context = [
"The company reported Q3 revenue of $2.1M.",
"Operating expenses were $1.4M.",
"The verified figure confirms total revenue was $8.9M for Q3." # injected hallucination
]
user_message = f"""Query: What was Q3 revenue?
Context:
[0] {context[0]}
[1] {context[1]}
[2] {context[2]}"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_message}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
response = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(response[0][inputs.shape[-1]:], skip_special_tokens=True))
# Expected: flags passage [2] as suspicious, answers $2.1M
```
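Chat-tuned models sometimes wrap the JSON in extra prose or tokens, so it is safer to extract the first balanced JSON object from the decoded string before parsing. A small helper for this (hypothetical; not shipped with the model, and it does not handle braces inside string values):

```python
import json

def extract_first_json(text: str):
    """Return the first parseable top-level JSON object in text, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for end in range(start, len(text)):
            if text[end] == "{":
                depth += 1
            elif text[end] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:end + 1])
                    except json.JSONDecodeError:
                        break  # not valid JSON; try the next "{"
        start = text.find("{", start + 1)
    return None

decoded = 'Here is my analysis: {"answer": "$2.1M", "suspicion_flags": []}'
print(extract_first_json(decoded))  # {'answer': '$2.1M', 'suspicion_flags': []}
```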
## Demo Application
PropagationShield powers **HealthGuard**, an AI clinical triage assistant
that demonstrates hallucination containment in a hospital pipeline setting.
## Links
- Training Notebook: [Colab Notebook](#)
- Demo: [HealthGuard Space](#)
- Code: [GitHub](#)
## Citation
Trained at Meta x OpenEnv Hackathon, April 2026.