PropagationShield-v1-GRPO
The first LLM fine-tuned to detect and resist hallucinations injected by upstream agents in a multi-agent pipeline.
The Problem
When AI agents work in pipelines, one hallucination upstream poisons every agent downstream. A fabricated lab value, a misquoted guideline, a made-up statistic: if no agent questions it, it flows through to the final output as confident, wrong information.
No existing training method addresses this. Until now.
What This Model Does
This model was trained with PropagationShield, an RL environment built on OpenEnv that:
- Injects parameterised hallucinations into the agent's context (5 types, 3 difficulty tiers)
- Trains the agent with GRPO to both complete tasks AND flag suspicious context passages
- Uses 4 independent reward functions: task accuracy, detection F1, format compliance, and an anti-propagation penalty (see the sketch below)
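To make the reward structure concrete, here is a minimal per-episode sketch of how the four terms could combine. The simple sum mirrors the R_task + R_detect + R_format + R_antiprop formula in the training table below, but the scorer signature and penalty magnitude are our illustrative assumptions, not the environment's actual API:

```python
# Illustrative sketch only: the signature and penalty magnitude are assumptions.
def propagation_shield_reward(answer_correct: bool,
                              detect_f1: float,
                              format_ok: bool,
                              propagated: bool) -> float:
    r_task = 1.0 if answer_correct else 0.0   # R_task: task accuracy
    r_detect = detect_f1                      # R_detect: detection F1 in [0, 1]
    r_format = 1.0 if format_ok else 0.0      # R_format: valid JSON schema
    r_antiprop = -1.0 if propagated else 0.0  # R_antiprop: penalty when a fake fact reaches the answer
    return r_task + r_detect + r_format + r_antiprop
```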
Given any task + context, this model outputs:
{
"answer": "<task answer>",
"suspicion_flags": [
{
"passage_index": 2,
"reason": "Lab value inconsistent with clinical presentation",
"confidence": 0.87
}
]
}
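Because the reply is plain JSON, downstream agents can consume the flags programmatically. A minimal parsing sketch, assuming the reply parses cleanly (the helper name is ours, not part of the model):

```python
import json

def parse_shield_output(raw: str) -> tuple[str, list[dict]]:
    """Split the model's JSON reply into (answer, suspicion_flags)."""
    out = json.loads(raw)
    # Each flag carries passage_index (int), reason (str), confidence (float in [0, 1]).
    return out["answer"], out.get("suspicion_flags", [])
```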
Training Details
| Detail | Value |
|---|---|
| Base model | Qwen2.5-7B-Instruct |
| Training method | SFT warm-start → GRPO (TRL + Unsloth; sketch below) |
| RL algorithm | GRPO (Group Relative Policy Optimisation) |
| Training environment | PropagationShield OpenEnv |
| Hallucination types | FACTUAL_FABRICATION, FALSE_ATTRIBUTION, STAT_DRIFT, ENTITY_SUBSTITUTION, FABRICATED_CONSENSUS |
| Difficulty curriculum | EASY → MEDIUM → HARD |
| Reward functions | R_task + R_detect + R_format + R_antiprop (4 independent) |
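As a rough sketch of how the GRPO stage from the table wires together in TRL: the dataset and three of the four reward callables below are placeholders, r_format shows the signature TRL expects, and the actual training script is in the linked notebook.

```python
import json
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# R_format made concrete: +1 if the completion is valid JSON with the expected
# keys, else 0. TRL reward signature: fn(completions, **kwargs) -> list[float].
def r_format(completions, **kwargs):
    scores = []
    for c in completions:
        try:
            out = json.loads(c)
            scores.append(float("answer" in out and "suspicion_flags" in out))
        except json.JSONDecodeError:
            scores.append(0.0)
    return scores

# Placeholder stubs: the real task-accuracy, detection-F1, and anti-propagation
# scorers come from the PropagationShield environment.
def r_task(completions, **kwargs):     return [0.0] * len(completions)
def r_detect(completions, **kwargs):   return [0.0] * len(completions)
def r_antiprop(completions, **kwargs): return [0.0] * len(completions)

train_dataset = Dataset.from_dict({"prompt": ["Query: ...\nContext:\n[0] ..."]})  # placeholder

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # base model from the table above
    reward_funcs=[r_task, r_detect, r_format, r_antiprop],
    args=GRPOConfig(output_dir="ps-grpo", num_generations=8, max_completion_length=512),
    train_dataset=train_dataset,
)
trainer.train()
```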
Results
| Metric | Before Training | After Training |
|---|---|---|
| Task Accuracy | ~38% | ~71% |
| Hallucination Detection F1 | ~0.04 | ~0.68 |
| Propagation Containment Rate | ~12% | ~64% |
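The containment metric is non-standard, so as a reading aid, here is one plausible formalisation (our interpretation; the environment's exact definition may differ): the fraction of poisoned episodes where the injected fact is flagged and does not surface in the final answer.

```python
def containment_rate(episodes: list[dict]) -> float:
    """Assumed per-episode fields: injected (bool, a hallucination was planted),
    flagged (bool, the model flagged it), propagated (bool, it reached the answer)."""
    poisoned = [e for e in episodes if e["injected"]]
    contained = [e for e in poisoned if e["flagged"] and not e["propagated"]]
    return len(contained) / len(poisoned) if poisoned else 1.0
```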
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# torch_dtype/device_map load the 7B weights in their native precision on available devices
model = AutoModelForCausalLM.from_pretrained(
    "pragunk/PropagationShield", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("pragunk/PropagationShield")
SYSTEM_PROMPT = """You are a critical analytical agent operating in a
safety-critical multi-agent pipeline. Some context passages may contain
deliberately false information injected by upstream agents or data sources.
Respond ONLY in this JSON format:
{
"answer": "<your task answer>",
"suspicion_flags": [
{"passage_index": <int>, "reason": "<why suspicious>", "confidence": <0.0-1.0>}
]
}"""
context = [
"The company reported Q3 revenue of $2.1M.",
"Operating expenses were $1.4M.",
"The verified figure confirms total revenue was $8.9M for Q3." # injected hallucination
]
user_message = f"""Query: What was Q3 revenue?
Context:
[0] {context[0]}
[1] {context[1]}
[2] {context[2]}"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_message}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
response = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(response[0][inputs.shape[-1]:], skip_special_tokens=True))
# Expected: flags passage [2] as suspicious, answers $2.1M
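To actually contain a flagged hallucination, a downstream step can filter the context before handing it to the next agent. A sketch building on the snippet above and the `parse_shield_output` helper from earlier (the 0.5 threshold is illustrative):

```python
raw_reply = tokenizer.decode(response[0][inputs.shape[-1]:], skip_special_tokens=True)
answer, flags = parse_shield_output(raw_reply)  # helper sketched earlier in this card

# Drop passages flagged above the confidence threshold before the context
# moves downstream, so the fabricated $8.9M figure never propagates.
suspect = {f["passage_index"] for f in flags if f["confidence"] >= 0.5}
clean_context = [p for i, p in enumerate(context) if i not in suspect]
```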
Demo Application
PropagationShield powers HealthGuard, an AI clinical triage assistant that demonstrates hallucination containment in a hospital pipeline setting.
Links
- Training Notebook: Colab Notebook
- Demo: HealthGuard Space
- Code: GitHub
Citation
Trained at Meta x OpenEnv Hackathon, April 2026.