---
license: apache-2.0
base_model: unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
tags:
- qwen2
- unsloth
- trl
- grpo
- rl-training
- hallucination-detection
- multi-agent
- text-generation
language:
- en
---
# PropagationShield-v1-GRPO

**The first LLM fine-tuned to detect and resist hallucinations injected by upstream agents in a multi-agent pipeline.**

## The Problem

When AI agents work in pipelines, one hallucination upstream poisons every agent downstream. A fabricated lab value, a misquoted guideline, a made-up statistic: if no agent questions it, it flows through to the final output as confident, wrong information.

No existing training method addresses this. Until now.
## What This Model Does

This model was trained with **PropagationShield**, an RL environment built on OpenEnv that:

1. Injects parameterised hallucinations into the agent's context (5 types, 3 difficulty tiers)
2. Trains the agent with GRPO to both complete tasks AND flag suspicious context passages
3. Uses 4 independent reward functions: task accuracy, detection F1, format compliance, and an anti-propagation penalty
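The four reward terms can be combined into a single scalar per rollout. A minimal sketch of that combination (the weights, function name, and reward scales below are illustrative assumptions, not the actual PropagationShield reward code):

```python
# Hypothetical sketch of combining the four reward terms listed above.
# Weights and the scale of each component are assumptions for illustration.

def total_reward(r_task: float, r_detect: float, r_format: float,
                 r_antiprop: float,
                 weights: tuple = (1.0, 1.0, 0.5, 1.0)) -> float:
    """Weighted sum of task accuracy, detection F1, format compliance,
    and the anti-propagation penalty (r_antiprop <= 0 whenever an
    injected hallucination flows through unflagged)."""
    w_task, w_detect, w_format, w_antiprop = weights
    return (w_task * r_task + w_detect * r_detect
            + w_format * r_format + w_antiprop * r_antiprop)

# Example: correct answer, decent detection, valid JSON, no propagation.
print(total_reward(1.0, 0.8, 1.0, 0.0))  # ≈ 2.3
```

Keeping the terms independent (rather than folding detection into task reward) lets the curriculum trade off answer quality against detection without reshaping a single monolithic signal.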
Given any task + context, this model outputs:

```json
{
  "answer": "<task answer>",
  "suspicion_flags": [
    {
      "passage_index": 2,
      "reason": "Lab value inconsistent with clinical presentation",
      "confidence": 0.87
    }
  ]
}
```
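Downstream consumers should validate this output before trusting it. A small sketch of schema checking (the field names follow the schema above; the `parse_shield_output` helper itself is an assumption, not part of the released model):

```python
import json

def parse_shield_output(raw: str) -> dict:
    """Parse and minimally validate a response against the JSON
    schema shown above; raise ValueError on malformed output."""
    data = json.loads(raw)
    if "answer" not in data or "suspicion_flags" not in data:
        raise ValueError("missing required fields")
    for flag in data["suspicion_flags"]:
        idx, conf = flag["passage_index"], flag["confidence"]
        if not isinstance(idx, int) or not 0.0 <= conf <= 1.0:
            raise ValueError(f"malformed flag: {flag}")
    return data

raw = ('{"answer": "<task answer>", "suspicion_flags": '
       '[{"passage_index": 2, "reason": "Lab value inconsistent '
       'with clinical presentation", "confidence": 0.87}]}')
out = parse_shield_output(raw)
print(out["suspicion_flags"][0]["passage_index"])  # 2
```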
## Training Details

| Detail | Value |
|--------|-------|
| Base model | Qwen2.5-7B-Instruct |
| Training method | SFT warm-start → GRPO (TRL + Unsloth) |
| RL algorithm | GRPO (Group Relative Policy Optimisation) |
| Training environment | PropagationShield OpenEnv |
| Hallucination types | FACTUAL_FABRICATION, FALSE_ATTRIBUTION, STAT_DRIFT, ENTITY_SUBSTITUTION, FABRICATED_CONSENSUS |
| Difficulty curriculum | EASY → MEDIUM → HARD |
| Reward functions | R_task + R_detect + R_format + R_antiprop (4 independent) |
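For context on the RL algorithm row: GRPO standardises each sampled completion's reward against the other completions drawn for the same prompt, which removes the need for a separate value network. A minimal sketch of the group-relative advantage (this is the standard GRPO formulation, not code from this repository):

```python
from statistics import mean, stdev

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages: mean-centre each completion's
    reward and divide by the group's standard deviation. Requires
    at least two completions per group."""
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) or 1.0  # guard against all-equal groups
    return [(r - mu) / sigma for r in group_rewards]

# Four completions for one prompt; higher-reward samples get
# positive advantage, lower-reward ones negative.
print(grpo_advantages([2.3, 1.1, 0.4, 1.4]))
```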
## Results

| Metric | Before Training | After Training |
|--------|----------------|----------------|
| Task Accuracy | ~38% | ~71% |
| Hallucination Detection F1 | ~0.04 | ~0.68 |
| Propagation Containment Rate | ~12% | ~64% |
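The detection F1 row is computable from the passage indices the model flags versus the indices where hallucinations were actually injected. A hedged sketch (the F1 definition is standard; the exact evaluation script is not part of this card):

```python
def detection_f1(flagged: set[int], injected: set[int]) -> float:
    """F1 between the passage indices the model flagged and the
    indices where hallucinations were actually injected."""
    tp = len(flagged & injected)
    if tp == 0:
        return 0.0
    precision = tp / len(flagged)
    recall = tp / len(injected)
    return 2 * precision * recall / (precision + recall)

print(detection_f1({2}, {2}))     # 1.0 (exact hit)
print(detection_f1({1, 2}, {2}))  # one false positive lowers precision
```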
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("pragunk/PropagationShield")
tokenizer = AutoTokenizer.from_pretrained("pragunk/PropagationShield")

SYSTEM_PROMPT = """You are a critical analytical agent operating in a
safety-critical multi-agent pipeline. Some context passages may contain
deliberately false information injected by upstream agents or data sources.

Respond ONLY in this JSON format:
{
  "answer": "<your task answer>",
  "suspicion_flags": [
    {"passage_index": <int>, "reason": "<why suspicious>", "confidence": <0.0-1.0>}
  ]
}"""

context = [
    "The company reported Q3 revenue of $2.1M.",
    "Operating expenses were $1.4M.",
    "The verified figure confirms total revenue was $8.9M for Q3.",  # injected hallucination
]

user_message = f"""Query: What was Q3 revenue?

Context:
[0] {context[0]}
[1] {context[1]}
[2] {context[2]}"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_message},
]

# Build the prompt with the generation prompt appended, move it to the
# model's device, and decode only the newly generated tokens.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
response = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(response[0][inputs.shape[-1]:], skip_special_tokens=True))
# Expected: flags passage [2] as suspicious, answers $2.1M
```
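In a pipeline setting, the flags can be acted on before the context is handed to the next agent. A sketch of one possible containment policy (the `contain` helper and the 0.5 threshold are illustrative assumptions, not part of the released model):

```python
import json

def contain(response_json: str, context: list[str],
            threshold: float = 0.5) -> list[str]:
    """Drop passages the model flagged with confidence >= threshold
    so they never propagate to the next agent in the pipeline."""
    out = json.loads(response_json)
    suspect = {f["passage_index"] for f in out["suspicion_flags"]
               if f["confidence"] >= threshold}
    return [p for i, p in enumerate(context) if i not in suspect]

context = [
    "The company reported Q3 revenue of $2.1M.",
    "Operating expenses were $1.4M.",
    "The verified figure confirms total revenue was $8.9M for Q3.",
]
resp = ('{"answer": "$2.1M", "suspicion_flags": '
        '[{"passage_index": 2, "reason": "contradicts passage 0", '
        '"confidence": 0.87}]}')
print(contain(resp, context))  # passage [2] dropped before handoff
```

Raising the threshold trades containment for recall of legitimate passages; the Propagation Containment Rate reported above measures how often injected content fails to reach the final output.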
## Demo Application

PropagationShield powers **HealthGuard**, an AI clinical triage assistant that demonstrates hallucination containment in a hospital pipeline setting.
## Links

- Training Notebook: [Colab Notebook](#)
- Demo: [HealthGuard Space](#)
- Code: [GitHub](#)
## Citation

Trained at the Meta x OpenEnv Hackathon, April 2026.