Add detailed model card
README.md
CHANGED
|
@@ -1,21 +1,162 @@
|
|
| 1 |
---
|
| 2 |
-
|
| 3 |
-
tags:
|
| 4 |
-
- text-generation-inference
|
| 5 |
-
- transformers
|
| 6 |
-
- unsloth
|
| 7 |
-
- qwen2
|
| 8 |
license: apache-2.0
|
| 9 |
language:
|
| 10 |
-
- en
|
| 11 |
---
|
| 12 |
|
| 13 |
-
#
|
| 14 |
|
| 15 |
-
|
| 16 |
-
- **License:** apache-2.0
|
| 17 |
-
- **Finetuned from model :** unsloth/Qwen2.5-7B-instruct
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
| 1 |
---
|
| 2 |
+
library_name: transformers
|
| 3 |
license: apache-2.0
|
| 4 |
+
base_model: unsloth/Qwen2.5-7B-Instruct
|
| 5 |
+
pipeline_tag: text-generation
|
| 6 |
language:
|
| 7 |
+
- en
|
| 8 |
+
tags:
|
| 9 |
+
- fact-verification
|
| 10 |
+
- claim-verification
|
| 11 |
+
- reasoning
|
| 12 |
+
- grpo
|
| 13 |
+
- lora
|
| 14 |
+
- decomposition
|
| 15 |
+
- qwen2
|
| 16 |
---
|
| 17 |
|
| 18 |
+
# DecomposeRL-7B
|
| 19 |
+
|
| 20 |
+
**DecomposeRL-7B** is a fact-verification model that learns to decompose a claim into atomic sub-questions, answer them iteratively from an evidence document, and produce a final `Supported` / `Refuted` judgment. It is trained from `Qwen2.5-7B-Instruct` with **GRPO + LoRA**, using rewards for format validity, final-label correctness, and the necessity, diversity, and coverage of the generated sub-questions.
|
| 21 |
+
|
| 22 |
+
## Highlights
|
| 23 |
+
|
| 24 |
+
- **84.5% micro-average balanced accuracy** across 9 in-domain claim-verification benchmarks (sample-weighted)
|
| 25 |
+
- **84.2% macro-average balanced accuracy** across the same 9 benchmarks
|
| 26 |
+
- Strong on long-form evidence: 87% on Ex-FEVER, 92% on FEVEROUS, 76% on HoVer
|
| 27 |
+
- Reasoning is **fully transparent** — the model emits its sub-claim checklist, every question it asked, every quote from evidence, and a final label
|
| 28 |
+
|
| 29 |
+
## Model Overview
|
| 30 |
+
|
| 31 |
+
| Property | Value |
|
| 32 |
+
|----------|-------|
|
| 33 |
+
| **Model Type** | Causal Language Model |
|
| 34 |
+
| **Base Model** | unsloth/Qwen2.5-7B-Instruct |
|
| 35 |
+
| **Parameters** | 7B |
|
| 36 |
+
| **Training** | GRPO + LoRA (r=64, α=128) |
|
| 37 |
+
| **LoRA Targets** | q, k, v, o, gate, up, down projections |
|
| 38 |
+
| **Context Length** | 4,096 tokens |
|
| 39 |
+
| **Language** | English |
|
| 40 |
+
|
| 41 |
+
## Method
|
| 42 |
+
|
| 43 |
+
DecomposeRL trains the policy to follow a **decompose-question-answer-verify** loop:
|
| 44 |
+
|
| 45 |
+
1. **Initial analysis** (`<think>`): identify atomic sub-claims, classify them (entity / relational / quantitative / causal / temporal / comparative), and flag independently falsifiable sub-claims.
|
| 46 |
+
2. **Iterative QA cycle** (`<question>` → `<answer>`): for each sub-claim or ambiguity, ask a single targeted question and answer it **only** from the evidence document, quoting passages directly (or saying *"I don't know"* if the evidence is silent).
|
| 47 |
+
3. **Sufficiency check** (`<think>`): track which sub-claims are resolved; loop until every sub-claim is addressed.
|
| 48 |
+
4. **Final verdict** (`<verification>`): `Supported` or `Refuted`.
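
Because every step of the loop is wrapped in a tag, a finished trajectory can be split back into its parts with a few regular expressions. The following is a minimal sketch, not code from the source repo: `parse_trajectory` and the `sample` text are hypothetical, and it assumes each `<question>` is immediately followed by its `<answer>` and that tags are never nested.

```python
import re

def parse_trajectory(text):
    """Split a tagged response into thoughts, (question, answer) pairs, and a verdict."""
    thoughts = [t.strip() for t in re.findall(r"<think>(.*?)</think>", text, re.DOTALL)]
    qa_pairs = [
        (q.strip(), a.strip())
        for q, a in re.findall(
            r"<question>(.*?)</question>\s*<answer>(.*?)</answer>", text, re.DOTALL
        )
    ]
    verdicts = re.findall(r"<verification>\s*(Supported|Refuted)\s*</verification>", text)
    return {
        "thoughts": thoughts,
        "qa_pairs": qa_pairs,
        "verdict": verdicts[-1] if verdicts else None,  # last verdict wins
    }

# Hypothetical trajectory, written by hand for illustration only.
sample = (
    "<think>Two sub-claims: completion year; height.</think>"
    "<question>When was the tower built?</question>"
    "<answer>\"built the tower from 1887 to 1889\"</answer>"
    "<think>Year sub-claim resolved.</think>"
    "<verification>Refuted</verification>"
)
parsed = parse_trajectory(sample)
print(parsed["qa_pairs"], parsed["verdict"])
```

Keeping the question/answer pairs alongside the verdict makes it possible to audit which quoted passage led to the final label.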
|
| 49 |
+
|
| 50 |
+
### Training Rewards
|
| 51 |
+
|
| 52 |
+
GRPO optimizes a composite reward computed over each generated trajectory:
|
| 53 |
+
|
| 54 |
+
- **Format reward** — well-formed `<think>`/`<question>`/`<answer>`/`<verification>` structure
|
| 55 |
+
- **Verification reward** — correct final label
|
| 56 |
+
- **Necessity reward** — generated sub-questions are necessary to verify the claim
|
| 57 |
+
- **Diversity reward** — sub-questions cover distinct aspects (MMR-based)
|
| 58 |
+
- **Coverage reward** — sub-questions jointly cover the claim
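
The card does not spell out how the MMR-based diversity term is computed. As an illustration of the general idea behind Maximal Marginal Relevance only, the sketch below scores each sub-question by its relevance to the claim minus its similarity to earlier questions. Everything here is an assumption: the function names, the token-set Jaccard similarity, and the weight `lam` are hypothetical stand-ins, not the training code.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two strings (0.0 to 1.0)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mmr_scores(claim, questions, lam=0.5):
    """MMR-style score per question:
    lam * relevance to the claim - (1 - lam) * max similarity to earlier questions."""
    scores = []
    for i, q in enumerate(questions):
        relevance = jaccard(q, claim)
        redundancy = max((jaccard(q, prev) for prev in questions[:i]), default=0.0)
        scores.append(lam * relevance - (1 - lam) * redundancy)
    return scores

claim = "The Eiffel Tower was completed in 1887 and stands 330 metres tall."
questions = [
    "When was the Eiffel Tower completed?",
    "How tall is the Eiffel Tower?",
    "When was the Eiffel Tower completed?",  # exact repeat of the first question
]
scores = mmr_scores(claim, questions)
print(scores)
```

A repeated question keeps its relevance but picks up the maximum redundancy penalty, so it scores lower than its first occurrence. A real implementation would more likely use embedding similarity than token overlap.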
|
| 59 |
+
|
| 60 |
+
## Quickstart
|
| 61 |
+
|
| 62 |
+
```python
|
| 63 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 64 |
+
|
| 65 |
+
model_name = "dipta007/decomposeRL-7b"
|
| 66 |
+
|
| 67 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 68 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 69 |
+
model_name,
|
| 70 |
+
torch_dtype="auto",
|
| 71 |
+
device_map="auto",
|
| 72 |
+
)
|
| 73 |
+
|
| 74 |
+
evidence_doc = (
|
| 75 |
+
"The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, "
|
| 76 |
+
"France. It is named after the engineer Gustave Eiffel, whose company designed and "
|
| 77 |
+
"built the tower from 1887 to 1889. Locally nicknamed 'La dame de fer', it was "
|
| 78 |
+
"constructed as the centerpiece of the 1889 World's Fair. The tower is 330 metres "
|
| 79 |
+
"(1,083 ft) tall."
|
| 80 |
+
)
|
| 81 |
+
claim = "The Eiffel Tower was completed in 1887 and stands 330 metres tall."
|
| 82 |
+
|
| 83 |
+
user_prompt = f"""You are tasked with systematically verifying the accuracy of a claim. You will be provided with a claim to verify and an evidence document to consult.
|
| 84 |
+
|
| 85 |
+
Here is the evidence document you should consult:
|
| 86 |
+
|
| 87 |
+
<evidence_document>
|
| 88 |
+
{evidence_doc}
|
| 89 |
+
</evidence_document>
|
| 90 |
+
|
| 91 |
+
Here is the claim you need to verify:
|
| 92 |
+
|
| 93 |
+
<claim>
|
| 94 |
+
{claim}
|
| 95 |
+
</claim>
|
| 96 |
+
|
| 97 |
+
Your task is to verify whether this claim is Supported or Refuted through an iterative process of asking questions and gathering information.
|
| 98 |
+
|
| 99 |
+
# Verification Process
|
| 100 |
+
|
| 101 |
+
Begin by analyzing the claim in <think> tags, then enter an iterative cycle of <question>/<answer> pairs answered ONLY from the evidence document. When every sub-claim is addressed, output your final label inside <verification> tags. The label must be exactly one of: Supported, Refuted.
|
| 102 |
+
|
| 103 |
+
Stop immediately after the closing </verification> tag.
|
| 104 |
+
|
| 105 |
+
Begin your verification process now."""
|
| 106 |
+
|
| 107 |
+
messages = [{"role": "user", "content": user_prompt}]
|
| 108 |
+
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 109 |
+
inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
| 110 |
+
|
| 111 |
+
out = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
|
| 112 |
+
response = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
|
| 113 |
+
print(response)
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
The prompt above is abbreviated. The full training-time template (with extended instructions, a worked example, and sub-claim classification guidance) lives in `decomposer/prompts.py` of the source repo; use it for the strongest performance.
|
| 117 |
+
|
| 118 |
+
### Parsing the Output
|
| 119 |
+
|
| 120 |
+
The final label is between the last `<verification>` and `</verification>` tags:
|
| 121 |
+
|
| 122 |
+
```python
|
| 123 |
+
import re
|
| 124 |
+
|
| 125 |
+
matches = re.findall(r"<verification>\s*(Supported|Refuted)\s*</verification>", response)
|
| 126 |
+
label = matches[-1] if matches else None  # take the last verdict; None if none was emitted
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
### Using vLLM
|
| 130 |
+
|
| 131 |
+
```bash
|
| 132 |
+
vllm serve dipta007/decomposeRL-7b --max-model-len 4096
|
| 133 |
+
```
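
Once the server is running, it can be queried over vLLM's OpenAI-compatible chat-completions endpoint. The sketch below uses only the Python standard library. It assumes the server listens on vLLM's default `localhost:8000`; `build_payload` and `query_server` are hypothetical helper names, and the sampling settings simply mirror the Quickstart.

```python
import json
import urllib.request

MODEL = "dipta007/decomposeRL-7b"
URL = "http://localhost:8000/v1/chat/completions"  # assumed: vLLM default host and port

def build_payload(user_prompt, max_tokens=2048, temperature=0.7):
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def query_server(user_prompt, url=URL):
    """POST the prompt to a running server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Building the payload needs no server; query_server() does.
payload = build_payload("Is the claim supported by the evidence?")
print(json.dumps(payload, indent=2))
```

With `--max-model-len 4096`, the prompt and `max_tokens` together must fit within 4,096 tokens, so long evidence documents may require a smaller `max_tokens`.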
|
| 134 |
+
|
| 135 |
+
## Performance
|
| 136 |
+
|
| 137 |
+
Balanced accuracy on 9 in-domain claim-verification benchmarks (best checkpoint, `step 5400`):
|
| 138 |
+
|
| 139 |
+
| Dataset | # Examples | Balanced Acc |
|
| 140 |
+
|---|---:|---:|
|
| 141 |
+
| ClaimDecomp | 116 | 0.9324 |
|
| 142 |
+
| FEVEROUS | 2,962 | 0.9234 |
|
| 143 |
+
| Ex-FEVER | 4,071 | 0.8724 |
|
| 144 |
+
| Fool-Me-Twice | 1,380 | 0.8584 |
|
| 145 |
+
| PubHealthFact | 985 | 0.8468 |
|
| 146 |
+
| PubMedClaim | 445 | 0.8338 |
|
| 147 |
+
| WiCE | 143 | 0.8138 |
|
| 148 |
+
| HoVer | 4,000 | 0.7635 |
|
| 149 |
+
| FEVER | 401 | 0.7304 |
|
| 150 |
+
| **Micro-average** (sample-weighted) | 14,503 | **0.8445** |
|
| 151 |
+
| **Macro-average** | — | **0.8417** |
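
The two averages differ only in weighting: the micro-average weights each dataset by its number of examples, while the macro-average treats all nine datasets equally. As a quick check, both can be recomputed from the per-dataset rows of the table; the variable names are illustrative.

```python
# (dataset, number of examples, balanced accuracy), copied from the table
results = [
    ("ClaimDecomp", 116, 0.9324),
    ("FEVEROUS", 2962, 0.9234),
    ("Ex-FEVER", 4071, 0.8724),
    ("Fool-Me-Twice", 1380, 0.8584),
    ("PubHealthFact", 985, 0.8468),
    ("PubMedClaim", 445, 0.8338),
    ("WiCE", 143, 0.8138),
    ("HoVer", 4000, 0.7635),
    ("FEVER", 401, 0.7304),
]

total = sum(n for _, n, _ in results)
micro = sum(n * acc for _, n, acc in results) / total  # sample-weighted mean
macro = sum(acc for _, _, acc in results) / len(results)  # unweighted mean

print(f"examples={total} micro={micro:.4f} macro={macro:.4f}")
# prints: examples=14503 micro=0.8445 macro=0.8417
```

The large HoVer and Ex-FEVER sets dominate the micro-average, whereas small sets such as ClaimDecomp and WiCE count equally in the macro-average.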
|
| 152 |
+
|
| 153 |
+
## Intended Use
|
| 154 |
+
|
| 155 |
+
- **In-scope**: verifying factual claims against a *provided* evidence document (open-book fact verification, retrieval-augmented fact-checking pipelines).
|
| 156 |
+
- **Out-of-scope**: closed-book fact-checking, claim verification against the model's parametric knowledge, real-time news verification without supplied evidence.
|
| 157 |
|
| 158 |
+
The model is trained to say *"I don't know"* when the evidence document is silent — please respect that signal in downstream systems instead of forcing a label.
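
One way a downstream system might respect that signal is to abstain whenever any sub-question went unanswered. The sketch below is a hypothetical policy, not part of the source repo: `unanswered_questions` and `label_or_abstain` are illustrative names, and the code assumes the phrase appears literally (with a straight apostrophe) inside an `<answer>` block.

```python
import re

def unanswered_questions(response):
    """Return the questions whose answer contains "I don't know" (case-insensitive)."""
    pairs = re.findall(
        r"<question>(.*?)</question>\s*<answer>(.*?)</answer>", response, re.DOTALL
    )
    return [q.strip() for q, a in pairs if "i don't know" in a.lower()]

def label_or_abstain(response):
    """Return the final label, or None when any sub-question went unanswered."""
    if unanswered_questions(response):
        return None  # e.g. route to retrieval or human review instead of forcing a label
    labels = re.findall(r"<verification>\s*(Supported|Refuted)\s*</verification>", response)
    return labels[-1] if labels else None

# Hypothetical response, written by hand for illustration only.
demo = (
    "<question>Who designed the tower?</question><answer>I don't know</answer>"
    "<verification>Refuted</verification>"
)
print(unanswered_questions(demo), label_or_abstain(demo))
```

Abstaining on a single unanswered question is deliberately conservative; a pipeline could instead fetch more evidence for just those questions and re-run verification.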
|
| 159 |
|
| 160 |
+
## License
|
| 161 |
|
| 162 |
+
Released under the Apache 2.0 License.
|