Text Generation
Transformers
Safetensors
English
qwen2
fact-verification
claim-verification
reasoning
grpo
lora
decomposition
conversational
text-generation-inference
Instructions to use dipta007/decomposeRL-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use dipta007/decomposeRL-7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dipta007/decomposeRL-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dipta007/decomposeRL-7b")
model = AutoModelForCausalLM.from_pretrained("dipta007/decomposeRL-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use dipta007/decomposeRL-7b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "dipta007/decomposeRL-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dipta007/decomposeRL-7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
docker model run hf.co/dipta007/decomposeRL-7b
- SGLang
How to use dipta007/decomposeRL-7b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "dipta007/decomposeRL-7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dipta007/decomposeRL-7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "dipta007/decomposeRL-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dipta007/decomposeRL-7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use dipta007/decomposeRL-7b with Docker Model Runner:
docker model run hf.co/dipta007/decomposeRL-7b
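The README touched by this commit describes each trace as ending in a final verdict inside a `<verification>` tag (`Supported` or `Refuted`), and says the model may answer *"I don't know"* when the evidence is silent. Below is a minimal sketch of reading that verdict from decoded output. The closing `</verification>` tag, the handling of any other label as an abstention, and the helper name `extract_verdict` are assumptions for illustration and are not confirmed by the model card.

```python
import re
from typing import Optional

# Assumed trace format: "<verification>LABEL</verification>".
# Only the opening tag and the two labels appear in the README.
_VERDICT_RE = re.compile(
    r"<verification>\s*(.*?)\s*</verification>", re.DOTALL | re.IGNORECASE
)


def extract_verdict(generated_text: str) -> Optional[str]:
    """Return "Supported", "Refuted", or None when no usable label is found."""
    matches = _VERDICT_RE.findall(generated_text)
    if not matches:
        return None  # no verdict tag at all: treat as an abstention
    label = matches[-1].strip().lower()  # the last tag is taken as the final one
    if label == "supported":
        return "Supported"
    if label == "refuted":
        return "Refuted"
    return None  # e.g. "I don't know": pass the abstention through
```

Returning `None` instead of coercing to one of the two labels keeps the abstention visible to downstream code, which is what the README asks integrators to respect.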
Update README
README.md
CHANGED
@@ -37,7 +37,7 @@ tags:
 - **84.6% macro-average balanced accuracy** across the same 9 benchmarks
 - Out-of-domain: **60.2% balanced accuracy on Coverbench**, **77.0% on LLM-AggreFact**
 - Strong on long-form evidence: 87% on Ex-FEVER, 92% on FEVEROUS, 76% on HoVer
-- Reasoning is **fully transparent**
+- Reasoning is **fully transparent**: the model emits its sub-claim checklist, every question it asked, every quote from evidence, and a final label

 ## Model Overview

@@ -60,28 +60,28 @@ DecomposeRL trains the policy to follow a **decompose-question-answer-verify** l
 3. **Sufficiency check** (`<think>`): track which sub-claims are resolved; loop until every sub-claim is addressed.
 4. **Final verdict** (`<verification>`): `Supported` or `Refuted`.

-### Reward Stack
+### Reward Stack: seven complementary signals

 GRPO is supervised with a sum of seven rewards, grouped into three families:

 **Programmatic anchors** (no judge call)

-1. **Format**
-2. **Question count**
-3. **Diversity**
+1. **Format**: ensures the trace is parseable; a gating prerequisite without which no other reward can be computed.
+2. **Question count**: discourages collapsing the decomposition into one mega-question or padding it with filler.
+3. **Diversity**: penalizes redundant questions so the policy covers distinct sub-claims instead of rewording the same one.

 **Set-level signals**

-4. **Coverage**
-5. **Verification**
+4. **Coverage**: checks whether the verdict can be recovered from the answers alone; tests if the decomposition is *collectively sufficient*.
+5. **Verification**: direct outcome anchor; did the final label match the gold label?

 **Leave-one-out and per-question composites**

-6. **Necessity (leave-one-out)**
-7. **Joint multiplicative quality**
-- **(7a) Answerability**
-- **(7b) Atomicity**
-- **(7c) Answer correctness**
+6. **Necessity (leave-one-out)**: the only signal that can push the policy to *remove* misleading questions; a question is necessary iff its removal would change the verdict.
+7. **Joint multiplicative quality**: composes three per-question sub-signals so a question must clear *all* of them simultaneously rather than scoring partial credit:
+- **(7a) Answerability**: is the question answerable from the evidence?
+- **(7b) Atomicity**: is it a single-focus, verifiable question grounded in the claim?
+- **(7c) Answer correctness**: is the answer faithful to the document (no contradictions, no extrinsic info)?

 ## Quickstart

@@ -213,7 +213,7 @@ the specific companies. Next, verify the companies and the layoffs.
 A2: The evidence document states that Nicholson worked as a consultant for
 companies that laid off nearly 1,900 people since 2015, shutting down
 plants in Wisconsin and other states. But it also says Baldwin cites no
-evidence that Nicholson's work caused the layoffs and shutdowns
+evidence that Nicholson's work caused the layoffs and shutdowns, only
 some element of truth, our definition of Mostly False.

 ──────────────────────────────────────────────────────────────────────────────
@@ -238,7 +238,7 @@ The `--max-model-len` matches the training-time `max_seq_length=16016` (with `ma

 ## Performance

-### In-domain
+### In-domain: balanced accuracy (%) on 9 claim-verification benchmarks

 Compared against every same-size (Qwen-7B) baseline plus MiniCheck. *Micro* is pooled balanced accuracy across all in-domain samples; *Macro* is the uniform mean across the 9 datasets. **Bold** marks the column winner; *italic* marks the second-best.

@@ -269,7 +269,7 @@ Compared against every same-size (Qwen-7B) baseline plus MiniCheck. *Micro* is p
 - **In-scope**: verifying factual claims against a *provided* evidence document (open-book fact verification, retrieval-augmented fact-checking pipelines).
 - **Out-of-scope**: closed-book fact-checking, claim verification against the model's parametric knowledge, real-time news verification without supplied evidence.

-The model is trained to say *"I don't know"* when the evidence document is silent
+The model is trained to say *"I don't know"* when the evidence document is silent; please respect that signal in downstream systems instead of forcing a label.

 ## Citation
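The reward stack in the README diff says the seven rewards are summed and that the joint quality term multiplies its three per-question sub-signals. A minimal sketch of that composition follows. It assumes every signal is already scored in [0, 1], that per-question products are averaged over questions, that necessity is reported as a fraction of questions, and that all rewards are weighted equally. None of these details are stated in the README, and the names `QuestionScores`, `joint_quality`, `necessity`, `total_reward`, and `verdict_fn` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class QuestionScores:
    answerability: float  # (7a), assumed to lie in [0, 1]
    atomicity: float      # (7b), assumed to lie in [0, 1]
    correctness: float    # (7c), assumed to lie in [0, 1]


def joint_quality(scores: Sequence[QuestionScores]) -> float:
    """Mean over questions of the product of the three sub-signals."""
    if not scores:
        return 0.0
    products = [s.answerability * s.atomicity * s.correctness for s in scores]
    return sum(products) / len(products)


def necessity(qa_pairs: List[str], verdict_fn: Callable[[List[str]], str]) -> float:
    """Fraction of questions whose removal changes the verdict (leave-one-out)."""
    if not qa_pairs:
        return 0.0
    full = verdict_fn(qa_pairs)
    changed = sum(
        verdict_fn(qa_pairs[:i] + qa_pairs[i + 1:]) != full
        for i in range(len(qa_pairs))
    )
    return changed / len(qa_pairs)


def total_reward(components: Dict[str, float]) -> float:
    """Unweighted sum of the reward components (equal weights are an assumption)."""
    return sum(components.values())
```

With sub-scores (1.0, 1.0, 0.0) the product is 0, whereas an additive mean would give about 0.67. That is the "no partial credit" behaviour the README attributes to the multiplicative form.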
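The performance section in the diff distinguishes *Micro* (balanced accuracy over all pooled in-domain samples) from *Macro* (uniform mean of per-dataset balanced accuracy). Here is a small sketch of the difference, assuming binary `Supported`/`Refuted` labels and balanced accuracy defined as the mean of per-class recall. The dataset names and predictions are toy values made up for illustration and are not results from the model card.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[str, str]]  # (gold label, predicted label)


def balanced_accuracy(pairs: Pairs) -> float:
    """Mean recall over the gold classes present in `pairs` (must be non-empty)."""
    classes = sorted({gold for gold, _ in pairs})
    recalls = []
    for cls in classes:
        preds = [pred for gold, pred in pairs if gold == cls]
        recalls.append(sum(p == cls for p in preds) / len(preds))
    return sum(recalls) / len(recalls)


def micro_macro(datasets: Dict[str, Pairs]) -> Tuple[float, float]:
    """Return (micro, macro) balanced accuracy across datasets."""
    pooled = [pair for pairs in datasets.values() for pair in pairs]
    micro = balanced_accuracy(pooled)
    macro = sum(balanced_accuracy(p) for p in datasets.values()) / len(datasets)
    return micro, macro


# Toy data only: two hypothetical datasets of different sizes.
toy = {
    "small": [("Supported", "Supported"), ("Refuted", "Supported")],
    "large": [("Supported", "Supported")] * 3 + [("Refuted", "Refuted")] * 3,
}
micro, macro = micro_macro(toy)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.875 macro=0.750
```

The small dataset drags the macro score down to 0.750 because each dataset counts equally, while the pooled micro score (0.875) is dominated by the larger dataset.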