dipta007 committed 8 days ago

Commit 188754b (verified) · 1 Parent(s): f5bcc44

Add detailed model card

Files changed (1)

README.md +154 -13

@@ -1,21 +1,162 @@
 ---
-base_model: unsloth/Qwen2.5-7B-instruct
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- qwen2
 license: apache-2.0
 language:
-- en
 ---
-# Uploaded finetuned  model
-- **Developed by:** dipta007
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/Qwen2.5-7B-instruct
-This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
+library_name: transformers
 license: apache-2.0
+base_model: unsloth/Qwen2.5-7B-Instruct
+pipeline_tag: text-generation
 language:
+  - en
+tags:
+  - fact-verification
+  - claim-verification
+  - reasoning
+  - grpo
+  - lora
+  - decomposition
+  - qwen2
 ---
+# DecomposeRL-7B
+**DecomposeRL-7B** is a fact-verification model that learns to decompose claims into atomic sub-questions, iteratively answer them from an evidence document, and produce a final `Supported` / `Refuted` judgment. It is trained from `Qwen2.5-7B-Instruct` with **GRPO + LoRA** using rewards over format validity, sub-claim coverage, answer necessity, and question diversity.
+## Highlights
+- **84.5% micro-average balanced accuracy** across 9 in-domain claim-verification benchmarks (sample-weighted)
+- **84.2% macro-average balanced accuracy** across the same 9 benchmarks
+- Strong on long-form evidence: 87% on Ex-FEVER, 92% on FEVEROUS, 76% on HoVer
+- Reasoning is **fully transparent** — the model emits its sub-claim checklist, every question it asked, every quote from evidence, and a final label
+## Model Overview
+| Property | Value |
+|----------|-------|
+| **Model Type** | Causal Language Model |
+| **Base Model** | unsloth/Qwen2.5-7B-Instruct |
+| **Parameters** | 7B |
+| **Training** | GRPO + LoRA (r=64, α=128) |
+| **LoRA Targets** | q, k, v, o, gate, up, down projections |
+| **Context Length** | 4,096 tokens |
+| **Language** | English |
+## Method
+DecomposeRL trains the policy to follow a **decompose-question-answer-verify** loop:
+1. **Initial analysis** (`<think>`): identify atomic sub-claims, classify them (entity / relational / quantitative / causal / temporal / comparative), and flag independently falsifiable sub-claims.
+2. **Iterative QA cycle** (`<question>` → `<answer>`): for each sub-claim or ambiguity, ask a single targeted question and answer it **only** from the evidence document, quoting passages directly (or saying *"I don't know"* if the evidence is silent).
+3. **Sufficiency check** (`<think>`): track which sub-claims are resolved; loop until every sub-claim is addressed.
+4. **Final verdict** (`<verification>`): `Supported` or `Refuted`.
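
The four steps above imply a simple trajectory shape. As a minimal sketch, assuming tags are never nested and appear in the order the steps describe, a generated trajectory can be split into segments and checked for structure. `parse_trajectory`, `is_well_formed`, and the example trajectory are hypothetical illustrations, not code or output from the source repo.

```python
import re

# Matches any of the four tags used by the model, with non-greedy content.
TAG_PATTERN = re.compile(
    r"<(think|question|answer|verification)>(.*?)</\1>", re.DOTALL
)


def parse_trajectory(text):
    """Return (tag, content) pairs in order of appearance."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_PATTERN.finditer(text)]


def is_well_formed(segments):
    """Assumed shape: starts with <think>, ends with exactly one
    <verification>, and every <question> is paired with an <answer>."""
    tags = [tag for tag, _ in segments]
    if len(tags) < 2 or tags[0] != "think" or tags[-1] != "verification":
        return False
    if tags.count("verification") != 1:
        return False
    if segments[-1][1] not in ("Supported", "Refuted"):
        return False
    for i, tag in enumerate(tags):
        if tag == "question" and (i + 1 >= len(tags) or tags[i + 1] != "answer"):
            return False
        if tag == "answer" and (i == 0 or tags[i - 1] != "question"):
            return False
    return True


# Hand-written example trajectory (not real model output).
example = (
    "<think>Two sub-claims: completion year, height.</think>"
    "<question>When was the tower built?</question>"
    "<answer>\"built the tower from 1887 to 1889\"</answer>"
    "<think>Year resolved; height remains.</think>"
    "<question>How tall is the tower?</question>"
    "<answer>\"The tower is 330 metres (1,083 ft) tall.\"</answer>"
    "<verification>Refuted</verification>"
)
segments = parse_trajectory(example)
print([tag for tag, _ in segments])
print(is_well_formed(segments))
```

A check like this is only a structural filter; it says nothing about whether the answers actually quote the evidence.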
+### Training Rewards
+GRPO optimizes a composite reward computed over generated trajectories:
+- **Format reward** — well-formed `<think>`/`<question>`/`<answer>`/`<verification>` structure
+- **Verification reward** — correct final label
+- **Necessity reward** — generated sub-questions are necessary to verify the claim
+- **Diversity reward** — sub-questions cover distinct aspects (MMR-based)
+- **Coverage reward** — sub-questions jointly cover the claim
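
The card does not state how the five components are combined. The sketch below assumes a plain weighted sum with equal weights, purely to illustrate what a composite reward over one trajectory could look like; `RewardComponents`, `ASSUMED_WEIGHTS`, and `composite_reward` are hypothetical names, and the real weighting may differ.

```python
from dataclasses import dataclass


@dataclass
class RewardComponents:
    """Per-trajectory scores, each assumed to lie in [0, 1]."""
    format: float
    verification: float
    necessity: float
    diversity: float
    coverage: float


# Assumption: equal weights. The model card does not give the actual values.
ASSUMED_WEIGHTS = {
    "format": 1.0,
    "verification": 1.0,
    "necessity": 1.0,
    "diversity": 1.0,
    "coverage": 1.0,
}


def composite_reward(components, weights=ASSUMED_WEIGHTS):
    """Weighted sum of the five reward components."""
    return sum(w * getattr(components, name) for name, w in weights.items())


# A well-formed, correct trajectory with partly redundant questions.
scores = RewardComponents(
    format=1.0, verification=1.0, necessity=0.5, diversity=0.5, coverage=1.0
)
print(composite_reward(scores))
```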
+## Quickstart
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "dipta007/decomposeRL-7b"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto",
+)
+evidence_doc = (
+    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, "
+    "France. It is named after the engineer Gustave Eiffel, whose company designed and "
+    "built the tower from 1887 to 1889. Locally nicknamed 'La dame de fer', it was "
+    "constructed as the centerpiece of the 1889 World's Fair. The tower is 330 metres "
+    "(1,083 ft) tall."
+)
+claim = "The Eiffel Tower was completed in 1887 and stands 330 metres tall."
+user_prompt = f"""You are tasked with systematically verifying the accuracy of a claim. You will be provided with a claim to verify and an evidence document to consult.
+Here is the evidence document you should consult:
+<evidence_document>
+{evidence_doc}
+</evidence_document>
+Here is the claim you need to verify:
+<claim>
+{claim}
+</claim>
+Your task is to verify whether this claim is Supported or Refuted through an iterative process of asking questions and gathering information.
+# Verification Process
+Begin by analyzing the claim in <think> tags, then enter an iterative cycle of <question>/<answer> pairs answered ONLY from the evidence document. When every sub-claim is addressed, output your final label inside <verification> tags. The label must be exactly one of: Supported, Refuted.
+Stop immediately after the closing </verification> tag.
+Begin your verification process now."""
+messages = [{"role": "user", "content": user_prompt}]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer([text], return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
+response = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+print(response)
+```
+The full training-time prompt template (with extended instructions, a worked example, and sub-claim classification guidance) is in `decomposer/prompts.py` of the source repo; use it for the strongest performance.
+### Parsing the Output
+The final label is between the last `<verification>` and `</verification>` tags:
+```python
+import re
+# findall returns every match in order; the last one is the final verdict.
+labels = re.findall(r"<verification>\s*(Supported|Refuted)\s*</verification>", response)
+label = labels[-1] if labels else None
+```
+### Using vLLM
+```bash
+vllm serve dipta007/decomposeRL-7b --max-model-len 4096
+```
+## Performance
+Balanced accuracy on 9 in-domain claim-verification benchmarks (best checkpoint, `step 5400`):
+| Dataset | # Examples | Balanced Acc |
+|---|---:|---:|
+| ClaimDecomp | 116 | 0.9324 |
+| FEVEROUS | 2,962 | 0.9234 |
+| Ex-FEVER | 4,071 | 0.8724 |
+| Fool-Me-Twice | 1,380 | 0.8584 |
+| PubHealthFact | 985 | 0.8468 |
+| PubMedClaim | 445 | 0.8338 |
+| WiCE | 143 | 0.8138 |
+| HoVer | 4,000 | 0.7635 |
+| FEVER | 401 | 0.7304 |
+| **Micro-average** (sample-weighted) | 14,503 | **0.8445** |
+| **Macro-average** | — | **0.8417** |
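
The two summary rows can be recomputed from the per-dataset rows. This sketch assumes the micro-average is the sample-weighted mean of the per-dataset balanced accuracies, which reproduces the reported figure; the variable names are illustrative.

```python
# (number of examples, balanced accuracy) copied from the table above.
results = {
    "ClaimDecomp": (116, 0.9324),
    "FEVEROUS": (2962, 0.9234),
    "Ex-FEVER": (4071, 0.8724),
    "Fool-Me-Twice": (1380, 0.8584),
    "PubHealthFact": (985, 0.8468),
    "PubMedClaim": (445, 0.8338),
    "WiCE": (143, 0.8138),
    "HoVer": (4000, 0.7635),
    "FEVER": (401, 0.7304),
}

total = sum(n for n, _ in results.values())
# Micro: each dataset contributes in proportion to its size.
micro = sum(n * acc for n, acc in results.values()) / total
# Macro: each dataset contributes equally.
macro = sum(acc for _, acc in results.values()) / len(results)

print(f"{total} {micro:.4f} {macro:.4f}")  # prints "14503 0.8445 0.8417"
```

The micro-average is pulled toward the three largest datasets (Ex-FEVER, HoVer, FEVEROUS), which together account for most of the 14,503 examples.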
+## Intended Use
+- **In-scope**: verifying factual claims against a *provided* evidence document (open-book fact verification, retrieval-augmented fact-checking pipelines).
+- **Out-of-scope**: closed-book fact-checking, claim verification against the model's parametric knowledge, real-time news verification without supplied evidence.
+The model is trained to say *"I don't know"* when the evidence document is silent — please respect that signal in downstream systems instead of forcing a label.
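
One way a downstream system could respect that signal is to count abstaining answers and flag the trajectory for review rather than trusting the label blindly. This is a minimal sketch: `summarize_response` is a hypothetical helper, and the policy of flagging any trajectory with at least one abstention is an assumption, not something the model card prescribes.

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
LABEL_RE = re.compile(r"<verification>\s*(Supported|Refuted)\s*</verification>")


def summarize_response(response):
    """Extract the final label and count answers that abstain."""
    answers = [a.strip() for a in ANSWER_RE.findall(response)]
    # Normalize curly apostrophes so both spellings of "don't" match.
    abstained = sum(
        "i don't know" in a.lower().replace("\u2019", "'") for a in answers
    )
    labels = LABEL_RE.findall(response)
    return {
        "label": labels[-1] if labels else None,
        "num_answers": len(answers),
        "num_abstained": abstained,
        # Assumed policy: review if any answer abstained or no label was found.
        "needs_review": abstained > 0 or not labels,
    }


# Hand-written example response (not real model output).
sample = (
    "<think>Two sub-claims.</think>"
    "<question>Who designed the tower?</question>"
    "<answer>\"whose company designed and built the tower\"</answer>"
    "<question>How many visitors does it get per year?</question>"
    "<answer>I don't know</answer>"
    "<verification>Supported</verification>"
)
summary = summarize_response(sample)
print(summary)
```

A stricter or looser policy (for example, a threshold on the fraction of abstaining answers) may suit a given pipeline better.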
+## License
+Released under the Apache 2.0 License.