Instructions for using groc/recursive-sat-qwen2.5-1.5b with libraries and local apps.

Libraries

How to use groc/recursive-sat-qwen2.5-1.5b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="groc/recursive-sat-qwen2.5-1.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline returns the generated conversation; print it to see the reply
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("groc/recursive-sat-qwen2.5-1.5b")
model = AutoModelForCausalLM.from_pretrained("groc/recursive-sat-qwen2.5-1.5b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use groc/recursive-sat-qwen2.5-1.5b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "groc/recursive-sat-qwen2.5-1.5b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groc/recursive-sat-qwen2.5-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
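The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-compatible chat completion shape.

```python
import json
import urllib.request

MODEL_ID = "groc/recursive-sat-qwen2.5-1.5b"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build the same POST request as the curl command above (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply text. For a server on another port, pass `base_url`, for example `base_url="http://localhost:30000"`.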

Use Docker

docker model run hf.co/groc/recursive-sat-qwen2.5-1.5b

SGLang

How to use groc/recursive-sat-qwen2.5-1.5b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "groc/recursive-sat-qwen2.5-1.5b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groc/recursive-sat-qwen2.5-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "groc/recursive-sat-qwen2.5-1.5b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groc/recursive-sat-qwen2.5-1.5b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use groc/recursive-sat-qwen2.5-1.5b with Docker Model Runner:
```
docker model run hf.co/groc/recursive-sat-qwen2.5-1.5b
```

recursive-sat-qwen2.5-1.5b

This is a paper model: the REC-3 release artifact from a paper-aligned replication of recursive SAT reasoning at 1.5B scale.

It is a supervised fine-tune of Qwen/Qwen2.5-1.5B-Instruct trained on recursive SAT traces derived from SATBench with explicit <call> / <return> structure. The goal is research replication and analysis, not general-purpose production use.

What This Model Is

Base model: Qwen/Qwen2.5-1.5B-Instruct
Release artifact: results/runs/REC-3/published_model
Training run: REC-3
Seed: 303
Config: configs/rec_seed303.yaml
Dataset source: LLM4Code/SATBench
Task: SAT / UNSAT classification via recursive trace supervision

Why REC-3

REC-1 and REC-3 tie on mean accuracy, but REC-3 behaves more cleanly end to end and is therefore the release candidate. REC-3 results:

Mean accuracy: 45.33%
Easy: 39.0%
Medium: 54.0%
Hard: 43.0%
Parse failure rate: 7.0%
Valid trace rate: 99.0%

Compared with REC-1, REC-3 keeps the same mean accuracy while reducing parse failure (7.0% vs 8.33%), improving hard accuracy (43.0% vs 42.0%), and slightly improving valid trace rate (99.0% vs 98.33%).
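The reported mean accuracy is consistent with an unweighted average over the three difficulty buckets. A small sketch of that arithmetic, using only the numbers listed above (the function name `macro_mean` is illustrative):

```python
# Per-bucket REC-3 accuracies (percent), copied from the list above.
rec3_accuracy = {"easy": 39.0, "medium": 54.0, "hard": 43.0}


def macro_mean(bucket_scores):
    """Unweighted mean over buckets, so each bucket counts equally."""
    return sum(bucket_scores.values()) / len(bucket_scores)


rec3_mean = macro_mean(rec3_accuracy)
print(f"{rec3_mean:.2f}%")  # 45.33%
```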

Important Caveat

This model is a paper artifact. It is not evidence of robust general recursive reasoning.

The underlying paper draft treats the result as a qualified replication:

recursive SFT improves end-to-end SATBench accuracy over raw direct prompting
the strongest gain is on medium-difficulty SAT instances
absolute performance remains far below the 3B source-paper result
recursion behavior is still shallow overall

Use this release as a research artifact tied to the experiment, metrics, and discussion in the paper repo.

Training Summary

Objective: recursive_sft
Train examples: 74,827
Validation examples: 619
Global step: 46,770
Best checkpoint: checkpoint-9354
Accelerator used for the main run: cuda
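The step counts above are related: the best checkpoint's step number divides the final global step exactly. One possible reading, which this card does not confirm, is five epochs at an effective batch size of 8 with the best checkpoint falling at the end of the first epoch. The sketch below only checks that this reading is arithmetically consistent; the batch size and epoch count are assumptions.

```python
import math

# Values copied from the training summary above.
train_examples = 74_827
global_step = 46_770
best_checkpoint_step = 9_354

# ASSUMPTION (not stated in the card): effective batch size of 8.
assumed_batch_size = 8

steps_per_epoch = math.ceil(train_examples / assumed_batch_size)
implied_epochs = global_step / steps_per_epoch

print(steps_per_epoch)   # 9354
print(implied_epochs)    # 5.0
print(steps_per_epoch == best_checkpoint_step)  # True
```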

Evaluation Summary

The main held-out evaluation uses 100 examples from each of the SATBench easy, medium, and hard buckets (300 examples in total).

Baseline vs released model:

Base direct prompt mean accuracy: 37.33%
REC-3 mean accuracy: 45.33%
Absolute gain: +8.0 points
Base parse failure rate: 28.67%
REC-3 parse failure rate: 7.0%
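To make these rates concrete, the sketch below recomputes the gain and converts the parse failure rates into example counts. It assumes the rates are pooled over all 300 evaluation examples, which the card does not state explicitly; `to_count` is an illustrative helper.

```python
TOTAL_EXAMPLES = 300  # 100 each from the easy, medium, and hard buckets

# Percentages copied from the comparison above.
base = {"mean_accuracy": 37.33, "parse_failure_rate": 28.67}
rec3 = {"mean_accuracy": 45.33, "parse_failure_rate": 7.0}


def to_count(rate_percent, total=TOTAL_EXAMPLES):
    """Convert a percentage into a whole number of examples.

    ASSUMPTION: the rate was computed over all `total` examples pooled.
    """
    return round(rate_percent / 100 * total)


gain = rec3["mean_accuracy"] - base["mean_accuracy"]
base_failures = to_count(base["parse_failure_rate"])
rec3_failures = to_count(rec3["parse_failure_rate"])

print(f"{gain:+.1f} points")  # +8.0 points
print(base_failures)          # 86
print(rec3_failures)          # 21
```

Under that assumption, the fine-tune leaves 21 unparseable outputs out of 300, compared with 86 for the directly prompted base model.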

Prompt Format

The model was trained on recursive traces using:

<call> ... </call> for subproblem decomposition
<return> ... </return> for compact returned answers

It is best treated as a specialized research model for this protocol-driven SAT setting.
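When inspecting generations, it can help to check that the tags are balanced and to measure how deeply calls nest. The following is a rough sketch, not the evaluation code behind the valid trace rate reported in this card; the function `check_trace` and the example strings are hypothetical, and the real trace layout may differ.

```python
import re

# Matches <call>, </call>, <return>, and </return>.
TAG_PATTERN = re.compile(r"<(/?)(call|return)>")


def check_trace(text):
    """Return (is_well_formed, max_call_depth) for a generated trace.

    A trace counts as well formed here if every opening tag is closed
    by a matching tag in properly nested order. This is an illustrative
    definition, not the one used for the card's metrics.
    """
    stack = []
    max_depth = 0
    for match in TAG_PATTERN.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if is_closing:
            # A closing tag must match the most recently opened tag.
            if not stack or stack[-1] != name:
                return False, max_depth
            stack.pop()
        else:
            stack.append(name)
            max_depth = max(max_depth, stack.count("call"))
    # Any tag left open makes the trace malformed.
    return not stack, max_depth


# Made-up strings for illustration only.
print(check_trace("<call>a <call>b</call> <return>SAT</return></call>"))  # (True, 2)
print(check_trace("<call>x <return>y</call>"))                            # (False, 1)
```

A maximum depth of 1 on most outputs would match the card's remark that recursion behavior is still shallow, though that should be verified against actual generations.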

Files In This Release

model.safetensors
config.json
generation_config.json
tokenizer.json
tokenizer_config.json
chat_template.jinja
export_metadata.json

Intended Use

paper artifact release
replication reference
SAT recursive-trace evaluation
qualitative inspection of recursive protocol behavior

Out Of Scope

production reasoning system
general mathematical reasoning benchmark model
safety-critical use
claims beyond the SATBench replication setting

Downloads last month: 7

Safetensors

Model size: 2B params
Tensor type: BF16

Model tree for groc/recursive-sat-qwen2.5-1.5b

Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-1.5B-Instruct