recursive-sat-qwen2.5-1.5b
This is a paper model: the REC-3 release artifact from a paper-aligned replication of recursive SAT reasoning at 1.5B scale.
It is a supervised fine-tune of Qwen/Qwen2.5-1.5B-Instruct trained on recursive SAT traces derived from SATBench with explicit <call> / <return> structure. The goal is research replication and analysis, not general-purpose production use.
What This Model Is
- Base model: Qwen/Qwen2.5-1.5B-Instruct
- Release artifact: results/runs/REC-3/published_model
- Training run: REC-3
- Seed: 303
- Config: configs/rec_seed303.yaml
- Dataset source: LLM4Code/SATBench
- Task: SAT / UNSAT classification via recursive trace supervision
Why REC-3
REC-1 and REC-3 tie on mean accuracy, but REC-3 is the cleaner release candidate on end-to-end behavior:
- Mean accuracy: 45.33%
- Easy: 39.0%
- Medium: 54.0%
- Hard: 43.0%
- Parse failure rate: 7.0%
- Valid trace rate: 99.0%
Compared with REC-1, REC-3 keeps the same mean accuracy while reducing parse failure (7.0% vs 8.33%), improving hard accuracy (43.0% vs 42.0%), and slightly improving valid trace rate (99.0% vs 98.33%).
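As a quick check on the figures above, the mean accuracy can be reproduced from the three bucket scores. This is a minimal sketch that assumes the mean is the unweighted average over equally sized buckets, which matches the 100-per-bucket split described in the evaluation summary; the variable names are illustrative only.

```python
# Per-bucket REC-3 accuracies in percent, copied from the list above.
bucket_accuracy = {"easy": 39.0, "medium": 54.0, "hard": 43.0}

# Assumption: buckets are equally sized, so the mean is an unweighted average.
mean_accuracy = sum(bucket_accuracy.values()) / len(bucket_accuracy)

# REC-3 minus REC-1, in percentage points, using the figures quoted above.
parse_failure_delta = round(7.0 - 8.33, 2)
hard_accuracy_delta = round(43.0 - 42.0, 2)
valid_trace_delta = round(99.0 - 98.33, 2)

print(f"mean accuracy: {mean_accuracy:.2f}%")
print(f"parse failure delta: {parse_failure_delta:+.2f} points")
print(f"hard accuracy delta: {hard_accuracy_delta:+.2f} points")
print(f"valid trace delta: {valid_trace_delta:+.2f} points")
```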
Important Caveat
This is a paper model, not a claim of robust general recursive reasoning.
The underlying paper draft treats the result as a qualified replication:
- recursive SFT improves end-to-end SATBench accuracy over raw direct prompting
- the strongest gain is on medium-difficulty SAT instances
- absolute performance remains far below the 3B source-paper result
- recursion behavior is still shallow overall
Use this release as a research artifact tied to the experiment, metrics, and discussion in the paper repo.
Training Summary
- Objective: recursive_sft
- Train examples: 74,827
- Validation examples: 619
- Global step: 46,770
- Best checkpoint: checkpoint-9354
- Accelerator used for the main run: cuda
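The step counts above are arithmetically consistent with one particular schedule, sketched below. The effective batch size of 8 is an assumption inferred from the numbers, not something this release states; under that assumption the run would span five epochs, with the best checkpoint falling at the end of the first. Treat this as a consistency check rather than a description of the actual training configuration.

```python
import math

# Figures copied from the training summary above.
train_examples = 74_827
global_step = 46_770
best_checkpoint_step = 9_354

# Assumption (not stated in the release): effective batch size of 8.
assumed_effective_batch_size = 8

# Optimizer steps needed for one pass over the training set.
steps_per_epoch = math.ceil(train_examples / assumed_effective_batch_size)

# How many such passes the reported step counts would correspond to.
implied_epochs = global_step / steps_per_epoch
best_checkpoint_epoch = best_checkpoint_step / steps_per_epoch

print(steps_per_epoch, implied_epochs, best_checkpoint_epoch)
```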
Evaluation Summary
The main held-out evaluation uses 100 examples from each of the SATBench easy, medium, and hard buckets, 300 examples in total.
Baseline vs released model:
- Base direct prompt mean accuracy: 37.33%
- REC-3 mean accuracy: 45.33%
- Absolute gain: +8.0 points
- Base parse failure rate: 28.67%
- REC-3 parse failure rate: 7.0%
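To make the percentages concrete, they can be converted back into example counts. The sketch below assumes every rate is computed over the pooled 300 evaluation examples (100 per bucket); `rate_to_count` is a hypothetical helper introduced here for illustration.

```python
# Evaluation set size: 100 examples in each of three difficulty buckets.
examples_per_bucket = 100
num_buckets = 3
total_examples = examples_per_bucket * num_buckets


def rate_to_count(rate_percent: float, total: int) -> int:
    """Convert a reported percentage into the nearest whole number of examples."""
    return round(rate_percent / 100 * total)


# Assumption: each reported rate is taken over all pooled examples.
base_correct = rate_to_count(37.33, total_examples)
rec3_correct = rate_to_count(45.33, total_examples)
base_parse_failures = rate_to_count(28.67, total_examples)
rec3_parse_failures = rate_to_count(7.0, total_examples)

# Difference in mean accuracy, in percentage points.
absolute_gain_points = round(45.33 - 37.33, 2)

print(f"correct answers: {base_correct} -> {rec3_correct} of {total_examples}")
print(f"parse failures: {base_parse_failures} -> {rec3_parse_failures}")
print(f"absolute gain: +{absolute_gain_points} points")
```

Read this way, the gain corresponds to 24 additional correct answers, and the number of unparseable outputs drops from 86 to 21.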
Prompt Format
The model was trained on recursive traces using:
- <call> ... </call> for subproblem decomposition
- <return> ... </return> for compact returned answers
It is best treated as a specialized research model for this protocolized SAT setting.
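For qualitative inspection of model outputs, a small checker for tag structure can be useful. The sketch below is not the paper's evaluation code. It assumes only that `<call>` and `<return>` tags must be properly balanced and may nest, and it reports the deepest `<call>` nesting it sees. The names `summarize_trace` and `TraceSummary`, and the example trace text, are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches opening and closing <call> / <return> tags.
TAG_PATTERN = re.compile(r"<(/?)(call|return)>")


@dataclass
class TraceSummary:
    well_formed: bool
    num_calls: int
    num_returns: int
    max_call_depth: int


def summarize_trace(text: str) -> TraceSummary:
    """Check tag balance and measure <call> nesting depth in a model output."""
    stack = []
    num_calls = num_returns = 0
    call_depth = max_call_depth = 0
    well_formed = True

    for match in TAG_PATTERN.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            stack.append(name)
            if name == "call":
                num_calls += 1
                call_depth += 1
                max_call_depth = max(max_call_depth, call_depth)
            else:
                num_returns += 1
        else:
            # A closing tag must match the most recently opened tag.
            if not stack or stack[-1] != name:
                well_formed = False
                break
            stack.pop()
            if name == "call":
                call_depth -= 1

    # Any tag left open at the end also makes the trace malformed.
    if stack:
        well_formed = False
    return TraceSummary(well_formed, num_calls, num_returns, max_call_depth)


# Hypothetical trace text, written only to exercise the checker.
example = (
    "<call>try x1 = True <call>propagate units</call>"
    "<return>SAT</return></call><return>SAT</return>"
)
print(summarize_trace(example))
print(summarize_trace("<call>unterminated <return>SAT</return>"))
```

A maximum call depth of 1 across most outputs would be one way to observe the shallow recursion behavior noted in the caveats.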
Files In This Release
- model.safetensors
- config.json
- generation_config.json
- tokenizer.json
- tokenizer_config.json
- chat_template.jinja
- export_metadata.json
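After downloading the release, it may be worth confirming that a local copy is complete before loading it. The following is a minimal sketch; `missing_release_files` is a hypothetical helper and the directory path in the usage line is a placeholder.

```python
from pathlib import Path

# File names listed in this release.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
    "export_metadata.json",
]


def missing_release_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Placeholder path; point this at the local download directory.
    missing = missing_release_files("./recursive-sat-qwen2.5-1.5b")
    print("missing files:", missing or "none")
```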
Intended Use
- paper artifact release
- replication reference
- SAT recursive-trace evaluation
- qualitative inspection of recursive protocol behavior
Out Of Scope
- production reasoning system
- general mathematical reasoning benchmark model
- safety-critical use
- claims beyond the SATBench replication setting