Instructions to use AIML-TUDA/OlmoLogic-7B-Think with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use AIML-TUDA/OlmoLogic-7B-Think with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="AIML-TUDA/OlmoLogic-7B-Think") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("AIML-TUDA/OlmoLogic-7B-Think") model = AutoModelForCausalLM.from_pretrained("AIML-TUDA/OlmoLogic-7B-Think") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use AIML-TUDA/OlmoLogic-7B-Think with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "AIML-TUDA/OlmoLogic-7B-Think" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AIML-TUDA/OlmoLogic-7B-Think", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/AIML-TUDA/OlmoLogic-7B-Think
- SGLang
How to use AIML-TUDA/OlmoLogic-7B-Think with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "AIML-TUDA/OlmoLogic-7B-Think" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AIML-TUDA/OlmoLogic-7B-Think", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "AIML-TUDA/OlmoLogic-7B-Think" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "AIML-TUDA/OlmoLogic-7B-Think", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use AIML-TUDA/OlmoLogic-7B-Think with Docker Model Runner:
docker model run hf.co/AIML-TUDA/OlmoLogic-7B-Think
🧠 OlmoLogic 7B Think
The first fully open model to bring the ILP paradigm into RLVR.
We wire a Prolog interpreter straight into the reward loop and execute logic programs to grade the model.
📝 Blog • 💻 Training Code • 📊 Eval Code • 🤗 SLR-Bench • 🤗 Olmo 3.1 7B Think
TL;DR
Open RLVR recipes center on math and code, and logical reasoning gets left behind.
OlmoLogic fixes that. Starting from Olmo-3-7B-Think-DPO, we wire the paradigm of Inductive Logic Programming into Olmo 3's RLVR receipe.
OlmoLogic is post-train from-scratch on 56×H100 for 6 days straight (3,350 optimization steps) broadly outperming Olmo-3 7B Think with large gains on logical reasoning.
📊 Results
| Benchmark Suite | Olmo-3-7B-Think | OlmoLogic 7B Think | Δ |
|---|---|---|---|
| SLR-Bench | 15.1 | 45.1 | +30.0 🔥 |
| Logic (avg) | 59.1 | 64.4 | +5.4 |
| Reasoning (avg) | 75.8 | 76.6 | +0.8 |
| Math (avg) | 71.1 | 73.0 | +1.9 |
| Instruction Following | 64.9 | 66.6 | +1.7 |
| Knowledge (avg) | 49.2 | 49.5 | +0.3 |
| Safety (avg) | 70.7 | 74.0 | +3.3 |
Held-out logic suite: LogiGLUE, KOR-Bench, bAbI-16, CLUTRR, FOLIO, ProntoQA, RuleBERT, and abductive reasoning. All numbers from a single reproducible OLMES pipeline.
📖 Full ablations, training dynamics, and the compute-matched control (Olmo 3.1 7B Think) are in the blog post.
RLVR with Inductive Logic Programming
The trick is what grades the model. Instead of a judge or a learned reward, we ground every reward in real symbolic execution: a Prolog interpreter runs the model's proposed rule against the task, exactly the way a Python interpreter runs code against tests. No judge model, no learned reward — just a gold-standard oracle.
🚀 SLR-Bench tripled: 15.1 → 45.1 (+30.0) 📈 +5.4 avg on held-out logic benchmarks never seen in training ⚖️ Math, code, and instruction-following held steady
🧩 What an SLR task looks like
The model plays a train classifier: trains travel east- or westbound, each made of cars with properties (color, length, wall type). Given positive and negative examples plus background facts, it must write the shortest Prolog rule that perfectly separates them.
% Background knowledge (excerpt)
eastbound(train0). westbound(train1).
has_car(train0, car0_1). has_car(train1, car1_1).
car_color(car0_1, red). car_color(car1_1, blue).
car_len(car0_1, long). car_len(car1_1, short).
% A valid induced hypothesis:
eastbound(Train) :- has_car(Train, Car), car_color(Car, red).
% "A train is eastbound if it carries a red car."
That rule is then executed against the task to check completeness and consistency — and that execution is the reward.
⚙️ Training Recipe
Base model: allenai/Olmo-3-7B-Think-DPO
Algorithm: GRPO via a Slurm-adapted open-instruct (DeepSpeed ZeRO-3).
Data mixture
A 1:1 dataset-weighted blend of allenai/Dolci-Think-RL-7B (the original Olmo-3 RLVR mix) and AIML-TUDA/SLR-Bench:v1-All. SLR's verbose background theories push prompts long, so we raised the prompt cap to 5,000 tokens (up from Olmo-3's 2,048). Rewards are routed by source: SLR → Prolog verifier; Dolci → original code/math/judge verifiers (Qwen/Qwen3-32B as LLM judge).
| Source | Prompts | Share |
|---|---|---|
| IF Multi-Constraint | 29,813 | 26.8% |
| OMEGA Math | 15,000 | 13.5% |
| AceCoder | 10,107 | 9.1% |
| SLR-Bench | 9,402 | 8.4% |
| Tulu 3 Rewritten | 7,109 | 6.4% |
| Multi-Subject RLVR | 7,106 | 6.4% |
| AceReason-Math | 6,598 | 5.9% |
| WildChat English | 6,421 | 5.8% |
| KlearReasoner Code | 6,272 | 5.6% |
| SYNTHETIC-2 / PrimeIntellect | 3,000 | 2.7% |
| MathSub-30K | 2,999 | 2.7% |
| ORZ Math | 2,999 | 2.7% |
| DAPO-Math | 2,584 | 2.3% |
| Llama-Nemotron Post-Training | 2,006 | 1.8% |
| Total | 111,416 | 100% |
🎯 Reward design
Rewards come straight from Prolog execution. We use the isomorphic verifier variant (from LLMs Gaming Verifiers) to block reward hacking. Since every SLR task is balanced binary classification, anything at or below a coin flip is information-negative and earns nothing.
Let $p \in [0,1]$ be rule-classification accuracy, $s \in [0,1]$ the simplicity bonus, gate $g = 0.5$, exponent $k = 4$:
Three things this shape buys us:
- Fourth-power compression on partial credit — 50% → ~0.016, 80% → 0.26, 95% → 0.74. The model is rewarded for getting almost everything right, not most things.
- A hard gate at 0.5 zeros out anything worse than a coin flip.
- A clean separation between partial (≤ 9.0) and fully correct (≥ 9.5) so a correct rollout's GRPO advantage is never undercut by a simpler-but-partial sibling.
The simplicity bonus enters multiplicatively, nudging toward shorter rules. Syntax validity is tracked but earns no reward — any rule that executes with $p > 0$ is valid by definition.
Hyperparameters
| Steps | 3,350 (~2 epochs) |
| Learning rate | 1e-6 (constant) |
| KL anchor (β) | 0 |
| Global batch | 512 (64 prompts × 8 rollouts) |
| Clip-higher | 0.272 |
| IS ratio cap | 2.0 |
| vLLM temperature | 1.0 |
| Max prompt / response | 5k / 25k tokens (packed to 35.8k) |
Why two epochs? Olmo-3's published recipe runs one (~1,500 steps). We observed rewards still climbing well past the one-epoch mark — both the SLR head and overall verifiable reward only level off in epoch two — so we trained the full 3,350 steps.
Throughput: 7× H100 nodes (8 GPUs each) — one judge node (Qwen3-32B + Prolog verifier API), one Ray head running the trainer, five worker nodes hosting 40 vLLM engines, with async rollouts and inflight weight updates.
🚀 Inference
vLLM
from vllm import LLM, SamplingParams
model_id = "AIML-TUDA/OlmoLogic-7B-Think"
llm = LLM(model=model_id)
sampling_params = SamplingParams(
temperature=0.6,
top_p=0.95,
max_tokens=32768,
)
prompt = "Who would win in a fight — a dinosaur or a cow named MooMoo?"
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "AIML-TUDA/OlmoLogic-7B-Think"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
messages = [{"role": "user", "content": "Induce the shortest rule that separates these examples..."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=32768, temperature=0.6, top_p=0.95)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
💡 This is a Think model with long chain-of-thought. Give it room —
max_tokensof 16k–32k is recommended for hard reasoning tasks.
✅ Takeaways
- One dataset, one verifier — no training-stack changes. SLR drops into an existing RLVR mix and teaches broad logical reasoning.
- Reasoning transfers. Gains show up on held-out logic benchmarks the model never saw.
- Logic-program execution is a faithful, efficient oracle. No judge model, no learned reward, no proxy.
Model Details
- Developed by: Artificial Intelligence and Machine Learning Lab, Technical University of Darmstadt (TU Darmstadt)
- Model type: Transformer autoregressive LM with long chain-of-thought
- Language: English
- License: Apache 2.0
- Base model:
allenai/Olmo-3-7B-Think-DPO - Training framework: GRPO via DeepSpeed ZeRO-3 on a custom Slurm deployment of
open-instruct
Sources
- Blog: https://huggingface.co/blog/LukasHug/olmo-logic
- Training code: https://github.com/lukashelff/open-instruct-slurm
- Eval code: https://github.com/lukashelff/olmes-slurm
- SLR-Bench: https://huggingface.co/datasets/AIML-TUDA/SLR-Bench
- Papers: SLR · Reward Hacking
Companion checkpoint
Olmo 3.1 7B Think — a compute-matched extension of Olmo-3-Think on the original mix (no SLR). It serves as the clean control for the SLR ablation and as a stronger general-purpose Olmo-3-Think base for downstream use.
Citation
This work is based on the following two papers. If you build on it, please cite:
For the SLR-Bench, please cite:
@inproceedings{helff2025slr,
title = {{SLR: Automated Synthesis for Scalable Logical Reasoning}},
author = {Helff, Lukas and Omar, Ahmad and Friedrich, Felix and W{\"u}st, Antonia
and Shindo, Hikaru and Woydt, Tim and Mitchell, Rupert
and Schramowski, Patrick and Stammer, Wolfgang and Kersting, Kristian},
booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
year = {2026},
url = {https://openreview.net/forum?id=omMnuTTEn7}
}
For the Reward Hacking paper, please cite:
@inproceedings{helff2026llms,
title = {{LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking}},
author = {Lukas Helff and Quentin Delfosse and David Steinmann and Ruben H{\"a}rle
and Hikaru Shindo and Patrick Schramowski and Wolfgang Stammer
and Kristian Kersting and Felix Friedrich},
booktitle = {ICLR 2026 Workshop on Logical Reasoning of Large Language Models},
year = {2026},
url = {https://openreview.net/forum?id=4B3WfRNqe3}
}
Acknowledgments
Supported by DFKI and the hessian.AI Innovation Lab (BMFTR grant 16IS22091), the hessian.AISC Service Center (BMBF grant 01IS22091), and CERTAIN. This work further benefited from TAILOR (EU Horizon 2020, GA 952215), the Hessian LOEWE program ("WhiteBox"), the HMWK clusters "Adaptive Mind" and "Third Wave of AI", NHR4CES, the BMWK project SOOFI (13IPC040G), early stages of the Cluster of Excellence "Reasonable AI" (DFG, EXC-3057), DFG SPP 2422, the AlephAlpha Collaboration Lab 1141, and OpenAI Research Credits.
- Downloads last month
- 3
Model tree for AIML-TUDA/OlmoLogic-7B-Think
Base model
allenai/Olmo-3-1025-7B