Duplicate weight files across branches: step_100==step_1000, step_200==step_2000, step_300==main

by emirhanboge - opened Mar 17

Summary

Three pairs of branches contain identical model weight files (verified via LFS SHA-256):

| Branch A | Branch B | LFS SHA-256 (shard 1, first 16 chars) |
|---|---|---|
| `step_100` | `step_1000` | `e5f78246eb9773f0` |
| `step_200` | `step_2000` | `53d993f3c56a3ec1` |
| `step_300` | `main` | `7714e11a8d367ebc` |

Reproduction

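from collections import defaultdict

from huggingface_hub import HfApi

api = HfApi()
shard1_hashes = {}  # revision -> first 16 hex chars of shard 1's LFS SHA-256

# Candidate branches: step_100 ... step_3000 in increments of 100, plus main.
branches = [f'step_{i}' for i in range(100, 3001, 100)] + ['main']
for rev in branches:
    try:
        files = api.list_repo_tree('allenai/Olmo-3-7B-RL-Zero-Code', revision=rev)
        for f in files:
            name = getattr(f, 'rfilename', getattr(f, 'path', ''))
            # Only LFS-tracked files carry a SHA-256 in the tree listing.
            if 'model-00001' in name and getattr(f, 'lfs', None):
                shard1_hashes[rev] = f.lfs.sha256[:16]
    except Exception:
        # Skip branches that do not exist or cannot be listed.
        continue

# Group revisions by hash; any group with more than one revision is a duplicate.
groups = defaultdict(list)
for rev, h in shard1_hashes.items():
    groups[h].append(rev)

for h, revs in groups.items():
    if len(revs) > 1:
        print(f'DUPLICATE: {revs} -> {h}')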

Output:

DUPLICATE: ['step_100', 'step_1000'] -> e5f78246eb9773f0
DUPLICATE: ['step_200', 'step_2000'] -> 53d993f3c56a3ec1
DUPLICATE: ['step_300', 'main'] -> 7714e11a8d367ebc
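
The script above compares only shard 1. As a minimal sketch of how the check could be extended to every LFS file in a revision, the helper below groups revisions only when all of their file hashes match. The function name `group_identical_revisions` and the placeholder digests are illustrative assumptions, not values from the repository.

```python
from collections import defaultdict


def group_identical_revisions(fingerprints):
    """Group revisions whose LFS files all share the same SHA-256 digests.

    `fingerprints` maps a revision name to a {filename: sha256} dict.
    Returns sorted lists of revisions (length > 1) with identical dicts.
    """
    groups = defaultdict(list)
    for rev, files in fingerprints.items():
        if not files:
            # Revisions with no recorded LFS files carry no evidence; skip them.
            continue
        # A sorted tuple of (filename, digest) pairs is hashable and order-independent.
        key = tuple(sorted(files.items()))
        groups[key].append(rev)
    return [sorted(revs) for revs in groups.values() if len(revs) > 1]


# Placeholder digests for illustration only (hypothetical revisions).
example = {
    "rev_a": {"model-00001": "aaa", "model-00002": "bbb"},
    "rev_b": {"model-00001": "aaa", "model-00002": "bbb"},
    "rev_c": {"model-00001": "aaa", "model-00002": "ccc"},
}
print(group_identical_revisions(example))  # [['rev_a', 'rev_b']]
```

Here `rev_c` shares shard 1 with the others but differs on shard 2, so it is not reported. The `fingerprints` mapping could be filled inside the reproduction loop by storing `f.lfs.sha256` under each file name instead of keeping only shard 1.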

The pattern suggests a labeling error during upload: each X00 step appears to have been duplicated under a later name (100→1000, 200→2000, 300→main). Researchers who use these checkpoints to study how representations evolve during RL training will see spurious patterns, because checkpoints labeled as different steps contain the same weights.

Are the correct step_1000 and step_2000 weights available? Could they be re-uploaded to the correct branches?
Which training step does main correspond to? Is it intended to be step_300, or should it be the final checkpoint (step_3000)?

Environment

Verified with huggingface_hub version 0.36.2
Also confirmed via local md5sum after snapshot_download with force_download=True
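
For anyone who wants to repeat the local check without `md5sum`, a rough sketch in Python follows. It assumes the shards have already been downloaded to disk; the helper names `file_digest` and `find_identical_files` are hypothetical and not part of `huggingface_hub`.

```python
import hashlib
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large shards never sit fully in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(paths, algorithm="sha256"):
    """Return groups of paths whose contents hash to the same digest."""
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p, algorithm)].append(str(p))
    return [group for group in by_digest.values() if len(group) > 1]
```

Calling `find_identical_files` on the shard 1 paths from two downloaded revisions returns them as one group when the bytes are identical. Passing `algorithm="md5"` should produce digests comparable to `md5sum` output.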

Thank you for releasing intermediate checkpoints; they are extremely valuable for research.
