OLMo-3 Recurrent Adapter - Answer-Only SFT (rec=1, coda=2, tied)
This is a Recurrent Adapter Model fine-tuned on MetaMathQA for mathematical reasoning, built on top of OLMo-3-1025-7B.
Model Details
- Base Model: allenai/OLMo-3-1025-7B
- Architecture: Recurrent Adapter (1 recurrent layer + 2 coda layers)
- LM Head: tied
- Training Format: Answer-Only
- Mean Recurrence: 32 iterations
- Final Training Loss: 0.1323
- Validation Loss: 3.5968
Training Details
- Dataset: MetaMathQA
- Learning Rate: 1e-4
- Training Steps: 50,000
- Sequence Length: 2048 (Answer-only)
- Gradient Accumulation: 8 steps
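For orientation, the hyperparameters listed above map naturally onto a standard Hugging Face `Trainer` configuration. The sketch below is an illustrative assumption, not the actual training script: the per-device batch size, scheduler, and precision are not reported on this card and are placeholders.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters onto a Trainer config.
# per_device_train_batch_size, scheduler, and warmup are NOT reported and are placeholders.
training_args = TrainingArguments(
    output_dir="olmo3-recurrent-adapter-sft",
    learning_rate=1e-4,                # reported learning rate
    max_steps=50_000,                  # reported training steps
    gradient_accumulation_steps=8,     # reported gradient accumulation
    per_device_train_batch_size=1,     # placeholder (not reported)
    bf16=True,                         # assumption, consistent with the bfloat16 usage example below
)
# The 2048-token sequence length and answer-only loss masking would be handled in the
# dataset / data collator rather than in TrainingArguments.
```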
Recurrent Adapter Architecture
The model uses a recurrent adapter architecture where:
- The frozen base model extracts initial representations
- A recurrent block (1 layer) is applied repeatedly, with a mean of 32 iterations during training
- A coda block (2 layers) produces the final output
- Only the adapter layers are trained (a minimal sketch of this forward pass follows the list)
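The sketch below illustrates this compute pattern with standard PyTorch modules. It is an assumption for explanatory purposes only: the class name `RecurrentAdapter`, the layer types, and the attention-head count are invented and do not reflect the model's actual implementation.

```python
import torch
import torch.nn as nn


class RecurrentAdapter(nn.Module):
    """Illustrative sketch only: names and layer internals are hypothetical."""

    def __init__(self, base_model, hidden_size, num_recurrent_layers=1, num_coda_layers=2):
        super().__init__()
        self.base_model = base_model
        # The base model stays frozen; only the adapter layers receive gradients.
        for p in self.base_model.parameters():
            p.requires_grad = False

        def make_layer():
            return nn.TransformerEncoderLayer(hidden_size, nhead=8, batch_first=True)

        self.recurrent_block = nn.ModuleList(make_layer() for _ in range(num_recurrent_layers))
        self.coda_block = nn.ModuleList(make_layer() for _ in range(num_coda_layers))

    def forward(self, input_ids, num_recurrence_steps=32):
        # 1) The frozen base model extracts initial representations.
        with torch.no_grad():
            hidden = self.base_model(
                input_ids, output_hidden_states=True
            ).hidden_states[-1]
        # 2) The recurrent block is applied repeatedly (mean of 32 iterations in training).
        for _ in range(num_recurrence_steps):
            for layer in self.recurrent_block:
                hidden = layer(hidden)
        # 3) The coda block produces the final hidden states, which feed the tied LM head.
        for layer in self.coda_block:
            hidden = layer(hidden)
        return hidden
```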
Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the adapter model (custom architecture, so trust_remote_code is required).
model = AutoModelForCausalLM.from_pretrained(
    "hanseungwook/olmo3-recurrent-adapter-sft",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# The tokenizer is unchanged from the frozen base model.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-3-1025-7B")

prompt = "What is 25 * 37?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        num_recurrence_steps=32,  # number of recurrent-block iterations (training mean)
        do_sample=True,           # required for temperature to take effect
        temperature=0.7,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
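The `num_recurrence_steps` argument controls how many times the recurrent block iterates at generation time; 32 matches the mean recurrence used during training. Lower values should reduce latency, though the quality trade-off is not reported here.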
License
Apache 2.0