# hanseungwook/recurrent-adapter-metamath-cot-r32
This is a Recurrent Adapter Model fine-tuned on MetaMathQA for mathematical reasoning.
## Model Details
- Base Model: Qwen/Qwen3-8B
- Architecture: Recurrent Adapter (1 recurrent layer + 2 coda layers)
- Training Format: Chain-of-Thought (CoT)
- Mean Recurrence: 32 iterations
- Total Parameters: 8.77B
- Trainable Parameters: 579M (6.60%)
- Final Training Loss: 0.1115
## Training Details
- Dataset: MetaMathQA
- Learning Rate: 1e-4
- Training Steps: 50,000
- Sequence Length: 4096 (CoT)
- Gradient Accumulation: 8 steps
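
For reference, the hyperparameters above map onto a configuration along these lines (a minimal sketch; the field names are illustrative assumptions, not the actual training script):

```python
# Illustrative training configuration; all field names are assumptions,
# chosen only to mirror the hyperparameters listed above.
training_config = {
    "dataset": "MetaMathQA",
    "learning_rate": 1e-4,
    "max_steps": 50_000,
    "max_seq_length": 4096,            # Chain-of-Thought sequences
    "gradient_accumulation_steps": 8,
    "mean_recurrence": 32,             # average recurrent iterations
}
```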
## Recurrent Adapter Architecture
The model uses a recurrent adapter architecture, sketched below:
- The frozen base model extracts the initial representation of the input.
- A recurrent block refines this representation over multiple iterations (mean = 32 during training).
- A coda block maps the refined representation to the final output.
- Only the adapter layers are trained (~6.6% of total parameters).
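
The forward pass can be summarized with the following minimal sketch. The module names (`base`, `recurrent_block`, `coda_block`, `num_steps`) are illustrative assumptions for exposition, not the actual implementation in `src.hf_model`:

```python
import torch
import torch.nn as nn

class RecurrentAdapterSketch(nn.Module):
    """Illustrative sketch of the recurrent adapter forward pass.

    The real implementation lives in `src.hf_model.RecurrentAdapterModel`;
    all names and interfaces here are assumptions for exposition.
    """

    def __init__(self, base: nn.Module, recurrent_block: nn.Module, coda_block: nn.Module):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen base model
        self.recurrent_block = recurrent_block  # trained, weight-shared across iterations
        self.coda_block = coda_block            # trained, produces the final output

    def forward(self, input_ids: torch.Tensor, num_steps: int = 32) -> torch.Tensor:
        # 1. The frozen base model extracts the initial representation.
        hidden = self.base(input_ids)
        # 2. The same recurrent block is applied repeatedly (mean of 32
        #    iterations during training), refining the representation.
        for _ in range(num_steps):
            hidden = self.recurrent_block(hidden)
        # 3. The coda block maps the refined state to output logits.
        return self.coda_block(hidden)
```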
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load the model (the custom architecture requires trust_remote_code=True)
from src.hf_model import RecurrentAdapterModel

model = RecurrentAdapterModel.from_pretrained(
    "hanseungwook/recurrent-adapter-metamath-cot-r32",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Generate
prompt = "What is 25 * 37?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_recurrence_steps=32,
        do_sample=True,  # required for temperature to take effect
        temperature=0.7,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
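
The `num_recurrence_steps` argument controls test-time compute: more iterations of the recurrent block give the model more refinement steps per token. A hedged example of sweeping recurrence depths, reusing the `model`, `tokenizer`, and `inputs` from the snippet above (the trade-off behavior is an assumption, not a documented guarantee):

```python
# Sweep recurrence depth to trade compute for answer quality.
# Assumes `model`, `tokenizer`, and `inputs` from the snippet above.
for steps in (8, 16, 32, 64):
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=256,
            num_recurrence_steps=steps,
            do_sample=True,
            temperature=0.7,
        )
    print(f"--- {steps} recurrence steps ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```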
## Citation
If you use this model, please cite:
```bibtex
@misc{recurrent-adapters-2025,
  author       = {Han, Seungwook},
  title        = {Recurrent Adapters for Mathematical Reasoning},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/hanseungwook/recurrent-adapter-metamath-cot-r32}}
}
```
## License

Apache 2.0