tinker-rl-arithmetic_trajectory-llama-3.2-1b
LoRA adapters trained with GRPO (Group Relative Policy Optimization) on top of meta-llama/Llama-3.2-1B using the Tinker cloud training service.
Part of the TinkerRL-Bench release for our NeurIPS submission "A Unified Benchmark for RL Post-Training of Language Models" (repo: https://github.com/pes-llm-research/tinker-rl-lab).
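GRPO compares each sampled completion with the other completions drawn for the same prompt, so it does not need a learned value function as a baseline. The sketch below computes that group-relative advantage. It assumes the common mean/standard-deviation normalization from the original GRPO formulation and a hypothetical 1/0 correctness reward for arithmetic. Tinker's actual implementation, the reward used for this run, and the group size (listed as None in the configuration table) may all differ. `group_relative_advantages` is an illustrative name, not a Tinker API.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each completion relative to its group (all samples for one prompt).

    Hypothetical helper implementing A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: 4 completions for one arithmetic prompt,
# reward 1.0 if the final answer is correct, else 0.0
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive positive advantages and incorrect ones receive negative advantages. When every completion in a group earns the same reward, all advantages are zero and that group contributes no reward signal. Some implementations skip the division by the standard deviation.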
Training configuration
| Setting | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-1B |
| Experiment tag | arithmetic_trajectory |
| Campaign | None |
| Task | arithmetic |
| Seed | None |
| LoRA rank | 32 |
| Learning rate | None |
| Group size | None |
| Training steps | None |
| Platform | Tinker (tinker) |
| Training run ID | 39aa5eb2-e234-5a95-ab68-896e4cac8c45 |
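The LoRA rank sets the inner dimension of the low-rank update ΔW = B·A that is added to each adapted weight matrix. For a matrix with d_in inputs and d_out outputs, a rank-r adapter therefore adds r · (d_in + d_out) trainable parameters. The sketch below works out that arithmetic for rank 32. It assumes Llama-3.2-1B's published shapes (hidden size 2048, MLP size 8192, 8 key/value heads of dimension 64, 16 decoder layers). It also assumes, hypothetically, that adapters sit on all seven attention and MLP projections, because this card does not state which modules were targeted. The helper names are illustrative.

```python
# Hypothetical helpers: count trainable LoRA parameters for a rank-r adapter.
# Shapes are (in_features, out_features) for one Llama-3.2-1B decoder layer.
# ASSUMPTION: all seven projections are adapted; the card does not say which are.
HIDDEN, MLP, KV = 2048, 8192, 8 * 64  # KV = 8 key/value heads x head_dim 64
NUM_LAYERS = 16
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A is rank x d_in and B is d_out x rank, so B @ A matches the d_out x d_in weight
    return rank * (d_in + d_out)


def adapter_params(rank: int, modules=PROJ_SHAPES, layers: int = NUM_LAYERS) -> int:
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in modules.values())
    return per_layer * layers


print(adapter_params(32))  # 22544384 (about 22.5M) under the all-projections assumption
```

The count scales linearly with rank. If only q_proj and v_proj were adapted, the same formula would give 32 · (4096 + 2560) · 16 = 3,407,872 parameters.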
Metrics
| Metric | Value |
|---|---|
Checkpoints in this repo
| Step | Original Tinker URI | Local path |
|---|---|---|
| sampler_weights/20 | tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/000020 | 20 |
| sampler_weights/40 | tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/000040 | 40 |
| sampler_weights/60 | tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/000060 | 60 |
| sampler_weights/80 | tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/000080 | 80 |
| sampler_weights/100 | tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/000100 | 100 |
| sampler_weights/final | tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/final | final |
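Each checkpoint row maps a Tinker sampler-weights URI to the subfolder it was uploaded under in this repo. The sketch below is a hypothetical helper that derives the subfolder name from a URI. It relies only on the URI pattern visible in the table: zero-padded step numbers become plain integers, and `final` is left unchanged. It is not part of any Tinker or Hugging Face API.

```python
def local_subfolder(tinker_uri: str) -> str:
    """Map a Tinker sampler-weights URI to this repo's subfolder name.

    Hypothetical helper; assumes URIs end in .../sampler_weights/<name>,
    where <name> is a zero-padded step (e.g. 000100) or "final".
    """
    prefix = "tinker://"
    if not tinker_uri.startswith(prefix):
        raise ValueError(f"not a tinker:// URI: {tinker_uri!r}")
    name = tinker_uri.rstrip("/").rsplit("/", 1)[-1]
    # Zero-padded step numbers ("000100") are stored as plain integers ("100")
    return str(int(name)) if name.isdigit() else name


RUN = "tinker://39aa5eb2-e234-5a95-ab68-896e4cac8c45:train:0/sampler_weights/"
for ckpt in ["000020", "000100", "final"]:
    print(local_subfolder(RUN + ckpt))  # prints 20, then 100, then final
```

The returned string is a value from the "Local path" column, which is what the `subfolder=` argument in the loading snippet expects.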
How to load
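```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.2-1B"  # gated on the Hub: accept the license and log in first
adapter = "arvindcr4/tinker-rl-arithmetic_trajectory-llama-3.2-1b"

tok = AutoTokenizer.from_pretrained(base)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
# Attach the LoRA adapter; subfolder is a value from the "Local path" column,
# e.g. "final" or a step such as "100"
model = PeftModel.from_pretrained(model, adapter, subfolder="final")
```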
Companion releases
- Dataset: arvindcr4/tinker-rl-bench-wandb (all 334 W&B runs and 9,255 history rows)
- Manifest: arvindcr4/tinker-rl-bench-checkpoints (full catalogue of every Tinker URI)
- Code: pes-llm-research/tinker-rl-lab
Citation
```bibtex
@misc{tinkerrlbench2026,
  title  = {A Unified Benchmark for RL Post-Training of Language Models},
  author = {Arvind, C. R. and Jeyaraj, Sandhya},
  year   = {2026},
  note   = {NeurIPS submission, https://github.com/pes-llm-research/tinker-rl-lab}
}
```
License
Apache 2.0. The underlying base model retains its original license (the Llama 3.2 Community License); please check meta-llama/Llama-3.2-1B for any usage restrictions.