---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- grpo
- lora
- trl
- arithmetic
- reasoning
language:
- en
---

# Arithmetic King 1B

Repository id: `harshbhatt7585/arithmetic-king-1b`.

This model is a PEFT LoRA adapter trained with TRL GRPO on synthetic arithmetic episodes. It is tuned to answer in XML format:

- `<reasoning>...</reasoning>`
- `<answer>...</answer>`
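
Downstream code usually needs the two tagged fields pulled out of the raw completion. The following is a minimal sketch of one way to do that with regular expressions. The helper names `extract_tag` and `parse_response` are hypothetical and are not part of this repository.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text: str) -> dict:
    """Split a completion into reasoning, raw answer text, and integer value."""
    reasoning = extract_tag(text, "reasoning")
    answer = extract_tag(text, "answer")
    try:
        value = int(answer) if answer is not None else None
    except ValueError:
        # The answer tag was present but did not hold an integer.
        value = None
    return {"reasoning": reasoning, "answer": answer, "value": value}


sample = "<reasoning>12 + 3 = 15, and 15 * 2 = 30.</reasoning>\n<answer>30</answer>"
parsed = parse_response(sample)
print(parsed["value"])
```

Apply the parser to the generated completion only. A prompt that itself mentions `<reasoning>` or `<answer>` would otherwise be picked up by the first-match search.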

## Artifact Type

This repo contains **adapter weights only** (not full base model weights). Use it with the base model:

- `meta-llama/Llama-3.2-1B-Instruct`

## Training Configuration

- Trainer: TRL `GRPOTrainer`
- Fine-tuning method: LoRA (PEFT)
- Environment: arithmetic reasoning episodes
- Reward: correctness reward + XML-format bonus
- Output style target: short reasoning plus final integer answer
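
The reward described above (correctness plus a format bonus) could be sketched roughly as follows. This is an illustration, not the training code: the weights `CORRECT_REWARD` and `FORMAT_BONUS`, the function names, and the `target` column name are all assumptions. The batch wrapper follows the general shape TRL reward functions take (a list of completions, with dataset columns passed as keyword arguments) and assumes plain-string completions.

```python
import re

# Illustrative weights only; the values used in training are not documented here.
CORRECT_REWARD = 1.0
FORMAT_BONUS = 0.5

# The whole completion must be a reasoning block followed by an answer block.
_FORMAT_RE = re.compile(
    r"\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*", re.DOTALL
)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def score_completion(completion: str, target: int) -> float:
    """Score one completion: format bonus plus correctness reward."""
    reward = 0.0
    if _FORMAT_RE.fullmatch(completion):
        reward += FORMAT_BONUS
    match = _ANSWER_RE.search(completion)
    if match:
        try:
            if int(match.group(1).strip()) == target:
                reward += CORRECT_REWARD
        except ValueError:
            pass  # A non-integer answer earns no correctness reward.
    return reward


def arithmetic_reward(completions, target, **kwargs):
    """Batch wrapper (hypothetical): one float per completion."""
    return [score_completion(c, t) for c, t in zip(completions, target)]


good = "<reasoning>15 * 2 = 30</reasoning>\n<answer>30</answer>"
print(arithmetic_reward([good, "thirty"], target=[30, 30]))
```

Keeping the two terms separate means a well-formatted but wrong answer still earns the bonus, while a correct answer without the reasoning block earns only the correctness reward.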

## Intended Use

- Arithmetic-reasoning RLVR experiments
- GRPO/LoRA workflow demonstrations
- Adapter-centric fine-tuning studies on small instruct models

## Limitations

- Trained on synthetic arithmetic prompts only
- Limited transfer to broader reasoning/math tasks
- May produce malformed XML or incorrect answers
- Not suitable for high-stakes use

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "harshbhatt7585/arithmetic-king-1b"

# Load the base model and tokenizer, then attach the LoRA adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Solve: (12 + 3) * 2. Return XML with <reasoning> and <answer>."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
# The decoded text contains the prompt followed by the generated completion.
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## License

Use of this adapter is subject to the base model's license and terms:

- `meta-llama/Llama-3.2-1B-Instruct`