---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- grpo
- lora
- trl
- arithmetic
- reasoning
language:
- en
---

# Arithmetic King 1B

Repository id: `harshbhatt7585/arithmetic-king-1b`

This model is a PEFT LoRA adapter trained with TRL GRPO on synthetic arithmetic episodes. It is tuned to answer in XML format:

- `<reasoning>...</reasoning>`
- `<answer>...</answer>`

## Artifact Type

This repo contains **adapter weights only**, not the full base model weights. Use it with the base model:

- `meta-llama/Llama-3.2-1B-Instruct`

## Training Configuration

- Trainer: TRL `GRPOTrainer`
- Fine-tuning method: LoRA (PEFT)
- Environment: arithmetic reasoning episodes
- Reward: correctness reward + XML-format bonus
- Output style target: short reasoning plus a final integer answer

## Intended Use

- Arithmetic-reasoning RLVR experiments
- GRPO/LoRA workflow demonstrations
- Adapter-centric fine-tuning studies on small instruct models

## Limitations

- Trained on synthetic arithmetic prompts only
- Limited transfer to broader reasoning/math tasks
- May produce malformed XML or incorrect answers
- Not suitable for high-stakes use

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "harshbhatt7585/arithmetic-king-1b"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Solve: (12 + 3) * 2. Return XML with <reasoning> and <answer>."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## License

Adapter usage inherits the base model's license and terms:

- `meta-llama/Llama-3.2-1B-Instruct`
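
## Reward Shaping (Illustrative)

The exact reward functions used during GRPO training are not published with this card. The following is a minimal sketch of the described "correctness reward + XML-format bonus" shaping, assuming `<reasoning>`/`<answer>` tags and the illustrative weights 1.0 and 0.5; the real implementation and weights may differ.

```python
import re

def format_reward(completion: str) -> float:
    """Bonus (assumed 0.5) when the completion follows the
    <reasoning>...</reasoning><answer>...</answer> XML format."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return 0.5 if re.search(pattern, completion, re.DOTALL) else 0.0

def correctness_reward(completion: str, target: int) -> float:
    """Reward (assumed 1.0) when the integer inside <answer> matches
    the ground-truth result of the arithmetic episode."""
    m = re.search(r"<answer>\s*(-?\d+)\s*</answer>", completion)
    if m and int(m.group(1)) == target:
        return 1.0
    return 0.0

def total_reward(completion: str, target: int) -> float:
    # Correct answer in the right format scores highest; a correct but
    # unformatted answer still earns partial credit.
    return correctness_reward(completion, target) + format_reward(completion)
```

Functions of this shape can be passed to TRL's `GRPOTrainer` via its `reward_funcs` argument (after adapting them to its batched calling convention).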