
	---
	base_model: meta-llama/Llama-3.2-1B-Instruct
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- grpo
	- lora
	- trl
	- arithmetic
	- reasoning
	language:
	- en
	---

	# Arithmetic King 1B

	Repository id: `harshbhatt7585/arithmetic-king-1b`.

	This repository provides a PEFT LoRA adapter for `meta-llama/Llama-3.2-1B-Instruct`, trained with TRL's GRPO trainer on synthetic arithmetic episodes. It is tuned to answer in an XML-style format with two tags:

	- `<reasoning>...</reasoning>`
	- `<answer>...</answer>`
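Downstream code usually needs to pull the two fields back out of the generated text. The snippet below is a minimal sketch of such a parser; the tag names come from this card, while `parse_response` and its return convention are hypothetical helpers introduced here for illustration, not part of the released code.

```python
import re
from typing import Optional, Tuple

# Tag names are taken from the model card; the parser itself is illustrative.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_response(text: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (reasoning, answer); a field is None if missing or malformed."""
    reasoning_match = _REASONING_RE.search(text)
    answer_match = _ANSWER_RE.search(text)

    reasoning = reasoning_match.group(1).strip() if reasoning_match else None

    answer = None
    if answer_match:
        try:
            # The card states the target is a final integer answer.
            answer = int(answer_match.group(1).strip())
        except ValueError:
            answer = None
    return reasoning, answer


sample = "<reasoning>12 + 3 = 15, and 15 * 2 = 30.</reasoning>\n<answer>30</answer>"
print(parse_response(sample))
```

Returning `None` instead of raising keeps malformed generations easy to count during evaluation.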

	## Artifact Type

	This repo contains the LoRA adapter weights only, not the full base model weights. Load the adapter on top of the base model:

	- `meta-llama/Llama-3.2-1B-Instruct`
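If a single standalone checkpoint is more convenient than a base model plus adapter, the LoRA weights can be folded into the base model. The following is a sketch that assumes `transformers` and `peft` are installed and that you have access to the gated base model; `load_merged_model` is a hypothetical helper name, not something shipped in this repo.

```python
def load_merged_model(
    base_id: str = "meta-llama/Llama-3.2-1B-Instruct",
    adapter_id: str = "harshbhatt7585/arithmetic-king-1b",
):
    """Load the base model, attach the adapter, and merge the LoRA weights."""
    # Imported lazily so that defining the helper does not require the libraries.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    peft_model = PeftModel.from_pretrained(base_model, adapter_id)
    # merge_and_unload() folds the LoRA deltas into the base weights and
    # returns a plain transformers model without the PEFT wrapper.
    return peft_model.merge_and_unload()
```

Calling `load_merged_model()` downloads both repositories; the merged model can then be saved with `save_pretrained` like any other `transformers` model.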

	## Training Configuration

	- Trainer: TRL `GRPOTrainer`
	- Fine-tuning method: LoRA (PEFT)
	- Environment: arithmetic reasoning episodes
	- Reward: correctness reward + XML-format bonus
	- Output style target: short reasoning plus final integer answer
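The reward described above (correctness plus a format bonus) could be sketched as a single function. This is an illustration only: the actual reward code and weights are not published in this card, so `CORRECT_REWARD`, `FORMAT_BONUS`, the `answer` column name, and the exact format check are all assumptions. The signature follows the general shape TRL documents for custom reward functions (a list of completions in, a list of floats out, with extra dataset columns passed as keyword arguments).

```python
import re
from typing import List

CORRECT_REWARD = 1.0  # assumed value, not taken from the card
FORMAT_BONUS = 0.2    # assumed value, not taken from the card

# Full-format check: reasoning tag followed by answer tag and nothing else.
_FORMAT_RE = re.compile(
    r"^\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*$", re.DOTALL
)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def arithmetic_reward(
    completions: List[str], answer: List[int], **kwargs
) -> List[float]:
    """Score each completion: format bonus plus correctness reward."""
    rewards = []
    for completion, target in zip(completions, answer):
        score = 0.0
        if _FORMAT_RE.match(completion):
            score += FORMAT_BONUS
        found = _ANSWER_RE.search(completion)
        if found:
            try:
                if int(found.group(1).strip()) == int(target):
                    score += CORRECT_REWARD
            except ValueError:
                pass  # non-integer answer text earns no correctness reward
        rewards.append(score)
    return rewards


well_formed = "<reasoning>15 * 2 = 30</reasoning>\n<answer>30</answer>"
print(arithmetic_reward([well_formed, "<answer>30</answer>", "30"], [30, 30, 30]))
```

Keeping the format bonus small relative to the correctness reward is one common design choice, so that a well-formed but wrong answer never outscores a correct one.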


	## Intended Use

	- Arithmetic-reasoning RLVR experiments
	- GRPO/LoRA workflow demonstrations
	- Adapter-centric fine-tuning studies on small instruct models

	## Limitations

	- Trained on synthetic arithmetic prompts only
	- Limited transfer to broader reasoning/math tasks
	- May produce malformed XML or incorrect answers
	- Not suitable for high-stakes use
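Because answers can be wrong, it may help to check them against a ground truth computed independently of the model. The sketch below evaluates an arithmetic expression with Python's `ast` module rather than `eval`. The supported operator set (`+`, `-`, `*`, `//`) is an assumption, since this card does not list which operations the synthetic episodes use, and `safe_eval` and `is_correct` are hypothetical helper names.

```python
import ast
import operator

# Assumed operator set; extend it if your prompts use other operations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def safe_eval(expr: str) -> int:
    """Evaluate an integer expression built from + - * // and parentheses."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def is_correct(expr: str, model_answer) -> bool:
    """True only if the model produced an integer equal to the ground truth."""
    return model_answer is not None and model_answer == safe_eval(expr)


print(safe_eval("(12 + 3) * 2"))  # 30
```

Passing `None` for a missing or malformed answer counts as incorrect, so format failures and arithmetic failures can be tallied with the same check.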

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import PeftModel

	base_id = "meta-llama/Llama-3.2-1B-Instruct"
	adapter_id = "harshbhatt7585/arithmetic-king-1b"

	# Load the tokenizer and base model, then attach the LoRA adapter from this repo.
	tokenizer = AutoTokenizer.from_pretrained(base_id)
	base_model = AutoModelForCausalLM.from_pretrained(base_id)
	model = PeftModel.from_pretrained(base_model, adapter_id)
	model.eval()  # inference only

	# Ask for the tagged output format the adapter was tuned on.
	prompt = "Solve: (12 + 3) * 2. Return XML with <reasoning> and <answer>."
	inputs = tokenizer(prompt, return_tensors="pt")
	out = model.generate(**inputs, max_new_tokens=128)

	# The decoded text contains the prompt followed by the completion.
	print(tokenizer.decode(out[0], skip_special_tokens=True))
	```

	## License

	Use of this adapter is subject to the license and terms of the base model:

	- `meta-llama/Llama-3.2-1B-Instruct`