
	---
	base_model: unsloth/phi-4-unsloth-bnb-4bit
	tags:
	- text-generation
	- transformers
	- unsloth
	- llama
	- trl
	license: apache-2.0
	language:
	- en
	---

	# 🧮 Phi-4 Math Reasoning Model (LoRA Finetuned)

	## 📌 Model Overview
	This model is a LoRA fine-tuned version of [unsloth/phi-4-unsloth-bnb-4bit](https://huggingface.co/unsloth/phi-4-unsloth-bnb-4bit).
	It has been fine-tuned specifically for math reasoning tasks, capable of solving step-by-step arithmetic, algebra, and logic problems.

	The base model is Phi-4, a 14B-parameter model that Unsloth distributes in a Llama-compatible architecture. The adapter was trained with [Unsloth](https://github.com/unslothai/unsloth), which advertises roughly 2x faster training, together with Hugging Face’s [TRL](https://huggingface.co/docs/trl) library.
	This version uses bitsandbytes 4-bit (bnb-4bit) quantization, which reduces weight memory enough for single-GPU setups such as a Tesla T4 (16GB) or consumer GPUs.
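
To make the memory claim concrete, here is a back-of-the-envelope sketch of weight storage at 16 bits versus 4 bits per parameter. It is only an estimate: the helper name `weight_memory_gb` is hypothetical, and the calculation assumes weights dominate and ignores activations, the KV cache, quantization constants, and framework overhead.

```python
# Rough weight-only memory estimate (an illustration, not a measurement).
# Ignores activations, KV cache, quantization constants, and overhead.
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 14e9  # parameter count quoted in this model card

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gb:.0f} GB")   # ~28 GB
print(f"4-bit weights: ~{int4_gb:.0f} GB")   # ~7 GB
```

Under these assumptions the 4-bit weights fit within a 16GB card with room left for inference state, while fp16 weights would not.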

	---

	## ⚡ Key Features
	- 🧠 Fine-tuned for math reasoning and step-by-step solutions
	- ⚡ Efficient: 4-bit quantized, runs on a single GPU or even CPU (slower)
	- 🚀 Trained with Unsloth + TRL for fast and memory-efficient fine-tuning
	- 📚 Based on Phi-4 (14B parameters, Llama-compatible architecture)

	---

	## 📥 Installation
	Ensure you have the latest versions of the required libraries:
	```bash
	pip install unsloth transformers accelerate bitsandbytes
	```
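
After installing, it can help to confirm which versions are actually present in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is simply copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip command above.
REQUIRED = ["unsloth", "transformers", "accelerate", "bitsandbytes"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```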



	## 🖥️ Usage (Colab / Local GPU)

	```python
	import torch
	from unsloth import FastLanguageModel
	from transformers import TextStreamer

	# Load the LoRA fine-tuned model
	model_name = "RobinMillford/phi-4-math-reasoning-lora"
	model, tokenizer = FastLanguageModel.from_pretrained(
	    model_name=model_name,
	    max_seq_length=2048,
	    dtype=torch.float16,  # fp16 recommended for GPU
	    load_in_4bit=True,    # load in 4-bit quantized mode
	    device_map="auto",    # automatically place layers on GPU/CPU
	)

	# Switch the model to Unsloth's inference mode
	FastLanguageModel.for_inference(model)

	# Example: generate a step-by-step solution, streaming tokens as they arrive
	streamer = TextStreamer(tokenizer)
	inputs = tokenizer(
	    "Solve step by step: Q: What is 24 * 17 ? A:",
	    return_tensors="pt",
	).to("cuda")

	_ = model.generate(**inputs, streamer=streamer, max_new_tokens=500)
	```
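
If you want the solution as a string rather than streamed to the console, something like the sketch below may work. The helper names `build_prompt` and `solve` are hypothetical, and the prompt template is only inferred from the usage example above; it is an assumption that this matches the format used during fine-tuning. `model` and `tokenizer` are the objects loaded in the usage example.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the prompt format used in the usage example."""
    return f"Solve step by step: Q: {question.strip()} A:"

def solve(model, tokenizer, question: str, max_new_tokens: int = 500) -> str:
    """Generate a solution and return only the newly generated text."""
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the model's continuation is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `solve(model, tokenizer, "What is 24 * 17 ?")` would then return the generated steps as plain text.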



	## 📊 Example Output

	**Prompt:**

	Solve step by step: Q: What is 45 + 67 ?

	**Response:**

	Step 1: Add the ones digits: 5 + 7 = 12. Write down 2 and carry over 1.
	Step 2: Add the tens digits plus carry: 4 + 6 + 1 = 11.
	Step 3: Combine the results: 112.
	Answer: 112




	## ⚠️ Disclaimer


	This model is intended for research and educational purposes only.

	It may not be fully accurate for complex math reasoning tasks. Always verify critical calculations independently.
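
One simple way to verify a calculation independently is to recompute it in ordinary code and compare. The sketch below assumes the response ends with a line of the form `Answer: <integer>`, as in the example output; the model may not always follow that format, and the helper names `extract_answer` and `is_correct` are hypothetical.

```python
import re

def extract_answer(response: str):
    """Return the integer after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(-?\d+)", response)
    return int(matches[-1]) if matches else None

def is_correct(response: str, expected: int) -> bool:
    """Compare the model's stated answer with an independently computed value."""
    return extract_answer(response) == expected

# Tail of the example response shown in this model card.
response = "Step 3: Combine the results: 112.\nAnswer: 112"
print(is_correct(response, 45 + 67))  # True
```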



	## ❤️ Made With
	- [Unsloth](https://github.com/unslothai/unsloth)
	- [Transformers](https://huggingface.co/docs/transformers)
	- [TRL](https://huggingface.co/docs/trl)
	- [Kaggle Notebook](https://www.kaggle.com/code/yaminh/finetuning-a-llm-for-math-reasoning-sbs)

	---