---
base_model: unsloth/LFM2.5-1.2B-Instruct
library_name: peft
model_name: lfm-finetuned
pipeline_tag: text-generation
tags:
- generated_from_trainer
- hf_jobs
- trl
- unsloth
- sft
- lora
- peft
datasets:
- mlabonne/FineTome-100k
---
# lfm-finetuned
A LoRA adapter fine-tuned on top of [`unsloth/LFM2.5-1.2B-Instruct`](https://huggingface.co/unsloth/LFM2.5-1.2B-Instruct), trained with [TRL](https://github.com/huggingface/trl)'s SFT trainer on [`mlabonne/FineTome-100k`](https://huggingface.co/datasets/mlabonne/FineTome-100k).
> **Note:** this repo contains the **LoRA adapter only** (`adapter_model.safetensors` + `adapter_config.json`), not a full standalone model. Load it on top of the base model with `peft`, or merge it once and use it as a regular causal LM (see below).
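To double-check what the repo ships, you can list its files with `huggingface_hub` (a minimal sketch; `list_repo_files` is a standard Hub API call, not part of this repo):

```python
from huggingface_hub import list_repo_files

# Expect adapter_config.json and adapter_model.safetensors
# (plus tokenizer files), but no full model weights.
print(list_repo_files("MenemAI/lfm-finetuned"))
```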
## Install
```bash
pip install -U torch transformers peft accelerate
```
## Quick start — load the adapter on top of the base model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
base_id = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"
# The tokenizer is stored in the adapter repo, so load it from there
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)

# Load the base model, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="cuda",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
output = generator(
[{"role": "user", "content": question}],
max_new_tokens=512,
return_full_text=False,
)[0]
print(output["generated_text"])
```
CPU-only? Replace `device_map="cuda"` with `device_map="cpu"` (or `"auto"`); generation will be slow, but it works.
## Run on Hugging Face Jobs
The script below works as-is with `hf jobs uv run`; the PEP 723 header at the top tells `uv` which dependencies to install inside the job.
```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
# "torch",
# "transformers",
# "peft",
# "accelerate",
# ]
# ///
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
base_id = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
base_id, torch_dtype="auto", device_map="cuda", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id).eval()
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator(
[{"role": "user", "content": "Hello!"}],
max_new_tokens=512,
return_full_text=False,
)[0]["generated_text"])
```
Save it as `test.py`, then launch it on a GPU flavor:
```bash
hf jobs uv run --flavor a10g-small ./test.py
```
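Once submitted, you can monitor the run with the jobs CLI (assuming a recent `huggingface_hub`; `<job-id>` comes from the submit output):

```bash
hf jobs ps              # list your jobs and their status
hf jobs logs <job-id>   # stream logs from a specific job
```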
## Optional — merge the adapter into the base model
If you want a single self-contained checkpoint (faster cold start, no `peft` at inference time):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load the base model, attach the adapter, and fold the LoRA weights in
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/LFM2.5-1.2B-Instruct", torch_dtype="auto", trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "MenemAI/lfm-finetuned").merge_and_unload()
merged.save_pretrained("lfm-merged")

# Ship the tokenizer alongside the merged weights
AutoTokenizer.from_pretrained("MenemAI/lfm-finetuned", trust_remote_code=True).save_pretrained("lfm-merged")
```
After merging, you can load the checkpoint with a plain `pipeline("text-generation", model="./lfm-merged", device="cuda")`, or push it to a new repo with `hf upload <your-user>/lfm-merged ./lfm-merged`.
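As a quick sanity check, the merged folder behaves like any ordinary causal LM (a minimal sketch reusing the paths above; no `peft` import needed):

```python
from transformers import pipeline

# Load the merged checkpoint directly -- it's a regular causal LM now
generator = pipeline(
    "text-generation",
    model="./lfm-merged",
    device="cuda",
    trust_remote_code=True,
)
print(generator(
    [{"role": "user", "content": "Hello!"}],
    max_new_tokens=64,
    return_full_text=False,
)[0]["generated_text"])
```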
## Training
- **Method:** SFT via TRL
- **Base model:** `unsloth/LFM2.5-1.2B-Instruct`
- **Dataset:** `mlabonne/FineTome-100k`
- **Acceleration:** Unsloth
- **Infrastructure:** Hugging Face Jobs
### Framework versions
- TRL: 0.22.2
- Transformers: 4.57.3
- PyTorch: 2.10.0
- Datasets: 4.3.0
- Tokenizers: 0.22.2
- PEFT: version not pinned; required at inference time when loading the adapter directly
## Citations
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```