---
base_model: unsloth/LFM2.5-1.2B-Instruct
library_name: peft
model_name: lfm-finetuned
pipeline_tag: text-generation
tags:
- generated_from_trainer
- hf_jobs
- trl
- unsloth
- sft
- lora
- peft
licence: license
datasets:
- mlabonne/FineTome-100k
---

# lfm-finetuned

A LoRA adapter fine-tuned on top of [`unsloth/LFM2.5-1.2B-Instruct`](https://huggingface.co/unsloth/LFM2.5-1.2B-Instruct), trained with [TRL](https://github.com/huggingface/trl)'s SFT trainer on [`mlabonne/FineTome-100k`](https://huggingface.co/datasets/mlabonne/FineTome-100k).

> **Note:** this repo contains the **LoRA adapter only** (`adapter_model.safetensors` + `adapter_config.json`), not a full standalone model. Load it on top of the base model with `peft`, or merge it once and use it as a regular causal LM (see below).

## Install

```bash
pip install -U torch transformers peft accelerate
```

## Quick start — load the adapter on top of the base model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id    = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="cuda",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=512,
    return_full_text=False,
)[0]
print(output["generated_text"])
```

CPU-only? Replace `device_map="cuda"` with `device_map="cpu"` (or `"auto"`); generation is slow but works.
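If the same script has to run on machines with and without a GPU, the device can be picked at runtime. The helper below is a minimal sketch; `pick_device_map` is a hypothetical name, not part of `transformers` or `peft`, and it falls back to `"cpu"` when PyTorch is not installed.

```python
import importlib.util


def pick_device_map() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    # Hypothetical helper for illustration only.
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to query
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device_map = pick_device_map()
print(f"Loading the model with device_map={device_map!r}")
```

The returned string can be passed as `device_map=device_map` in the `from_pretrained` call above.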

## Run on Hugging Face Jobs

Save the script below as `test.py`; it runs as-is with `hf jobs uv run`. The PEP 723 header tells `uv` which dependencies to install inside the job.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "torch",
#   "transformers",
#   "peft",
#   "accelerate",
# ]
# ///
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id    = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="cuda", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id).eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator(
    [{"role": "user", "content": "Hello!"}],
    max_new_tokens=512,
    return_full_text=False,
)[0]["generated_text"])
```

```bash
hf jobs uv run --flavor a10g-small ./test.py
```

## Optional — merge the adapter into the base model

If you want a single self-contained checkpoint (faster cold start, no `peft` at inference time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/LFM2.5-1.2B-Instruct", torch_dtype="auto", trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "MenemAI/lfm-finetuned").merge_and_unload()
merged.save_pretrained("lfm-merged")
AutoTokenizer.from_pretrained("MenemAI/lfm-finetuned", trust_remote_code=True).save_pretrained("lfm-merged")
```

After merging you can load it with a plain `pipeline("text-generation", model="./lfm-merged", device="cuda")` or push it to a new repo with `hf upload <your-user>/lfm-merged ./lfm-merged`.
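For intuition about what merging does, the sketch below works through the standard LoRA formulation, in which the merged weight is `W + (alpha / r) * B @ A`, on a toy 2x2 matrix in pure Python. This illustrates the arithmetic only; it is not the `peft` implementation, and the numbers, `matmul`, and `merge_lora` are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A (standard LoRA merge formula)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy values, purely illustrative (rank r = 1).
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight, shape (out, in)
lora_a = [[1.0, 2.0]]          # A, shape (r, in)
lora_b = [[0.5], [-0.5]]       # B, shape (out, r)
x = [[3.0], [4.0]]             # input column vector

merged = merge_lora(w, lora_a, lora_b, alpha=2, r=1)

# Unmerged path: base output plus the scaled adapter output.
base_out = matmul(w, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
unmerged_out = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]

print(matmul(merged, x) == unmerged_out)  # both paths give the same result
```

Because the update is folded into the weight once, the merged checkpoint needs no extra matrix products at inference time, which is why it loads without `peft`.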

## Training

- **Method:** SFT via TRL
- **Base model:** `unsloth/LFM2.5-1.2B-Instruct`
- **Dataset:** `mlabonne/FineTome-100k`
- **Acceleration:** Unsloth
- **Infrastructure:** Hugging Face Jobs

### Framework versions

- TRL: 0.22.2
- Transformers: 4.57.3
- PyTorch: 2.10.0
- Datasets: 4.3.0
- Tokenizers: 0.22.2
- PEFT: required at inference time when loading the adapter directly
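To compare a local environment against the versions listed above, something like the following can help. It is a sketch that uses only the standard library; the distribution names in `EXPECTED` are assumed to be the usual PyPI names, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "trl": "0.22.2",
    "transformers": "4.57.3",
    "torch": "2.10.0",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, have in installed_versions(EXPECTED).items():
    want = EXPECTED[name]
    # Local builds may carry a suffix such as "+cu128", so compare by prefix.
    status = "ok" if have is not None and have.startswith(want) else "differs"
    print(f"{name}: expected {want}, found {have} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load; the list records the training environment rather than a hard requirement.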

## Citations

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```