---
base_model: unsloth/LFM2.5-1.2B-Instruct
library_name: peft
model_name: lfm-finetuned
pipeline_tag: text-generation
tags:
- generated_from_trainer
- hf_jobs
- trl
- unsloth
- sft
- lora
- peft
licence: license
datasets:
- mlabonne/FineTome-100k
---

# lfm-finetuned

A LoRA adapter fine-tuned on top of [`unsloth/LFM2.5-1.2B-Instruct`](https://huggingface.co/unsloth/LFM2.5-1.2B-Instruct), trained with [TRL](https://github.com/huggingface/trl)'s SFT trainer on [`mlabonne/FineTome-100k`](https://huggingface.co/datasets/mlabonne/FineTome-100k).

> **Note:** this repo contains the **LoRA adapter only** (`adapter_model.safetensors` + `adapter_config.json`), not a full standalone model. Load it on top of the base model with `peft`, or merge it once and use it as a regular causal LM (see below).

## Install

```bash
pip install -U torch transformers peft accelerate
```

## Quick start — load the adapter on top of the base model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id    = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="cuda",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=512,
    return_full_text=False,
)[0]
print(output["generated_text"])
```

CPU-only? Replace `device_map="cuda"` with `device_map="cpu"` (or `"auto"`); generation is slow but works.
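
If the same script has to run on both GPU and CPU machines, the choice can be made at runtime. The sketch below is one way to do it; `pick_device_map` is a hypothetical helper name, not part of `transformers` or `peft`, and it assumes a plain `"cuda"`/`"cpu"` choice is enough for your setup.

```python
import importlib.util


def pick_device_map() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    # Fall back to CPU if PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device_map = pick_device_map()
print(f"Loading the base model with device_map={device_map!r}")
```

Pass the result as `device_map=device_map` in the `AutoModelForCausalLM.from_pretrained(...)` call above.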

## Run on Hugging Face Jobs

Save the script below as `test.py`; it runs as-is with `hf jobs uv run`. The PEP 723 header tells `uv` which dependencies to install inside the job.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "torch",
#   "transformers",
#   "peft",
#   "accelerate",
# ]
# ///
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id    = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="cuda", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id).eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator(
    [{"role": "user", "content": "Hello!"}],
    max_new_tokens=512,
    return_full_text=False,
)[0]["generated_text"])
```

```bash
hf jobs uv run --flavor a10g-small ./test.py
```

## Optional — merge the adapter into the base model

If you want a single self-contained checkpoint (faster cold start, no `peft` at inference time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/LFM2.5-1.2B-Instruct", torch_dtype="auto", trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "MenemAI/lfm-finetuned").merge_and_unload()
merged.save_pretrained("lfm-merged")
AutoTokenizer.from_pretrained("MenemAI/lfm-finetuned", trust_remote_code=True).save_pretrained("lfm-merged")
```

After merging you can load it with a plain `pipeline("text-generation", model="./lfm-merged", device="cuda")` or push it to a new repo with `hf upload <your-user>/lfm-merged ./lfm-merged`.
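
Conceptually, merging folds each low-rank update into the weight it modifies, using the standard LoRA rule `W' = W + (alpha / r) * B @ A`. The toy sketch below shows that rule with plain Python lists. It is not PEFT's actual implementation; `matmul` and `merge_lora` are illustrative names and all numbers are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A for one weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy values: d_in = d_out = 2, rank r = 1.
alpha, r = 2, 1
w = [[1, 0], [0, 1]]
lora_a = [[1, 2]]    # r x d_in
lora_b = [[1], [3]]  # d_out x r
merged = merge_lora(w, lora_a, lora_b, alpha, r)

# The merged weight reproduces "base output + scaled adapter output".
x = [[1], [1]]  # column vector
base_out = matmul(w, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
separate = [[b[0] + (alpha / r) * a[0]] for b, a in zip(base_out, adapter_out)]
assert matmul(merged, x) == separate
```

Because the merged matrix has the same shape as the original weight, the saved checkpoint loads like any other causal LM and no longer needs `peft`.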

## Training

- **Method:** SFT via TRL
- **Base model:** `unsloth/LFM2.5-1.2B-Instruct`
- **Dataset:** `mlabonne/FineTome-100k`
- **Acceleration:** Unsloth
- **Infrastructure:** Hugging Face Jobs

### Framework versions

- TRL: 0.22.2
- Transformers: 4.57.3
- PyTorch: 2.10.0
- Datasets: 4.3.0
- Tokenizers: 0.22.2
- PEFT: required at inference time when loading the adapter directly

## Citations

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```