---
base_model: unsloth/LFM2.5-1.2B-Instruct
library_name: peft
model_name: lfm-finetuned
pipeline_tag: text-generation
tags:
- generated_from_trainer
- hf_jobs
- trl
- unsloth
- sft
- lora
- peft
licence: license
datasets:
- mlabonne/FineTome-100k
---

# lfm-finetuned

A LoRA adapter fine-tuned on top of [`unsloth/LFM2.5-1.2B-Instruct`](https://huggingface.co/unsloth/LFM2.5-1.2B-Instruct), trained with [TRL](https://github.com/huggingface/trl)'s SFT trainer on [`mlabonne/FineTome-100k`](https://huggingface.co/datasets/mlabonne/FineTome-100k).

> **Note:** this repo contains the **LoRA adapter only** (`adapter_model.safetensors` + `adapter_config.json`), not a full standalone model. Load it on top of the base model with `peft`, or merge it once and use it as a regular causal LM (see below).

## Install

```bash
pip install -U torch transformers peft accelerate
```

## Quick start — load the adapter on top of the base model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="cuda",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=512,
    return_full_text=False,
)[0]
print(output["generated_text"])
```

CPU-only? Replace `device_map="cuda"` with `device_map="cpu"` (or `"auto"`); generation will be slow, but it works.

## Run on Hugging Face Jobs

The script below works as-is with `hf jobs uv run`. The PEP 723 header at the top makes `uv` install the right dependencies inside the job.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "torch",
#     "transformers",
#     "peft",
#     "accelerate",
# ]
# ///
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="cuda", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id).eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator(
    [{"role": "user", "content": "Hello!"}],
    max_new_tokens=512,
    return_full_text=False,
)[0]["generated_text"])
```

Save it as `test.py`, then launch it on a GPU flavor:

```bash
hf jobs uv run --flavor a10g-small ./test.py
```

## Optional — merge the adapter into the base model

If you want a single self-contained checkpoint (faster cold start, no `peft` dependency at inference time):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/LFM2.5-1.2B-Instruct", torch_dtype="auto", trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "MenemAI/lfm-finetuned").merge_and_unload()
merged.save_pretrained("lfm-merged")
AutoTokenizer.from_pretrained("MenemAI/lfm-finetuned", trust_remote_code=True).save_pretrained("lfm-merged")
```

After merging you can load it with a plain `pipeline("text-generation", model="./lfm-merged", device="cuda")`, or push it to a new repo with `hf upload <your-username>/lfm-merged ./lfm-merged`.

## Training

- **Method:** SFT via TRL with LoRA (sketched below)
- **Base model:** `unsloth/LFM2.5-1.2B-Instruct`
- **Dataset:** `mlabonne/FineTome-100k`
- **Acceleration:** Unsloth
- **Infrastructure:** Hugging Face Jobs
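The exact training script is not included in this repo. As a rough guide, the sketch below shows a typical Unsloth + TRL LoRA SFT setup for this base model and dataset. Every hyperparameter (sequence length, LoRA rank, batch size, learning rate, step count) is an illustrative assumption rather than the value actually used, and the ShareGPT-to-`messages` conversion assumes FineTome-100k's published schema.

```python
# Illustrative sketch only: hyperparameters and helper code are assumptions,
# not the exact recipe used to train this adapter.
from unsloth import FastLanguageModel  # import unsloth first so it patches transformers/TRL
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load the base model through Unsloth's accelerated loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/LFM2.5-1.2B-Instruct",
    max_seq_length=2048,  # assumption
    load_in_4bit=True,    # assumption
)

# Attach LoRA adapters (rank and target modules are assumptions).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# FineTome-100k stores ShareGPT-style conversations ("from"/"value" turns);
# convert them to the `messages` format TRL's SFTTrainer consumes directly.
role_map = {"system": "system", "human": "user", "gpt": "assistant"}

def to_messages(example):
    return {"messages": [
        {"role": role_map[turn["from"]], "content": turn["value"]}
        for turn in example["conversations"]
    ]}

dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="lfm-finetuned",
        per_device_train_batch_size=2,  # assumption
        gradient_accumulation_steps=4,  # assumption
        learning_rate=2e-4,             # assumption
        max_steps=500,                  # assumption
        logging_steps=10,
    ),
)
trainer.train()
trainer.save_model("lfm-finetuned")  # saves only the LoRA adapter files
```

Because `messages`-style datasets go through the tokenizer's chat template automatically in recent TRL versions, no manual prompt formatting is needed in this setup.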
### Framework versions

- TRL: 0.22.2
- Transformers: 4.57.3
- PyTorch: 2.10.0
- Datasets: 4.3.0
- Tokenizers: 0.22.2
- PEFT: required at inference time when loading the adapter directly

## Citations

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```