How to use

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MenemAI/lfm-finetuned to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MenemAI/lfm-finetuned to start chatting

Use Hugging Face Spaces

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MenemAI/lfm-finetuned to start chatting

Load the model with FastModel

pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="MenemAI/lfm-finetuned",
    max_seq_length=2048,
)
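
Continuing from the load above, a minimal generation sketch, assuming the tokenizer ships the base model's chat template:

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))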

lfm-finetuned

A LoRA adapter fine-tuned on top of unsloth/LFM2.5-1.2B-Instruct, trained with TRL's SFT trainer on mlabonne/FineTome-100k.

Note: this repo contains the LoRA adapter only (adapter_model.safetensors + adapter_config.json), not a full standalone model. Load it on top of the base model with peft, or merge it once and use it as a regular causal LM (see below).
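
To confirm what the repo ships, you can list its files with huggingface_hub (a minimal sketch):

from huggingface_hub import list_repo_files

# Expect adapter_model.safetensors and adapter_config.json,
# not full standalone model weights.
print(list_repo_files("MenemAI/lfm-finetuned"))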

Install

pip install -U torch transformers peft accelerate

Quick start — load the adapter on top of the base model

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id    = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="cuda",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=512,
    return_full_text=False,
)[0]
print(output["generated_text"])

CPU-only? Replace device_map="cuda" with device_map="cpu" (or "auto"), as sketched below; generation will be slow but works.
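
For reference, the CPU-only load differs only in the device placement:

from transformers import AutoModelForCausalLM

# Same load as above, pinned to CPU; "auto" instead lets
# accelerate decide placement.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/LFM2.5-1.2B-Instruct",
    torch_dtype="auto",
    device_map="cpu",
    trust_remote_code=True,
)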

Run on Hugging Face Jobs

The script below works as-is with hf jobs uv run; save it as test.py to match the command at the end. The PEP 723 header at the top lets uv install the right dependencies inside the job.

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "torch",
#   "transformers",
#   "peft",
#   "accelerate",
# ]
# ///
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

base_id    = "unsloth/LFM2.5-1.2B-Instruct"
adapter_id = "MenemAI/lfm-finetuned"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", device_map="cuda", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id).eval()

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator(
    [{"role": "user", "content": "Hello!"}],
    max_new_tokens=512,
    return_full_text=False,
)[0]["generated_text"])

hf jobs uv run --flavor a10g-small ./test.py
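
Once submitted, you can monitor the run from the same CLI; the subcommand names below are as of recent huggingface_hub releases, so check hf jobs --help on your version:

hf jobs ps                # list your jobs
hf jobs logs <job-id>     # stream a job's logs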

Optional — merge the adapter into the base model

If you want a single self-contained checkpoint (faster cold start, no peft at inference time):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/LFM2.5-1.2B-Instruct", torch_dtype="auto", trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "MenemAI/lfm-finetuned").merge_and_unload()
merged.save_pretrained("lfm-merged")
AutoTokenizer.from_pretrained("MenemAI/lfm-finetuned", trust_remote_code=True).save_pretrained("lfm-merged")

After merging you can load it with a plain pipeline("text-generation", model="./lfm-merged", device="cuda") or push it to a new repo with hf upload <your-user>/lfm-merged ./lfm-merged.
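
Spelled out, the merged checkpoint loads like any standalone causal LM (a minimal sketch):

from transformers import pipeline

# No peft needed anymore; the tokenizer was saved into the same folder.
generator = pipeline(
    "text-generation",
    model="./lfm-merged",
    device="cuda",  # or "cpu"
    trust_remote_code=True,
)
print(generator(
    [{"role": "user", "content": "Hello!"}],
    max_new_tokens=128,
    return_full_text=False,
)[0]["generated_text"])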

Training

  • Method: SFT via TRL (see the sketch after this list)
  • Base model: unsloth/LFM2.5-1.2B-Instruct
  • Dataset: mlabonne/FineTome-100k
  • Acceleration: Unsloth
  • Infrastructure: Hugging Face Jobs
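
The sketch below is a rough reconstruction of such a run, not the exact training script; LoRA settings, hyperparameters, and the dataset-formatting step are illustrative:

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastModel

# Load the base model with Unsloth and attach LoRA adapters
# (remaining LoRA settings left at Unsloth's defaults).
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/LFM2.5-1.2B-Instruct",
    max_seq_length=2048,
)
model = FastModel.get_peft_model(model, r=16, lora_alpha=16)

# FineTome-100k stores ShareGPT-style "conversations"; rendering them
# through the chat template into a "text" column is one common approach.
role_map = {"human": "user", "gpt": "assistant", "system": "system"}

def to_text(example):
    msgs = [{"role": role_map.get(m["from"], m["from"]), "content": m["value"]}
            for m in example["conversations"]]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False)}

dataset = load_dataset("mlabonne/FineTome-100k", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
model.save_pretrained("lfm-finetuned")  # writes the LoRA adapter files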

Framework versions

  • TRL: 0.22.2
  • Transformers: 4.57.3
  • PyTorch: 2.10.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2
  • PEFT: required at inference time when loading the adapter directly

Citations

@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}