LFM2.5-1.2B-Distilled-SFT

A 1.2B hybrid model (SSM + attention) built in two stages: knowledge distillation from a 24B MoE hybrid teacher on STEM chain-of-thought data, then supervised fine-tuning on logical inference. The first proof-weighted distillation + SFT pipeline on a non-transformer architecture.

Liquid Foundation Models run at 239 tok/s on AMD CPU and fit under 1GB of RAM. This model adds structured STEM reasoning and formal logical inference to that efficiency substrate.

"Structure beats scale, collaboration beats hierarchy, observation beats theory." — Convergent Intelligence LLC: Research Division

Training Pipeline

Stage 1: Knowledge Distillation (STEM Reasoning Backbone)

LFM2.5-1.2B was distilled from LFM2-24B-A2B, a 24B MoE hybrid (SSM + attention) with only 2B active parameters per token. Teacher and student share the LFM hybrid architecture, so the KL-divergence loss transfers reasoning patterns between architecturally compatible models.

Data: 2,802 STEM CoT samples from 5 domains:

| Domain | Samples |
|---|---|
| Linear Algebra | 667 |
| Differential Equations | 636 |
| Electromagnetism | 580 |
| Mathematics | 576 |
| Classical Mechanics | 343 |

All datasets are from 0xZee. This is a deliberately focused subset: core mathematical reasoning domains that share the most structural overlap with logical inference.

Loss function:

  1. Proof-Weighted Cross-Entropy (55%): token weight annealed from 2.5x to 1.5x on derivation tokens
  2. Knowledge Distillation KL Divergence (45%): temperature T = 2.0, loss scaled by T²
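The two-term objective can be sketched as follows. The 55/45 mix, T = 2.0, the T² scaling, and the 2.5x proof weight come from this card; the tensor shapes, the `proof_mask` input, and the function name are illustrative assumptions, not the actual training code.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, proof_mask,
                 proof_weight=2.5, T=2.0, ce_mix=0.55, kd_mix=0.45):
    """Sketch of the Stage 1 loss: proof-weighted CE + temperature-scaled KD.

    student_logits, teacher_logits: [batch, seq, vocab]
    labels: [batch, seq] token ids; proof_mask: [batch, seq] 1.0 on derivation tokens.
    """
    # Proof-weighted cross-entropy: derivation tokens count proof_weight-x.
    ce = F.cross_entropy(student_logits.transpose(1, 2), labels, reduction="none")
    token_w = 1.0 + (proof_weight - 1.0) * proof_mask  # 1.0 or proof_weight
    ce = (ce * token_w).sum() / token_w.sum()

    # KD term: KL(teacher_T || student_T), scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures (Hinton et al. convention).
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    kd = F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)

    return ce_mix * ce + kd_mix * kd
```

The proof weight is annealed from 2.5 to 1.5 over training; a schedule would simply pass a decaying `proof_weight` per step.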

Training format:

Solve the following problem carefully and show a rigorous derivation.

Problem:
{question}

Proof:
{CoT}

Final Answer:
{response}

Stage 1 hyperparameters:

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Effective batch size | 8 |
| Learning rate | 1.5e-5 → 1e-6 (cosine) |
| Temperature | 2.0 |
| Proof weight | 2.5 → 1.5 |
| Precision | bf16 |
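A minimal sketch of that schedule: cosine decay from 1.5e-5 down to a 1e-6 floor over the total step count. Warmup is not listed on the card and is omitted here; the function name is ours.

```python
import math

def cosine_lr(step, total_steps, lr_max=1.5e-5, lr_min=1e-6):
    """Cosine-annealed learning rate with a floor, as used in Stage 1."""
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```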

Stage 2: Logical Inference SFT

Fine-tuned on KK04/LogicInference_OA, a reproduction of the LogicInference dataset (Santiago Ontañón, Google Research) formatted for instruction following. IID split, LOGICINFERENCEe format (inference steps first, answer at the end). 5,491 unique inference problems, extended to ~54,607 instruction-response pairs.

Why logical inference on a hybrid architecture? SSM components excel at sequential state propagation — exactly what formal logical inference requires. Each premise updates a logical state, and the conclusion follows from the final state. The hybrid architecture's inductive bias naturally aligns with propositional logic chains. SFT activates this alignment explicitly.
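A toy illustration (not model code) of that "each premise updates a logical state" view: forward chaining over propositional implications. The known facts are the state; each pass applies modus ponens until a fixed point.

```python
def forward_chain(facts, rules):
    """facts: set of true atoms; rules: list of (premise, conclusion) pairs."""
    state = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in state and conclusion not in state:
                state.add(conclusion)  # modus ponens: from p and p->q, infer q
                changed = True
    return state

# With p, p->q, q->r the chain yields q and then r: {'p', 'q', 'r'}.
forward_chain({"p"}, [("p", "q"), ("q", "r")])
```

This is exactly the premise chain in the usage example below ("If p then q. If q then r. p is true."), which the model is trained to resolve in text.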

Training format:

### Instruction:
{instruction}

### Response:
{response}

Stage 2 hyperparameters:

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Effective batch size | 8 |
| Learning rate | 5e-6 (conservative to preserve backbone) |
| Gradient checkpointing | Enabled |
| Precision | bf16 |

Model Details

| Attribute | Value |
|---|---|
| Architecture | LFM2.5 (hybrid SSM + attention) |
| Parameters | 1.2B |
| Base model | liquid/LFM2.5-1.2B-Instruct |
| Teacher model | liquid/LFM2-24B-A2B |
| Stage 1 data | 2,802 STEM CoT samples (5 datasets) |
| Stage 2 data | KK04/LogicInference_OA (~54,607 pairs) |
| Inference | 239 tok/s AMD CPU, 82 tok/s mobile NPU, sub-1GB RAM |
| Context length | 1024 tokens (training) |
| License | Apache 2.0 |
| Developer | Reaperdoesntrun / Convergent Intelligence LLC: Research Division |

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# Logical inference (Stage 2)
prompt = """### Instruction:
Consider the following premises: If p then q. If q then r. p is true. What can we infer?

### Response:
"""

# STEM derivation (the Stage 1 format still works after SFT)
prompt_stem = """Solve the following problem carefully and show a rigorous derivation.

Problem:
Solve the system of linear equations: 2x + y = 5, x - y = 1.

Proof:
"""

# Swap in prompt_stem to run the STEM format instead.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

GGUF

Quantized versions at reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT-GGUF.

Prompt Formats

STEM derivation (Stage 1):

Solve the following problem carefully and show a rigorous derivation.

Problem:
[Your problem]

Proof:

Logical inference (Stage 2):

### Instruction:
[Your question or logical inference problem]

### Response:
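Both templates are easy to build programmatically. This helper sketch reproduces the formats above verbatim; the templates come from this card, but the function names are ours.

```python
def stem_prompt(problem: str) -> str:
    """Stage 1 format: rigorous-derivation framing, ending at 'Proof:'."""
    return (
        "Solve the following problem carefully and show a rigorous derivation.\n\n"
        f"Problem:\n{problem}\n\n"
        "Proof:\n"
    )

def logic_prompt(instruction: str) -> str:
    """Stage 2 format: Alpaca-style instruction/response framing."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```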

Intended Uses

Good for: On-device logical inference and STEM reasoning, mobile/edge/IoT deployment, formal reasoning tasks, educational tutoring, embedded inference pipelines, anywhere you need structured reasoning under 1GB.

Not for: Formal proof verification, safety-critical systems, complex multi-step proofs beyond model capacity, or long-context tasks beyond 1024 tokens.

Limitations

1.2B hybrid model. The SSM components give excellent inference speed but the model has hard capacity limits. Trained on 2,802 STEM samples (smaller than the 6,122 used for Qwen3 variants). Logical inference strongest on propositional logic patterns in the training data. Complex nested quantifiers may exceed capacity. Always verify.

Related Models

| Model | Description |
|---|---|
| LFM2.5-1.2B-Distilled | Stage 1 only: pure STEM backbone |
| LFM2.5-1.2B-Distilled-SFT-GGUF | This model, quantized for edge deployment |
| Qwen3-1.7B-Coder-Distilled-SFT | Transformer variant, Coder teacher + logical inference |
| Qwen3-1.7B-Distilled-30B-A3B-SFT | Transformer variant, Instruct teacher + legal SFT |

Citation

@misc{colca2026lfmsft,
  title={Hybrid SSM/Attention Distillation + Logical Inference: LFM2-24B to LFM2.5-1.2B},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT},
  note={Convergent Intelligence LLC: Research Division}
}

References

Santiago Ontañón. "LogicInference: A Large-Scale Dataset for Logical Inference." ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models.



From the Convergent Intelligence Portfolio

DistilQwen Collection — Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.

Top model: Qwen3-1.7B-Coder-Distilled-SFT — 508 downloads

Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)

Convergent Intelligence LLC: Research Division
"Where classical analysis fails to see, we begin."


Part of the Liquid Foundation Model Series by Convergent Intelligence LLC: Research Division.
