MLX export of LFM2-24B-A2B, quantized to 8-bit, for inference on Apple Silicon.

| Property | Value |
|---|---|
| Total Parameters | 24B |
| Active Parameters | ~2B per token |
| Architecture | Mixture of Experts (64 experts, top-4) |
| Layers | 40 (30 conv + 10 full attention) |
| Precision | 8-bit |
| Group Size | 64 |
| Size | 23.6 GB |
| Context Length | 128K |
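
As a rough sanity check on the table above, the stored size can be estimated from the parameter count, bit width, and group size. This is only a sketch under assumptions that the card does not state: every weight is quantized, and each group of 64 weights carries one 16-bit scale and one 16-bit bias. The helper name `estimate_quantized_size` is hypothetical.

```python
def estimate_quantized_size(n_params, bits=8, group_size=64, overhead_bits_per_group=32):
    """Estimate storage for group-wise quantized weights.

    Assumption (not from the model card): each group stores a 16-bit scale
    and a 16-bit bias, i.e. 32 extra bits per group of `group_size` weights.
    Returns (total_bytes, effective_bits_per_weight).
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes, bits_per_weight


# 24B is the rounded total parameter count from the table above.
total_bytes, bpw = estimate_quantized_size(24e9)
print(f"{bpw:.2f} bits per weight")
print(f"{total_bytes / 1e9:.1f} GB ({total_bytes / 2**30:.1f} GiB)")
```

Under these assumptions the estimate is 8.5 bits per weight, or about 25.5 GB (23.7 GiB), which is in the same range as the listed size. An exact match is not expected, since 24B is a rounded count and some tensors may be stored at a different precision.
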

The usage example uses the following generation parameters:

| Parameter | Value |
|---|---|
| temperature | 0.1 |
| top_k | 50 |
| top_p | 0.1 |
| repetition_penalty | 1.05 |
| max_tokens | 512 |
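
To illustrate what the `temperature`, `top_k`, and `top_p` values above do to a next-token distribution, here is a minimal pure-Python sketch of the textbook formulation. It is not the `mlx-lm` implementation, and libraries may apply these filters in a different order; `filter_distribution` and `toy_logits` are hypothetical names used only for this example.

```python
import math


def filter_distribution(logits, temp=0.1, top_k=50, top_p=0.1):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature: divide logits before the softmax; a small value sharpens the distribution.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    norm = sum(exps)
    probs = [e / norm for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.5, 0.5, -1.0]  # made-up logits for a 4-token vocabulary
print(filter_distribution(toy_logits))  # → {0: 1.0}
```

With the settings from the table, the toy distribution collapses to its single most likely token, which shows why a low temperature combined with a low `top_p` gives close to deterministic output. Calling `filter_distribution(toy_logits, temp=1.0, top_p=0.9)` keeps three of the four tokens instead.
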

Install `mlx-lm`:

```bash
pip install mlx-lm
```

Load the model and generate a response:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

# Load the 8-bit MLX weights and the matching tokenizer.
model, tokenizer = load("LiquidAI/LFM2-24B-A2B-MLX-8bit")

prompt = "What is the capital of France?"

# Wrap the prompt in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Sampling settings from the parameter table.
sampler = make_sampler(temp=0.1, top_k=50, top_p=0.1)
logits_processors = make_logits_processors(repetition_penalty=1.05)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    sampler=sampler,
    logits_processors=logits_processors,
    verbose=True,  # print the generated text as it is produced
)
```

This model is released under the LFM 1.0 License.

Base model: LiquidAI/LFM2-24B-A2B