mlx-community
/

Youtu-LLM-2B

Text Generation

Model card Files Files and versions

Youtu-LLM-2B MLX

MLX-optimized version of tencent/Youtu-LLM-2B for Apple Silicon.

Quick Start

pip install mlx-lm

mlx_lm.generate \
  --model mlx-community/Youtu-LLM-2B \
  --prompt "Hello, what can you do?" \
  --max-tokens 100

Model Details

Base Model: tencent/Youtu-LLM-2B
Parameters: 1.96B
Context: 128K tokens
Architecture: Dense MLA (Multi-head Latent Attention)
Framework: MLX (Apple Silicon optimized)

Performance (M3 Ultra)

Quant	Prompt	Generation	Memory
bf16	118 tok/s	112 tok/s	4.7GB
4-bit	202 tok/s	205 tok/s	1.3GB

Features

Reasoning Mode: Uses <think> tags for Chain of Thought
128K Context: Long document understanding
Agentic: Strong on SWE-Bench, GAIA benchmarks

Benchmarks (vs Qwen3-4B)

Benchmark	Youtu-LLM-2B	Qwen3-4B
HumanEval	95.9%	95.4%
SWE-Bench	17.7%	5.7%
GAIA	33.9%	25.5%

Other Quantizations

Full precision (4.4GB)
4-bit (1.2GB)

Technical Note

Converted using deepseek_v2 architecture mapping (compatible MLA implementation).

License

See original model license.

Downloads last month: 19

Safetensors

Model size

2B params

Tensor type

BF16

·

MLX

Hardware compatibility

Log In to add your hardware

Quantized

Model tree for mlx-community/Youtu-LLM-2B

Base model

tencent/Youtu-LLM-2B-Base

Finetuned

tencent/Youtu-LLM-2B

Finetuned

(6)

this model