Youtu-LLM-2B MLX

MLX-optimized version of tencent/Youtu-LLM-2B for Apple Silicon.

Quick Start

pip install mlx-lm

mlx_lm.generate \
  --model mlx-community/Youtu-LLM-2B \
  --prompt "Hello, what can you do?" \
  --max-tokens 100
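
The same model can be used from Python with the mlx_lm API. A minimal sketch (the prompt and max_tokens values are illustrative):

from mlx_lm import load, generate

# Download and load the MLX weights and tokenizer from the Hub
model, tokenizer = load("mlx-community/Youtu-LLM-2B")

# Format the request with the model's chat template
messages = [{"role": "user", "content": "Hello, what can you do?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=100, verbose=True)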

Model Details

  • Base Model: tencent/Youtu-LLM-2B
  • Parameters: 1.96B
  • Context: 128K tokens
  • Architecture: Dense MLA (Multi-head Latent Attention)
  • Framework: MLX (Apple Silicon optimized)

Performance (M3 Ultra)

Quant   Prompt (tok/s)   Generation (tok/s)   Memory
bf16    118              112                  4.7 GB
4-bit   202              205                  1.3 GB

Features

  • Reasoning Mode: Uses <think> tags for chain-of-thought reasoning (see the example after this list)
  • 128K Context: Long-document understanding
  • Agentic: Strong results on SWE-Bench and GAIA benchmarks
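
When reasoning mode is active, the model may emit its chain of thought inside <think>...</think> before the final answer. The sketch below shows one way to separate the two in Python; the tag handling is an assumption based on the tag names above, not a documented API:

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Youtu-LLM-2B")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

output = generate(model, tokenizer, prompt=prompt, max_tokens=512)

# Assumed output format: reasoning wrapped in <think>...</think>, answer after the closing tag
if "</think>" in output:
    reasoning, answer = output.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
else:
    reasoning, answer = "", output

print("Reasoning:", reasoning)
print("Answer:", answer.strip())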

Benchmarks (vs Qwen3-4B)

Benchmark    Youtu-LLM-2B   Qwen3-4B
HumanEval    95.9%          95.4%
SWE-Bench    17.7%          5.7%
GAIA         33.9%          25.5%

Other Quantizations

Technical Note

Converted using the deepseek_v2 architecture mapping, which provides a compatible MLA implementation.
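
A conversion along these lines can be reproduced with mlx_lm's convert utility. The snippet below is a sketch of how the 4-bit variant could be produced; the output directory name is illustrative:

from mlx_lm import convert

# Convert the original Hugging Face weights to MLX format and quantize to 4-bit
convert(
    hf_path="tencent/Youtu-LLM-2B",
    mlx_path="Youtu-LLM-2B-4bit",  # illustrative output directory
    quantize=True,
    q_bits=4,
)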

License

See the license of the original model, tencent/Youtu-LLM-2B.
