---
license: other
license_name: tencent-youtu
tags:
- mlx
- apple-silicon
- tencent
- youtu
- reasoning
- mla
base_model: tencent/Youtu-LLM-2B
library_name: mlx-lm
pipeline_tag: text-generation
---

# Youtu-LLM-2B MLX

MLX-optimized version of [tencent/Youtu-LLM-2B](https://huggingface.co/tencent/Youtu-LLM-2B) for Apple Silicon.

## Quick Start

```bash
pip install mlx-lm

mlx_lm.generate \
  --model mlx-community/Youtu-LLM-2B \
  --prompt "Hello, what can you do?" \
  --max-tokens 100
```

## Model Details

- **Base Model:** tencent/Youtu-LLM-2B
- **Parameters:** 1.96B
- **Context:** 128K tokens
- **Architecture:** Dense MLA (Multi-head Latent Attention)
- **Framework:** MLX (Apple Silicon optimized)

## Performance (M3 Ultra)

| Quant | Prompt | Generation | Memory |
|-------|--------|------------|--------|
| bf16  | 118 tok/s | 112 tok/s | 4.7GB |
| 4-bit | 202 tok/s | 205 tok/s | 1.3GB |

## Features

- **Reasoning Mode:** Emits explicit reasoning tags for Chain of Thought
- **128K Context:** Long document understanding
- **Agentic:** Strong on the SWE-Bench and GAIA benchmarks

## Benchmarks (vs Qwen3-4B)

| Benchmark | Youtu-LLM-2B | Qwen3-4B |
|-----------|--------------|----------|
| HumanEval | **95.9%** | 95.4% |
| SWE-Bench | **17.7%** | 5.7% |
| GAIA      | **33.9%** | 25.5% |

## Other Quantizations

- [Full precision](https://huggingface.co/mlx-community/Youtu-LLM-2B) (4.4GB)
- [4-bit](https://huggingface.co/mlx-community/Youtu-LLM-2B-4bit) (1.2GB)

## Technical Note

Converted using the deepseek_v2 architecture mapping (a compatible MLA implementation).

## License

See the [original model license](https://huggingface.co/tencent/Youtu-LLM-2B/blob/main/LICENSE.txt).
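
## Python Usage

The `mlx_lm.generate` CLI shown in the Quick Start can also be driven from Python via the `mlx-lm` library's `load`/`generate` API. A minimal sketch (the chat-template step assumes the model ships a chat template in its tokenizer config, which is typical for instruction-tuned checkpoints; running it downloads the model weights):

```python
from mlx_lm import load, generate

# Download (on first use) and load the MLX weights and tokenizer
model, tokenizer = load("mlx-community/Youtu-LLM-2B")

# Format the request with the model's chat template, if it has one
messages = [{"role": "user", "content": "Hello, what can you do?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 100 new tokens, mirroring the CLI flags above
text = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(text)
```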