---
license: other
license_name: tencent-youtu
tags:
- mlx
- apple-silicon
- tencent
- youtu
- reasoning
- mla
base_model: tencent/Youtu-LLM-2B
library_name: mlx-lm
pipeline_tag: text-generation
---

# Youtu-LLM-2B MLX

MLX-optimized version of [tencent/Youtu-LLM-2B](https://huggingface.co/tencent/Youtu-LLM-2B) for Apple Silicon.

## Quick Start

```bash
pip install mlx-lm

mlx_lm.generate \
  --model mlx-community/Youtu-LLM-2B \
  --prompt "Hello, what can you do?" \
  --max-tokens 100
```

## Model Details

- **Base Model:** tencent/Youtu-LLM-2B
- **Parameters:** 1.96B
- **Context:** 128K tokens
- **Architecture:** Dense MLA (Multi-head Latent Attention)
- **Framework:** MLX (Apple Silicon optimized)

## Performance (M3 Ultra)

| Quant | Prompt | Generation | Memory |
|-------|--------|------------|--------|
| bf16  | 118 tok/s | 112 tok/s | 4.7GB |
| 4-bit | 202 tok/s | 205 tok/s | 1.3GB |

## Features

- **Reasoning Mode:** Emits explicit reasoning tags for Chain of Thought
- **128K Context:** Long document understanding
- **Agentic:** Strong on the SWE-Bench and GAIA benchmarks

## Benchmarks (vs Qwen3-4B)

| Benchmark | Youtu-LLM-2B | Qwen3-4B |
|-----------|--------------|----------|
| HumanEval | **95.9%** | 95.4% |
| SWE-Bench | **17.7%** | 5.7% |
| GAIA      | **33.9%** | 25.5% |

## Other Quantizations

- [Full precision](https://huggingface.co/mlx-community/Youtu-LLM-2B) (4.4GB)
- [4-bit](https://huggingface.co/mlx-community/Youtu-LLM-2B-4bit) (1.2GB)

## Technical Note

Converted using the deepseek_v2 architecture mapping (a compatible MLA implementation).

## License

See the [original model license](https://huggingface.co/tencent/Youtu-LLM-2B/blob/main/LICENSE.txt).
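
## Python Usage

The `mlx_lm.generate` CLI shown in the Quick Start can also be driven from Python via the `mlx-lm` library's `load`/`generate` API. A minimal sketch (the chat-template step assumes the model ships a chat template in its tokenizer config, which is typical for instruction-tuned checkpoints; running it downloads the model weights):

```python
from mlx_lm import load, generate

# Download (on first use) and load the MLX weights and tokenizer
model, tokenizer = load("mlx-community/Youtu-LLM-2B")

# Format the request with the model's chat template, if it has one
messages = [{"role": "user", "content": "Hello, what can you do?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 100 new tokens, mirroring the CLI flags above
text = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(text)
```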