---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B
library_name: mlx
language:
- en
tags:
- quantllm
- mlx
- mlx-lm
- apple-silicon
- 4bit
- transformers
---

# Llama-3.2-3B-4bit-mlx

![Format](https://img.shields.io/badge/format-MLX-orange) ![Quantization](https://img.shields.io/badge/quantization-4bit-blue) ![QuantLLM](https://img.shields.io/badge/made%20with-QuantLLM-green)

## Description

This is **meta-llama/Llama-3.2-3B** converted to the MLX format and quantized to 4-bit for Apple Silicon (M1/M2/M3/M4) Macs.

- **Base Model**: [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)
- **Format**: MLX
- **Quantization**: 4-bit
- **Created with**: [QuantLLM](https://github.com/codewithdark-git/QuantLLM)

## Usage

### Generate text with mlx-lm

```python
from mlx_lm import load, generate

model, tokenizer = load("codewithdark/Llama-3.2-3B-4bit-mlx")

prompt = "Write a story about Einstein"

# Llama-3.2-3B is a base (non-instruct) model, so its tokenizer may not
# define a chat template; only apply one if it exists.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```

### With streaming

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("codewithdark/Llama-3.2-3B-4bit-mlx")

prompt = "Explain quantum computing"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# stream_generate yields response objects; print the text of each chunk.
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(response.text, end="", flush=True)
```

### Command Line

```bash
# Install mlx-lm
pip install mlx-lm

# Generate text
python -m mlx_lm.generate --model codewithdark/Llama-3.2-3B-4bit-mlx --prompt "Hello!"

# Chat mode
python -m mlx_lm.chat --model codewithdark/Llama-3.2-3B-4bit-mlx
```

## Requirements

- Apple Silicon Mac (M1/M2/M3/M4)
- macOS 13.0 or later
- Python 3.10+
- mlx-lm: `pip install mlx-lm`
- Roughly 2 GB of free unified memory (approximate: ~3.2B parameters × 4 bits ≈ 1.6 GB of weights, plus KV cache and runtime overhead)

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) |
| Format | MLX |
| Quantization | 4-bit |
| License | apache-2.0 |
| Created | 2025-12-19 |

---

## About QuantLLM

This model was converted with [QuantLLM](https://github.com/codewithdark-git/QuantLLM), an ultra-fast LLM quantization and export library.

```python
from quantllm import turbo

# Load and quantize any model
model = turbo("meta-llama/Llama-3.2-3B")

# Export to any format
model.export("mlx", quantization="4bit")
```

⭐ Star us on [GitHub](https://github.com/codewithdark-git/QuantLLM)!
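
---

As a quick sanity check before downloading the weights, here is a minimal sketch (assuming only `mlx`, which `pip install mlx-lm` pulls in as a dependency) that confirms the machine can use the Metal backend that MLX inference requires:

```python
import platform

import mlx.core as mx

# Apple Silicon reports "arm64"; MLX does not support Intel Macs.
print("Architecture:", platform.machine())

# Whether the Metal GPU backend is available on this machine.
print("Metal available:", mx.metal.is_available())

# The default device should be the GPU when Metal is available.
print("Default device:", mx.default_device())
```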