---
license: apache-2.0
base_model: meta-llama/Llama-3.2-3B
library_name: mlx
language:
- en
tags:
- quantllm
- mlx
- mlx-lm
- apple-silicon
- 4bit
- transformers
---
|
|
|
|
|
# Llama-3.2-3B-4bit-mlx |
|
|
   |
|
|
|
|
|
|
|
|
## Description |
|
|
|
|
|
This is **meta-llama/Llama-3.2-3B** converted to the MLX format and quantized to 4-bit for efficient inference on Apple Silicon (M1/M2/M3/M4) Macs.
|
|
|
|
|
- **Base Model**: [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)
- **Format**: MLX
- **Quantization**: 4bit
- **Created with**: [QuantLLM](https://github.com/codewithdark-git/QuantLLM)
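
This repository was exported with QuantLLM (see *About QuantLLM* below). For comparison, a roughly equivalent conversion can be produced with mlx-lm's own `convert` helper; the sketch below is illustrative only, and the output path and quantization arguments are assumptions rather than the exact settings used for this repo.

```python
# Illustrative conversion sketch using mlx-lm (not the tool used to build this repo).
# The output directory name and quantization arguments are assumptions.
from mlx_lm import convert

convert(
    "meta-llama/Llama-3.2-3B",          # Hugging Face model to convert
    mlx_path="Llama-3.2-3B-4bit-mlx",   # local output directory (assumed name)
    quantize=True,                      # quantize the weights
    q_bits=4,                           # 4-bit weights
)
```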
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Generate text with mlx-lm |
|
|
|
|
|
```python
from mlx_lm import load, generate

# Download (if needed) and load the 4-bit MLX weights and tokenizer
model, tokenizer = load("codewithdark/Llama-3.2-3B-4bit-mlx")

prompt = "Write a story about Einstein"

# Llama-3.2-3B is a base model; only apply a chat template if one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
|
|
|
|
|
### With streaming |
|
|
|
|
|
```python
from mlx_lm import load, stream_generate

model, tokenizer = load("codewithdark/Llama-3.2-3B-4bit-mlx")

prompt = "Explain quantum computing"

# Llama-3.2-3B is a base model; only apply a chat template if one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Recent mlx-lm releases yield GenerationResponse objects; print their text field
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(response.text, end="", flush=True)
```
|
|
|
|
|
### Command Line |
|
|
|
|
|
```bash
# Install mlx-lm
pip install mlx-lm

# Generate text
python -m mlx_lm.generate --model codewithdark/Llama-3.2-3B-4bit-mlx --prompt "Hello!"

# Chat mode
python -m mlx_lm.chat --model codewithdark/Llama-3.2-3B-4bit-mlx
```
|
|
|
|
|
## Requirements |
|
|
|
|
|
- Apple Silicon Mac (M1/M2/M3/M4)
- macOS 13.0 or later
- Python 3.10+
- mlx-lm: `pip install mlx-lm`
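
To confirm the environment is set up correctly, a quick check like the sketch below (assuming only that `mlx` is installed alongside `mlx-lm`) should report the Metal GPU as the default device:

```python
# Minimal environment check: MLX should select the Metal GPU on Apple Silicon
import mlx.core as mx

print(mx.default_device())  # expected to report the GPU, e.g. Device(gpu, 0)
```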
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
| Property | Value |
|----------|-------|
| Base Model | [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) |
| Format | MLX |
| Quantization | 4bit |
| License | apache-2.0 |
| Created | 2025-12-19 |
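
As a rough estimate (assuming MLX's default 4-bit, group-size-64 quantization, i.e. about 4.5 bits per weight once per-group scales and biases are included), the model's roughly 3.2B parameters work out to about 3.2e9 × 4.5 / 8 ≈ 1.8 GB of weights, so it should run comfortably within 8 GB of unified memory.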
|
|
|
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## About QuantLLM |
|
|
|
|
|
This model was converted with [QuantLLM](https://github.com/codewithdark-git/QuantLLM), an ultra-fast LLM quantization and export library.
|
|
|
|
|
```python
from quantllm import turbo

# Load and quantize any model
model = turbo("meta-llama/Llama-3.2-3B")

# Export to any format
model.export("mlx", quantization="4bit")
```
|
|
|
|
|
⭐ Star us on [GitHub](https://github.com/codewithdark-git/QuantLLM)! |