How to use kcvmk/Llama_32_3B_4bit with MLX:

    # Download the model from the Hub
    # (the extras spec is quoted so that zsh does not treat the brackets as a glob)
    pip install "huggingface_hub[hf_xet]"
    huggingface-cli download --local-dir Llama_32_3B_4bit kcvmk/Llama_32_3B_4bit
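After the download finishes, it can help to confirm that the local directory looks complete before pointing an engine at it. The sketch below is only an illustration: `check_model_dir` is a hypothetical helper, and the expected files (a `config.json` plus at least one `*.safetensors` file) are an assumption based on how MLX model repositories are commonly laid out, not something this model card states.

```shell
#!/bin/sh
# Hypothetical helper: checks that a local model directory looks complete.
# Assumption: the repo ships config.json and at least one *.safetensors file.
# Adjust the checks if the actual file layout differs.
check_model_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/config.json" ]; then
        echo "missing config.json in $dir" >&2
        return 1
    fi
    # Succeed as soon as one weights file is found.
    for f in "$dir"/*.safetensors; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    echo "no .safetensors weights found in $dir" >&2
    return 1
}
```

A possible use, run from the directory where the download command above was executed: `check_model_dir Llama_32_3B_4bit && echo "model files present"`.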
Llama 3.2 3B - MLX 4-bit Quantized
Custom MLX 4-bit quantization of meta-llama/Llama-3.2-3B-Instruct optimized for MetalRT GPU inference on Apple Silicon.
Usage
Used by RCLI with the MetalRT engine:
    rcli setup  # select MetalRT or Both engines
Performance (Apple M3 Max)
| Metric | Value |
|---|---|
| Parameters | 3B |
| Quantization | MLX 4-bit |
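The two values in the table are enough for a back-of-the-envelope estimate of the weight footprint: parameters times bits per weight, divided by 8 to get bytes. The sketch below assumes exactly 3,000,000,000 parameters (the table's rounded "3B") and a flat 4 bits per weight; it ignores quantization scales, any layers kept at higher precision, and file metadata, so the real download is likely somewhat larger. `estimate_weight_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough weight-size estimate: parameters * bits per weight / 8 = bytes.
# Assumptions: 3,000,000,000 parameters and a flat 4 bits per weight,
# with no allowance for quantization scales or metadata.
estimate_weight_bytes() {
    params="$1"
    bits="$2"
    echo $(( params * bits / 8 ))
}

estimate_weight_bytes 3000000000 4   # prints 1500000000
```

That is about 1.5 GB as a lower bound for the quantized weights, compared with roughly 6 GB for the same parameter count at 16 bits per weight.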
License
- Model weights: Llama 3.2 Community License (Meta)
- MetalRT engine: Proprietary (RunAnywhere, Inc.)
Contact