Open4bits / Granite-4.0-Micro-MLX-3Bit

This repository provides the Granite-4.0 Micro model quantized to 3-bit in MLX format, published by Open4bits to enable efficient local inference with low memory usage and broad hardware compatibility.

The underlying Granite-4.0 model and architecture are developed and owned by their original authors. This repository contains only a 3-bit quantized MLX conversion of the original model weights.

The model is designed for lightweight, high-performance text generation and instruction-following tasks, making it suitable for local and resource-constrained environments.

Open4bits has started supporting MLX models to broaden compatibility with emerging quantization formats and efficient runtimes.


Model Overview

Granite-4.0 Micro is a compact variant of the Granite-4.0 architecture optimized for efficient inference and a lower resource footprint. This release provides a 3-bit quantized checkpoint in MLX format, enabling fast, memory-efficient inference on Apple Silicon and other MLX-supported runtimes.
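A minimal loading and generation sketch using the mlx-lm package is shown below. It assumes a recent mlx-lm release and Apple Silicon hardware; the repository id follows the model tree entry for this card.

```python
# Minimal generation sketch with mlx-lm (assumes `pip install mlx-lm`;
# API names follow recent mlx-lm releases).
from mlx_lm import load, generate

# Download and load the 3-bit MLX weights and tokenizer from the Hub.
model, tokenizer = load("Open4bits/granite-4.0-micro-mlx-3Bit")

# Generate a short completion from a plain-text prompt.
prompt = "Explain what 3-bit quantization trades off, in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
print(text)
```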


Model Details

  • Base Model: Granite-4.0
  • Variant: Micro
  • Quantization: 3-bit
  • Format: MLX
  • Task: Text generation, instruction following
  • Weight tying: Preserved
  • Compatibility: MLX-enabled inference engines and supported runtimes

This quantized release balances inference performance against memory and compute requirements while preserving the core architectural design.
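For context, 3-bit MLX checkpoints of this kind are typically produced with mlx-lm's conversion utility. The sketch below is illustrative only, assuming a recent mlx-lm release and the upstream Granite repository id; it is not necessarily the exact recipe used for this release.

```python
# Illustrative quantization sketch (not the exact recipe used for this release):
# convert an original Hugging Face checkpoint to 3-bit MLX weights.
from mlx_lm import convert

convert(
    hf_path="ibm-granite/granite-4.0-micro",  # assumed upstream repository id
    mlx_path="granite-4.0-micro-mlx-3bit",    # local output directory
    quantize=True,
    q_bits=3,          # 3-bit weights
    q_group_size=64,   # common default quantization group size
)
```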


Intended Use

This model is intended for:

  • Local text generation and chat applications (a minimal chat sketch follows this list)
  • CPU-based or resource-efficient deployments
  • Research, experimentation, and prototyping
  • Offline or self-hosted AI systems
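For the chat use case noted above, a minimal sketch using the tokenizer's chat template follows; it assumes the bundled tokenizer ships a chat template, as Granite instruction-tuned checkpoints typically do.

```python
# Chat-style generation sketch: format the conversation with the tokenizer's
# chat template before generating (assumes mlx-lm and a chat-capable tokenizer).
from mlx_lm import load, generate

model, tokenizer = load("Open4bits/granite-4.0-micro-mlx-3Bit")

messages = [
    {"role": "user", "content": "Summarize why 3-bit MLX weights help on laptops."}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

reply = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(reply)
```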

Limitations

  • Reduced output quality compared to full-precision or higher-bit variants
  • Output quality depends on prompting and inference settings such as sampling parameters (see the sketch after this list)
  • Not fine-tuned for highly domain-specific tasks
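Because output quality is sensitive to inference settings, explicitly tuning the sampler can help. The sketch below assumes a recent mlx-lm release in which generate accepts a sampler built with make_sampler.

```python
# Sampling-settings sketch: conservative temperature and nucleus sampling often
# stabilize a heavily quantized model (API assumes a recent mlx-lm release).
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("Open4bits/granite-4.0-micro-mlx-3Bit")

sampler = make_sampler(temp=0.3, top_p=0.9)  # conservative sampling settings
text = generate(
    model,
    tokenizer,
    prompt="List three good uses of a small local language model.",
    max_tokens=128,
    sampler=sampler,
)
print(text)
```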

License

This model is distributed under the Apache License 2.0, inherited from the base Granite-4.0 model. Users must comply with the licensing conditions defined by the original creators.


Support

If you find this model useful, please consider supporting the project. Your support encourages Open4bits to continue releasing and maintaining efficient open models for the community.
