---
license: apache-2.0
library_name: transformers
tags:
- language
- granite-4.0
- mlx
- open4bits
base_model: ibm-granite/granite-4.0-micro
---

# Open4bits / Granite-4.0-Micro-MLX-3Bit

This repository provides the **Granite-4.0 Micro model quantized to 3-bit in MLX format**, published by Open4bits to enable efficient local inference with low memory usage and broad hardware compatibility.

The underlying Granite-4.0 model and architecture are **developed and owned by their original authors**. This repository contains only a 3-bit quantized MLX conversion of the original model weights.

The model is designed for lightweight, high-performance text generation and instruction-following tasks, making it suitable for local and resource-constrained environments.

Open4bits has started supporting **MLX models** to broaden compatibility with emerging quantization formats and efficient runtimes.

---

## Model Overview

Granite-4.0 Micro is a compact variant of the Granite-4.0 architecture optimized for efficient inference and lower resource footprints.
This release provides a **3-bit quantized checkpoint in MLX format**, enabling fast inference on Apple Silicon and other MLX-supported hardware with reduced memory demands.

---

## Model Details

* **Base Model:** Granite-4.0
* **Variant:** Micro
* **Quantization:** 3-bit
* **Format:** MLX
* **Task:** Text generation, instruction following
* **Weight tying:** Preserved
* **Compatibility:** MLX-enabled inference engines and supported runtimes

This quantized format balances inference performance with lower resource requirements while preserving core architectural design.

---

## Intended Use

This model is intended for:

* Local text generation and chat applications
* CPU-based or resource-efficient deployments
* Research, experimentation, and prototyping
* Offline or self-hosted AI systems
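
A minimal sketch of local inference with the `mlx-lm` package (requires Apple Silicon; the repository ID below is an assumption, so substitute the actual Open4bits repo path):

```python
# Install first: pip install mlx-lm
from mlx_lm import load, generate

# Hypothetical repo ID -- replace with the actual Open4bits repository path.
model, tokenizer = load("Open4bits/Granite-4.0-Micro-MLX-3Bit")

prompt = "Explain 3-bit quantization in one sentence."
# Apply the chat template if the tokenizer provides one.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```

Generation quality and speed will vary with the prompt, sampling settings, and available memory; see the `mlx-lm` documentation for additional options.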

---

## Limitations

* Reduced performance compared to full-precision variants
* Output quality depends on prompt engineering and inference settings
* Not fine-tuned for highly domain-specific tasks

---

## License

This model is released under the **Apache License 2.0**, inherited from the base Granite-4.0 model.
Users must comply with the licensing conditions defined by the original creators.

---

## Support

If you find this model useful, please consider supporting the project.
Your support encourages Open4bits to continue releasing and maintaining efficient open models for the community.