# Update README.md

README.md (changed):

@@ -15,12 +15,23 @@ High-quality, Apple-Silicon–optimized **MLX** builds, tools, and evals — foc…
> Target use: fast, reliable **interactive chat** and light batch workloads.

---

## 🚀 Featured models

### gpt-oss-20b (MLX)

| Repo | Bits/GS | Footprint | Notes |
|---|---:|---:|---|
| **halley-ai/gpt-oss-20b-MLX-5bit-gs32** | Q5 / 32 | ~15.8 GB | Small drop vs 6-bit (~3–6% PPL); “fits 16 GB” VRAM when GPU buffer limits matter. |
| **halley-ai/gpt-oss-20b-MLX-6bit-gs32** | Q6 / 32 | ~18.4 GB | Best of the group; strong quality/footprint tradeoff. |
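A minimal usage sketch with `mlx-lm` (`pip install mlx-lm`) on an Apple Silicon Mac, assuming its current `load`/`generate` API; the prompt and token budget below are placeholders:

```python
from mlx_lm import load, generate

# Repo id taken from the table above; any of the featured builds loads the same way.
model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-6bit-gs32")

# Placeholder prompt and token budget for a quick interactive-chat style check.
text = generate(
    model,
    tokenizer,
    prompt="Explain the difference between threads and processes.",
    max_tokens=256,
)
print(text)
```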
### gpt-oss-120b (MLX)

| Repo | Bits/GS | Memory | Notes |
|---|---:|---|---|
| [halley-ai/gpt-oss-120b-MLX-8bit-gs32](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-8bit-gs32) | Q8 / 32 | 64 GB+ (96 GB ideal) | Reference int8; stable and simple to use. |
| [halley-ai/gpt-oss-120b-MLX-bf16](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-bf16) | bf16 | 64–96 GB recommended | Non-quantized reference for evaluation/ground truth. |
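The Bits/GS column reads as quantization bit-width over group size (e.g. Q8 / 32 is 8-bit weights quantized in groups of 32). As a hedged sketch of how such a build could be produced, assuming the current `mlx_lm.convert` API and an upstream source repo (both assumptions, not part of this README):

```python
from mlx_lm import convert

# Hedged sketch: parameter names assume the current mlx_lm.convert API.
convert(
    hf_path="openai/gpt-oss-120b",            # upstream weights; assumed source repo
    mlx_path="./gpt-oss-120b-MLX-8bit-gs32",  # local output directory
    quantize=True,
    q_bits=8,         # "Bits": quantized bit-width
    q_group_size=32,  # "GS": quantization group size
)
```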
Docs: see `docs/model-cards/gpt-oss-120b-MLX-6bit-gs64.md`, `docs/model-cards/gpt-oss-120b-MLX-8bit-gs32.md`, and `docs/model-cards/gpt-oss-120b-MLX-bf16.md`.
**Format:** MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
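For those stacks, a minimal sketch with the `llama-cpp-python` bindings (`pip install llama-cpp-python`); the GGUF path is a placeholder, since this repo publishes MLX builds only:

```python
from llama_cpp import Llama

# Placeholder path: point this at whichever GGUF build you downloaded.
llm = Llama(model_path="./gpt-oss-20b-Q6_K.gguf", n_ctx=4096)

out = llm("Summarize the MLX project in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```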