sebastavar committed
Commit c63e383 · verified · Parent: 33aec5d

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -21,6 +21,6 @@ High-quality, Apple-Silicon–optimized **MLX** builds, tools, and evals — foc
 | **HalleyAI/gpt-oss-20b-MLX-4bit-gs32** | Q4 / 32 | ~13.1 GB | Trades accuracy for footprint; use when RAM is constrained or throughput is the priority. |
 | **HalleyAI/gpt-oss-20b-MLX-5bit-gs32** | Q5 / 32 | ~15.8 GB | Small drop vs 6-bit/gs32 and 8-bit/gs64 (~3–6% PPL); “fits-16GB” VRAM when GPU buffer limits matter. |
 | **HalleyAI/gpt-oss-20b-MLX-6bit-gs32** | Q6 / 32 | ~18.4 GB | Best of the group; edges out 8-bit/gs64 slightly at a smaller footprint |
-| **Reference (8-bit)** | Q8 / 32 | — | Use upstream: `lmstudio-community/gpt-oss-20b-MLX-8bit` |
+| **Reference (8-bit)** | Q8 / 32 | — | See upstream: `lmstudio-community/gpt-oss-20b-MLX-8bit` |
 
 **Format:** MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
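The footprint column in the table above is consistent with a simple back-of-the-envelope estimate. A minimal sketch, assuming roughly 21B parameters for gpt-oss-20b and one fp16 scale plus one fp16 bias per quantization group of 32 weights (both assumptions; neither figure is stated in the diff):

```python
# Rough on-disk footprint estimate for group-quantized MLX weights.
# ASSUMPTIONS (not from the commit): ~21B parameters, and fp16 scale + fp16
# bias stored per group of 32 weights, amortized as extra bits per weight.

PARAMS = 21e9                         # assumed total parameter count
GROUP_SIZE = 32                       # the "gs32" in these build names
OVERHEAD_BITS = 2 * 16 / GROUP_SIZE   # scale + bias, per weight: 1 extra bit

def footprint_gb(bits: int) -> float:
    """Estimated size in GB (1 GB = 1e9 bytes) at `bits` per weight."""
    return PARAMS * (bits + OVERHEAD_BITS) / 8 / 1e9

for bits in (4, 5, 6):
    print(f"Q{bits}/gs32: ~{footprint_gb(bits):.1f} GB")
```

With these assumptions the estimates land close to the listed ~13.1 GB, ~15.8 GB, and ~18.4 GB, which suggests the sizes in the table are raw weight storage plus per-group metadata rather than peak runtime memory.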