sebastavar committed · verified
Commit 7ac7963 · 1 Parent(s): 58618c6

Update README.md
Files changed (1): README.md (+16 −5)
README.md CHANGED

```diff
@@ -15,12 +15,23 @@ High-quality, Apple-Silicon–optimized **MLX** builds, tools, and evals — foc
 > Target use: fast, reliable **interactive chat** and light batch workloads.
 
 ---
+
 ## 🚀 Featured models
+
+### gpt-oss-20b (MLX)
+
 | Repo | Bits/GS | Footprint | Notes |
 |---|---:|---:|---|
-| **halley-ai/gpt-oss-20b-MLX-4bit-gs32** | Q4 / 32 | ~13.1 GB | Trades accuracy for footprint; use when RAM is constrained or throughput is the priority. |
-| **halley-ai/gpt-oss-20b-MLX-5bit-gs32** | Q5 / 32 | ~15.8 GB | Small drop vs 6-bit/gs32 and 8-bit/gs64 (~3–6% PPL); “fits-16GB” VRAM when GPU buffer limits matter. |
-| **halley-ai/gpt-oss-20b-MLX-6bit-gs32** | Q6 / 32 | ~18.4 GB | Best of the group; edges out 8-bit/gs64 slightly at a smaller footprint |
-| **Reference (8-bit)** | Q8 / 32 | — | See upstream: `lmstudio-community/gpt-oss-20b-MLX-8bit` |
+| **halley-ai/gpt-oss-20b-MLX-5bit-gs32** | Q5 / 32 | ~15.8 GB | Small drop vs 6-bit (~3–6% PPL); “fits‑16GB” VRAM when GPU buffer limits matter. |
+| **halley-ai/gpt-oss-20b-MLX-6bit-gs32** | Q6 / 32 | ~18.4 GB | Best of the group; strong quality/footprint tradeoff. |
+
+### gpt-oss-120b (MLX)
+
+| Repo | Bits/GS | Memory | Notes |
+|---|---:|---|---|
+| [halley-ai/gpt-oss-120b-MLX-8bit-gs32](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-8bit-gs32) | Q8 / 32 | 64 GB+ (96 GB ideal) | Reference int8; stable and simple to use. |
+| [halley-ai/gpt-oss-120b-MLX-bf16](https://huggingface.co/halley-ai/gpt-oss-120b-MLX-bf16) | bf16 | 64–96 GB recommended | Non-quantized reference for evaluation/ground truth. |
+
+Docs: see `docs/model-cards/gpt-oss-120b-MLX-6bit-gs64.md`, `docs/model-cards/gpt-oss-120b-MLX-8bit-gs32.md`, and `docs/model-cards/gpt-oss-120b-MLX-bf16.md`.
 
-**Format:** MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
+**Format:** MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
```
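For readers trying these MLX repos, a minimal loading sketch follows. It assumes the `mlx-lm` Python package (`pip install mlx-lm`) on an Apple Silicon Mac with enough unified memory for the chosen build; the prompt text and token budget are illustrative, and the repo name is the 6-bit 20b build from the table above.

```python
# Minimal sketch, assuming `pip install mlx-lm` and enough unified memory
# for the chosen build (~18.4 GB for the 6-bit/gs32 20b table entry).
from mlx_lm import load, generate

# First call downloads the repo from the Hugging Face Hub, then loads from cache.
model, tokenizer = load("halley-ai/gpt-oss-20b-MLX-6bit-gs32")

# Illustrative prompt; wrap it in the model's chat template for interactive chat.
messages = [{"role": "user", "content": "Briefly compare Q5/gs32 and Q6/gs32."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```

On Linux/Windows or other non-MLX stacks, the equivalent flow goes through llama.cpp with a GGUF build, per the format note above.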