sebastavar committed on
Commit b6a165a · verified · 1 Parent(s): e41b745

Update README.md

Files changed (1): README.md (+1 -2)
README.md CHANGED
@@ -13,13 +13,12 @@ sdk_version: 5.42.0
 High-quality, Apple-Silicon–optimized **MLX** builds, tools, and evals — focused on practical, on-prem inference for small teams.
 > We publish **Mixture-of-Experts (MoE)** models and MLX quantizations tuned for M-series Macs (Metal + unified memory).
 > Target use: fast, reliable **interactive chat** and light batch workloads.
+
 ---
 ## 🚀 Featured models
-
 | Repo | Bits/GS | Footprint | Notes |
 |---|---:|---:|---|
 | **HalleyAI/gpt-oss-20b-MLX-4bit-gs32** | Q4 / 32 | ~13.1 GB | Best speed on 32 GB; near-baseline quality (+1.81% PPL vs 8-bit) |
 | **HalleyAI/gpt-oss-20b-MLX-6bit-gs32** | Q6 / 32 | ~18.4 GB | Near-Q8 fidelity (-0.51% PPL vs 8-bit) |
 | **Reference (8-bit)** | Q8 / 32 | — | Use upstream: `lmstudio-community/gpt-oss-20b-MLX-8bit` |
-
 > **Format:** MLX (not GGUF). For Linux/Windows or non-MLX stacks, use a GGUF build with llama.cpp.
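
To sanity-check one of the featured builds on an M-series Mac, a minimal sketch using the `mlx-lm` package is shown below. This is an assumption on our part: the README snippet above does not include usage code, and the prompt and `max_tokens` value are purely illustrative.

```python
# Minimal smoke test for one of the featured MLX builds on Apple Silicon.
# Assumes `pip install mlx-lm` and enough unified memory for the ~13.1 GB
# 4-bit/gs32 footprint listed in the table above.
from mlx_lm import load, generate

# First call downloads the weights from the Hub, then runs on Metal.
model, tokenizer = load("HalleyAI/gpt-oss-20b-MLX-4bit-gs32")

# Format an interactive-chat prompt with the model's chat template.
messages = [{"role": "user", "content": "Summarize MLX in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```

Per the table, the 4-bit/gs32 repo keeps the footprint around 13.1 GB, leaving headroom on a 32 GB machine; swap in `HalleyAI/gpt-oss-20b-MLX-6bit-gs32` when near-Q8 fidelity matters more than speed.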