tomkay committed on
Commit 25ebac1 · verified · 1 Parent(s): 93b47f4

Update model card: remove MINT/SWAN branding, optimised by baa.ai

Files changed (1)
  1. README.md +1 -30
README.md CHANGED
@@ -4,20 +4,15 @@ tags:
  - mlx
  - quantized
  - mixed-precision
- - swan
  license: other
  license_name: polyform-noncommercial
  base_model: THU-KEG/GLM-5-0817
  base_model_relation: quantized
  ---
 
- <p align="center">
- <img src="https://huggingface.co/spaces/baa-ai/MINT/resolve/main/baa-logo.svg" width="300" alt="baa.ai">
- </p>
-
  # GLM-5-SWAN-5bit-MLX
 
- Mixed-precision quantized version of [THUDM/GLM-5](https://huggingface.co/THUDM/GLM-5) using [SWAN](https://github.com/baa-ai/MINT) | [MINT-UI](https://github.com/baa-ai/MINT-UI).
+ Mixed-precision quantized version of [THUDM/GLM-5](https://huggingface.co/THUDM/GLM-5) optimised by [baa.ai](https://baa.ai).
 
  > GLM-5 (355B parameters). Experimental.
 
@@ -31,25 +26,6 @@ Mixed-precision quantized version of [THUDM/GLM-5](https://huggingface.co/THUDM/
  | WikiText-2 PPL | — |
 
 
-
- ## 🚀 Create Your Own Custom Quantization
-
- **Don't see the size you need?** Use [**MINT-UI**](https://github.com/baa-ai/MINT-UI) to create a custom-sized quantization targeting your exact memory budget:
-
- ```bash
- pip install mint-ui
- mint-ui
- ```
-
- MINT-UI analyzes any model in **under 60 seconds** using a cutting-edge allocation technique — no calibration data needed. Specify your exact memory target (e.g., "fit in 24 GB for RTX 4090") and MINT returns a near-optimal per-tensor bit-width allocation.
-
- - ⚡ **60 seconds** analysis (vs hours for GPTQ/AWQ calibration)
- - 🎯 **Any target size** — not limited to uniform 4-bit or 8-bit
- - 🧠 **Data-free** — no calibration dataset required
- - 💻 **Runs on any Mac** — even 32 GB machines can analyze 400B models
-
- 👉 **[Get MINT-UI](https://github.com/baa-ai/MINT-UI)** | 📄 **[MINT Paper](https://github.com/baa-ai/MINT) | [MINT-UI](https://github.com/baa-ai/MINT-UI)** | 🤗 **[All Models](https://huggingface.co/baa-ai)**
-
  ## Usage
 
  ```python
@@ -60,11 +36,6 @@ response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
  print(response)
  ```
 
- ## About SWAN
-
- SWAN uses data-free per-tensor sensitivity analysis with composite scoring to allocate bit-widths across model layers.
-
- - [Paper](https://huggingface.co/spaces/baa-ai/MINT) | [Code](https://github.com/baa-ai/MINT) | [MINT-UI](https://github.com/baa-ai/MINT-UI) | [Models](https://huggingface.co/baa-ai)
 
  ---
  *Quantized by [baa.ai](https://baa.ai)*