Luigi committed on
Commit 5c93b30 · 1 Parent(s): e187d3a

feat: Add GLM-4.7-Flash-30B (Original) IQ2_XXS 2-bit quantization


- Added bartowski/zai-org_GLM-4.7-Flash-GGUF with IQ2_XXS quantization
- 30B parameters, 128K context, 2-bit ultra-compressed (~5GB)
- Original zai-org model (not REAP variant)
- Updated README: 23→24 models

Files changed (2)
  1. README.md +8 -7
  2. app.py +16 -2
README.md CHANGED
@@ -12,11 +12,11 @@ license: mit
 
 # Tiny Scribe
 
-A lightweight transcript summarization tool powered by local LLMs. Features 23 models ranging from 100M to 30B parameters with live streaming output, reasoning modes, and flexible deployment options.
+A lightweight transcript summarization tool powered by local LLMs. Features 24 models ranging from 100M to 30B parameters with live streaming output, reasoning modes, and flexible deployment options.
 
 ## Features
 
-- **23 Local Models**: From tiny 100M models to powerful 30B models
+- **24 Local Models**: From tiny 100M models to powerful 30B models
 - **Live Streaming**: Real-time summary generation with token-by-token output
 - **Model Selection**: Dropdown to choose from 22 available models
 - **Reasoning Modes**: Toggle thinking/reasoning for supported models (Qwen3, ERNIE, LFM2)
@@ -26,7 +26,7 @@ A lightweight transcript summarization tool powered by local LLMs. Features 23 m
 - **Language Support**: English or Traditional Chinese (zh-TW) output via OpenCC
 - **Auto Settings**: Temperature, top_p, and top_k sliders auto-populate per model
 
-## Model Registry (23 Models)
+## Model Registry (24 Models)
 
 ### Tiny Models (0.1-0.6B)
 - **Falcon-H1-100M** - 100M parameters, 4K context
@@ -55,14 +55,15 @@ A lightweight transcript summarization tool powered by local LLMs. Features 23 m
 ### Medium Models (21-30B)
 - **ERNIE-4.5-21B-PT** - 21B parameters, 32K context
 - **ERNIE-4.5-21B-Thinking** - 21B parameters, 32K context (reasoning)
-- **GLM-4.7-Flash-23B-REAP** - 23B parameters, 32K context
-- **Qwen3-30B-A3B-Thinking** - 30B parameters, 32K context (reasoning)
-- **Qwen3-30B-A3B-Instruct** - 30B parameters, 32K context
+- **GLM-4.7-Flash-30B-REAP** - 30B parameters, 128K context (TQ1_0, REAP variant)
+- **GLM-4.7-Flash-30B-Original-IQ2** - 30B parameters, 128K context (IQ2_XXS 2-bit, original zai-org)
+- **Qwen3-30B-A3B-Thinking** - 30B parameters, 256K context (reasoning)
+- **Qwen3-30B-A3B-Instruct** - 30B parameters, 256K context
 
 ## Usage
 
 1. **Select Output Language**: Choose English or Traditional Chinese (zh-TW)
-2. **Select Model**: Choose from the dropdown of 23 available models
+2. **Select Model**: Choose from the dropdown of 24 available models
 3. **Configure Settings** (optional):
    - Enable "Use Reasoning Mode" for thinking models
    - Adjust Temperature, Top-p, and Top-k (auto-populated per model)
app.py CHANGED
@@ -302,8 +302,8 @@ AVAILABLE_MODELS = {
             "repeat_penalty": 1.1,
         },
     },
-    "glm_4_7_flash_reap_23b": {
-        "name": "GLM-4.7-Flash-REAP-23B Thinking (128K Context)",
+    "glm_4_7_flash_reap_30b": {
+        "name": "GLM-4.7-Flash-REAP-30B Thinking (128K Context)",
         "repo_id": "unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF",
         "filename": "*TQ1_0.gguf",
         "max_context": 131072,
@@ -316,6 +316,20 @@ AVAILABLE_MODELS = {
             "repeat_penalty": 1.05,
         },
     },
+    "glm_4_7_flash_30b_iq2": {
+        "name": "GLM-4.7-Flash-30B (Original) IQ2_XXS (128K Context)",
+        "repo_id": "bartowski/zai-org_GLM-4.7-Flash-GGUF",
+        "filename": "*IQ2_XXS.gguf",
+        "max_context": 131072,
+        "default_temperature": 0.6,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.6,
+            "top_p": 0.95,
+            "top_k": 20,
+            "repeat_penalty": 1.05,
+        },
+    },
     "qwen3_30b_thinking_q1": {
         "name": "Qwen3 30B Thinking (256K Context)",
         "repo_id": "unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF",
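As a rough illustration of how a registry entry like the one added above might be consumed, here is a minimal sketch. The entry fields are taken from the diff; the `build_loader_kwargs` helper is hypothetical (the commit does not show the app's loading code), and the mapping onto llama-cpp-python's `Llama.from_pretrained(repo_id=..., filename=...)` is an assumption about how the `*IQ2_XXS.gguf` glob would be resolved:

```python
# Entry fields copied from the app.py diff in this commit.
GLM_47_FLASH_IQ2 = {
    "name": "GLM-4.7-Flash-30B (Original) IQ2_XXS (128K Context)",
    "repo_id": "bartowski/zai-org_GLM-4.7-Flash-GGUF",
    "filename": "*IQ2_XXS.gguf",
    "max_context": 131072,
    "default_temperature": 0.6,
    "supports_toggle": False,
    "inference_settings": {
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "repeat_penalty": 1.05,
    },
}


def build_loader_kwargs(entry, n_ctx=None):
    """Hypothetical helper: split a registry entry into download kwargs,
    context kwargs, and per-model sampling kwargs.

    `n_ctx` is clamped to the entry's max_context so a smaller runtime
    context can be requested without exceeding the model's limit.
    """
    download = {"repo_id": entry["repo_id"], "filename": entry["filename"]}
    n_ctx = min(n_ctx or entry["max_context"], entry["max_context"])
    return download, {"n_ctx": n_ctx}, dict(entry["inference_settings"])


download, ctx, sampling = build_loader_kwargs(GLM_47_FLASH_IQ2, n_ctx=8192)
```

With llama-cpp-python, `download` and `ctx` would plausibly feed `Llama.from_pretrained(**download, **ctx)` (which downloads the first GGUF file matching the glob from the Hub), while `sampling` would be passed per call to the generation method; all of that wiring is assumed, not shown in the commit.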