feat: Add GLM-4.7-Flash-30B (Original) IQ2_XXS 2-bit quantization
- Added bartowski/zai-org_GLM-4.7-Flash-GGUF with IQ2_XXS quantization
- 30B parameters, 128K context, 2-bit ultra-compressed (~5GB)
- Original zai-org model (not REAP variant)
- Updated README: 23→24 models
README.md
CHANGED

```diff
@@ -12,11 +12,11 @@ license: mit
 
 # Tiny Scribe
 
-A lightweight transcript summarization tool powered by local LLMs. Features 23 models ranging from 100M to 30B parameters with live streaming output, reasoning modes, and flexible deployment options.
+A lightweight transcript summarization tool powered by local LLMs. Features 24 models ranging from 100M to 30B parameters with live streaming output, reasoning modes, and flexible deployment options.
 
 ## Features
 
-- **23 Local Models**: From tiny 100M models to powerful 30B models
+- **24 Local Models**: From tiny 100M models to powerful 30B models
 - **Live Streaming**: Real-time summary generation with token-by-token output
 - **Model Selection**: Dropdown to choose from 22 available models
 - **Reasoning Modes**: Toggle thinking/reasoning for supported models (Qwen3, ERNIE, LFM2)
@@ -26,7 +26,7 @@ A lightweight transcript summarization tool powered by local LLMs. Features 23 m
 - **Language Support**: English or Traditional Chinese (zh-TW) output via OpenCC
 - **Auto Settings**: Temperature, top_p, and top_k sliders auto-populate per model
 
-## Model Registry (23 Models)
+## Model Registry (24 Models)
 
 ### Tiny Models (0.1-0.6B)
 - **Falcon-H1-100M** - 100M parameters, 4K context
@@ -55,14 +55,15 @@ A lightweight transcript summarization tool powered by local LLMs. Features 23 m
 ### Medium Models (21-30B)
 - **ERNIE-4.5-21B-PT** - 21B parameters, 32K context
 - **ERNIE-4.5-21B-Thinking** - 21B parameters, 32K context (reasoning)
-- **GLM-4.7-Flash-…
-- **…
-- **Qwen3-30B-A3B-…
+- **GLM-4.7-Flash-30B-REAP** - 30B parameters, 128K context (TQ1_0, REAP variant)
+- **GLM-4.7-Flash-30B-Original-IQ2** - 30B parameters, 128K context (IQ2_XXS 2-bit, original zai-org)
+- **Qwen3-30B-A3B-Thinking** - 30B parameters, 256K context (reasoning)
+- **Qwen3-30B-A3B-Instruct** - 30B parameters, 256K context
 
 ## Usage
 
 1. **Select Output Language**: Choose English or Traditional Chinese (zh-TW)
-2. **Select Model**: Choose from the dropdown of 23 available models
+2. **Select Model**: Choose from the dropdown of 24 available models
 3. **Configure Settings** (optional):
    - Enable "Use Reasoning Mode" for thinking models
    - Adjust Temperature, Top-p, and Top-k (auto-populated per model)
```
app.py
CHANGED

```diff
@@ -302,8 +302,8 @@ AVAILABLE_MODELS = {
             "repeat_penalty": 1.1,
         },
     },
-    "…
-        "name": "GLM-4.7-Flash-REAP-…
+    "glm_4_7_flash_reap_30b": {
+        "name": "GLM-4.7-Flash-REAP-30B Thinking (128K Context)",
         "repo_id": "unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF",
         "filename": "*TQ1_0.gguf",
         "max_context": 131072,
@@ -316,6 +316,20 @@
             "repeat_penalty": 1.05,
         },
     },
+    "glm_4_7_flash_30b_iq2": {
+        "name": "GLM-4.7-Flash-30B (Original) IQ2_XXS (128K Context)",
+        "repo_id": "bartowski/zai-org_GLM-4.7-Flash-GGUF",
+        "filename": "*IQ2_XXS.gguf",
+        "max_context": 131072,
+        "default_temperature": 0.6,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.6,
+            "top_p": 0.95,
+            "top_k": 20,
+            "repeat_penalty": 1.05,
+        },
+    },
     "qwen3_30b_thinking_q1": {
         "name": "Qwen3 30B Thinking (256K Context)",
         "repo_id": "unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF",
```