feat: Add GLM-4.7-Flash-30B (Original) IQ2_XXS 2-bit quantization
- Added bartowski/zai-org_GLM-4.7-Flash-GGUF with IQ2_XXS quantization
- 30B parameters, 128K context, 2-bit ultra-compressed (~5GB)
- Original zai-org model (not REAP variant)
- Updated README: 23→24 models
README.md
CHANGED

```diff
@@ -12,11 +12,11 @@ license: mit
 
 # Tiny Scribe
 
-A lightweight transcript summarization tool powered by local LLMs. Features 23 models ranging from 100M to 30B parameters with live streaming output, reasoning modes, and flexible deployment options.
+A lightweight transcript summarization tool powered by local LLMs. Features 24 models ranging from 100M to 30B parameters with live streaming output, reasoning modes, and flexible deployment options.
 
 ## Features
 
-- **23 Local Models**: From tiny 100M models to powerful 30B models
+- **24 Local Models**: From tiny 100M models to powerful 30B models
 - **Live Streaming**: Real-time summary generation with token-by-token output
 - **Model Selection**: Dropdown to choose from 22 available models
 - **Reasoning Modes**: Toggle thinking/reasoning for supported models (Qwen3, ERNIE, LFM2)
@@ -26,7 +26,7 @@ A lightweight transcript summarization tool powered by local LLMs. Features 23 m
 - **Language Support**: English or Traditional Chinese (zh-TW) output via OpenCC
 - **Auto Settings**: Temperature, top_p, and top_k sliders auto-populate per model
 
-## Model Registry (23 Models)
+## Model Registry (24 Models)
 
 ### Tiny Models (0.1-0.6B)
 - **Falcon-H1-100M** - 100M parameters, 4K context
@@ -55,14 +55,15 @@ A lightweight transcript summarization tool powered by local LLMs. Features 23 m
 ### Medium Models (21-30B)
 - **ERNIE-4.5-21B-PT** - 21B parameters, 32K context
 - **ERNIE-4.5-21B-Thinking** - 21B parameters, 32K context (reasoning)
-- **GLM-4.7-Flash-…
-- **…
-- **Qwen3-30B-A3B-…
+- **GLM-4.7-Flash-30B-REAP** - 30B parameters, 128K context (TQ1_0, REAP variant)
+- **GLM-4.7-Flash-30B-Original-IQ2** - 30B parameters, 128K context (IQ2_XXS 2-bit, original zai-org)
+- **Qwen3-30B-A3B-Thinking** - 30B parameters, 256K context (reasoning)
+- **Qwen3-30B-A3B-Instruct** - 30B parameters, 256K context
 
 ## Usage
 
 1. **Select Output Language**: Choose English or Traditional Chinese (zh-TW)
-2. **Select Model**: Choose from the dropdown of 23 available models
+2. **Select Model**: Choose from the dropdown of 24 available models
 3. **Configure Settings** (optional):
    - Enable "Use Reasoning Mode" for thinking models
    - Adjust Temperature, Top-p, and Top-k (auto-populated per model)
```
app.py
CHANGED

```diff
@@ -302,8 +302,8 @@ AVAILABLE_MODELS = {
             "repeat_penalty": 1.1,
         },
     },
-    "…
-        "name": "GLM-4.7-Flash-REAP-…
+    "glm_4_7_flash_reap_30b": {
+        "name": "GLM-4.7-Flash-REAP-30B Thinking (128K Context)",
         "repo_id": "unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF",
         "filename": "*TQ1_0.gguf",
         "max_context": 131072,
@@ -316,6 +316,20 @@
             "repeat_penalty": 1.05,
         },
     },
+    "glm_4_7_flash_30b_iq2": {
+        "name": "GLM-4.7-Flash-30B (Original) IQ2_XXS (128K Context)",
+        "repo_id": "bartowski/zai-org_GLM-4.7-Flash-GGUF",
+        "filename": "*IQ2_XXS.gguf",
+        "max_context": 131072,
+        "default_temperature": 0.6,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.6,
+            "top_p": 0.95,
+            "top_k": 20,
+            "repeat_penalty": 1.05,
+        },
+    },
     "qwen3_30b_thinking_q1": {
         "name": "Qwen3 30B Thinking (256K Context)",
         "repo_id": "unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF",
```