docs: update README with new UI features and custom GGUF loading

- Add Custom GGUF loading feature description
- Document tabbed interface (Preset Models / Custom GGUF)
- Add Hardware Configuration table
- New section: Custom GGUF Models how-to
- Update Usage section to match new UI flow
- Add gradio_huggingfacehub_search to Technical Details
- Fix repository link to Luigi/tiny-scribe
- Condense model registry format
README.md
CHANGED
@@ -12,93 +12,93 @@ license: mit

# Tiny Scribe

-A lightweight transcript summarization tool powered by local LLMs. Features 24 models ranging from 100M to 30B parameters

## Features

-- **24
- **Live Streaming**: Real-time summary generation with token-by-token output
-- **Model Selection**: Dropdown to choose from 22 available models
- **Reasoning Modes**: Toggle thinking/reasoning for supported models (Qwen3, ERNIE, LFM2)
- **Thinking Buffer**: Automatic 50% context window extension when reasoning enabled
-- **
- **File Upload**: Upload .txt files to summarize
- **Language Support**: English or Traditional Chinese (zh-TW) output via OpenCC
-- **Auto Settings**: Temperature, top_p, and top_k

-##
### Tiny Models (0.1-0.6B)
-- **Granite-3.1-0.35B-A600M** - 350M parameters, 4K context
-- **Granite-3.3-0.35B-A800M** - 350M parameters, 4K context
-- **BitCPM4-0.5B** - 500M parameters, 32K context
-- **Hunyuan-0.5B** - 500M parameters, 4K context
-- **Qwen3-0.6B** - 600M parameters, 4K context

### Compact Models (1.5-2.6B)
-- **Qwen3-1.7B-Thinking** - 1.7B parameters, 32K context (reasoning)
-- **Granite-3.3-2B** - 2B parameters, 4K context
-- **Youtu-LLM-2B** - 2B parameters, 8K context (reasoning toggle)
-- **LFM2-2.6B-Transcript** - 2.6B parameters, 32K context (transcript-specialized)

### Standard Models (3-7B)
-- **Breeze-3B-Q4** - 3B parameters, 32K context
-- **Qwen3-4B-Thinking** - 4B parameters, 8K context (reasoning)
-- **Granite-4.0-Tiny-7B** - 7B parameters, 8K context

-### Medium Models (21-30B)
-- **ERNIE-4.5-21B-PT** - 21B parameters, 32K context
-- **ERNIE-4.5-21B-Thinking** - 21B parameters, 32K context (reasoning)
-- **GLM-4.7-Flash-30B-REAP** - 30B parameters, 128K context (TQ1_0, REAP variant)
-- **GLM-4.7-Flash-30B-Original-IQ2** - 30B parameters, 128K context (IQ2_XXS 2-bit, original zai-org)
-- **Qwen3-30B-A3B-Thinking** - 30B parameters, 256K context (reasoning)
-- **Qwen3-30B-A3B-Instruct** - 30B parameters, 256K context
-##
-3. **Configure Settings** (optional):
-   - Enable "Use Reasoning Mode" for thinking models
-   - Adjust Temperature, Top-p, and Top-k (auto-populated per model)
-4. **Upload File**: Upload a .txt file containing your transcript
-5. **Click Summarize**: Watch the summary appear in real-time!

## Technical Details

- **Inference Engine**: llama-cpp-python
-- **Model Format**: GGUF (
-- **Context Windows**: 4K–
- **UI Framework**: Gradio with streaming support
- **Language Conversion**: OpenCC for Traditional Chinese (zh-TW)
- **Deployment**: Docker (HuggingFace Spaces compatible)
## Reasoning Mode

-For models that support thinking/reasoning (marked with
- Automatically extends context window by 50%
- Provides reasoning steps before the final summary
- Toggle on/off per generation

-## GPU Acceleration
-
-Set the `N_GPU_LAYERS` environment variable:
-- `-1` or high value: Use GPU for all layers
-- `0`: CPU-only inference
-- Default: Automatically detects GPU availability

## Limitations

-- **Input Size**: Varies by model (4K–
-- **First Load**: 10–60 seconds depending on model size
-- **CPU Inference**: Free tier runs on CPU;
-- **

## CLI Usage

@@ -110,10 +110,10 @@ python summarize_transcript.py -i ./transcripts/short.txt

python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW

# Use specific model
-python summarize_transcript.py -

# CPU only
-python summarize_transcript.py -
```
## Requirements

@@ -122,12 +122,10 @@ python summarize_transcript.py -i ./transcripts/short.txt -c

pip install -r requirements.txt
```

-See `requirements.txt` for full dependencies including llama-cpp-python, gradio, and opencc.

## Repository

-[tiny-scribe](https://huggingface.co/spaces/

## License

-MIT License
# Tiny Scribe

+A lightweight transcript summarization tool powered by local LLMs. Features 24+ preset models ranging from 100M to 30B parameters, plus the ability to load any GGUF model from HuggingFace Hub. Includes live streaming output, reasoning modes, and flexible deployment options.

## Features

+- **24+ Preset Models**: From tiny 100M models to powerful 30B models
+- **Custom GGUF Loading**: Load any GGUF model from HuggingFace Hub
+- **Tabbed Interface**: Clean separation between Preset Models and Custom GGUF
- **Live Streaming**: Real-time summary generation with token-by-token output
- **Reasoning Modes**: Toggle thinking/reasoning for supported models (Qwen3, ERNIE, LFM2)
- **Thinking Buffer**: Automatic 50% context window extension when reasoning enabled
+- **Hardware Presets**: Free Tier (2 vCPUs), Upgrade (8 vCPUs), or Custom thread count
- **File Upload**: Upload .txt files to summarize
- **Language Support**: English or Traditional Chinese (zh-TW) output via OpenCC
+- **Auto Settings**: Temperature, top_p, and top_k auto-populate per model
+## Usage
+
+1. **Upload File**: Upload a .txt file containing your transcript
+2. **Select Output Language**: Choose English or Traditional Chinese (zh-TW)
+3. **Choose Model**:
+   - **Preset Models tab**: Select from 24+ curated models
+   - **Custom GGUF tab**: Search and load any GGUF from HuggingFace
+4. **Configure Settings** (optional, in Advanced Settings):
+   - Hardware tier (CPU threads)
+   - Temperature, Top-p, Top-k inference parameters
+5. **Click Generate Summary**: Watch the thinking process and summary appear in real-time!
+## Custom GGUF Models
+
+Load any GGUF model from HuggingFace Hub:
+
+1. Switch to the **🔧 Custom GGUF** tab
+2. Search for a model (e.g., "qwen", "llama", "phi")
+3. Select a GGUF file (quantization level)
+4. Click **Load Selected Model**
+5. The model will be downloaded and cached locally
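Step 3 above amounts to filtering a repo's file listing by quantization tag. A minimal sketch of that selection step, with illustrative file names (the Space's real logic may differ; the actual download would go through `huggingface_hub.hf_hub_download`):

```python
# Hypothetical helper: pick a GGUF file from a repo listing by quantization tag.
# File names below are illustrative, not from a real repo listing.

def pick_gguf_file(files, quant="Q4_K_M"):
    """Return the first .gguf file whose name contains the quantization tag."""
    quant = quant.lower()
    for name in files:
        if name.lower().endswith(".gguf") and quant in name.lower():
            return name
    return None  # no matching quantization in this repo

files = [
    "Qwen3-1.7B-Q2_K_L.gguf",
    "Qwen3-1.7B-Q4_K_M.gguf",
    "README.md",
]
print(pick_gguf_file(files, "Q4_K_M"))  # Qwen3-1.7B-Q4_K_M.gguf
```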
+## Model Registry (24 Preset Models)

### Tiny Models (0.1-0.6B)
+- Falcon-H1-100M, Gemma-3-270M, ERNIE-0.3B
+- Granite-3.1-0.35B, Granite-3.3-0.35B, BitCPM4-0.5B
+- Hunyuan-0.5B, Qwen3-0.6B

### Compact Models (1.5-2.6B)
+- Granite-3.1-1B, Falcon-H1-1.5B, Qwen3-1.7B-Thinking
+- Granite-3.3-2B, Youtu-LLM-2B, LFM2-2.6B-Transcript

### Standard Models (3-7B)
+- Granite-3.1-3B, Breeze-3B, Qwen3-4B-Thinking, Granite-4.0-Tiny-7B

+### Large Models (21-30B)
+- ERNIE-4.5-21B-PT, ERNIE-4.5-21B-Thinking
+- GLM-4.7-Flash-30B (REAP & IQ2 variants)
+- Qwen3-30B-A3B (Thinking & Instruct variants)
## Technical Details

- **Inference Engine**: llama-cpp-python
+- **Model Format**: GGUF (Q2_K_L, Q3_K_XXS, Q4_K_M, Q4_K_L, Q8_0, etc.)
+- **Context Windows**: 4K–256K tokens depending on model
- **UI Framework**: Gradio with streaming support
+- **Model Search**: gradio_huggingfacehub_search component
- **Language Conversion**: OpenCC for Traditional Chinese (zh-TW)
- **Deployment**: Docker (HuggingFace Spaces compatible)
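Token-by-token streaming with llama-cpp-python plus a Gradio UI typically means iterating the chat API with `stream=True` and yielding the growing text. A sketch under that assumption (the Space's actual wiring may differ; `llm` stands in for a `llama_cpp.Llama` instance, and any object exposing the same method works):

```python
# Sketch: stream a summary, assuming llama-cpp-python's OpenAI-style
# chat-completion chunks ({"choices": [{"delta": {"content": ...}}]}).

def stream_summary(llm, transcript):
    """Yield the accumulated summary text as each chunk arrives."""
    messages = [{"role": "user", "content": f"Summarize:\n{transcript}"}]
    text = ""
    for chunk in llm.create_chat_completion(messages=messages, stream=True):
        delta = chunk["choices"][0]["delta"]
        text += delta.get("content", "")  # some chunks carry no content
        yield text  # Gradio re-renders the output box on every yield
```

Yielding the full accumulated string (rather than each token) is the usual pattern for Gradio streaming outputs.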
+## Hardware Configuration
+
+| Preset | CPU Threads | Best For |
+|--------|-------------|----------|
+| HF Free Tier | 2 vCPUs | Small models (< 2B) |
+| HF CPU Upgrade | 8 vCPUs | Medium models (2-7B) |
+| Custom | 1-32 | Local deployment |
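The table above maps directly to a thread-count lookup. An illustrative helper (names and fallback behavior are assumptions, not the Space's actual code):

```python
import os

# Hypothetical preset-to-thread-count mapping based on the table above.
PRESET_THREADS = {"HF Free Tier": 2, "HF CPU Upgrade": 8}

def resolve_threads(preset, custom=None):
    """Return the CPU thread count for a preset; clamp Custom to 1-32."""
    if preset == "Custom":
        n = custom if custom is not None else (os.cpu_count() or 1)
        return max(1, min(32, n))
    return PRESET_THREADS.get(preset, 2)  # assumed default: free tier

print(resolve_threads("HF CPU Upgrade"))  # 8
```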
## Reasoning Mode

+For models that support thinking/reasoning (marked with ⚡ icon):
- Automatically extends context window by 50%
- Provides reasoning steps before the final summary
- Toggle on/off per generation
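The 50% thinking buffer described above is just a context-window multiplier. A tiny sketch (the exact rounding is an assumption; the app may size the buffer differently):

```python
def effective_context(n_ctx, reasoning=False):
    """Extend the context window by 50% when reasoning mode is on."""
    return int(n_ctx * 1.5) if reasoning else n_ctx

print(effective_context(4096, reasoning=True))  # 6144
```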
## Limitations

+- **Input Size**: Varies by model (4K–256K context windows)
+- **First Load**: 10–60 seconds depending on model size
+- **CPU Inference**: Free tier runs on CPU; larger models need more time
+- **Custom Models**: Must be GGUF format from HuggingFace Hub

## CLI Usage
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW

# Use specific model
+python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L

# CPU only
+python summarize_transcript.py -c
```
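The `-m` flag above takes a `repo_id:quantization` spec. A plausible parser for that format (the CLI's actual parsing may differ; shown only to make the spec shape concrete):

```python
def parse_model_spec(spec):
    """Split 'owner/repo:QUANT' into (repo_id, quant); quant may be absent."""
    repo_id, sep, quant = spec.rpartition(":")
    if not sep:
        return spec, None  # no quantization tag given
    return repo_id, quant

print(parse_model_spec("unsloth/Qwen3-1.7B-GGUF:Q2_K_L"))
# ('unsloth/Qwen3-1.7B-GGUF', 'Q2_K_L')
```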
## Requirements

pip install -r requirements.txt
```
## Repository

+[Luigi/tiny-scribe](https://huggingface.co/spaces/Luigi/tiny-scribe)

## License

+MIT License