---
language:
- en
metrics:
- wer
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
license: mit
tags:
- whisper
- stt
- speech-to-text
- british-english
- american-english
- us-english
- gb-english
- asr
- automatic-speech-recognition
extra_gated_prompt: "Purchase access to this repo [HERE](https://buy.stripe.com/fZu28q99Ih2RaCN5s7fw42B)"
extra_gated_fields:
  I have purchased a license (access will be granted once your payment clears): checkbox
  I agree to the terms of the license described on the dataset card: checkbox
---
# [MODEL_DISPLAY_NAME]
**Specialty Speech-to-Text (Transcription / Automatic Speech Recognition) Model**

> This is the first release of this model. Performance results are shown below. To report errors, make a post under Community on the model repo card; fixes will ship in future releases.

For all available models, see [this HuggingFace collection](https://hf.co/collections/Trelis/transcribe-british-and-american-english-spelling). For CTranslate2 variants (useful for Faster Whisper), add `-ctranslate2` to any model slug.

While the training datasets are private, the library for English variant conversion is open-sourced [here](https://github.com/TrelisResearch/whisper-english-variant-converter).
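
The conversion idea is straightforward to sketch. Below is a minimal, hypothetical illustration of exact-match variant conversion (the word pairs shown are illustrative; the real library ships a far larger mapping and more careful handling):

```python
import re

# Hypothetical subset of US -> GB exact-match word pairs.
US_TO_GB = {"color": "colour", "organize": "organise", "center": "centre"}

def to_british(text: str) -> str:
    """Replace exact-match American spellings with British ones, keeping a leading capital."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = US_TO_GB[word.lower()]
        # Preserve a leading capital (e.g. "Color" -> "Colour").
        return replacement.capitalize() if word[0].isupper() else replacement

    # \b boundaries ensure exact word matches only ("organized" is left alone).
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, US_TO_GB)) + r")\b", re.IGNORECASE)
    return pattern.sub(swap, text)

print(to_british("The Color of the center is organized."))
```

Because only whole words are matched, inflected forms such as "organized" pass through untouched, which mirrors the exact-match behaviour described below.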

## Background on Whisper English Variants
Whisper models disproportionately transcribe into US English, particularly when the audio contains no obviously British words (e.g. "rubbish" vs "trash"/"garbage").

Trelis British Spelling and American Spelling transcription models aim to make outputs uniformly follow either US or British spelling.

> Note that these models do not swap different words with the same meaning: they will use the correct variant of colour vs color, but will not swap "trash" for "rubbish". For updates on such a model (the "lexical" variant), subscribe at [trelis.substack.com](https://trelis.substack.com).

## Performance
Trelis Transcribe models are fine-tunes of Whisper models.

Performance is compared on three metrics:
- Word Error Rate (WER) on two datasets: `LibriSpeech` and `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
- US → GB %, i.e. the percentage of the transcript containing American English words, on `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
- GB → US %, i.e. the percentage of the transcript containing British English words, on `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
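
As a reference for the first metric: WER is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words. A minimal sketch (the actual evaluation harness may differ, e.g. in text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[-1][-1] / len(ref)

# One substitution (colour -> color) plus one deletion over 5 reference words.
print(f"{wer('the colour of the sky', 'the color of sky'):.2%}")  # 40.00%
```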

US and GB percentages are measured deterministically via a list of [~6,000 exact-match](https://github.com/TrelisResearch/whisper-english-variant-converter) British <-> American English word pairs.
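
A minimal sketch of what such a deterministic check might look like, using a tiny hypothetical subset of the pair list:

```python
# Illustrative subset only; the real list has ~6,000 pairs.
PAIRS = [("color", "colour"), ("center", "centre"), ("organize", "organise")]
US_WORDS = {us for us, gb in PAIRS}
GB_WORDS = {gb for us, gb in PAIRS}

def variant_percentages(transcript: str) -> tuple[float, float]:
    """Return (% American-spelled words, % British-spelled words) in the transcript."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    us_pct = 100 * sum(w in US_WORDS for w in words) / len(words)
    gb_pct = 100 * sum(w in GB_WORDS for w in words) / len(words)
    return us_pct, gb_pct

print(variant_percentages("The colour of the color chart."))
```

Because membership is exact-match, the metric is fully reproducible across runs, unlike an LLM-judged comparison.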

Test datasets:
- `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1` - 30 rows of synthetic English voice data, evenly split across GB and US English source text, GB and US accents, male and female speakers, and mixed speeds, with transcripts converted to the target English spelling.
- `openslr/librispeech_asr` - 50 rows from the `test.other` split, which contains mixed English-language samples with high WER.

### English Variant Transcription Performance
While original Whisper models transcribe ~6% of this test set into American English, the fine-tuned models reduce that to ~1% (and to ~0.2% for the turbo model).

**Dataset:** `Trelis/transcribe-to-en_GB-v1`
**Config:** `N/A`
**Split:** `test`
**Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 12:21:43 | `openai/whisper-tiny` | 10.06% | 30/30/0 | 6.12% | 0.54% | Yes | mps |
| 2025-12-02 12:16:42 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 4.58% | 30/30/0 | 1.01% | 5.64% | Yes | mps |
| 2025-12-02 12:28:33 | `openai/whisper-large-v3-turbo` | 7.15% | 30/30/0 | 5.27% | 1.62% | Yes | mps |
| 2025-12-02 13:11:01 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 1.18% | 30/30/0 | 0.20% | 6.70% | Yes | mps |

### LibriSpeech Performance
LibriSpeech serves as an independent check on how much fine-tuning degrades general transcription. Smaller models tend to degrade more when fine-tuned; there is no evidence of degradation for the turbo model:

**Dataset:** `openslr/librispeech_asr`
**Config:** `other`
**Split:** `test`
**Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 09:27:52 | `openai/whisper-tiny` | 11.62% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 12:17:18 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 13.18% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-11-27 13:23:00 | `openai/whisper-large-v3-turbo` | 4.47% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:24:33 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 4.02% | 50/50/0 | 0.00% | 0.00% | Yes | mps |

## Inference
### Quick Demo (3 samples)
Copy/paste to transcribe the first three rows of a HuggingFace dataset with `transcribe-en_us-spelling-v1-tiny`:

```bash
uv run --with datasets --with transformers --with torchaudio python - <<'PY'
from datasets import load_dataset
from transformers import pipeline

DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
MODEL_ID = "Trelis/transcribe-british-spelling-v1-tiny"

print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
dataset = load_dataset(DATASET_ID, split="test[:3]")

print(f"Loading ASR model: {MODEL_ID}")
asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")

for idx, sample in enumerate(dataset):
    audio = sample["audio"]
    transcription = asr(
        {"array": audio["array"], "sampling_rate": audio["sampling_rate"]}
    )
    print(f"\nSample {idx + 1}")
    print(f"  Reference: {sample.get('text')}")
    print(f"  Transcript: {transcription['text']}")
PY
```

Make sure you have Hugging Face access to both the dataset and model (`huggingface-cli login`).

**Transcribe your own audio (`/path/to/audio.wav`):**
```bash
uv run --with transformers --with torchaudio python - <<'PY'
from transformers import pipeline
import torchaudio

MODEL_ID = "Trelis/transcribe-en_us-spelling-v1-tiny"
audio_path = "/path/to/audio.wav"  # change me

audio, sr = torchaudio.load(audio_path)
asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")
result = asr({"array": audio.squeeze().numpy(), "sampling_rate": sr})
print(f"Transcript: {result['text']}")
PY
```

### Bulk README Uploads
Render and push README files for multiple repos listed in `model_info/readme_targets.yaml`:

```bash
# Preview rendered files in model_info/generated_readmes/
uv run --with pyyaml --with huggingface_hub python model_info/push_readmes.py

# Push READMEs to HuggingFace Hub (requires huggingface-cli login)
uv run --with pyyaml --with huggingface_hub python model_info/push_readmes.py --push
```

Each entry in `readme_targets.yaml` may optionally override `base_model` and `stripe_link`.
`transcribe-en_us-spelling-v1-tiny` is auto-derived from the slug; defaults exist for the `tiny`, `small`, and `turbo` tiers.

### Server Inference
For guidance on inference, see [this video](https://www.youtube.com/watch?v=qXtPPgujufI).

CTranslate2 and Faster Whisper are recommended if you wish to operate a server. You can use [this](https://console.runpod.io/deploy?template=v7xyt1e57i&ref=jmfkcdio) one-click Runpod affiliate link to get started quickly.

## Further Support
- For model-specific questions, create a post under "Community" on the repo card.
- For support with custom fine-tunes, see [trelis.com/ADVANCED-transcription](https://trelis.com/ADVANCED-transcription), or book a session [here](https://trelis.com/corporate-product-llm-review/) for deeper support.

## Jobs
Trelis is hiring a part-time contract developer to assist with model development. Apply [here](https://forms.gle/KMj6zHjiuidn4Zr89).

## License & Usage (Trelis Transcribe v1 Models)

`Tiny` models are open for commercial use under the MIT License.

`Turbo` models are commercially licensed and:
- Available for purchase by *individuals or small organisations* under a basic license.
- Available for licensing by *larger organisations* [here](https://forms.gle/wMTBDmiLxBwMdHQH7).

> Small orgs are defined as entities with less than $1M in revenue across all of their products/services over the last year AND fewer than 25 employees.

## Basic License Details (for individuals + small orgs)
Purchase gives an individual or small organisation a **lifetime license to v1**. Future major versions (v2, v3, …) may be sold separately.

You may:
- Use the model for **personal, academic, and research** projects.
- Use it for **internal transcription** (meetings, calls, training, docs, etc.).
- Use it **inside your own products and services** (SaaS, apps, internal tools).
- Run it **on your own servers or embedded in your app** (desktop / mobile / edge), so users transcribe audio *through your app*.
- **Fine-tune** the model for your own internal or product use.

You **may not**:
- **Redistribute** the original or fine-tuned weights
  - e.g. upload to other model hubs, share checkpoints, ship raw model files to clients.
- Offer a **general-purpose STT service for other developers or companies**
  - e.g. “we sell an STT API anyone can build on” using these weights as the core engine.
- **Resell or rebrand** the model itself (weights as a product).

On-device use is fine **only** as an internal component of your app: users get features, not reusable model files.

### Bigger / infrastructure use

If you:
- Exceed the size threshold above, or
- Want to offer speech-to-text as a **general-purpose API/service**, or
- Need rights to **redistribute original or fine-tuned weights**, or
- Want access to **larger model sizes** (e.g. fine-tunes of Whisper Large v3), or
- Want **support / SLAs / early access to future versions**

Kindly describe your use case **[here](https://forms.gle/wMTBDmiLxBwMdHQH7)** and I will respond promptly.