---
language:
- en
metrics:
- wer
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
license: mit
tags:
- whisper
- stt
- speech-to-text
- british-english
- american-english
- us-english
- gb-english
- asr
- automatic-speech-recognition
extra_gated_prompt: "Purchase access to this repo [HERE](https://buy.stripe.com/fZu28q99Ih2RaCN5s7fw42B)"
extra_gated_fields:
  I have purchased a license (access will be granted once your payment clears): checkbox
  I agree to the terms of the license described on the dataset card: checkbox
---

# Transcribe British English Spelling v1 Tiny

**Specialty Speech-to-Text (Transcription / Automatic Speech Recognition) Model**

> This is the first release of this model. Performance results are shown below. Report any errors by making a post under Community on the model repo card, and they will be addressed in future releases.

For all available models, see [this HuggingFace collection](https://hf.co/collections/Trelis/transcribe-british-and-american-english-spelling). For CTranslate2 variants (useful for Faster Whisper), add `-ctranslate2` to any model slug.

While the training datasets are private, the library for English variant conversion is open-sourced [here](https://github.com/TrelisResearch/whisper-english-variant-converter).

## Background on Whisper English Variants

Whisper models disproportionately transcribe into US English, particularly when there are no obviously British English words in the audio (e.g. "rubbish" vs "trash" / "garbage"). Trelis British Spelling and American Spelling transcription models aim to make outputs uniformly follow either US or British spelling.

> Note that these models do not swap out different words with the same meaning: they will use the correct variant of colour vs color, but will not swap "trash" for "rubbish". For updates on such a model (the "lexical" variant), subscribe at [trelis.substack.com](https://trelis.substack.com).
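To illustrate the kind of spelling normalisation described above, here is a toy sketch. The word pairs below are illustrative samples only, not the actual ~6,000-entry list used by the linked converter library:

```python
import re

# Toy sample of American -> British spelling pairs (illustrative only;
# the real converter library linked above uses a much larger list).
US_TO_GB = {
    "color": "colour",
    "flavor": "flavour",
    "organize": "organise",
    "center": "centre",
}

def to_british(text: str) -> str:
    """Replace exact-match American spellings with British ones."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = US_TO_GB[word.lower()]
        # Preserve a leading capital letter (e.g. "Color" -> "Colour").
        return replacement.capitalize() if word[0].isupper() else replacement

    # Word-boundary matching avoids rewriting substrings inside other words.
    pattern = r"\b(" + "|".join(map(re.escape, US_TO_GB)) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

print(to_british("Pick a color at the center"))  # -> Pick a colour at the centre
```

Words with the same meaning but different surface forms ("trash" vs "rubbish") are untouched, matching the behaviour described above.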
## Performance

Trelis Transcribe models are fine-tunes of Whisper models. Performance is compared on three metrics:

- Word Error Rate (WER) on two datasets: `LibriSpeech` and `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
- US -> GB %, i.e. the percentage of the transcript that uses American English words, on `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
- GB -> US %, i.e. the percentage of the transcript that uses British English words, on `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`

US and GB percentages are measured deterministically via a [list of ~6,000 exact matches](https://github.com/TrelisResearch/whisper-english-variant-converter) of British <-> American English word pairs.

Test datasets:

- `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1` - 30 rows of synthetic English voice data, evenly split across GB and US English source text, GB and US accents, male and female voices, and mixed speeds, with transcripts converted to the respective target spelling.
- `openslr/librispeech_asr` - 50 rows from the `test.other` split, which contains mixed English-language samples with high WER.

### British (EN_GB) Variant Transcription Performance

While original Whisper models transcribe ~6% of this test set to American English, the fine-tuned models reduce that towards 1% (and below 0.2% for the turbo model).
**Dataset:** `Trelis/transcribe-to-en_GB-v1` **Config:** `N/A` **Split:** `test` **Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 12:21:43 | `openai/whisper-tiny` | 10.06% | 30/30/0 | 6.12% | 0.54% | Yes | mps |
| 2025-12-02 12:16:42 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 4.58% | 30/30/0 | 1.01% | 5.64% | Yes | mps |
| 2025-12-02 12:28:33 | `openai/whisper-large-v3-turbo` | 7.15% | 30/30/0 | 5.27% | 1.62% | Yes | mps |
| 2025-12-02 13:11:01 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 1.18% | 30/30/0 | 0.20% | 6.70% | Yes | mps |

### American (EN_US) Transcription Performance

Original Whisper models already tend to transcribe into American English, so the improvement from fine-tuning is smaller here, although WER still improves by ~1.5% on the Turbo model.

**Dataset:** `Trelis/asr-en_mixed-to-en_US-tts-test-20251202-105023` **Config:** `N/A` **Split:** `test` **Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 11:03:11 | `openai/whisper-tiny` | 4.93% | 30/30/0 | 6.32% | 0.54% | Yes | mps |
| 2025-12-02 13:43:28 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 3.89% | 30/30/0 | 6.38% | 0.27% | Yes | mps |
| 2025-12-02 11:02:45 | `openai/whisper-large-v3-turbo` | 4.03% | 30/30/0 | 5.47% | 1.62% | Yes | mps |
| 2025-12-02 14:24:32 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 1.25% | 30/30/0 | 6.84% | 0.07% | Yes | mps |

### LibriSpeech Performance

LibriSpeech is used here as an independent check on the extent of degradation caused by fine-tuning. Notice that smaller models tend to degrade more when fine-tuned.
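The metrics in these tables can be sketched roughly as follows. This is an illustrative re-implementation, not the actual evaluation harness, and the pair list below is a tiny stand-in for the ~6,000-pair exact-match list linked earlier:

```python
# Toy stand-in for the exact-match American <-> British word-pair list.
AMERICAN_TO_BRITISH = {"color": "colour", "center": "centre", "organize": "organise"}

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

def american_word_pct(transcript: str) -> float:
    """Percentage of transcript words with an American spelling (exact match)."""
    words = transcript.lower().split()
    hits = sum(1 for w in words if w in AMERICAN_TO_BRITISH)
    return 100 * hits / len(words)

print(wer("the colour of money", "the color of money"))   # -> 0.25
print(american_word_pct("the color of the center"))       # -> 40.0
```

Because the variant percentages are computed from exact word matches, they are deterministic and independent of WER.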
Note that there is no evidence of degradation on the turbo model:

**Dataset:** `openslr/librispeech_asr` **Config:** `other` **Split:** `test` **Text Column:** `text`

| Timestamp | Model | WER % | Samples (Eval/Total/Skipped) | US→GB % | GB→US % | Normalized | Device |
|-----------|-------|-------|------------------------------|---------|---------|------------|--------|
| 2025-12-02 09:27:52 | `openai/whisper-tiny` | 11.62% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 12:17:18 | `Trelis/transcribe-en_gb-spelling-v1-tiny` | 13.18% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:44:04 | `Trelis/transcribe-en_us-spelling-v1-tiny` | 12.40% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-11-27 13:23:00 | `openai/whisper-large-v3-turbo` | 4.47% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 13:24:33 | `Trelis/transcribe-en_gb-spelling-v1-turbo` | 4.02% | 50/50/0 | 0.00% | 0.00% | Yes | mps |
| 2025-12-02 14:37:54 | `Trelis/transcribe-en_us-spelling-v1-turbo` | 4.13% | 50/50/0 | 0.00% | 0.00% | Yes | mps |

## Inference

### Quick Demo (3 samples)

Copy/paste to transcribe the first three rows from a HuggingFace dataset with `Trelis/transcribe-en_gb-spelling-v1-tiny`:

```bash
uv run --isolated --with transformers --with 'datasets<3.0' --with soundfile --with librosa --with torchaudio python - <<'PY'
from datasets import load_dataset
from transformers import pipeline

DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
MODEL_ID = "Trelis/transcribe-en_gb-spelling-v1-tiny"

print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
dataset = load_dataset(DATASET_ID, split="test[:3]")

print(f"Loading ASR model: {MODEL_ID}")
asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")

for idx, sample in enumerate(dataset):
    audio = sample["audio"]
    transcription = asr(
        {"array": audio["array"], "sampling_rate": audio["sampling_rate"]}
    )
    print(f"\nSample {idx + 1}")
    print(f"  Reference:  {sample.get('text')}")
    print(f"  Transcript: {transcription['text']}")
PY
```

Make sure you have Hugging Face access to both the dataset and model (`huggingface-cli login`).

**Transcribe your own audio (`/path/to/audio.wav`):**

```bash
uv run --isolated --with transformers --with 'datasets<3.0' --with soundfile --with librosa --with torchaudio python - <<'PY'
from transformers import pipeline
import torchaudio

MODEL_ID = "Trelis/transcribe-en_gb-spelling-v1-tiny"
audio_path = "/path/to/audio.wav"  # change me

audio, sr = torchaudio.load(audio_path)
asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")
result = asr({"array": audio.squeeze().numpy(), "sampling_rate": sr})
print(f"Transcript: {result['text']}")
PY
```

### Bulk README Uploads

Render/push README files for multiple repos listed in `model_info/readme_targets.yaml`:

```bash
# Preview rendered files in model_info/generated_readmes/
uv run --with pyyaml --with huggingface_hub python model_info/push_readmes.py

# Push READMEs to HuggingFace Hub (requires huggingface-cli login)
uv run --with pyyaml --with huggingface_hub python model_info/push_readmes.py --push
```

Each entry in `readme_targets.yaml` may optionally override `base_model` and `stripe_link`. `transcribe-en_gb-spelling-v1-tiny` is auto-derived from the slug; defaults exist for `tiny`, `small`, and `turbo` tiers.

### Server Inference

For guidance on inference, see [this video](https://www.youtube.com/watch?v=qXtPPgujufI). CTranslate2 and Faster Whisper are recommended if you wish to operate a server. You can modify [this](https://console.runpod.io/deploy?template=v7xyt1e57i&ref=jmfkcdio) one-click Runpod affiliate link to get started quickly.

## Further Support

- For model-specific questions, create a post under "Community" on the repo card.
- For support with custom fine-tunes, see [trelis.com/ADVANCED-transcription](https://trelis.com/ADVANCED-transcription), or book a session [here](https://trelis.com/corporate-product-llm-review/) for deeper support.
## Jobs

Trelis is hiring a part-time developer on contract to assist with model development. Apply [here](https://forms.gle/KMj6zHjiuidn4Zr89).

## License & Usage (Trelis Transcribe v1 Models)

`Tiny` models are open for commercial use under the MIT License.

`Turbo` models are commercially licensed and:

- Available for purchase by *individuals or small organisations* under a basic license.
- Available for licensing by *larger organisations* [here](https://forms.gle/wMTBDmiLxBwMdHQH7).

> Small orgs are defined as entities with less than $1M in revenue across all of their products/services over the last year AND fewer than 25 employees.

### Basic License Details (for individuals + small orgs)

Purchase gives an individual or small organisation a **lifetime license to v1**. Future major versions (v2, v3, …) may be sold separately.

You may:

- Use the model for **personal, academic, and research** projects.
- Use it for **internal transcription** (meetings, calls, training, docs, etc.).
- Use it **inside your own products and services** (SaaS, apps, internal tools).
- Run it **on your own servers or embedded in your app** (desktop / mobile / edge), so users transcribe audio *through your app*.
- **Fine-tune** the model for your own internal or product use.

You **may not**:

- **Redistribute** the original or fine-tuned weights - e.g. upload to other model hubs, share checkpoints, ship raw model files to clients.
- Offer a **general-purpose STT service for other developers or companies** - e.g. "we sell an STT API anyone can build on" using these weights as the core engine.
- **Resell or rebrand** the model itself (weights as a product).

On-device use is fine **only** as an internal component of your app. Users get features, not reusable model files.
### Bigger / infrastructure use

If you:

- Are above the size threshold above, or
- Want to offer speech-to-text as a **general-purpose API/service**, or
- Need rights to **redistribute original or fine-tuned weights**, or
- Want access to **larger model sizes** (e.g. fine-tunes of Whisper Large v3), or
- Want **support / SLAs / early access to future versions**,

kindly describe your use case **[here](https://forms.gle/wMTBDmiLxBwMdHQH7)** and I will respond promptly.