RonanMcGovern committed on
Commit 2c3c14f · verified · 1 Parent(s): 0f0f577

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +143 -13

README.md CHANGED
@@ -1,21 +1,151 @@
  ---
- base_model: openai/whisper-tiny
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - whisper
- license: apache-2.0
  language:
  - en
  ---
-
- # Uploaded finetuned model
-
- - **Developed by:** Trelis
- - **License:** apache-2.0
- - **Finetuned from model :** openai/whisper-tiny
-
- This whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
  language:
  - en
+ metrics:
+ - wer
+ base_model:
+ - openai/whisper-tiny
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - whisper
+ - stt
+ - speech-to-text
+ - british-english
+ - american-english
+ - us-english
+ - gb-english
+ - asr
+ - automatic-speech-recognition
+ extra_gated_prompt: "Purchase access to this repo [HERE](https://buy.stripe.com/fZu28q99Ih2RaCN5s7fw42B)"
+ extra_gated_fields:
+   I have purchased a license (access will be granted once your payment clears): checkbox
+   I agree to the terms of the license described on the dataset card: checkbox
  ---
+ # transcribe-GB-spelling-v1-tiny
+ **Specialty Speech-to-Text (Transcription / Automatic Speech Recognition) Model**
+
+ > This is the first release of this model. Performance results are shown below. Report any errors by making a post under Community on the model repo card; they will be addressed in future releases.
+
+ For all available models, see [this HuggingFace collection](https://hf.co/collections/RonanMcGovern/specialty-voice-models). For ctranslate2 variants (useful for Faster Whisper), add `-ctranslate2` to any model slug.
+
+ While the training datasets are private, the library used for English variant conversions is open sourced [here](https://github.com/TrelisResearch/whisper-english-variant-converter).
+
+ ## Background on Whisper English Variants
+ Whisper models disproportionately transcribe into US English, particularly when the audio contains no obviously British words (e.g. "rubbish" vs "trash" / "garbage").
+
+ Trelis British Spelling and American Spelling transcription models aim to make outputs uniformly follow either US or British spelling.
+
+ > Note that these models do not swap out different words with the same meaning: they will use the correct variant of colour vs color, but will not swap "trash" for "rubbish". For updates on such a model (the "lexical" variant), subscribe at [trelis.substack.com](https://trelis.substack.com).
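The spelling-only behaviour described above can be sketched as a post-hoc exact-match pass; the models learn this end-to-end, but the sketch below shows the idea. The word pairs are illustrative stand-ins, not the real conversion list:

```python
# Illustrative US -> GB spelling table; the real converter uses a far larger list.
US_TO_GB = {"color": "colour", "organize": "organise", "center": "centre"}

def to_british(text: str) -> str:
    """Convert American spellings to British, token by token."""
    out = []
    for token in text.split():
        # Peel off trailing punctuation so "center." still matches "center".
        core = token.rstrip(".,!?;:")
        suffix = token[len(core):]
        repl = US_TO_GB.get(core.lower())
        if repl is not None:
            # Preserve simple leading-capital casing.
            if core[:1].isupper():
                repl = repl.capitalize()
            out.append(repl + suffix)
        else:
            out.append(token)
    return " ".join(out)

print(to_british("The color scheme of the center."))  # The colour scheme of the centre.
```

A word like "trash" passes through unchanged, matching the note above: only spelling variants are converted, not lexical choices.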
+
+ ## Performance
+ Trelis Transcribe models are fine-tunes of Whisper models.
+
+ Performance is compared on three metrics:
+ - Word Error Rate (WER) on two datasets: `LibriSpeech` and `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
+ - US -> GB %, i.e. the percentage of transcript words with American English spellings, on `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
+ - GB -> US %, i.e. the percentage of transcript words with British English spellings, on `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1`
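For reference, WER is the word-level edit distance (substitutions, insertions, and deletions) divided by the reference length. A minimal sketch, not the exact evaluation code used here:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the colour of the sky", "the color of sky"))  # 2 errors / 5 words = 0.4
```

Note that a spelling mismatch ("colour" vs "color") counts as a full substitution error, which is why spelling-normalised models can also improve measured WER on variant-consistent references.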
+
+ US and GB percentages are measured deterministically via a list of [~6,000 exact matches](https://github.com/TrelisResearch/whisper-english-variant-converter) of British <-> American English word pairs.
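The deterministic percentage metric can be sketched as follows; the spelling set here is a tiny illustrative stand-in for the ~6,000-pair table:

```python
# Illustrative subset of American spellings from a US/GB word-pair list.
AMERICAN_SPELLINGS = {"color", "organize", "center", "analyze", "favorite"}

def american_spelling_pct(transcript: str) -> float:
    """Percentage of transcript words that are exact-match American spellings."""
    words = [w.strip(".,!?;:").lower() for w in transcript.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in AMERICAN_SPELLINGS)
    return 100.0 * hits / len(words)

print(american_spelling_pct("My favorite color is green."))  # 2 of 5 words -> 40.0
```

The GB -> US metric is the mirror image, counting British spellings against the same pair list. Because matching is exact, the metric is fully reproducible.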
+
+ Test datasets:
+ - `Trelis/transcribe-to-en_GB-v1` or `Trelis/transcribe-to-en_US-v1` - 30 rows of synthetic English voice data, evenly split across GB and US source text, GB and US accents, male and female voices, and mixed speeds, with transcripts converted to British English spelling.
+ - `openslr/librispeech_asr` - 50 rows from the `test.other` split, which contains mixed English-language samples with high WER.
+
+ ## Inference
+ ### Quick Demo (3 samples)
+ Copy/paste to transcribe the first three rows from a HuggingFace dataset with `transcribe-GB-spelling-v1-tiny`:
+
+ ```bash
+ uv run --with datasets --with transformers --with torchaudio python - <<'PY'
+ from datasets import load_dataset
+ from transformers import pipeline
+
+ DATASET_ID = "Trelis/transcribe-to-en_GB-v1"
+ MODEL_ID = "Trelis/transcribe-GB-spelling-v1-tiny"
+
+ print(f"Loading dataset: {DATASET_ID} (first 3 rows)")
+ dataset = load_dataset(DATASET_ID, split="test[:3]")
+
+ print(f"Loading ASR model: {MODEL_ID}")
+ asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")
+
+ for idx, sample in enumerate(dataset):
+     audio = sample["audio"]
+     transcription = asr(
+         {"array": audio["array"], "sampling_rate": audio["sampling_rate"]}
+     )
+     print(f"\nSample {idx + 1}")
+     print(f"  Reference: {sample.get('text')}")
+     print(f"  Transcript: {transcription['text']}")
+ PY
+ ```
+
+ Make sure you have Hugging Face access to both the dataset and model (`huggingface-cli login`).
+
+ **Transcribe your own audio (`/path/to/audio.wav`):**
+ ```bash
+ uv run --with transformers --with torchaudio python - <<'PY'
+ from transformers import pipeline
+ import torchaudio
+
+ MODEL_ID = "Trelis/transcribe-GB-spelling-v1-tiny"
+ audio_path = "/path/to/audio.wav"  # change me
+
+ audio, sr = torchaudio.load(audio_path)
+ asr = pipeline("automatic-speech-recognition", model=MODEL_ID, return_timestamps="word")
+ result = asr({"array": audio.squeeze().numpy(), "sampling_rate": sr})
+ print(f"Transcript: {result['text']}")
+ PY
+ ```
+
+ ### Server Inference
+ For guidance on inference, see [this video](https://www.youtube.com/watch?v=qXtPPgujufI).
+
+ CTranslate2 and Faster Whisper are recommended if you wish to operate a server. You can modify [this](https://console.runpod.io/deploy?template=v7xyt1e57i&ref=jmfkcdio) one-click Runpod affiliate link to get started quickly.
+
+ ## Further Support
+ - For model-specific questions, create a post under "Community" on the repo card.
+ - For support with custom fine-tunings, see [trelis.com/ADVANCED-transcription](https://trelis.com/ADVANCED-transcription), or book a session [here](https://trelis.com/corporate-product-llm-review/) for deeper support.
+
+ ## Jobs
+ Trelis is hiring a part-time developer on contract to assist with model development. Apply [here](https://forms.gle/KMj6zHjiuidn4Zr89).
+
+ ### License & Usage (Trelis Transcribe v1 Models)
+
+ `Tiny` models are open for commercial use under the MIT License.
+
+ `Turbo` models are commercially licensed and:
+ - Available for purchase by *individuals or small organisations* under a basic license.
+ - Available for licensing by *larger organisations* [here](https://forms.gle/wMTBDmiLxBwMdHQH7).
+
+ > Small orgs are defined as entities with less than $1M in revenue across all of their products/services over the last year AND fewer than 25 employees.
+
+ ### Basic License Details (for individuals + small orgs)
+ Purchase gives an individual or small organisation a **lifetime license to v1**. Future major versions (v2, v3, …) may be sold separately.
+
+ You may:
+ - Use the model for **personal, academic, and research** projects.
+ - Use it for **internal transcription** (meetings, calls, training, docs, etc.).
+ - Use it **inside your own products and services** (SaaS, apps, internal tools).
+ - Run it **on your own servers or embedded in your app** (desktop / mobile / edge), so users transcribe audio *through your app*.
+ - **Fine-tune** the model for your own internal or product use.
+
+ You **may not**:
+ - **Redistribute** the original or fine-tuned weights
+   - e.g. upload to other model hubs, share checkpoints, ship raw model files to clients.
+ - Offer a **general-purpose STT service for other developers or companies**
+   - e.g. "we sell an STT API anyone can build on" using these weights as the core engine.
+ - **Resell or rebrand** the model itself (weights as a product).
+
+ On-device use is fine **only** as an internal component of your app. Users get features, not reusable model files.
+
+ #### Bigger / infrastructure use
+
+ If you:
+ - Are above the small-org size threshold, or
+ - Want to offer speech-to-text as a **general-purpose API/service**, or
+ - Need rights to **redistribute original or fine-tuned weights**, or
+ - Want access to **larger model sizes** (e.g. fine-tunes of Whisper Large v3), or
+ - Want **support / SLAs / early access to future versions**,
+
+ please briefly describe your use case **[here](https://forms.gle/wMTBDmiLxBwMdHQH7)** and I will respond promptly.