Upload folder using huggingface_hub
- README.md +41 -0
- models/OLMoASR-base.en.pt +3 -0
- models/OLMoASR-large.en-v2.pt +3 -0
- models/OLMoASR-large.en.pt +3 -0
- models/OLMoASR-medium.en.pt +3 -0
- models/OLMoASR-small.en.pt +3 -0
- models/OLMoASR-tiny.en.pt +3 -0
README.md
ADDED
@@ -0,0 +1,41 @@
+---
+license: apache-2.0
+---
+# OLMoASR
+
+OLMoASR is a series of English automatic speech recognition (ASR) models proposed in the paper [OLMoASR: Open Models and Data for Training Robust Speech Recognition Models](https://github.com/allenai/OLMoASR.git)
+by Huong Ngo et al. from Ai2. Trained on 440K hours of weakly supervised audio-text pairs collected from the public internet, OLMoASR demonstrates strong robustness and zero-shot capabilities. Visit the
+[OLMoASR repository](https://github.com/allenai/OLMoASR.git) for the data processing, training, and evaluation code.
+
+# Model Details
+OLMoASR is an audio language model (LM) built on a Transformer encoder-decoder architecture: an audio encoder paired with a language decoder.
+OLMoASR is released as six checkpoints (five sizes, plus a second version of the large model), all trained on English-only data. The table below lists each checkpoint and its parameter count.
+
+| Size      | Parameters |
+|-----------|------------|
+| tiny      | 39 M       |
+| base      | 74 M       |
+| small     | 244 M      |
+| medium    | 769 M      |
+| large     | 1.5 B      |
+| large-v2  | 1.5 B      |
+
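
The size names in the table line up with the checkpoint files added in this commit. A minimal sketch of that mapping follows, with file names taken from the file list at the top of the commit. The helper `checkpoint_filename` is hypothetical and is not part of the `olmoasr` package.

```python
# Checkpoint files added in this commit, keyed by the size names in the table.
CHECKPOINTS = {
    "tiny": "models/OLMoASR-tiny.en.pt",
    "base": "models/OLMoASR-base.en.pt",
    "small": "models/OLMoASR-small.en.pt",
    "medium": "models/OLMoASR-medium.en.pt",
    "large": "models/OLMoASR-large.en.pt",
    "large-v2": "models/OLMoASR-large.en-v2.pt",
}


def checkpoint_filename(size: str) -> str:
    """Return the in-repository path for a size name (hypothetical helper)."""
    try:
        return CHECKPOINTS[size]
    except KeyError:
        # Report the valid names instead of surfacing a bare KeyError.
        raise ValueError(
            f"unknown size {size!r}; expected one of {sorted(CHECKPOINTS)}"
        ) from None
```

Note that the second large checkpoint places the version suffix after `.en` (`OLMoASR-large.en-v2.pt`), so building the file name from a single template would miss it; an explicit table avoids that.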
+# Training Data
+OLMoASR is trained on 440K hours of weakly supervised data subsampled from OLMoASR-Mix, a filtered version of [OLMoASR-Pool](link).
+OLMoASR-Mix is a collection of 1M hours of audio-text pairs, curated from the 3M hours of OLMoASR-Pool.
+
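
To put the three dataset sizes in proportion, the arithmetic can be sketched as below. It assumes "K" means thousand and "M" means million, and it uses the rounded figures quoted in the README, so the percentages are approximate.

```python
# Figures quoted in the README, in hours (assuming K = thousand, M = million).
POOL_HOURS = 3_000_000   # OLMoASR-Pool
MIX_HOURS = 1_000_000    # OLMoASR-Mix, filtered from the pool
TRAIN_HOURS = 440_000    # subsample of the mix used for training

mix_of_pool = MIX_HOURS / POOL_HOURS
train_of_mix = TRAIN_HOURS / MIX_HOURS
train_of_pool = TRAIN_HOURS / POOL_HOURS

print(f"Mix as a share of Pool:      {mix_of_pool:.1%}")    # 33.3%
print(f"Training as a share of Mix:  {train_of_mix:.1%}")   # 44.0%
print(f"Training as a share of Pool: {train_of_pool:.1%}")  # 14.7%
```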
+# Usage
+
+To transcribe an audio file, run:
+```
+import olmoasr
+
+# Load the medium checkpoint for inference.
+model = olmoasr.load_model("medium", inference=True)
+# Transcribe a local audio file and print the result.
+result = model.transcribe("audio.mp3")
+print(result)
+```
+
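
The usage snippet prints the raw result object, and the README does not describe its structure. The sketch below assumes, without confirmation from the source, that the result may be a dict carrying the transcript under a `"text"` key, and falls back to `str()` otherwise. `extract_text` and `transcribe_file` are hypothetical helper names; only `olmoasr.load_model` and `model.transcribe` come from the README.

```python
from typing import Any


def extract_text(result: Any) -> str:
    """Best-effort extraction of the transcript from a transcription result.

    Assumption: the result may be a dict with the transcript under "text".
    Anything else is converted with str() so nothing is silently dropped.
    """
    if isinstance(result, dict) and isinstance(result.get("text"), str):
        return result["text"].strip()
    return str(result)


def transcribe_file(path: str, size: str = "medium") -> str:
    """Load a checkpoint and transcribe one file, using only the README's calls."""
    import olmoasr  # imported lazily so extract_text works without the package

    model = olmoasr.load_model(size, inference=True)
    return extract_text(model.transcribe(path))
```

With the package installed, `print(transcribe_file("audio.mp3"))` would then print the transcript if the assumption about the result shape holds, and the raw result otherwise.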
+# Evaluation
+See the [OLMoASR repository](https://github.com/allenai/OLMoASR.git) for evaluation instructions.
+
+# BibTeX entry and citation info
models/OLMoASR-base.en.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:372a2e38ba0027b0532ec95d56782e629b1c24341fabb30f3b932734ec62add1
+size 865331319
models/OLMoASR-large.en-v2.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3ba003fddc3a69bf091a8561a21f1191ffce7341fede063d133641d142a18c92
+size 6173813259
models/OLMoASR-large.en.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:316ab6b0bd92943007842d1e4a7208445eb21804ecd068fcc8c64d5949c31aea
+size 6173813259
models/OLMoASR-medium.en.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2ab3292b21b9eae1fa5e2e58dea1e467fca9280df8f1759e98144a699f036c8c
+size 3055854069
models/OLMoASR-small.en.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bcbe6b79cf5bb089bff258ce84c61e731d5eac7ed838c50383e57668c32907f5
+size 2892287207
models/OLMoASR-tiny.en.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dc864d10e1030a4c702511a2571010cedc8e95abe7dd0359b03fd943b13ae628
+size 448757399
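
Each `.pt` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes of the real checkpoint. A minimal sketch of how a downloaded checkpoint could be checked against its pointer is shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the example pointer text is copied from `models/OLMoASR-tiny.en.pt` in this commit.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer text into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


TINY_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:dc864d10e1030a4c702511a2571010cedc8e95abe7dd0359b03fd943b13ae628
size 448757399
"""
tiny = parse_lfs_pointer(TINY_POINTER)
```

After downloading the tiny checkpoint, `matches_pointer("OLMoASR-tiny.en.pt", tiny)` would confirm that the local file is the one this commit refers to.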