Update README.md
### Model Sources

- **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
- **Paper:** [Soon!]
- **Demo:** [Link](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)

## Uses

This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_ [codebase](https://github.com/slp-rl/slamkit) for more details on usage, and check out the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples.
### Out-of-Scope Use

This model was trained on curated speech datasets which contain mainly audiobooks and stories; as such, the outputs should not be treated as factual in any way.
## How to Get Started with the Model

We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).
## Training Details

### Training Procedure

This model was trained by next-token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
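As a rough illustration of the preference-tuning stage, the standard DPO objective scores each preference pair from [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag) by how much the policy widens its chosen-vs-rejected log-probability margin over a frozen reference model. This is a minimal sketch in plain Python, not the repository's implementation; the function and argument names are ours.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given the summed log-probs of the
    chosen and rejected continuations under the policy (pi_*) and under the
    frozen reference model (ref_*). The loss shrinks as the policy prefers
    the chosen continuation more strongly than the reference does."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)); equals log(2) when policy and reference agree.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In practice the log-probabilities come from the speech-token sequences of the chosen and rejected continuations; `beta` controls how far the policy may drift from the reference.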
#### Preprocessing

Speech tokens are extracted from the audio using [Hubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz), and quantised using the official kmeans released with the model in [textlesslib](https://github.com/facebookresearch/textlesslib/tree/main). Units are de-duplicated.
We encourage you to explore the official repository for full details - [github](https://github.com/slp-rl/slamkit).
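The de-duplication step above collapses runs of identical cluster indices into a single unit token. A minimal sketch (the function name is ours, not the repository's):

```python
from itertools import groupby

def deduplicate(units):
    """Collapse consecutive repeats of the same kmeans cluster index,
    e.g. [5, 5, 5, 8, 8, 5] -> [5, 8, 5]."""
    return [k for k, _ in groupby(units)]
```

Note that only *consecutive* repeats are merged, so a unit may still appear more than once in the de-duplicated sequence.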
## Evaluation

This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.
#### Software

The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase, which builds upon 🤗transformers, extending it to support easy and efficient training of Speech Language Models.
## Citation