# Sylber
This is the official implementation of [Sylber: Syllabic Embedding Representation of Speech from Raw Audio](https://arxiv.org/abs/2410.07168).
Sylber is the first model of its kind to yield extremely short token sequences from raw audio (4.27 tokens/sec on average) through dynamic tokenization at syllable granularity.
The model is developed and trained by the Berkeley Speech Group.
## Installation
The model can be installed from PyPI for inference.
```
pip install sylber
```
### Usage
```python
from sylber import Segmenter
# Loading Sylber
segmenter = Segmenter(model_ckpt="sylber")
# Run Sylber
wav_file = "samples/sample.wav"
# Set in_second=False to get segment boundaries in frame numbers instead of seconds.
outputs = segmenter(wav_file, in_second=True)
# outputs = {"segments": numpy array of [start, end] for each segment,
#            "segment_features": numpy array of segment-averaged features,
#            "hidden_states": numpy array of raw features used for segmentation}
```
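
As a rough illustration of working with the `segments` output described above, the sketch below computes the number of segments, their mean duration, and the segment rate. The helper `summarize_segments` is hypothetical and not part of the `sylber` package, and the sample boundaries are made-up stand-ins for `outputs["segments"]` with `in_second=True`.

```python
def summarize_segments(segments, total_duration=None):
    """Summarize [start, end] pairs given in seconds.

    Hypothetical helper, not part of sylber. Accepts any iterable of
    (start, end) pairs, e.g. a list of lists or a NumPy array of shape (N, 2).
    """
    pairs = [(float(start), float(end)) for start, end in segments]
    if not pairs:
        return {"num_segments": 0, "mean_duration": 0.0, "segments_per_second": 0.0}
    durations = [end - start for start, end in pairs]
    # Assumption: without an explicit audio length, use the last segment end.
    if total_duration is None:
        total_duration = max(end for _, end in pairs)
    rate = len(pairs) / total_duration if total_duration > 0 else 0.0
    return {
        "num_segments": len(pairs),
        "mean_duration": sum(durations) / len(durations),
        "segments_per_second": rate,
    }


# Made-up boundaries standing in for outputs["segments"]
example_segments = [[0.0, 0.25], [0.25, 0.5], [0.6, 1.0]]
summary = summarize_segments(example_segments)
print(summary)
```

With real model output, `summarize_segments(outputs["segments"])` should work the same way, assuming the array has one `[start, end]` row per segment. Passing the true audio length as `total_duration` gives a more accurate rate when the recording ends in silence.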
### Training
To train the model, see [https://github.com/Berkeley-Speech-Group/sylber](https://github.com/Berkeley-Speech-Group/sylber).
---
license: apache-2.0
---