# Sylber

This is the official implementation of [Sylber: Syllabic Embedding Representation of Speech from Raw Audio](https://arxiv.org/abs/2410.07168).

Sylber is the first model of its kind to yield extremely short token sequences from raw audio (4.27 tokens/sec on average) through dynamic tokenization at syllable granularity.

The model was developed and trained by the Berkeley Speech Group.

## Installation

For inference, the model can be installed from PyPI.

```
pip install sylber
```

### Usage

```python
from sylber import Segmenter

# Load Sylber
segmenter = Segmenter(model_ckpt="sylber")

# Run Sylber on an audio file
wav_file = "samples/sample.wav"
# Set in_second=False to get segment boundaries in frame numbers instead of seconds.
outputs = segmenter(wav_file, in_second=True)

# outputs = {"segments": numpy array of [start, end] for each segment,
#            "segment_features": numpy array of segment-averaged features,
#            "hidden_states": numpy array of raw features used for segmentation}
```
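
As a rough illustration of working with the returned segments, the sketch below computes the number of segments, their mean duration, and the resulting token rate. It assumes `outputs["segments"]` holds `[start, end]` pairs in seconds (that is, `in_second=True`). The helper `summarize_segments` and its `total_duration` parameter are hypothetical names that are not part of the `sylber` package, and the example values are made up.

```python
def summarize_segments(segments, total_duration=None):
    """Summarize syllable segments given as [start, end] pairs in seconds."""
    # Accepts a NumPy array of shape (N, 2) or a plain list of pairs.
    pairs = [(float(start), float(end)) for start, end in segments]
    if not pairs:
        return {"num_segments": 0, "mean_duration": 0.0, "tokens_per_second": 0.0}

    durations = [end - start for start, end in pairs]

    # If the clip length is not given, fall back to the end of the last segment.
    if total_duration is None:
        total_duration = max(end for _, end in pairs)

    rate = len(pairs) / total_duration if total_duration > 0 else 0.0
    return {
        "num_segments": len(pairs),
        "mean_duration": sum(durations) / len(durations),
        "tokens_per_second": rate,
    }


# Stand-in for outputs["segments"] from the usage example (made-up values).
example_segments = [[0.0, 0.25], [0.25, 0.5], [0.75, 1.0], [1.0, 1.5]]
stats = summarize_segments(example_segments, total_duration=2.0)
print(stats)
```

With real output, the call would look like `summarize_segments(outputs["segments"])`. Passing the actual audio length as `total_duration` gives a rate that also accounts for trailing silence.
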
### Training

To train the model, see [https://github.com/Berkeley-Speech-Group/sylber](https://github.com/Berkeley-Speech-Group/sylber).

---
license: apache-2.0
---