
	# Sylber

	This is the official implementation of [Sylber: Syllabic Embedding Representation of Speech from Raw Audio](https://arxiv.org/abs/2410.07168).

	Sylber is the first model of its kind: it yields extremely short token sequences from raw audio (4.27 tokens/sec on average) through dynamic tokenization at syllable granularity.

	The model was developed and trained by the Berkeley Speech Group.


	## Installation

	For inference, the model can be installed from PyPI.

	```
	pip install sylber
	```

	### Usage

	```python

	from sylber import Segmenter

	# Load Sylber
	segmenter = Segmenter(model_ckpt="sylber")

	# Run Sylber on an audio file
	wav_file = "samples/sample.wav"

	# Set in_second=False to get segment boundaries in frame numbers instead of seconds.
	outputs = segmenter(wav_file, in_second=True)

	# outputs = {
	#     "segments": numpy array of [start, end] for each segment,
	#     "segment_features": numpy array of segment-averaged features,
	#     "hidden_states": numpy array of raw features used for segmentation,
	# }
	```
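
As a rough illustration of working with the returned `segments` array, the sketch below computes the number of segments, their mean duration, and the resulting token rate. It assumes the boundaries are in seconds (`in_second=True`) and that you know the audio duration. The helper `summarize_segments` and the sample values are hypothetical and not part of the `sylber` package; a synthetic dictionary stands in for real model output.

```python
def summarize_segments(outputs, audio_duration_sec):
    """Summarize syllable segments from a Sylber-style output dict.

    Hypothetical helper, not part of the sylber package.
    Assumes outputs["segments"] is a sequence of [start, end] pairs in
    seconds, as returned when the segmenter is called with in_second=True.
    """
    # Works for a numpy array of shape (N, 2) as well as a list of pairs.
    durations = [float(end) - float(start) for start, end in outputs["segments"]]
    n_segments = len(durations)
    return {
        "n_segments": n_segments,
        "mean_duration_sec": sum(durations) / n_segments if n_segments else 0.0,
        "tokens_per_sec": n_segments / audio_duration_sec,
    }


# Synthetic stand-in for real segmenter output (values are made up).
example_outputs = {"segments": [[0.10, 0.32], [0.32, 0.58], [0.70, 0.95]]}
summary = summarize_segments(example_outputs, audio_duration_sec=1.0)
print(summary)
```

With real output, pass the dictionary returned by `segmenter(...)` together with the duration of the audio file in seconds.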


	### Training

	To train the model, see [https://github.com/Berkeley-Speech-Group/sylber](https://github.com/Berkeley-Speech-Group/sylber).

	---
	license: apache-2.0
	---