---
library_name: transformers
license: mit
---

# ContentVec

This repository provides the ContentVec model in safetensors format, compatible with Hugging Face Transformers.

## Uses

To extract features, use the following code:

```python
import librosa
import torch
from transformers import AutoProcessor, HubertModel

# Load the processor and model
processor = AutoProcessor.from_pretrained("safe-models/ContentVec")
hubert = HubertModel.from_pretrained("safe-models/ContentVec")

# Read the audio as mono, resampled to 16 kHz
audio, sr = librosa.load("test.wav", sr=16000)
input_values = processor(audio, sampling_rate=sr, return_tensors="pt").input_values

# Get the layer 12 output as the feature.
# hidden_states[0] is the embedding output, so index 12 is the 12th transformer layer.
with torch.no_grad():  # inference only, so skip gradient tracking
    feats = hubert(input_values, output_hidden_states=True)["hidden_states"][12]
print(f"{feats.shape=}")
```

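
The printed shape is `(batch, frames, hidden_size)`. As a rough sanity check on the `frames` dimension, the sketch below estimates how many frames to expect for a given number of input samples. It assumes the checkpoint keeps the default HuBERT base convolutional front end; the kernel sizes and strides are hard-coded here as an assumption, and `expected_num_frames` is a hypothetical helper, not part of Transformers. Compare the values with `hubert.config.conv_kernel` and `hubert.config.conv_stride` on the loaded model before relying on them.

```python
# ASSUMPTION: default HuBERT base feature-encoder layout (7 conv layers).
# Verify against hubert.config.conv_kernel / hubert.config.conv_stride.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def expected_num_frames(num_samples: int) -> int:
    """Apply the unpadded 1-D convolution length formula once per layer."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio
print(expected_num_frames(16000))  # prints 49
```

Under these assumptions the total stride is 320 samples, so the features arrive at roughly 50 frames per second of 16 kHz audio.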