This repository provides a fine-tuned mHuBERT (multilingual HuBERT) model optimized for the Basque language. It is designed to transform raw audio signals into discrete unit sequences, which serve as a compact, symbolic representation of speech.
The model extracts high-level acoustic and phonetic features from the 9th transformer layer (layer 9). These features are then quantized with a KMeans model of 1000 clusters. This kind of discrete representation is widely used in generative speech research, including unit-based vocoders.
| Feature | Detail |
|---|---|
| Sampling Rate | 16,000 Hz |
| Transformer Layers | 12 |
| Feature Layer | 9 |
| Vocabulary Size | 1000 units |
| Language | Basque (Euskara) |
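The table above implies the unit rate: assuming the standard HuBERT Base convolutional front end (overall stride 320, which mHuBERT-147 inherits), 16 kHz audio yields one feature frame, and therefore one discrete unit, every 20 ms:

```python
# Unit-rate arithmetic, assuming the standard 320x convolutional
# downsampling of the HuBERT Base architecture.
sample_rate = 16_000
downsample = 320                                 # product of the conv strides
frames_per_second = sample_rate // downsample    # 50 units per second
ms_per_frame = 1000 * downsample / sample_rate   # 20.0 ms per unit
print(frames_per_second, ms_per_frame)           # 50 20.0
```

So a 10-second clip maps to roughly 500 discrete units from the 1000-unit vocabulary.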
To extract discrete units from an audio file, you will need `transformers`, `torch`, `torchaudio`, and `joblib`:

```bash
pip install torch torchaudio transformers joblib huggingface_hub
```
```python
import joblib
import torch
import torchaudio
from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2Processor, HubertModel

# Download the KMeans quantizer (1000 clusters, fit on layer-9 features)
hf_hub_download(
    repo_id="Ansu/mHubert-basque-k1000-L9",
    filename="kmeans/basque_hubert_k1000_L9.pt",
    local_dir="./",
)
kmeans = joblib.load("kmeans/basque_hubert_k1000_L9.pt")

# Load the fine-tuned mHuBERT model and its processor
model_name = "Ansu/mHubert-basque-k1000-L9"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = HubertModel.from_pretrained(model_name)
model.eval()

# Load the audio; the model expects mono audio at 16 kHz
audio, sr = torchaudio.load("path/to/audio")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio.squeeze(0)

inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, so index 9 is transformer layer 9
features = out.hidden_states[9].squeeze(0).numpy()
units = kmeans.predict(features)  # one discrete unit per feature frame
```
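The predicted `units` array contains one cluster index per frame, so steady speech sounds produce long runs of the same unit. Unit-based pipelines often collapse these runs before feeding units to a downstream model; a minimal sketch using the standard-library `itertools.groupby` (the collapsing step itself is a common convention, not something this repository mandates):

```python
from itertools import groupby

def collapse_units(units):
    """Collapse consecutive repeated units: [5, 5, 12, 12, 12, 5] -> [5, 12, 5]."""
    return [k for k, _ in groupby(units)]

# e.g. applied to `units.tolist()` from the extraction step above
print(collapse_units([5, 5, 12, 12, 12, 5]))  # [5, 12, 5]
```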
Base model: utter-project/mHuBERT-147