This repository provides a fine-tuned mHuBERT (multilingual HuBERT) model optimized for the Basque language. It is designed to transform raw audio signals into discrete unit sequences, which serve as a compact, symbolic representation of speech.
The model extracts high-level acoustic and phonetic features from the 9th transformer layer (layer 9). These features are then quantized with a KMeans model of 1000 clusters. This kind of discrete representation is widely used in generative speech research, including unit-based vocoders.
| Feature | Detail |
|---|---|
| Sampling Rate | 16,000 Hz |
| Transformer Layers | 12 |
| Feature Layer | 9 |
| Vocabulary Size | 1000 units |
| Language | Basque (Euskara) |
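The table above implies the unit rate: assuming the standard HuBERT Base convolutional front end (overall stride 320, which mHuBERT-147 inherits), 16 kHz audio yields one feature frame, and therefore one discrete unit, every 20 ms:

```python
# Unit-rate arithmetic, assuming the standard 320x convolutional
# downsampling of the HuBERT Base architecture.
sample_rate = 16_000
downsample = 320                                 # product of the conv strides
frames_per_second = sample_rate // downsample    # 50 units per second
ms_per_frame = 1000 * downsample / sample_rate   # 20.0 ms per unit
print(frames_per_second, ms_per_frame)           # 50 20.0
```

So a 10-second clip maps to roughly 500 discrete units from the 1000-unit vocabulary.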
To extract discrete units from an audio file, you will need `transformers`, `torch`, `torchaudio`, and `joblib`:

```bash
pip install torch torchaudio transformers joblib huggingface_hub
```
```python
import joblib
import torch
import torchaudio
from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2Processor, HubertModel

# Download the KMeans quantizer (1000 clusters, fit on layer-9 features)
hf_hub_download(
    repo_id="Ansu/mHubert-basque-k1000-L9",
    filename="kmeans/basque_hubert_k1000_L9.pt",
    local_dir="./",
)
kmeans = joblib.load("kmeans/basque_hubert_k1000_L9.pt")

# Load the fine-tuned mHuBERT model and its processor
model_name = "Ansu/mHubert-basque-k1000-L9"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = HubertModel.from_pretrained(model_name)
model.eval()

# Load the audio; the model expects mono audio at 16 kHz
audio, sr = torchaudio.load("path/to/audio")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio.squeeze(0)

inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, so index 9 is transformer layer 9
features = out.hidden_states[9].squeeze(0).numpy()
units = kmeans.predict(features)  # one discrete unit per feature frame
```
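The predicted `units` array contains one cluster index per frame, so steady speech sounds produce long runs of the same unit. Unit-based pipelines often collapse these runs before feeding units to a downstream model; a minimal sketch using the standard-library `itertools.groupby` (the collapsing step itself is a common convention, not something this repository mandates):

```python
from itertools import groupby

def collapse_units(units):
    """Collapse consecutive repeated units: [5, 5, 12, 12, 12, 5] -> [5, 12, 5]."""
    return [k for k, _ in groupby(units)]

# e.g. applied to `units.tolist()` from the extraction step above
print(collapse_units([5, 5, 12, 12, 12, 5]))  # [5, 12, 5]
```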
Base model: utter-project/mHuBERT-147