Ansu committed · Commit 61fdf67 · verified · 1 Parent(s): 2b7c9a4

Update README.md

  1. README.md +40 -0
README.md CHANGED
@@ -1,3 +1,43 @@
  ```
 
  from huggingface_hub import hf_hub_download
+ ---
+ datasets:
+ - asierhv/composite_corpus_eu_v2.1
+ ---
+ # mHubert Basque Discrete Units (k=1000, L9)
+
+ ## Model Summary
+ This repository provides an **mHubert** (Multilingual HuBERT) model fine-tuned for the **Basque language**. It converts raw audio into sequences of discrete units, which serve as a compact, symbolic representation of speech.
+
+ The model takes acoustic and phonetic features from the **9th transformer layer** (Layer 9) and quantizes them with a KMeans model with **1000 clusters**, so each feature frame maps to one cluster index. Discrete-unit representations of this kind are widely used in generative speech research, including unit-based vocoders.
+
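The quantization step described in the Model Summary can be illustrated with a minimal sketch. It assumes the KMeans model assigns each frame to its nearest centroid by Euclidean distance, and it uses random stand-in data: the 768-dimensional feature size, the `assign_units` helper, and the synthetic centroids are illustrative assumptions, not values taken from this repository.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame (row of `features`) to the index of its nearest centroid."""
    # Squared Euclidean distance via ||x||^2 - 2 x.c + ||c||^2, shape (T, K).
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)

# Stand-in for the 1000 trained cluster centres (768 dims is an assumption).
centroids = rng.normal(size=(1000, 768))

# Hypothetical "Layer 9" frames: slightly noisy copies of known centroids.
true_ids = np.array([3, 3, 41, 999])
features = centroids[true_ids] + 0.01 * rng.normal(size=(4, 768))

units = assign_units(features, centroids)
print(units)  # one unit ID in 0..999 per frame
```

In practice the centroids would come from the KMeans file shipped with the model and the features from the model's Layer 9 hidden states; only the assignment logic is shown here.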
+ ## Key Features
+ * **Base Model**: mHubert (Multilingual HuBERT) fine-tuned for Basque.
+ * **Quantization**: KMeans with $k=1000$ clusters.
+ * **Extraction Layer**: Layer 9 (L9).
+ * **Input**: 16 kHz Basque speech audio.
+ * **Output**: 1D sequence of discrete unit IDs (indices 0–999).
+ * **Primary Use Case**: Speech discretization for generative modeling and unit-based synthesis.
+
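Because the model expects 16 kHz input, it can help to check a file's sampling rate before extraction. The sketch below uses only the Python standard library and assumes PCM WAV input; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this repository.

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate stated in this model card


def wav_sample_rate(path: str) -> int:
    """Read the sampling rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resampling(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True when the file's rate differs from the expected rate."""
    return wav_sample_rate(path) != expected
```

A file for which `needs_resampling` returns `True` should be resampled to 16 kHz before it is passed to the model.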
+ ## Technical Specifications
+ | Feature | Detail |
+ | :--- | :--- |
+ | **Sampling Rate** | 16,000 Hz |
+ | **Transformer Layers** | 12 |
+ | **Feature Layer** | 9 |
+ | **Vocabulary Size** | 1000 units |
+ | **Language** | Basque (Euskara) |
+
+ ## How to Use
+
+ To extract discrete units from an audio file, you will need `torch`, `torchaudio`, `transformers`, `joblib`, and `huggingface_hub`.
+
+ ### Installation
+ ```bash
+ pip install torch torchaudio transformers joblib huggingface_hub
+ ```
+
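After installation, a quick check can confirm that the packages named in the install command are importable. This is a minimal sketch using the standard library; the `missing_packages` helper is hypothetical.

```python
from importlib.util import find_spec

# Package names taken from the pip command in the Installation section.
REQUIRED = ["torch", "torchaudio", "transformers", "joblib", "huggingface_hub"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```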
+ ### Inference
+
  ```
 
  from huggingface_hub import hf_hub_download