Atotti committed on
Commit 36aa94c · verified · 1 parent: c0d0575

Upload Miipher-2 complete model (Adapter + Vocoder)

Files changed (2)
  1. README.md +93 -3
  2. config.json +24 -9
README.md CHANGED
@@ -1,3 +1,93 @@
- ---
- license: apache-2.0
- ---
+ # Miipher-2: Speech Enhancement Model
+
+ Complete speech enhancement system consisting of a Parallel Adapter and a Lightning SSL-Vocoder.
+
+ ## Model Components
+
+ ### 1. Parallel Adapter
+ - **Architecture**: Lightweight feedforward network inserted into mHuBERT-147
+ - **Target Layer**: Layer 6
+ - **Hidden Dimension**: 768
+ - **Training Steps**: 199k
+ - **File**: `checkpoint_199k_fixed.pt`
+
+ ### 2. Lightning SSL-Vocoder
+ - **Architecture**: HiFi-GAN-based vocoder implemented with PyTorch Lightning
+ - **Input**: SSL features from the adapter-enhanced mHuBERT
+ - **Output**: High-quality audio at 22,050 Hz
+ - **Training**: 77 epochs, 137,108 steps
+ - **File**: `epoch=77-step=137108.ckpt`
+
+ ## Usage
+
+ ```python
+ import torch
+ from omegaconf import DictConfig
+ from miipher_2.model.feature_cleaner import FeatureCleaner
+ from miipher_2.lightning_vocoders.lightning_module import HiFiGANLightningModule
+ from huggingface_hub import hf_hub_download
+
+ # Download model files
+ adapter_path = hf_hub_download(
+     repo_id="YOUR_USERNAME/miipher2",
+     filename="checkpoint_199k_fixed.pt"
+ )
+ vocoder_path = hf_hub_download(
+     repo_id="YOUR_USERNAME/miipher2",
+     filename="epoch=77-step=137108.ckpt"
+ )
+
+ # Load models
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Feature Cleaner (Adapter)
+ config = DictConfig({
+     "hubert_model_name": "utter-project/mHuBERT-147",
+     "hubert_layer": 6,
+     "adapter_hidden_dim": 768
+ })
+
+ cleaner = FeatureCleaner(config).to(device).eval()
+ checkpoint = torch.load(adapter_path, map_location=device, weights_only=False)
+ cleaner.load_state_dict(checkpoint["model_state_dict"])
+
+ # Vocoder
+ vocoder = HiFiGANLightningModule.load_from_checkpoint(
+     vocoder_path, map_location=device
+ ).to(device).eval()
+
+ # Inference
+ with torch.inference_mode():
+     # Extract and clean features (define input_audio first: a waveform tensor on `device`, assumed 16 kHz)
+     enhanced_features = cleaner(input_audio)
+
+     # Generate audio
+     batch = {"input_feature": enhanced_features.transpose(1, 2)}
+     restored_audio = vocoder.generator_forward(batch)
+ ```
+
+ ## Model Performance
+
+ - **Target**: Speech enhancement from noisy/degraded audio
+ - **Training Data**: JVS (Japanese versatile speech) corpus and multilingual datasets
+ - **Evaluation**: Improved speech quality metrics (STOI, PESQ, etc.)
+
+ ## Files
+
+ - `checkpoint_199k_fixed.pt` (442 MB): Parallel Adapter weights
+ - `epoch=77-step=137108.ckpt` (1.2 GB): Lightning SSL-Vocoder weights
+ - `config.json`: Model configuration and metadata
+
+ ## Citation
+
+ ```bibtex
+ @article{miipher2,
+   title={Miipher-2: Speech Enhancement with Parallel Adapters},
+   author={Miipher-2 Team},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ Apache-2.0
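The README describes the Parallel Adapter only as a lightweight feedforward network inserted into layer 6 of mHuBERT-147. The NumPy sketch below illustrates the generic parallel-adapter pattern: a small MLP reads the same input as a frozen sublayer, and its output is added next to that sublayer's output instead of being stacked after it. It is not the `FeatureCleaner` code from this repository. The function names, the ReLU activations, the 768/3072 widths, the simplification of a transformer layer to its feedforward sublayer (no attention or layer norm), and the zero-initialised up-projection are all assumptions made for illustration.

```python
import numpy as np


def frozen_ffn(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Stand-in for the frozen feedforward sublayer of one transformer layer."""
    return np.maximum(x @ w1, 0.0) @ w2


def parallel_adapter(x: np.ndarray, down: np.ndarray, up: np.ndarray) -> np.ndarray:
    """Hypothetical adapter MLP: project to the adapter width, ReLU, project back."""
    return np.maximum(x @ down, 0.0) @ up


def layer_with_parallel_adapter(x, ffn_weights, adapter_weights, scale=1.0):
    """Residual output where the adapter runs side by side with the frozen FFN.

    x has shape (frames, model_dim); only adapter_weights would be trained.
    """
    return x + frozen_ffn(x, *ffn_weights) + scale * parallel_adapter(x, *adapter_weights)


rng = np.random.default_rng(0)
model_dim, ffn_dim, adapter_dim = 768, 3072, 768  # assumed widths
frames = 50  # assumed: about one second of 20 ms SSL frames

x = rng.standard_normal((frames, model_dim))
ffn_weights = (
    rng.standard_normal((model_dim, ffn_dim)) * 0.02,
    rng.standard_normal((ffn_dim, model_dim)) * 0.02,
)
# A zero up-projection makes the adapter a no-op at initialisation, so the
# adapted layer first reproduces the frozen layer exactly.
adapter_weights = (
    rng.standard_normal((model_dim, adapter_dim)) * 0.02,
    np.zeros((adapter_dim, model_dim)),
)

y = layer_with_parallel_adapter(x, ffn_weights, adapter_weights)
print(y.shape)  # (50, 768)
```

Because only `adapter_weights` would receive gradients in this pattern, the pretrained SSL weights stay untouched, and the adapter learns a correction that is added to the frozen layer's output.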
config.json CHANGED
@@ -1,12 +1,27 @@
  {
-   "model": {
-     "hubert_model_name": "utter-project/mHuBERT-147",
-     "hubert_layer": 6,
-     "adapter_hidden_dim": 768
    },
-   "files": {
-     "adapter_checkpoint": "checkpoint_199k_fixed.pt",
-     "vocoder_checkpoint": "epoch=77-step=137108.ckpt"
-   },
-   "output_sampling_rate": 22050
  }
 
  {
+   "model_type": "miipher2",
+   "architecture": "speech_enhancement",
+   "components": {
+     "adapter": {
+       "architecture": "parallel_adapter",
+       "base_model": "utter-project/mHuBERT-147",
+       "hubert_layer": 6,
+       "adapter_hidden_dim": 768,
+       "checkpoint_file": "checkpoint_199k_fixed.pt",
+       "training_steps": "199k"
+     },
+     "vocoder": {
+       "architecture": "lightning_ssl_vocoder",
+       "base_architecture": "hifigan",
+       "checkpoint_file": "epoch=77-step=137108.ckpt",
+       "training_epoch": 77,
+       "training_step": 137108
+     }
    },
+   "model_description": "Miipher-2: Complete speech enhancement system with Parallel Adapter and SSL-Vocoder",
+   "output_sampling_rate": 22050,
+   "version": "1.0.0",
+   "paper": "Miipher-2: Speech Enhancement with Parallel Adapters",
+   "license": "Apache-2.0",
+   "authors": "Miipher-2 Team"
  }
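This commit replaces the flat `model`/`files` layout of `config.json` with a nested `components` layout. The hypothetical helper below (not part of the repository) rebuilds the new structure from the old file, which makes the field mapping explicit: `hubert_model_name` becomes `adapter.base_model`, and the two `files` entries become each component's `checkpoint_file`. It assumes the training counters can be read back from the checkpoint filenames (`checkpoint_<N>k...` and Lightning's `epoch=E-step=S.ckpt` pattern); metadata-only fields such as `version`, `paper` and `authors` are left out.

```python
import json
import re

OLD_CONFIG = {
    "model": {
        "hubert_model_name": "utter-project/mHuBERT-147",
        "hubert_layer": 6,
        "adapter_hidden_dim": 768,
    },
    "files": {
        "adapter_checkpoint": "checkpoint_199k_fixed.pt",
        "vocoder_checkpoint": "epoch=77-step=137108.ckpt",
    },
    "output_sampling_rate": 22050,
}


def parse_adapter_steps(name: str) -> str:
    """Assumed naming: read the '199k'-style step count from 'checkpoint_199k...'."""
    match = re.search(r"checkpoint_(\d+k)", name)
    if match is None:
        raise ValueError(f"unexpected adapter checkpoint name: {name}")
    return match.group(1)


def parse_lightning_ckpt_name(name: str) -> tuple[int, int]:
    """Read epoch and step from a Lightning-style 'epoch=E-step=S.ckpt' name."""
    match = re.fullmatch(r"epoch=(\d+)-step=(\d+)\.ckpt", name)
    if match is None:
        raise ValueError(f"unexpected vocoder checkpoint name: {name}")
    return int(match.group(1)), int(match.group(2))


def migrate_config(old: dict) -> dict:
    """Hypothetical helper: map the old flat config onto the new nested schema."""
    model, files = old["model"], old["files"]
    epoch, step = parse_lightning_ckpt_name(files["vocoder_checkpoint"])
    return {
        "model_type": "miipher2",
        "architecture": "speech_enhancement",
        "components": {
            "adapter": {
                "architecture": "parallel_adapter",
                "base_model": model["hubert_model_name"],
                "hubert_layer": model["hubert_layer"],
                "adapter_hidden_dim": model["adapter_hidden_dim"],
                "checkpoint_file": files["adapter_checkpoint"],
                "training_steps": parse_adapter_steps(files["adapter_checkpoint"]),
            },
            "vocoder": {
                "architecture": "lightning_ssl_vocoder",
                "base_architecture": "hifigan",
                "checkpoint_file": files["vocoder_checkpoint"],
                "training_epoch": epoch,
                "training_step": step,
            },
        },
        "output_sampling_rate": old["output_sampling_rate"],
    }


print(json.dumps(migrate_config(OLD_CONFIG)["components"], indent=2))
```

Note that the new schema mixes types for the training counters (`"training_steps": "199k"` is a string, while `training_step` is an integer); code that consumes `config.json` has to handle both.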