Atotti/miipher-2-HuBERT-HiFi-GAN-v0.1

Japanese

miipher2


Atotti committed on Jul 19, 2025

Commit 36aa94c · verified · 1 parent: c0d0575

Upload Miipher-2 complete model (Adapter + Vocoder)


Files changed (2)

README.md +93 -3
config.json +24 -9

README.md CHANGED

@@ -1,3 +1,93 @@
----
-license: apache-2.0
----

+# Miipher-2: Speech Enhancement Model
+Complete speech enhancement system consisting of a Parallel Adapter and Lightning SSL-Vocoder.
+## Model Components
+### 1. Parallel Adapter
+- **Architecture**: Lightweight feedforward network inserted into mHuBERT-147
+- **Target Layer**: Layer 6
+- **Hidden Dimension**: 768
+- **Training Steps**: 199k
+- **File**: `checkpoint_199k_fixed.pt`
+### 2. Lightning SSL-Vocoder
+- **Architecture**: HiFi-GAN based vocoder with PyTorch Lightning
+- **Input**: SSL features from enhanced mHuBERT
+- **Output**: High-quality audio at 22050Hz
+- **Training**: 77 epochs, 137108 steps
+- **File**: `epoch=77-step=137108.ckpt`
+## Usage
+```python
+import torch
+from omegaconf import DictConfig
+from miipher_2.model.feature_cleaner import FeatureCleaner
+from miipher_2.lightning_vocoders.lightning_module import HiFiGANLightningModule
+from huggingface_hub import hf_hub_download
+# Download model files
+adapter_path = hf_hub_download(
+    repo_id="YOUR_USERNAME/miipher2",
+    filename="checkpoint_199k_fixed.pt"
+)
+vocoder_path = hf_hub_download(
+    repo_id="YOUR_USERNAME/miipher2",
+    filename="epoch=77-step=137108.ckpt"
+)
+# Load models
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# Feature Cleaner (Adapter)
+config = DictConfig({
+    "hubert_model_name": "utter-project/mHuBERT-147",
+    "hubert_layer": 6,
+    "adapter_hidden_dim": 768
+})
+cleaner = FeatureCleaner(config).to(device).eval()
+checkpoint = torch.load(adapter_path, map_location=device, weights_only=False)
+cleaner.load_state_dict(checkpoint["model_state_dict"])
+# Vocoder
+vocoder = HiFiGANLightningModule.load_from_checkpoint(
+    vocoder_path, map_location=device
+).to(device).eval()
+# Inference
+with torch.inference_mode():
+    # Extract and clean features. `input_audio` is not defined in this snippet;
+    # the caller must supply the waveform tensor.
+    enhanced_features = cleaner(input_audio)
+    # Generate audio
+    batch = {"input_feature": enhanced_features.transpose(1, 2)}
+    restored_audio = vocoder.generator_forward(batch)
+```
+## Model Performance
+- **Target**: Speech enhancement from noisy/degraded audio
+- **Training Data**: Japanese Versatile Speech (JVS) corpus and multilingual datasets
+- **Evaluation**: Improved speech quality metrics (STOI, PESQ, etc.)
+## Files
+- `checkpoint_199k_fixed.pt` (442MB) - Parallel Adapter weights
+- `epoch=77-step=137108.ckpt` (1.2GB) - Lightning SSL-Vocoder weights
+- `config.json` - Model configuration and metadata
+## Citation
+```bibtex
+@article{miipher2,
+  title={Miipher-2: Speech Enhancement with Parallel Adapters},
+  author={Miipher-2 Team},
+  year={2024}
+}
+```
+## License
+Apache-2.0
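
Note on the Usage snippet in the README diff above: it stops with `restored_audio` in memory and does not show how to save it. The following is a minimal sketch, assuming the vocoder output can be flattened to a mono sequence of floats in [-1.0, 1.0] (the README states neither the tensor shape nor the value range), that writes 16-bit PCM at the 22050 Hz rate the README lists. `float_to_pcm16` and `write_wav` are hypothetical helper names, not part of `miipher_2`; only the Python standard library is used.

```python
import io
import struct
import wave

# Taken from the README and config.json in this commit.
OUTPUT_SAMPLING_RATE = 22050


def float_to_pcm16(samples):
    """Convert floats to little-endian 16-bit PCM bytes.

    Values outside [-1.0, 1.0] are clipped (an assumption about the
    vocoder's output range, which the README does not document).
    """
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(target, samples, sample_rate=OUTPUT_SAMPLING_RATE):
    """Write mono 16-bit PCM audio to a file path or a file-like object."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumed)
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))
```

With the README's variables, a call might look like `write_wav("restored.wav", restored_audio.squeeze().cpu().tolist())`, which again assumes a single mono utterance.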

config.json CHANGED

@@ -1,12 +1,27 @@
 {
-  "model": {
-    "hubert_model_name": "utter-project/mHuBERT-147",
-    "hubert_layer": 6,
-    "adapter_hidden_dim": 768
+  "model_type": "miipher2",
+  "architecture": "speech_enhancement",
+  "components": {
+    "adapter": {
+      "architecture": "parallel_adapter",
+      "base_model": "utter-project/mHuBERT-147",
+      "hubert_layer": 6,
+      "adapter_hidden_dim": 768,
+      "checkpoint_file": "checkpoint_199k_fixed.pt",
+      "training_steps": "199k"
+    },
+    "vocoder": {
+      "architecture": "lightning_ssl_vocoder",
+      "base_architecture": "hifigan",
+      "checkpoint_file": "epoch=77-step=137108.ckpt",
+      "training_epoch": 77,
+      "training_step": 137108
+    }
   },
-  "files": {
-    "adapter_checkpoint": "checkpoint_199k_fixed.pt",
-    "vocoder_checkpoint": "epoch=77-step=137108.ckpt"
-  },
-  "output_sampling_rate": 22050
+  "model_description": "Miipher-2: Complete speech enhancement system with Parallel Adapter and SSL-Vocoder",
+  "output_sampling_rate": 22050,
+  "version": "1.0.0",
+  "paper": "Miipher-2: Speech Enhancement with Parallel Adapters",
+  "license": "Apache-2.0",
+  "authors": "Miipher-2 Team"
 }
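
Note on the config.json change: this commit moves the checkpoint filenames from the top-level `files` block into `components.*.checkpoint_file`, so code that reads `config.json` may want to accept both layouts. A minimal sketch under that assumption follows. `resolve_checkpoints` and `adapter_settings` are hypothetical helpers, and mapping `base_model` back to `hubert_model_name` is inferred from both sides of the diff carrying the same value, not something the commit states.

```python
import json

# Trimmed copy of the layout introduced by this commit.
NEW_CONFIG_JSON = """
{
  "components": {
    "adapter": {
      "base_model": "utter-project/mHuBERT-147",
      "hubert_layer": 6,
      "adapter_hidden_dim": 768,
      "checkpoint_file": "checkpoint_199k_fixed.pt"
    },
    "vocoder": {
      "checkpoint_file": "epoch=77-step=137108.ckpt"
    }
  },
  "output_sampling_rate": 22050
}
"""


def resolve_checkpoints(config):
    """Return (adapter_file, vocoder_file, sampling_rate) for either layout."""
    if "components" in config:
        # Layout introduced by this commit.
        components = config["components"]
        adapter_file = components["adapter"]["checkpoint_file"]
        vocoder_file = components["vocoder"]["checkpoint_file"]
    else:
        # Previous layout with top-level "model" and "files" blocks.
        adapter_file = config["files"]["adapter_checkpoint"]
        vocoder_file = config["files"]["vocoder_checkpoint"]
    return adapter_file, vocoder_file, config["output_sampling_rate"]


def adapter_settings(config):
    """Return the three keys the README passes to FeatureCleaner."""
    if "components" in config:
        adapter = config["components"]["adapter"]
        return {
            # Assumption: "base_model" replaces "hubert_model_name".
            "hubert_model_name": adapter["base_model"],
            "hubert_layer": adapter["hubert_layer"],
            "adapter_hidden_dim": adapter["adapter_hidden_dim"],
        }
    return dict(config["model"])


new_config = json.loads(NEW_CONFIG_JSON)
adapter_file, vocoder_file, sampling_rate = resolve_checkpoints(new_config)
```

The dictionary returned by `adapter_settings` has the same keys as the `DictConfig` built in the README's Usage snippet, so it could be passed there in place of the hard-coded values.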