---
license: cc-by-sa-4.0
language:
- ja
---
# Piper Plus Base Model (Japanese)

A pretrained base model for Japanese TTS. It is optimized for single-speaker fine-tuning.

## Model Details

| Item | Value |
|------|-------|
| Architecture | VITS |
| Language | Japanese (ja) |
| Sample rate | 22050 Hz |
| Quality | medium |
| Phoneme type | OpenJTalk |
| Number of speakers | 0 (for single-speaker use) |

## Usage

### Fine-tuning

You can fine-tune this base model on recordings of a new speaker.

#### 1. Preprocess the dataset

```bash
uv run python -m piper_train.preprocess \
  --input-dir /path/to/your-ljspeech-data \
  --output-dir /path/to/dataset \
  --language ja \
  --dataset-format ljspeech \
  --sample-rate 22050 \
  --single-speaker \
  --phoneme-type openjtalk
```
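
Before preprocessing, it can help to confirm that the input directory looks like an LJSpeech-style dataset. The sketch below assumes the common LJSpeech layout: a pipe-delimited `metadata.csv` whose first field is the utterance ID, plus a `wavs/` directory holding one `<id>.wav` per entry. The exact layout that `piper_train.preprocess` accepts is not stated in this card, so `check_ljspeech` is a hypothetical helper and not part of Piper Plus.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report metadata.csv entries that lack a matching WAV file.
# Assumed layout (not confirmed by this card):
#   <dir>/metadata.csv   lines of the form "id|text"
#   <dir>/wavs/<id>.wav  one audio file per metadata entry
check_ljspeech() {
    local dir="$1" missing=0 id _rest
    if [ ! -f "$dir/metadata.csv" ]; then
        echo "metadata.csv not found in $dir" >&2
        return 2
    fi
    # Read the first pipe-delimited field (the utterance ID) of every line,
    # including a final line that has no trailing newline.
    while IFS='|' read -r id _rest || [ -n "$id" ]; do
        [ -z "$id" ] && continue
        if [ ! -f "$dir/wavs/$id.wav" ]; then
            echo "missing: wavs/$id.wav"
            missing=$((missing + 1))
        fi
    done < "$dir/metadata.csv"
    echo "missing files: $missing"
    # Succeed only when every entry has its audio file.
    [ "$missing" -eq 0 ]
}
```

Running `check_ljspeech /path/to/your-ljspeech-data` lists any missing audio files and returns a non-zero status if there are any, so it can gate the preprocessing command in a script.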

#### 2. Run fine-tuning

```bash
uv run python -m piper_train \
  --dataset-dir /path/to/dataset \
  --accelerator gpu \
  --devices 1 \
  --precision 16-mixed \
  --max_epochs 50 \
  --batch-size 32 \
  --checkpoint-epochs 1 \
  --base_lr 1e-4 \
  --disable_auto_lr_scaling \
  --resume_from_checkpoint /path/to/model.ckpt \
  --default_root_dir /path/to/output
```
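
If a fine-tuning run is interrupted, the same command can presumably be restarted by pointing `--resume_from_checkpoint` at the newest checkpoint instead of the base model. The sketch below assumes checkpoints are written as `.ckpt` files somewhere beneath the `--default_root_dir` directory (PyTorch Lightning's default location is `lightning_logs/version_N/checkpoints/`; whether Piper Plus keeps that default is an assumption). `latest_ckpt` is a hypothetical helper, not a Piper Plus command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified .ckpt file under a directory.
# Returns non-zero when no checkpoint is found.
latest_ckpt() {
    local root="$1"
    find "$root" -type f -name '*.ckpt' | {
        newest=""
        # Keep whichever checkpoint has the newest modification time.
        while IFS= read -r f; do
            if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
                newest="$f"
            fi
        done
        [ -n "$newest" ] && printf '%s\n' "$newest"
    }
}
```

For example, passing `--resume_from_checkpoint "$(latest_ckpt /path/to/output)"` would continue from the latest saved epoch. The helper reads one path per line, so it does not handle file names that contain newlines.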

### Recommended parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `--base_lr` | 1e-4 | Lower learning rate than pretraining (helps prevent overfitting) |
| `--disable_auto_lr_scaling` | - | Disables automatic learning-rate scaling |
| `--max_epochs` | 50-100 | Use fewer epochs for small datasets |
| `--batch-size` | 32 | Adjust to fit GPU memory |
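
To get a feel for how `--max_epochs` and `--batch-size` interact with dataset size, a rough step count can be worked out by hand. The sketch below is only an estimate: it assumes every utterance is used for training and that a partial final batch still counts as a step, neither of which this card confirms. `estimate_steps` is a hypothetical helper, and the dataset size of 1000 utterances in the usage note is purely illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate training steps for a given dataset size.
# Usage: estimate_steps <num_utterances> <batch_size> <max_epochs>
estimate_steps() {
    local n="$1" batch="$2" epochs="$3"
    # Integer ceiling of n / batch: batches needed to see every utterance once.
    local per_epoch=$(( (n + batch - 1) / batch ))
    echo "steps per epoch: $per_epoch"
    echo "total steps: $(( per_epoch * epochs ))"
}
```

For instance, `estimate_steps 1000 32 50` reports 32 steps per epoch and 1600 total steps. Halving the batch size to fit a smaller GPU roughly doubles the steps per epoch for the same number of epochs.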

## Citation

```bibtex
@software{piper_plus,
  title = {Piper Plus: Japanese TTS with VITS},
  author = {ayousanz},
  year = {2024},
  url = {https://github.com/ayousanz/piper}
}
```