Create README.md
README.md
CHANGED
|
@@ -1,3 +1,106 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: mit
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
base_model:
|
| 4 |
+
- hfl/chinese-bert-wwm
|
| 5 |
+
---
|
| 6 |
+
# G2PWModel_zh-Hans
|
| 7 |
+
|
| 8 |
+
[中文](#中文) | [English](#english)
|
| 9 |
+
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
+
## 中文
|
| 13 |
+
|
| 14 |
+
### 简介
|
| 15 |
+
|
| 16 |
+
这是 G2PW (Grapheme-to-Phoneme for Word) 模型的重新训练版本,专门针对简体中文进行了优化。
|
| 17 |
+
|
| 18 |
+
### 主要优化
|
| 19 |
+
|
| 20 |
+
- **纯简体中文数据集**:此版本仅使用简体中文数据进行训练,提供更准确的简体中文发音预测
|
| 21 |
+
- **更快的推理速度**:推荐使用 [@baicai-1145/g2pw-torch](https://github.com/baicai-1145/g2pw-torch) 进行推理,速度比原版 ONNX 实现更快
|
| 22 |
+
|
| 23 |
+
### 模型信息
|
| 24 |
+
|
| 25 |
+
- **模型名称**: G2PWModel_zh-Hans
|
| 26 |
+
- **训练数据**: 简体中文语料
|
| 27 |
+
- **优化目标**: 提高简体中文字音转换准确率
|
| 28 |
+
|
| 29 |
+
### 使用方法
|
| 30 |
+
|
| 31 |
+
推荐使用 [g2pw-torch](https://github.com/baicai-1145/g2pw-torch) 进行推理:
|
| 32 |
+
|
| 33 |
+
```bash
|
| 34 |
+
pip install g2pw-torch
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
```python
|
| 38 |
+
from g2pw import G2PWConverter
|
| 39 |
+
|
| 40 |
+
converter = G2PWConverter(model_dir='baicai1145/G2PWModel_zh-Hans')
|
| 41 |
+
result = converter('我今天很开心')
|
| 42 |
+
print(result)
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
### 致谢
|
| 46 |
+
|
| 47 |
+
- **数据收集与整理**: [@TheSmallHanCat](https://huggingface.co/TheSmallHanCat)
|
| 48 |
+
- **模型训练**: [@baicai1145](https://huggingface.co/baicai1145)
|
| 49 |
+
|
| 50 |
+
### 相关项目
|
| 51 |
+
|
| 52 |
+
- [baicai-1145/g2pW](https://github.com/baicai-1145/g2pW) - 原始 G2PW 项目
|
| 53 |
+
- [baicai-1145/g2pw-torch](https://github.com/baicai-1145/g2pw-torch) - PyTorch 推理版本(推荐使用)
|
| 54 |
+
|
| 55 |
+
### 许可证
|
| 56 |
+
|
| 57 |
+
本项目遵循原项目的许可证协议。
|
| 58 |
+
|
| 59 |
+
---
|
| 60 |
+
|
| 61 |
+
## English
|
| 62 |
+
|
| 63 |
+
### Introduction
|
| 64 |
+
|
| 65 |
+
This is a retrained version of the G2PW (Grapheme-to-Phoneme for Word) model, optimized for Simplified Chinese and based on `hfl/chinese-bert-wwm`.
|
| 66 |
+
|
| 67 |
+
### Key Improvements
|
| 68 |
+
|
| 69 |
+
- **Simplified Chinese-only dataset**: This version is trained exclusively on Simplified Chinese data, for more accurate pronunciation predictions on Simplified Chinese text
|
| 70 |
+
- **Faster inference**: We recommend [baicai-1145/g2pw-torch](https://github.com/baicai-1145/g2pw-torch) for inference; it is faster than the original ONNX implementation
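
The speed claim above can be checked on your own hardware. The following is a minimal timing sketch, not part of either project: `time_converter` is a hypothetical helper name, and it assumes only that each converter is a callable that takes a list of sentences.

```python
import time
from typing import Callable, List


def time_converter(
    convert: Callable[[List[str]], object],
    sentences: List[str],
    repeats: int = 5,
) -> float:
    """Return the best wall-clock time (seconds) for convert(sentences)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up run so one-time setup costs are not included in the timing.
    convert(sentences)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        convert(sentences)
        best = min(best, time.perf_counter() - start)
    return best
```

Pass the same sentence list to the PyTorch converter and to the ONNX converter and compare the two returned values. Results depend on batch size, device, and sentence length.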
|
| 71 |
+
|
| 72 |
+
### Model Information
|
| 73 |
+
|
| 74 |
+
- **Model Name**: G2PWModel_zh-Hans
|
| 75 |
+
- **Training Data**: Simplified Chinese corpus
|
| 76 |
+
- **Optimization Goal**: Improve the accuracy of Simplified Chinese grapheme-to-phoneme conversion
|
| 77 |
+
|
| 78 |
+
### Usage
|
| 79 |
+
|
| 80 |
+
We recommend using [g2pw-torch](https://github.com/baicai-1145/g2pw-torch) for inference:
|
| 81 |
+
|
| 82 |
+
```bash
|
| 83 |
+
pip install g2pw-torch
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
```python
|
| 87 |
+
from g2pw import G2PWConverter
|
| 88 |
+
|
| 89 |
+
converter = G2PWConverter(model_dir='baicai1145/G2PWModel_zh-Hans')
|
| 90 |
+
result = converter('我今天很开心')
|
| 91 |
+
print(result)
|
| 92 |
+
```
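
To process several sentences and line up each character with its predicted reading, a sketch like the one below may help. It assumes that g2pw-torch keeps the upstream g2pW behavior, where the converter accepts a list of sentences and returns one list per sentence with one entry per character (`None` where no reading is produced). That behavior is an assumption here, and `pair_characters` and `convert_sentences` are hypothetical helper names.

```python
from typing import List, Optional, Sequence, Tuple


def pair_characters(
    sentence: str, readings: Sequence[Optional[str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with its reading (None where none was returned)."""
    if len(sentence) != len(readings):
        raise ValueError("expected exactly one reading per character")
    return list(zip(sentence, readings))


def convert_sentences(
    sentences: List[str],
    model_dir: str = "baicai1145/G2PWModel_zh-Hans",
) -> List[List[Tuple[str, Optional[str]]]]:
    """Convert a batch of sentences and pair characters with readings."""
    # Imported lazily so pair_characters works without the package installed.
    from g2pw import G2PWConverter

    converter = G2PWConverter(model_dir=model_dir)
    # Assumption: one list of per-character readings is returned per sentence.
    results = converter(sentences)
    return [pair_characters(s, r) for s, r in zip(sentences, results)]
```

Calling `convert_sentences(['我今天很开心'])` loads the model once and reuses it for the whole batch, which avoids re-creating the converter for every sentence.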
|
| 93 |
+
|
| 94 |
+
### Credits
|
| 95 |
+
|
| 96 |
+
- **Data Collection and Organization**: [@TheSmallHanCat](https://huggingface.co/TheSmallHanCat)
|
| 97 |
+
- **Model Training**: [@baicai1145](https://huggingface.co/baicai1145)
|
| 98 |
+
|
| 99 |
+
### Related Projects
|
| 100 |
+
|
| 101 |
+
- [baicai-1145/g2pW](https://github.com/baicai-1145/g2pW) - Original G2PW project
|
| 102 |
+
- [baicai-1145/g2pw-torch](https://github.com/baicai-1145/g2pw-torch) - PyTorch inference version (recommended)
|
| 103 |
+
|
| 104 |
+
### License
|
| 105 |
+
|
| 106 |
+
This project follows the same license as the original project; the model card metadata declares it as MIT.
|