Commit 16f0a44
Parent(s): fed42c7
gitmodules updated
- .claude/settings.local.json +2 -1
- .gitignore +2 -0
- README.md +0 -21
.claude/settings.local.json
CHANGED
@@ -6,7 +6,8 @@
 "Bash(du -sh:*)",
 "Bash(/dev/null -exec ls -lh {} ;)",
 "Bash(git lfs track:*)",
-"Bash(git add:*)"
+"Bash(git add:*)",
+"Bash(git commit:*)"
 ]
 }
 }
.gitignore
CHANGED
@@ -5,3 +5,5 @@ __pycache__/
 *.egg-info/
 .venv/
 venv/
+
+.claude
README.md
CHANGED
@@ -50,12 +50,6 @@ python validate_qnn_accuracy.py \
 - ~75ms latency for 2-second audio (5 diffusion steps)
 - ~70-100 MB total model size (after INT8/FP16 quantization)
 
-## Requirements
-
-```bash
-pip install numpy onnxruntime
-```
-
 ## Model Files
 
 Required directory structure:
@@ -205,21 +199,6 @@ supertonic2-qualcomm/
 - Unicode characters are handled transparently
 - Emoji are automatically removed
 
-## Troubleshooting
-
-**No output generated:**
-- Check that all model files exist in `model/onnx/`
-- Ensure voice style file exists in `model/voice_styles/`
-
-**Poor quality output:**
-- Increase `--steps` to 15-20
-- Try different voice styles
-- Check that input text is properly formatted
-
-**Different outputs each run:**
-- Use `--seed` parameter for reproducibility
-- Same seed = same output
-
 ## License
 
 This implementation uses models from [Supertone/supertonic-2](https://huggingface.co/Supertone/supertonic-2).