Update README.md

README.md CHANGED
@@ -32,25 +32,26 @@ Open-weights Dutch TTS based on the [Parakeet](https://jordandarefsky.com/blog/2
 * Laughter can be added with the `(laughs)` tag. However, use it sparingly, because the model quickly derails when given too many events.
 * Reduce hallucinations by tuning the text prompts. The model can be brittle with unexpected events or tokens. Take a look at the example sentences and mimic the style.
 
+## News
+
+**September 28, 2025**: Added safetensors format support, allowing the model to run directly in the Dia pipeline without conversion.
+
 ## Quickstart
 
-
+There are three flavours of the model: the HF transformers version (recommended), the original JAX model, and the backported PyTorch model. The HF transformers version is the easiest to use and integrates seamlessly with the Hugging Face ecosystem.
+
+### HF Transformers (Recommended)
 
 ```bash
+# Make sure you have the runtime dependencies installed for JAX
+# You can also extract the HF inference code and the transformers dependency
+sudo apt-get install build-essential cmake protobuf-compiler libprotobuf-dev
+
 uv sync              # For CPU
-uv sync --extra tpu  # For TPU
 uv sync --extra cuda # For CUDA
 
-#
-
-
-# Create the checkpoint folder and unzip
-mkdir -p weights
-unzip weights/dia-nl-v1.zip -d weights
-
-# Run the inference demo
-# NOTE: Inference can take a while because of JAX compilation; subsequent calls are cached and much faster. I'm working on some performance improvements.
-uv run python src/parkiet/jax/inference.py
+# Run the inference demo with HF transformers
+uv run python src/parkiet/dia/inference_hf.py
 ```
 
 <details>

@@ -58,6 +59,9 @@ uv run python src/parkiet/dia/inference.py
 <summary>PyTorch</summary>
 
 ```bash
+# Make sure you have the runtime dependencies installed for JAX
+sudo apt-get install build-essential cmake protobuf-compiler libprotobuf-dev
+
 uv sync              # For CPU
 uv sync --extra cuda # For CUDA
 

@@ -69,31 +73,38 @@ uv run python src/parkiet/dia/inference.py
 
 <details>
 
-<summary>
+<summary>JAX</summary>
 
-
-
-
-from transformers import AutoProcessor, DiaForConditionalGeneration
+```bash
+# Make sure you have the runtime dependencies installed for JAX
+sudo apt-get install build-essential cmake protobuf-compiler libprotobuf-dev
 
-
-
+uv sync --extra tpu  # For TPU
+uv sync --extra cuda # For CUDA
 
-
-
-]
-processor = AutoProcessor.from_pretrained(model_checkpoint)
-inputs = processor(text=text, padding=True, return_tensors="pt").to(torch_device)
+# Create the checkpoint folder and download the checkpoint
+mkdir -p weights
+wget "https://huggingface.co/pevers/parkiet/resolve/main/dia-nl-v1.zip?download=true" -O weights/dia-nl-v1.zip
 
-
-
+# Unzip the checkpoint
+unzip weights/dia-nl-v1.zip -d weights
 
-
-
+# Run the inference demo
+# NOTE: Inference can take a while because of JAX compilation; subsequent calls are cached and much faster. I'm working on some performance improvements.
+uv run python src/parkiet/jax/inference.py
 ```
 
 </details>
 
+## Hardware Requirements
+
+| Framework | float32 VRAM | bfloat16 VRAM |
+|---|---:|---:|
+| JAX | ≥19 GB | ≥10 GB |
+| PyTorch | ≥15 GB | ≥10 GB |
+
+Note: `bfloat16` typically reduces VRAM usage versus `float32` to about 10 GB on supported hardware. However, converting the full model to `bfloat16` causes more instability and hallucinations. Setting just the `compute_dtype` to `bfloat16` is a good compromise and is also what is done during training. We would like to reduce the VRAM requirements in a future training run.
+
 ## ⚠️ Disclaimer
 This project offers a high-fidelity speech generation model intended for research and educational use. The following uses are strictly forbidden:
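The prompting tips above (use `(laughs)` sparingly, mimic the example-sentence style) can be enforced mechanically before the text reaches the model. A minimal sketch, assuming Dia's `[S1]`/`[S2]` speaker-tag convention; `format_dialogue` and `max_events` are illustrative names, not part of this repo:

```python
def format_dialogue(turns: list[str], max_events: int = 1) -> str:
    """Join turns into a Dia-style prompt, alternating [S1]/[S2] tags
    and keeping at most `max_events` non-verbal events like (laughs)."""
    kept = 0
    out = []
    for i, turn in enumerate(turns):
        if "(laughs)" in turn:
            if kept >= max_events:
                # Drop excess events, since too many derail the model
                turn = turn.replace("(laughs)", "").strip()
            else:
                kept += 1
        out.append(f"[S{i % 2 + 1}] {turn}")
    return " ".join(out)

print(format_dialogue(["Hallo, hoe gaat het?", "Goed hoor! (laughs)", "Mooi zo. (laughs)"]))
```

Capping events in preprocessing keeps prompts close to the example-sentence style the model was trained on, instead of relying on authors to remember the constraint.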
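The VRAM figures in the Hardware Requirements table can be sanity-checked against weight storage alone. A back-of-the-envelope sketch; the ~1.6B parameter count is an assumption (Dia-sized), and real inference adds activations, caches, and framework buffers on top of the weights:

```python
def param_vram_gib(n_params: int, bytes_per_param: int) -> float:
    """GiB needed just to hold the model weights."""
    return n_params * bytes_per_param / 1024**3

N = 1_600_000_000  # assumed parameter count
print(f"float32 weights:  {param_vram_gib(N, 4):.1f} GiB")
print(f"bfloat16 weights: {param_vram_gib(N, 2):.1f} GiB")
```

The gap between these weight-only numbers and the table's ≥19 GB / ≥10 GB is runtime overhead; note that halving the weight dtype halves only the weight term, which is why the measured totals do not drop by exactly 2×.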
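Why does converting the full model to `bfloat16` hurt while a `bfloat16` `compute_dtype` is acceptable? `bfloat16` keeps float32's 8 exponent bits but cuts the mantissa from 23 bits to 7, so every stored weight loses precision permanently, whereas a compute dtype only rounds intermediate results while the float32 master weights stay exact. A stdlib-only sketch of that rounding:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a float32 value to bfloat16 precision: keep the sign bit and
    the 8 exponent bits, cut the mantissa from 23 bits down to 7."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x8000) & 0xFFFF_0000  # round, then drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for v in [1.0, 0.1, 3.14159]:
    print(f"{v} -> {to_bfloat16(v)} (rel. error {abs(to_bfloat16(v) - v) / v:.2e})")
```

Each weight can pick up a relative error of roughly 2⁻⁸ (~0.4%); accumulated across every layer of an autoregressive decoder, that is plausibly enough to cause the instability and hallucinations noted above.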