Upload 2 files
replaced mermaid diagram with png
- README.md +3 -18
- arch_diagram.png +0 -0
README.md
CHANGED
@@ -152,24 +152,9 @@ const { blendshapes } = await session.run({ audio_waveform: audioTensor });
 
 ## Architecture
 
-
-
-
-subgraph Wav2Vec2 Encoder
-B1[CNN Feature Extractor 50fps]
-B1 --> B2[Linear Interpolation 50→30fps]
-B2 --> B3[Transformer Encoder 12 layers]
-end
-B3 --> C[Feature Projection 768 → 512 batch, frames, 512] -->D
-subgraph Identity Encoder
-D[Concat: 512 + MLP 12→64]
-D --> E[SeqTranslator 3× Conv+LN+ReLU batch, 512, frames]
-end
-I[Identity ID 0–11 int → one-hot 12 → MLP → 64 baked as ID=11] --> D
-E --> F[Decoder 3× Conv1D + LayerNorm batch, 512, frames]
-F --> G[Output Projection 512 → 52 + σ]
-G --> H[Output batch, frames, 52 @ 30fps values ∈ 0,1]
-```
+
+
+
 **Note:** The identity encoder supports 12 speaker identities (0-11). This ONNX export uses identity `11` baked in for single-speaker inference.
 
 ## License
arch_diagram.png
ADDED
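
For readers who no longer have the text diagram, the removed "Linear Interpolation 50→30fps" step can be illustrated with a minimal sketch. This is not the model's actual code: `resampleFrames` is a hypothetical helper, and the output-length rounding and endpoint handling are assumptions that may differ from what the exported graph does.

```typescript
// Hypothetical helper: resample per-frame feature vectors from one frame rate
// to another using linear interpolation (for example 50 fps -> 30 fps).
function resampleFrames(
  frames: number[][],
  srcFps: number,
  dstFps: number,
): number[][] {
  if (frames.length === 0) return [];
  // Assumed rounding rule: keep the same duration, rounded down to whole frames.
  const outLen = Math.max(1, Math.floor((frames.length * dstFps) / srcFps));
  const out: number[][] = [];
  for (let i = 0; i < outLen; i++) {
    // Position of output frame i on the source timeline, clamped to the last frame.
    const pos = Math.min((i * srcFps) / dstFps, frames.length - 1);
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, frames.length - 1);
    const t = pos - lo;
    // Blend the two neighbouring source frames channel by channel.
    out.push(frames[lo].map((v, d) => v * (1 - t) + frames[hi][d] * t));
  }
  return out;
}

// One second of 50 fps features with a single channel ramping 0..49.
const src = Array.from({ length: 50 }, (_, i) => [i]);
const dst = resampleFrames(src, 50, 30);
console.log(dst.length); // 30
```

In this sketch the 30 output frames line up with the 30 fps rate of the `(batch, frames, 52)` output described in the removed diagram.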