antonios-makro committed on
Commit a817dfc · verified · 1 Parent(s): 723c6d3

Upload 2 files

replaced mermaid diagram with png

Files changed (2)
  1. README.md +3 -18
  2. arch_diagram.png +0 -0
README.md CHANGED
@@ -152,24 +152,9 @@ const { blendshapes } = await session.run({ audio_waveform: audioTensor });
 
 ## Architecture
 
-```mermaid
-flowchart TD
-A[Audio Input batch, samples @ 16kHz] -->B1
-subgraph Wav2Vec2 Encoder
-B1[CNN Feature Extractor 50fps]
-B1 --> B2[Linear Interpolation 50→30fps]
-B2 --> B3[Transformer Encoder 12 layers]
-end
-B3 --> C[Feature Projection 768 → 512 batch, frames, 512] -->D
-subgraph Identity Encoder
-D[Concat: 512 + MLP 12→64]
-D --> E[SeqTranslator 3× Conv+LN+ReLU batch, 512, frames]
-end
-I[Identity ID 0–11 int → one-hot 12 → MLP → 64 baked as ID=11] --> D
-E --> F[Decoder 3× Conv1D + LayerNorm batch, 512, frames]
-F --> G[Output Projection 512 → 52 + σ]
-G --> H[Output batch, frames, 52 @ 30fps values ∈ 0,1]
-```
+
+![Model Architecture](arch_diagram.png)
+
 **Note:** The identity encoder supports 12 speaker identities (0-11). This ONNX export uses identity `11` baked in for single-speaker inference.
 
 ## License
arch_diagram.png ADDED
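
Since the Mermaid source is no longer in the README, the shape information it carried (16 kHz input, 30 fps output, 52 values per frame, 12 identities) can be kept as a small sanity-check sketch. The helper names below are hypothetical and not part of the model's code. The frame count is only an approximation, assuming frames ≈ duration × 30 rounded down; the actual export may differ by a frame or so depending on how the CNN feature extractor and the interpolation step round.

```python
SAMPLE_RATE_HZ = 16_000   # from the diagram: audio input at 16 kHz
OUTPUT_FPS = 30           # from the diagram: output at 30 fps
NUM_BLENDSHAPES = 52      # from the diagram: output projection 512 -> 52
NUM_IDENTITIES = 12       # from the note: speaker identities 0-11


def approx_output_shape(batch, num_samples):
    """Approximate (batch, frames, 52) output shape for a waveform length.

    Assumption: frames ~= duration_in_seconds * 30, rounded down.
    """
    frames = num_samples * OUTPUT_FPS // SAMPLE_RATE_HZ
    return (batch, frames, NUM_BLENDSHAPES)


def one_hot_identity(identity_id):
    """One-hot vector of length 12, as fed to the identity MLP in the diagram."""
    if not 0 <= identity_id < NUM_IDENTITIES:
        raise ValueError(f"identity_id must be in 0..{NUM_IDENTITIES - 1}")
    vec = [0.0] * NUM_IDENTITIES
    vec[identity_id] = 1.0
    return vec


if __name__ == "__main__":
    # One second of audio maps to roughly 30 frames of 52 values each.
    print(approx_output_shape(1, SAMPLE_RATE_HZ))
    # The ONNX export bakes in identity 11, i.e. the last one-hot slot.
    print(one_hot_identity(11))
```

With the ONNX export, the identity vector is not an input (identity `11` is baked in), so only the shape helper is relevant when checking inference output.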