myned-ai/wav2arkit_cpu

audio2expression

facial-animation


antonios-makro committed on Jan 12

Commit a817dfc · verified · 1 Parent(s): 723c6d3

Upload 2 files

replaced mermaid diagram with png

Files changed (2)

README.md +3 -18
arch_diagram.png +0 -0

README.md CHANGED

@@ -152,24 +152,9 @@ const { blendshapes } = await session.run({ audio_waveform: audioTensor });
 ## Architecture
-```mermaid
-flowchart TD
-  A[Audio Input batch, samples @ 16kHz] -->B1
-  subgraph Wav2Vec2 Encoder
-    B1[CNN Feature Extractor 50fps]
-    B1 --> B2[Linear Interpolation 50→30fps]
-    B2 --> B3[Transformer Encoder 12 layers]
-  end
-  B3 --> C[Feature Projection 768 → 512 batch, frames, 512] -->D
-  subgraph Identity Encoder
-    D[Concat: 512 + MLP 12→64]
-    D --> E[SeqTranslator 3× Conv+LN+ReLU batch, 512, frames]
-  end
-  I[Identity ID 0–11 int → one-hot 12 → MLP → 64 baked as ID=11] --> D
-  E --> F[Decoder 3× Conv1D + LayerNorm batch, 512, frames]
-  F --> G[Output Projection 512 → 52 + σ]
-  G --> H[Output batch, frames, 52 @ 30fps values ∈ 0,1]
-```
+![Model Architecture](arch_diagram.png)
 **Note:** The identity encoder supports 12 speaker identities (0-11). This ONNX export uses identity `11` baked in for single-speaker inference.
 ## License
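
The removed diagram recorded the model's input and output rates (16 kHz audio in, 52 values per frame at 30 fps out), which the PNG now carries only as an image. As a rough sketch of what those numbers imply for callers, the helper below estimates the output shape for a clip. The function names are hypothetical and not part of this repository, and the floor rounding is an assumption; the exported model may round frame counts differently.

```javascript
// Hypothetical helpers for illustration; not part of the wav2arkit_cpu repository.
const SAMPLE_RATE_HZ = 16000; // input sample rate, from the removed diagram
const OUTPUT_FPS = 30;        // output frame rate, from the removed diagram
const NUM_OUTPUTS = 52;       // values per frame, from the removed diagram

// Estimate how many output frames a clip of `numSamples` samples produces.
// Assumption: frames scale linearly with duration and are rounded down.
function estimateFrames(numSamples) {
  return Math.floor((numSamples * OUTPUT_FPS) / SAMPLE_RATE_HZ);
}

// Estimate the output tensor shape: [batch, frames, 52].
function estimateOutputShape(batch, numSamples) {
  return [batch, estimateFrames(numSamples), NUM_OUTPUTS];
}

// Example: a 2-second clip in a batch of one.
console.log(estimateOutputShape(1, 2 * SAMPLE_RATE_HZ));
```

Comparing this estimate against the shape actually returned by `session.run` is a quick sanity check that the audio was resampled to 16 kHz before inference.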

arch_diagram.png ADDED