atMrMattV committed on
Commit
33264dd
verified
1 Parent(s): c90ac06

Update README.md

  1. README.md +71 -37
README.md CHANGED
@@ -1,3 +1,9 @@
1
  # Visione
2
 
3
  A local-first AI creative production suite for consumer GPUs.
@@ -13,6 +19,29 @@ The pipeline covers the full creative arc: text-to-image and video generation, r
13
 
14
  **Stack:** Python 3.12 + FastAPI + SSE · React 18 + TypeScript + Zustand · Tauri 2 desktop shell · ComfyUI headless for video inference · PyTorch 2.7 + CUDA
15
 
16
  ---
17
 
18
  ## Components
@@ -28,43 +57,28 @@ The pipeline covers the full creative arc: text-to-image and video generation, r
28
  | **Characters** | Persistent character library with 5-shot reference generation for cross-shot consistency |
29
  | **Gallery** | Unified asset browser across all components |
30
 
31
- ## Models
32
-
33
- | Model | Purpose |
34
- |-------|---------|
35
- | Z-Image Turbo FP8 | Image generation |
36
- | Z-Image Qwen 3 4B | Text encoder |
37
- | Z-Image VAE | VAE |
38
- | Z-Image LoRAs (38) | Style presets |
39
- | Flux2 Klein 4B FP8 | Image gen / editing |
40
- | Flux2 Klein 9B BF16 | Image gen/ High-quality editing |
41
- | Flux2 VAE | VAE |
42
- | ControlNet Union 2.1 | Structural conditioning (Retexture) |
43
- | Patina LoRAs (21) | Stylization presets |
44
- | SPAN 4x Upscaler | Image upscaling |
45
- | SCUNet Denoiser | Image denoising |
46
- | CodeFormer | Face enhancement |
47
- | LTX-2.3 22B FP8 | Video generation |
48
- | LTX-2 Gemma 3 12B FP4 | Video text encoder |
49
- | LTX-2.3 22B Distilled LoRA | Fast video sampling |
50
- | LTX-2.3 Spatial Upscaler | 2× video upscale |
51
- | LTX-2.3 Audio VAE | Audio generation |
52
- | VEnhancer FP16 | Video enhancement |
53
- | SeedVR2 3B FP8 | Video upscaling |
54
- | RIFE v4.26 | Frame interpolation |
55
- | ACE-Step SFT + Base | Music generation |
56
- | ACE-Step LM 1.7B | Music language model |
57
- | ACE-Step VAE + TextEnc | Music pipeline |
58
- | Qwen3-TTS 1.7B (3 variants) | Text-to-speech |
59
- | HunyuanVideo-Foley XL | Video-to-audio |
60
- | Wan 2.1 T2V 1.3B | StyleMaster backbone |
61
- | StyleMaster checkpoints | Style injection weights |
62
- | CLIP ViT-H-14 | Style extraction |
63
- | IS-Net (rembg) | Background removal (CPU) |
64
- | LatentSync 1.6 | Lip sync (quality) |
65
- | MuseTalk 1.5 | Lip sync (fast) |
66
- | InsightFace buffalo_l | Face detection/swap |
67
- | Inswapper_128.onnx | Face swap model |
68
 
69
  ---
70
 
@@ -76,7 +90,27 @@ The desktop shell (Tauri 2) wraps the frontend as a native window and manages ba
76
 
77
  Components share models where possible. Image generation models are reused across Imagine, Retouch, Retexture, and Storyboard; video models feed through from Imagine into Retexture and Sound Studio. The Video Editor and Gallery operate CPU-side, assembling outputs produced by the GPU components.
78
 
79
  ---
80
  ## License
81
 
82
  MIT
 
1
+ <p align="center">
2
+ <img
3
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/YJHpzH436J828nNymCNk7.png"
4
+ width="600" />
5
+ </p>
6
+
7
  # Visione
8
 
9
  A local-first AI creative production suite for consumer GPUs.
 
19
 
20
  **Stack:** Python 3.12 + FastAPI + SSE · React 18 + TypeScript + Zustand · Tauri 2 desktop shell · ComfyUI headless for video inference · PyTorch 2.7 + CUDA
21
 
22
+ <table align="center"><tr>
23
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/X0pIezsKwIRl-Guw3k58A.
24
+ png"><img
25
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/X0pIezsKwIRl-Guw3k58A.png"
26
+ width="300" /></a></td>
27
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/euOPxXTNWxjmRl-C88uU2.
28
+ png"><img
29
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/euOPxXTNWxjmRl-C88uU2.png"
30
+ width="300" /></a></td>
31
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/lW_zGi1O8HblIoamV0RLr.
32
+ png"><img
33
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/lW_zGi1O8HblIoamV0RLr.png"
34
+ width="300" /></a></td>
35
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/qKWonqa8ZQvl3CTdD0Pje.
36
+ png"><img
37
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/qKWonqa8ZQvl3CTdD0Pje.png"
38
+ width="300" /></a></td>
39
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/IjNbVVpnLepr9NI8cdxA3.
40
+ png"><img
41
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/IjNbVVpnLepr9NI8cdxA3.png"
42
+ width="300" /></a></td>
43
+ </tr></table>
44
+
45
  ---
46
 
47
  ## Components
 
57
  | **Characters** | Persistent character library with 5-shot reference generation for cross-shot consistency |
58
  | **Gallery** | Unified asset browser across all components |
59
 
60
+ <table align="center"><tr>
61
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/No1ABmspTrCWqpvsukafQ.
62
+ png"><img
63
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/No1ABmspTrCWqpvsukafQ.png"
64
+ width="300" /></a></td>
65
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/mXVAiuj8Vpik0a_UNREIU.
66
+ png"><img
67
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/mXVAiuj8Vpik0a_UNREIU.png"
68
+ width="300" /></a></td>
69
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/Gmzmavqm9antFHYsbl4Ka.
70
+ png"><img
71
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/Gmzmavqm9antFHYsbl4Ka.png"
72
+ width="300" /></a></td>
73
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/BbYSmMGcXZENjW-LBiIUz.
74
+ png"><img
75
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/BbYSmMGcXZENjW-LBiIUz.png"
76
+ width="300" /></a></td>
77
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/jcy5-_cKf0oa_Utf3ZXbK.
78
+ png"><img
79
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/jcy5-_cKf0oa_Utf3ZXbK.png"
80
+ width="300" /></a></td>
81
+ </tr></table>
82
 
83
  ---
84
 
 
90
 
91
  Components share models where possible. Image generation models are reused across Imagine, Retouch, Retexture, and Storyboard; video models feed through from Imagine into Retexture and Sound Studio. The Video Editor and Gallery operate CPU-side, assembling outputs produced by the GPU components.
92
 
93
+ <table align="center"><tr>
94
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/7_CDVBV6B08IosFIkr5jq.
95
+ png"><img
96
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/7_CDVBV6B08IosFIkr5jq.png"
97
+ width="300" /></a></td>
98
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/fRhZcUYtK_TE8uIlXyPH-.
99
+ png"><img
100
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/fRhZcUYtK_TE8uIlXyPH-.png"
101
+ width="300" /></a></td>
102
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/B1J7kJuRPiPY12-Wja0jW.
103
+ png"><img
104
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/B1J7kJuRPiPY12-Wja0jW.png"
105
+ width="300" /></a></td>
106
+ <td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/MXtHgy7hlq9YZVaQED_WA.
107
+ png"><img
108
+ src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/MXtHgy7hlq9YZVaQED_WA.png"
109
+ width="300" /></a></td>
110
+ </tr></table>
111
+
112
  ---
113
+
114
  ## License
115
 
116
  MIT