---
language:
- en
license: mit
base_model:
- Lightricks/LTX-2.3
- Tongyi-MAI/Z-Image-Turbo
- black-forest-labs/FLUX.2-klein-9B
- tencent/HunyuanVideo-Foley
- Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
- Qwen/Qwen3-TTS-12Hz-1.7B-Base
pipeline_tag: text-to-image
tags:
- art
- agent
- image-generation
- video-generation
- text-to-image
- text-to-video
- style-transfer
- image-editing
- tts
- local-inference
---
<p align="center">
<img
src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/YJHpzH436J828nNymCNk7.png"
width="600" />
</p>
<p align="center">
<strong>Full creative production suite that runs entirely on your GPU.</strong><br/>
No cloud. No API keys. No subscriptions. Every model runs on-device.
</p>
<p align="center">
<a
href="https://www.notion.so/Visione-Technical-Documentation-3194a74185bb8015b154e234606497e2">Documentation</a>
</p>
---
## Why Visione
Most AI creative tools are fragmented: one app for image gen, another for video, another for audio, each with its
own cloud dependency and pricing tier. Visione puts the entire pipeline — from concept to final export — inside a
single desktop application that runs on a consumer NVIDIA GPU (16GB VRAM).
You own your hardware, your models, and your outputs. Nothing is transmitted externally. Ever.
<table align="center"><tr>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/X0pIezsKwIRl-Guw3k58A.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/X0pIezsKwIRl-Guw3k58A.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/euOPxXTNWxjmRl-C88uU2.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/euOPxXTNWxjmRl-C88uU2.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/lW_zGi1O8HblIoamV0RLr.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/lW_zGi1O8HblIoamV0RLr.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/qKWonqa8ZQvl3CTdD0Pje.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/qKWonqa8ZQvl3CTdD0Pje.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/IjNbVVpnLepr9NI8cdxA3.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/IjNbVVpnLepr9NI8cdxA3.png" width="300" /></a></td>
</tr></table>
---
## What You Can Do
| Component | Description |
|---|---|
| **Imagine** | Text-to-image generation with 90+ style LoRAs across multiple model tiers, including Z-Image Turbo and Klein 9B. Character `@mentions` for consistent subjects. |
| **Animate** | Image-to-video and text-to-video via LTX 2.3. Five workflow modes: standard I2V, standard T2V, Best 3-stage, first-last-frame, and audio-conditioned. |
| **Retouch** | Full image editor — inpainting, upscaling, reframing, face swap (InsightFace + FaceFusion), background removal, LUT color grading, optical realism effects, multi-reference compositing, and smart selection (SAM). |
| **Retexture** | Apply any of 90+ preset styles to existing images via LoRA, or transfer the style of a reference image using depth-conditioned generation. |
| **Enhance** | SeedVR2 video enhancement (3B/7B models), Real-ESRGAN upscaling, and RIFE frame interpolation. |
| **Storyboard** | 12-stage AI filmmaking pipeline: concept development with multi-agent LLM collaboration, character library, shot-by-shot generation, and ZIP export. |
| **Sound Studio** | ACE-Step music generation, Qwen3-TTS voiceover (28 preset voices + clone + design), and HunyuanVideo-Foley for video-to-audio. |
| **Characters** | Persistent character library with full-body 5-shot reference generation for visual consistency across shots and components. |
| **Styles** | Browse and install LoRAs from CivitAI directly inside the app. Manage custom styles with per-preset strength tuning. |
| **Gallery** | Unified asset browser across all components with metadata, output modal, and send-to integration for cross-component workflows. |
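The `@mention` syntax used for character consistency can be illustrated with a small parser sketch. This is a hypothetical illustration, not Visione's actual implementation: the assumption here is that a mention is `@` followed by a word-character name.

```python
import re

# Assumption: a mention is `@` followed by letters, digits, or underscores.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")

def extract_mentions(prompt: str) -> list[str]:
    """Return the character names referenced in a prompt, in order."""
    return MENTION.findall(prompt)

print(extract_mentions("@mara walks through rain beside @kellan"))
# prints ['mara', 'kellan']
```

Each extracted name would then be resolved against the character library so the same reference images condition every shot the character appears in.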
<table align="center"><tr>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/No1ABmspTrCWqpvsukafQ.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/No1ABmspTrCWqpvsukafQ.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/mXVAiuj8Vpik0a_UNREIU.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/mXVAiuj8Vpik0a_UNREIU.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/Gmzmavqm9antFHYsbl4Ka.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/Gmzmavqm9antFHYsbl4Ka.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/BbYSmMGcXZENjW-LBiIUz.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/BbYSmMGcXZENjW-LBiIUz.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/jcy5-_cKf0oa_Utf3ZXbK.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/jcy5-_cKf0oa_Utf3ZXbK.png" width="300" /></a></td>
</tr></table>
---
## Key Features
- **90+ style presets** — LoRA-based styles spanning cinematic, illustration, animation, photography, design, and
artist-specific looks. Browse and install more from CivitAI directly inside the app.
- **Character consistency** — Generate a persistent character once, then reference them by name across Imagine,
Retouch, and Storyboard with `@mentions`.
- **Smart VRAM management** — Models load and unload sequentially to fit within 16GB. One active model at a time,
no manual memory management needed.
- **Multilingual UI** — English, Italian, Spanish, French, and German (coming soon).
- **Local LLM + VLM** — Qwen3.5-4B handles prompt enhancement, image captioning, and storyboard agents. Falls back
to Llama 3.2 3B on CPU if needed. No external API calls.
- **Optical realism** — Client-side film emulation: grain, halation, vignette, pro-mist, chromatic aberration,
highlight roll-off, color temperature and tint.
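The sequential model management described above amounts to a one-slot cache: requesting a new model first evicts whatever currently occupies the GPU. A minimal, framework-free sketch of that policy (the class and method names are hypothetical, not Visione's actual API; a real implementation would also release CUDA memory, e.g. via `torch.cuda.empty_cache()`):

```python
class OneSlotModelManager:
    """Keep at most one heavyweight model resident at a time.

    `loader` is any zero-argument callable that returns a loaded
    model. Unloading here just drops the Python reference; a real
    backend would also free GPU memory before loading the next model.
    """

    def __init__(self):
        self._name = None
        self._model = None

    def acquire(self, name, loader):
        # Reuse the resident model if it is the one requested.
        if self._name == name:
            return self._model
        # Otherwise evict the current occupant before loading.
        self.release()
        self._model = loader()
        self._name = name
        return self._model

    def release(self):
        self._model = None
        self._name = None


manager = OneSlotModelManager()
manager.acquire("z-image-turbo", lambda: "image model")
manager.acquire("ltx-2.3", lambda: "video model")  # evicts the image model first
```

Because callers always go through `acquire`, no component needs to reason about what another component left on the GPU.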
---
## Architecture
Visione is a local client-server desktop app. The React frontend talks to a FastAPI backend over localhost;
real-time progress streams via SSE. Heavy inference runs in-process (diffusers/PyTorch) or through a headless
ComfyUI subprocess for video pipelines. The Tauri 2 shell wraps it as a native window and manages the backend
lifecycle.
Models are shared across components wherever possible — the same image generation backbone serves Imagine,
Retouch, Retexture, and Storyboard. All assets, models, and outputs stay on local storage.
**Stack:** Python 3.12 + FastAPI + SSE / React 18 + TypeScript + Zustand / Tauri 2 / ComfyUI headless / PyTorch
2.7 + CUDA
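The SSE progress channel mentioned above uses the standard `text/event-stream` wire format: each event is a block of `field: value` lines terminated by a blank line. A minimal frame encoder, with illustrative event names and payload shape (not Visione's actual protocol):

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Encode one Server-Sent Events frame.

    Per the SSE format, a frame is `field: value` lines ending with
    an empty line. The `data` field must be a single line, so the
    payload is JSON-encoded.
    """
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A streaming HTTP response would yield frames like this as inference runs:
print(sse_frame("progress", {"step": 12, "total": 30}), end="")
```

On the frontend, the browser's built-in `EventSource` (or a fetch-based reader) parses these frames and dispatches them by event name, which is what makes per-step progress bars cheap to implement over plain localhost HTTP.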
<table align="center"><tr>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/7_CDVBV6B08IosFIkr5jq.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/7_CDVBV6B08IosFIkr5jq.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/fRhZcUYtK_TE8uIlXyPH-.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/fRhZcUYtK_TE8uIlXyPH-.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/B1J7kJuRPiPY12-Wja0jW.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/B1J7kJuRPiPY12-Wja0jW.png" width="300" /></a></td>
<td><a href="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/MXtHgy7hlq9YZVaQED_WA.png"><img src="https://cdn-uploads.huggingface.co/production/uploads/695017bb0c3fc8b9c78497e9/MXtHgy7hlq9YZVaQED_WA.png" width="300" /></a></td>
</tr></table>
---
## Hardware Requirements
| | Minimum | Recommended |
|---|---|---|
| **GPU** | NVIDIA, 12GB VRAM (e.g. RTX 3060) | NVIDIA, 16GB VRAM (e.g. RTX 4080) |
| **RAM** | 16GB | 32GB |
| **Storage** | ~50GB (core models) | ~210GB (all models) |
| **OS** | Windows 10/11 | Windows 11 |
---
## License
MIT