---
language:
- en
base_model:
- Lightricks/LTX-2.3
- Tongyi-MAI/Z-Image-Turbo
- black-forest-labs/FLUX.2-klein-9B
- tencent/HunyuanVideo-Foley
- Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
- Qwen/Qwen3-TTS-12Hz-1.7B-Base
pipeline_tag: text-to-image
---

Full creative production suite that runs entirely on your GPU.
No cloud. No API keys. No subscriptions. Every model runs on-device.

---

## Why Visione

Most AI creative tools are fragmented: one app for image gen, another for video, another for audio, each with its own cloud dependency and pricing tier. Visione puts the entire pipeline, from concept to final export, inside a single desktop application that runs on a consumer NVIDIA GPU (16GB VRAM). You own your hardware, your models, and your outputs. Nothing is transmitted externally. Ever.

---

## What You Can Do

| Component | Description |
|---|---|
| **Imagine** | Text-to-image generation with 90+ style LoRAs across 3 model tiers (Z-Image Turbo, Klein 9B). Character `@mentions` for consistent subjects. |
| **Animate** | Image-to-video and text-to-video via LTX 2.3. 5 workflow modes: standard I2V/T2V, Best 3-stage, first-last-frame, audio-conditioned. |
| **Retouch** | Full image editor: inpainting, upscaling, reframing, face swap (InsightFace + FaceFusion), background removal, LUT color grading, optical realism effects, multi-reference compositing, and smart selection (SAM). |
| **Retexture** | Apply any of 90+ preset styles to existing images via LoRA, or transfer the style of a reference image using depth-conditioned generation. |
| **Enhance** | SeedVR2 video enhancement (3B/7B models), Real-ESRGAN upscaling, and RIFE frame interpolation. |
| **Storyboard** | 12-stage AI filmmaking pipeline: concept development with multi-agent LLM collaboration, character library, shot-by-shot generation, and ZIP export. |
| **Sound Studio** | ACE-Step music generation, Qwen3-TTS voiceover (28 preset voices + clone + design), and HunyuanVideo-Foley for video-to-audio. |
| **Characters** | Persistent character library with full-body 5-shot reference generation for visual consistency across shots and components. |
| **Styles** | Browse and install LoRAs from CivitAI directly inside the app. Manage custom styles with per-preset strength tuning. |
| **Gallery** | Unified asset browser across all components with metadata, output modal, and send-to integration for cross-component workflows. |

---

## Key Features

- **90+ style presets**: LoRA-based styles spanning cinematic, illustration, animation, photography, design, and artist-specific looks. Browse and install more from CivitAI directly inside the app.
- **Character consistency**: Generate a persistent character once, then reference them by name across Imagine, Retouch, and Storyboard with `@mentions`.
- **Smart VRAM management**: Models load and unload sequentially to fit within 16GB. Only one model is active at a time, so no manual memory management is needed.
- **Multilingual UI** (coming soon): English, Italian, Spanish, French, German.
- **Local LLM + VLM**: Qwen3.5-4B handles prompt enhancement, image captioning, and storyboard agents. Falls back to Llama 3.2 3B on CPU if needed. No external API calls.
- **Optical realism**: Client-side film emulation with grain, halation, vignette, pro-mist, chromatic aberration, highlight roll-off, color temperature, and tint.

---

## Architecture

Visione is a local client-server desktop app. The React frontend talks to a FastAPI backend over localhost; real-time progress streams via SSE. Heavy inference runs in-process (diffusers/PyTorch) or through a headless ComfyUI subprocess for video pipelines. The Tauri 2 shell wraps it as a native window and manages the backend lifecycle.

Models are shared across components wherever possible: the same image generation backbone serves Imagine, Retouch, Retexture, and Storyboard. All assets, models, and outputs stay on local storage.

**Stack:** Python 3.12 + FastAPI + SSE / React 18 + TypeScript + Zustand / Tauri 2 / ComfyUI headless / PyTorch 2.7 + CUDA
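
To illustrate the SSE progress channel between backend and frontend, here is a minimal sketch of how progress frames might be serialized and read back. It uses only the Python standard library. The names `format_sse`, `progress_stream`, and `parse_sse`, the event names, and the payload fields are all hypothetical and not taken from Visione's code; only the `event:` / `data:` framing with a blank-line terminator comes from the SSE wire format itself.

```python
import json
from typing import Iterator


def format_sse(event: str, payload: dict) -> str:
    """Serialize one SSE frame: event name, JSON data line, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def progress_stream(total_steps: int) -> Iterator[str]:
    """Yield one 'progress' frame per step, then a final 'done' frame.

    The event names and payload fields are illustrative assumptions.
    """
    for step in range(1, total_steps + 1):
        yield format_sse("progress", {"step": step, "total": total_steps})
    yield format_sse("done", {"status": "ok"})


def parse_sse(frame: str) -> tuple[str, dict]:
    """Parse a single frame produced by format_sse (a simplified client-side view)."""
    fields = {}
    for line in frame.strip().splitlines():
        # Split on the first ": " only, so JSON content in the value is preserved.
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields["event"], json.loads(fields["data"])


if __name__ == "__main__":
    for frame in progress_stream(3):
        print(parse_sse(frame))
```

In a FastAPI backend, a generator of this shape could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the frontend could consume it with the browser's `EventSource` API. Whether Visione does it this way is an assumption.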

---

## Hardware Requirements

| | Minimum | Recommended |
|---|---|---|
| **GPU** | NVIDIA 12GB VRAM (RTX 3060) | NVIDIA 16GB VRAM (RTX 4080) |
| **RAM** | 16GB | 32GB |
| **Storage** | ~50GB (core models) | ~210GB (all models) |
| **OS** | Windows 10/11 | Windows 11 |

---

## License

MIT

---
license: mit
tags:
- art
- agent
- image-generation
- video-generation
- text-to-image
- text-to-video
- style-transfer
- image-editing
- tts
- local-inference
---