---
license: apache-2.0
base_model:
- Yuanshi/OminiControl
---
An all-in-one control framework for unified visual content creation using GenAI. Generate multi-view, diverse-scene, and task-specific high-resolution images from a single subject image, without fine-tuning.
## Overview
ZenCtrl is a comprehensive toolkit built to tackle core challenges in image generation:
- No fine-tuning needed: works from a single subject image
- Maintains control over shape, pose, camera angle, context
- Supports high-resolution, multi-scene generation
- Modular toolkit for preprocessing, control, editing, and post-processing tasks
ZenCtrl builds on OminiControl, adding finer-grained control, consistent subject preservation, and improved, ready-to-use models. Our goal is to build an agentic visual generation system that can orchestrate image and video creation from LLM-driven recipes.
## GitHub code
https://github.com/FotographerAI/ZenCtrl/tree/main
## Toolkit Components (coming soon)
### Preprocessing
- Background removal
- Matting
- Reshaping
- Segmentation
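ZenCtrl's preprocessing code is not yet released, so the exact pipeline is unknown. The compositing step that matting enables can be sketched in NumPy; the `composite` helper and the toy arrays below are illustrative assumptions, not ZenCtrl API:

```python
import numpy as np

def composite(foreground: np.ndarray, alpha: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Alpha-blend a matted foreground over a new background.

    foreground, background: float arrays in [0, 1] with shape (H, W, 3)
    alpha: soft matte in [0, 1] with shape (H, W, 1), broadcast over channels
    """
    return alpha * foreground + (1.0 - alpha) * background

# Toy example: a white subject, a black background, and a uniform 50% matte.
fg = np.ones((4, 4, 3))
bg = np.zeros((4, 4, 3))
alpha = np.full((4, 4, 1), 0.5)
out = composite(fg, alpha, bg)  # every pixel blends to 0.5 gray
```

A soft (non-binary) matte is what makes blending look natural at hair and object edges, which is why matting is listed separately from plain background removal.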
### Control Models
- Shape (Canny, HED, Scribble, Depth)
- Pose (OpenPose, DensePose)
- Mask control
- Camera/View control
### Post-processing
- Deblurring
- Color fixing
- Natural blending
### Editing Models
- Inpainting (removal, masked editing, replacement)
- Outpainting
- Transformation / Motion
- Relighting
## Supported Tasks
- Background generation
- Controlled background generation
- Subject-consistent context-aware generation
- Object and subject placement (coming soon)
- In-context image/video generation (coming soon)
- Multi-object/subject merging & blending (coming soon)
- Video generation (coming soon)
## Target Use Cases
- Product photography
- Fashion & accessory try-on
- Virtual try-on (shoes, hats, glasses, etc.)
- People & portrait control
- Illustration, animation, and ad creatives
All of these tasks can be mixed and layered; ZenCtrl is designed to support real-world visual workflows with agentic task composition.
## News
- 2025-03-26: First release: model weights available on Hugging Face!
- Coming Soon: Source code release, Quick Start guide, Example notebooks
- Next: Controlled fine-grain version on our platform and API (Pro version)
- Future: Video generation toolkit release
## Limitations
- The models currently perform best on objects and, to a lesser extent, humans.
- Resolution support is currently capped at 1024x1024 (higher quality coming soon).
- Performance with illustrations is currently limited.
- The models have not yet been trained on large-scale or highly diverse datasets; we plan to improve quality and variation by training on larger, more diverse data, especially for illustration and stylized content.
- Video support and the full agentic task pipeline are still under development.
## To-do
- Release early pretrained model weights for defined tasks
- Release additional task-specific models and modes
- Release open source code (coming soon)
- Release Quick Start guide and example notebooks
- Launch API access via our app and Baseten for easier deployment
- Release high-resolution models (1500x1500+)
- Enable full toolkit integration with agent API
- Add video generation module
## Join the Community
- Discord: share ideas and feedback
- Landing Page
- Try it now on Hugging Face Space (release on 2025/03/28 PST)
## Community Collaboration
We hope to collaborate closely with the open-source community to make ZenCtrl a powerful and extensible toolkit for visual content creation.
Once the source code is released, we welcome contributions in training, expanding supported use cases, and developing new task-specific modules.
Our vision is to make ZenCtrl the standard framework for agentic, high-quality image and video generation, built together, for everyone.