---
license: apache-2.0
base_model:
  - Yuanshi/OminiControl
---
ZenCtrl Banner

# ZenCtrl

An all-in-one control framework for unified visual content creation using GenAI.
Generate multi-view, diverse-scene, and task-specific high-resolution images from a single subject image, without fine-tuning.


## 🧠 Overview

ZenCtrl is a comprehensive toolkit built to tackle core challenges in image generation:

- No fine-tuning needed: works from a single subject image
- Maintains control over shape, pose, camera angle, and context
- Supports high-resolution, multi-scene generation
- Modular toolkit for preprocessing, control, editing, and post-processing tasks

ZenCtrl is based on OminiControl, enhanced with finer-grained control, consistent subject preservation, and improved, ready-to-use models. Our goal is to build an agentic visual generation system that can orchestrate image/video creation from LLM-driven recipes.
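ZenCtrl's inference API is not yet published in this card, but since the models work from a single subject image and are currently capped at 1024×1024 (see Limitations below), input preparation can be sketched as padding the subject to a square canvas and resizing. The function name and padding choice here are illustrative assumptions, not the official API:

```python
from PIL import Image

def prepare_subject(path: str, size: int = 1024) -> Image.Image:
    """Pad a subject image to a square canvas, then resize to `size`.

    Hypothetical preprocessing helper: ZenCtrl's published models are
    capped at 1024x1024, so inputs are normalized to that resolution.
    """
    img = Image.open(path).convert("RGB")
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (255, 255, 255))  # white padding
    # Center the subject on the square canvas.
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return canvas.resize((size, size), Image.LANCZOS)
```

Any aspect-preserving normalization would do; padding (rather than cropping) keeps the whole subject visible, which matters for subject-consistent generation.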


## 📦 GitHub code

https://github.com/FotographerAI/ZenCtrl/tree/main


## 🛠 Toolkit Components (coming soon)

### 🧹 Preprocessing

- Background removal
- Matting
- Reshaping
- Segmentation
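Background removal and matting both come down to applying a predicted subject mask to the image. As a minimal sketch (the alpha matte is given here; real matting/segmentation models would predict it):

```python
import numpy as np

def apply_subject_mask(image: np.ndarray, mask: np.ndarray,
                       bg_color=(0, 0, 0)) -> np.ndarray:
    """Toy background removal: keep the subject, replace the background.

    `image` is HxWx3 uint8, `mask` is a HxW soft matte in [0, 1].
    Hypothetical helper, not part of the ZenCtrl API.
    """
    alpha = mask[..., None].astype(np.float32)
    bg = np.empty_like(image, dtype=np.float32)
    bg[...] = bg_color  # broadcast the fill color over the canvas
    out = alpha * image.astype(np.float32) + (1.0 - alpha) * bg
    return out.astype(np.uint8)
```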

### 🎮 Control Models

- Shape (Canny, HED, Scribble, Depth)
- Pose (OpenPose, DensePose)
- Mask control
- Camera/View control
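Shape control conditions generation on an edge map of the subject. As a dependency-free stand-in for the Canny/HED detectors listed above, a crude edge map can be computed from finite differences (a real pipeline would use an actual detector such as OpenCV's Canny):

```python
import numpy as np

def edge_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Crude binary edge map from image gradients.

    `gray` is a HxW float array in [0, 1]. Illustrative stand-in for the
    Canny/HED conditioning images; the threshold is a hypothetical knob.
    """
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:] = np.abs(np.diff(gray, axis=1))  # horizontal differences
    gy[1:, :] = np.abs(np.diff(gray, axis=0))  # vertical differences
    mag = np.hypot(gx, gy)                     # gradient magnitude
    return (mag > threshold).astype(np.uint8)
```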

### 🎨 Post-processing

- Deblurring
- Color fixing
- Natural blending
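"Natural blending" can be approximated by feathering the hard subject mask before compositing, so the seam between generated background and subject is soft. A NumPy-only sketch (the 3×3 box filter and `feather` iteration count are illustrative assumptions, not ZenCtrl parameters):

```python
import numpy as np

def feathered_blend(fg: np.ndarray, bg: np.ndarray,
                    mask: np.ndarray, feather: int = 3) -> np.ndarray:
    """Soften a hard mask, then linearly blend foreground over background.

    `fg`/`bg` are HxWx3 float arrays, `mask` is HxW in {0, 1}.
    """
    soft = mask.astype(np.float32)
    # Repeated 3x3 box filtering blurs the mask edge (no SciPy needed).
    for _ in range(feather):
        padded = np.pad(soft, 1, mode="edge")
        soft = sum(padded[dy:dy + soft.shape[0], dx:dx + soft.shape[1]]
                   for dy in range(3) for dx in range(3)) / 9.0
    return soft[..., None] * fg + (1.0 - soft[..., None]) * bg
```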

โœ๏ธ Editing Models

  • Inpainting (removal, masked editing, replacement)
  • Outpainting
  • Transformation / Motion
  • Relighting
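Inpainting takes an image plus a hole mask and fills the masked region. Generative models hallucinate new content there; the interface can be illustrated with a toy diffusion fill that averages colors inward from the hole boundary (single channel for brevity, hypothetical helper):

```python
import numpy as np

def diffuse_inpaint(image: np.ndarray, hole_mask: np.ndarray,
                    iters: int = 50) -> np.ndarray:
    """Toy inpainting: repeatedly replace hole pixels with the mean of
    their 4 neighbors, diffusing surrounding values inward.

    `image` is HxW float, `hole_mask` is boolean HxW marking pixels to fill.
    """
    out = image.copy()
    out[hole_mask] = 0.0  # discard original content inside the hole
    for _ in range(iters):
        padded = np.pad(out, 1, mode="edge")
        neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                     padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        out[hole_mask] = neighbors[hole_mask]
    return out
```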

## 🎯 Supported Tasks

- Background generation
- Controlled background generation
- Subject-consistent, context-aware generation
- Object and subject placement (coming soon)
- In-context image/video generation (coming soon)
- Multi-object/subject merging & blending (coming soon)
- Video generation (coming soon)

## 📦 Target Use Cases

- Product photography
- Fashion & accessory try-on
- Virtual try-on (shoes, hats, glasses, etc.)
- People & portrait control
- Illustration, animation, and ad creatives

All of these tasks can be mixed and layered: ZenCtrl is designed to support real-world visual workflows with agentic task composition.


## 📢 News

- 2025-03-26: 🧠 First release: model weights available on Hugging Face!
- Coming soon: source code release, Quick Start guide, example notebooks
- Next: controlled fine-grained version on our platform and API (Pro version)
- Future: video generation toolkit release

## 🚧 Limitations

1. The models currently perform best on objects, and to some extent humans.
2. Resolution is currently capped at 1024×1024 (higher-quality output coming soon).
3. Performance on illustrations is currently limited.
4. The models have not yet been trained on large-scale or highly diverse datasets; we plan to improve quality and variation by training on larger, more diverse datasets, especially for illustration and stylized content.
5. Video support and the full agentic task pipeline are still under development.

## 📋 To-do

- Release early pretrained model weights for defined tasks
- Release additional task-specific models and modes
- Release open-source code (coming soon)
- Release Quick Start guide and example notebooks
- Launch API access via our app and Baseten for easier deployment
- Release high-resolution models (1500×1500+)
- Enable full toolkit integration with the agent API
- Add video generation module

๐Ÿค Join the Community


๐Ÿค Community Collaboration

We hope to collaborate closely with the open-source community to make ZenCtrl a powerful and extensible toolkit for visual content creation.
Once the source code is released, we welcome contributions in training, expanding supported use cases, and developing new task-specific modules.
Our vision is to make ZenCtrl the standard framework for agentic, high-quality image and video generation โ€” built together, for everyone.