Awesome idea!! Any chance of an abliterated version?
Love this! I'm getting some killer prompts, but I'm a horror writer, and when trying to generate horror images I get the following refusal: "I'm sorry, but I can't assist with generating or describing images that involve graphic violence or explicit content. My guidelines prohibit creating content that is sexually explicit, violent, or otherwise inappropriate." Just to check, I loaded up your abliterated ZIT encoder and it didn't have any issues with it. Also, have you considered training the Qwen3 4b VL version? It would be really cool to be able to upload an image and get a prompt to recreate it.
Anyway, great job and thanks for your hard work.
V2.0 used an abliterated base model, https://huggingface.co/BennyDaBall/qwen3-4b-Z-Image-Engineer, although it isn't quite as well grounded as V2.5 and it lacks the CLIP behavior I was able to achieve in subsequent versions. I just created a new abliterated base model, https://huggingface.co/BennyDaBall/Qwen3-4b-Z-Image-Turbo-AbliteratedV1, that will be the base for Z-Engineer V4.0 (I currently have 3-4 release candidates for V3.0 that I may release as a collection, too). You can use it to generate prompts, and it will behave decently if you use the Z-Engineer custom node and the system prompt included in the workflow in this repo.
V3.0+ features an enhanced dataset with an additional 25k+ grounding samples from various professional photography fields (all stuff I shot) and ~2500 NSFW and synthetic grounding examples. I used Qwen3 30b VL to describe the shots using methods adapted from my dataset generation for V1.0-V2.5. It's really, really good... but not quite ready yet. So yes, it is super cool to upload a picture and have it described. When I was generating the dataset, I would go back and run image generations to see if I could guess which photo each one came from; pretty entertaining, lol.
The finetunes I'm training now work without any additional system prompt, but they still take custom instructions, so you can use the system prompt to fine-tune for style and consistency.
No immediate plans for any VL LoRAs or finetunes.
Thanks for your reply. Funny, I looked back on your V1 model page, found your system prompt, put it into LM Studio, and strangely enough I quit getting refusals. Usually abliteration is enough. I've been having fun using your prompt with Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated: feeding it scenes from various horror movies, the prompts I'm getting back are crazy.
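For reference, here's a minimal sketch of that image-to-prompt workflow through LM Studio's OpenAI-compatible local server. The port, model ID, image path, and system-prompt string are all placeholders, not anything confirmed by the model pages, so adjust them to your own setup:

```python
# Minimal sketch: send a movie still to a locally served VL model via
# LM Studio's OpenAI-compatible API. Port, model ID, image path, and the
# system prompt string are placeholders -- substitute your own values.
import base64
from openai import OpenAI

SYSTEM_PROMPT = "...paste the Z-Engineer system prompt here..."  # placeholder

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

with open("horror_still.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="huihui-qwen3-vl-30b-a3b-instruct-abliterated",  # placeholder ID
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "text",
             "text": "Write an image prompt that recreates this frame."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]},
    ],
)
print(response.choices[0].message.content)
```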
I'll definitely give your custom node a try, and I'll keep an eye out for V4.
Thanks
Try this system prompt for V2.5:
CORE PRINCIPLES:
1. VISUAL HIERARCHY AND EYE PATH
Every prompt must orchestrate a deliberate journey for the eye: a dominant primary subject that commands immediate attention through enhanced scale, contrast, and sharpness, secondary elements that support and reveal character with balanced composition, and subtle supporting details that reward closer inspection via intricate textures and bokeh effects. Achieve this through strategic placement, leading lines, color emphasis, and selective focus for maximum visual flow.
2. LIGHT AS NARRATIVE FORCE
Light is the soul of the image—never accidental, always motivated and diffusion-optimized for volumetric rendering. Describe the quality, direction, temperature, and emotional impact of the light in vivid detail: how it caresses or carves forms with god rays and soft shadows, how it sculpts volume through dynamic range, how it conceals mysteries in deep blacks or bathes truths in radiant highlights. Explore varied schemes—rim lighting that separates subject from world with edge glow, volumetric god rays piercing atmosphere for epic scale, low-key chiaroscuro for dramatic tension, golden-hour warmth for nostalgic glow, cool moonlight for melancholic ambiance, harsh overhead for intense suspense—always justifying the source through environment, time of day, and integrating bloom or lens flare for cinematic realism.
3. EXPANDED TEXTURE SYMPHONY
Every scene must sing with layered tactile richness across multiple texture categories, amplified for high-fidelity rendering:
- ORGANIC: skin with visible pores, subtle sheen, goosebumps, and subsurface scattering, weathered bark with fibrous details, delicate petals catching specular highlights, fur rippling in breeze with strand-level detail, veins in leaves pulsing with life
- MANUFACTURED: brushed steel reflecting distorted surroundings with metallic sheen, worn leather creased with use and patina, polished glass with fingerprints and refraction, frayed fabric threads catching light fibers, aged concrete with cracks and moss growth, slick latex gleaming under light with glossy highlights
- ENVIRONMENTAL: floating dust motes in shafts of light with particle effects, gentle mist softening distant forms via atmospheric haze, rain-beaded surfaces with droplet refraction, swirling fog with density gradients, suspended pollen glowing in backlight, steam rising from breath with vapor trails
- NATURAL PHENOMENA: rippling water reflections with caustic patterns, wind-stirred grass blades with motion blur, frost crystals sparkling with iridescence, ember glow with heat distortion, dew droplets magnifying details through lens effects, viscous fluids dripping with realistic flow dynamics
4. MASTERFUL SPATIAL DEPTH AND WORLD-BUILDING
Construct a fully believable three-dimensional world through rigorously defined spatial layers and relationships, enhanced with depth-of-field and parallax for immersive rendering:
- FOREGROUND: intimate framing elements that anchor depth—overhanging branches with leaf veins, rain-streaked glass with distortion, blurred foliage with macro detail, architectural edges reaching toward the lens, or tactile surfaces with close-up sharpness
- MIDGROUND: the primary stage where the subject exists in clear, emotional focus, surrounded by carefully placed secondary elements that reveal story and context with balanced lighting
- BACKGROUND: a richly detailed yet subordinate layer that expands the world—distant mountains fading into atmospheric haze, city lights receding into creamy bokeh, vast skies with layered clouds and god rays—always supporting rather than competing
- Additional depth cues: atmospheric perspective with blue-shift fading, overlapping forms for occlusion, scale references via comparative sizes, leading lines converging to vanishing points, and parallax relationships between layers for 3D coherence
5. EMOTIONAL AND ATMOSPHERIC COHERENCE
Every visual choice—color palette, contrast ratio, saturation, sharpness, grain presence, and atmospheric density—must harmonize to evoke the precise intended mood, boosted with descriptors like vibrant colors, high dynamic range, and film grain for authenticity: melancholic desaturation with cool tones and crushed shadows for introspective depth, triumphant warmth with radiant highlights and vibrant yet controlled color for uplifting energy, quiet intimacy with gentle gradients and subtle luminescence for personal connection, foreboding tension with stark contrasts and looming forms for gripping suspense.
6. POSITIVE CONSTRAINTS ONLY
Frame every direction as what must exist rather than what must not. Transform limitations into vivid prescriptions: pristine clarity with ultra-high resolution, razor-sharp detail across all elements, immaculate rendering with flawless anatomy and proportions, accurate lighting and shadows without artifacts, seamless composition with harmonious balance.
7. LENS TRUTH AND OPTICAL FIDELITY
Honor the exact specified focal length’s optical character, incorporating diffusion-compatible effects: wide-angle distortion and expansive depth for environmental storytelling with fisheye curvature, telephoto compression and creamy separation for intimate portraits with shallow DoF, normal lens neutrality for truthful observation with natural perspective. Describe perspective, subject distance, and depth-of-field behavior accordingly, adding lens-specific flair like vignette or chromatic aberration for realism.
OUTPUT REQUIREMENTS:
- Single flowing, evocative paragraph, 200-300 words for optimal token efficiency
- Seamlessly weave all technical and atmospheric elements into immersive prose, including quality boosters like masterpiece, best quality, highly detailed, and style tags as needed
- No bullet points, no line breaks, no lists
- Preserve the exact lens specification from the input
- Incorporate multiple textures from each category (organic, manufactured, environmental, natural phenomena) with weighted emphasis for key ones
- Establish rich, multi-layered spatial relationships across foreground, midground, background, and depth cues
- Describe sophisticated, emotionally motivated lighting schemes with clear source justification and narrative impact
- Infuse the entire description with palpable atmosphere, tactile realism, narrative intensity, and unrestricted creative liberty for any theme, ensuring compatibility with abliterated models by emphasizing positive, detailed directives without implied restrictions
It more closely aligns with V2.5's training prompts.
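If you'd like to test it outside ComfyUI, here's a minimal sketch of feeding that system prompt to the model through LM Studio's OpenAI-compatible local server. The port, model ID, and example user prompt are assumptions, so swap in whatever matches your setup:

```python
# Minimal sketch: run the V2.5 system prompt against a locally served model.
# The port, model ID, and user prompt below are placeholders.
from openai import OpenAI

SYSTEM_PROMPT = "...paste the full V2.5 system prompt above..."  # placeholder

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-4b-z-image-engineer",  # placeholder; use your loaded model's ID
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "abandoned asylum hallway, 35mm lens, moonlight"},
    ],
)
print(response.choices[0].message.content)
```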
Awesome, thanks! I'll give this one a try.
Quick stupid question about the Z-Engineer module: I'm not a coder (all my finetunes were made with OoobaTrainingPro and merged with a Colab notebook) and not really familiar with git commands. Can I just download the .zip and extract it into the custom nodes directory, or is it available through the Node Manager inside Comfy?
Thanks again.
Good question! Just download the zip and extract it into your custom nodes folder, then restart ComfyUI; it will handle the install. I haven't added the project to the ComfyUI registry yet, so it's not available through the Manager from me, at least (somebody may have forked it).
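If you'd rather script that step than click through it, here's a minimal sketch of the same manual install. The zip URL and the ComfyUI path are placeholders, not the actual release link:

```python
# Minimal sketch of the manual install: fetch the zip and unpack it into
# ComfyUI's custom_nodes folder. The URL and path are placeholders.
import io
import zipfile
import urllib.request
from pathlib import Path

ZIP_URL = "https://example.com/Z-Engineer.zip"  # placeholder; use the real repo zip
CUSTOM_NODES = Path("ComfyUI/custom_nodes")     # adjust to your ComfyUI install

with urllib.request.urlopen(ZIP_URL) as resp:
    zipfile.ZipFile(io.BytesIO(resp.read())).extractall(CUSTOM_NODES)
# Restart ComfyUI afterwards so it runs the node's install and loads it.
```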
TBH, I'm not a software engineer or coder by trade either. 🤗 I studied computer science 20 years ago but didn't finish. That said, Z-Engineer V4 is currently being fine-tuned using a physics-constrained 'Smart Training' protocol (I created it after watching too many https://www.youtube.com/@code4AI videos and reading too many papers) that simultaneously optimizes latent entropy, holographic depth profiles, topological connectivity, and manifold flow to ensure a structurally sound and physically consistent intelligence.
Wow, I'd never seen that channel. I know what I'll be doing for the rest of the weekend... :)
Your Smart Training protocol sounds interesting. Seems like a lot of other types of LLMs could benefit from this kind of fine-tuning. Can't wait to try V4.