BonusLockSMith committed on
Commit
f4f2c6d
·
verified ·
1 Parent(s): f42ba03

Upload 10 files

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ examples/example_01.png filter=lfs diff=lfs merge=lfs -text
37
+ examples/example_02.png filter=lfs diff=lfs merge=lfs -text
38
+ examples/example_03.png filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 GritAI Solutions LLC
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md CHANGED
@@ -1,3 +1,142 @@
1
- ---
2
- license: mit
3
- ---
1
+ # VisionHarvester v1 — Identity-Safe Image Style & Pose Extractor
2
+ **By GritAI Solutions LLC**
3
+
4
+ VisionHarvester v1 is a lightweight prompt-based extraction tool for creators working with Stable Diffusion, ComfyUI, SDXL, LoRA training, and Qwen-VL workflows.
5
+
6
+ It converts a reference image into clean, modular, identity-safe text blocks you can reuse for character building, style replication, dataset generation, and scene reconstruction — without copying real people.
7
+
8
+ ---
9
+
10
+ ## 🚀 What VisionHarvester v1 Does
11
+
12
+ VisionHarvester is built around five reusable components:
13
+
14
+ ### 1. Base Identity (Safe & Generic)
15
+ A neutral description of the subject that includes:
16
+ - General body type
17
+ - Hair color and basic hairstyle
18
+ - Clothing and fabric behavior
19
+ - Broad, non-identifying facial description
20
+
21
+ No pose, no emotion, no personality.
22
+
23
+ ### 2. Pose (Geometry Only)
24
+ Short, comma-separated fragments describing:
25
+ - Limb positions
26
+ - Body orientation
27
+ - Weight distribution
28
+ - Head/hip angles
29
+
30
+ No outfit, no style, no emotion.
31
+
32
+ ### 3. Outfit & Materials
33
+ Details about:
34
+ - Clothing type and cut
35
+ - Colors
36
+ - Fabric texture and behavior (matte, glossy, stretchy)
37
+ - Accessories
38
+
39
+ ### 4. Camera & Lighting
40
+ Information about:
41
+ - Framing (close-up, half body, full body)
42
+ - Camera angle / lens feel
43
+ - Lighting direction and softness
44
+ - Major shadows and highlights
45
+
46
+ ### 5. Style Tags
47
+ Reusable tags such as:
48
+ - studio fitness look
49
+ - clean background
50
+ - soft cinematic lighting
51
+ - high-resolution texture
52
+
53
+ These drop straight into Stable Diffusion prompts.
54
+
55
+ ---
56
+
57
+ ## 📂 Included in This Repository
58
+
59
+ - README.md — this documentation
60
+ - LICENSE — MIT license
61
+ - visionharvester_v1_extractor.prompt.txt — main extraction prompt
62
+ - VisionHarvester-PoseStyleExtractor.json — ComfyUI workflow (optional)
63
+ - examples/ — sample images and their extracted outputs
64
+
65
+ Example files:
66
+ - examples/example_01.png
67
+ - examples/example_01_output.txt
68
+ - examples/example_02.png
69
+ - examples/example_02_output.txt
70
+ - examples/example_03.png
71
+ - examples/example_03_output.txt
72
+
73
+ ---
74
+
75
+ ## 🟩 Main Extraction Prompt (v1)
76
+
77
+ This is the core VisionHarvester v1 prompt shipped in visionharvester_v1_extractor.prompt.txt:
78
+
79
+ ```
80
+ Extract a clean, neutral description of the woman in the image.
81
+
82
+ Keep it simple:
83
+ • No pose or body positioning
84
+ • No emotions or personality
85
+ • No unique facial identifiers
86
+ • No NSFW content
87
+ • Do describe hair, body type (general), clothing, colors, fabrics, and broad facial features
88
+
89
+ Output 2–4 sentences that would work as a Stable Diffusion base identity block.
90
+ ```
91
+
92
+ Use this in:
93
+ - Qwen-VL custom prompt
94
+ - Any Vision-LLM
95
+ - ComfyUI Qwen nodes
96
+ - Image-to-text or SD prompt pipelines
97
+
98
+ ---
99
+
100
+ ## 🖼 Example Outputs
101
+
102
+ Example of the kind of identity-safe text this prompt produces:
103
+
104
+ ```
105
+ An athletic woman with long dark hair, a medium tan complexion, and soft neutral facial features without distinctive identifiers. She is wearing a fitted black sports bra made from matte stretch fabric and high-waisted leggings. Her appearance is clean, simple, and suitable as a Stable Diffusion base identity.
106
+ ```
107
+
108
+ ---
109
+
110
+ ## 🔒 Identity & Safety
111
+
112
+ VisionHarvester v1 is designed to:
113
+ - Avoid 1:1 face cloning
114
+ - Avoid unique facial markers
115
+ - Avoid real-person or celebrity references
116
+ - Avoid explicit or NSFW content
117
+
118
+ It focuses on style, clothing, pose, and scene — not identity.
119
+
120
+ ---
121
+
122
+ ## 🧩 Use Cases
123
+
124
+ - Character consistency
125
+ - Pose and outfit reuse
126
+ - LoRA dataset prep
127
+ - Style transfer
128
+ - Scene reconstruction
129
+ - Visual prompt creation
130
+ - Multi-lane ComfyUI pipelines
131
+
132
+ ---
133
+
134
+ ## 🧱 Author
135
+
136
+ **GritAI Solutions LLC**
137
+ Robert "BonusLockSmith" Lucyk
138
+ Lawton, Oklahoma
139
+
140
+ ---
141
+
142
+ MIT licensed. Free for personal and commercial use.
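The README describes five reusable components that "drop straight into Stable Diffusion prompts". A minimal sketch of how those lanes might be joined into one prompt string is shown here. The `PromptBlocks` class, its field names, and the comma-joined output format are illustrative assumptions and are not part of this repository.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBlocks:
    """The five VisionHarvester lanes (hypothetical container, not shipped code)."""

    base_identity: str = ""
    pose: str = ""
    outfit: str = ""
    camera_lighting: str = ""
    style_tags: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Join the non-empty lanes into one comma-separated prompt."""
        parts = [
            self.base_identity,
            self.pose,
            self.outfit,
            self.camera_lighting,
            ", ".join(self.style_tags),
        ]
        # Skip lanes that were left empty, e.g. a "pose only" request.
        return ", ".join(p.strip() for p in parts if p.strip())


blocks = PromptBlocks(
    base_identity="an athletic woman with long dark hair, soft neutral facial features",
    pose="torso facing forward, arms relaxed by sides",
    style_tags=["studio fitness look", "clean background"],
)
print(blocks.to_prompt())
```

Keeping each lane in its own field makes it possible to swap one lane (for example the pose) without touching the others, which matches the modular intent the README describes.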
VisionHarvester-PoseStyleExtractor.json ADDED
@@ -0,0 +1 @@
1
+ {"id":"6e0f9792-1d56-4fd7-ad14-d1269803c658","revision":0,"last_node_id":21,"last_link_id":25,"nodes":[{"id":4,"type":"SaveImage","pos":[-1296.7420260001102,-3007.666190531094],"size":[270,270],"flags":{},"order":7,"mode":0,"inputs":[{"localized_name":"images","name":"images","type":"IMAGE","link":3},{"localized_name":"filename_prefix","name":"filename_prefix","type":"STRING","widget":{"name":"filename_prefix"},"link":null}],"outputs":[],"properties":{"cnr_id":"comfy-core","ver":"0.3.71","Node name for S&R":"SaveImage","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":["VisionHarvester/images/ComfyUI_"]},{"id":2,"type":"ImageResize+","pos":[-2589.521058680804,-2739.633891185613],"size":[270,218],"flags":{},"order":4,"mode":0,"inputs":[{"localized_name":"image","name":"image","type":"IMAGE","link":1},{"localized_name":"width","name":"width","type":"INT","widget":{"name":"width"},"link":null},{"localized_name":"height","name":"height","type":"INT","widget":{"name":"height"},"link":null},{"localized_name":"interpolation","name":"interpolation","type":"COMBO","widget":{"name":"interpolation"},"link":null},{"localized_name":"method","name":"method","type":"COMBO","widget":{"name":"method"},"link":null},{"localized_name":"condition","name":"condition","type":"COMBO","widget":{"name":"condition"},"link":null},{"localized_name":"multiple_of","name":"multiple_of","type":"INT","widget":{"name":"multiple_of"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[2,3]},{"localized_name":"width","name":"width","type":"INT","links":null},{"localized_name":"height","name":"height","type":"INT","links":null}],"properties":{"cnr_id":"comfyui_essentials","ver":"1.1.0","Node name for S&R":"ImageResize+","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":[832,1216,"lanczos","keep 
proportion","always",0]},{"id":1,"type":"LoadImage","pos":[-2983.0397680299106,-2730.082471465146],"size":[274.080078125,314],"flags":{},"order":0,"mode":0,"inputs":[{"localized_name":"image","name":"image","type":"COMBO","widget":{"name":"image"},"link":null},{"localized_name":"choose file to upload","name":"upload","type":"IMAGEUPLOAD","widget":{"name":"upload"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[1]},{"localized_name":"MASK","name":"MASK","type":"MASK","links":null}],"properties":{"cnr_id":"comfy-core","ver":"0.3.71","Node name for S&R":"LoadImage","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":["v1_base_Paladin_01_00002_.png","image"]},{"id":15,"type":"VAEDecode","pos":[-1816.4155165225534,-1484.74221751387],"size":[210,46],"flags":{},"order":11,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":20},{"localized_name":"vae","name":"vae","type":"VAE","link":21}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","slot_index":0,"links":[22]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"VAEDecode","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[]},{"id":19,"type":"CheckpointLoaderSimple","pos":[-2731.415516522554,-1984.7422175138709],"size":[315,98],"flags":{},"order":1,"mode":0,"inputs":[{"localized_name":"ckpt_name","name":"ckpt_name","type":"COMBO","widget":{"name":"ckpt_name"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","slot_index":0,"links":[16]},{"localized_name":"CLIP","name":"CLIP","type":"CLIP","slot_index":1,"links":[23,24]},{"localized_name":"VAE","name":"VAE","type":"VAE","slot_index":2,"links":[21]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for 
S&R":"CheckpointLoaderSimple","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}},"models":[{"name":"v1-5-pruned-emaonly-fp16.safetensors","url":"https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/resolve/main/v1-5-pruned-emaonly-fp16.safetensors?download=true","directory":"checkpoints"}]},"widgets_values":["cyberrealistic_v90.safetensors"]},{"id":14,"type":"KSampler","pos":[-1921.4155165225534,-2014.7422175138709],"size":[315,474],"flags":{},"order":10,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":16},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":17},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":18},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":19},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":null},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":null},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":null},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":null},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[20]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"KSampler","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[400782778445824,"randomize",20,8,"euler","normal",1]},{"id":5,"type":"Save Text 
File","pos":[-1338.6015503133458,-2578.2798589391555],"size":[306.1015625,202],"flags":{},"order":8,"mode":0,"inputs":[{"localized_name":"text","name":"text","type":"STRING","link":4},{"localized_name":"path","name":"path","type":"STRING","widget":{"name":"path"},"link":null},{"localized_name":"filename_prefix","name":"filename_prefix","type":"STRING","widget":{"name":"filename_prefix"},"link":null},{"localized_name":"filename_delimiter","name":"filename_delimiter","type":"STRING","widget":{"name":"filename_delimiter"},"link":null},{"localized_name":"filename_number_padding","name":"filename_number_padding","type":"INT","widget":{"name":"filename_number_padding"},"link":null},{"localized_name":"file_extension","name":"file_extension","shape":7,"type":"STRING","widget":{"name":"file_extension"},"link":null},{"localized_name":"encoding","name":"encoding","shape":7,"type":"STRING","widget":{"name":"encoding"},"link":null},{"localized_name":"filename_suffix","name":"filename_suffix","shape":7,"type":"STRING","widget":{"name":"filename_suffix"},"link":null}],"outputs":[],"properties":{"cnr_id":"was-ns","ver":"3.0.1","Node name for S&R":"Save Text File","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":["VisionHarvester/descriptions/ComfyUI_","","_",4,".txt","utf-8",""]},{"id":16,"type":"SaveImage","pos":[-1571.4155165225534,-2014.7422175138709],"size":[470,560],"flags":{},"order":12,"mode":0,"inputs":[{"localized_name":"images","name":"images","type":"IMAGE","link":22},{"localized_name":"filename_prefix","name":"filename_prefix","type":"STRING","widget":{"name":"filename_prefix"},"link":null}],"outputs":[],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for 
S&R":"SaveImage","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":["VisionHarvester/rerun/ComfyUI_"]},{"id":18,"type":"EmptyLatentImage","pos":[-2271.4155165225534,-1494.74221751387],"size":[315,106],"flags":{},"order":2,"mode":0,"inputs":[{"localized_name":"width","name":"width","type":"INT","widget":{"name":"width"},"link":null},{"localized_name":"height","name":"height","type":"INT","widget":{"name":"height"},"link":null},{"localized_name":"batch_size","name":"batch_size","type":"INT","widget":{"name":"batch_size"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[19]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"EmptyLatentImage","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[832,1216,1]},{"id":17,"type":"CLIPTextEncode","pos":[-2381.4155165225534,-1774.7422175138695],"size":[425.27801513671875,180.6060791015625],"flags":{},"order":5,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":23},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":null}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","slot_index":0,"links":[18]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"CLIPTextEncode","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":["bad anatomy, deformed face, asymmetry, wrong skin tone, blurry, cartoon, CGI, extra 
limbs"],"color":"#223","bgcolor":"#335"},{"id":20,"type":"CLIPTextEncode","pos":[-2387.706260435692,-1984.0581775706505],"size":[422.84503173828125,164.31304931640625],"flags":{},"order":9,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":24},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":25}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","slot_index":0,"links":[17]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"CLIPTextEncode","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[""],"color":"#232","bgcolor":"#353"},{"id":21,"type":"Note","pos":[-3475.4594977571664,-2650.763830733268],"size":[417.7683774744314,547.9715107219222],"flags":{},"order":3,"mode":0,"inputs":[],"outputs":[],"title":"ootblock","properties":{},"widgets_values":["🟩 VisionHarvester Hybrid vNext — Chatbot Bootblock\n\nRole:\nYou are VisionHarvester, a pose, style, and prompt extraction assistant built on top of a hybrid Stable Diffusion workflow (SD1.5 → SDXL) with Qwen-VL for image understanding.\n\nYour job is to:\n\nExplain what this workflow does in clear, non-technical language to normal users.\n\nHelp users pull out styles, outfits, poses, camera angles, and lighting from reference images.\n\nReturn Stable Diffusion–ready prompt blocks that plug directly into a multi-lane ComfyUI workflow.\n\nKeep faces generic and non-identifying. Users can get everything except a detailed face description.\n\n1. 
High-Level Explanation (User-Facing)\n\nWhen a new user arrives, briefly explain yourself like this (paraphrase is okay, but keep the structure):\n\nI’m VisionHarvester.\nYou can give me reference images, and I’ll break them down into clear building blocks for image generation:\n\nPose (body position and angles)\n\nOutfit & materials (clothing, fabric types, accessories)\n\nCamera & lighting (framing, lens feel, light direction)\n\nStyle tags (e.g., studio fitness look, soft cinematic lighting)\n\nI’ll turn those into Stable Diffusion–ready prompt fragments that you or your workflow can reuse.\nI deliberately keep facial details vague to protect identity — you get the full style and body language without a 1:1 face copy.\n\nDo not mention internal implementation details like ControlNet, SD1.5 vs SDXL, or Qwen-VL unless the user explicitly asks.\n\n2. Core Capabilities (What You Do)\n\nYou support these main tasks:\n\nPose Extraction\n\nOutput short, comma-separated fragments describing only body geometry.\n\nNo mood, no style, no clothing, no face details.\n\nOutfit & Material Extraction\n\nDescribe clothing type, cut, length, fabric behavior (matte, glossy, stretchy), and colors.\n\nAvoid brand logos or real-world branding unless the user explicitly wants them.\n\nCamera & Lighting Extraction\n\nDescribe framing (close-up, half body, full body), camera angle, and approximate focal length feel.\n\nDescribe lighting direction, softness, and major shadows/highlights.\n\nStyle & Atmosphere Tags\n\nDescribe overall look: studio fitness, lifestyle, high-key, low-key, etc.\n\nThese should be usable as style tags in a prompt.\n\nPrompt Block Assembly\n\nOrganize outputs into separate lanes:\n\nBase Identity (public-safe, no pose)\n\nPose\n\nOutfit\n\nCamera & Lighting\n\nStyle/Detail\n\nOptionally suggest negative prompt fragments if user asks (e.g., “no extra limbs, no distorted hands”).\n\n3. 
Safety & Face Handling Rules\n\nYou must always follow these constraints:\n\nNo detailed face cloning.\n\nDescribe faces only in broad, non-identifying terms:\n\nAllowed: “soft neutral facial features”, “generic feminine face”, “short beard”, “light makeup”\n\nNot allowed: specific scars, moles, freckles patterns, celebrity likeness, or any unique marks.\n\nBase Identity is public-safe and generic.\n\nThe default body description should be something like:\n\n“an athletic woman with long chestnut hair, wearing a black halter cropped sports top made from smooth stretch fabric and fitted dark yoga shorts with clean seams and a matte finish, soft neutral facial features without distinctive marks”\n\nDo not include pose language (no “standing”, “arms lifted”, etc.) in this Base Identity block.\n\nNo emotions, no personality terms in Base Identity.\n\nNo explicit content.\n\nIf a user pushes for NSFW, refuse and gently redirect to safe, clothed, fitness or casual styles.\n\n4. Internal Prompt Structure (How You Format Outputs)\n\nWhenever you return SD-ready content, use this structure by default:\n\n[BASE IDENTITY — PUBLIC-SAFE]\nan athletic woman with long chestnut hair, wearing a black halter cropped sports top made from smooth stretch fabric and fitted dark yoga shorts with clean seams and a matte finish, soft neutral facial features without distinctive marks\n\n[POSE]\ntorso facing slight left, shoulders relaxed, left hip shifted, right leg straight, left leg slightly bent, arms hanging naturally at sides, head level, weight mostly on right leg\n\n[OUTFIT & MATERIALS]\nblack halter cropped sports top, smooth matte athletic fabric, fitted dark yoga shorts with clean seams, no logos, subtle fabric folds at waist and hips\n\n[CAMERA & LIGHTING]\nfull body framing, eye-level camera, approximately 50mm lens feel, soft even gym-style lighting, faint reflections on floor, slightly blurred background\n\n[STYLE & DETAIL TAGS]\nstudio fitness look, clean background, subtle 
skin sheen, realistic fabric folds, high-resolution, minimal clutter\n\n[OPTIONAL NEGATIVE PROMPT]\nextra limbs, deformed hands, distorted anatomy, text, watermarks, logos\n\n\nRules:\n\nNo pose terms in [BASE IDENTITY].\n\n[POSE] is always short, comma-separated geometry fragments.\n\n[OUTFIT & MATERIALS] focuses on clothing & fabric behavior, not anatomy.\n\n[CAMERA & LIGHTING] is practical: framing + lens feel + light description.\n\n[STYLE & DETAIL TAGS] are reusable tags, not full sentences.\n\n5. How Users “Pull Styles and Such” Out of Images\n\nYou must help users in plain language. Examples of supported requests:\n\n“Extract just the pose from this image.”\n\n“Give me the outfit and material description only.”\n\n“I want the lighting and camera angle from this shot, but nothing else.”\n\n“Turn this into SD prompt blocks I can reuse, with a generic safe face.”\n\n“Give me multiple style tags that describe this image’s vibe.”\n\n“Combine this pose with a different gym style — outline the blocks.”\n\nFor each, respond using the block structure above and only fill in relevant sections.\nIf they ask for “pose only,” you might reply:\n\n[POSE]\ntorso facing forward, shoulders slightly back, both legs straight, feet shoulder-width apart, arms relaxed by sides, head looking straight ahead\n\n6. Architectural Constraints (Don’t Break These)\n\nInternally you are aware of the following design rules and always respect them, even if the user isn’t talking in those terms:\n\nNo pose language in Base Identity.\nPose is handled entirely in [POSE] + ControlNet in the underlying workflow.\n\nNo abstract words in Base Identity.\nNo “confident”, “cinematic”, “hyper-realistic”, etc. 
Base Identity = literal anatomy + clothing.\n\nPose text must be narrow and geometric.\nShort fragments, no style or emotion.\n\nIdentity & outfit live in Base/Outfit lanes only.\nPose, camera, style, and detail must not change identity.\n\nFace is intentionally generic.\nYou never output detailed, unique facial identifiers.\n\n7. Example User Prompts & Expected Behavior\n\nExample 1 — Full Breakdown\n\nUser:\n\nHere’s an image. Break it into reusable prompt blocks, but keep the face generic and don’t include any real-world brand logos.\n\nAssistant:\n\nReturn all blocks: [BASE IDENTITY], [POSE], [OUTFIT & MATERIALS], [CAMERA & LIGHTING], [STYLE & DETAIL TAGS], optionally [NEGATIVE PROMPT].\n\nMake facial description generic.\n\nReplace specific brand names with generic terms like “running shoes”, “sports leggings”.\n\nExample 2 — Style Extraction Only\n\nUser:\n\nI like the style of this photo. Just give me the camera, lighting, and style tags.\n\nAssistant:\n\nFill only [CAMERA & LIGHTING] and [STYLE & DETAIL TAGS].\n\nLeave identity and pose out unless explicitly requested.\n\nExample 3 — Pose Reuse\n\nUser:\n\nGive me the pose from this image so I can reuse it in my own character workflow.\n\nAssistant:\n\nReturn only [POSE] with clean, geometric fragments.\n\nNo style, no emotion, no clothing details.\n\n8. Interaction Rules\n\nUse clear, direct language. 
The user shouldn’t need to know ComfyUI or SD internals.\n\nIf they want more control, suggest:\n\n“I can also split this into pose, outfit, lighting, and style tags if you’d like finer control.”\n\nIf they submit text-only (no image), you can still:\n\nHelp them draft blocks in the same [BASE IDENTITY] / [POSE] / [CAMERA & LIGHTING] / [STYLE & DETAIL TAGS] structure based on their description.\n\nIf they ask for something you can’t safely provide (exact face clone, explicit content), refuse and offer a safe alternative.\n\nEnd of Bootblock"],"color":"#432","bgcolor":"#653"},{"id":3,"type":"AILab_QwenVL","pos":[-2225.4773794667894,-2958.681206862904],"size":[510.3534331830017,445.77510957882987],"flags":{},"order":6,"mode":0,"inputs":[{"localized_name":"image","name":"image","shape":7,"type":"IMAGE","link":2},{"localized_name":"video","name":"video","shape":7,"type":"IMAGE","link":null},{"localized_name":"model_name","name":"model_name","type":"COMBO","widget":{"name":"model_name"},"link":null},{"localized_name":"quantization","name":"quantization","type":"COMBO","widget":{"name":"quantization"},"link":null},{"localized_name":"attention_mode","name":"attention_mode","type":"COMBO","widget":{"name":"attention_mode"},"link":null},{"localized_name":"preset_prompt","name":"preset_prompt","type":"COMBO","widget":{"name":"preset_prompt"},"link":null},{"localized_name":"custom_prompt","name":"custom_prompt","type":"STRING","widget":{"name":"custom_prompt"},"link":null},{"localized_name":"max_tokens","name":"max_tokens","type":"INT","widget":{"name":"max_tokens"},"link":null},{"localized_name":"keep_model_loaded","name":"keep_model_loaded","type":"BOOLEAN","widget":{"name":"keep_model_loaded"},"link":null},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":null}],"outputs":[{"localized_name":"RESPONSE","name":"RESPONSE","type":"STRING","links":[4,25]}],"properties":{"cnr_id":"ComfyUI-QwenVL","ver":"1f6af2528168650fdf2ee544572549d59dc2824a","
Node name for S&R":"AILab_QwenVL","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"},"aux_id":"1038lab/ComfyUI-QwenVL"},"widgets_values":["Qwen3-VL-2B-Instruct","None (FP16)","auto","🖼️ Tags","You are VisionHarvester.\n\nExtract ONLY a clean, neutral, identity-safe description of the woman in the image.\n\nFollow these strict rules:\n\n1. DO NOT describe pose, stance, limb position, or body angles.\n2. DO NOT describe emotion, attitude, personality, or expression.\n3. DO NOT describe unique facial identifiers (no freckles, moles, scars, specific face shape).\n4. DO NOT mention real people, celebrities, or names.\n5. DO NOT include NSFW content or anything suggestive.\n\n6. DO describe:\n - General body type (athletic, slim, curvy, average, etc.)\n - Hair color, length, and overall style (generic only)\n - Clothing type and materials (fabric texture, cut, color)\n - Accessories (if any)\n - Very broad face description (“soft neutral facial features,” “generic feminine face,” etc.)\n - Skin tone in broad terms (“light,” “medium,” “tan,” “deep”)\n\n7. Keep the output in ONE paragraph, 2–4 sentences max.\n\n8. Output ONLY the description. 
No commentary, no labels.\n\nThe final output must be suitable for a Stable Diffusion [BASE IDENTITY — PUBLIC-SAFE] block.\n",512,true,5,"fixed"],"color":"#28403f","bgcolor":"#374539"}],"links":[[1,1,0,2,0,"IMAGE"],[2,2,0,3,0,"IMAGE"],[3,2,0,4,0,"IMAGE"],[4,3,0,5,0,"STRING"],[16,19,0,14,0,"MODEL"],[17,20,0,14,1,"CONDITIONING"],[18,17,0,14,2,"CONDITIONING"],[19,18,0,14,3,"LATENT"],[20,14,0,15,0,"LATENT"],[21,19,2,15,1,"VAE"],[22,15,0,16,0,"IMAGE"],[23,19,1,17,0,"CLIP"],[24,19,1,20,0,"CLIP"],[25,3,0,20,1,"STRING"]],"groups":[],"config":{},"extra":{"ue_links":[],"ds":{"scale":1.3109994191500136,"offset":[3216.263730597563,2932.0583917462063]},"links_added_by_ue":[],"frontendVersion":"1.28.8","VHS_latentpreview":false,"VHS_latentpreviewrate":0,"VHS_MetadataImage":true,"VHS_KeepIntermediate":true},"version":0.4}
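The workflow's `links` array is easier to read once each entry is resolved to node types. The sketch below assumes each link is laid out as `[link_id, from_node, from_slot, to_node, to_slot, data_type]`, which is consistent with the `link` ids on the node inputs above. The node table and link subset are copied from the workflow. `describe_links` is a hypothetical helper written for this illustration.

```python
# Node id -> type, copied from the workflow JSON above (subset).
NODE_TYPES = {
    1: "LoadImage",
    2: "ImageResize+",
    3: "AILab_QwenVL",
    5: "Save Text File",
    14: "KSampler",
    20: "CLIPTextEncode",
}

# Subset of the "links" array. Assumed field order:
# [link_id, from_node, from_slot, to_node, to_slot, data_type]
LINKS = [
    [1, 1, 0, 2, 0, "IMAGE"],
    [2, 2, 0, 3, 0, "IMAGE"],
    [4, 3, 0, 5, 0, "STRING"],
    [25, 3, 0, 20, 1, "STRING"],
    [17, 20, 0, 14, 1, "CONDITIONING"],
]


def describe_links(node_types, links):
    """Return one human-readable line per link."""
    lines = []
    for _link_id, src, _src_slot, dst, _dst_slot, data_type in links:
        lines.append(f"{node_types[src]} -> {node_types[dst]} ({data_type})")
    return lines


for line in describe_links(NODE_TYPES, LINKS):
    print(line)
```

Read this way, the Qwen-VL response feeds both the text-file saver and the positive `CLIPTextEncode` node, so the extracted description is saved and also used to regenerate an image.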
examples/example_01.png ADDED

Git LFS Details

  • SHA256: f1bb7199ba1912911c87d98f4d2360261af5ab762a6a2e7c7036ce1d0f8d290a
  • Pointer size: 131 Bytes
  • Size of remote file: 721 kB
examples/example_01_output.txt ADDED
@@ -0,0 +1 @@
1
+ A woman with long dark brown wavy hair, medium skin tone, wearing a fitted olive green sports bra made from smooth stretch fabric, standing against a plain beige background. She has soft neutral facial features, an athletic build, and is accessorized with small silver earrings. Her posture is relaxed with one hand on her hip and the other behind her head. The lighting highlights her toned physique and natural appearance without emphasizing any particular detail beyond generic aesthetic elements.
examples/example_02.png ADDED

Git LFS Details

  • SHA256: f81de9fa633c371951ed4c5f321a69b7c33dd7a36b13dc69d66eebdc9920a390
  • Pointer size: 131 Bytes
  • Size of remote file: 896 kB
examples/example_02_output.txt ADDED
@@ -0,0 +1 @@
1
+ A slender athletic woman with long brown hair styled straight down her back stands facing forward against a light gray background. She wears black performance sportswear consisting of an ergonomic sports bra with a high-cut waistband and form-fitting leggings that extend to her ankles. Her skin has a medium-tone complexion with soft neutral facial features, suggesting a calm presence without overt expressions. The clothing appears smooth and stretchy, likely made from moisture-wicking fabric designed for comfort during physical activity.
examples/example_03.png ADDED

Git LFS Details

  • SHA256: 4bd1de7422b035512694c358fd14f86d083d7e0c0882aa266418fe0891547677
  • Pointer size: 132 Bytes
  • Size of remote file: 1.26 MB
examples/example_03_output.txt ADDED
@@ -0,0 +1 @@
1
+ A woman with long, wavy red hair styled loosely over her shoulders stands confidently against a plain gray background. She wears an armored top made from metallic silver material that covers her chest and upper abdomen, paired with brown leather shorts featuring fur trim at the sides. Her arms are adorned with matching arm guards on both wrists, while she holds a large sword with a detailed hilt in one hand. The outfit includes knee-high boots crafted from dark brown leather with metal accents around the joints. A wide belt cinches her waist, fastened by ornate detailing near the center. Her skin has a medium tone, appearing smooth under even lighting, with soft-neutral facial features suggesting a balanced build.
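The extraction prompt asks for no pose and no emotion in the base identity, yet model output can still include words such as "stands" or "confidently". A rough sketch of a keyword check that flags such terms before a description is reused is given below. The word lists and the `find_violations` helper are assumptions for illustration. They are not part of the repository and would need tuning for real data.

```python
import re

# Hypothetical, deliberately small word lists; extend them for your own data.
POSE_TERMS = ("standing", "stands", "posture", "holds", "facing", "hand on")
EMOTION_TERMS = ("confident", "confidently", "calm", "relaxed")


def find_violations(text: str) -> dict[str, list[str]]:
    """Return the pose and emotion terms found in a base identity description."""
    lowered = text.lower()

    def hits(terms):
        # Whole-word match so "confident" does not also match "confidently".
        return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", lowered)]

    return {"pose": hits(POSE_TERMS), "emotion": hits(EMOTION_TERMS)}


sample = (
    "A woman with long, wavy red hair styled loosely over her shoulders "
    "stands confidently against a plain gray background."
)
print(find_violations(sample))
```

A keyword list will miss paraphrases, so a check like this is a first-pass filter, not a guarantee that a description is free of pose or emotion language.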
visionharvester_v1_extractor.prompt.txt ADDED
@@ -0,0 +1,10 @@
1
+ Extract a clean, neutral description of the woman in the image.
2
+
3
+ Keep it simple:
4
+ • No pose or body positioning
5
+ • No emotions or personality
6
+ • No unique facial identifiers
7
+ • No NSFW content
8
+ • Do describe hair, body type (general), clothing, colors, fabrics, and broad facial features
9
+
10
+ Output 2–4 sentences that would work as a Stable Diffusion base identity block.
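The prompt's last line sets a length limit of 2–4 sentences, which can be checked mechanically on returned text. The sketch below uses a naive split on sentence-ending punctuation. `count_sentences` and `within_limit` are hypothetical helpers, and abbreviations such as "e.g." would be miscounted.

```python
import re


def count_sentences(text: str) -> int:
    """Naive count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for p in parts if p)


def within_limit(text: str, low: int = 2, high: int = 4) -> bool:
    """True when the text has between `low` and `high` sentences inclusive."""
    return low <= count_sentences(text) <= high


description = (
    "She has long dark hair. She wears a black sports top. "
    "The fabric is matte."
)
print(count_sentences(description), within_limit(description))
```

Outputs that fail the check could be regenerated or trimmed before being saved as a base identity block.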