Multiple-Angles LoRA Comparison Report
Test: Single prompt, same LoRA, same input across all three spaces.
Prompt: "Rotate the camera 45 degrees to the left."
LoRA: Multiple-Angles
Input image: 2780a5956b353a2d58d6dfd931f1be61b124bf99c9a52fb9e67f87cbefe06d34.png (wall clock)
0. Sources, inspiration & AIO versions
The three spaces are built on Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO (the “extracted” Diffusers-safe model), not on the original ComfyUI checkpoint. That extracted repo’s README states:
- Inspired by: linoyts/Qwen-Image-Edit-Rapid-AIO — i.e. the extraction idea (making Phr00t’s merge usable in a Diffusers/LoRA workflow) was inspired by linoyts’ space/repo; the actual weights still come from Phr00t’s merge.
- Original Phr00t repo: Phr00t/Qwen-Image-Edit-Rapid-AIO — the source merge (ComfyUI, 1 CFG, 4-step, etc.). Phr00t’s model card notes that v19 is likely best for consistency in edits, v23 for prompt adherence.
- Extraction: Done via the Space Pr0f3ssi0n4ln00b/Extract-diffusers, so the Hugging Face model Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO is a Diffusers-compatible extraction of Phr00t’s versions.
Versions in the extracted repo: v14.1, v18.1, v19, v20, v21, v22, v23 (same version numbers as Phr00t/Qwen-Image-Edit-Rapid-AIO).
Which version is used in each space (by code):
| Space | Model repo (code) | Default version | Override |
|---|---|---|---|
| Base (Qwen-Image-Edit-Rapid-AIO-Loras) | Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO | v19 | Env / Space variable AIO_VERSION |
| Experimental | same | v19 | same |
| Experimental-2 | same | v19 | same |
All three spaces use the same AIO_REPO_ID and DEFAULT_AIO_VERSION = "v19". If AIO_VERSION is not set (or load fails), they fall back to v19. So out of the box they all use Phr00t’s v19 via the extracted repo — i.e. the “consistency in edits” variant per Phr00t’s own note. A Space maintainer can set AIO_VERSION (e.g. to v21 or v23) when duplicating the space; the code does not hardcode a different version per space.
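The fallback behavior described above can be sketched as follows. This is illustrative only: `AIO_REPO_ID` and `DEFAULT_AIO_VERSION` are the names used in the spaces' code, but `resolve_aio_version` is a hypothetical helper, and validating against a version list here stands in for the real "fall back if load fails" behavior.

```python
import os

AIO_REPO_ID = "Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO"
DEFAULT_AIO_VERSION = "v19"  # Phr00t's "best for consistency" variant
KNOWN_VERSIONS = {"v14.1", "v18.1", "v19", "v20", "v21", "v22", "v23"}

def resolve_aio_version() -> str:
    """Pick the AIO version: the Space variable AIO_VERSION if set and
    usable, otherwise fall back to the default v19."""
    requested = os.environ.get("AIO_VERSION", "").strip()
    if requested in KNOWN_VERSIONS:
        return requested
    return DEFAULT_AIO_VERSION
```

A maintainer duplicating the space would set `AIO_VERSION=v23` (for example) as a Space variable; with nothing set, all three spaces resolve to v19.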
Summary: The LoRA spaces are inspired by / built on the extracted Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO; that extraction was inspired by linoyts/Qwen-Image-Edit-Rapid-AIO; the underlying merge is Phr00t/Qwen-Image-Edit-Rapid-AIO. Version used by default in all three spaces: v19 (Phr00t’s “best for consistency” version).
1. Resolution & metadata
| Asset | Dimensions | Format | DPI | Color | File size |
|---|---|---|---|---|---|
| Input | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~644 KB |
| Base | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~661 KB |
| Experimental | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~673 KB |
| Experimental-2 | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~674 KB |
Conclusion: All three outputs match the input resolution (1024×1024). Same format, DPI, and color space. File size varies only slightly (encoding); no resolution up/downscale.
2. Accuracy / consistency ranking
Best → worst: Base > Experimental > Experimental-2.
Important: Base is the lesser of evils, not a perfect result. All three outputs show some drift from the input (e.g. text, fine patterns); base simply preserves colors, shadows, clock hands, and letter consistency better than the two experimental spaces. For “same content, new angle” the base pipeline is the best of the three, but none of them are pixel-perfect faithful.
3. Consistency priority (what matters most)
Order of importance for “same as input, just rotated”:
- Colors & shadows – Overall color and shadow direction/shape must match (e.g. top-front-right lighting, shadow under/left of clock). Base preserves this best.
- Main structural elements – Clock hands, bezel, pendulum; position and proportions. Base keeps these most accurate.
- Letter / text consistency – Legibility and correctness of text on the clock face (e.g. “WALL BE ESION” / “WALELN STOCK”). Base is better; experimentals drift more.
- Fine details – Tick marks, numeral style (e.g. “IIII” vs “IV”), material patterns, counts, small decorative consistency. These matter after the above; base still best, experimentals show more variation/loss.
4. Likely causes (technical)
- Base uses long-edge resolution (1024) and a minimal pipeline: no pad_to_canvas, no vae_image_indices, no area-based canvas, no extra refs. One image in, one resolution path → less re-encoding and fewer ways for the scene to drift.
- Experimental / Experimental-2 use an area-based (megapixel) canvas, pad_to_canvas, extras as conditioning-only, and (in Experimental) resolution_multiple and an optional decoder_vae. More processing and conditioning paths can change how the latent and decoder preserve colors, shadows, and fine detail for a simple single-image edit like "rotate 45° left."

For a strict "same content, new angle" edit, the simpler base pipeline appears better at preserving input fidelity; the experimental features seem tuned for flexibility (multi-ref, pose/depth, high-res) rather than maximal consistency on single-image edits.
5. Code differences: what this is about and how it's done
Below are the precise code differences that explain the behavior: how resolution and pipeline options are chosen in each space and why base tends to preserve input fidelity better for single-image edits.
5.1 Resolution: long-edge (base) vs area-based (experimental)
Base fixes the long edge (e.g. 1024) and derives width/height from the image aspect ratio. No megapixel slider. For Multiple-Angles there is no target_long_edge in the adapter spec, so it uses the default 1024.
```python
# BASE: Qwen-Image-Edit-Rapid-AIO-Loras/app.py

def _round8(x: int) -> int:
    return max(8, (int(x) // 8) * 8)

def compute_dimensions(image: Image.Image, long_edge: int) -> tuple[int, int]:
    w, h = image.size
    if w >= h:
        new_w = long_edge
        new_h = int(round(long_edge * (h / w)))
    else:
        new_h = long_edge
        new_w = int(round(long_edge * (w / h)))
    return _round8(new_w), _round8(new_h)

def get_target_long_edge_for_lora(lora_adapter: str) -> int:
    spec = ADAPTER_SPECS.get(lora_adapter, {})
    return int(spec.get("target_long_edge", 1024))

# In infer():
target_long_edge = get_target_long_edge_for_lora(lora_adapter)
width, height = compute_dimensions(img1, target_long_edge)
```
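A self-contained sketch of the same long-edge rule, using a plain (w, h) tuple instead of a PIL image so it runs without the app's context (`long_edge_dimensions` and `round8` are illustrative names, not the app's):

```python
def round8(x: int) -> int:
    # Snap a dimension down to a multiple of 8, with a floor of 8.
    return max(8, (int(x) // 8) * 8)

def long_edge_dimensions(size: tuple[int, int], long_edge: int = 1024) -> tuple[int, int]:
    """Fix the long edge and derive the short edge from the aspect ratio."""
    w, h = size
    if w >= h:
        new_w, new_h = long_edge, int(round(long_edge * (h / w)))
    else:
        new_h, new_w = long_edge, int(round(long_edge * (w / h)))
    return round8(new_w), round8(new_h)

# A 1600×900 input keeps its aspect ratio at long edge 1024:
print(long_edge_dimensions((1600, 900)))    # (1024, 576)
# A square input (like the 1024×1024 clock image) is unchanged:
print(long_edge_dimensions((1024, 1024)))   # (1024, 1024)
```

This is why the base output matches the input resolution exactly for a square 1024×1024 image: one deterministic rule, no area or lattice adjustment.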
Experimental / Experimental-2 use area-based sizing: a target pixel area (from megapixels or adapter spec) is converted to (width, height) via the pipeline's calculate_dimensions, then rounded to a lattice multiple.
```python
# EXPERIMENTAL & EXPERIMENTAL-2

def compute_canvas_dimensions_from_area(image, target_area, multiple_of):
    w, h = image.size
    aspect = w / h if h else 1.0
    from qwenimage.pipeline_qwenimage_edit_plus import calculate_dimensions
    width, height = calculate_dimensions(int(target_area), float(aspect))
    width = _round_to_multiple(int(width), int(multiple_of))
    height = _round_to_multiple(int(height), int(multiple_of))
    return width, height
```
Differences in the area path:
- Experimental: 0 MP means "match input area" (return int(w * h)), and the lattice multiple comes from the UI resolution_multiple (32/56/112).
- Experimental-2: no "0 = match input" shortcut; the user's megapixel value is used directly, and in infer the lattice is fixed at multiple_of = int(pipe.vae_scale_factor * 2).
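A self-contained approximation of the area path, assuming `calculate_dimensions` solves width × height ≈ target_area at the input's aspect ratio (the real function lives in `qwenimage.pipeline_qwenimage_edit_plus` and may round differently; `area_based_dimensions` and `round_to_multiple` are illustrative names):

```python
import math

def round_to_multiple(x: int, m: int) -> int:
    # Round down to the nearest multiple of m, with a floor of one multiple.
    return max(m, (int(x) // m) * m)

def area_based_dimensions(size: tuple[int, int], target_area: int,
                          multiple_of: int) -> tuple[int, int]:
    """Solve width*height ≈ target_area at the input aspect ratio,
    then snap both edges to the lattice multiple."""
    w, h = size
    aspect = w / h if h else 1.0
    width = math.sqrt(target_area * aspect)
    height = width / aspect
    return (round_to_multiple(int(width), multiple_of),
            round_to_multiple(int(height), multiple_of))

# 1 MP canvas for a square input, lattice 32:
print(area_based_dimensions((1024, 1024), 1024 * 1024, 32))  # (1024, 1024)
# The same 1 MP budget for a 16:9 input redistributes the pixels:
print(area_based_dimensions((1600, 900), 1024 * 1024, 32))   # (1344, 768)
```

Note how the lattice rounding interacts with the area budget: the 16:9 case loses pixels to the snap, one of the "more moving parts" the takeaway below refers to.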
Takeaway: Base = one rule (long edge → dimensions). Experimentals add area/megapixel path and optional lattice; more moving parts can change effective canvas and VAE behavior.
5.2 Pipeline call: minimal (base) vs extended (experimental) vs middle (experimental-2)
Base — core args only:
```python
# BASE
result = pipe(
    image=pipe_images,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_inference_steps=steps,
    generator=generator,
    true_cfg_scale=guidance_scale,
).images[0]
return result, seed
```
Experimental — adds VAE routing, padding, resolution lattice, extra-ref VAE area, optional Wan2.1 decoder:
```python
# EXPERIMENTAL
vae_image_indices = None
if extras_condition_only:
    if isinstance(pipe_images, list) and len(pipe_images) > 2:
        vae_image_indices = [0, 1] if len(pipe_images) >= 2 else [0]
res_mult = int(resolution_multiple) if resolution_multiple else int(pipe.vae_scale_factor * 2)
vae_ref_area = int(mp_ref * 1024 * 1024) if mp_ref and mp_ref > 0 else None
_apply_vae_tiling(bool(vae_tiling))
result = pipe(
    image=pipe_images, prompt=prompt, negative_prompt=negative_prompt,
    height=height, width=width, num_inference_steps=steps,
    generator=generator, true_cfg_scale=guidance_scale,
    vae_image_indices=vae_image_indices,
    pad_to_canvas=bool(pad_to_canvas),
    resolution_multiple=res_mult,
    vae_ref_area=vae_ref_area,
    vae_ref_start_index=base_ref_count,
    decoder_vae=str(decoder_vae).lower(),
    keep_decoder_2x=bool(keep_decoder_2x),
).images[0]
```
Experimental-2 — same as experimental but no resolution_multiple, vae_ref_area, vae_ref_start_index, decoder_vae, keep_decoder_2x; only vae_image_indices and pad_to_canvas added over base.
Takeaway: More pipeline knobs (padding, VAE routing, lattice, ref area, decoder) help multi-image/high-res workflows but add paths where latent/decoder can diverge from a simple single-image edit. Base avoids those paths.
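One way to make the knob differences concrete is to diff the keyword sets each space passes to pipe(). This is a bookkeeping sketch (names taken from the snippets above, not pipeline code):

```python
# Arguments all three spaces pass to pipe().
CORE = {
    "image", "prompt", "negative_prompt", "height", "width",
    "num_inference_steps", "generator", "true_cfg_scale",
}

# Per-space call signatures, per the snippets above.
CALL_KWARGS = {
    "base": set(CORE),
    "experimental": CORE | {
        "vae_image_indices", "pad_to_canvas", "resolution_multiple",
        "vae_ref_area", "vae_ref_start_index", "decoder_vae", "keep_decoder_2x",
    },
    "experimental-2": CORE | {"vae_image_indices", "pad_to_canvas"},
}

# Knobs each space adds over base:
for space in ("experimental", "experimental-2"):
    print(space, sorted(CALL_KWARGS[space] - CALL_KWARGS["base"]))
```

Each extra keyword is a path where the latent or decode can diverge; base, by construction, exposes none of them.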
5.3 Engineering summary
| Aspect | Base | Experimental | Experimental-2 |
|---|---|---|---|
| Resolution | Long-edge (1024 default) | Area (MP; 0 = match input) | Area (MP; min 0.5) |
| Lattice multiple | Implicit (round to 8) | User: 32/56/112 | Fixed: vae_scale_factor*2 |
| pad_to_canvas | Not used | Used | Used |
| vae_image_indices | Not used | Used | Used |
| resolution_multiple, vae_ref_*, decoder_vae | Not passed | Passed | Not passed |
6. Summary table
| Criterion | Base | Experimental | Experimental-2 |
|---|---|---|---|
| Resolution vs input | Same (1024×1024) | Same | Same |
| Colors & shadows | Best | Weaker | Weaker |
| Clock hands / structure | Best | Weaker | Weaker |
| Letter consistency | Best | Weaker | Weaker |
| Fine details (patterns, etc.) | Best | Weaker | Weaker |
7. Conclusion after experiment
Fidelity: For the tested prompt ("Rotate the camera 45 degrees to the left") and Multiple-Angles LoRA, base is the best of the three but still not perfect. Colors, shadows, clock hands, and text are preserved better in base; experimentals show more drift. All three outputs share the same resolution as the input (1024×1024); the difference is in how the pipeline chooses resolution and which options it passes to the model (padding, VAE routing, lattice, decoder).
Cause: The base space uses a minimal path: long-edge resolution (1024) and a pipe() call with only the core arguments. The experimental spaces add area-based canvas sizing, pad_to_canvas, vae_image_indices, and (in Experimental) resolution_multiple, vae_ref_*, and an optional Wan2.1 decoder. Those features are useful for multi-image refs, pose/depth, and high-res workflows, but they introduce more processing steps and conditioning paths, which in this single-image angle-edit test led to worse input consistency (colors, shadows, structure, text, fine details).
Consistency priority (observed): The order of what stayed most faithful in the best output was: (1) colors and shadows, (2) main structure (e.g. clock hands), (3) letter/text consistency, (4) fine details (patterns, counts, material). That order is a useful checklist when judging "same content, new angle" edits.
Recommendation: For single-image, angle-only edits where input fidelity matters most, use the base space and treat it as the lesser of evils—better than the two experimentals, but not pixel-perfect. Use Experimental or Experimental-2 when you need multi-image refs, pose/depth conditioning, or high-res/decoder options and can accept a consistency tradeoff. If you need to improve base further, the next lever is the pipeline itself (e.g. fewer steps, different scheduler, or LoRA strength), not adding the experimental pipeline options.
Model tree: Manojb/Qwen-Image-Edit-Rapid-AIO-MultipleAngle (base model: Qwen/Qwen-Image-Edit-2511)