Multiple-Angles LoRA Comparison Report

Test: Single prompt, same LoRA, same input across all three spaces.
Prompt: "Rotate the camera 45 degrees to the left."
LoRA: Multiple-Angles
Input image: 2780a5956b353a2d58d6dfd931f1be61b124bf99c9a52fb9e67f87cbefe06d34.png (wall clock)


0. Sources, inspiration & AIO versions

The three spaces are built on Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO (the “extracted” Diffusers-safe model), not on the original ComfyUI checkpoint.

Versions in the extracted repo: v14.1, v18.1, v19, v20, v21, v22, v23 (the same version numbers as Phr00t/Qwen-Image-Edit-Rapid-AIO).

Which version is used in each space (by code):

| Space | Model repo (code) | Default version | Override |
|---|---|---|---|
| Base (Qwen-Image-Edit-Rapid-AIO-Loras) | Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO | v19 | Env / Space variable AIO_VERSION |
| Experimental | same | v19 | same |
| Experimental-2 | same | v19 | same |

All three spaces use the same AIO_REPO_ID and DEFAULT_AIO_VERSION = "v19". If AIO_VERSION is not set (or load fails), they fall back to v19. So out of the box they all use Phr00t’s v19 via the extracted repo — i.e. the “consistency in edits” variant per Phr00t’s own note. A Space maintainer can set AIO_VERSION (e.g. to v21 or v23) when duplicating the space; the code does not hardcode a different version per space.
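The fallback behavior can be sketched as follows (a minimal reconstruction; `KNOWN_VERSIONS` and `resolve_aio_version` are illustrative names, not the spaces' actual identifiers — only `AIO_VERSION`, `AIO_REPO_ID`, and `DEFAULT_AIO_VERSION` come from the code):

```python
import os

AIO_REPO_ID = "Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO"
DEFAULT_AIO_VERSION = "v19"
# Versions listed in the extracted repo.
KNOWN_VERSIONS = {"v14.1", "v18.1", "v19", "v20", "v21", "v22", "v23"}

def resolve_aio_version() -> str:
    """Return the AIO version to load: the AIO_VERSION env / Space
    variable if it names a known version, otherwise the v19 default."""
    requested = os.environ.get("AIO_VERSION", "").strip()
    if requested in KNOWN_VERSIONS:
        return requested
    return DEFAULT_AIO_VERSION
```

Duplicating a space and setting `AIO_VERSION=v21` (or `v23`) switches the merge; with no variable set, all three spaces resolve to v19.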

Summary: The LoRA spaces are inspired by / built on the extracted Pr0f3ssi0n4ln00b/Phr00t-Qwen-Rapid-AIO; that extraction was inspired by linoyts/Qwen-Image-Edit-Rapid-AIO; the underlying merge is Phr00t/Qwen-Image-Edit-Rapid-AIO. Version used by default in all three spaces: v19 (Phr00t’s “best for consistency” version).


1. Resolution & metadata

| Asset | Dimensions | Format | DPI | Color | File size |
|---|---|---|---|---|---|
| Input | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~644 KB |
| Base | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~661 KB |
| Experimental | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~673 KB |
| Experimental-2 | 1024×1024 | PNG | 72×72 | RGB, 8bpc | ~674 KB |

Conclusion: All three outputs match the input resolution (1024×1024). Same format, DPI, and color space. File size varies only slightly (encoding); no resolution up/downscale.


2. Accuracy / consistency ranking

Best → worst: Base > Experimental > Experimental-2.

Important: Base is the lesser of three evils, not a perfect result. All three outputs drift from the input somewhere (e.g. text, fine patterns); base simply preserves colors, shadows, clock hands, and lettering better than the two experimental spaces. For “same content, new angle” the base pipeline is the best of the three, but none of the three is pixel-perfect faithful.


3. Consistency priority (what matters most)

Order of importance for “same as input, just rotated”:

  1. Colors & shadows – Overall color and shadow direction/shape must match (e.g. top-front-right lighting, shadow under/left of clock). Base preserves this best.
  2. Main structural elements – Clock hands, bezel, pendulum; position and proportions. Base keeps these most accurate.
  3. Letter / text consistency – Legibility and correctness of text on the clock face (e.g. “WALL BE ESION” / “WALELN STOCK”). Base is better; experimentals drift more.
  4. Fine details – Tick marks, numeral style (e.g. “IIII” vs “IV”), material patterns, counts, small decorative consistency. These matter after the above; base still best, experimentals show more variation/loss.
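The first two checklist items lend themselves to a rough, automatable first pass. The sketch below is an illustration, not anything the spaces actually run: it scores an edit against its input with a per-channel mean-color delta (item 1) and a grayscale pixel-difference proxy for structure (item 2). Text and fine-detail checks (items 3-4) still need human review.

```python
from PIL import Image, ImageChops, ImageStat

def fidelity_report(original: Image.Image, edited: Image.Image) -> dict:
    """Crude consistency metrics between an input image and an edited
    output; lower values mean the edit stayed closer to the input."""
    a = original.convert("RGB")
    b = edited.convert("RGB").resize(a.size)
    # 1. Colors: per-channel mean difference (global color drift).
    mean_a, mean_b = ImageStat.Stat(a).mean, ImageStat.Stat(b).mean
    color_delta = sum(abs(x - y) for x, y in zip(mean_a, mean_b)) / 3
    # 2. Structure proxy: mean absolute grayscale pixel difference.
    diff = ImageChops.difference(a.convert("L"), b.convert("L"))
    struct_delta = ImageStat.Stat(diff).mean[0]
    return {"color_delta": color_delta, "struct_delta": struct_delta}
```

For a camera rotation the pixel difference is high by construction, so these numbers are only meaningful for ranking outputs of the same prompt against each other, not as absolute fidelity scores.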

4. Likely causes (technical)

  • Base uses long-edge resolution (1024) and a minimal pipeline: no pad_to_canvas, no vae_image_indices, no area-based canvas, no extra refs. One image in, one resolution path → less re-encoding and fewer ways for the scene to drift.
  • Experimental / Experimental-2 use area-based (megapixel) canvas, pad_to_canvas, extras conditioning-only, and (in Experimental) resolution_multiple and optional decoder_vae. More processing and conditioning paths can change how the latent and decoder preserve colors, shadows, and fine detail for a simple single-image edit like “rotate 45° left.”

For a strict “same content, new angle” edit, the simpler base pipeline appears better at preserving input fidelity; the experimental features seem tuned for flexibility (multi-ref, pose/depth, high-res) rather than maximal consistency on single-image edits.


5. Code differences: what this is about and how it's done

Below are the precise code differences that explain the behavior: how resolution and pipeline options are chosen in each space and why base tends to preserve input fidelity better for single-image edits.

5.1 Resolution: long-edge (base) vs area-based (experimental)

Base fixes the long edge (e.g. 1024) and derives width/height from the image aspect ratio. No megapixel slider. For Multiple-Angles there is no target_long_edge in the adapter spec, so it uses the default 1024.

# BASE: Qwen-Image-Edit-Rapid-AIO-Loras/app.py
from PIL import Image

def _round8(x: int) -> int:
    return max(8, (int(x) // 8) * 8)

def compute_dimensions(image: Image.Image, long_edge: int) -> tuple[int, int]:
    w, h = image.size
    if w >= h:
        new_w = long_edge
        new_h = int(round(long_edge * (h / w)))
    else:
        new_h = long_edge
        new_w = int(round(long_edge * (w / h)))
    return _round8(new_w), _round8(new_h)

def get_target_long_edge_for_lora(lora_adapter: str) -> int:
    spec = ADAPTER_SPECS.get(lora_adapter, {})
    return int(spec.get("target_long_edge", 1024))

# In infer():
target_long_edge = get_target_long_edge_for_lora(lora_adapter)
width, height = compute_dimensions(img1, target_long_edge)
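A quick demonstration of the long-edge rule (the two helpers are repeated from the listing above so the snippet runs standalone):

```python
from PIL import Image

def _round8(x: int) -> int:  # as defined in the base space above
    return max(8, (int(x) // 8) * 8)

def compute_dimensions(image, long_edge):  # as above
    w, h = image.size
    if w >= h:
        new_w, new_h = long_edge, int(round(long_edge * (h / w)))
    else:
        new_h, new_w = long_edge, int(round(long_edge * (w / h)))
    return _round8(new_w), _round8(new_h)

# A landscape 1600x1200 input keeps its 4:3 aspect on a 1024 long edge:
print(compute_dimensions(Image.new("RGB", (1600, 1200)), 1024))  # (1024, 768)
# A portrait 900x1600 input: the long edge goes to the height instead.
print(compute_dimensions(Image.new("RGB", (900, 1600)), 1024))   # (576, 1024)
```

One rule, one rounding step: the aspect ratio is always preserved and the canvas never depends on a user-tunable lattice.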

Experimental / Experimental-2 use area-based sizing: a target pixel area (from megapixels or adapter spec) is converted to (width, height) via the pipeline's calculate_dimensions, then rounded to a lattice multiple.

# EXPERIMENTAL & EXPERIMENTAL-2
# (_round_to_multiple, defined elsewhere in the space, rounds down to the given multiple)

def compute_canvas_dimensions_from_area(image, target_area, multiple_of):
    w, h = image.size
    aspect = w / h if h else 1.0
    from qwenimage.pipeline_qwenimage_edit_plus import calculate_dimensions
    width, height = calculate_dimensions(int(target_area), float(aspect))
    width = _round_to_multiple(int(width), int(multiple_of))
    height = _round_to_multiple(int(height), int(multiple_of))
    return width, height

Differences between the two experimental spaces:

  • Experimental: 0 MP means "match input area" (return int(w * h)); the lattice comes from the UI resolution_multiple (32/56/112).
  • Experimental-2: no 0 = match input; the user's megapixel value is used directly, and in infer, multiple_of = int(pipe.vae_scale_factor * 2).
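The real `calculate_dimensions` lives in `qwenimage.pipeline_qwenimage_edit_plus`; the sketch below is only a mental model of area-based sizing, an assumption rather than the actual implementation: solve width × height ≈ target_area at the input aspect ratio, then snap each edge down to the lattice.

```python
import math

def _round_to_multiple(x: int, multiple: int) -> int:
    # round down to a multiple, with the multiple itself as a floor
    return max(multiple, (int(x) // multiple) * multiple)

def area_based_dimensions(w: int, h: int, target_area: int, multiple_of: int):
    """Sketch of area-based canvas sizing: keep the aspect ratio,
    hit ~target_area pixels, round both edges down to the lattice."""
    aspect = w / h
    width = int(math.sqrt(target_area * aspect))
    height = int(math.sqrt(target_area / aspect))
    return (_round_to_multiple(width, multiple_of),
            _round_to_multiple(height, multiple_of))

# 1024x1024 input at a 1.0 MP target, with a 112 lattice vs an 8 lattice:
print(area_based_dimensions(1024, 1024, 1024 * 1024, 112))  # (1008, 1008)
print(area_based_dimensions(1024, 1024, 1024 * 1024, 8))    # (1024, 1024)
```

With a 112 lattice the effective canvas shrinks to 1008×1008 even when the target area exactly matches the input; that is one concrete way the extra knobs can change the latent the model sees.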

Takeaway: Base = one rule (long edge → dimensions). Experimentals add area/megapixel path and optional lattice; more moving parts can change effective canvas and VAE behavior.

5.2 Pipeline call: minimal (base) vs extended (experimental) vs middle (experimental-2)

Base — core args only:

# BASE
result = pipe(
    image=pipe_images,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_inference_steps=steps,
    generator=generator,
    true_cfg_scale=guidance_scale,
).images[0]
return result, seed

Experimental — adds VAE routing, padding, resolution lattice, extra-ref VAE area, optional Wan2.1 decoder:

# EXPERIMENTAL
vae_image_indices = None
if extras_condition_only:
    # With more than two input images, only the first two are VAE-encoded;
    # the rest are conditioning-only refs.
    if isinstance(pipe_images, list) and len(pipe_images) > 2:
        vae_image_indices = [0, 1]
res_mult = int(resolution_multiple) if resolution_multiple else int(pipe.vae_scale_factor * 2)
vae_ref_area = int(mp_ref * 1024 * 1024) if mp_ref and mp_ref > 0 else None
_apply_vae_tiling(bool(vae_tiling))
result = pipe(
    image=pipe_images, prompt=prompt, negative_prompt=negative_prompt,
    height=height, width=width, num_inference_steps=steps,
    generator=generator, true_cfg_scale=guidance_scale,
    vae_image_indices=vae_image_indices,
    pad_to_canvas=bool(pad_to_canvas),
    resolution_multiple=res_mult,
    vae_ref_area=vae_ref_area,
    vae_ref_start_index=base_ref_count,
    decoder_vae=str(decoder_vae).lower(),
    keep_decoder_2x=bool(keep_decoder_2x),
).images[0]

Experimental-2 — same as Experimental but without resolution_multiple, vae_ref_area, vae_ref_start_index, decoder_vae, and keep_decoder_2x; relative to base it adds only vae_image_indices and pad_to_canvas.

Takeaway: More pipeline knobs (padding, VAE routing, lattice, ref area, decoder) help multi-image/high-res workflows but add paths where latent/decoder can diverge from a simple single-image edit. Base avoids those paths.

5.3 Engineering summary

| Aspect | Base | Experimental | Experimental-2 |
|---|---|---|---|
| Resolution | Long-edge (1024 default) | Area (MP; 0 = match input) | Area (MP; min 0.5) |
| Lattice multiple | Implicit (round to 8) | User: 32/56/112 | Fixed: vae_scale_factor*2 |
| pad_to_canvas | Not used | Used | Used |
| vae_image_indices | Not used | Used | Used |
| resolution_multiple, vae_ref_*, decoder_vae | Not passed | Passed | Not passed |
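As a sanity check against section 1, all three sizing paths agree for this test's square 1024×1024 input. The snippet below is a sketch using the rules from 5.1; `snap` is a stand-in helper, and `vae_scale_factor = 8` for the Qwen VAE is an assumption:

```python
import math

def snap(x: int, m: int) -> int:
    # round down to a multiple, as both sizing paths do
    return max(m, (x // m) * m)

w = h = 1024                                          # the test input
base = (snap(1024, 8), snap(round(1024 * h / w), 8))  # base: long-edge rule
side = int(math.sqrt(1024 * 1024))                    # 1.0 MP target, square aspect
exp = (snap(side, 32), snap(side, 32))                # Experimental: lattice 32
exp2 = (snap(side, 16), snap(side, 16))               # Experimental-2: 8 * 2 = 16 (assumed)
print(base, exp, exp2)                                # all (1024, 1024)
```

This is why section 1 shows identical resolutions despite different sizing code: with the 32 lattice, 1024 is already a multiple, so nothing changes. A 112 lattice, or a non-square or non-1.0 MP input, would have diverged.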

6. Summary table

| Criterion | Base | Experimental | Experimental-2 |
|---|---|---|---|
| Resolution vs input | Same (1024×1024) | Same | Same |
| Colors & shadows | Best | Weaker | Weaker |
| Clock hands / structure | Best | Weaker | Weaker |
| Letter consistency | Best | Weaker | Weaker |
| Fine details (patterns, etc.) | Best | Weaker | Weaker |

7. Conclusion after experiment

  • Fidelity: For the tested prompt ("Rotate the camera 45 degrees to the left") and Multiple-Angles LoRA, base is the best of the three but still not perfect. Colors, shadows, clock hands, and text are preserved better in base; experimentals show more drift. All three outputs share the same resolution as the input (1024×1024); the difference is in how the pipeline chooses resolution and which options it passes to the model (padding, VAE routing, lattice, decoder).

  • Cause: The base space uses a minimal path: long-edge resolution (1024) and a pipe() call with only the core arguments. The experimental spaces add area-based canvas sizing, pad_to_canvas, vae_image_indices, and (in Experimental) resolution_multiple, vae_ref_*, and optional Wan2.1 decoder. Those features are useful for multi-image refs, pose/depth, and high-res workflows but introduce more processing steps and conditioning paths, which in this single-image angle-edit test led to worse input consistency (colors, shadows, structure, text, fine details).

  • Consistency priority (observed): The order of what stayed most faithful in the best output was: (1) colors and shadows, (2) main structure (e.g. clock hands), (3) letter/text consistency, (4) fine details (patterns, counts, material). That order is a useful checklist when judging “same content, new angle” edits.

  • Recommendation: For single-image, angle-only edits where input fidelity matters most, use the base space and treat it as the lesser of evils—better than the two experimentals, but not pixel-perfect. Use Experimental or Experimental-2 when you need multi-image refs, pose/depth conditioning, or high-res/decoder options and can accept a consistency tradeoff. If you need to improve base further, the next lever is the pipeline itself (e.g. fewer steps, different scheduler, or LoRA strength), not adding the experimental pipeline options.

Model: Manojb/Qwen-Image-Edit-Rapid-AIO-MultipleAngle

Finetuned
(28)
this model