InteriorFusion Inference Optimization Guide

Target Platforms

RTX 4090 (24GB VRAM) — Consumer Desktop

# FP16 preview inference (PBR disabled for speed)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size L \
    --device cuda \
    --dtype float16 \
    --no-pbr  # Disable PBR for faster generation

# Expected: ~12s for full scene with GLB+PLY output

Optimizations:

FP16 inference throughout pipeline
Skip material generation for preview mode
Use torch.compile() on DiT forward pass
Flash Attention 2 for transformer attention
Batch multi-view generation (6 views simultaneously)
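
The preview/full distinction above can be scripted. The following is a minimal sketch that only assembles the argument list, using the flags shown in this guide. The helper name build_infer_command and its preview parameter are hypothetical and not part of InteriorFusion.

```python
import shlex
from typing import List


def build_infer_command(
    image: str,
    output_dir: str,
    model_size: str = "L",
    device: str = "cuda",
    dtype: str = "float16",
    preview: bool = False,
) -> List[str]:
    """Assemble the interiorfusion.infer command line as an argument list."""
    cmd = [
        "python", "-m", "interiorfusion.infer",
        "--image", image,
        "--output", output_dir,
        "--model-size", model_size,
        "--device", device,
        "--dtype", dtype,
    ]
    if preview:
        # Preview mode skips material generation, as in the RTX 4090 example.
        cmd.append("--no-pbr")
    return cmd


preview_cmd = build_infer_command("room.jpg", "./output/", preview=True)
full_cmd = build_infer_command("room.jpg", "./output/")

# Print a copy-pasteable shell line; pass the list to subprocess.run to execute.
print(shlex.join(preview_cmd))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when image paths contain spaces.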

A100 (80GB VRAM) — Cloud / Datacenter

# Full quality generation
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size XL \
    --device cuda \
    --dtype bfloat16

# Expected: ~8s for full scene with all formats

Optimizations:

BF16 precision (same exponent range as FP32, so less prone to overflow than FP16)
Batch size 4 for parallel room generation
CUDA Graphs for repeated operations
Persistent CUDA cache
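
As an illustration of the batch-size-4 item, here is a small sketch that groups input images into batches of four. This guide does not show how the CLI accepts several images at once, so the sketch covers only the grouping step. The function name batches and the file names are made up for the example.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int = 4) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical input set of ten room photos.
rooms = [f"room_{i:02d}.jpg" for i in range(10)]
grouped = list(batches(rooms))

for index, group in enumerate(grouped):
    print(index, group)
```

The last batch may be smaller than four, so downstream code should not assume a fixed batch dimension.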

H100 (80GB VRAM) — Latest Datacenter

# Maximum quality (this command runs BF16; FP8 via Transformer Engine is listed under Optimizations)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size XL \
    --device cuda \
    --dtype bfloat16

# Expected: ~5s full pipeline

Optimizations:

FP8 via Transformer Engine
Hardware-accelerated attention
NVLink for multi-GPU distribution

Apple Silicon (MLX)

# Apple Silicon inference (the mps device string is PyTorch's Metal backend name)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size S \
    --device mps \
    --dtype float32

# Expected: ~30s on M3 Max (36GB unified memory)

Optimizations:

MLX graph compilation
Unified memory avoids CPU-GPU copies
Model quantization to 4-bit via GPTQ
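
To make "4-bit" concrete, the sketch below quantizes a few weights with plain symmetric round-to-nearest and a shared scale per group. This is deliberately not GPTQ, which additionally uses calibration data to compensate for rounding error. It only shows what storing weights as 4-bit integer codes plus a scale looks like. All names and the sample weights are illustrative.

```python
from typing import List, Sequence, Tuple


def quantize_4bit(
    values: Sequence[float], group_size: int = 4
) -> Tuple[List[int], List[float]]:
    """Round-to-nearest quantization to signed 4-bit codes in [-8, 7]."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group onto code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(v / scale))) for v in group)
    return codes, scales


def dequantize_4bit(
    codes: Sequence[int], scales: Sequence[float], group_size: int = 4
) -> List[float]:
    """Recover approximate weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [1.4, -0.6, 0.2, 0.0]  # made-up sample values
codes, scales = quantize_4bit(weights)
restored = dequantize_4bit(codes, scales)
print(codes, restored)
```

Each restored value differs from the original by at most half a scale step, which is the source of the quality impact that 4-bit methods try to minimize.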

Edge / Mobile

# Core pipeline only (depth + layout)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size S \
    --device cpu \
    --no-pbr --no-gaussian \
    --formats glb

# Expected: ~5s for depth + layout on-device; the scene is then sent to the cloud for 3D generation

Optimizations:

Core inference on-device (depth + segmentation)
Cloud offloading for 3D generation
Streaming mesh chunks
Aggressive quantization (INT4)
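
The "streaming mesh chunks" item can be pictured as reading the exported mesh in fixed-size pieces instead of loading it whole. The sketch below shows only the chunking step, under stated assumptions: this guide does not specify a transport or a chunk size, so the 64 KiB value and the name iter_chunks are arbitrary choices for illustration.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for an exported GLB file: 150,000 zero bytes held in memory.
payload = io.BytesIO(b"\x00" * 150_000)
sizes = [len(chunk) for chunk in iter_chunks(payload)]
print(sizes)
```

In practice the stream would come from open(path, "rb") or a network response, and each chunk would be handed to the receiver as soon as it is read.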

Quantization Strategies

Method	Model Size (vs. FP32)	Speedup	Quality Impact	VRAM Usage (vs. FP32)
FP32 (baseline)	100%	1×	—	100%
FP16	50%	1.8×	Minimal	50%
BF16	50%	1.8×	Minimal	50%
INT8 (SmoothQuant)	25%	2.5×	Low	25%
FP8 (TE)	25%	3×	Low	25%
GPTQ-4bit	12.5%	3.5×	Medium	12.5%
AWQ-4bit	12.5%	3.2×	Low	12.5%
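
The Model Size column follows directly from bits per weight divided by 32. The sketch below reproduces that arithmetic and estimates weight memory for a hypothetical 2-billion-parameter model. The parameter count is an assumption made for the example, not a published InteriorFusion figure. The estimate covers weights only and ignores activations and quantization metadata such as scales.

```python
# Bits used to store one weight under each method in the table.
BITS_PER_WEIGHT = {
    "FP32": 32,
    "FP16": 16,
    "BF16": 16,
    "INT8": 8,
    "FP8": 8,
    "GPTQ-4bit": 4,
    "AWQ-4bit": 4,
}


def relative_size(method: str) -> float:
    """Model size as a fraction of the FP32 baseline."""
    return BITS_PER_WEIGHT[method] / 32


def weight_memory_gb(num_params: int, method: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[method] / 8 / 1e9


NUM_PARAMS = 2_000_000_000  # hypothetical model size for illustration

for method in BITS_PER_WEIGHT:
    print(f"{method:10s} {relative_size(method):6.1%} "
          f"{weight_memory_gb(NUM_PARAMS, method):5.1f} GB")
```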

Export Formats

Format	Size	Viewer	Game Engine	AR/VR	Notes
GLB	~5-50MB	✅ (Web)	✅ (UE/Unity)	✅ (WebXR)	Recommended default
FBX	~10-100MB	⚠️ (Limited)	✅ (UE/Unity/Maya)	⚠️	For animation/rigging
OBJ	~5-30MB	✅ (Universal)	✅ (All)	⚠️	Legacy, no PBR
USDZ	~5-50MB	✅ (iOS AR)	⚠️ (UE via plugin)	✅ (ARKit)	Apple's format
PLY (3DGS)	~10-500MB	✅ (Gaussian viewers)	⚠️ (UE5 plugin)	⚠️	For Gaussian splatting renderers
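
The table can be condensed into a small lookup when an export step has to be automated. This is a sketch only: the target labels are hypothetical keys chosen for the example, and apart from glb this guide does not confirm how the other formats are spelled for the --formats flag.

```python
# Recommended export format per delivery target, read off the table above.
# The keys are illustrative labels, not InteriorFusion identifiers.
FORMAT_FOR_TARGET = {
    "web": "GLB",
    "webxr": "GLB",
    "unreal": "GLB",
    "unity": "GLB",
    "animation": "FBX",
    "ios_ar": "USDZ",
    "gaussian_splatting": "PLY",
}


def pick_format(target: str) -> str:
    """Return the suggested format, falling back to GLB (the recommended default)."""
    key = target.strip().lower()
    return FORMAT_FOR_TARGET.get(key, "GLB")


for target in ("web", "iOS_AR", "gaussian_splatting", "something_else"):
    print(target, "->", pick_format(target))
```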

ComfyUI Integration

Install the custom nodes:

cd ComfyUI/custom_nodes
git clone https://github.com/stevee00/ComfyUI-InteriorFusion

Available nodes:

InteriorFusion: Generate Scene — Full pipeline
InteriorFusion: Generate Object — Single furniture
InteriorFusion: Apply Material — PBR material
InteriorFusion: Export Mesh — Format conversion

Blender Integration

Install the addon:

# In Blender: Edit > Preferences > Add-ons > Install
# Select blender_plugin/interiorfusion_blender.py

Features:

Generate 3D scene from reference image
Import with PBR materials
Interactive object editing
Export to game engines

Unreal Engine Integration

Export GLB from InteriorFusion
Import via glTF importer (UE5 built-in)
Materials auto-convert to Unreal PBR
Use Gaussian Splatting plugin for real-time preview

Plugins needed:

glTFRuntime for runtime GLB loading
MLSLabsGaussianSplattingRenderer for 3DGS

Unity Integration

Export GLB or FBX from InteriorFusion
Import into Unity project (GLB needs a glTF importer package such as glTFast; FBX imports natively)
Materials map to Unity Standard/URP/HDRP
Use GaussianSplatting package for 3DGS

Performance Targets

Platform	Target Time	Target VRAM	Output Quality
RTX 4090	< 15s	< 20GB	Production
A100	< 8s	< 72GB	Maximum
H100	< 5s	< 72GB	Maximum
M3 Max	< 30s	< 36GB	Production
RTX 3060	< 60s	< 10GB	Preview
Edge (CPU)	< 10s (depth only)	< 4GB	Core only
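
These targets lend themselves to an automated check in a benchmark script. The following is a minimal sketch that encodes the table and compares a measured run against it. The measurement values in the usage lines are made up, and how time and peak VRAM are measured is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    max_seconds: float
    max_vram_gb: float


# Upper bounds copied from the Performance Targets table.
TARGETS = {
    "RTX 4090": Target(15, 20),
    "A100": Target(8, 72),
    "H100": Target(5, 72),
    "M3 Max": Target(30, 36),
    "RTX 3060": Target(60, 10),
    "Edge (CPU)": Target(10, 4),
}


def meets_target(platform: str, seconds: float, peak_vram_gb: float) -> bool:
    """True if a measured run is strictly below both bounds for the platform."""
    target = TARGETS[platform]
    return seconds < target.max_seconds and peak_vram_gb < target.max_vram_gb


# Illustrative measurements, not benchmark results.
print(meets_target("RTX 4090", seconds=12.0, peak_vram_gb=18.5))
print(meets_target("A100", seconds=8.0, peak_vram_gb=60.0))
```

Because the table uses strict "<" bounds, a run that takes exactly 8 s on an A100 does not pass the check.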