# InteriorFusion Inference Optimization Guide

## Target Platforms

### RTX 4090 (24GB VRAM) — Consumer Desktop
```bash
# FP16 inference with PBR materials disabled (preview mode)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size L \
    --device cuda \
    --dtype float16 \
    --no-pbr  # Disable PBR for faster generation

# Expected: ~12s for full scene with GLB+PLY output
```

**Optimizations**:
- FP16 inference throughout pipeline
- Skip material generation for preview mode
- Use `torch.compile()` on DiT forward pass
- Flash Attention 2 for transformer attention
- Batch multi-view generation (6 views simultaneously)

### A100 (80GB VRAM) — Cloud / Datacenter
```bash
# Full quality generation
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size XL \
    --device cuda \
    --dtype bfloat16

# Expected: ~8s for full scene with all formats
```

**Optimizations**:
- BF16 precision (same exponent range as FP32, so less prone to overflow than FP16, at the cost of fewer mantissa bits)
- Batch size 4 for parallel room generation
- CUDA Graphs for repeated operations
- Persistent CUDA cache
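
The BF16 versus FP16 trade-off noted above can be illustrated without a GPU. The following is a minimal sketch using only the Python standard library; the helper names `fits_fp16`, `to_fp16`, and `to_bf16` are hypothetical and are not part of InteriorFusion. BF16 is emulated here by rounding the upper 16 bits of a float32, which ignores NaN and infinity handling.

```python
import struct


def fits_fp16(x: float) -> bool:
    """Return True if x is within the finite range of IEEE half precision."""
    try:
        struct.pack("<e", x)  # "e" is the stdlib half-precision format code
        return True
    except OverflowError:
        return False


def to_fp16(x: float) -> float:
    """Round x to FP16 precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round x to bfloat16 precision (round-to-nearest-even).

    Illustrative emulation only: NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias for nearest-even
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Range: 70000 exceeds the FP16 maximum (65504) but survives in BF16.
print(fits_fp16(70000.0), to_bf16(70000.0))
# Precision: FP16 keeps more mantissa bits than BF16 near 1.0.
print(to_fp16(1.001), to_bf16(1.001))
```

Large intermediate values overflow in FP16 but not in BF16, while small relative differences are preserved better by FP16.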

### H100 (80GB VRAM) — Latest Datacenter
```bash
# Maximum quality (the command shown requests BF16; FP8 via Transformer Engine is listed under Optimizations)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size XL \
    --device cuda \
    --dtype bfloat16

# Expected: ~5s full pipeline
```

**Optimizations**:
- FP8 via Transformer Engine
- Hardware-accelerated attention
- NVLink for multi-GPU distribution

### Apple Silicon (MLX)
```bash
# Apple Silicon inference (`mps` is the PyTorch Metal backend device name)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size S \
    --device mps \
    --dtype float32

# Expected: ~30s on M3 Max (36GB unified memory)
```

**Optimizations**:
- MLX graph compilation
- Unified memory avoids CPU-GPU copies
- Model quantization to 4-bit via GPTQ

### Edge / Mobile
```bash
# Core pipeline only (depth + layout)
python -m interiorfusion.infer \
    --image room.jpg \
    --output ./output/ \
    --model-size S \
    --device cpu \
    --no-pbr --no-gaussian \
    --formats glb

# Expected: ~5s for depth + layout; the scene is then sent to the cloud for 3D generation
```

**Optimizations**:
- Core inference on-device (depth + segmentation)
- Cloud offloading for 3D generation
- Streaming mesh chunks
- Aggressive quantization (INT4)
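
To give a feel for what aggressive quantization costs in accuracy, here is a minimal sketch of plain symmetric absmax quantization in pure Python. This is the simplest textbook scheme, not GPTQ, AWQ, or SmoothQuant, and not necessarily what InteriorFusion uses; the function names and sample weights are hypothetical.

```python
def quantize_absmax(values, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1            # 7 for INT4, 127 for INT8
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        scale = 1.0                       # all-zero input: any scale works
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


def max_abs_error(values, bits):
    """Largest round-trip error for the given bit width."""
    q, scale = quantize_absmax(values, bits)
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


weights = [0.7, -0.33, 0.12, 0.0]         # hypothetical sample weights
codes, scale = quantize_absmax(weights, bits=4)
print(codes, scale)
print(max_abs_error(weights, 4), max_abs_error(weights, 8))
```

With a single per-tensor scale, the round-trip error is bounded by half the scale, so dropping from 8 to 4 bits increases the worst-case error by roughly a factor of 18 (127 / 7). Production 4-bit methods reduce this with per-group scales and calibration data.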

## Quantization Strategies

| Method | Model Size (vs. FP32) | Speedup (vs. FP32) | Quality Impact | VRAM Usage (vs. FP32) |
|--------|-----------|---------|---------------|---------------|
| FP32 (baseline) | 100% | 1× | — | 100% |
| FP16 | 50% | 1.8× | Minimal | 50% |
| BF16 | 50% | 1.8× | Minimal | 50% |
| INT8 (SmoothQuant) | 25% | 2.5× | Low | 25% |
| FP8 (TE) | 25% | 3× | Low | 25% |
| GPTQ-4bit | 12.5% | 3.5× | Medium | 12.5% |
| AWQ-4bit | 12.5% | 3.2× | Low | 12.5% |
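
The size percentages in the table follow directly from bits per weight. The sketch below reproduces them and converts a parameter count into weight storage; the helper names and the example parameter count are hypothetical, since this guide does not state the size of the InteriorFusion models.

```python
# Nominal bits per weight for each method in the table above.
BITS_PER_WEIGHT = {
    "FP32": 32,
    "FP16": 16,
    "BF16": 16,
    "INT8": 8,
    "FP8": 8,
    "GPTQ-4bit": 4,
    "AWQ-4bit": 4,
}


def relative_size(method: str) -> float:
    """Weight storage as a fraction of the FP32 baseline."""
    return BITS_PER_WEIGHT[method] / 32


def weight_gib(num_params: int, method: str) -> float:
    """Weight storage in GiB for a model with num_params parameters."""
    return num_params * BITS_PER_WEIGHT[method] / 8 / 2**30


example_params = 2**30  # hypothetical model with about 1.07 billion parameters
for method in BITS_PER_WEIGHT:
    print(f"{method:10s} {relative_size(method):6.1%} {weight_gib(example_params, method):5.2f} GiB")
```

These are nominal figures for weights only. Runtime VRAM also includes activations and workspace buffers, and 4-bit formats store extra scale data per group, so measured usage will be somewhat higher.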

## Export Formats

| Format | Size | Viewer | Game Engine | AR/VR | Notes |
|--------|------|--------|------------|-------|-------|
| **GLB** | ~5-50MB | ✅ (Web) | ✅ (UE/Unity) | ✅ (WebXR) | Recommended default |
| **FBX** | ~10-100MB | ⚠️ (Limited) | ✅ (UE/Unity/Maya) | ⚠️ | For animation/rigging |
| **OBJ** | ~5-30MB | ✅ (Universal) | ✅ (All) | ⚠️ | Legacy, no PBR |
| **USDZ** | ~5-50MB | ✅ (iOS AR) | ⚠️ (UE via plugin) | ✅ (ARKit) | Apple's format |
| **PLY (3DGS)** | ~10-500MB | ✅ (Gaussian viewers) | ⚠️ (UE5 plugin) | ⚠️ | For Gaussian splatting rendering |
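
When a pipeline has to choose an export format automatically, the table can be encoded as a small lookup. This is a sketch only: the `SUPPORT` dictionary transcribes the ✅ and ⚠️ marks above as "full" and "partial", and the names `SUPPORT` and `formats_for` are hypothetical.

```python
FULL, PARTIAL = "full", "partial"

# Support levels transcribed from the export format table.
SUPPORT = {
    "GLB":  {"viewer": FULL,    "game_engine": FULL,    "ar_vr": FULL},
    "FBX":  {"viewer": PARTIAL, "game_engine": FULL,    "ar_vr": PARTIAL},
    "OBJ":  {"viewer": FULL,    "game_engine": FULL,    "ar_vr": PARTIAL},
    "USDZ": {"viewer": FULL,    "game_engine": PARTIAL, "ar_vr": FULL},
    "PLY":  {"viewer": FULL,    "game_engine": PARTIAL, "ar_vr": PARTIAL},
}


def formats_for(*targets: str) -> list[str]:
    """Return formats with full support for every requested target."""
    return [
        fmt for fmt, levels in SUPPORT.items()
        if all(levels[t] == FULL for t in targets)
    ]


print(formats_for("game_engine", "ar_vr"))
print(formats_for("ar_vr"))
```

Only GLB is fully supported across all three targets, which is consistent with it being the recommended default.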

## ComfyUI Integration

Install the custom nodes:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/stevee00/ComfyUI-InteriorFusion
```

Available nodes:
- `InteriorFusion: Generate Scene` — Full pipeline
- `InteriorFusion: Generate Object` — Single furniture
- `InteriorFusion: Apply Material` — PBR material
- `InteriorFusion: Export Mesh` — Format conversion

## Blender Integration

Install the addon:
```bash
# In Blender: Edit > Preferences > Add-ons > Install
# Select blender_plugin/interiorfusion_blender.py
```

Features:
- Generate 3D scene from reference image
- Import with PBR materials
- Interactive object editing
- Export to game engines

## Unreal Engine Integration

1. Export GLB from InteriorFusion
2. Import via glTF importer (UE5 built-in)
3. Materials auto-convert to Unreal PBR
4. Use Gaussian Splatting plugin for real-time preview

Plugins needed:
- `glTFRuntime` for runtime GLB loading
- `MLSLabsGaussianSplattingRenderer` for 3DGS

## Unity Integration

1. Export GLB or FBX from InteriorFusion
2. Import into Unity project
3. Materials map to Unity Standard/URP/HDRP
4. Use GaussianSplatting package for 3DGS

## Performance Targets

| Platform | Target Time | Target Memory (VRAM, unified, or system RAM) | Output Quality |
|----------|------------|-------------|---------------|
| RTX 4090 | < 15s | < 20GB | Production |
| A100 | < 8s | < 72GB | Maximum |
| H100 | < 5s | < 72GB | Maximum |
| M3 Max | < 30s | < 36GB | Production |
| RTX 3060 | < 60s | < 10GB | Preview |
| Edge (CPU) | < 10s (depth only) | < 4GB | Core only |
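
For benchmarking against these targets, the per-platform commands from the Target Platforms section can be assembled programmatically. The sketch below uses only flags that appear in this guide; the preset keys (`rtx4090`, `a100`, and so on) and the helper names are hypothetical. The RTX 3060 is omitted because the guide gives no command for it.

```python
import shlex

# Flags copied from the per-platform commands in this guide.
PRESETS = {
    "rtx4090": ["--model-size", "L", "--device", "cuda", "--dtype", "float16", "--no-pbr"],
    "a100":    ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "h100":    ["--model-size", "XL", "--device", "cuda", "--dtype", "bfloat16"],
    "m3max":   ["--model-size", "S", "--device", "mps", "--dtype", "float32"],
    "edge":    ["--model-size", "S", "--device", "cpu",
                "--no-pbr", "--no-gaussian", "--formats", "glb"],
}

# Upper bounds in seconds from the "Target Time" column.
TARGET_SECONDS = {"rtx4090": 15, "a100": 8, "h100": 5, "m3max": 30, "edge": 10}


def build_command(platform: str, image: str, output: str) -> list[str]:
    """Return the argument list for one inference run on the given platform."""
    if platform not in PRESETS:
        raise ValueError(f"unknown platform: {platform}")
    return ["python", "-m", "interiorfusion.infer",
            "--image", image, "--output", output, *PRESETS[platform]]


def within_target(platform: str, elapsed_seconds: float) -> bool:
    """Check a measured wall-clock time against the table's target."""
    return elapsed_seconds < TARGET_SECONDS[platform]


print(shlex.join(build_command("rtx4090", "room.jpg", "./output/")))
```

To take a measurement, pass the returned list to `subprocess.run(cmd, check=True)` between two `time.perf_counter()` calls and feed the difference to `within_target`.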