# FloorplanVLM Training

Fine-tune **Qwen2.5-VL-3B** to extract wall, door, and window geometry from floor plan images as structured JSON.

Training follows [FloorplanVLM (arXiv:2602.06507)](https://arxiv.org/abs/2602.06507) and runs in two stages:
1. **SFT** on CubiCasa5K (5,000 real floor plans)
2. **GRPO** with geometric reward functions (wall IoU, room IoU, JSON validity)

## Quick Start

```bash
# Install dependencies
pip install torch torchvision transformers trl peft datasets accelerate shapely Pillow lxml numpy tqdm huggingface_hub

# Optional (faster attention on GPU; needs a CUDA toolchain to build)
pip install flash-attn --no-build-isolation

# Log in to Hugging Face
huggingface-cli login

# Stage 1: SFT Training
python train_floorplan_vlm.py

# Stage 2: GRPO Training (after SFT completes)
python train_floorplan_grpo.py
```

## What it does

- **Downloads** CubiCasa5K dataset (~5GB) from Zenodo automatically
- **Converts** SVG floor plan annotations → structured JSON (walls with coordinates, doors, windows, rooms)
- **Trains** Qwen2.5-VL-3B with LoRA to predict this JSON from floor plan images
- **Pushes** the trained model to the Hugging Face Hub
- **Auto-detects** GPU vs CPU (GPU recommended for full training)
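
The SVG → JSON conversion step can be illustrated with a minimal sketch. It assumes each wall is stored as a `<g>` element whose `class` contains `Wall` and which holds a four-point `<polygon>`; the actual CubiCasa5K markup and the conversion in the training script may differ. The helper names `parse_points`, `wall_from_quad`, and `walls_from_svg` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def parse_points(points_attr):
    """Parse an SVG 'points' attribute ("x1,y1 x2,y2 ...") into (x, y) tuples."""
    numbers = [float(v) for v in points_attr.replace(",", " ").split()]
    return list(zip(numbers[0::2], numbers[1::2]))


def wall_from_quad(pts, wall_id):
    """Reduce a four-corner wall outline to a centerline segment plus thickness."""
    edges = [(pts[i], pts[(i + 1) % 4]) for i in range(4)]
    lengths = [math.dist(a, b) for a, b in edges]
    # Opposite edges are (0, 2) and (1, 3); the shorter pair forms the end caps.
    caps = (0, 2) if lengths[0] + lengths[2] <= lengths[1] + lengths[3] else (1, 3)

    def midpoint(edge):
        (x1, y1), (x2, y2) = edge
        return [(x1 + x2) / 2, (y1 + y2) / 2]

    return {
        "id": wall_id,
        "start": midpoint(edges[caps[0]]),
        "end": midpoint(edges[caps[1]]),
        "thickness": (lengths[caps[0]] + lengths[caps[1]]) / 2,
        "curvature": 0,   # assumption: straight walls only in this sketch
        "openings": [],   # doors and windows are not handled here
    }


def walls_from_svg(svg_text):
    """Collect wall records from every <g class="... Wall ..."> group (assumed layout)."""
    root = ET.fromstring(svg_text)
    walls = []
    for group in root.iter(SVG_NS + "g"):
        if "Wall" not in group.get("class", "").split():
            continue
        polygon = group.find(SVG_NS + "polygon")
        if polygon is None:
            continue
        pts = parse_points(polygon.get("points", ""))
        if len(pts) == 4:
            walls.append(wall_from_quad(pts, f"wall_{len(walls) + 1}"))
    return walls


# Made-up sample: one horizontal wall, 400 units long and 15 units thick.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg"><g class="Wall External">'
    '<polygon points="120,72.5 120,87.5 520,87.5 520,72.5"/></g></svg>'
)
walls = walls_from_svg(SAMPLE_SVG)
```

For the sample above, the single wall comes out with a centerline from `[120.0, 80.0]` to `[520.0, 80.0]` and a thickness of `15.0`.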

## Configuration

Edit the top of each script:

| Setting | Default | Description |
|---|---|---|
| `MAX_SAMPLES` | `None` (all) | Set to `100` for a quick test run |
| `NUM_EPOCHS` | `2` | Training epochs |
| `PUSH_TO_HUB` | `True` | Push model to HF Hub |
| `HUB_MODEL_ID` | `manitocross/floorplan-vlm-sft` | Target Hub repo; change this to a repo under your own account |
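
As a rough illustration, the settings block might look like the sketch below. The names and defaults come from the table above; the exact layout in the scripts is an assumption, and `limit_samples` is a hypothetical helper that only shows how a `None` limit can mean "use everything".

```python
# Hypothetical constants block; names and defaults mirror the table above.
MAX_SAMPLES = None   # None = whole dataset; e.g. 100 for a quick test run
NUM_EPOCHS = 2
PUSH_TO_HUB = True
HUB_MODEL_ID = "manitocross/floorplan-vlm-sft"


def limit_samples(records, max_samples=MAX_SAMPLES):
    """Return the first `max_samples` records, or all of them when it is None."""
    return records[:max_samples]  # slicing with None keeps the full list
```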

## Hardware Requirements

| Mode | Memory | Time (full dataset) |
|---|---|---|
| GPU (A100 80GB) | ~20 GB VRAM | ~4-6 hours |
| GPU (RTX 3090/4090) | ~20 GB VRAM | ~8-12 hours |
| CPU | ~14 GB RAM | Days (testing only) |

## Output JSON Schema

```json
{
  "walls": [
    {
      "id": "wall_1",
      "start": [120, 80],
      "end": [520, 80],
      "thickness": 15,
      "curvature": 0,
      "openings": [
        {"type": "door", "center": 320, "width": 90},
        {"type": "window", "center": 450, "width": 60}
      ]
    }
  ],
  "rooms": [
    {"label": "bedroom", "walls": ["wall_1", "wall_2", "wall_3", "wall_4"]}
  ]
}
```
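
A structural check for this schema could be sketched as follows. It is not the validation used by the training scripts: the function name `validate_floorplan`, the set of required keys, and the rule that rooms may only reference walls that exist are all assumptions drawn from the example above.

```python
import json

WALL_KEYS = {"id", "start", "end", "thickness", "curvature", "openings"}
OPENING_TYPES = ("door", "window")


def _is_point(value):
    """True for a two-element list of numbers such as [120, 80]."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value)
    )


def validate_floorplan(text):
    """Return a list of problems found in `text`; an empty list means it passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("walls"), list):
        return ["top level must be an object with a 'walls' list"]

    problems, wall_ids = [], set()
    for i, wall in enumerate(data["walls"]):
        if not isinstance(wall, dict) or WALL_KEYS - set(wall):
            problems.append(f"walls[{i}]: missing required keys")
            continue
        if not isinstance(wall["id"], str):
            problems.append(f"walls[{i}]: 'id' must be a string")
            continue
        wall_ids.add(wall["id"])
        if not (_is_point(wall["start"]) and _is_point(wall["end"])):
            problems.append(f"walls[{i}]: 'start' and 'end' must be [x, y]")
        if not isinstance(wall["openings"], list):
            problems.append(f"walls[{i}]: 'openings' must be a list")
            continue
        for k, opening in enumerate(wall["openings"]):
            if not isinstance(opening, dict) or opening.get("type") not in OPENING_TYPES:
                problems.append(f"walls[{i}].openings[{k}]: type must be door or window")

    rooms = data.get("rooms", [])
    if not isinstance(rooms, list):
        return problems + ["'rooms' must be a list"]
    for j, room in enumerate(rooms):
        refs = room.get("walls") if isinstance(room, dict) else None
        if not isinstance(refs, list):
            problems.append(f"rooms[{j}]: expected an object with a 'walls' list")
            continue
        # Assumption: every wall a room names must be defined under "walls".
        unknown = [r for r in refs if not isinstance(r, str) or r not in wall_ids]
        if unknown:
            problems.append(f"rooms[{j}]: unknown wall ids {unknown}")
    return problems
```

Note that the abbreviated example above defines only `wall_1`, so under the dangling-reference assumption this check would flag `wall_2` through `wall_4` in its `bedroom` entry.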

## GRPO Reward Functions

Stage 2 uses geometric rewards from the FloorplanVLM paper:
- **R_val** (0.1 weight): JSON validity + schema compliance
- **R_ext** (0.5 weight): External wall boundary IoU (Shapely polygon comparison)
- **R_int** (0.4 weight): Room IoU, gated by α when external walls are wrong
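
How the three terms might combine into one scalar is sketched below. The weights come from the list above; the gating rule is an assumption, since the exact form of α is not given here. In this sketch R_int counts in full when R_ext reaches a hypothetical `ext_threshold`, and is multiplied by `alpha` otherwise. The component scores are passed in as plain numbers in [0, 1]; in practice the IoU values would come from the Shapely polygon comparison.

```python
W_VAL, W_EXT, W_INT = 0.1, 0.5, 0.4  # weights listed above


def combined_reward(r_val, r_ext, r_int, ext_threshold=0.5, alpha=0.0):
    """Weighted sum of the three rewards with an assumed gate on the room term.

    ext_threshold and alpha are hypothetical values chosen for illustration:
    when the external boundary IoU is below ext_threshold, the room IoU term
    is scaled by alpha instead of counting in full.
    """
    gate = 1.0 if r_ext >= ext_threshold else alpha
    return W_VAL * r_val + W_EXT * r_ext + W_INT * gate * r_int


# Valid JSON, poor outer boundary, good rooms: the room term is gated out.
gated = combined_reward(r_val=1.0, r_ext=0.2, r_int=0.9)
# Same scores with a softer gate that keeps half of the room term.
soft = combined_reward(r_val=1.0, r_ext=0.2, r_int=0.9, alpha=0.5)
```

With a hard gate (`alpha=0.0`), a prediction cannot collect room reward while its outer boundary is badly wrong, which is one plausible reading of "gated by α".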

## References

- [FloorplanVLM: A Vision-Language Model for Floorplan Vectorization](https://arxiv.org/abs/2602.06507)
- [CubiCasa5K: A Dataset for Floorplan Image Analysis](https://arxiv.org/abs/1904.01920)
- [TRL: Transformer Reinforcement Learning](https://huggingface.co/docs/trl)