---
license: cc-by-nc-4.0
base_model:
- Ultralytics/YOLOv8
pipeline_tag: object-detection
---
# Architect (YOLOv8m)
`Architect` is a fine-tuned YOLOv8m model for **architectural symbol spotting** in rasterized floor plans and CAD drawings. It was developed as part of the `Architecture-RAG` project, where its symbol detections give multimodal systems a structured view of architectural content.
## Model Summary
- **Base Model:** YOLOv8m (pretrained on COCO)
- **Task:** Object detection (28 architectural object categories)
- **Dataset:** [FloorPlanCAD](https://floorplancad.github.io/)
- **Performance:**
  - **mAP50-95(B):** 0.80797
  - **mAP50(B):** 0.87664
---
## ✅ Supported Classes (28)
{
'single door': 0, 'double door': 1, 'sliding door': 2, 'window': 3, 'bay window': 4,
'blind window': 5, 'opening symbol': 6, 'stair': 7, 'gas stove': 8, 'refrigerator': 9,
'washing machine': 10, 'sofa': 11, 'bed': 12, 'chair': 13, 'table': 14,
'bedside cupboard': 15, 'TV cabinet': 16, 'half-height cabinet': 17, 'high cabinet': 18,
'wardrobe': 19, 'sink': 20, 'bath': 21, 'bath tub': 22, 'squat toilet': 23, 'urinal': 24,
'toilet': 25, 'elevator': 26, 'escalator': 27
}
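The mapping above goes from class name to class id, while raw model output carries ids. A minimal sketch of a reverse lookup, assuming the mapping is copied into a Python dict (the names `CLASS_TO_ID` and `ID_TO_CLASS` are illustrative and not part of the model package):

```python
# Name -> id mapping, copied from the list above.
CLASS_TO_ID = {
    'single door': 0, 'double door': 1, 'sliding door': 2, 'window': 3, 'bay window': 4,
    'blind window': 5, 'opening symbol': 6, 'stair': 7, 'gas stove': 8, 'refrigerator': 9,
    'washing machine': 10, 'sofa': 11, 'bed': 12, 'chair': 13, 'table': 14,
    'bedside cupboard': 15, 'TV cabinet': 16, 'half-height cabinet': 17, 'high cabinet': 18,
    'wardrobe': 19, 'sink': 20, 'bath': 21, 'bath tub': 22, 'squat toilet': 23, 'urinal': 24,
    'toilet': 25, 'elevator': 26, 'escalator': 27,
}

# Reverse lookup: id -> name (illustrative helper).
ID_TO_CLASS = {idx: name for name, idx in CLASS_TO_ID.items()}

# Sanity checks: 28 classes with contiguous ids 0..27.
assert len(CLASS_TO_ID) == 28
assert sorted(ID_TO_CLASS) == list(range(28))

print(ID_TO_CLASS[7])
```

When the model is loaded with Ultralytics, `model.names` should provide the same id-to-name lookup, so this dict is mainly useful for code that handles exported predictions without loading the model.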
## 🧪 How to Use
```python
from ultralytics import YOLO
from PIL import Image

# Load the model from Hugging Face Hub.
# If your Ultralytics version cannot resolve the repo id directly, download the
# weights file from the repo and pass its local path to YOLO() instead.
model = YOLO('SamirShabani/Architect')

# Run inference on a local image file
results = model('path/to/image.png')

# Optionally, run inference on a PIL Image
# image = Image.open('path/to/image.png')
# results = model(image)

# Print detection results (one Results object per input image)
for r in results:
    for box in r.boxes:
        class_id = int(box.cls[0])
        class_name = model.names[class_id]
        confidence = float(box.conf[0])
        bbox = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
        print(f"Detected: {class_name}, Confidence: {confidence:.2f}, BBox: {bbox}")

# Save output image with drawn bounding boxes
results[0].save(filename="prediction_output.jpg")
```
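For downstream use, such as passing detections to a retrieval pipeline, it can help to flatten the results into plain records and filter them by confidence. A sketch, assuming the detections were collected as `(class_name, confidence, bbox)` tuples inside the loop above; the helper `summarize`, the sample values, and the 0.5 threshold are illustrative choices rather than part of the model:

```python
from collections import Counter

def summarize(detections, min_conf=0.5):
    """Count detections per class, keeping only those with confidence >= min_conf."""
    return Counter(name for name, conf, _bbox in detections if conf >= min_conf)

# Hypothetical detections, in the same shape as the values printed by the loop above.
detections = [
    ("single door", 0.91, [12.0, 40.0, 58.0, 96.0]),
    ("window", 0.84, [100.0, 10.0, 180.0, 22.0]),
    ("single door", 0.32, [300.0, 44.0, 340.0, 90.0]),  # dropped: below threshold
]

counts = summarize(detections)
print(dict(counts))
```

A per-class count like this is a compact description of a floor plan; a suitable threshold depends on the drawings and should be tuned on your own data.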
## 🛠️ Training Details
- Framework: Ultralytics YOLOv8
- Pretrained Model: yolov8m.pt
- Training Hardware: NVIDIA Tesla P100 / T4 (Kaggle)
- Epochs: 100 (early stopping patience=20)
- Image Size: 640 × 640
- Batch Size: 16
- Optimizer: AdamW
- Scheduler: Cosine Annealing
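
The settings above map onto arguments of the Ultralytics `train()` method roughly as follows. This is a sketch under assumptions, not the author's actual training script: the dataset config path is hypothetical, and reading "Cosine Annealing" as `cos_lr=True` is an interpretation.

```python
# Hyperparameters from the list above, expressed as Ultralytics train() arguments.
TRAIN_ARGS = dict(
    epochs=100,
    patience=20,        # early stopping patience
    imgsz=640,
    batch=16,
    optimizer="AdamW",
    cos_lr=True,        # assumption: cosine learning-rate schedule
)

def train(data_yaml):
    """Fine-tune yolov8m.pt; `data_yaml` is a hypothetical path to a dataset config."""
    # Imported lazily so the settings can be inspected without Ultralytics installed.
    from ultralytics import YOLO
    model = YOLO("yolov8m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)

# Example call (not executed here; the file name is a placeholder):
# train("floorplancad.yaml")
```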
---
## 📦 Dataset
- Source: FloorPlanCAD (https://floorplancad.github.io/)
- Images: 15,285 SVG drawings → converted to 640×640 PNG images
- Labeled Samples: ~11,35 images with bounding box annotations
- License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
Non-commercial use only
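
Ultralytics detection training reads one label line per object in the form `class x_center y_center width height`, with coordinates normalized by the image size. The conversion script used for this model is not shown in this card, so the following is only a sketch of that step for a pixel-space box on a 640×640 image; the function name and the sample box are illustrative.

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w=640, img_h=640):
    """Convert a pixel-space (x1, y1, x2, y2) box to a normalized YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical 'window' (class 3) spanning x 160..480 and y 320..352.
label = to_yolo_label(3, 160, 320, 480, 352)
print(label)  # 3 0.500000 0.525000 0.500000 0.050000
```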
---
## Evaluation Metrics (Epoch 54)
| Metric | Value | Description |
|---------------------|---------|---------------------------------------------|
| metrics/mAP50-95(B) | 0.80797 | Mean Average Precision [IoU = 0.50 to 0.95] |
| metrics/mAP50(B) | 0.87664 | Mean Average Precision at IoU = 0.50 |
| train/box_loss | 0.4671 | Localization loss on training set |
| val/box_loss | 0.32854 | Localization loss on validation set |
| train/cls_loss | 0.81329 | Classification loss on training set |
| val/cls_loss | 0.57334 | Classification loss on validation set |

Training and validation curves are available in the results.png generated during training.
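
To make the two mAP rows concrete: a prediction counts as a match when its IoU with a ground-truth box reaches a threshold. mAP50 uses the single threshold 0.50, while mAP50-95 averages over the ten thresholds 0.50, 0.55, ..., 0.95. The sketch below shows only the IoU computation and the threshold sweep for one hypothetical box pair, not the full average-precision calculation:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds used by mAP50-95.
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]

# Hypothetical predicted and ground-truth boxes.
pred, truth = [0, 0, 10, 10], [0, 0, 10, 7.8]
score = iou(pred, truth)
matched_at = [t for t in THRESHOLDS if score >= t]

print(f"IoU = {score:.2f}, counts as a match at {len(matched_at)} of {len(THRESHOLDS)} thresholds")
```

A box pair like this one counts fully toward mAP50 but only partly toward mAP50-95, which is why the second figure is always the lower of the two.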
---
## ⚠️ Known Limitations
- Symbol Bias: Frequent objects like doors and windows dominate the training samples.
- Centering Bias: Objects are mostly centered in cropped training patches.
- Text Ignorance: The model does **not** interpret text or annotations near symbols.
- "Stuff" Categories Ignored: The model does **not** detect background elements like walls or parking spaces.
- Low-Quality Documents: Performance may degrade on scanned or low-resolution plans with noise.
---
## Citation
```bibtex
@InProceedings{Fan_2021_ICCV,
author = {Fan, Zhiwen and Zhu, Lingjie and Li, Honghua and Zhu, Siyu and Tan, Ping},
title = {FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021}
}
```
## 👤 Creator
Samir Shabani
Machine Learning Engineer | Student
LinkedIn: https://www.linkedin.com/in/samir-shabani
GitHub: https://github.com/Sam1rShaban1