| --- |
| license: cc-by-nc-4.0 |
| base_model: |
| - Ultralytics/YOLOv8 |
| pipeline_tag: object-detection |
| --- |
| |
| # Architect (YOLOv8m) |
|
|
| `Architect` is a fine-tuned YOLOv8m model for **architectural symbol spotting** in rasterized floor plans and CAD drawings. Developed as part of the `Architecture-RAG` project, it detects symbols such as doors, windows, and furniture so that multimodal systems can work with structured architectural content. |
|
|
| ## Model Summary |
|
|
| - **Base Model:** YOLOv8m (pretrained on COCO) |
| - **Task:** Object detection (28 architectural object categories) |
| - **Dataset:** [FloorPlanCAD](https://floorplancad.github.io/) |
| - **Performance (epoch 54):** |
|   - **mAP50-95(B):** 0.80797 |
|   - **mAP50(B):** 0.87664 |
|
|
| --- |
|
|
| ## ✅ Supported Classes (28) |
|
|
| { |
| 'single door': 0, 'double door': 1, 'sliding door': 2, 'window': 3, 'bay window': 4, |
| 'blind window': 5, 'opening symbol': 6, 'stair': 7, 'gas stove': 8, 'refrigerator': 9, |
| 'washing machine': 10, 'sofa': 11, 'bed': 12, 'chair': 13, 'table': 14, |
| 'bedside cupboard': 15, 'TV cabinet': 16, 'half-height cabinet': 17, 'high cabinet': 18, |
| 'wardrobe': 19, 'sink': 20, 'bath': 21, 'bath tub': 22, 'squat toilet': 23, 'urinal': 24, |
| 'toilet': 25, 'elevator': 26, 'escalator': 27 |
| } |
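Downstream code often needs the mapping in both directions. The sketch below copies the class map from this card into a Python dict and inverts it. The names `CLASS_TO_ID` and `ID_TO_CLASS` are illustrative and not part of the released model; the loaded model should expose the same information through `model.names`.

```python
# Class map copied from this model card (name -> class ID).
CLASS_TO_ID = {
    'single door': 0, 'double door': 1, 'sliding door': 2, 'window': 3, 'bay window': 4,
    'blind window': 5, 'opening symbol': 6, 'stair': 7, 'gas stove': 8, 'refrigerator': 9,
    'washing machine': 10, 'sofa': 11, 'bed': 12, 'chair': 13, 'table': 14,
    'bedside cupboard': 15, 'TV cabinet': 16, 'half-height cabinet': 17, 'high cabinet': 18,
    'wardrobe': 19, 'sink': 20, 'bath': 21, 'bath tub': 22, 'squat toilet': 23, 'urinal': 24,
    'toilet': 25, 'elevator': 26, 'escalator': 27,
}

# Inverse lookup (class ID -> name), e.g. for labelling raw predictions.
ID_TO_CLASS = {class_id: name for name, class_id in CLASS_TO_ID.items()}

# Sanity check: the IDs should form the contiguous range 0..27.
assert sorted(ID_TO_CLASS) == list(range(28))

print(ID_TO_CLASS[7])  # stair
```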
|
|
| ## 🧪 How to Use |
|
|
|
|
| ```python |
| from ultralytics import YOLO |
| from PIL import Image  # only needed for the optional PIL example below |
| |
| # Load the model from Hugging Face Hub. |
| # If your Ultralytics version cannot resolve a Hub repo ID, download the |
| # weights file from the repo and pass its local path instead. |
| model = YOLO('SamirShabani/Architect') |
| |
| # Run inference on a local image file (returns a list of Results objects) |
| results = model('path/to/image.png') |
| |
| # Optionally, run inference on a PIL Image |
| # image = Image.open('path/to/image.png') |
| # results = model(image) |
| |
| # Print detection results |
| for r in results: |
|     for box in r.boxes: |
|         class_id = int(box.cls[0]) |
|         class_name = model.names[class_id] |
|         confidence = float(box.conf[0]) |
|         bbox = box.xyxy[0].tolist()  # [x_min, y_min, x_max, y_max] in pixels |
|         print(f"Detected: {class_name}, Confidence: {confidence:.2f}, BBox: {bbox}") |
| |
| # Save output image with drawn bounding boxes |
| results[0].save(filename="prediction_output.jpg") |
| ``` |
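For a retrieval or multimodal pipeline, printed lines are less useful than structured records. The following is a minimal sketch, not part of the released code: `to_records` and the 0.5 confidence threshold are assumptions for illustration. With real Ultralytics output, the three input lists could come from `r.boxes.cls.tolist()` (cast to `int`), `r.boxes.conf.tolist()`, and `r.boxes.xyxy.tolist()`.

```python
import json

def to_records(class_ids, confidences, boxes, names, min_conf=0.5):
    """Turn parallel lists of detections into JSON-friendly dicts."""
    records = []
    for class_id, conf, box in zip(class_ids, confidences, boxes):
        if conf < min_conf:
            continue  # drop low-confidence detections
        records.append({
            "label": names[class_id],
            "confidence": round(conf, 3),
            "bbox_xyxy": [round(v, 1) for v in box],
        })
    return records

# Toy values standing in for real model output.
names = {0: "single door", 3: "window"}
records = to_records(
    [0, 3, 3],
    [0.91, 0.42, 0.77],
    [[10, 20, 50, 80], [0, 0, 5, 5], [100.04, 40, 160, 60]],
    names,
)
print(json.dumps(records, indent=2))
```

The second detection falls below the threshold and is dropped, so two records are emitted.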
|
|
| ## 🛠️ Training Details |
|
|
| - Framework: Ultralytics YOLOv8 |
| - Pretrained Model: yolov8m.pt |
| - Training Hardware: NVIDIA Tesla P100 / T4 (Kaggle) |
| - Epochs: 100 (early stopping patience=20) |
| - Image Size: 640 × 640 |
| - Batch Size: 16 |
| - Optimizer: AdamW |
| - Scheduler: Cosine Annealing |
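The settings above map fairly directly onto arguments of the Ultralytics `train()` method. The sketch below is an approximation of such a run, not the author's actual command, which this card does not give. The dataset config path `floorplancad.yaml` and the helper name `train` are hypothetical.

```python
# Hyperparameters as listed in this card, expressed as Ultralytics train() arguments.
TRAIN_KWARGS = {
    "data": "floorplancad.yaml",  # hypothetical dataset config path
    "epochs": 100,
    "patience": 20,               # early stopping patience
    "imgsz": 640,
    "batch": 16,
    "optimizer": "AdamW",
    "cos_lr": True,               # cosine learning-rate schedule
}

def train(weights="yolov8m.pt", **overrides):
    """Fine-tune from COCO-pretrained weights; requires `pip install ultralytics`."""
    # Imported lazily so the config can be inspected without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**{**TRAIN_KWARGS, **overrides})
```

Calling `train()` would start a run with these settings, and keyword overrides such as `train(batch=8)` would adjust individual values for smaller GPUs.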
|
|
| --- |
|
|
| ## 📦 Dataset |
|
|
| - Source: FloorPlanCAD (https://floorplancad.github.io/) |
| - Images: 15,285 SVG drawings → converted to 640×640 PNG images |
| - Labeled Samples: ~11,35 images with bounding box annotations |
| - License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/), non-commercial use only |
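YOLO training expects each bounding box as a text line of the form `class cx cy w h`, with coordinates normalized to the image size. The sketch below shows that conversion for a 640×640 image, assuming the annotations are available as pixel-space corner boxes. The helper `to_yolo_line` is illustrative and is not the preprocessing code used for this model.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w=640, img_h=640):
    """Format one pixel-space box as a YOLO label line: class cx cy w h (normalized)."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A window (class 3) spanning x 160..480 and y 320..352 in a 640x640 image.
print(to_yolo_line(3, 160, 320, 480, 352))
# 3 0.500000 0.525000 0.500000 0.050000
```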
|
|
| --- |
|
|
| ## 📊 Evaluation Metrics (Epoch 54) |
|
|
| Metric | Value | Description |
| --- | --- | --- |
| `metrics/mAP50-95(B)` | 0.80797 | Mean Average Precision averaged over IoU thresholds 0.50 to 0.95 |
| `metrics/mAP50(B)` | 0.87664 | Mean Average Precision at IoU = 0.50 |
| `train/box_loss` | 0.4671 | Localization loss on training set |
| `val/box_loss` | 0.32854 | Localization loss on validation set |
| `train/cls_loss` | 0.81329 | Classification loss on training set |
| `val/cls_loss` | 0.57334 | Classification loss on validation set |
|
|
| Training and validation curves are available in the `results.png` file generated during training. |
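Both mAP figures depend on intersection over union (IoU) between a predicted box and a ground-truth box: mAP50 counts a prediction as a match at IoU ≥ 0.50, while mAP50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates only this matching criterion, not the full precision-recall computation that Ultralytics performs. The function name `iou` and the toy boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# A prediction shifted 10 px to the right of a 100x100 ground-truth box.
pred, truth = (0, 0, 100, 100), (10, 0, 110, 100)
score = iou(pred, truth)
matched_at = [t for t in THRESHOLDS if score >= t]
print(round(score, 3), matched_at)
```

Here the IoU is 9000 / 11000 ≈ 0.818, so the prediction counts as a match at the seven thresholds up to 0.80 and as a miss at 0.85 and above.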
|
|
| --- |
|
|
| ## ⚠️ Known Limitations |
|
|
| - Symbol Bias: Frequent objects like doors and windows dominate the training samples. |
| - Centering Bias: Objects are mostly centered in cropped training patches. |
| - No Text Reading: The model does **not** interpret text or annotations near symbols. |
| - "Stuff" Categories Ignored: The model does **not** detect background elements like walls or parking spaces. |
| - Low-Quality Documents: Performance may degrade on scanned or low-resolution plans with noise. |
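One way to gauge the symbol bias on your own data is to count instances per class in the YOLO label files. The sketch below does this for a handful of toy label lines. `class_counts` is a hypothetical helper, and the sample lines are made up for illustration rather than taken from FloorPlanCAD.

```python
from collections import Counter

def class_counts(label_lines):
    """Count instances per class ID from YOLO-format label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[int(parts[0])] += 1
    return counts

# Toy label lines (class cx cy w h); real ones would be read from the label files.
sample = [
    "0 0.5 0.5 0.1 0.2",
    "0 0.3 0.4 0.1 0.2",
    "3 0.7 0.2 0.3 0.05",
    "",
    "12 0.5 0.5 0.4 0.4",
]
for class_id, n in class_counts(sample).most_common():
    print(class_id, n)
```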
|
|
| --- |
|
|
| ## 📚 Citation |
| ```bibtex |
| @InProceedings{Fan_2021_ICCV, |
| author = {Fan, Zhiwen and Zhu, Lingjie and Li, Honghua and Zhu, Siyu and Tan, Ping}, |
| title = {FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting}, |
| booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, |
| month = {October}, |
| year = {2021} |
| } |
| ``` |
|
|
| ## 👤 Creator |
|
|
| Samir Shabani |
| Machine Learning Engineer | Student |
|
|
| LinkedIn: https://www.linkedin.com/in/samir-shabani |
| GitHub: https://github.com/Sam1rShaban1 |