---
license: cc-by-nc-4.0
base_model:
- Ultralytics/YOLOv8
pipeline_tag: object-detection
---

# Architect (YOLOv8m)

`Architect` is a fine-tuned YOLOv8m model for **architectural symbol spotting** in rasterized floor plans and CAD drawings. Developed as part of the `Architecture-RAG` project, it enables multimodal systems to understand structured architectural content.

## Model Summary

- **Base Model:** YOLOv8m (pretrained on COCO)
- **Task:** Object detection (28 architectural object categories)
- **Dataset:** [FloorPlanCAD](https://floorplancad.github.io/)
- **Performance:**
  - **mAP50-95(B):** 0.80797
  - **mAP50(B):** 0.87664

---

## ✅ Supported Classes (28)

| ID | Class | ID | Class |
|----|-------|----|-------|
| 0 | single door | 14 | table |
| 1 | double door | 15 | bedside cupboard |
| 2 | sliding door | 16 | TV cabinet |
| 3 | window | 17 | half-height cabinet |
| 4 | bay window | 18 | high cabinet |
| 5 | blind window | 19 | wardrobe |
| 6 | opening symbol | 20 | sink |
| 7 | stair | 21 | bath |
| 8 | gas stove | 22 | bath tub |
| 9 | refrigerator | 23 | squat toilet |
| 10 | washing machine | 24 | urinal |
| 11 | sofa | 25 | toilet |
| 12 | bed | 26 | elevator |
| 13 | chair | 27 | escalator |

## 🧪 How to Use

```python
from ultralytics import YOLO
from PIL import Image

# Load the model from Hugging Face Hub
model = YOLO('SamirShabani/Architect')

# Run inference on a local image file
results = model('path/to/image.png')

# Optionally, run inference on a PIL Image
# image = Image.open('path/to/image.png')
# results = model(image)

# Print detection results
for r in results:
    for box in r.boxes:
        class_id = int(box.cls[0])
        class_name = model.names[class_id]
        confidence = float(box.conf[0])
        bbox = box.xyxy[0].tolist()
        print(f"Detected: {class_name}, Confidence: {confidence:.2f}, BBox: {bbox}")

# Save the output image with drawn bounding boxes
results[0].save(filename="prediction_output.jpg")
```

## 🛠️ Training Details

- Framework: Ultralytics YOLOv8
- Pretrained Model: yolov8m.pt
- Training Hardware: NVIDIA Tesla P100 / T4 (Kaggle)
- Epochs: 100 (early stopping patience=20)
- Image Size: 640 × 640
- Batch Size: 16
- Optimizer: AdamW
- Scheduler: Cosine Annealing

---

## 📦 Dataset

- Source: [FloorPlanCAD](https://floorplancad.github.io/)
- Images: 15,285 SVG drawings → converted to 640×640 PNG images
- Labeled Samples: ~11,35 images with bounding box annotations
- License: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/), non-commercial use only

---

## 📊 Evaluation Metrics (Epoch 54)

| Metric | Value | Description |
|---------------------|----------|---------------------------------------------|
| metrics/mAP50-95(B) | 0.80797 | Mean Average Precision, IoU = 0.50 to 0.95 |
| metrics/mAP50(B) | 0.87664 | Mean Average Precision at IoU = 0.50 |
| train/box_loss | 0.4671 | Localization loss on training set |
| val/box_loss | 0.32854 | Localization loss on validation set |
| train/cls_loss | 0.81329 | Classification loss on training set |
| val/cls_loss | 0.57334 | Classification loss on validation set |

Training and validation curves are available in the `results.png` generated during training.

---

## ⚠️ Known Limitations

- **Symbol Bias:** Frequent objects like doors and windows dominate the training samples.
- **Centering Bias:** Objects are mostly centered in cropped training patches.
- **Text Ignorance:** The model does **not** interpret text or annotations near symbols.
- **"Stuff" Categories Ignored:** The model does **not** detect background elements like walls or parking spaces.
- **Low-Quality Documents:** Performance may degrade on scanned or low-resolution plans with noise.
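Because of the symbol bias noted above, downstream pipelines may benefit from per-class confidence thresholds rather than a single global cutoff. The sketch below is illustrative only: the `filter_detections` helper, the threshold values, and the example detections are hypothetical, not part of the model or its training configuration.

```python
# Per-class confidence thresholds: over-represented symbols (e.g. 'single door',
# 'window') can afford a higher bar, while rare ones (e.g. 'escalator') may
# warrant a lower one. Values here are illustrative, not tuned.
DEFAULT_THRESHOLD = 0.50

CLASS_THRESHOLDS = {
    'single door': 0.60,   # frequent class: demand higher confidence
    'window': 0.60,
    'escalator': 0.35,     # rare class: accept lower confidence
}

def filter_detections(detections):
    """Keep (class_name, confidence, bbox) triples that pass their class threshold."""
    kept = []
    for class_name, confidence, bbox in detections:
        threshold = CLASS_THRESHOLDS.get(class_name, DEFAULT_THRESHOLD)
        if confidence >= threshold:
            kept.append((class_name, confidence, bbox))
    return kept

# Hand-written example detections in (name, confidence, xyxy) form:
detections = [
    ('single door', 0.55, [10, 10, 40, 60]),    # dropped: below the 0.60 bar
    ('window', 0.72, [100, 20, 160, 50]),       # kept
    ('escalator', 0.40, [200, 200, 320, 400]),  # kept: rare-class bar is 0.35
]
print(filter_detections(detections))
```

The same triples can be built directly from the `class_name`, `confidence`, and `bbox` values extracted in the usage example above.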
---

## 📚 Citation

```bibtex
@InProceedings{Fan_2021_ICCV,
    author    = {Fan, Zhiwen and Zhu, Lingjie and Li, Honghua and Zhu, Siyu and Tan, Ping},
    title     = {FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021}
}
```

## 👤 Creator

Samir Shabani
Machine Learning Engineer | Student

- LinkedIn: https://www.linkedin.com/in/samir-shabani
- GitHub: https://github.com/Sam1rShaban1