	---
	license: cc-by-nc-4.0
	base_model:
	- Ultralytics/YOLOv8
	pipeline_tag: object-detection
	---

	# Architect (YOLOv8m)

	`Architect` is a fine-tuned YOLOv8m model for architectural symbol spotting in rasterized floor plans and CAD drawings. Developed as part of the `Architecture-RAG` project, it detects symbols such as doors, windows, and furniture so that multimodal systems can work with the structured content of architectural drawings.

	## Model Summary

	- Base Model: YOLOv8m (pretrained on COCO)
	- Task: Object detection (28 architectural object categories)
	- Dataset: [FloorPlanCAD](https://floorplancad.github.io/)
	- Performance:
	  - mAP50-95(B): 0.80797
	  - mAP50(B): 0.87664

	---

	## ✅ Supported Classes (28)

	{
	'single door': 0, 'double door': 1, 'sliding door': 2, 'window': 3, 'bay window': 4,
	'blind window': 5, 'opening symbol': 6, 'stair': 7, 'gas stove': 8, 'refrigerator': 9,
	'washing machine': 10, 'sofa': 11, 'bed': 12, 'chair': 13, 'table': 14,
	'bedside cupboard': 15, 'TV cabinet': 16, 'half-height cabinet': 17, 'high cabinet': 18,
	'wardrobe': 19, 'sink': 20, 'bath': 21, 'bath tub': 22, 'squat toilet': 23, 'urinal': 24,
	'toilet': 25, 'elevator': 26, 'escalator': 27
	}
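
The mapping above goes from class name to index, while detection results report the index. The following is a minimal sketch of inverting it for lookups. The names `CLASS_TO_ID`, `ID_TO_CLASS`, and `DOOR_IDS` are illustrative and not part of the released model, which exposes the same information through `model.names`.

```python
# Class mapping copied from the list above (name -> index).
CLASS_TO_ID = {
    'single door': 0, 'double door': 1, 'sliding door': 2, 'window': 3, 'bay window': 4,
    'blind window': 5, 'opening symbol': 6, 'stair': 7, 'gas stove': 8, 'refrigerator': 9,
    'washing machine': 10, 'sofa': 11, 'bed': 12, 'chair': 13, 'table': 14,
    'bedside cupboard': 15, 'TV cabinet': 16, 'half-height cabinet': 17, 'high cabinet': 18,
    'wardrobe': 19, 'sink': 20, 'bath': 21, 'bath tub': 22, 'squat toilet': 23, 'urinal': 24,
    'toilet': 25, 'elevator': 26, 'escalator': 27,
}

# Invert to index -> name for decoding detection results.
ID_TO_CLASS = {idx: name for name, idx in CLASS_TO_ID.items()}

# Illustrative grouping (an assumption, not something the model defines).
DOOR_IDS = {CLASS_TO_ID[n] for n in ('single door', 'double door', 'sliding door')}

print(len(ID_TO_CLASS))  # 28
print(ID_TO_CLASS[7])    # stair
```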

	## 🧪 How to Use


	```python
	from ultralytics import YOLO
	from PIL import Image

	# Load the model from Hugging Face Hub
	model = YOLO('SamirShabani/Architect')

	# Run inference on a local image file
	results = model('path/to/image.png')

	# Optionally, run inference on a PIL Image instead
	# image = Image.open('path/to/image.png')
	# results = model(image)

	# Print detection results
	for r in results:
	    for box in r.boxes:
	        class_id = int(box.cls[0])
	        class_name = model.names[class_id]
	        confidence = float(box.conf[0])
	        bbox = box.xyxy[0].tolist()  # [x_min, y_min, x_max, y_max] in pixels
	        print(f"Detected: {class_name}, Confidence: {confidence:.2f}, BBox: {bbox}")

	# Save output image with drawn bounding boxes
	results[0].save(filename="prediction_output.jpg")
	```
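
Once detections have been decoded as in the loop above, a common next step is to summarize them per class. This is a minimal sketch of one way to do that; the helper `count_by_class` and the sample data are hypothetical and only illustrate the idea, they are not real model output.

```python
from collections import Counter


def count_by_class(detections, min_conf=0.5):
    """Count detections per class name, ignoring low-confidence ones.

    `detections` is an iterable of (class_name, confidence) pairs, for
    example built from model.names[int(box.cls[0])] and float(box.conf[0]).
    """
    return Counter(name for name, conf in detections if conf >= min_conf)


# Made-up detections, for illustration only.
sample = [
    ("single door", 0.91),
    ("window", 0.84),
    ("single door", 0.42),  # dropped by the 0.5 threshold below
    ("toilet", 0.77),
]

counts = count_by_class(sample, min_conf=0.5)
print(dict(counts))
```

The threshold of 0.5 is an arbitrary starting point; a suitable value depends on how noisy the input drawings are.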

	## 🛠️ Training Details

	- Framework: Ultralytics YOLOv8
	- Pretrained Model: yolov8m.pt
	- Training Hardware: NVIDIA Tesla P100 / T4 (Kaggle)
	- Epochs: 100 (early stopping patience=20)
	- Image Size: 640 × 640
	- Batch Size: 16
	- Optimizer: AdamW
	- Scheduler: Cosine Annealing
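
For readers who want to reproduce a similar run, the settings above can be expressed as Ultralytics training arguments. This is a sketch rather than the author's actual training script: the dataset config path is hypothetical, and mapping "Cosine Annealing" to `cos_lr=True` is an assumption.

```python
# Hyperparameters from the list above, written as Ultralytics train() arguments.
TRAIN_ARGS = {
    "epochs": 100,
    "patience": 20,        # early stopping patience
    "imgsz": 640,
    "batch": 16,
    "optimizer": "AdamW",
    "cos_lr": True,        # assumed equivalent of the cosine annealing scheduler
}


def train(data_yaml):
    """Fine-tune yolov8m.pt on the dataset described by `data_yaml`.

    `data_yaml` is a hypothetical path to an Ultralytics dataset config.
    """
    # Imported lazily so the settings can be inspected without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolov8m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)


# Example call (not executed here; the file name is a placeholder):
# train("floorplancad.yaml")
```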

	---

	## 📦 Dataset

	- Source: FloorPlanCAD (https://floorplancad.github.io/)
	- Images: 15,285 SVG drawings → converted to 640×640 PNG images
	- Labeled Samples: ~11,35 images with bounding box annotations
	- License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/), non-commercial use only
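
Ultralytics detection training reads one text label per image, with each line holding a class index followed by a normalized center, width, and height. The card does not say how the FloorPlanCAD annotations were converted, so the sketch below is only an assumption about that step; `to_yolo_label` is a hypothetical helper, and the default image size matches the 640×640 PNGs described above.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w=640, img_h=640):
    """Convert a pixel-space corner box into a YOLO-format label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A made-up window (class 3) spanning x 160-480 and y 320-352.
print(to_yolo_label(3, 160, 320, 480, 352))
# 3 0.500000 0.525000 0.500000 0.050000
```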

	---

	## 📊 Evaluation Metrics (Epoch 54)

	| Metric | Value | Description |
	|---|---|---|
	| metrics/mAP50-95(B) | 0.80797 | Mean Average Precision [IoU = 0.50 to 0.95] |
	| metrics/mAP50(B) | 0.87664 | Mean Average Precision at IoU = 0.50 |
	| train/box_loss | 0.4671 | Localization loss on training set |
	| val/box_loss | 0.32854 | Localization loss on validation set |
	| train/cls_loss | 0.81329 | Classification loss on training set |
	| val/cls_loss | 0.57334 | Classification loss on validation set |

	Training and validation curves are available in the `results.png` file generated during training.
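
The two mAP figures differ only in how strictly a predicted box must overlap the ground truth: mAP50 uses a single IoU threshold of 0.50, while mAP50-95 averages over ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below illustrates the matching criterion with made-up boxes; it is not the evaluation code used to produce the numbers above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Made-up boxes: the prediction is shifted 10 px to the right.
ground_truth = (100, 100, 200, 200)
prediction = (110, 100, 210, 200)

score = iou(prediction, ground_truth)
passed = [t for t in THRESHOLDS if score >= t]
print(f"IoU = {score:.3f}")                     # IoU = 0.818
print(f"Counts as a match at: {passed}")
```

A prediction like this one counts as correct for mAP50 but fails the stricter thresholds from 0.85 upward, which is why mAP50-95 is always the lower of the two values.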

	---

	## ⚠️ Known Limitations

	- Symbol Bias: Frequent objects like doors and windows dominate the training samples.
	- Centering Bias: Objects are mostly centered in cropped training patches.
	- No Text Understanding: The model does not interpret text or annotations near symbols.
	- "Stuff" Categories Ignored: The model does not detect background elements like walls or parking spaces.
	- Low-Quality Documents: Performance may degrade on scanned or low-resolution plans with noise.
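
Because the model was trained on 640×640 patches, one possible way to apply it to a much larger plan is to cut the image into overlapping tiles and run inference on each. This is a generic technique, not something the author describes or has evaluated for this model; the helpers `tile_origins` and `tile_boxes` and the overlap of 64 pixels are assumptions for illustration.

```python
def tile_origins(length, tile=640, overlap=64):
    """Start offsets of overlapping tiles covering `length` pixels along one axis."""
    if length <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)  # last tile sits flush with the far edge
    return origins


def tile_boxes(width, height, tile=640, overlap=64):
    """Crop boxes (x1, y1, x2, y2) covering an image of the given size."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


for box in tile_boxes(1500, 1000):
    print(box)
```

Each crop can be passed to the model separately; detections then need to be shifted back by the tile's `(x1, y1)` offset, and duplicates in the overlapping regions merged.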

	---

	## 📚 Citation
	```bibtex
	@InProceedings{Fan_2021_ICCV,
	    author    = {Fan, Zhiwen and Zhu, Lingjie and Li, Honghua and Zhu, Siyu and Tan, Ping},
	    title     = {FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting},
	    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
	    month     = {October},
	    year      = {2021}
	}
	```

	## 👤 Creator

	Samir Shabani
	Machine Learning Engineer | Student

	LinkedIn: https://www.linkedin.com/in/samir-shabani
	GitHub: https://github.com/Sam1rShaban1