---
license: apache-2.0
language:
- en
base_model:
- qualcomm/RF-DETR
pipeline_tag: image-segmentation
tags:
- Leaf
- Tulsi
- Segmentation
new_version: Subh775/Seg-Basil-rfdetr
---

# Segment-Tulsi Leaves with Transformers (RF-DETR)

| **Model** | **Best EMA Mask mAP (@.50:.95)** |
|--------------------------|----------------------------------|
| [LeafNet75/Segment-Tulsi-TFs](https://huggingface.co/LeafNet75/Segment-Tulsi-TFs) | 0.9650 |
| [Subh775/Seg-Basil-rfdetr](https://huggingface.co/Subh775/Seg-Basil-rfdetr) | 0.9668 |

<p align="left">
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a>
<a href="https://blog.roboflow.com/rf-detr-segmentation-preview/"><img src="https://img.shields.io/badge/Roboflow-navy?logo=roboflow" /></a>
<a href="https://arxiv.org/abs/2504.13099"><img src="https://img.shields.io/badge/arXiv-2504.13099-B31B1B?logo=arxiv&logoColor=white" /></a>
</p>

### Recommended Paper

For a detailed comparison, see **RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity**, available at [arxiv.org/abs/2504.13099](https://arxiv.org/abs/2504.13099).

# Transformers for Leaf Segmentation 🍁

This model card explores Roboflow's RF-DETR for leaf segmentation, focusing on *Ocimum tenuiflorum* (Holy Basil, or Tulsi). Unlike CNN-based segmentation models, which build context from local convolutional receptive fields, transformers use attention to relate every part of an image to every other part. This global view can improve contextual understanding and generalization.
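
To make "global dependencies" concrete, here is a minimal NumPy sketch of textbook single-head scaled dot-product self-attention over a set of image-patch embeddings. The patch count, embedding size, and random projection matrices are hypothetical, and RF-DETR itself uses more elaborate attention variants (for example, the deformable attention it inherits from Deformable DETR), so this illustrates the general mechanism only.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (num_tokens, d_model) array, e.g. flattened image-patch embeddings.
    Returns the attended outputs and the (num_tokens, num_tokens) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity between every pair of tokens
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
num_patches, d_model = 16, 8  # hypothetical: a 4x4 grid of patches with 8-dim embeddings
x = rng.normal(size=(num_patches, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

Because each row of the weight matrix is a softmax over all patches, every output mixes information from the whole image, whereas a single convolution only sees a local neighborhood.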

> RF-DETR is one of the first transformer-based detectors to show that transformers can be both highly accurate and fast at inference, outperforming many CNN-based models on detection and segmentation despite their traditionally heavier computational cost.

RF-DETR combines architectural ideas from Deformable DETR and LW-DETR with a pretrained DINOv2 backbone, which gives it strong global context modeling and helps it adapt to new domains.

### Example Outputs

Here are output examples from the model's validation run:

<table>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/aNF_6VN8FgBbYxWnA6Uwm.png" width="350"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/8xF1fPREvmJ0OmgC-BBs6.png" width="350"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/9bBN7GXxpn6Ly8rLCkLhA.jpeg" width="350"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/ZzmJCbK7hvKVK0EmBf3L-.png" width="350"/></td>
</tr>
</table>

### Training Configuration

The model was trained on the Roboflow Universe dataset [universe.roboflow.com/politicians/tulsi-wgmfs](https://universe.roboflow.com/politicians/tulsi-wgmfs), exported in COCO format for RF-DETR Seg Preview.

Training followed the official Roboflow implementation. The model was initialized from pretrained weights and optimized with AdamW, using the following parameters:

```python
# Hyperparameters used for this training run (RF-DETR training arguments).
train_config = dict(
    epochs=2,
    batch_size=2,
    grad_accum_steps=4,    # effective batch size = 2 x 4 = 8
    lr=1e-4,               # library default
    pretrain_weights="rf-detr-seg-preview.pt",  # library default
    layer_norm=True,
    checkpoint_interval=10,
    seed=42,
    num_workers=2,
    device="cuda",         # Colab T4 GPU
    resolution=432,
    lr_scheduler="step",
    tensorboard=True,      # log training metrics to TensorBoard
    class_names=["Tulsi"],
    segmentation_head=True,
)
```
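
With `batch_size=2` and `grad_accum_steps=4`, gradients from four micro-batches are accumulated before each optimizer update, giving an effective batch size of 2 x 4 = 8 while only two images need to fit in the T4's memory at a time. The toy NumPy sketch below uses a linear least-squares model rather than RF-DETR and checks that accumulating loss-scaled micro-batch gradients reproduces the gradient of one batch of 8. It assumes the loss is a per-batch mean; the data and weights are hypothetical.

```python
import numpy as np

BATCH_SIZE = 2        # batch_size from the config above
GRAD_ACCUM_STEPS = 4  # grad_accum_steps from the config above

def mse_grad(w, x, y):
    """Gradient of the mean squared error of a linear model y_hat = x @ w."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(y)

rng = np.random.default_rng(42)
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS  # samples per optimizer update
x = rng.normal(size=(effective_batch, 3))  # hypothetical features
y = rng.normal(size=effective_batch)       # hypothetical targets
w = rng.normal(size=3)                     # hypothetical current weights

# Accumulate gradients over micro-batches, scaling each by 1 / GRAD_ACCUM_STEPS
# so that the sum equals the gradient of the mean loss over the full batch.
accumulated = np.zeros_like(w)
for step in range(GRAD_ACCUM_STEPS):
    batch = slice(step * BATCH_SIZE, (step + 1) * BATCH_SIZE)
    accumulated += mse_grad(w, x[batch], y[batch]) / GRAD_ACCUM_STEPS

full_batch = mse_grad(w, x, y)
print(effective_batch, np.allclose(accumulated, full_batch))  # 8 True
```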

Here are the training results over the 2 epochs:

### Final Evaluation Metrics (Epoch 1, Best EMA Model)

Training ran for 2 epochs, and the best performance was reached at **Epoch 1**. The metrics below are for the **Exponential Moving Average (EMA) model** (`checkpoint_best_ema.pth`), which keeps a running average of the weights over training steps and is therefore smoother and more stable than the raw weights.
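
An EMA of the weights is typically updated after each optimizer step as `ema = decay * ema + (1 - decay) * weights`. The NumPy sketch below applies that update to a toy weight that jumps between +1 and -1 every step, to show how the average damps step-to-step noise. The decay of 0.9 is hypothetical and chosen so the effect appears quickly; RF-DETR's own EMA implementation may use a different decay and extra refinements.

```python
import numpy as np

def ema_update(ema_w, w, decay):
    """One EMA step: keep `decay` of the running average and blend in the latest weights."""
    return decay * ema_w + (1.0 - decay) * w

DECAY = 0.9  # hypothetical; EMA decays used in practice are usually much closer to 1

ema_w = np.zeros(1)
raw_history, ema_history = [], []
for step in range(100):
    # Toy "training": the raw weight jumps between +1 and -1 every step (pure noise around 0).
    w = np.array([1.0 if step % 2 == 0 else -1.0])
    ema_w = ema_update(ema_w, w, DECAY)
    raw_history.append(float(w[0]))
    ema_history.append(float(ema_w[0]))

print(max(abs(v) for v in raw_history))            # 1.0
print(round(max(abs(v) for v in ema_history), 3))  # 0.1
print(round(abs(ema_history[-1]), 3))              # 0.053
```

After the first few steps the average settles at about ±0.053 (that is, 0.1 / 1.9), roughly 19 times smaller than the raw swings.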

| Metric | Value | Description |
| :--- | :--- | :--- |
| **mAP (Masks) @.50:.95** | **`0.9650`** | **Primary metric for segmentation.** |
| mAP (Boxes) @.50:.95 | `0.9424` | Primary metric for bounding-box detection. |
| mAP (Masks) @.50 | `0.9749` | Segmentation quality at an IoU threshold of 0.50. |
| mAP (Boxes) @.50 | `0.9749` | Bounding-box quality at an IoU threshold of 0.50. |
| Precision (Boxes) | `0.9749` | Fraction of predicted leaves that are correct. |
| Recall (Boxes) | `0.9400` | Fraction of ground-truth leaves that are detected. |

### Understanding the Metrics

* **mAP (Masks) @.50:.95 (Primary Metric): `0.9650`**
    * **What it is:** The most important metric for this *segmentation* task. mAP stands for "mean Average Precision". It averages the model's Average Precision over 10 Intersection-over-Union (IoU) thresholds, from 0.50 (lenient) to 0.95 (very strict) in steps of 0.05, where IoU measures how much the predicted mask overlaps the ground-truth mask.
    * **Value:** A score of **96.5%** is very high and indicates that, on the validation set, the model predicts the pixel-level outline of the leaves very accurately.

* **mAP (Boxes) @.50:.95: `0.9424`**
    * **What it is:** The same metric as above, but computed on the *bounding box* (the rectangle around each leaf) instead of the pixel-level mask.
    * **Value:** A score of **94.2%** shows the model is also excellent at *locating* the leaves.

* **mAP @.50 (Masks/Boxes): `0.9749`**
    * **What it is:** mAP computed at a single lenient threshold, IoU = 0.50. A prediction counts as a hit if its mask or box has an IoU of at least 0.50 with the ground truth.
    * **Value:** A score of **97.5%** means the model finds nearly all the leaves, even when the predicted outline is not pixel-perfect.

* **Precision (Boxes): `0.9749`**
    * **What it is:** Answers the question: "Of all the leaves the model *predicted*, what fraction were *actually* leaves?"
    * **Value:** A score of **97.5%** means the model produces very few false positives; it rarely predicts a leaf where there isn't one.

* **Recall (Boxes): `0.9400`**
    * **What it is:** Answers the question: "Of all the *actual* leaves in the images, what fraction did the model *find*?"
    * **Value:** A score of **94.0%** means the model produces few false negatives; it rarely misses a leaf it should have found.
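
The matching step behind these metrics can be sketched in NumPy: compute the IoU between a predicted and a ground-truth mask, compare it with the 10 COCO thresholds (0.50, 0.55, ..., 0.95), and compute precision and recall from true-positive, false-positive, and false-negative counts. The masks and counts are hypothetical, and the sketch covers only matching and counting; full COCO mAP also ranks predictions by confidence and integrates the precision-recall curve at each threshold, which tools such as `pycocotools` handle.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union (IoU) of two boolean masks with the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 0.0

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# The 10 IoU thresholds averaged by COCO-style mAP@.50:.95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)

# Hypothetical 6x6 ground-truth leaf mask (16 pixels).
gt = np.zeros((6, 6), dtype=bool)
gt[1:5, 1:5] = True
# Hypothetical prediction: misses the leaf's bottom row, spills one column right (15 pixels).
pred = np.zeros_like(gt)
pred[1:4, 1:6] = True

iou = mask_iou(pred, gt)      # 12 overlapping pixels / 19 pixels in the union
hits = IOU_THRESHOLDS <= iou  # thresholds at which this prediction counts as a match
print(round(iou, 2), int(hits.sum()))  # 0.63 3

# Hypothetical counts: 18 correct detections, 2 spurious ones, 2 missed leaves.
precision, recall = precision_recall(tp=18, fp=2, fn=2)
print(precision, recall)  # 0.9 0.9
```

This partially correct mask counts as a hit at only 3 of the 10 thresholds, which is why mAP@.50:.95 is typically lower than mAP@.50 (0.9650 vs 0.9749 for masks in the table above).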

### Graph Analysis: Base Model vs. EMA Model

The training graph `metrics_plot.png` has four panels:

1. **Training vs. Validation Loss:** The training loss (blue) decreases as the model learns, while the validation loss (orange) stays low and flat, suggesting no overfitting within these 2 epochs.
2. **Average Precision @0.50:** mAP at the lenient 50% IoU threshold.
3. **Average Precision @0.50-0.95:** The primary (and stricter) COCO mAP.
4. **Average Recall @0.50-0.95:** The model's ability to find all objects.

In all three evaluation plots, the **EMA Model (orange dashed line)** consistently scores higher than the **Base Model (blue solid line)**. This is why the final metrics are reported from the EMA checkpoint (`checkpoint_best_ema.pth`).
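
### Inference

```bash
# In a Colab/Jupyter cell, prefix the command with "!".
pip install rfdetr==1.3.0 supervision==0.26.1 requests pillow numpy
```

```bash
# In a Colab/Jupyter cell, prefix the command with "!".
# The EMA checkpoint reported above (checkpoint_best_ema.pth) can be passed to --weights instead.
python rfdetr_seg_infer.py --image /d.jpg --weights /content/output/checkpoint.pth --out annotated_d.png
```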

### Trending Topics

1. 🔥 Comparative Analysis of **RF-DETR** vs **YOLO26**
2. ©️ Continual Learning with RF-DETR