---
license: apache-2.0
language:
- en
base_model:
- qualcomm/RF-DETR
pipeline_tag: image-segmentation
tags:
- Leaf
- Tulsi
- Segmentation
new_version: Subh775/Seg-Basil-rfdetr
---
# Segment-Tulsi Leaves with Transformers (RF-DETR)
| **Model** | **Best EMA Mask mAP (@.50:.95)** |
|--------------------------|----------------------------------|
| [LeafNet75/Segment-Tulsi-TFs](https://huggingface.co/LeafNet75/Segment-Tulsi-TFs) | 0.9650 |
| [Subh775/Seg-Basil-rfdetr](https://huggingface.co/Subh775/Seg-Basil-rfdetr) | 0.9668 |
<p align="left">
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a>
<a href="https://blog.roboflow.com/rf-detr-segmentation-preview/"><img src="https://img.shields.io/badge/Roboflow-navy?logo=roboflow" /></a>
<a href="https://arxiv.org/abs/2504.13099"><img src="https://img.shields.io/badge/arXiv-2504.13099-B31B1B?logo=arxiv&logoColor=white" /></a>
</p>
### Refer to this amazing paper entitled:
**RF-DETR Object Detection vs YOLOv12 : A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity** at: [arxiv.org/abs/2504.13099](https://arxiv.org/abs/2504.13099)
# Transformers for Leaf Segmentation 🍁
This model card explores the application of Roboflow’s RF-DETR to leaf segmentation, focusing on *Ocimum tenuiflorum* (Holy Basil, also called Tulsi). Unlike traditional CNN-based segmentation models, transformers capture global dependencies through attention mechanisms, which can improve contextual understanding and generalization.
> RF-DETR represents one of the first transformer-based architectures to demonstrate that transformers can achieve both high accuracy and fast inference speeds, outperforming many CNN-based models in detection and segmentation tasks despite their traditionally heavier computational design.
RF-DETR integrates architectural innovations from Deformable DETR and LW-DETR, and utilizes a DINOv2 backbone, offering superior global context modeling and domain adaptability.
### Example Outputs
Here are output examples from the model's validation run:
<table>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/aNF_6VN8FgBbYxWnA6Uwm.png" width="350"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/8xF1fPREvmJ0OmgC-BBs6.png" width="350"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/9bBN7GXxpn6Ly8rLCkLhA.jpeg" width="350"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/ZzmJCbK7hvKVK0EmBf3L-.png" width="350"/></td>
</tr>
</table>
### Training Config:
The model was trained on [https://universe.roboflow.com/politicians/tulsi-wgmfs](https://universe.roboflow.com/politicians/tulsi-wgmfs), using the COCO dataset format, with RF-DETR Seg Preview.
Training followed the official Roboflow implementation. The model was initialized with pretrained weights and trained with the AdamW optimizer. The remaining parameters were:
```python
epochs=2,
batch_size=2,
grad_accum_steps=4,
lr=1e-4, #default
pretrain_weights='rf-detr-seg-preview.pt', #default
layer_norm=True,
checkpoint_interval=10,
seed=42,
num_workers=2,
device='cuda', #T4 colab GPU
resolution=432,
lr_scheduler='step',
tensorboard=True, #check Training metrics
class_names=['Tulsi'],
segmentation_head=True
```
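
With `batch_size=2` and `grad_accum_steps=4`, gradients from four mini-batches are accumulated before each optimizer step, so each update effectively sees 2 × 4 = 8 images on a single GPU. A minimal sketch of that arithmetic follows; the helper `effective_batch_size` and its `num_gpus` parameter are illustrative names, not part of the rfdetr API.

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Number of images that contribute to a single optimizer step."""
    return batch_size * grad_accum_steps * num_gpus


# Values taken from the training config above (single T4 GPU assumed).
config = {"batch_size": 2, "grad_accum_steps": 4}
print(effective_batch_size(**config))  # 8
```

Keeping `batch_size` small and raising `grad_accum_steps` is a common way to fit training into limited GPU memory while holding the effective batch size fixed.
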
Here are the training results over 2 epochs:

### Final Evaluation Metrics (Epoch 1 - Best EMA Model)
The training was completed after 2 epochs, with the best performance achieved at **Epoch 1**. The metrics below are for the **Exponential Moving Average (EMA) model** (`checkpoint_best_ema.pth`), which represents a smoothed-out and more stable version of the model's weights.
| Metric | Value | Description |
| :--- | :--- | :--- |
| **mAP (Masks) @.50:.95** | **`0.9650`** | **Primary metric for segmentation.** |
| mAP (Boxes) @.50:.95 | `0.9424` | Primary metric for bounding box. |
| mAP (Masks) @.50 | `0.9749` | Segmentation quality at 50% overlap. |
| mAP (Boxes) @.50 | `0.9749` | Bounding box quality at 50% overlap. |
| Precision (Boxes) | `0.9749` | Accuracy of positive predictions. |
| Recall (Boxes) | `0.9400` | Ability to find all positive instances. |
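
The idea behind the EMA checkpoint can be sketched in a few lines: after every training step, the averaged weights move a small fraction of the way toward the current weights, which damps step-to-step noise. This is a toy illustration in plain Python, not the rfdetr implementation; the function name `ema_update` and the decay value are assumptions chosen for readability.

```python
def ema_update(ema_weights, model_weights, decay=0.9):
    """Return the new running average: decay * old_average + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


# Toy example: a single "weight" that oscillates between 1.0 and 0.0 during training.
raw_steps = [[1.0], [0.0], [1.0], [0.0]]

ema = raw_steps[0]  # start the average at the first set of weights
for weights in raw_steps[1:]:
    ema = ema_update(ema, weights, decay=0.9)

print(ema)  # stays close to its starting value instead of swinging between 0 and 1
```
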
### Understanding the Metrics
* **mAP (Masks) @.50:.95 (Primary Metric): `0.9650`**
    * **What it is:** This is the most important metric for this *segmentation* task. mAP stands for "mean Average Precision". Here it is the Average Precision averaged over 10 IoU (mask overlap) thresholds, from 0.50 (lenient) to 0.95 (very strict) in steps of 0.05.
    * **Value:** A score of **96.5%** is exceptionally high and indicates the model is extremely accurate at predicting the precise pixel-level outline of the leaves.
* **mAP (Boxes) @.50:.95: `0.9424`**
    * **What it is:** This is the same as the primary metric, but it only judges the *bounding box* (the rectangle around the leaf), not the pixel-level mask.
    * **Value:** A score of **94.2%** shows the model is also excellent at just *locating* the leaves.
* **mAP @.50 (Masks/Boxes): `0.9749`**
    * **What it is:** This is the mAP calculated at a single lenient threshold: 50% IoU. As long as the predicted mask/box overlaps the true mask/box with an IoU of at least 0.50, it counts as a "hit."
    * **Value:** A score of **97.5%** means the model is nearly perfect at *finding* the leaves, even if the predicted outline isn't 100% pixel-perfect.
* **Precision (Boxes): `0.9749`**
    * **What it is:** This answers the question: "Of all the leaves the model *predicted*, what percentage were *actually* leaves?"
    * **Value:** A score of **97.5%** means the model has very few "false positives." It almost never predicts a leaf where there isn't one.
* **Recall (Boxes): `0.9400`**
    * **What it is:** This answers the question: "Of all the *actual* leaves that exist in the images, what percentage did the model *find*?"
    * **Value:** A score of **94.0%** is very high and means the model has very few "false negatives." It rarely misses a leaf that it should have found.
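
The building blocks of these metrics can be illustrated with a small, self-contained sketch: mask IoU, the ten COCO IoU thresholds, and precision/recall from hit counts. It is a simplified illustration in plain Python (masks as nested lists of 0/1), not the COCO evaluator that produced the numbers above. Full mAP additionally ranks predictions by confidence and integrates the precision-recall curve, which is omitted here.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    pairs = [(p, t) for p_row, t_row in zip(pred, truth) for p, t in zip(p_row, t_row)]
    intersection = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    return intersection / union if union else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# The 10 IoU thresholds behind "@.50:.95": 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]

truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]  # 6 leaf pixels
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]   # 7 predicted pixels, 6 of them correct

iou = mask_iou(pred, truth)                      # 6 / 7
hits = [t for t in IOU_THRESHOLDS if iou >= t]   # thresholds at which this mask counts as a hit
print(f"IoU = {iou:.3f}, counted as a hit at {len(hits)} of {len(IOU_THRESHOLDS)} thresholds")
print(precision_recall(tp=9, fp=1, fn=3))
```

A single extra pixel on a 6-pixel leaf already fails the two strictest thresholds, which is why the @.50:.95 score is harder to push up than the @.50 score.
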
### Graph Analysis: Base Model vs. EMA Model
The training graph `metrics_plot.png` has four panels:
1. **Training vs. Validation Loss:** The training loss (blue) drops as the model learns, while the validation loss (orange) stays low and flat, suggesting no overfitting over these 2 epochs.
2. **Average Precision @0.50:** Shows the mAP at the "easy" 50% IoU threshold.
3. **Average Precision @0.50-0.95:** Shows the primary (and "harder") COCO mAP.
4. **Average Recall @0.50-0.95:** Shows the model's ability to find all objects.
In all three evaluation plots, the **EMA Model (orange dashed line)** is clearly and consistently superior to the **Base Model (blue solid line)**. This is why the final metrics are reported from the EMA model checkpoint (`checkpoint_best_ema.pth`).
### Inference
```python
!pip install rfdetr==1.3.0 supervision==0.26.1 requests pillow numpy
```
```python
!python rfdetr_seg_infer.py --image /d.jpg --weights /content/output/checkpoint.pth --out annotated_d.png
```
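
The script `rfdetr_seg_infer.py` is not reproduced in this card. As a rough idea of what such a script could do, here is a minimal sketch that assumes the `RFDETRSegPreview` class from `rfdetr` and `MaskAnnotator` from `supervision` (the versions pinned above). The function name `segment_leaves` and the default `threshold=0.5` are illustrative assumptions, not taken from the original script.

```python
def segment_leaves(image_path, weights_path, out_path, threshold=0.5):
    """Run RF-DETR Seg Preview on one image and save the mask overlay (sketch)."""
    # Imported lazily so this module loads even where the heavy dependencies are absent.
    import supervision as sv
    from PIL import Image
    from rfdetr import RFDETRSegPreview

    # Load the fine-tuned checkpoint instead of the default pretrained weights.
    model = RFDETRSegPreview(pretrain_weights=weights_path)

    image = Image.open(image_path).convert("RGB")
    detections = model.predict(image, threshold=threshold)  # supervision Detections

    # Draw the predicted leaf masks on a copy of the input image.
    annotated = sv.MaskAnnotator().annotate(scene=image.copy(), detections=detections)
    annotated.save(out_path)
    return detections
```

With the paths from the command above, a call would look like `segment_leaves("/d.jpg", "/content/output/checkpoint.pth", "annotated_d.png")`.
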
### Trending Topics
1. 🔥 Comparative Analysis of **RFDETR** vs **YOLO26**
2. ©️ Continual Learning with RFDETR