---
license: apache-2.0
language:
- en
base_model:
- qualcomm/RF-DETR
pipeline_tag: image-segmentation
tags:
- Instance_Segmentation
- CPU_friendly
- Transformers
- rfdetr
- supervision
- roboflow
model-index:
- name: BasiliskSeg
  results:
  - task:
      type: image-segmentation
      name: Instance Segmentation
    metrics:
    - type: coco
      value: 0.9668
      name: Mask mAP @ IoU=0.50:0.95 | area=all | maxDets=100
      config: segm
      args:
        iouThr: .50:.05:.95
        areaRng: all
        maxDets: 100
    - type: coco
      value: 0.9783
      name: Mask mAP @ IoU=0.50 | area=all | maxDets=100
      config: segm
      args:
        iouThr: '.50'
        areaRng: all
        maxDets: 100
    - type: coco
      value: 0.9871
      name: Mask AR @ IoU=0.50:0.95 | area=all | maxDets=100
      config: segm
      args:
        iouThr: .50:.05:.95
        areaRng: all
        maxDets: 100
library_name: transformers
---

# Segment-Tulsi (Basil) with Transformers (RF-DETR)

| **Model**              | **Best EMA Mask mAP (@.50:.95)** |
|--------------------------|----------------------------------|
| [LeafNet75/Segment-Tulsi-TFs](https://huggingface.co/LeafNet75/Segment-Tulsi-TFs) | 0.9650                           |
| [Subh775/Seg-Basil-rfdetr](https://huggingface.co/Subh775/Seg-Basil-rfdetr)  | 0.9668                           |

<p align="left">
  <a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a>
  <a href="https://blog.roboflow.com/rf-detr-segmentation-preview/"><img src="https://img.shields.io/badge/Roboflow-navy?logo=roboflow" /></a>
  <a href="https://huggingface.co/spaces/LeafNet75/Segment-Leaf-RFDETR"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20H%20F-Demo-darkred" /></a>
  <a href="https://arxiv.org/abs/2504.13099"><img src="https://img.shields.io/badge/arXiv-2504.13099-B31B1B?logo=arxiv&logoColor=white" /></a>
</p>

### Related paper
**RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity**, available at [arxiv.org/abs/2504.13099](https://arxiv.org/abs/2504.13099)

# Transformers for Leaf Segmentation 🍁
This model card explores the application of Roboflow’s RF-DETR to leaf instance segmentation, focusing on *Ocimum tenuiflorum* (Holy Basil, or Tulsi). Unlike traditional CNN-based segmentation models, transformers capture global dependencies through attention, which can improve contextual understanding and generalization.

> RF-DETR is one of the first transformer-based architectures to show that transformers can combine high accuracy with fast inference, outperforming many CNN-based models on detection and segmentation tasks despite the traditionally heavier computational cost of transformers.

RF-DETR combines architectural ideas from Deformable DETR and LW-DETR with a DINOv2 backbone, which provides strong global context modeling and domain adaptability.

<!-- ### Example Outputs
Here are output examples from the model's validation run:

<table>
  <tr>
    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/aNF_6VN8FgBbYxWnA6Uwm.png" width="350"/></td>
    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/8xF1fPREvmJ0OmgC-BBs6.png" width="350"/></td>
  </tr>
  <tr>
    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/9bBN7GXxpn6Ly8rLCkLhA.jpeg" width="350"/></td>
    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/ZzmJCbK7hvKVK0EmBf3L-.png" width="350"/></td>
  </tr>
</table> -->

### Training Config:
The model was trained on [https://universe.roboflow.com/politicians/tulsi-wgmfs](https://universe.roboflow.com/politicians/tulsi-wgmfs), with annotations in COCO format, using RF-DETR Seg Preview.
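
For readers unfamiliar with the COCO format, the sketch below builds one instance-segmentation annotation by hand. The field names (`images`, `annotations`, `categories`, `segmentation`, `bbox`, `area`, `iscrowd`) follow the public COCO annotation schema; the helper functions, the polygon coordinates, the file name, and `category_id=1` for `Tulsi` are hypothetical illustration values, not taken from the actual dataset export.

```python
import json


def polygon_area(xs, ys):
    """Area enclosed by a polygon, via the shoelace formula."""
    n = len(xs)
    s = 0.0
    for i in range(n):
        j = (i + 1) % n
        s += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(s) / 2.0


def coco_annotation(ann_id, image_id, category_id, polygon):
    """Build one COCO instance annotation from a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [polygon],  # a list of polygons; one polygon here
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],  # [x, y, width, height]
        "area": polygon_area(xs, ys),  # area enclosed by the polygon
        "iscrowd": 0,
    }


# Hypothetical leaf outline in pixel coordinates (illustration only)
leaf = [10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]
ann = coco_annotation(ann_id=1, image_id=1, category_id=1, polygon=leaf)

dataset = {
    "images": [{"id": 1, "file_name": "leaf_0001.jpg", "width": 432, "height": 432}],
    "annotations": [ann],
    "categories": [{"id": 1, "name": "Tulsi"}],
}
print(json.dumps(dataset, indent=2))
```

Each leaf instance becomes one entry in `annotations`, linked to its image through `image_id` and to its class through `category_id`; the polygon in `segmentation` is what the mask head learns to reproduce.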

Training followed the official Roboflow implementation. The model was initialized with pretrained weights and trained with the AdamW optimizer using the following parameters:
```python
# Hyperparameters for the RF-DETR Seg Preview training run, collected in a dict
train_params = dict(
    epochs=3,  # updated from an earlier run that effectively stopped at epoch 1
    batch_size=2,
    grad_accum_steps=4,  # effective batch size = batch_size * grad_accum_steps = 8
    lr=1e-4,  # default
    pretrain_weights='rf-detr-seg-preview.pt',  # default
    layer_norm=True,
    checkpoint_interval=12,  # note: best checkpoints still appeared to be saved per epoch
    seed=42,
    num_workers=2,
    device='cuda',  # Colab T4 GPU
    resolution=432,
    lr_scheduler='step',
    tensorboard=True,  # log training metrics to TensorBoard
    class_names=['Tulsi'],
    segmentation_head=True,
)
```
Training results over the 3 epochs (peak performance was reached at Epoch 1):

### Final Evaluation Metrics (Epoch 1 - Best EMA Model)
![train_metrics](metrics_plot.png)

Training ran for 3 epochs on a Colab T4 GPU, with the best performance reached at Epoch 1; training beyond Epoch 1 showed signs of overfitting. The metrics below are for the Exponential Moving Average (EMA) model (`checkpoint_best_ema.pth`, saved at Epoch 1). The EMA model keeps a running, exponentially weighted average of the weights during training, which gives a smoother and more stable version of the model than the raw weights.
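
The EMA weights described above follow the standard update rule `ema = decay * ema + (1 - decay) * current`, applied after each training step. The sketch below shows that rule on plain Python floats; the `decay` values and the one-weight "model" are illustrative assumptions, not RF-DETR's actual settings or internals.

```python
def ema_update(ema_params, params, decay=0.99):
    """Update the EMA copy in place: ema = decay * ema + (1 - decay) * current."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Hypothetical one-weight "model" whose raw weight jumps between 1.0 and 0.0
raw_weights_per_step = [{"w": 1.0}, {"w": 0.0}, {"w": 1.0}, {"w": 0.0}]

ema = dict(raw_weights_per_step[0])  # start the EMA from the initial weights
for step_weights in raw_weights_per_step[1:]:
    ema_update(ema, step_weights, decay=0.9)
    print(f"raw w = {step_weights['w']:.1f}, EMA w = {ema['w']:.3f}")
```

Because each step moves the EMA only a fraction `1 - decay` of the way toward the current weights, the averaged copy changes far less than the raw weights between steps, which is why EMA checkpoints tend to evaluate more stably.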

| **Metric**                     | **Value** | **Description**                                 |
|--------------------------------|------------|-------------------------------------------------|
| mAP (Masks) @.50:.95           | 0.9668     | Primary metric for segmentation.                |
| mAP (Boxes) @.50:.95           | 0.9491     | Primary metric for bounding box detection.      |
| mAP (Masks) @.50               | 0.9783     | Segmentation quality at 50% overlap threshold.  |
| mAP (Boxes) @.50               | 0.9783     | Bounding box quality at 50% overlap threshold.  |
| Precision (Boxes)              | 0.9820     | Accuracy of positive predictions.               |
| Recall (Boxes)                 | 0.9300     | Ability to find all positive instances.         |


### Understanding the Metrics
- **mAP (Masks) @.50:.95 (primary metric): 0.9668**
    - *What it is:* The most important metric for this segmentation task. mAP stands for "mean Average Precision". This variant averages the Average Precision (AP) over 10 Intersection-over-Union (IoU) thresholds, from 0.50 (lenient) to 0.95 (very strict) in steps of 0.05. For masks, IoU measures the pixel-level overlap between a predicted mask and the ground-truth mask.
    - *Value:* A score of 96.7% is very high and indicates that the model predicts the pixel-level outlines of the leaves very accurately on the evaluation set.
- **mAP (Boxes) @.50:.95: 0.9491**
    - *What it is:* The same metric, but computed on bounding boxes (the rectangle around each leaf) instead of pixel-level masks.
    - *Value:* A score of 94.9% shows that the model is also very good at localizing the leaves.
- **mAP @.50 (Masks/Boxes): 0.9783**
    - *What it is:* mAP at a single, lenient threshold: a prediction counts as a hit when its IoU with a ground-truth mask/box is at least 0.50.
    - *Value:* A score of 97.8% means the model finds nearly all leaves, even when the predicted outline is not pixel-perfect.
- **Precision (Boxes): 0.9820**
    - *What it is:* Answers the question "Of all the leaves the model predicted, what fraction were actually leaves?"
    - *Value:* A score of 98.2% means very few false positives: the model almost never predicts a leaf where there is none.
- **Recall (Boxes): 0.9300**
    - *What it is:* Answers the question "Of all the leaves that actually appear in the images, what fraction did the model find?"
    - *Value:* A score of 93.0% means relatively few false negatives, although about 7% of ground-truth leaves are still missed.
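
The metric definitions above can be made concrete with a small pure-Python sketch. It computes box and mask IoU, lists the 10 IoU thresholds behind "@.50:.95", derives precision and recall from true/false positive counts, and computes a COCO-style 101-point interpolated AP for one class at one IoU threshold. It is a simplified illustration under stated assumptions (a single class, detections already matched to ground truth); the reported numbers come from the COCO-style evaluation run during training, not from this code.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a, b):
    """IoU of two same-sized binary masks given as lists of rows of 0/1 values."""
    inter = sum(x & y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    union = sum(x | y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return inter / union if union > 0 else 0.0


# The 10 thresholds behind "@.50:.95": 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, is_tp, num_gt):
    """COCO-style 101-point interpolated AP for one class at one IoU threshold.

    scores: confidence of each detection; is_tp: whether that detection matched
    a ground-truth instance at the chosen IoU threshold; num_gt: ground-truth count.
    """
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    total = 0.0
    for r in (i / 100 for i in range(101)):
        # Precision at the first point whose recall reaches r (0 if never reached)
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101


# A predicted box shifted 1 px from the ground truth: IoU = 90 / 110 ≈ 0.818
iou = box_iou((0, 0, 10, 10), (1, 0, 11, 10))
hits = [iou >= t for t in IOU_THRESHOLDS]  # a hit at 7 of the 10 thresholds
```

For a single class, mAP@.50:.95 is then the mean of `average_precision` over `IOU_THRESHOLDS`, with `is_tp` recomputed at each threshold. This is why a slightly misaligned prediction still scores fully at @.50 but is penalized at the stricter thresholds.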

The training graph `metrics_plot.png` (updated for the 3-epoch run) shows:
1. **Training vs. Validation Loss:** Training loss (blue) decreases over the 3 epochs. Validation loss (orange) reaches its minimum at Epoch 1 and increases slightly in Epoch 2, indicating the start of overfitting.
2. **Average Precision @0.50:** Both the Base (blue) and EMA (orange) models peak at Epoch 1, then decline.
3. **Average Precision @0.50-0.95:** Both models peak at Epoch 1, then decline, with the Base model declining more sharply in Epoch 2.
4. **Average Recall @0.50-0.95:** Both models peak at Epoch 1, then decline slightly.

In every evaluation plot, the EMA model (orange dashed line) scores higher and is more stable than the Base model (blue solid line). Both curves peak at Epoch 1, which supports using `checkpoint_best_ema.pth`, saved at that epoch, as the best checkpoint from this run.
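
The checkpoint-selection reasoning above (keep the epoch with the best validation metric, and read a validation loss that rises after its minimum as the onset of overfitting) can be sketched in a few lines. The function names are hypothetical and the inputs are plain per-epoch lists; this is a simplified illustration of the idea, and RF-DETR's own checkpointing logic may differ in detail.

```python
def best_epoch(metric_per_epoch):
    """Position of the epoch with the highest validation metric (e.g. EMA mask mAP@.50:.95)."""
    return max(range(len(metric_per_epoch)), key=lambda e: metric_per_epoch[e])


def overfitting_onset(val_loss_per_epoch):
    """First epoch after the validation-loss minimum whose loss is higher, or None."""
    best = min(range(len(val_loss_per_epoch)), key=lambda e: val_loss_per_epoch[e])
    for e in range(best + 1, len(val_loss_per_epoch)):
        if val_loss_per_epoch[e] > val_loss_per_epoch[best]:
            return e
    return None  # validation loss never rose after its minimum
```

Applied to the logged per-epoch values, `best_epoch` picks the checkpoint to keep, and a non-`None` result from `overfitting_onset` signals that further epochs are unlikely to help.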

### Trending Topics
- 🔥 Comparative Analysis of RF-DETR vs YOLO26
- ©️ Continual Learning Applications with Transformers

## Foundational Resources
- Zero-shot open-vocabulary detectors: `Grounding-DINO`, `YOLO-World`, `Owl-ViT`
- Image embedding and visual feature analysis: `Dinov2`, `Dinov3` (vision + language)