---
license: apache-2.0
language:
- en
base_model:
- qualcomm/RF-DETR
pipeline_tag: image-segmentation
tags:
- Leaf
- Tulsi
- Segmentation
new_version: Subh775/Seg-Basil-rfdetr
---
# Segment-Tulsi Leaves with Transformers (RF-DETR)
| **Model** | **Best EMA Mask mAP (@.50:.95)** |
|--------------------------|----------------------------------|
| [LeafNet75/Segment-Tulsi-TFs](https://huggingface.co/LeafNet75/Segment-Tulsi-TFs) | 0.9650 |
| [Subh775/Seg-Basil-rfdetr](https://huggingface.co/Subh775/Seg-Basil-rfdetr) | 0.9668 |
<p align="left">
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" /></a>
<a href="https://blog.roboflow.com/rf-detr-segmentation-preview/"><img src="https://img.shields.io/badge/Roboflow-navy?logo=roboflow" /></a>
<a href="https://arxiv.org/abs/2504.13099"><img src="https://img.shields.io/badge/arXiv-2504.13099-B31B1B?logo=arxiv&logoColor=white" /></a>
</p>
### Reference paper
**RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity**: [arxiv.org/abs/2504.13099](https://arxiv.org/abs/2504.13099)
# Transformers for Leaf Segmentation 🍁
This model card explores the application of Roboflow's RF-DETR for leaf segmentation, focusing particularly on *Ocimum tenuiflorum* (Holy Basil). Unlike traditional CNN-based segmentation models, transformers can effectively capture global dependencies through attention mechanisms, leading to improved contextual understanding and better generalization performance.
> RF-DETR is among the first transformer-based architectures to pair high accuracy with real-time inference speed, outperforming many CNN-based models on detection and segmentation tasks despite transformers' traditionally heavier computational cost.

RF-DETR combines architectural innovations from Deformable DETR and LW-DETR with a DINOv2 backbone, giving it strong global context modeling and domain adaptability.
### Example Outputs
Here are output examples from the model's validation run:
<table>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/aNF_6VN8FgBbYxWnA6Uwm.png" width="350"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/8xF1fPREvmJ0OmgC-BBs6.png" width="350"/></td>
</tr>
<tr>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/9bBN7GXxpn6Ly8rLCkLhA.jpeg" width="350"/></td>
<td><img src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/ZzmJCbK7hvKVK0EmBf3L-.png" width="350"/></td>
</tr>
</table>
### Training Config
The model was trained on the [Tulsi dataset](https://universe.roboflow.com/politicians/tulsi-wgmfs), exported in COCO format for the RF-DETR Seg Preview.
Training followed the official Roboflow implementation. The model was initialized with pretrained weights and trained with the AdamW optimizer; the main hyperparameters were:
```python
# Keyword arguments passed to model.train(...) (per the Roboflow rfdetr package)
epochs=2,
batch_size=2,
grad_accum_steps=4,
lr=1e-4,                                    # default
pretrain_weights='rf-detr-seg-preview.pt',  # default
layer_norm=True,
checkpoint_interval=10,
seed=42,
num_workers=2,
device='cuda',                              # Colab T4 GPU
resolution=432,
lr_scheduler='step',
tensorboard=True,                           # logs training metrics
class_names=['Tulsi'],
segmentation_head=True
```
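As a sanity check on the settings above: with gradient accumulation, the optimizer sees an effective batch size of `batch_size * grad_accum_steps`. A minimal sketch (the `train_config` dict and the idea of passing it as `model.train(**train_config)` are assumptions based on the Roboflow reference implementation, not code from this repo):

```python
# Hypothetical: gather a few of the hyperparameters above into one dict,
# e.g. to pass as model.train(**train_config) in the rfdetr package.
train_config = dict(
    epochs=2,
    batch_size=2,
    grad_accum_steps=4,
    lr=1e-4,
    resolution=432,
    seed=42,
)

# With gradient accumulation, the optimizer steps once every
# grad_accum_steps mini-batches, so the effective batch size is:
effective_batch = train_config["batch_size"] * train_config["grad_accum_steps"]
print(effective_batch)  # → 8
```

This is why a small `batch_size=2` (a T4-friendly setting) can still behave like a batch of 8 for optimization purposes.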
Here are the training results over 2 epochs:
![train_graph](metrics_plot.png)
### Final Evaluation Metrics (Epoch 1 - Best EMA Model)
The training was completed after 2 epochs, with the best performance achieved at **Epoch 1**. The metrics below are for the **Exponential Moving Average (EMA) model** (`checkpoint_best_ema.pth`), which represents a smoothed-out and more stable version of the model's weights.
| Metric | Value | Description |
| :--- | :--- | :--- |
| **mAP (Masks) @.50:.95** | **`0.9650`** | **Primary metric for segmentation.** |
| mAP (Boxes) @.50:.95 | `0.9424` | Primary metric for bounding box. |
| mAP (Masks) @.50 | `0.9749` | Segmentation quality at 50% overlap. |
| mAP (Boxes) @.50 | `0.9749` | Bounding box quality at 50% overlap. |
| Precision (Boxes) | `0.9749` | Accuracy of positive predictions. |
| Recall (Boxes) | `0.9400` | Ability to find all positive instances. |
### Understanding the Metrics
* **mAP (Masks) @.50:.95 (Primary Metric): `0.9650`**
* **What it is:** This is the most important metric for this *segmentation* task. It stands for "mean Average Precision." It is the average of the model's mAP score across 10 different "strictness" thresholds, starting from 50% mask overlap (easy) all the way to 95% mask overlap (very hard).
* **Value:** A score of **96.5%** is exceptionally high and indicates the model is extremely accurate at predicting the precise pixel-level outline of the leaves.
* **mAP (Boxes) @.50:.95: `0.9424`**
* **What it is:** This is the same as the primary metric, but it only judges the *bounding box* (the rectangle around the leaf), not the pixel-level mask.
* **Value:** A score of **94.2%** shows the model is also excellent at just *locating* the leaves.
* **mAP @.50 (Masks/Boxes): `0.9749`**
* **What it is:** This is the mAP calculated at only one "easy" threshold: 50% overlap. As long as the predicted mask/box overlaps with the true mask/box by at least 50%, it's considered a "hit."
* **Value:** A score of **97.5%** means the model is nearly perfect at *finding* all the leaves, even if the predicted outline isn't 100% pixel-perfect.
* **Precision (Boxes): `0.9749`**
* **What it is:** This answers the question: "Of all the leaves the model *predicted*, what percentage were *actually* leaves?"
* **Value:** A score of **97.5%** means the model has extremely few "false positives." It almost never predicts a leaf where there isn't one.
* **Recall (Boxes): `0.9400`**
* **What it is:** This answers the question: "Of all the *actual* leaves that exist in the images, what percentage did the model *find*?"
* **Value:** A score of **94.0%** is very high and means the model has very few "false negatives." It rarely misses a leaf that it should have found.
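To make the definitions above concrete, here is a small sketch using made-up counts and per-threshold AP values (none of these numbers are from the actual evaluation; they only illustrate how the reported quantities are computed):

```python
# Illustrative (made-up) detection counts -- not the actual evaluation data.
true_positives, false_positives, false_negatives = 97, 3, 6

# Precision: of all predicted leaves, how many were real leaves?
precision = true_positives / (true_positives + false_positives)

# Recall: of all real leaves, how many did the model find?
recall = true_positives / (true_positives + false_negatives)

# mAP@.50:.95 averages AP over 10 IoU thresholds, 0.50 to 0.95 in steps of 0.05.
iou_thresholds = [0.50 + 0.05 * i for i in range(10)]
ap_per_threshold = [0.99, 0.99, 0.98, 0.98, 0.97,
                    0.97, 0.96, 0.95, 0.93, 0.90]  # hypothetical values
map_50_95 = sum(ap_per_threshold) / len(ap_per_threshold)

print(round(precision, 3), round(recall, 3), round(map_50_95, 3))
```

Note how AP typically falls as the IoU threshold tightens: the stricter the required mask overlap, the fewer predictions count as hits, which is why mAP@.50:.95 is always at or below mAP@.50.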
### Graph Analysis: Base Model vs. EMA Model
The training graph `metrics_plot.png` shows:
1. **Training vs. Validation Loss:** The training loss (blue) drops as the model learns, while the validation loss (orange) stays low and flat, indicating no overfitting.
2. **Average Precision @0.50:** Shows the mAP at the "easy" 50% IoU threshold.
3. **Average Precision @0.50-0.95:** Shows the primary (and "harder") COCO mAP.
4. **Average Recall @0.50-0.95:** Shows the model's ability to find all objects.
In all three evaluation plots, the **EMA Model (orange dashed line)** is clearly and consistently superior to the **Base Model (blue solid line)**. This is why the final metrics are reported from the EMA model checkpoint (`checkpoint_best_ema.pth`).
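The EMA checkpoint is produced by maintaining an exponential moving average of the weights during training. A minimal sketch of the update rule (the decay value and toy weights here are illustrative, not the ones used in this run):

```python
# Exponential Moving Average of model weights: after each optimizer step,
# blend the current weights into a slowly-moving shadow copy.
def ema_update(ema_weights, model_weights, decay=0.99):
    """Return the new EMA weights: decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]

# Toy example: the EMA copy drifts smoothly toward the live weights.
ema = [0.0, 0.0]
for step in range(1000):
    live = [1.0, 2.0]   # pretend the trained weights have converged here
    ema = ema_update(ema, live, decay=0.99)
print(ema)  # approaches [1.0, 2.0]
```

Because the EMA averages over many recent weight states, it is less sensitive to step-to-step noise in the raw weights, which is consistent with the EMA curve sitting above the base model in the plots.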
### Inference
```python
# Install dependencies (run in Colab; '!' is the notebook shell escape)
!pip install rfdetr==1.3.0 supervision==0.26.1 requests pillow numpy
```
```python
# Run inference with the trained checkpoint on a single image
!python rfdetr_seg_infer.py --image /d.jpg --weights /content/output/checkpoint.pth --out annotated_d.png
```
### Trending Topics
1. 🔥 Comparative analysis of **RF-DETR** vs **YOLO26**
2. ©️ Continual learning with RF-DETR