Initial release: Segformer85Mv1 (Segformer-b5 fine-tuned, 8-class apple orchard)

Files changed (10)

.gitattributes +4 -0
README.md +148 -0
Segformer85Mv1.pt +3 -0
history_v6.json +572 -0
predict.py +128 -0
samples/sample_00_frame_2575.jpg +3 -0
samples/sample_05_frame_3371.jpg +3 -0
samples/sample_09_frame_4009.jpg +3 -0
train_v6_5090.py +230 -0
v6_OOD_full_res.mp4 +3 -0

.gitattributes CHANGED

@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+samples/sample_00_frame_2575.jpg filter=lfs diff=lfs merge=lfs -text
+samples/sample_05_frame_3371.jpg filter=lfs diff=lfs merge=lfs -text
+samples/sample_09_frame_4009.jpg filter=lfs diff=lfs merge=lfs -text
+v6_OOD_full_res.mp4 filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,148 @@

+---
+license: apache-2.0
+language:
+- en
+tags:
+- semantic-segmentation
+- segformer
+- agriculture
+- orchard
+- apple
+- outdoor
+library_name: transformers
+pipeline_tag: image-segmentation
+base_model: nvidia/segformer-b5-finetuned-ade-640-640
+---
+# Segformer85Mv1 — Apple Orchard Semantic Segmentation
+A Segformer-B5 (85M parameters) fine-tuned for **8-class semantic segmentation** of outdoor apple orchard scenes captured from a robotic platform.
+## Quick Use
+```python
+from huggingface_hub import hf_hub_download
+from transformers import SegformerForSemanticSegmentation
+import torch, cv2, numpy as np
+import torch.nn.functional as F
+# 1. Download weights
+ckpt_path = hf_hub_download(repo_id="YOUR_USER/Segformer85Mv1", filename="Segformer85Mv1.pt")
+# 2. Init architecture from base + load fine-tuned weights
+NAMES = ["tree","ground","person","sky","road","mountain","building","background"]
+model = SegformerForSemanticSegmentation.from_pretrained(
+    "nvidia/segformer-b5-finetuned-ade-640-640",
+    num_labels=8,
+    id2label={i:n for i,n in enumerate(NAMES)},
+    label2id={n:i for i,n in enumerate(NAMES)},
+    ignore_mismatched_sizes=True,
+).cuda().eval()
+model.load_state_dict(torch.load(ckpt_path, map_location="cuda")["model"])
+# 3. Inference
+img = cv2.imread("your_image.jpg")
+H, W = img.shape[:2]
+H32, W32 = (H//32)*32, (W//32)*32
+rgb = cv2.cvtColor(cv2.resize(img, (W32, H32)), cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
+mean = np.array([0.485, 0.456, 0.406]); std = np.array([0.229, 0.224, 0.225])
+x = torch.from_numpy(((rgb - mean) / std).transpose(2,0,1)).unsqueeze(0).float().cuda()
+with torch.no_grad():
+    logits = model(pixel_values=x).logits
+    logits = F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)
+    pred = logits.argmax(1)[0].cpu().numpy()  # H x W, values 0..7
+```
+A ready-to-use `predict.py` is included in this repo.
+## Classes (id → name)
+| ID | Class       | Notes                                                  |
+|----|-------------|--------------------------------------------------------|
+| 0  | **tree**    | Apple trees (priority class for downstream tasks)      |
+| 1  | ground      | Grass / dirt / orchard floor                           |
+| 2  | person      | Workers in scene                                       |
+| 3  | sky         |                                                        |
+| 4  | road        | Path between rows                                      |
+| 5  | mountain    | Distant terrain (often confused with sky in fog)       |
+| 6  | building    | Sheds, equipment shelters                              |
+| 7  | background  | Unknown / unlabeled regions (rarely predicted)         |
+## Architecture & Preprocessing
+| | |
+|---|---|
+| Base model | `nvidia/segformer-b5-finetuned-ade-640-640` |
+| Parameters | ~85M |
+| Decoder head | Reinitialized for 8 classes |
+| Input format | RGB, normalized with ImageNet mean/std |
+| `mean` | `[0.485, 0.456, 0.406]` |
+| `std` | `[0.229, 0.224, 0.225]` |
+| Input resolution | Any H×W where both are multiples of 32 |
+| Trained at | 1024×576 (native 16:9) |
+| Recommended inference | 1280×704 or original native (snap to 32-multiple) |
+| Precision | bfloat16 inference works; weights are stored in fp32 |
+## Performance (NO data leakage)
+Validated on a temporally disjoint hold-out (frames 4501 and later of the same recording used for training):
+| Metric | Value |
+|---|---|
+| **Tree IoU** | **0.742** |
+| **mIoU (7 real classes)** | **0.714** |
+| **Pixel accuracy** | **0.834** |
+### Per-class IoU
+| Class | IoU | Precision | Recall |
+|---|---|---|---|
+| tree | 0.742 | 0.79 | 0.93 |
+| ground | 0.851 | 0.91 | 0.93 |
+| person | 0.719 | 0.82 | 0.85 |
+| sky | 0.769 | 0.83 | 0.91 |
+| road | 0.804 | 0.86 | 0.92 |
+| mountain | 0.437 | 0.62 | 0.66 |
+| building | 0.711 | 0.84 | 0.83 |
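+(Tree IoU is from the epoch 21 best-tree checkpoint on the no-leak validation split. The mIoU and pixel accuracy above match epoch 29 in `history_v6.json`; epoch 21 itself logged mIoU 0.708 and pixel accuracy 0.831.)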
+### OOD evaluation
+On a completely held-out recording (1912 frames from `oak_0415_twoRadar_1`, never seen in training), mean prediction confidence is **0.939**, with the model predicting `tree` on 41.8% of pixels and falling back to `background` on only 7.4%. These figures suggest the model transfers to out-of-distribution footage, though confidence alone does not measure accuracy.
+## Training Data
+- ~5300 frames from a single `oak_0415_oneRadar_1` recording
+- Initial annotations from 3 separate Roboflow projects (SAM-assisted polygons), merged + class-aligned (`vines`→`tree`, `moutain`→`mountain` typo fixed)
+- Pseudo-labels generated by an earlier model to fill SAM annotation gaps
+- Temporal split: frames `<=4500` train (5177 samples), frames `>4500` validation (155 samples) — **no neighbor leakage**
+## Training Recipe
+| Hyperparameter | Value |
+|---|---|
+| Optimizer | AdamW, weight_decay 0.01 |
+| LR | 2e-5, cosine schedule |
+| Epochs | 30 |
+| Batch | 2 × grad_accum 4 (effective 8) |
+| Resolution | 1024×576 |
+| Precision | bfloat16 |
+| Loss | weighted cross-entropy |
+| Class weights | tree 1.5, ground 0.5, person 1.5, sky 1.0, road 1.0, mountain 1.0, building 1.0, background 0.1 |
+| Hardware | RTX 5090 (32 GB), ~2.3 hours |
+## Files in This Repo
+| File | Purpose |
+|---|---|
+| `Segformer85Mv1.pt` | Fine-tuned weights (339 MB) |
+| `predict.py` | Standalone inference script |
+| `README.md` | This file |
+| `samples/*.jpg` | Side-by-side prediction examples |
+| `train_v6_5090.py` | Training script (for reproduction) |
+| `history_v6.json` | Per-epoch training history |
+| `v6_OOD_full_res.mp4` | 1-minute OOD inference video at native resolution |
+## License
+Apache 2.0
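
The per-class table in the README lists IoU next to precision and recall. All three derive from the same true-positive, false-positive and false-negative pixel counts, so each row can be sanity-checked. Below is a minimal sketch of that identity; the helper name `iou_from_precision_recall` is hypothetical and does not appear in the repo.

```python
def iou_from_precision_recall(precision: float, recall: float) -> float:
    """IoU implied by a class's precision and recall.

    With P = TP/(TP+FP) and R = TP/(TP+FN):
        1/P + 1/R - 1 = (TP+FP+FN)/TP = 1/IoU
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0  # no true positives, so the intersection is empty
    return 1.0 / (1.0 / precision + 1.0 / recall - 1.0)


# README row for `tree`: precision 0.79, recall 0.93, reported IoU 0.742
tree_iou = iou_from_precision_recall(0.79, 0.93)
print(f"{tree_iou:.3f}")
```

For the `tree` row this gives roughly 0.746, which is consistent with the reported 0.742 once the two-decimal rounding of precision and recall is taken into account.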

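The README's temporal split (frames `<=4500` for training, `>4500` for validation) can be sketched in isolation. This is an illustration, not the training script itself: it assumes file stems that start with `frame_<number>`, as the regex in `train_v6_5090.py` implies, and the helper names and example file names are hypothetical. Unlike the script's `frame_num`, which returns -1 for a non-matching name (so that file lands in the training set), this sketch raises an error instead.

```python
import re
from pathlib import Path


def frame_number(path: Path) -> int:
    """Extract N from a stem that starts with 'frame_N'; raise if it does not match."""
    match = re.match(r"frame_(\d+)", path.stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return int(match.group(1))


def temporal_split(paths, split_frame: int = 4500):
    """Frames <= split_frame go to train, later frames go to validation."""
    train = [p for p in paths if frame_number(p) <= split_frame]
    val = [p for p in paths if frame_number(p) > split_frame]
    return train, val


# Hypothetical file names, for illustration only.
paths = [Path(f"frame_{n}.jpg") for n in (4499, 4500, 4501, 4502)]
train, val = temporal_split(paths)
print([p.name for p in train])
print([p.name for p in val])
```

Because every frame falls on exactly one side of the threshold, no frame number can appear in both sets, which is the property the script asserts.
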
Segformer85Mv1.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8acb2e185d8a0fdef3002f65bb77ab935c8b825a745585d484913511ea2192ec
+size 338890309

history_v6.json ADDED

	@@ -0,0 +1,572 @@

+[
+  {
+    "epoch": 1,
+    "train_loss": 0.7317733424766079,
+    "val_loss": 0.3614382008329416,
+    "pixel_accuracy": 0.7693707866053427,
+    "mIoU_7": 0.6290054460148573,
+    "mIoU_8": 0.551595071231773,
+    "tree_iou": 0.6935091765839309,
+    "per_class_iou": {
+      "tree": 0.6935091765839309,
+      "ground": 0.8269576770586198,
+      "person": 0.6239303161783188,
+      "sky": 0.6214335001151623,
+      "road": 0.7967923018977053,
+      "mountain": 0.32553944888984093,
+      "building": 0.5148757013804227,
+      "background": 0.009722447750183178
+    }
+  },
+  {
+    "epoch": 2,
+    "train_loss": 0.33158363348054465,
+    "val_loss": 0.29247073370676774,
+    "pixel_accuracy": 0.7828593483107918,
+    "mIoU_7": 0.671867518383689,
+    "mIoU_8": 0.5965795049819951,
+    "tree_iou": 0.6851515450971696,
+    "per_class_iou": {
+      "tree": 0.6851515450971696,
+      "ground": 0.8329074795641042,
+      "person": 0.651328150567213,
+      "sky": 0.7052746613315422,
+      "road": 0.7775950133402935,
+      "mountain": 0.38359572918942386,
+      "building": 0.6672200495960767,
+      "background": 0.0695634111701382
+    }
+  },
+  {
+    "epoch": 3,
+    "train_loss": 0.2567576938229606,
+    "val_loss": 0.2792640005548795,
+    "pixel_accuracy": 0.7981670639420922,
+    "mIoU_7": 0.6746137194404379,
+    "mIoU_8": 0.610904000372088,
+    "tree_iou": 0.7087253188861061,
+    "per_class_iou": {
+      "tree": 0.7087253188861061,
+      "ground": 0.8361002659414856,
+      "person": 0.6559418314019541,
+      "sky": 0.7082608919602776,
+      "road": 0.7365729363277305,
+      "mountain": 0.3941387795068854,
+      "building": 0.6825560120586263,
+      "background": 0.16493596689363926
+    }
+  },
+  {
+    "epoch": 4,
+    "train_loss": 0.22328906943892032,
+    "val_loss": 0.2556124304731687,
+    "pixel_accuracy": 0.8095350696194556,
+    "mIoU_7": 0.6815957786979532,
+    "mIoU_8": 0.6245835393810028,
+    "tree_iou": 0.7157414904333625,
+    "per_class_iou": {
+      "tree": 0.7157414904333625,
+      "ground": 0.8427510979252235,
+      "person": 0.6691967663941234,
+      "sky": 0.7399725381481819,
+      "road": 0.7453247701300584,
+      "mountain": 0.3829273869357179,
+      "building": 0.6752564009190044,
+      "background": 0.2254978641623496
+    }
+  },
+  {
+    "epoch": 5,
+    "train_loss": 0.2025882395349984,
+    "val_loss": 0.2460598509090069,
+    "pixel_accuracy": 0.8063361492635529,
+    "mIoU_7": 0.6819090058037277,
+    "mIoU_8": 0.6238768515193147,
+    "tree_iou": 0.7115214580042588,
+    "per_class_iou": {
+      "tree": 0.7115214580042588,
+      "ground": 0.840159545948353,
+      "person": 0.6882897261490508,
+      "sky": 0.7281685473144833,
+      "road": 0.7760738885802295,
+      "mountain": 0.36856786582115475,
+      "building": 0.6605820088085645,
+      "background": 0.21765177152842294
+    }
+  },
+  {
+    "epoch": 6,
+    "train_loss": 0.18754121326082826,
+    "val_loss": 0.2658998461870047,
+    "pixel_accuracy": 0.8175638397107415,
+    "mIoU_7": 0.6910897251785011,
+    "mIoU_8": 0.6396066650450611,
+    "tree_iou": 0.7255735452977728,
+    "per_class_iou": {
+      "tree": 0.7255735452977728,
+      "ground": 0.8462776574160872,
+      "person": 0.6831329647948793,
+      "sky": 0.7372118420359767,
+      "road": 0.7377976663573189,
+      "mountain": 0.424413863299673,
+      "building": 0.6832205370477994,
+      "background": 0.27922524411098154
+    }
+  },
+  {
+    "epoch": 7,
+    "train_loss": 0.1783967717151968,
+    "val_loss": 0.24774512161429113,
+    "pixel_accuracy": 0.8165311970591118,
+    "mIoU_7": 0.6933117282517155,
+    "mIoU_8": 0.6395462936236171,
+    "tree_iou": 0.7274226749021537,
+    "per_class_iou": {
+      "tree": 0.7274226749021537,
+      "ground": 0.8418132457425621,
+      "person": 0.6861982683859594,
+      "sky": 0.7479193848975584,
+      "road": 0.7410099984549848,
+      "mountain": 0.4233301221532889,
+      "building": 0.6854884032255009,
+      "background": 0.2631882512269295
+    }
+  },
+  {
+    "epoch": 8,
+    "train_loss": 0.1686619697196245,
+    "val_loss": 0.253682183722655,
+    "pixel_accuracy": 0.8146241000049003,
+    "mIoU_7": 0.6906289318837233,
+    "mIoU_8": 0.635875803992791,
+    "tree_iou": 0.724371348212418,
+    "per_class_iou": {
+      "tree": 0.724371348212418,
+      "ground": 0.8416168509199589,
+      "person": 0.689128159543506,
+      "sky": 0.7491636164004881,
+      "road": 0.7221377666908946,
+      "mountain": 0.41500210217063455,
+      "building": 0.6929826792481624,
+      "background": 0.2526039087562656
+    }
+  },
+  {
+    "epoch": 9,
+    "train_loss": 0.1617005393484829,
+    "val_loss": 0.23389703856828886,
+    "pixel_accuracy": 0.8142122877114135,
+    "mIoU_7": 0.6982474481555915,
+    "mIoU_8": 0.6396787822355385,
+    "tree_iou": 0.7227781453845477,
+    "per_class_iou": {
+      "tree": 0.7227781453845477,
+      "ground": 0.8463292649711305,
+      "person": 0.6948543902963943,
+      "sky": 0.752873076350096,
+      "road": 0.7767962580351105,
+      "mountain": 0.4051208118881919,
+      "building": 0.6889801901636698,
+      "background": 0.22969812079516688
+    }
+  },
+  {
+    "epoch": 10,
+    "train_loss": 0.15628578751225605,
+    "val_loss": 0.247397372737909,
+    "pixel_accuracy": 0.8216093548737119,
+    "mIoU_7": 0.6951037061409514,
+    "mIoU_8": 0.6432917480279269,
+    "tree_iou": 0.7360921756649008,
+    "per_class_iou": {
+      "tree": 0.7360921756649008,
+      "ground": 0.8459931111438093,
+      "person": 0.6749524232501306,
+      "sky": 0.7544135951669563,
+      "road": 0.7459900655039325,
+      "mountain": 0.41176574357825835,
+      "building": 0.6965188286786725,
+      "background": 0.28060804123675503
+    }
+  },
+  {
+    "epoch": 11,
+    "train_loss": 0.15047259357868703,
+    "val_loss": 0.2422896781219886,
+    "pixel_accuracy": 0.8186814503003192,
+    "mIoU_7": 0.7032928273867227,
+    "mIoU_8": 0.6488940620062726,
+    "tree_iou": 0.7280322295553546,
+    "per_class_iou": {
+      "tree": 0.7280322295553546,
+      "ground": 0.8452403621614176,
+      "person": 0.6885070862388089,
+      "sky": 0.7378996317184476,
+      "road": 0.7874709202653031,
+      "mountain": 0.4391025863133733,
+      "building": 0.6967969754543536,
+      "background": 0.26810270434312317
+    }
+  },
+  {
+    "epoch": 12,
+    "train_loss": 0.14541167222608278,
+    "val_loss": 0.2657107286728345,
+    "pixel_accuracy": 0.8122323094303036,
+    "mIoU_7": 0.6944432923305518,
+    "mIoU_8": 0.6353890116604214,
+    "tree_iou": 0.7179542327564974,
+    "per_class_iou": {
+      "tree": 0.7179542327564974,
+      "ground": 0.8458071815929991,
+      "person": 0.6630505235819328,
+      "sky": 0.7576548123175906,
+      "road": 0.7707661859292342,
+      "mountain": 0.4066247100942614,
+      "building": 0.6992454000413479,
+      "background": 0.22200904696950824
+    }
+  },
+  {
+    "epoch": 13,
+    "train_loss": 0.14175742986817863,
+    "val_loss": 0.245059463649224,
+    "pixel_accuracy": 0.8285006506041387,
+    "mIoU_7": 0.709682438526708,
+    "mIoU_8": 0.6600046944565648,
+    "tree_iou": 0.737251946536209,
+    "per_class_iou": {
+      "tree": 0.737251946536209,
+      "ground": 0.8495676511625408,
+      "person": 0.673414491471947,
+      "sky": 0.7658854974430468,
+      "road": 0.8044619128651328,
+      "mountain": 0.4364369682032032,
+      "building": 0.7007586020048767,
+      "background": 0.31226048596556205
+    }
+  },
+  {
+    "epoch": 14,
+    "train_loss": 0.13955131033507437,
+    "val_loss": 0.24541893630073622,
+    "pixel_accuracy": 0.823048581359207,
+    "mIoU_7": 0.703872681203359,
+    "mIoU_8": 0.6523065270693985,
+    "tree_iou": 0.7334317964255183,
+    "per_class_iou": {
+      "tree": 0.7334317964255183,
+      "ground": 0.8472548995668966,
+      "person": 0.6849725597168292,
+      "sky": 0.7430631079001473,
+      "road": 0.7870281350545207,
+      "mountain": 0.42822861819540836,
+      "building": 0.7031296515641924,
+      "background": 0.2913434481316755
+    }
+  },
+  {
+    "epoch": 15,
+    "train_loss": 0.1340014274150041,
+    "val_loss": 0.2613945401822909,
+    "pixel_accuracy": 0.8256342077767977,
+    "mIoU_7": 0.6938176732869097,
+    "mIoU_8": 0.6465352082869225,
+    "tree_iou": 0.7348229701448902,
+    "per_class_iou": {
+      "tree": 0.7348229701448902,
+      "ground": 0.8485500320261311,
+      "person": 0.6985548375080566,
+      "sky": 0.7719071628391762,
+      "road": 0.6875423844689514,
+      "mountain": 0.4175557643370586,
+      "building": 0.697790561684104,
+      "background": 0.3155579532870117
+    }
+  },
+  {
+    "epoch": 16,
+    "train_loss": 0.13028158216666677,
+    "val_loss": 0.2640461309407002,
+    "pixel_accuracy": 0.8236345844884072,
+    "mIoU_7": 0.7017641778623044,
+    "mIoU_8": 0.6513057693665132,
+    "tree_iou": 0.729911002309811,
+    "per_class_iou": {
+      "tree": 0.729911002309811,
+      "ground": 0.8492910742408395,
+      "person": 0.6838423200192018,
+      "sky": 0.7641037715450866,
+      "road": 0.777327883240598,
+      "mountain": 0.41684152019417936,
+      "building": 0.6910316734864147,
+      "background": 0.29809690989597504
+    }
+  },
+  {
+    "epoch": 17,
+    "train_loss": 0.1271395583290152,
+    "val_loss": 0.27516031752412135,
+    "pixel_accuracy": 0.8244798229586694,
+    "mIoU_7": 0.7030521402715308,
+    "mIoU_8": 0.6534132836443587,
+    "tree_iou": 0.7358827999718607,
+    "per_class_iou": {
+      "tree": 0.7358827999718607,
+      "ground": 0.8468523343784162,
+      "person": 0.6691681595051834,
+      "sky": 0.7460565593685016,
+      "road": 0.7955708148842865,
+      "mountain": 0.42532427764901415,
+      "building": 0.7025100361434535,
+      "background": 0.3059412872541537
+    }
+  },
+  {
+    "epoch": 18,
+    "train_loss": 0.12417552829900011,
+    "val_loss": 0.26260221988344806,
+    "pixel_accuracy": 0.8236053794942876,
+    "mIoU_7": 0.7054233496573896,
+    "mIoU_8": 0.6541734279184714,
+    "tree_iou": 0.7336979074099337,
+    "per_class_iou": {
+      "tree": 0.7336979074099337,
+      "ground": 0.8459052222065085,
+      "person": 0.7072702433260717,
+      "sky": 0.7598311921866788,
+      "road": 0.7741989806072286,
+      "mountain": 0.4169527905509687,
+      "building": 0.7001071113143372,
+      "background": 0.29542397574604473
+    }
+  },
+  {
+    "epoch": 19,
+    "train_loss": 0.12182419967637549,
+    "val_loss": 0.27713373312965417,
+    "pixel_accuracy": 0.8270421947629648,
+    "mIoU_7": 0.7109813189537684,
+    "mIoU_8": 0.6606370811375022,
+    "tree_iou": 0.7323743055163786,
+    "per_class_iou": {
+      "tree": 0.7323743055163786,
+      "ground": 0.8498375951196969,
+      "person": 0.7007254650111793,
+      "sky": 0.7758277755445154,
+      "road": 0.7673058681573506,
+      "mountain": 0.44176907525611614,
+      "building": 0.7090291480711419,
+      "background": 0.3082274164236383
+    }
+  },
+  {
+    "epoch": 20,
+    "train_loss": 0.11663105687877909,
+    "val_loss": 0.2742788792611697,
+    "pixel_accuracy": 0.8236675740997423,
+    "mIoU_7": 0.705239040664458,
+    "mIoU_8": 0.6555626455120473,
+    "tree_iou": 0.7277708011216564,
+    "per_class_iou": {
+      "tree": 0.7277708011216564,
+      "ground": 0.8480108982494552,
+      "person": 0.6999480566951554,
+      "sky": 0.7583926266348064,
+      "road": 0.7733277370427203,
+      "mountain": 0.43148052924849667,
+      "building": 0.6977426356589147,
+      "background": 0.3078278794451741
+    }
+  },
+  {
+    "epoch": 21,
+    "train_loss": 0.11508058199955516,
+    "val_loss": 0.27676021279050755,
+    "pixel_accuracy": 0.8313707139756944,
+    "mIoU_7": 0.7077708636483279,
+    "mIoU_8": 0.6614555034088554,
+    "tree_iou": 0.7424294121477844,
+    "per_class_iou": {
+      "tree": 0.7424294121477844,
+      "ground": 0.8522079523217,
+      "person": 0.7032718147223185,
+      "sky": 0.7628166116510707,
+      "road": 0.7671078882037391,
+      "mountain": 0.4261684625639688,
+      "building": 0.700393903927713,
+      "background": 0.3372479817325494
+    }
+  },
+  {
+    "epoch": 22,
+    "train_loss": 0.11490547226906007,
+    "val_loss": 0.2687373280716248,
+    "pixel_accuracy": 0.8267316264490927,
+    "mIoU_7": 0.7110329608311641,
+    "mIoU_8": 0.6609831987414228,
+    "tree_iou": 0.7378164635727757,
+    "per_class_iou": {
+      "tree": 0.7378164635727757,
+      "ground": 0.8448131953009181,
+      "person": 0.706298048708027,
+      "sky": 0.7649988259226514,
+      "road": 0.768533202732033,
+      "mountain": 0.44301062115660517,
+      "building": 0.7117603684251378,
+      "background": 0.3106348641132343
+    }
+  },
+  {
+    "epoch": 23,
+    "train_loss": 0.1098061478228943,
+    "val_loss": 0.2768534162105658,
+    "pixel_accuracy": 0.8206308016213038,
+    "mIoU_7": 0.7051513810529587,
+    "mIoU_8": 0.6541264498668504,
+    "tree_iou": 0.7231827678494563,
+    "per_class_iou": {
+      "tree": 0.7231827678494563,
+      "ground": 0.8449066487884843,
+      "person": 0.7041216903044678,
+      "sky": 0.7592148050549444,
+      "road": 0.7857372699186778,
+      "mountain": 0.41639223489260957,
+      "building": 0.7025042505620707,
+      "background": 0.2969519315640925
+    }
+  },
+  {
+    "epoch": 24,
+    "train_loss": 0.10950512597927539,
+    "val_loss": 0.2952564288026247,
+    "pixel_accuracy": 0.8288865721781195,
+    "mIoU_7": 0.7106030798944316,
+    "mIoU_8": 0.665005005296281,
+    "tree_iou": 0.730085894441625,
+    "per_class_iou": {
+      "tree": 0.730085894441625,
+      "ground": 0.8504673603873821,
+      "person": 0.6969281633100608,
+      "sky": 0.7696089528996574,
+      "road": 0.7867176272154283,
+      "mountain": 0.4279298391265215,
+      "building": 0.7124837218803463,
+      "background": 0.3458184831092274
+    }
+  },
+  {
+    "epoch": 25,
+    "train_loss": 0.10687056659552423,
+    "val_loss": 0.27761973994664657,
+    "pixel_accuracy": 0.8238253138825885,
+    "mIoU_7": 0.7077265497389625,
+    "mIoU_8": 0.6572216095334821,
+    "tree_iou": 0.7262833333589551,
+    "per_class_iou": {
+      "tree": 0.7262833333589551,
+      "ground": 0.8512492951444962,
+      "person": 0.7094425059787509,
+      "sky": 0.7588241111321905,
+      "road": 0.7839445591226314,
+      "mountain": 0.41280223596732046,
+      "building": 0.7115398074683938,
+      "background": 0.30368702809511827
+    }
+  },
+  {
+    "epoch": 26,
+    "train_loss": 0.10538023605171182,
+    "val_loss": 0.29933813543846977,
+    "pixel_accuracy": 0.823246825296819,
+    "mIoU_7": 0.704192355270756,
+    "mIoU_8": 0.6550478112982461,
+    "tree_iou": 0.7267114852570552,
+    "per_class_iou": {
+      "tree": 0.7267114852570552,
+      "ground": 0.8469380423129873,
+      "person": 0.7013946479260358,
+      "sky": 0.7600105651077275,
+      "road": 0.7588123322822986,
+      "mountain": 0.43047006821479455,
+      "building": 0.7050093457943926,
+      "background": 0.311036003490677
+    }
+  },
+  {
+    "epoch": 27,
+    "train_loss": 0.10386164612072006,
+    "val_loss": 0.3018250732849806,
+    "pixel_accuracy": 0.8294269083221326,
+    "mIoU_7": 0.7132434231102097,
+    "mIoU_8": 0.6670003332357785,
+    "tree_iou": 0.7363907878200537,
+    "per_class_iou": {
+      "tree": 0.7363907878200537,
+      "ground": 0.8504632417020571,
+      "person": 0.7128458671987324,
+      "sky": 0.7624585962209668,
+      "road": 0.7741146775923515,
+      "mountain": 0.4371976195380884,
+      "building": 0.7192331716992189,
+      "background": 0.3432987041147584
+    }
+  },
+  {
+    "epoch": 28,
+    "train_loss": 0.10010939652540676,
+    "val_loss": 0.3033689678861545,
+    "pixel_accuracy": 0.8270183713632673,
+    "mIoU_7": 0.7110530306755144,
+    "mIoU_8": 0.6650936852452135,
+    "tree_iou": 0.7283072811263064,
+    "per_class_iou": {
+      "tree": 0.7283072811263064,
+      "ground": 0.8456942664597132,
+      "person": 0.6923932394957782,
+      "sky": 0.7610996834802041,
+      "road": 0.7837289807636174,
+      "mountain": 0.4502767990300785,
+      "building": 0.7158709643729031,
+      "background": 0.34337826723310705
+    }
+  },
+  {
+    "epoch": 29,
+    "train_loss": 0.09945478258795541,
+    "val_loss": 0.3067954242802583,
+    "pixel_accuracy": 0.8338931613498264,
+    "mIoU_7": 0.7143796374000682,
+    "mIoU_8": 0.6711484387659205,
+    "tree_iou": 0.7407570830601974,
+    "per_class_iou": {
+      "tree": 0.7407570830601974,
+      "ground": 0.851453401283257,
+      "person": 0.7188091768041799,
+      "sky": 0.7689343025621244,
+      "road": 0.7752271933914192,
+      "mountain": 0.43486270051421433,
+      "building": 0.710613604185085,
+      "background": 0.3685300483268865
+    }
+  },
+  {
+    "epoch": 30,
+    "train_loss": 0.09672546130497084,
+    "val_loss": 0.32073069191896,
+    "pixel_accuracy": 0.8235090577046931,
+    "mIoU_7": 0.7008779030646494,
+    "mIoU_8": 0.655579892357319,
+    "tree_iou": 0.7274351359731892,
+    "per_class_iou": {
+      "tree": 0.7274351359731892,
+      "ground": 0.844361516651275,
+      "person": 0.6928363707324503,
+      "sky": 0.7461845320594144,
+      "road": 0.7788302132855058,
+      "mountain": 0.40112782563995303,
+      "building": 0.7153697271107579,
+      "background": 0.3384938174060061
+    }
+  }
+]
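
The per-epoch log makes it easy to check which epoch is best for a given metric. A small sketch, assuming only the keys visible in `history_v6.json`; the helper name `best_epoch` is hypothetical, and the excerpt holds three entries copied from the log and rounded to four decimals.

```python
import json

# Three entries excerpted from history_v6.json (values rounded to 4 decimals).
HISTORY_EXCERPT = json.loads("""
[
  {"epoch": 21, "tree_iou": 0.7424, "mIoU_7": 0.7078, "pixel_accuracy": 0.8314},
  {"epoch": 27, "tree_iou": 0.7364, "mIoU_7": 0.7132, "pixel_accuracy": 0.8294},
  {"epoch": 29, "tree_iou": 0.7408, "mIoU_7": 0.7144, "pixel_accuracy": 0.8339}
]
""")


def best_epoch(history, key):
    """Return the history entry with the highest value for `key`."""
    return max(history, key=lambda entry: entry[key])


print(best_epoch(HISTORY_EXCERPT, "tree_iou")["epoch"])  # 21
print(best_epoch(HISTORY_EXCERPT, "mIoU_7")["epoch"])    # 29
```

To run it on the full log, load the file with `json.load(open("history_v6.json"))` and pass the result in place of the excerpt. On the full log in this commit the same two epochs come out on top: epoch 21 for `tree_iou` and epoch 29 for `mIoU_7`.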

predict.py ADDED

	@@ -0,0 +1,128 @@

+"""Segformer85Mv1 — apple-orchard semantic segmentation inference.
+Usage:
+    python predict.py input.jpg                     # writes input_pred.png + input_overlay.jpg
+    python predict.py --dir frames/ --out out/      # batch process a folder
+Classes (id → name):
+    0 tree   1 ground   2 person   3 sky
+    4 road   5 mountain 6 building 7 background
+"""
+from __future__ import annotations
+import argparse
+import os
+from pathlib import Path
+import cv2
+import numpy as np
+import torch
+import torch.nn.functional as F
+from transformers import SegformerForSemanticSegmentation
+# ─── config ───
+BASE_MODEL = "nvidia/segformer-b5-finetuned-ade-640-640"
+WEIGHTS_PATH = os.environ.get("SEGFORMER85MV1_WEIGHTS", "Segformer85Mv1.pt")  # local file or full path
+NAMES = ["tree", "ground", "person", "sky", "road", "mountain", "building", "background"]
+PALETTE = np.array([
+    [60, 220, 60],    # tree     - green
+    [40, 100, 160],   # ground   - brown
+    [40,  40, 230],   # person   - red
+    [230, 200, 60],   # sky      - cyan
+    [140, 140, 140],  # road     - gray
+    [180,  60, 180],  # mountain - purple
+    [50, 220, 220],   # building - yellow
+    [100, 100, 100],  # background - mid-gray
+], dtype=np.uint8)
+IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
+IMAGENET_STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)
+def load_model(weights_path: str | Path = WEIGHTS_PATH, device: str = "cuda"):
+    """Load Segformer85Mv1. Returns model in eval mode on the target device."""
+    model = SegformerForSemanticSegmentation.from_pretrained(
+        BASE_MODEL,
+        num_labels=len(NAMES),
+        id2label={i: n for i, n in enumerate(NAMES)},
+        label2id={n: i for i, n in enumerate(NAMES)},
+        ignore_mismatched_sizes=True,
+    ).to(device)
+    ckpt = torch.load(weights_path, map_location=device, weights_only=False)
+    state = ckpt["model"] if "model" in ckpt else ckpt
+    model.load_state_dict(state)
+    model.eval()
+    return model
+def preprocess(bgr_img: np.ndarray) -> tuple[torch.Tensor, tuple[int, int]]:
+    """BGR uint8 image → normalized tensor sized to 32 multiples; returns (tensor, original (H,W))."""
+    H, W = bgr_img.shape[:2]
+    H32, W32 = (H // 32) * 32, (W // 32) * 32
+    if H32 == 0 or W32 == 0:
+        raise ValueError(f"Image too small: {W}x{H}")
+    rgb = cv2.cvtColor(cv2.resize(bgr_img, (W32, H32)), cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
+    rgb = (rgb - IMAGENET_MEAN) / IMAGENET_STD
+    x = torch.from_numpy(rgb.transpose(2, 0, 1)).unsqueeze(0).float()
+    return x, (H, W)
+def predict(model, bgr_img: np.ndarray, device: str = "cuda") -> np.ndarray:
+    """Run inference on one BGR image. Returns (H,W) uint8 mask with class ids 0..7."""
+    x, (H, W) = preprocess(bgr_img)
+    x = x.to(device)
+    with torch.no_grad():
+        logits = model(pixel_values=x).logits
+        logits = F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)
+    return logits.argmax(1)[0].cpu().numpy().astype(np.uint8)
+def colorize(mask: np.ndarray) -> np.ndarray:
+    """class-id mask (H,W) → BGR color visualization (H,W,3)."""
+    return PALETTE[mask]
+def overlay(bgr_img: np.ndarray, mask: np.ndarray, alpha: float = 0.45) -> np.ndarray:
+    """Blend prediction over original image."""
+    return cv2.addWeighted(bgr_img, 1 - alpha, colorize(mask), alpha, 0)
+def main():
+    ap = argparse.ArgumentParser(description="Segformer85Mv1 inference (8-class outdoor segmentation).")
+    ap.add_argument("input", nargs="?", help="Single image path")
+    ap.add_argument("--dir", help="Directory of images to process")
+    ap.add_argument("--out", default=".", help="Output directory")
+    ap.add_argument("--weights", default=WEIGHTS_PATH, help="Path to Segformer85Mv1.pt")
+    ap.add_argument("--device", default="cuda" if torch.cuda.is_available() else "cpu")
+    args = ap.parse_args()
+    if not args.input and not args.dir:
+        ap.print_help()
+        return
+    print(f"loading model from {args.weights} on {args.device} ...")
+    model = load_model(args.weights, device=args.device)
+    out_dir = Path(args.out); out_dir.mkdir(parents=True, exist_ok=True)
+    paths = []
+    if args.dir:
+        paths = sorted(p for p in Path(args.dir).iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png", ".bmp"})
+    if args.input:
+        paths.append(Path(args.input))
+    for p in paths:
+        img = cv2.imread(str(p))
+        if img is None:
+            print(f"  skip (unreadable): {p}")
+            continue
+        mask = predict(model, img, device=args.device)
+        cv2.imwrite(str(out_dir / f"{p.stem}_pred.png"), mask)             # raw class-id mask
+        cv2.imwrite(str(out_dir / f"{p.stem}_overlay.jpg"), overlay(img, mask))  # visualization
+        # quick stats
+        counts = np.bincount(mask.flatten(), minlength=len(NAMES))
+        top = counts.argmax()
+        print(f"  {p.name:<40}  top class: {NAMES[top]} ({100*counts[top]/counts.sum():.1f}%)")
+    print(f"\noutputs -> {out_dir.resolve()}")
+if __name__ == "__main__":
+    main()
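
`preprocess()` in `predict.py` resizes the input so that both sides are multiples of 32 before the forward pass, and `predict()` then interpolates the logits back to the original size. The size arithmetic alone can be sketched as follows; `snap_to_multiple` is a hypothetical name used only for this illustration.

```python
from __future__ import annotations


def snap_to_multiple(height: int, width: int, multiple: int = 32) -> tuple[int, int]:
    """Round each dimension down to the nearest multiple, as preprocess() does."""
    snapped_h = (height // multiple) * multiple
    snapped_w = (width // multiple) * multiple
    if snapped_h == 0 or snapped_w == 0:
        raise ValueError(f"Image too small: {width}x{height}")
    return snapped_h, snapped_w


print(snap_to_multiple(720, 1280))   # (704, 1280)
print(snap_to_multiple(1080, 1920))  # (1056, 1920)
```

A 1280x720 frame is therefore fed to the network at 1280x704, which matches the README's recommended inference size.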

samples/sample_00_frame_2575.jpg ADDED

Git LFS Details

SHA256: 6b5b4fa73c3be9450d1cb2ab2dc658a87833308e4adc172ba087a012177d5196
Pointer size: 131 Bytes
Size of remote file: 576 kB

samples/sample_05_frame_3371.jpg ADDED

Git LFS Details

SHA256: 4dc36e08d9b76a7f2d76a1b28fd8f6a880a00fd5a9b31f37114b4b682d1b77a9
Pointer size: 131 Bytes
Size of remote file: 656 kB

samples/sample_09_frame_4009.jpg ADDED

Git LFS Details

SHA256: b9b1d27f602db898962a224a955209d03c949296b02822491df8c5e6a205d135
Pointer size: 131 Bytes
Size of remote file: 601 kB

train_v6_5090.py ADDED

	@@ -0,0 +1,230 @@

+"""V6 — final, all-problems-fixed training on RTX 5090.
+Fixes vs v4 (the leaky 0.78 mIoU):
+  1. TEMPORAL split (frame_<=4500 train, frame_>4500 val) — zero neighbor leakage
+  2. 16:9 input at 1024x576 (no padding, aspect ratio preserved)
+  3. Segformer-b5 (85M params, 4x v4's b2 capacity)
+  4. batch 2 x grad_accum 4 (effective 8) + BF16 on the 5090's 32GB VRAM
+  5. Global confusion-matrix IoU (not per-batch noisy averages)
+  6. Pseudo-labels (carry over - they were generated by v4 on full images)
+"""
+from __future__ import annotations
+import json, re, time
+from pathlib import Path
+import numpy as np, cv2, torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch.utils.data import Dataset, DataLoader
+from torch.amp import GradScaler, autocast
+import albumentations as A
+from transformers import SegformerForSemanticSegmentation
+# ───────────── config ─────────────
+ROOT = Path("/workspace/agmotree/dataset")
+IMG_DIR = ROOT / "train/images"
+MSK_DIR = ROOT / "train/masks_pseudo"
+OUT_DIR = Path("/workspace/agmotree/v6_output")
+OUT_DIR.mkdir(parents=True, exist_ok=True)
+MODEL_NAME = "nvidia/segformer-b5-finetuned-ade-640-640"
+NUM_CLASSES = 8
+NAMES = ["tree", "ground", "person", "sky", "road", "mountain", "building", "background"]
+IMG_W = 1024
+IMG_H = 576                  # multiple of 32; 1024x576 keeps the 16:9 aspect ratio
+BATCH_SIZE = 2
+GRAD_ACCUM = 4
+EPOCHS = 30
+LR = 2e-5
+WEIGHT_DECAY = 1e-2
+NUM_WORKERS = 8
+SEED = 42
+DEVICE = "cuda"
+SPLIT_FRAME = 4500           # frames<=4500 → train, >4500 → val (NO LEAK)
+# Hand-tuned class weights (proven in v4 - prevents collapse)
+WEIGHTS = np.array([
+    1.5,   # tree     - priority class
+    0.5,   # ground   - very common
+    1.5,   # person
+    1.0,   # sky
+    1.0,   # road
+    1.0,   # mountain
+    1.0,   # building
+    0.1,   # background - low but trainable
+])
+print("=== V6 / RTX 5090 / NO LEAK ===")
+print(f"  model: {MODEL_NAME}")
+print(f"  input: {IMG_W}x{IMG_H} (native 16:9)")
+print(f"  batch: {BATCH_SIZE} x grad_accum {GRAD_ACCUM} = effective {BATCH_SIZE*GRAD_ACCUM}")
+print(f"  LR: {LR}, epochs: {EPOCHS}")
+print(f"  TEMPORAL split: train frame<={SPLIT_FRAME}, val frame>{SPLIT_FRAME}")
+# ───────────── data ─────────────
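+def frame_num(p: Path) -> int:
+    # Parse the frame index from names like "frame_2575.jpg". Returns -1 when the
+    # name does not match, which places such files in the train split.
+    m = re.match(r"frame_(\d+)", p.stem)
+    return int(m.group(1)) if m else -1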
+all_imgs = sorted(IMG_DIR.glob("*.jpg"))
+train_imgs = [p for p in all_imgs if frame_num(p) <= SPLIT_FRAME]
+val_imgs   = [p for p in all_imgs if frame_num(p) >  SPLIT_FRAME]
+train_nums = set(frame_num(p) for p in train_imgs)
+val_nums = set(frame_num(p) for p in val_imgs)
+print(f"  train: {len(train_imgs)} files, frames {min(train_nums)}-{max(train_nums)}")
+print(f"  val:   {len(val_imgs)} files, frames {min(val_nums)}-{max(val_nums)}")
+print(f"  overlap (must be 0): {len(train_nums & val_nums)}")
+assert len(train_nums & val_nums) == 0
+train_tf = A.Compose([
+    A.Resize(IMG_H, IMG_W),
+    A.HorizontalFlip(p=0.5),
+    A.RandomBrightnessContrast(0.2, 0.2, p=0.5),
+    A.HueSaturationValue(10, 15, 10, p=0.3),
+    A.GaussianBlur(blur_limit=(3, 5), p=0.2),
+    A.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
+])
+val_tf = A.Compose([
+    A.Resize(IMG_H, IMG_W),
+    A.Normalize(mean=(0.485,0.456,0.406), std=(0.229,0.224,0.225)),
+])
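+class SegDataset(Dataset):
+    def __init__(self, paths, tf):
+        self.paths = paths
+        self.tf = tf
+    def __len__(self):
+        return len(self.paths)
+    def __getitem__(self, i):
+        ip = self.paths[i]
+        img = cv2.imread(str(ip))
+        msk = cv2.imread(str(MSK_DIR / (ip.stem + ".png")), cv2.IMREAD_GRAYSCALE)
+        # cv2.imread returns None (it does not raise) for a missing or unreadable file
+        if img is None or msk is None:
+            raise FileNotFoundError(f"could not read image or mask for {ip.name}")
+        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+        out = self.tf(image=img, mask=msk)
+        # HWC float image -> CHW tensor; mask stays HW with integer class ids
+        return (torch.from_numpy(out["image"]).permute(2,0,1).float(),
+                torch.from_numpy(out["mask"]).long())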
+# ───────────── train ─────────────
+log_path = OUT_DIR / "training_log_v6.txt"
+def log(msg):
+    print(msg, flush=True)
+    with log_path.open("a", encoding="utf-8") as f:
+        f.write(msg + "\n")
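+def compute_iou_global(cm):
+    # cm[t, p] = number of pixels with true class t predicted as class p,
+    # accumulated over the whole validation set.
+    n = cm.shape[0]
+    ious = np.zeros(n)
+    for c in range(n):
+        tp = cm[c, c]
+        fp = cm[:, c].sum() - tp   # predicted c, true class differs
+        fn = cm[c, :].sum() - tp   # true c, predicted class differs
+        # NaN marks classes absent from both labels and predictions (skipped by nanmean)
+        ious[c] = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else float("nan")
+    return ious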
+def main():
+    log_path.write_text("")
+    train_ds = SegDataset(train_imgs, train_tf)
+    val_ds = SegDataset(val_imgs, val_tf)
+    train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,
+                              num_workers=NUM_WORKERS, pin_memory=True, drop_last=True,
+                              persistent_workers=True)
+    val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False,
+                            num_workers=NUM_WORKERS, pin_memory=True,
+                            persistent_workers=True)
+    log("=== V6 / RTX 5090 / NO LEAK ===")
+    log(f"input: {IMG_W}x{IMG_H}  batch: {BATCH_SIZE}x{GRAD_ACCUM}  LR: {LR}")
+    log(f"split: TEMPORAL  train={len(train_imgs)} val={len(val_imgs)}  no overlap")
+    log(f"loading {MODEL_NAME} ...")
+    model = SegformerForSemanticSegmentation.from_pretrained(
+        MODEL_NAME, num_labels=NUM_CLASSES,
+        id2label={i:n for i,n in enumerate(NAMES)},
+        label2id={n:i for i,n in enumerate(NAMES)},
+        ignore_mismatched_sizes=True,
+    ).to(DEVICE)
+    log(f"  params: {sum(p.numel() for p in model.parameters())/1e6:.1f}M")
+    cw = torch.tensor(WEIGHTS, dtype=torch.float32, device=DEVICE)
+    loss_fn = nn.CrossEntropyLoss(weight=cw)
+    optim = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
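+    # NOTE: sched.step() is called once per optimizer step (every GRAD_ACCUM batches),
+    # while T_max counts batches, so the cosine decay covers only 1/GRAD_ACCUM of its cycle.
+    sched = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=EPOCHS*len(train_loader))
+    # BF16 autocast does not need loss scaling; this GradScaler is created but never used below.
+    scaler = GradScaler("cuda")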
+    log(f"device: {torch.cuda.get_device_name(0)}  vram: {torch.cuda.get_device_properties(0).total_memory/1e9:.1f} GB")
+    log(f"train batches: {len(train_loader)}  val batches: {len(val_loader)}")
+    best_tree_iou = -1.0
+    best_miou = -1.0
+    history = []
+    for epoch in range(1, EPOCHS+1):
+        model.train()
+        t0 = time.time()
+        epoch_loss = 0.0
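+        # Gradients left over from trailing batches of the previous epoch (when
+        # len(train_loader) is not a multiple of GRAD_ACCUM) are discarded here.
+        optim.zero_grad()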
+        for step,(x,y) in enumerate(train_loader):
+            x = x.to(DEVICE, non_blocking=True); y = y.to(DEVICE, non_blocking=True)
+            with autocast("cuda", dtype=torch.bfloat16):
+                out = model(pixel_values=x)
+                logits = F.interpolate(out.logits, size=y.shape[-2:], mode="bilinear", align_corners=False)
+                loss = loss_fn(logits, y) / GRAD_ACCUM
+            loss.backward()
+            if (step+1) % GRAD_ACCUM == 0:
+                optim.step(); optim.zero_grad(); sched.step()
+            epoch_loss += loss.item() * GRAD_ACCUM
+        train_loss = epoch_loss / len(train_loader)
+        model.eval()
+        cm = np.zeros((NUM_CLASSES, NUM_CLASSES), dtype=np.int64)
+        val_loss = 0.0
+        with torch.no_grad():
+            for x,y in val_loader:
+                x = x.to(DEVICE, non_blocking=True); y = y.to(DEVICE, non_blocking=True)
+                with autocast("cuda", dtype=torch.bfloat16):
+                    out = model(pixel_values=x)
+                    logits = F.interpolate(out.logits, size=y.shape[-2:], mode="bilinear", align_corners=False)
+                    val_loss += loss_fn(logits, y).item()
+                preds = logits.argmax(1).cpu().numpy()
+                ys = y.cpu().numpy()
+                # Accumulate the global confusion matrix: rows = true class, cols = predicted class
+                for tc in range(NUM_CLASSES):
+                    mt = (ys == tc)
+                    if not mt.any():
+                        continue
+                    for pc in range(NUM_CLASSES):
+                        cm[tc, pc] += int(((preds == pc) & mt).sum())
+        val_loss /= max(1, len(val_loader))
+        per_iou = compute_iou_global(cm)
+        miou_7 = float(np.nanmean(per_iou[:7]))   # first 7 classes, excludes "background" (index 7)
+        miou_8 = float(np.nanmean(per_iou))       # all 8 classes
+        tree_iou = float(per_iou[0])
+        pix_acc = float(np.diag(cm).sum() / cm.sum())
+        elapsed = time.time() - t0
+        log(f"epoch {epoch:02d}/{EPOCHS}  tloss={train_loss:.4f}  vloss={val_loss:.4f}  "
+            f"pix_acc={pix_acc:.3f}  mIoU(7)={miou_7:.3f}  tree={tree_iou:.3f}  ({elapsed:.0f}s)")
+        log("  per-class IoU: " + ", ".join(f"{n}={v:.3f}" for n,v in zip(NAMES, per_iou)))
+        history.append({
+            "epoch": epoch, "train_loss": float(train_loss), "val_loss": float(val_loss),
+            "pixel_accuracy": pix_acc, "mIoU_7": miou_7, "mIoU_8": miou_8, "tree_iou": tree_iou,
+            "per_class_iou": {n: float(v) for n, v in zip(NAMES, per_iou)},
+        })
+        torch.save({"model": model.state_dict(), "epoch": epoch, "miou_7": miou_7, "tree_iou": tree_iou},
+                   OUT_DIR / "v6_last.pt")
+        if tree_iou > best_tree_iou:
+            best_tree_iou = tree_iou
+            torch.save({"model": model.state_dict(), "epoch": epoch, "miou_7": miou_7, "tree_iou": tree_iou},
+                       OUT_DIR / "v6_best_tree.pt")
+            log(f"  saved v6_best_tree.pt (tree IoU {tree_iou:.3f})")
+        if miou_7 > best_miou:
+            best_miou = miou_7
+            torch.save({"model": model.state_dict(), "epoch": epoch, "miou_7": miou_7, "tree_iou": tree_iou},
+                       OUT_DIR / "v6_best_miou.pt")
+        (OUT_DIR / "history_v6.json").write_text(json.dumps(history, indent=2))
+    log("\n=== DONE ===")
+    log(f"best tree IoU (NO LEAK): {best_tree_iou:.3f}")
+    log(f"best mIoU(7) (NO LEAK):  {best_miou:.3f}")
+if __name__ == "__main__":
+    main()
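
The validation loop in the script above fills the confusion matrix with a nested loop over true and predicted classes. As a minimal sketch only (not part of the released script), the same matrix can be built in one pass with `np.bincount`. The helper names `confusion_matrix_bincount` and `iou_from_cm` are hypothetical, and the sketch assumes predictions are already in `0..num_classes-1`, which holds for an `argmax` over the class channel.

```python
import numpy as np

def confusion_matrix_bincount(y_true, y_pred, num_classes):
    """Confusion matrix with rows = true class, columns = predicted class."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Keep only pixels whose label is a valid class id; the nested loop in the
    # script likewise never counts labels outside 0..num_classes-1.
    valid = (y_true >= 0) & (y_true < num_classes)
    # Encode each (true, pred) pair as a single index into a flattened matrix.
    idx = y_true[valid].astype(np.int64) * num_classes + y_pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def iou_from_cm(cm):
    """Per-class IoU from a global confusion matrix; NaN for absent classes."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp   # predicted as c, true class differs
    fn = cm.sum(axis=1) - tp   # true class c, predicted class differs
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# Toy check against the nested-loop formulation (random labels, 8 classes).
rng = np.random.default_rng(0)
ys = rng.integers(0, 8, size=(2, 16, 16))
preds = rng.integers(0, 8, size=(2, 16, 16))

cm_fast = confusion_matrix_bincount(ys, preds, 8)
cm_loop = np.zeros((8, 8), dtype=np.int64)
for tc in range(8):
    for pc in range(8):
        cm_loop[tc, pc] = int(((ys == tc) & (preds == pc)).sum())

assert (cm_fast == cm_loop).all()
```

Summing these matrices over all validation batches and calling `iou_from_cm` once at the end gives the same "global" IoU as the script, with a single pass over the pixels per batch instead of one pass per class pair.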

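In the training loop above, `sched.step()` runs only when `(step+1) % GRAD_ACCUM == 0`, while `T_max` is set to `EPOCHS*len(train_loader)`. The sketch below uses the closed-form cosine annealing formula to estimate where the learning rate ends up under that setup. It is an illustration under stated assumptions, not a measurement from the actual run: `batches_per_epoch` is a hypothetical value standing in for `len(train_loader)`, and `cosine_lr` is a helper written for this example.

```python
import math

def cosine_lr(base_lr, t, t_max, eta_min=0.0):
    """Closed-form cosine annealing value after t scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

LR, EPOCHS, GRAD_ACCUM = 2e-5, 30, 4   # values from the script's config
batches_per_epoch = 1000               # hypothetical stand-in for len(train_loader)

# The scheduler is stepped once per optimizer step, not once per batch.
opt_steps_per_epoch = batches_per_epoch // GRAD_ACCUM
total_opt_steps = EPOCHS * opt_steps_per_epoch

# T_max counted in batches (as in the script) vs. in optimizer steps.
lr_end_as_written = cosine_lr(LR, total_opt_steps, EPOCHS * batches_per_epoch)
lr_end_matched = cosine_lr(LR, total_opt_steps, total_opt_steps)

print(f"final LR, T_max in batches:         {lr_end_as_written:.3e}")
print(f"final LR, T_max in optimizer steps: {lr_end_matched:.3e}")
```

With `GRAD_ACCUM = 4` the schedule as written reaches only a quarter of its cycle, so the final learning rate stays near 85% of `LR`. Setting `T_max` to the number of optimizer steps would let it decay to `eta_min`. Whether that change improves results is not something this sketch can show.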
v6_OOD_full_res.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec19068d5aaddf3d1afcf18b7f35a355bbe3ef3bfab59a18080253210373442f
+size 246499502
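
The three lines above are a Git LFS pointer: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. As a small sketch, a downloaded copy can be checked against such a pointer with the standard library. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path, pointer, chunk=1 << 20):
    """Return True if the file has the size and SHA-256 digest named in the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so large files do not have to fit in memory.
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ec19068d5aaddf3d1afcf18b7f35a355bbe3ef3bfab59a18080253210373442f\n"
    "size 246499502\n"
)
print(pointer["algo"], pointer["size"])
```

After downloading the video to a local path (for example `v6_OOD_full_res.mp4` in the working directory, an assumed location), `matches_pointer("v6_OOD_full_res.mp4", pointer)` would report whether the copy is intact.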