TwinLiteNet8: Real-time orchard segmentation for edge devices

A 0.44 M-parameter semantic-segmentation model adapted from TwinLiteNet for 7-class apple orchard scenes, designed to run >30 FPS on Jetson-class hardware for robotic navigation.

Drop-in lightweight alternative to WEN0256/Segformer85Mv1 for low-compute deployments.

Why "7-class" but 8 logit channels?

The model is trained to recognize 7 real classes (tree, ground, person, sky, road, mountain, building). The 8th label, background, is NOT treated as a real class: pixels that fall outside any labeled object are simply masked out of the loss (ignore_index=255). The 8th logit channel exists only to keep the architecture identical to the original TwinLiteNet shape; it is never trained and is forced to -inf before argmax at inference, so the model never outputs background.

This matches what you usually want from a robot's perception stack: "tell me what you DO recognize", not "tell me you don't know".
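
The inference-time masking described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic logits, not the code in predict.py; the helper name logits_to_mask and the constant UNUSED_CHANNEL are assumptions made for this example.

```python
import numpy as np

UNUSED_CHANNEL = 7  # index of the never-trained background logit (assumed name)

def logits_to_mask(logits: np.ndarray) -> np.ndarray:
    """Convert (B, 8, H, W) logits to (B, H, W) class ids in 0..6."""
    masked = logits.copy()                      # leave the caller's array intact
    masked[:, UNUSED_CHANNEL, :, :] = -np.inf   # channel 7 can never win the argmax
    return masked.argmax(axis=1).astype(np.uint8)

# Synthetic logits in which channel 7 holds the largest raw value everywhere
rng = np.random.default_rng(0)
demo_logits = rng.normal(size=(1, 8, 4, 6)).astype(np.float32)
demo_logits[:, UNUSED_CHANNEL] += 100.0

demo_mask = logits_to_mask(demo_logits)
```

Because the untrained channel carries arbitrary values, skipping this step could let class 7 appear in the output; masking before the argmax rules that out.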

Performance (temporal-split validation, no data leakage, like-for-like comparison)

| Metric | TwinLiteNet8 | Segformer-b5 (85 M) | Δ vs Segformer |
| --- | --- | --- | --- |
| Tree IoU | 0.872 | 0.742 | +13 pp ⭐ |
| Ground IoU | 0.916 | 0.851 | +6.5 pp |
| Person IoU | 0.441 | 0.72 | -28 pp |
| Sky IoU | 0.835 | 0.77 | +6.5 pp |
| Road IoU | 0.745 | 0.80 | -5.5 pp |
| Mountain IoU | 0.592 | 0.44 | +15 pp |
| Building IoU | 0.555 | 0.71 | -15.5 pp |
| mIoU (7 classes) | 0.708 | 0.714 | -0.6 pp |
| Model size | 1.8 MB | 339 MB | 188× smaller |
| Params | 0.437 M | 85 M | 194× fewer |

(Segformer numbers come from WEN0256/Segformer85Mv1. Both models tested on the same 155-frame temporal-split val from the original orchard recording, with the same "background pixels excluded" protocol so the IoUs are directly comparable.)
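
As an illustration of the "background pixels excluded" protocol, here is a minimal per-class IoU sketch that drops every pixel labeled 255 before counting. The function name per_class_iou is hypothetical, and whether the reported numbers accumulate counts per frame or over the whole val set is not stated here, so treat this as one plausible reading rather than the evaluation script.

```python
import numpy as np

IGNORE_INDEX = 255   # label value for excluded (background) pixels
NUM_CLASSES = 7      # real classes 0..6

def per_class_iou(pred, target, num_classes=NUM_CLASSES, ignore_index=IGNORE_INDEX):
    """Return a list of IoUs; classes absent from both pred and target give NaN."""
    valid = target != ignore_index          # keep only labeled pixels
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Tiny synthetic example: two pixels are ignored, whatever the model predicted there
target = np.array([[0, 0, 1, 255],
                   [1, 1, 255, 0]], dtype=np.uint8)
pred = np.array([[0, 1, 1, 3],
                 [1, 1, 2, 0]], dtype=np.uint8)

ious = per_class_iou(pred, target)
miou = float(np.nanmean(ious))   # mean over classes that actually occur
```

In this toy case class 0 scores 2/3 and class 1 scores 3/4; the predictions at the two ignored pixels do not affect either value.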

Headline: TwinLiteNet8 matches Segformer-b5 in overall mIoU (0.708 vs 0.714, a 0.6 pp gap that is likely within noise on a 155-frame val set) and beats it on the two classes that matter most for orchard navigation (tree, ground), while being ~200× smaller and an estimated 3–10× faster on Jetson devices. The trade-off is mainly on rare classes (person, building), where the small model's limited capacity shows; road is also slightly lower.
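
The summary figures can be re-derived from the per-class numbers in the table. The short check below only repeats that arithmetic; the variable names are illustrative.

```python
# Per-class IoUs for TwinLiteNet8, copied from the performance table
twinlite = {
    "tree": 0.872, "ground": 0.916, "person": 0.441, "sky": 0.835,
    "road": 0.745, "mountain": 0.592, "building": 0.555,
}

miou = sum(twinlite.values()) / len(twinlite)   # unweighted mean over 7 classes
size_ratio = 339 / 1.8                          # MB / MB
param_ratio = 85 / 0.437                        # M params / M params

print(f"mIoU = {miou:.3f}")                     # prints "mIoU = 0.708"
print(f"size ratio ~ {size_ratio:.1f}x, param ratio ~ {param_ratio:.1f}x")
```

The mIoU is a plain unweighted mean, so the rare classes count as much as tree and ground.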

FPS (640×360 input, batch 1)

| Device | TwinLiteNet8 | Segformer-b5 | Speedup |
| --- | --- | --- | --- |
| RTX 3080 (PyTorch fp32) | 137 FPS | ~50 FPS | 2.7× |
| RTX 5090 (PyTorch fp32) | ~500 FPS | ~150 FPS | 3.3× |
| Jetson Orin Nano (TRT FP16, est.) | ~34–46 FPS ⭐ | ~2–5 FPS | ~10× |
| Jetson Orin NX (TRT FP16, est.) | ~60–80 FPS | ~20 FPS | ~3× |

Target was 10–20 FPS on Orin Nano; the estimated 34–46 FPS is roughly double the upper end of that range.
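
How the FPS figures were measured is not described here. For readers who want to reproduce them, a generic timing harness might look like the sketch below; measure_fps and dummy_infer are hypothetical names, and the dummy workload stands in for a real model call.

```python
import time

def measure_fps(infer, warmup=10, iters=100):
    """Time `iters` calls of `infer` after `warmup` untimed calls; return calls/second."""
    for _ in range(warmup):          # warm-up: caches, lazy init, GPU clocks
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters / elapsed

def dummy_infer():
    """Stand-in workload; replace with a single batch-1 forward pass."""
    sum(i * i for i in range(1000))

fps = measure_fps(dummy_infer, warmup=2, iters=20)
```

When timing a GPU model, `infer` should block until the result is ready (for example by synchronizing the device or copying the output to the host), otherwise asynchronous kernel launches make the number look better than it is.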

Files

| File | Purpose |
| --- | --- |
| twinlite8_best.pt | PyTorch checkpoint (1.8 MB), epoch 29, best tree IoU 0.872 |
| twinlite8.onnx | ONNX export (1.8 MB), 100% argmax parity verified |
| predict.py | PyTorch inference (matches Segformer's API) |
| predict_onnx.py | ONNX-Runtime inference (CPU/CUDA/TensorRT auto-pick) |
| export_onnx.py | Re-export ONNX from any checkpoint |
| train_8class.py | Full training script (60 epochs, ~70 min on RTX 3080) |
| model/ | TwinLiteNet8 architecture (single-branch 8-output head, channel 7 = unused) |
| JETSON_DEPLOY.md | Step-by-step Jetson deployment + FPS table |
| samples_20/ | 20 OOD inference samples (original ‖ prediction overlay) |
| demo_twinlite_12s.mp4 | 12-s demo video (360 frames @ 30 FPS, original ‖ overlay) |
| samples/ | 6 in-domain validation samples |
| training_log.txt + history.json | Per-epoch metrics |

Quick Use (PyTorch)

import sys, cv2, torch
sys.path.insert(0, "<this_dir>")
from predict import load_model, predict, overlay

model = load_model("twinlite8_best.pt", device="cuda")
img = cv2.imread("orchard.jpg")
mask = predict(model, img)            # H×W uint8, values 0..6 (never 7)
viz = overlay(img, mask)
cv2.imwrite("out.jpg", viz)

Quick Use (ONNX, no PyTorch)

import cv2
import numpy as np
import onnxruntime as ort

# Prefer CUDA, fall back to CPU if the CUDA provider is unavailable
sess = ort.InferenceSession(
    "twinlite8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
img = cv2.imread("orchard.jpg")                          # BGR, H×W×3
inp = cv2.resize(img, (640, 360))                        # dsize is (width, height)
rgb = cv2.cvtColor(inp, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
x = np.ascontiguousarray(rgb.transpose(2, 0, 1)[None])   # 1×3×360×640
logits = sess.run(None, {"input": x})[0]                 # 1×8×360×640
logits[:, 7, :, :] = -1e9              # mask the unused background channel
mask = logits.argmax(1)[0].astype(np.uint8)  # 360×640 uint8, values 0..6

Classes (id → name)

| ID | Class | Color (BGR) |
| --- | --- | --- |
| 0 | tree (priority) | green |
| 1 | ground | brown |
| 2 | person | red |
| 3 | sky | cyan |
| 4 | road | gray |
| 5 | mountain | purple |
| 6 | building | yellow |
| 7 | (unused, never output) | - |
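
To show how a class-id mask turns into a color overlay, here is a small palette-lookup sketch. The table above only names the colors, so the BGR triplets below are assumed values chosen for illustration, and colorize and blend are hypothetical helpers; the overlay function shipped in predict.py may use different values and blending.

```python
import numpy as np

# Assumed BGR triplets for the named colors (not taken from the repo)
PALETTE_BGR = np.array([
    [0, 255, 0],      # 0 tree: green
    [19, 69, 139],    # 1 ground: brown
    [0, 0, 255],      # 2 person: red
    [255, 255, 0],    # 3 sky: cyan
    [128, 128, 128],  # 4 road: gray
    [128, 0, 128],    # 5 mountain: purple
    [0, 255, 255],    # 6 building: yellow
], dtype=np.uint8)

def colorize(mask: np.ndarray) -> np.ndarray:
    """Map an H×W mask of ids 0..6 to an H×W×3 BGR image via array indexing."""
    return PALETTE_BGR[mask]

def blend(img_bgr: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend the colorized mask over the original BGR image."""
    color = colorize(mask).astype(np.float32)
    out = (1.0 - alpha) * img_bgr.astype(np.float32) + alpha * color
    return np.clip(out.round(), 0, 255).astype(np.uint8)

demo_img = np.zeros((2, 3, 3), dtype=np.uint8)           # black 2×3 image
demo_mask = np.array([[0, 2, 3], [4, 6, 6]], dtype=np.uint8)
demo_color = colorize(demo_mask)
demo_viz = blend(demo_img, demo_mask)
```

Since id 7 is never output, the palette needs only seven rows.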

Architecture

Single-branch 8-output adaptation of TwinLiteNet:

  • Encoder: ESPNet (ESPNet_Encoder, p = 2, q = 3)
  • Decoder: 3 × UPx2 upsampling blocks
  • Head: 8-channel per-pixel classification head (7 real classes; channel 7 untrained, masked at inference)
  • Input: 640×360 BGR → ImageNet-style normalize
  • Output: (B, 8, H, W) logits

The original TwinLiteNet has two parallel decoder heads for two binary tasks (drivable area + lane lines). For multi-class semantic segmentation matching the Segformer setup, we kept one decoder branch and changed its final UPx2 block to output 8 channels. Final param count: 0.437 M.

Training Recipe

| Hyperparameter | Value |
| --- | --- |
| Optimizer | AdamW, weight_decay 1e-4 |
| LR | 5e-4, cosine schedule |
| Epochs | 60 |
| Batch | 16 |
| Resolution | 640×360 |
| Loss | weighted cross-entropy with ignore_index=255 |
| Class weights | tree 1.5, ground 0.5, person 1.5, sky 1.0, road 1.0, mountain 1.0, building 1.0, background 0.0 |
| Background handling | mask pixels remapped 7 → 255 so they never contribute to loss |
| Augmentation | hflip + HSV jitter |
| Hardware | RTX 3080, ~70 minutes total |
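
The loss and background handling in the table can be illustrated with a NumPy sketch of weighted cross-entropy with an ignore index. It follows the weighted-mean convention of torch.nn.CrossEntropyLoss (sum of weighted losses divided by the sum of the weights of the non-ignored targets). The function names are hypothetical and train_8class.py may be organized differently.

```python
import numpy as np

IGNORE_INDEX = 255
# tree, ground, person, sky, road, mountain, building, background
CLASS_WEIGHTS = np.array([1.5, 0.5, 1.5, 1.0, 1.0, 1.0, 1.0, 0.0])

def remap_background(labels: np.ndarray) -> np.ndarray:
    """Remap background label 7 to the ignore index 255."""
    out = labels.copy()
    out[out == 7] = IGNORE_INDEX
    return out

def weighted_ce(logits, labels, weights=CLASS_WEIGHTS, ignore_index=IGNORE_INDEX):
    """logits: (N, C) floats, labels: (N,) ints. Returns the weighted mean loss."""
    valid = labels != ignore_index                 # drop ignored pixels entirely
    logits, labels = logits[valid], labels[valid]
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(labels.size), labels]
    w = weights[labels]
    return float((w * nll).sum() / w.sum())

# Three pixels with uniform logits: tree, ground, and one background pixel
demo_labels = remap_background(np.array([0, 1, 7], dtype=np.uint8))
demo_logits = np.zeros((3, 8))
demo_loss = weighted_ce(demo_logits, demo_labels)
```

With uniform logits every valid pixel has loss ln 8, so the weighted mean is also ln 8 regardless of the class weights; the background pixel contributes nothing. A batch with no valid pixels would divide by zero and is not handled in this sketch.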

Dataset

Same dataset as WEN0256/Segformer85Mv1 v2:

  • ~5300 frames from oak_0415_oneRadar_1 (spring 2024 Korean apple orchard, single OAK-D camera)
  • 311 frames from "Orchard Navigation" (Sep autumn capture + Aug Windows-webcam capture)
  • Pseudo-mask labels generated by Segformer v1 to fill SAM-annotated gaps
  • Temporal split: frames ≤ 4500 → train, frames > 4500 → val (155 frames). No neighbor leakage.
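
The temporal split above amounts to a threshold on the frame index. A minimal sketch, assuming frames are identified by integer indices (the function name temporal_split and the demo ids are illustrative):

```python
SPLIT_FRAME = 4500  # threshold from the dataset description

def temporal_split(frame_ids, split_frame=SPLIT_FRAME):
    """Frames at or before the threshold go to train, later frames to val."""
    train = [f for f in frame_ids if f <= split_frame]
    val = [f for f in frame_ids if f > split_frame]
    return train, val

demo_ids = [10, 4499, 4500, 4501, 5200]
train_ids, val_ids = temporal_split(demo_ids)
```

Consecutive video frames are near-duplicates, so a random split would place almost identical images in both sets; cutting at a single point in time avoids that.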

Limitations (same as parent Segformer model)

  • Trained on a single Korean apple orchard, spring + partial autumn
  • ❌ Different orchards (different tree species/layouts): likely degraded
  • ❌ Winter (no leaves), night, rain: no training data
  • ❌ Aerial/drone perspectives: robot-eye view only
  • For a new deployment, plan to fine-tune on 100–300 in-domain frames (~13 min on a single GPU)

Deployment to Jetson

See JETSON_DEPLOY.md for the full pipeline:

  1. Export to ONNX (this repo already has twinlite8.onnx)
  2. On Jetson: trtexec --onnx=twinlite8.onnx --saveEngine=...engine --fp16
  3. Run via predict_onnx.py --provider TensorrtExecutionProvider or load the .engine via TRT API
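
For step 3, one way to choose an ONNX Runtime execution provider is to walk a preference list and keep whatever the local build supports. The sketch below is an assumption about how such an auto-pick could work, not the logic in predict_onnx.py; pick_providers is a hypothetical helper.

```python
# Preference order: TensorRT first, then CUDA, then CPU as the last resort
PREFERENCE = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def pick_providers(available, preference=PREFERENCE):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preference if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preference} found in {list(available)}")
    return chosen

# In a real run, pass onnxruntime.get_available_providers() as `available`
# and hand the result to onnxruntime.InferenceSession(..., providers=chosen).
demo_providers = pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```

Passing an ordered list lets ONNX Runtime fall back to the next provider when an earlier one cannot be initialized.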

License

Apache 2.0
