TwinLiteNet8 — Real-time orchard segmentation for edge devices

A 0.44 M-parameter semantic-segmentation model adapted from TwinLiteNet for 7-class apple orchard scenes, designed to run at more than 30 FPS on Jetson-class hardware for robotic navigation.

Drop-in lightweight alternative to WEN0256/Segformer85Mv1 for low-compute deployments.

Why "7-class" but 8 logit channels?

The model is trained to recognize 7 real classes (tree, ground, person, sky, road, mountain, building). The 8th label, background, is not treated as a real class: pixels that fall outside any labeled object are masked out of the loss (ignore_index=255). The 8th logit channel exists only to keep the architecture identical to the original TwinLiteNet shape; it never appears as a training target and is forced to -inf before argmax at inference, so the model never outputs background.
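The inference-time masking described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in predict.py; the helper name `argmax_without_background` and the constant `UNUSED_CHANNEL` are made up for this example.

```python
import numpy as np

UNUSED_CHANNEL = 7  # the background slot that is never a training target


def argmax_without_background(logits: np.ndarray) -> np.ndarray:
    """Return a (B, H, W) uint8 class map with values in 0..6.

    `logits` has shape (B, 8, H, W). Channel 7 is set to -inf on a copy,
    so argmax can only select one of the 7 real classes.
    """
    masked = logits.astype(np.float32, copy=True)
    masked[:, UNUSED_CHANNEL] = -np.inf
    return masked.argmax(axis=1).astype(np.uint8)


# Even when channel 7 has the largest raw logit, it is never selected.
demo = np.zeros((1, 8, 2, 2), dtype=np.float32)
demo[0, 7] = 10.0   # unused channel "wins" before masking
demo[0, 3] = 1.0    # sky is the best real class
pred = argmax_without_background(demo)
```

Working on a copy keeps the caller's logits intact, which helps if the raw scores are also needed for confidence estimates.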

This matches what you usually want from a robot's perception stack: "tell me what you DO recognize", not "tell me you don't know".

Performance (temporal-split validation, no data leakage, like-for-like comparison)

Metric	TwinLiteNet8	Segformer-b5 (85 M)	Δ vs Segformer
Tree IoU	0.872	0.742	+13 pp ⭐
Ground IoU	0.916	0.851	+6.5 pp
Person IoU	0.441	0.72	-28 pp
Sky IoU	0.835	0.77	+6.5 pp
Road IoU	0.745	0.80	-5.5 pp
Mountain IoU	0.592	0.44	+15 pp
Building IoU	0.555	0.71	-16 pp
mIoU (7 classes)	0.708	0.714	-0.6 pp
Model size	1.8 MB	339 MB	188× smaller
Params	0.437 M	85 M	194× fewer

(Segformer numbers come from WEN0256/Segformer85Mv1. Both models tested on the same 155-frame temporal-split val from the original orchard recording, with the same "background pixels excluded" protocol so the IoUs are directly comparable.)
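As an illustration of the "background pixels excluded" protocol, the sketch below computes per-class IoU while skipping pixels labeled 255. It is an assumption about how such an evaluation could look, not the evaluation script behind the table; `per_class_iou` is a hypothetical helper, and a real run would accumulate intersections and unions over all 155 frames rather than a single array.

```python
import numpy as np

IGNORE = 255      # label for pixels excluded from evaluation
NUM_CLASSES = 7   # real classes, ids 0..6


def per_class_iou(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """IoU per class over pixels whose target is not IGNORE.

    Classes absent from both prediction and target get NaN so that
    they do not drag down the mean.
    """
    valid = target != IGNORE
    p, t = pred[valid], target[valid]
    ious = np.full(NUM_CLASSES, np.nan)
    for c in range(NUM_CLASSES):
        inter = np.sum((p == c) & (t == c))
        union = np.sum((p == c) | (t == c))
        if union > 0:
            ious[c] = inter / union
    return ious


# Toy 2x4 frame: the last column is unlabeled and must not affect the score.
target = np.array([[0, 0, 1, 255],
                   [0, 1, 1, 255]])
pred = np.array([[0, 1, 1, 3],
                 [0, 1, 1, 3]])
ious = per_class_iou(pred, target)
miou = np.nanmean(ious)  # mean over the classes that actually occur
```

In the toy frame the predictions in the unlabeled column (class 3) are ignored entirely, so only tree and ground contribute to the mean.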

Headline: TwinLiteNet8 matches Segformer-b5 in overall mIoU (0.708 vs 0.714, within noise) and beats it on the two classes that matter most for orchard navigation (tree, ground), while being ~190× smaller and an estimated 3–10× faster on Jetson hardware. The trade-off shows on the rare classes (person, building) and, to a lesser degree, road, where the small model's limited capacity costs accuracy.

FPS (640×360 input, batch 1)

Device	TwinLiteNet8	Segformer-b5	Speedup
RTX 3080 (PyTorch fp32)	137 FPS	~50	2.7×
RTX 5090 (PyTorch fp32)	~500 FPS	~150	3.3×
Jetson Orin Nano (TRT FP16, est)	~34–46 FPS ⭐	~2–5	~10×
Jetson Orin NX (TRT FP16, est)	~60–80 FPS	~20	~3×

The target was 10–20 FPS on Orin Nano; the estimated 34–46 FPS is roughly double the top of that range.
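To reproduce FPS figures like these on your own hardware, a simple timing loop is usually enough. The sketch below is a generic harness under stated assumptions, not the benchmark used for the table: `measure_fps` and `run_once` are hypothetical names, and the workload shown is a stand-in for a real forward pass on a 640×360 frame.

```python
import time
from typing import Callable


def measure_fps(run_once: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Average frames per second of `run_once` after a warm-up phase."""
    for _ in range(warmup):
        run_once()          # lets caches, JIT and clocks settle
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed


# Stand-in workload; replace with a real forward pass on a 640x360 frame.
fps = measure_fps(lambda: sum(range(1000)), warmup=2, iters=20)
```

When timing PyTorch on a GPU, `run_once` should end with `torch.cuda.synchronize()`, because CUDA kernels are launched asynchronously and the loop would otherwise measure launch time only.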

Files

File	Purpose
`twinlite8_best.pt`	PyTorch checkpoint (1.8 MB), epoch 29, best tree IoU 0.872
`twinlite8.onnx`	ONNX export (1.8 MB), 100% argmax parity verified
`predict.py`	PyTorch inference (matches Segformer's API)
`predict_onnx.py`	ONNX-Runtime inference (CPU/CUDA/TensorRT auto-pick)
`export_onnx.py`	Re-export ONNX from any checkpoint
`train_8class.py`	Full training script (60 epochs, ~70 min on RTX 3080)
`model/`	TwinLiteNet8 architecture (single-branch 8-output head, channel 7 = unused)
`JETSON_DEPLOY.md`	Step-by-step Jetson deployment + FPS table
`samples_20/`	20 OOD inference samples (original ‖ prediction overlay)
`demo_twinlite_12s.mp4`	12-s demo video (360 frames @ 30 FPS, original ‖ overlay)
`samples/`	6 in-domain validation samples
`training_log.txt` + `history.json`	Per-epoch metrics

Quick Use (PyTorch)

import sys, cv2, torch
sys.path.insert(0, "<this_dir>")
from predict import load_model, predict, overlay

model = load_model("twinlite8_best.pt", device="cuda")
img = cv2.imread("orchard.jpg")
mask = predict(model, img)            # H×W uint8, values 0..6 (never 7)
viz = overlay(img, mask)
cv2.imwrite("out.jpg", viz)

Quick Use (ONNX, no PyTorch)

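import cv2
import numpy as np
import onnxruntime as ort

# Fall back to CPU when the CUDA provider is not available.
sess = ort.InferenceSession(
    "twinlite8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
img = cv2.imread("orchard.jpg")
inp = cv2.resize(img, (640, 360))       # cv2 takes (width, height)
rgb = cv2.cvtColor(inp, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
x = np.ascontiguousarray(rgb.transpose(2, 0, 1)[None])   # (1, 3, 360, 640)
input_name = sess.get_inputs()[0].name  # avoids hard-coding the tensor name
logits = sess.run(None, {input_name: x})[0]               # (1, 8, 360, 640)
logits[:, 7, :, :] = -1e9               # mask the unused background channel
mask = logits.argmax(1)[0].astype(np.uint8)   # 360×640 uint8, values 0..6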

Classes (id → name)

ID	Class	Color (BGR)
0	tree (priority)	green
1	ground	brown
2	person	red
3	sky	cyan
4	road	gray
5	mountain	purple
6	building	yellow
7	(unused — never output)	—
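A class-id mask can be turned into a color overlay with a palette lookup. The sketch below is illustrative only: the exact BGR values are assumptions picked to match the color names in the table, and `colorize` and `blend` are hypothetical helpers, not the `overlay` function shipped in predict.py.

```python
import numpy as np

# Hypothetical BGR values chosen to match the color names in the table;
# the palette actually used by overlay() in predict.py may differ.
PALETTE_BGR = np.array([
    [0, 255, 0],      # 0 tree: green
    [19, 69, 139],    # 1 ground: brown
    [0, 0, 255],      # 2 person: red
    [255, 255, 0],    # 3 sky: cyan
    [128, 128, 128],  # 4 road: gray
    [128, 0, 128],    # 5 mountain: purple
    [0, 255, 255],    # 6 building: yellow
], dtype=np.uint8)


def colorize(mask: np.ndarray) -> np.ndarray:
    """Map an (H, W) mask of ids 0..6 to an (H, W, 3) BGR image."""
    return PALETTE_BGR[mask]


def blend(image_bgr: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend the colorized mask over the image."""
    color = colorize(mask).astype(np.float32)
    out = (1.0 - alpha) * image_bgr.astype(np.float32) + alpha * color
    return out.round().astype(np.uint8)


mask = np.array([[0, 3], [4, 6]], dtype=np.uint8)
frame = np.zeros((2, 2, 3), dtype=np.uint8)   # black stand-in image
viz = blend(frame, mask)
```

Because the mask never contains id 7, the palette needs only seven rows; an id of 7 would raise an IndexError, which doubles as a cheap sanity check.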

Architecture

Single-branch 8-output adaptation of TwinLiteNet:

Encoder: ESPNet (ESPNet_Encoder, p = 2, q = 3)
Decoder: 3 × UPx2 upsampling blocks
Head: 8-channel softmax (7 real classes; channel 7 untrained, masked at inference)
Input: 640×360 BGR → ImageNet-style normalize
Output: (B, 8, H, W) logits

The original TwinLiteNet has two parallel decoder heads for two binary tasks (drivable area + lane lines). For multi-class semantic segmentation matching the Segformer setup, we kept one decoder branch and changed its final UPx2 to output 8 channels. Final param count: 0.437 M.

Training Recipe

Hyperparameter	Value
Optimizer	AdamW, weight_decay 1e-4
LR	5e-4, cosine schedule
Epochs	60
Batch	16
Resolution	640×360
Loss	weighted cross-entropy with `ignore_index=255`
Class weights	tree 1.5, ground 0.5, person 1.5, sky 1.0, road 1.0, mountain 1.0, building 1.0, background 0.0
Background handling	mask pixels remapped 7 → 255 so they never contribute to loss
Augmentation	hflip + HSV jitter
Hardware	RTX 3080, ~70 minutes total
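The loss and background handling in the table can be illustrated with a small NumPy reimplementation. This is a sketch for understanding, not the code in train_8class.py, which presumably uses PyTorch; `remap_background` and `weighted_ce` are hypothetical names. The normalization follows the weighted-mean convention of PyTorch's `CrossEntropyLoss` with `weight`, `ignore_index` and `reduction="mean"`.

```python
import numpy as np

IGNORE = 255
# Weights from the table, indexed by class id 0..7 (7 = background).
CLASS_WEIGHTS = np.array([1.5, 0.5, 1.5, 1.0, 1.0, 1.0, 1.0, 0.0])


def remap_background(labels: np.ndarray) -> np.ndarray:
    """Replace background id 7 with IGNORE so it is skipped by the loss."""
    out = labels.copy()
    out[out == 7] = IGNORE
    return out


def weighted_ce(logits: np.ndarray, labels: np.ndarray) -> float:
    """Weighted cross-entropy over (N, 8) logits and (N,) labels.

    Uses the weighted-mean convention: sum(w_y * nll) / sum(w_y),
    taken over pixels whose label is not IGNORE.
    """
    keep = labels != IGNORE
    z, y = logits[keep], labels[keep]
    z = z - z.max(axis=1, keepdims=True)            # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_prob[np.arange(len(y)), y]
    w = CLASS_WEIGHTS[y]
    return float((w * nll).sum() / w.sum())


# Four pixels: tree, ground, and two background pixels that get ignored.
labels = remap_background(np.array([0, 1, 7, 7]))
logits = np.zeros((4, 8))          # uniform prediction over 8 channels
loss = weighted_ce(logits, labels)  # equals log(8) for uniform logits
```

Since background pixels are remapped to 255 before the loss sees them, the 0.0 weight for class 7 is never actually applied; it acts as a second safeguard.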

Dataset

Same dataset as WEN0256/Segformer85Mv1 v2:

~5300 frames from oak_0415_oneRadar_1 (spring 2024 Korean apple orchard, single OAK-D camera)
311 frames from "Orchard Navigation" (September autumn capture + August Windows-webcam capture)
Pseudo-mask labels generated by Segformer v1 to fill SAM-annotated gaps
Temporal split: frames ≤ 4500 → train, frames > 4500 → val (155 frames). Splitting by time keeps near-identical neighboring frames from landing on both sides of the split.
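The split rule can be written down directly. The sketch below assumes each frame carries an integer index from the recording; `temporal_split` is a hypothetical helper, not a function from this repository.

```python
from typing import Iterable, List, Tuple

SPLIT_FRAME = 4500  # threshold stated in the dataset description


def temporal_split(frame_ids: Iterable[int], split: int = SPLIT_FRAME) -> Tuple[List[int], List[int]]:
    """Frames with id <= split go to train, later frames go to val."""
    train, val = [], []
    for fid in sorted(frame_ids):
        (train if fid <= split else val).append(fid)
    return train, val


train_ids, val_ids = temporal_split([4499, 4500, 4501, 10, 5200])
```

A random split over video frames would place almost identical consecutive frames in both sets and inflate validation scores; a single time threshold avoids that by construction.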

Limitations (same as parent Segformer model)

Trained on a single Korean apple orchard, spring + partial autumn
❌ Different orchards (different tree species/layouts) — likely degraded
❌ Winter (no leaves), night, rain — no training data
❌ Aerial/drone perspectives — robot-eye view only
For a new deployment, plan to fine-tune on 100–300 in-domain frames (~13 min on a single GPU)

Deployment to Jetson

See JETSON_DEPLOY.md for the full pipeline:

Export to ONNX (this repo already has twinlite8.onnx)
On Jetson: trtexec --onnx=twinlite8.onnx --saveEngine=...engine --fp16
Run via predict_onnx.py --provider TensorrtExecutionProvider or load the .engine via TRT API
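For the ONNX Runtime route, choosing an execution provider can be reduced to a preference list. The sketch below is one plausible way to do it and is not necessarily how predict_onnx.py implements its auto-pick; `pick_providers` and the preference order are assumptions. The provider strings themselves are the standard ONNX Runtime names.

```python
from typing import List, Sequence

# Preference order assumed for illustration: fastest backend first.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are available, in priority order."""
    chosen = [p for p in PREFERRED if p in available]
    if not chosen:
        raise RuntimeError(f"no supported execution provider in {list(available)}")
    return chosen


# On a real device the list would come from onnxruntime.get_available_providers().
jetson_like = pick_providers(
    ["CPUExecutionProvider", "TensorrtExecutionProvider", "CUDAExecutionProvider"]
)
cpu_only = pick_providers(["CPUExecutionProvider"])
```

The returned list can be passed as `providers=` to `onnxruntime.InferenceSession`, which tries the entries in order and falls back to the next one if a provider cannot be initialized.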

License

Apache 2.0
