What is the miou of this model? Is it better, worse, or identical to the official convnext+upernet weights?
Sorry if this is listed somewhere else; I looked and could not find it.
The official convnext repo lists 46.0 miou for this configuration (convnext-tiny + upernet).
However, this smp version appears to use a different encoder (convnext_tiny.in12k_ft_in1k from timm) than the official one. Presumably, it also uses different weights for the upernet decoder.
When I tried evaluating the miou at 512*512 resolution, I got 44.5 for the smp-hub version. Is this the expected value for this model?
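For reference, a minimal sketch of the kind of miou computation meant here, in case the evaluation protocol is the source of the gap. It assumes the confusion matrix is accumulated over the whole validation set before the per-class IoU is averaged, and that label 255 marks ignored pixels; both are assumptions, not something confirmed for this model. The names `confusion_matrix` and `mean_iou` are hypothetical helpers.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) pairs; rows are ground truth, columns are predictions.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    mask = target != ignore_index
    idx = target[mask].astype(np.int64) * num_classes + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Mean of per-class IoU, skipping classes that never appear."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())
```

To evaluate a dataset this way, sum the confusion matrices of all images and call `mean_iou` once at the end. Averaging per-image miou values instead generally gives a different number.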
I've trained my own decoder (43.3 miou, segformer instead of upernet) that is much faster (~100 megapixels per second on my cpu instead of 33). I can share more details of my training configuration if interested.
I would be interested to know what the training configuration for this upernet decoder is if you're able to share it.
The miou of this version goes up to 45.96 if I use sliding window inference:
- Resize short edge to 512
- Apply the model to overlapping 512*512 pixel crops with a stride of 341
- Aggregate predictions for overlapping regions
So, I think these weights match the miou of the paper.
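The sliding window steps above can be sketched roughly as follows. This is not the exact evaluation code: it assumes the image has already been resized so its short edge is 512, that the last window in each direction is shifted to sit flush with the image border, and that overlapping predictions are aggregated by averaging the class scores. `predict_fn` is a hypothetical stand-in for the model's forward pass on a single crop.

```python
import numpy as np

def window_starts(length, crop, stride):
    """Start offsets that cover [0, length); the last window is flush with the edge."""
    if length <= crop:
        return [0]
    starts = list(range(0, length - crop, stride))
    starts.append(length - crop)
    return starts

def sliding_window_predict(image, predict_fn, num_classes, crop=512, stride=341):
    """Average per-class scores over overlapping crops.

    image: (H, W, C) array with H and W at least `crop`.
    predict_fn: hypothetical callable mapping an (h, w, C) crop to
        (num_classes, h, w) scores.
    """
    h, w = image.shape[:2]
    scores = np.zeros((num_classes, h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)
    for top in window_starts(h, crop, stride):
        for left in window_starts(w, crop, stride):
            patch = image[top:top + crop, left:left + crop]
            scores[:, top:top + crop, left:left + crop] += predict_fn(patch)
            counts[top:top + crop, left:left + crop] += 1
    # Every pixel is covered by at least one window, so counts is never zero.
    return scores / counts
```

Taking `argmax` over the first axis of the result gives the label map used for the miou computation.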