What is the mIoU of this model? Is it better, worse, or identical to the official ConvNeXt + UPerNet weights?

#1
by danjacobellis - opened

Sorry if this is listed somewhere else; I looked and could not find it.

The official ConvNeXt repo lists 46.0 mIoU for this configuration (ConvNeXt-Tiny + UPerNet).

However, this smp version appears to use a different encoder (convnext_tiny.in12k_ft_in1k from timm) than the official one. Presumably, it also uses different weights for the UPerNet decoder.

When I evaluated the mIoU at 512*512 resolution, I got 44.5 for the smp-hub version. Is this the expected value for this model?
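For reference, here is a minimal sketch of how mIoU is commonly computed from a confusion matrix accumulated over the validation set. The function names and the `ignore_index=255` convention are assumptions for illustration, not necessarily the exact evaluation code behind the 44.5 figure.

```python
import numpy as np

def update_confusion(conf, pred, target, num_classes, ignore_index=255):
    """Accumulate one image's predictions into a (num_classes, num_classes) matrix.

    Rows index the ground-truth class, columns the predicted class.
    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    mask = target != ignore_index
    idx = target[mask].astype(np.int64) * num_classes + pred[mask].astype(np.int64)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

def mean_iou(conf):
    """Mean over classes of TP / (TP + FP + FN), skipping classes that never occur."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())
```

Accumulating a single matrix over the whole dataset and computing IoU once at the end generally gives a different number than averaging per-image IoU, so it is worth checking which variant is used when comparing against a published value.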

I've trained my own decoder (43.3 mIoU, SegFormer instead of UPerNet) that is much faster (~100 megapixels per second on my CPU, compared with 33). I can share more details of my training configuration if interested.

I would be interested to know what the training configuration for this upernet decoder is if you're able to share it.

The mIoU of this version goes up to 45.96 if I use sliding window inference:

  • Resize short edge to 512
  • Apply the model to overlapping 512*512 pixel crops with a stride of 341
  • Aggregate predictions for overlapping regions
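The steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the exact evaluation script: `predict_fn` is a hypothetical callable standing in for the model, the short-edge resize is assumed to have been done beforehand, and averaging logits over overlapping crops is one assumed way to aggregate.

```python
import numpy as np

def sliding_window_logits(image, predict_fn, num_classes, crop=512, stride=341):
    """Average per-pixel logits over overlapping crops.

    image:      (C, H, W) array, already resized so the short edge equals `crop`.
    predict_fn: hypothetical callable mapping a (C, h, w) crop to
                (num_classes, h, w) logits.
    """
    _, height, width = image.shape
    logits = np.zeros((num_classes, height, width), dtype=np.float64)
    counts = np.zeros((1, height, width), dtype=np.float64)

    # Number of windows needed along each axis to cover the whole image.
    rows = max(height - crop + stride - 1, 0) // stride + 1
    cols = max(width - crop + stride - 1, 0) // stride + 1

    for r in range(rows):
        for c in range(cols):
            # Clamp the last window to the border so every crop is full size.
            y2 = min(r * stride + crop, height)
            x2 = min(c * stride + crop, width)
            y1 = max(y2 - crop, 0)
            x1 = max(x2 - crop, 0)
            logits[:, y1:y2, x1:x2] += predict_fn(image[:, y1:y2, x1:x2])
            counts[:, y1:y2, x1:x2] += 1

    # Every pixel is covered at least once, so counts is never zero.
    return logits / counts
```

The predicted class map is then `sliding_window_logits(...).argmax(axis=0)`. With a 512-pixel crop and a stride of 341, neighbouring windows overlap by roughly a third.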

So, I think these weights match the mIoU reported in the paper.

danjacobellis changed discussion status to closed
