Fine-tuning NTv3_650M_post on Tomato Benchmark Fails Due to Unsupported Species Token (falls back to MASK token id=2, but id >= 6 is required)

#2
by liujh0223 - opened

I’m fine-tuning the InstaDeepAI/NTv3_650M_post model using the notebook 03_fine_tuning_posttrained_model_biwig.ipynb. The fine-tuning dataset is the tomato dataset from the benchmark.

When loading the species, I get a warning that tomato is not in the list of supported species, so the code falls back to using the MASK species token (shown as tensor([2])) as a substitute.

However, when training starts, it fails with an error saying that the species token id must be >= 6.
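To make the failure mode concrete, here is a minimal sketch of the fallback and the validity check as I understand them from the warning and the error message. The vocabulary below, the function names, and the assumption that ids 0 to 5 are reserved for special tokens are all hypothetical; only the MASK id (2) and the threshold (6) come from the messages above. This is not the actual NTv3 code.

```python
# Values taken from the warning and error message reported above.
MASK_TOKEN_ID = 2          # fallback species token shown as tensor([2])
MIN_SPECIES_TOKEN_ID = 6   # training requires species token id >= 6

# HYPOTHETICAL vocabulary for illustration only; the real NTv3 species
# list and its token ids may differ.
SPECIES_TO_ID = {"human": 6, "mouse": 7, "arabidopsis": 8}


def resolve_species_token(species, vocab=SPECIES_TO_ID):
    """Return the species token id, falling back to MASK for unknown species."""
    return vocab.get(species, MASK_TOKEN_ID)


def check_species(species, vocab=SPECIES_TO_ID):
    """Fail early, before training starts, if the species resolves to an invalid id."""
    token_id = resolve_species_token(species, vocab)
    if token_id < MIN_SPECIES_TOKEN_ID:
        raise ValueError(
            f"species {species!r} resolved to token id {token_id}, "
            f"but the id must be >= {MIN_SPECIES_TOKEN_ID}"
        )
    return token_id
```

Under these assumptions, check_species("tomato") raises immediately, because the MASK fallback id (2) is below the threshold, which matches the behaviour I see when training starts.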

Has anyone run into this, and what is the correct way to fine-tune on tomato (unsupported species) with NTv3?

I added a new tomato head and initialized it with the Arabidopsis head weights. This indirectly makes the model “support” tomato, so it no longer falls back to the MASK species token, and training can proceed.
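For reference, a rough sketch of the weight-copy step, assuming the per-species head parameters live in a state-dict-like mapping under a common key prefix. The prefixes "heads.arabidopsis." and "heads.tomato." and the helper name are hypothetical and not taken from the NTv3 code; the real key layout should be checked in the checkpoint. The sketch only covers the head weights and does not cover registering tomato as a species.

```python
import copy


def clone_head_weights(state_dict, src_prefix, dst_prefix):
    """Return a new mapping with every `src_prefix` entry duplicated under `dst_prefix`.

    Values are deep-copied so the new head can be trained without
    sharing storage with the source head.
    """
    if any(key.startswith(dst_prefix) for key in state_dict):
        raise KeyError(f"entries with prefix {dst_prefix!r} already exist")

    cloned = {
        dst_prefix + key[len(src_prefix):]: copy.deepcopy(value)
        for key, value in state_dict.items()
        if key.startswith(src_prefix)
    }
    if not cloned:
        raise KeyError(f"no entries found with prefix {src_prefix!r}")

    merged = dict(state_dict)  # shallow copy; the input mapping is left untouched
    merged.update(cloned)
    return merged


# Toy example with plain lists standing in for tensors (hypothetical keys).
toy_state = {
    "backbone.weight": [1.0, 2.0],
    "heads.arabidopsis.weight": [0.5, -0.5],
    "heads.arabidopsis.bias": [0.1],
}
new_state = clone_head_weights(toy_state, "heads.arabidopsis.", "heads.tomato.")
```

The same helper should work on a PyTorch state dict, since copy.deepcopy also copies tensors, but the model would additionally need a tomato head module with matching parameter shapes before the new entries can be loaded.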

Is this a reasonable and valid workaround?
