nasa-impact/bert-e-base-mlm


	This model is scibert-base further pretrained with a masked language modeling (MLM) loss. The training corpus consists of abstracts from roughly 270,000 earth science publications.
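For readers unfamiliar with the MLM objective, here is a minimal sketch of how masked inputs and labels are usually built. The 15% selection rate and the 80/10/10 replacement split are the defaults from the original BERT recipe and are assumptions here; this card does not state the settings used for this model. The names `mask_tokens`, `MASK_TOKEN`, and `IGNORE` are illustrative only.

```python
import random

MASK_TOKEN = "[MASK]"
IGNORE = None  # marks positions that contribute nothing to the MLM loss


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for one sequence under BERT-style masking.

    Assumed defaults (not confirmed for this model): each token is selected
    with probability mask_prob; a selected token becomes [MASK] 80% of the
    time, a random vocabulary token 10%, and stays unchanged 10%.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK_TOKEN)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE)  # unselected position, no loss
    return inputs, labels
```

The loss is then the cross-entropy between the model's predictions and `labels`, computed only at positions where the label is not `IGNORE`.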

	The tokenizer was trained on the same corpus and can be loaded with AutoTokenizer.
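A minimal loading sketch with the `transformers` library, assuming the repository id is `nasa-impact/bert-e-base-mlm` and that the checkpoint includes an MLM head. The helper names `build_fill_mask` and `demo` and the example sentence are illustrative, not part of the model.

```python
MODEL_ID = "nasa-impact/bert-e-base-mlm"  # assumed Hub repository id


def build_fill_mask(model_id=MODEL_ID):
    """Load the tokenizer and MLM model and wrap them in a fill-mask pipeline."""
    # Imported here so the module can be inspected without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return pipeline("fill-mask", model=model, tokenizer=tokenizer)


def demo():
    """Print the top predictions for one masked earth science sentence."""
    fill = build_fill_mask()
    # Use the tokenizer's own mask token rather than hard-coding "[MASK]".
    text = f"Sea surface {fill.tokenizer.mask_token} anomalies affect monsoon rainfall."
    for pred in fill(text, top_k=5):
        print(pred["token_str"], round(pred["score"], 3))
```

Calling `demo()` downloads the weights on first use and prints five candidate tokens with their scores.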

	Stay tuned for further downstream task tests and updates to the model.

	In the works:
	- MLM + NSP task loss
	- Add more data sources for training
	- Test using downstream tasks