---
language:
- en
---

## Model Description

arXivBERT is a series of models, each continually pre-trained on arXiv papers from a different time period. If you are looking for the best performance on scientific corpora, use the 2020 model directly.

## Why arXivBERT?

1. Specialized in Scientific Content: Trained on a large dataset of arXiv papers, ensuring high familiarity with scientific terminology and concepts (see the fill-mask sketch after this list).

2. Versatile in Applications: Suitable for a range of NLP tasks, including but not limited to text classification, keyword extraction, summarization of scientific papers, and citation prediction.

3. Evolutionary Insights: Continual pre-training captures the long-term relationships and changes within the corpus.
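
As a quick illustration of item 1, a fill-mask probe shows how the model handles scientific vocabulary. This is a minimal sketch, assuming the checkpoints keep their masked-LM head and use BERT's `[MASK]` token; the paths are the same placeholders used in the loading example below.

```python
from transformers import pipeline

# Minimal sketch: probe scientific vocabulary with a fill-mask query.
# Assumes the checkpoint keeps its masked-LM head and uses [MASK];
# both paths are placeholders, as in the loading example below.
fill_mask = pipeline(
    "fill-mask",
    model="folderPath/year",
    tokenizer="folderPath/wholewordtokenizer",
)

for pred in fill_mask("Gradient descent minimizes the [MASK] function."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```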

## How to Use?

```python
from transformers import AutoTokenizer, AutoModel

# Load the shared whole-word tokenizer, then the checkpoint for the
# desired time period (replace "year" with the target year, e.g. "2020").
tokenizer = AutoTokenizer.from_pretrained("folderPath/wholewordtokenizer")
model = AutoModel.from_pretrained("folderPath/year")
```
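
Once loaded, the checkpoint can be used like any BERT-style encoder. Below is a minimal sketch of extracting a sentence embedding, e.g. as features for the classification tasks listed above; mean pooling is one common choice here, not a method prescribed by this card.

```python
import torch

# Encode a sentence and mean-pool the token embeddings into a single
# vector, masking out padding positions. Mean pooling is one common
# choice for sentence-level features, not prescribed by this card.
inputs = tokenizer(
    "We propose a transformer-based model for citation prediction.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, hidden_size])
```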