---
language:
- en
---

## Model Description

arXivBERT is a series of models, each continually pre-trained on arXiv papers from a different time period. If you are looking for the best performance on scientific corpora, use the 2020 model directly.

## Why arXivBERT?

1. Specialized in Scientific Content: Trained on a large dataset of arXiv papers, ensuring high familiarity with scientific terminology and concepts (see the fill-mask sketch after this list).

2. Versatile in Applications: Suitable for a range of NLP tasks, including but not limited to text classification, keyword extraction, summarization of scientific papers, and citation prediction.

3. Evolutionary Insights: Continual pre-training captures the long-term relationships and changes within the corpus.
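
As a quick illustration of item 1, a fill-mask probe shows how the model handles scientific vocabulary. This is a minimal sketch, assuming the checkpoints keep their masked-LM head and use BERT's `[MASK]` token; the paths are the same placeholders used in the loading example below.

```python
from transformers import pipeline

# Minimal sketch: probe scientific vocabulary with a fill-mask query.
# Assumes the checkpoint keeps its masked-LM head and uses [MASK];
# both paths are placeholders, as in the loading example below.
fill_mask = pipeline(
    "fill-mask",
    model="folderPath/year",
    tokenizer="folderPath/wholewordtokenizer",
)

for pred in fill_mask("Gradient descent minimizes the [MASK] function."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```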

## How to Use?

```python
from transformers import AutoTokenizer, AutoModel

# Load the shared whole-word tokenizer, then the checkpoint for the
# desired time period (replace "year" with the target year, e.g. "2020").
tokenizer = AutoTokenizer.from_pretrained("folderPath/wholewordtokenizer")
model = AutoModel.from_pretrained("folderPath/year")
```
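
Once loaded, the checkpoint can be used like any BERT-style encoder. Below is a minimal sketch of extracting a sentence embedding, e.g. as features for the classification tasks listed above; mean pooling is one common choice here, not a method prescribed by this card.

```python
import torch

# Encode a sentence and mean-pool the token embeddings into a single
# vector, masking out padding positions. Mean pooling is one common
# choice for sentence-level features, not prescribed by this card.
inputs = tokenizer(
    "We propose a transformer-based model for citation prediction.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # torch.Size([1, hidden_size])
```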