jd445 committed on
Commit 5caae66 · verified · 1 Parent(s): c73ee0e

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -3,21 +3,20 @@ language:
  - en
  ---
  ## Model Description
- arXivBERT is a cutting-edge language model specifically trained on a comprehensive corpus of scientific papers from the arXiv database, spanning from 2008 to 2020. This model leverages the robust architecture of RoBERTa and is fine-tuned to grasp the intricacies and nuances of academic language, making it an ideal tool for NLP tasks within the scientific domain.
-
+ arXivBERT is a series of models trained on a time-based unit. If you are looking for the best performance on scientific corpora, please use the model from 2020 directly.
 
  ## Why arXivBERT?
  1. Specialized in Scientific Content: Trained on a large dataset of arXiv papers, ensuring high familiarity with scientific terminology and concepts.
  2. Versatile in Applications: Suitable for a range of NLP tasks, including but not limited to text classification, keyword extraction, summarization of scientific papers, and citation prediction.
- 3. Evolutionary Insights: Offers unique insights into the evolution of scientific discourse and trends over a significant period (2008-2020).
+ 3. Evolutionary Insights: Continuous pre-training captures the long-term relationships and changes within the corpus.
 
  ## How to Use?
 
  ```
  from transformers import AutoTokenizer, AutoModel
 
- tokenizer = AutoTokenizer.from_pretrained("folderPath")
- model = AutoModel.from_pretrained("folderPath")
+ tokenizer = AutoTokenizer.from_pretrained("folderPath/year")
+ model = AutoModel.from_pretrained("folderPath/wholewordtokenizer")
 
  ```
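
The updated loading snippet can be wrapped in a small helper that picks one yearly snapshot. This is a minimal sketch, assuming the local layout the README implies (`folderPath/<year>`); `folderPath` is a placeholder from the README, not a real path, and `checkpoint_path`/`load_arxivbert` are hypothetical names introduced here for illustration.

```python
# Sketch: load a year-specific arXivBERT checkpoint from a local folder.
# "folderPath" and the per-year subfolder layout are placeholders taken
# from the README, not verified paths.

def checkpoint_path(base_dir: str, year: int) -> str:
    """Build the path to a single yearly snapshot, e.g. "folderPath/2020"."""
    return f"{base_dir}/{year}"

def load_arxivbert(base_dir: str, year: int):
    """Load the tokenizer and encoder for one yearly snapshot."""
    # Imported lazily so the path helper stays usable without transformers.
    from transformers import AutoTokenizer, AutoModel

    path = checkpoint_path(base_dir, year)
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModel.from_pretrained(path)
    return tokenizer, model
```

Per the description above, the 2020 snapshot is the recommended default for scientific corpora, i.e. something like `load_arxivbert("folderPath", 2020)`.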