Eghbal committed · Commit d2daae3 · verified · 1 Parent(s): 827c06b

Update README.md

Files changed (1):
  1. README.md (+1 −1)
README.md CHANGED
@@ -10,6 +10,6 @@ pinned: false
  FinText is a repository of financial NLP models and benchmarks.
 
  <div style="text-align: justify">
- <strong>Stage 1 release:<strong>
+ <strong>Stage 1 release:</strong>
  We release a suite of specialised LLMs designed specifically for accounting and finance. Pre-trained on high-quality, domain-specific historical data, FinText aims to mitigate critical issues such as look-ahead bias and information leakage, which have frequently undermined the performance of general LLMs in finance-related studies. A diverse range of textual datasets has been utilised, including news articles, regulatory filings, IP records, key corporate information, speeches from the ECB and the Fed, transcripts of corporate events, board member information, and Wikipedia for general knowledge, covering the period from 2007 to 2023. Notably, a separate model has been pre-trained for each year within this timeframe. The suite is based on the RoBERTa architecture and includes a base model with approximately 125 million parameters, alongside a smaller variant comprising 51 million parameters, resulting in a total of 34 pre-trained LLMs.
  </div>
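
Since the suite consists of RoBERTa-style masked language models released per year, a minimal usage sketch with the Hugging Face `transformers` library is shown below. The repository ID `FinText/FinText-Base-2007` and the fill-mask prompt are illustrative assumptions, not confirmed Hub IDs; substitute the actual repository name of the model and year you need.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

# Hypothetical Hub ID for the 2007-vintage base model; replace with the
# actual FinText repository ID once confirmed.
model_id = "FinText/FinText-Base-2007"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)  # RoBERTa-style masked LM
model.eval()

# RoBERTa-family tokenizers use "<mask>" as the mask token.
text = "The central bank raised interest <mask> to curb inflation."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and read off the five most likely fillers.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

Because each model is pre-trained only on data available up to its vintage year, selecting the model whose year precedes your study window is what preserves the look-ahead-bias guarantee described above.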