Eghbal committed on
Commit 4cb9ea9 · verified · 1 Parent(s): d2daae3

Update README.md

Files changed (1): README.md +24 -5
README.md CHANGED
@@ -7,9 +7,28 @@ sdk: static
  pinned: false
  ---
 
- FinText is a repository of financial NLP models and benchmarks.
 
- <div style="text-align: justify">
- <strong>Stage 1 release:</strong>
- We release a suite of specialised LLMs designed specifically for accounting and finance. By being pre-trained on high-quality, domain-specific historical data, FinText aims to mitigate critical issues such as look-ahead bias and information leakage, which have frequently undermined the performance of general LLMs in finance-related studies. A diverse range of textual datasets has been utilised, including news articles, regulatory filings, IP records, key corporate information, speeches from the ECB and the FED, transcripts of corporate events, board member information, and Wikipedia for general knowledge, covering the period from 2007 to 2023. Notably, a separate model has been pre-trained for each year within this timeframe. The suite is based on the RoBERTa architecture and includes a base model with approximately 125 million parameters, alongside a smaller variant comprising 51 million parameters, resulting in a total of 34 pre-trained LLMs.
- </div>
+ # README 💻
+ #### A Repository of Financial NLP Models and Benchmarks
 
+ <div style="background: linear-gradient(to right, red, blue); padding: 10px; border-radius: 10px;">
+
+ ### 🚀 **Stage 1 Release** 🎉
+
+ We are excited to announce the release of a specialised suite of **LLMs** designed specifically for accounting and finance. FinText models have been **pre-trained** on domain-specific historical data to overcome common issues such as **look-ahead bias** and **information leakage**. These models are tailored to improve the reliability of finance-related studies and analyses.
+
+ 💡 **Key Features:**
+ - **Domain-Specific Training:** FinText utilises diverse financial datasets such as news articles, regulatory filings, IP records, corporate speeches (ECB, FED), and more.
+ - **Time-Period Specific Models:** A separate model is pre-trained for each year from **2007 to 2023**, so each model reflects only information available up to that point in time.
+ - **RoBERTa Architecture:** The suite includes both a **base model** with approximately **125 million parameters** and a **smaller variant** with **51 million parameters**, for a total of 34 pre-trained models. 🎯
+
+ 🗂 **Datasets Used:**
+ - News articles 📄
+ - Regulatory filings 🏛️
+ - IP records 🧑‍💼
+ - ECB & FED speeches 🗣️
+ - Corporate event transcripts 📊
+ - Wikipedia 🧠 (for general knowledge)
+
+ Stay tuned for further updates and additions to FinText, as we continue refining and expanding our offerings for the financial and academic communities! 📈✨
+
+ </div>
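The per-year design above (one checkpoint per year, in `base` and `small` sizes) can be sketched as a small helper that builds a checkpoint name for a given year. This is a minimal illustration only: the repo-ID pattern `FinText/FinText-{size}-{year}` is an assumption for the sake of the example, not a confirmed naming scheme, so check the FinText organisation page on Hugging Face for the actual model names.

```python
def fintext_model_id(year: int, size: str = "base") -> str:
    """Build a *hypothetical* Hugging Face repo ID for the FinText model
    pre-trained on data up to and including `year`.

    The "FinText/FinText-{size}-{year}" pattern is an assumption for
    illustration; it is not the documented naming scheme.
    """
    if not 2007 <= year <= 2023:
        # One model per year over the stated 2007-2023 window.
        raise ValueError("FinText releases one model per year from 2007 to 2023")
    if size not in {"base", "small"}:
        # Suite ships a ~125M-parameter base and a ~51M-parameter small variant.
        raise ValueError("size must be 'base' or 'small'")
    return f"FinText/FinText-{size}-{year}"


# Loading would then follow the standard transformers pattern, e.g.:
#   from transformers import AutoTokenizer, AutoModel
#   tokenizer = AutoTokenizer.from_pretrained(fintext_model_id(2015))
#   model = AutoModel.from_pretrained(fintext_model_id(2015))
```

Selecting the checkpoint by year this way mirrors the suite's point: a study of, say, 2016 data can use the 2015 model and avoid look-ahead bias.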