HYdsl/FiLM-SEC

 ---
 license: mit
+language:
+- en
 ---
+## Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models
+(Findings of EMNLP 2023)
+Paper: https://arxiv.org/abs/2310.13312
+GitHub: https://github.com/deep-over/FiLM
+### **FiLM** (**Fi**nancial **L**anguage **M**odel) Models 🌟
+FiLM is a pre-trained language model (PLM) optimized for the financial domain and built on a diverse range of financial-domain corpora. Initialized from RoBERTa-base, FiLM is further trained on these corpora and, for the first time, surpasses RoBERTa-base in the financial domain.
+To train FiLM, we grouped our financial corpus into categories and gathered a diverse range of corpora across those categories to ensure optimal performance.
+We offer two versions of the FiLM model, each tailored to specific use cases in the financial domain:
+[**FiLM (2.4B): Our Base Model**](https://huggingface.co/HYdsl/FiLM)
+This is our foundational model, trained on the entire range of corpora (2.4 billion tokens) listed in the corpus table below. It is suited to a wide array of financial applications. 📊
+**FiLM (5.5B): Optimized for SEC Filings**
+This model is specialized for SEC filings. We expanded the base training set with 3.1 billion tokens of SEC filings, for 5.5 billion tokens in total. The added data comes from EDGAR-CORPUS (Loukas et al., "EDGAR-CORPUS: Billions of Tokens Make The World Go Round", ECONLP 2021) and can be downloaded from Zenodo. 📑
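
Either version can presumably be loaded with the Hugging Face `transformers` library, since both are initialized from RoBERTa-base. The sketch below is a minimal, untested illustration, not an official usage guide. It assumes the SEC model lives at the repository id `HYdsl/FiLM-SEC` (inferred from this repository's name), that the checkpoints include a masked-language-modeling head, and that the tokenizer uses RoBERTa's `<mask>` token. The helper names `resolve_model_id`, `load_film`, and `fill_mask` are hypothetical.

```python
# Hub repository ids. "HYdsl/FiLM" comes from the link in this README;
# "HYdsl/FiLM-SEC" is an assumption based on this repository's name.
MODEL_IDS = {"base": "HYdsl/FiLM", "sec": "HYdsl/FiLM-SEC"}


def resolve_model_id(variant: str = "sec") -> str:
    """Map a variant name ("base" or "sec") to its assumed Hub repository id."""
    try:
        return MODEL_IDS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(MODEL_IDS)}"
        ) from None


def load_film(variant: str = "sec"):
    """Download the tokenizer and masked-LM weights for one FiLM variant."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_id = resolve_model_id(variant)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint ships an MLM head. If it does not, transformers
    # initializes that head randomly and emits a warning.
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


def fill_mask(text: str, variant: str = "sec", top_k: int = 5):
    """Return the top_k predictions for the mask token in `text`."""
    from transformers import pipeline

    tokenizer, model = load_film(variant)
    unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
    return unmasker(text, top_k=top_k)
```

For example, `fill_mask("The company filed its annual <mask> with the SEC.")` would download the SEC variant and return candidate tokens with scores. For downstream tasks such as classification, the same repository ids could be passed to the matching `AutoModelFor...` class instead.
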
+**Types of Training Corpora 📚**
+![Types of training corpora](https://cdn-uploads.huggingface.co/production/uploads/65254614785092cd47b1110b/-cT_wOabHugsct1mogOpa.png)