Instructions to use HYdsl/FiLM-SEC with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HYdsl/FiLM-SEC with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="HYdsl/FiLM-SEC")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("HYdsl/FiLM-SEC")
    model = AutoModelForMaskedLM.from_pretrained("HYdsl/FiLM-SEC")
- Notebooks
- Google Colab
- Kaggle
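As a minimal sketch of how the fill-mask pipeline above might be used on a sentence, the helpers below mask one word and return the top predictions. The names `mask_word` and `predict_masked` are hypothetical, and the `<mask>` placeholder is an assumption based on the RoBERTa-base tokenizer; the code swaps in the tokenizer's own mask token before calling the model.

```python
from typing import List, Tuple


def mask_word(sentence: str, target: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `target` with the mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} not found in sentence")
    return sentence.replace(target, mask_token, 1)


def predict_masked(
    sentence: str, top_k: int = 5, model_id: str = "HYdsl/FiLM-SEC"
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for a sentence with exactly one mask.

    Downloads the model on first use, so it needs network access.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("fill-mask", model=model_id)
    # Assumption: "<mask>" is only a placeholder here; use the tokenizer's
    # actual mask token in case it differs.
    text = sentence.replace("<mask>", pipe.tokenizer.mask_token)
    results = pipe(text, top_k=top_k)
    return [(r["token_str"].strip(), r["score"]) for r in results]
```

For example, `predict_masked(mask_word("The company filed its annual report.", "annual"))` would return up to five candidate tokens with their scores.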
Update README.md
README.md
CHANGED
@@ -1,3 +1,33 @@
 ---
 license: mit
+language:
+- en
 ---
+---
+
+Update README.md
+## Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models
+(EMNLP 2023 Findings)
+
+Paper: https://arxiv.org/abs/2310.13312
+
+GitHub: https://github.com/deep-over/FiLM
+
+### **FiLM** (**Fi**nancial **L**anguage **M**odel) Models
+
+FiLM is a pre-trained language model (PLM) optimized for the financial domain and built on a diverse range of financial-domain corpora. Initialized from RoBERTa-base, FiLM undergoes further training and, for the first time, achieves performance that surpasses RoBERTa-base in the financial domain.
+
+To train FiLM, we categorized our financial corpus into specific groups and gathered a diverse range of corpora to ensure optimal performance.
+
+We offer two versions of the FiLM model, each tailored to specific use cases in the financial domain:
+
+[**FiLM (2.4B tokens): Our Base Model**](https://huggingface.co/HYdsl/FiLM)
+
+This is our foundational model, trained on the entire range of corpora listed under Types of Training Corpora below. It is suited to a wide array of financial applications.
+
+**FiLM (5.5B tokens): Optimized for SEC Filings**
+
+This model is specialized for SEC filings. We expanded the training set by adding 3.1 billion tokens from an SEC filings corpus, bringing the total from 2.4 billion to 5.5 billion tokens. The dataset is sourced from EDGAR-CORPUS: Billions of Tokens Make The World Go Round (Loukas et al., ECONLP 2021) and can be downloaded from Zenodo.
+
+**Types of Training Corpora**
+
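SEC filings are often much longer than a single model input, so the SEC-filings variant described above will usually be applied to a filing in pieces. The sketch below is not taken from the FiLM repository: it assumes the model inherits the 512-token input limit of RoBERTa-base (510 content tokens plus two special tokens), and `sliding_windows` and `chunk_filing` are hypothetical helper names.

```python
from typing import List, Sequence


def sliding_windows(
    token_ids: Sequence[int], max_len: int = 510, stride: int = 128
) -> List[List[int]]:
    """Split token ids into windows of at most `max_len` tokens.

    Consecutive windows overlap by `stride` tokens so that text near a
    window boundary is seen with context on both sides.
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("need max_len > 0 and 0 <= stride < max_len")
    step = max_len - stride
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # the last window already reaches the end of the text
    return windows


def chunk_filing(text: str, tokenizer, max_len: int = 510, stride: int = 128):
    """Tokenize `text` without special tokens and window the resulting ids.

    `tokenizer` is expected to be a Hugging Face tokenizer, for example one
    loaded with AutoTokenizer.from_pretrained("HYdsl/FiLM-SEC").
    """
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return sliding_windows(ids, max_len=max_len, stride=stride)
```

Each window still needs the start and end special tokens added before it is passed to the model. The overlap size is a tuning choice, not a value prescribed by the source.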