---
license: mit
language:
- en
---
## Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models
(EMNLP 2023 findings)
Paper: https://aclanthology.org/2023.findings-emnlp.138/
GitHub: https://github.com/deep-over/FiLM
### **FiLM** (**Fi**nancial **L**anguage **M**odel) Models
FiLM is a pre-trained language model (PLM) optimized for the financial domain and built on a diverse range of financial corpora. Initialized from RoBERTa-base, FiLM is further pre-trained on these corpora, achieving performance that surpasses RoBERTa-base in the financial domain for the first time.
To train FiLM, we categorized our financial corpus into specific groups and gathered a diverse range of corpora to ensure optimal performance.
The model can also be referred to as Fin-RoBERTa (Financial RoBERTa).
We offer two versions of the FiLM model, each tailored for specific use-cases in the Financial domain:
**FiLM (2.4B): Our Base Model**
This is our foundational model, trained on the full range of corpora listed in the corpus table (see Types of Training Corpora). It is suited to a wide array of financial applications.
[**FiLM (5.5B): Optimized for SEC Filings**](https://huggingface.co/HYdsl/FiLM-SEC)
This model is specialized for handling SEC filings. We expanded the training set by adding 3.1 billion tokens from the SEC filings corpus dataset. The dataset is sourced from EDGAR-CORPUS: Billions of Tokens Make The World Go Round (Loukas et al., ECONLP 2021) and can be downloaded from Zenodo.
To load the tokenizer and the model, use the `roberta-base` tokenizer together with the FiLM weights:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('roberta-base')
model = AutoModel.from_pretrained('HYdsl/FiLM')
```
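As a quick sanity check of the loaded model, masked-token prediction can be tried on a financial sentence. The following is a minimal sketch, not part of the original release: it assumes the `HYdsl/FiLM` checkpoint includes masked-LM head weights (so that `AutoModelForMaskedLM` loads them), and the helper names `rank_predictions` and `fill_mask` are hypothetical, introduced only for this illustration.

```python
from typing import Dict, List, Tuple


def rank_predictions(predictions: List[Dict], min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Convert fill-mask pipeline output into (token, score) pairs, best first."""
    # RoBERTa tokens usually carry a leading space, so strip it for display.
    pairs = [(p["token_str"].strip(), round(float(p["score"]), 4)) for p in predictions]
    kept = [pair for pair in pairs if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def fill_mask(text: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Ask FiLM for the most likely fillers of the single <mask> token in `text`."""
    # Imported lazily so rank_predictions stays usable without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    # FiLM is paired with the RoBERTa-base tokenizer, as described above.
    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    # Assumption: the checkpoint ships masked-LM head weights.
    model = AutoModelForMaskedLM.from_pretrained("HYdsl/FiLM")
    pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
    return rank_predictions(pipe(text, top_k=top_k))
```

Calling `fill_mask("The central bank raised interest <mask> by 25 basis points.")` downloads the weights on first use and returns the top candidates with their scores; the sentence is an invented example, and `<mask>` is the RoBERTa mask token.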
**Types of Training Corpora**

#Finance #Financial #RoBERTa