---
license: apache-2.0
datasets:
- DerivedFunction/sec-filings-snippets-10K
language:
- en
base_model:
- FacebookAI/roberta-base
---
### 📘 Model Description
**FinRoBERTa** is a domain‑adapted variant of **RoBERTa‑base**, produced by continuing masked‑language‑model pretraining (**Domain‑Adaptive Pretraining, DAPT**) on the **DerivedFunction/sec-filings-snippets-10K** dataset. The dataset consists of curated excerpts from SEC 10‑K filings, so the model better captures the specialized vocabulary, syntax, and discourse patterns of financial regulatory documents.
Key characteristics:
- **Base model**: RoBERTa‑base (general‑purpose pretrained transformer)
- **Adaptation method**: Domain‑Adaptive Pretraining (DAPT)
- **Domain corpus**: SEC 10‑K filings (snippets)
- **Language**: English
- **License**: Apache 2.0
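DAPT amounts to continuing RoBERTa's masked‑language‑model objective on in‑domain text: a fraction of tokens is hidden and the model is trained to recover them. The masking step can be sketched in plain Python (a minimal illustration, not this model's training code; a real run would use the `transformers` data collator, and the helper names below are made up):

```python
import random

MASK = "<mask>"  # RoBERTa's mask token

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace ~mask_prob of tokens with <mask>; labels keep the originals.

    Positions that are not masked get a None label, meaning they
    contribute no loss -- the model only learns to predict hidden tokens.
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)    # target: recover the original token
        else:
            inputs.append(tok)
            labels.append(None)   # ignored by the MLM loss
    return inputs, labels

# Toy whitespace "tokenization" of a filing-style snippet
snippet = "net revenue increased due to higher product sales and services".split()
masked, targets = mask_tokens(snippet, mask_prob=0.3)
print(masked)
```

Training on SEC snippets with this objective shifts the model's token distributions toward filing language without changing the architecture.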
### 🔍 Intended Use
- As a **foundation for downstream tasks** in financial NLP (e.g., classification, extraction, summarization)
- As a research artifact for studying domain adaptation techniques and their impact on language‑model performance
- As a baseline for benchmarking DAPT workflows on financial/legal text corpora
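For the first use case, "foundation for downstream tasks" concretely means adding a small task head on top of the encoder's pooled representation and fine‑tuning. A toy sketch of such a head (plain Python with made‑up numbers; a real setup would instead load the model with `AutoModelForSequenceClassification` from `transformers`):

```python
import math

def linear_head(pooled, weights, bias):
    """Classification head: logits[k] = weights[k] . pooled + bias[k]."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-class head over a 3-dim "pooled" vector (a stand-in for the
# encoder's 768-dim output); only the head is new -- the encoder would
# be the pretrained FinRoBERTa weights.
pooled = [0.2, -0.5, 1.0]
weights = [[0.1, 0.0, 0.4], [-0.3, 0.2, 0.1]]
bias = [0.0, 0.1]
probs = softmax(linear_head(pooled, weights, bias))
print(probs)
```

During fine‑tuning, gradients flow through both the head and the encoder, adapting the domain‑pretrained representations to the target task.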
### ⚖️ Limitations
- Not fine‑tuned for any specific task (classification, QA, summarization); task‑level performance requires further fine‑tuning
- Inherits biases from both the RoBERTa base corpus and SEC filings
- Not suitable for predictive financial advice or trading decisions