language:
- en
base_model:
- FacebookAI/roberta-base
---

### 📘 Model Description

**FinRoBerta** is a domain‑adapted variant of **RoBERTa‑base**, trained using **Domain‑Adaptive Pretraining (DAPT)** on the **DerivedFunction/sec-filings-snippets-10K** dataset. This dataset consists of curated excerpts from SEC 10‑K filings, enabling the model to better capture the specialized vocabulary, syntax, and discourse patterns of financial regulatory documents.

Key characteristics:

- **Base model**: RoBERTa‑base (general‑purpose pretrained transformer)
- **Adaptation method**: Domain‑Adaptive Pretraining (DAPT)
- **Domain corpus**: SEC 10‑K filings (snippets)
- **Language**: English
- **License**: Apache 2.0
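
Because DAPT continues RoBERTa's masked-language-modeling objective, the checkpoint can be probed directly with fill-mask queries over financial text. A minimal sketch using the `transformers` pipeline API; the repo id `your-namespace/FinRoBerta` is a placeholder for illustration, not the published id:

```python
MODEL_ID = "your-namespace/FinRoBerta"  # placeholder repo id, substitute the real one

def top_fill_mask(sentence: str, k: int = 5):
    """Return the top-k (token, score) completions for a <mask> token."""
    # transformers is imported inside the function so the sketch only touches
    # the library (and downloads weights) when actually called.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID, top_k=k)
    return [(c["token_str"], c["score"]) for c in fill(sentence)]

if __name__ == "__main__":
    # Domain-flavored probe: a well-adapted model should rank financial terms highly.
    for token, score in top_fill_mask("The company recorded a net <mask> for the fiscal year."):
        print(token, round(score, 3))
```
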
### 🔍 Intended Use

- As a **foundation for downstream tasks** in financial NLP (e.g., classification, extraction, summarization)
- Research into domain adaptation techniques and their impact on language model performance
- Benchmarking DAPT workflows for financial/legal text corpora
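
For the downstream-task case, a typical pattern is to load the domain-adapted encoder with a freshly initialized task head. A sketch under stated assumptions (the repo id is a placeholder and the three-way label set is illustrative, neither is defined by this card):

```python
MODEL_ID = "your-namespace/FinRoBerta"  # placeholder repo id, substitute the real one
LABELS = ["negative", "neutral", "positive"]  # illustrative label set, not from this card

def build_classifier(num_labels: int = len(LABELS)):
    """Load the domain-adapted encoder with a newly initialized classification head."""
    # Lazy import keeps the sketch lightweight until it is actually called.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=num_labels)
    return tokenizer, model

# From here, train with transformers.Trainer (or a custom loop) on labeled 10-K snippets.
```
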
### ⚖️ Limitations

- Not fine‑tuned for specific tasks (classification, QA, summarization); requires further adaptation for task‑level performance
- Inherits biases from both the RoBERTa base corpus and SEC filings
- Not suitable for predictive financial advice or trading decisions