---
license: apache-2.0
datasets:
- DerivedFunction/sec-filings-snippets-10K
language:
- en
base_model:
- FacebookAI/roberta-base
---

### 📘 Model Description

**FinRoBERTa** is a domain‑adapted variant of **RoBERTa‑base**, trained using **Domain‑Adaptive Pretraining (DAPT)** on the **DerivedFunction/sec-filings-snippets-10K** dataset. This dataset consists of curated excerpts from SEC 10‑K filings, enabling the model to better capture the specialized vocabulary, syntax, and discourse patterns of financial regulatory documents.  

Key characteristics:
- **Base model**: RoBERTa‑base (general‑purpose pretrained transformer)  
- **Adaptation method**: Domain‑Adaptive Pretraining (DAPT)  
- **Domain corpus**: SEC 10‑K filings (snippets)  
- **Language**: English  
- **License**: Apache 2.0  

### 🔍 Intended Use
- As a **foundation for downstream tasks** in financial NLP (e.g., classification, extraction, summarization)  
- Research into domain adaptation techniques and their impact on language model performance  
- Benchmarking DAPT workflows for financial/legal text corpora  
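
To use the checkpoint as a foundation for the downstream tasks listed above, it can be loaded with the standard `transformers` auto classes. The hub id below is a placeholder: substitute the actual repository path of this model.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder — replace with the actual hub id of this checkpoint.
MODEL_ID = "your-namespace/FinRoBERTa"


def load_finroberta(model_id: str = MODEL_ID):
    """Load the DAPT checkpoint with its masked-LM head intact.
    For task-level use, swap in e.g. AutoModelForSequenceClassification,
    which adds a fresh classification head on top of the adapted encoder."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_finroberta()
```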

### ⚖️ Limitations
- Not fine‑tuned for specific tasks (classification, QA, summarization) — requires further adaptation for task‑level performance  
- Inherits biases from both the RoBERTa base corpus and SEC filings  
- Not suitable for predictive financial advice or trading decisions