---
license: apache-2.0
datasets:
- DerivedFunction/sec-filings-snippets-10K
language:
- en
base_model:
- FacebookAI/roberta-base
---

### 📘 Model Description

**FinRoBERTa** is a domain‑adapted variant of **RoBERTa‑base**, trained using **Domain‑Adaptive Pretraining (DAPT)** on the **DerivedFunction/sec-filings-snippets-10K** dataset. This dataset consists of curated excerpts from SEC 10‑K filings, enabling the model to better capture the specialized vocabulary, syntax, and discourse patterns of financial regulatory documents.

Key characteristics:
- **Base model**: RoBERTa‑base (general‑purpose pretrained transformer)
- **Adaptation method**: Domain‑Adaptive Pretraining (DAPT); see the training sketch after this list
- **Domain corpus**: SEC 10‑K filings (snippets)
- **Language**: English
- **License**: Apache 2.0
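
The snippet below is a minimal sketch of what DAPT‑style continued pretraining looks like with 🤗 Transformers: the base checkpoint is reloaded with its masked‑language‑modeling head and trained further on the domain corpus. The hyperparameters and the `text` column name are illustrative assumptions, not the exact recipe used to produce this checkpoint.

```python
# DAPT sketch: continue masked-language-model pretraining of roberta-base
# on SEC 10-K snippets. Hyperparameters here are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
model = AutoModelForMaskedLM.from_pretrained("FacebookAI/roberta-base")

dataset = load_dataset("DerivedFunction/sec-filings-snippets-10K", split="train")
tokenized = dataset.map(
    # "text" is an assumed column name; check the dataset schema.
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

# RoBERTa-style dynamic masking: 15% of tokens are masked per batch.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finroberta-dapt",
        per_device_train_batch_size=16,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```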

### 🔍 Intended Use
- As a **foundation for downstream tasks** in financial NLP (e.g., classification, extraction, summarization); see the loading sketch after this list
- Research into domain adaptation techniques and their impact on language model performance
- Benchmarking DAPT workflows for financial/legal text corpora
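
For downstream use, the checkpoint loads like any RoBERTa model: attach a task head and fine‑tune it on labeled data. In the sketch below, the repo id `DerivedFunction/FinRoBERTa` is a placeholder assumption for wherever this checkpoint is hosted, and `num_labels` depends on your task.

```python
# Load the DAPT checkpoint as a foundation for sequence classification.
# The repo id below is a placeholder; substitute the actual hub path.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "DerivedFunction/FinRoBERTa"  # assumed repo id, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# The classification head is freshly initialized here; it must be fine-tuned
# before the logits are meaningful.
inputs = tokenizer(
    "The company recorded a material impairment charge in fiscal 2023.",
    return_tensors="pt",
)
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```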

### ⚖️ Limitations
- Not fine‑tuned for any specific task (classification, QA, summarization); further adaptation is required for task‑level performance
- Inherits biases from both the RoBERTa pretraining corpus and SEC filings
- Not suitable for predictive financial advice or trading decisions