---
license: apache-2.0
datasets:
- DerivedFunction/sec-filings-snippets-10K
language:
- en
base_model:
- FacebookAI/roberta-base
---

### šŸ“˜ Model Description

**FinRoBERTa** is a domain‑adapted variant of **RoBERTa‑base**, produced via **Domain‑Adaptive Pretraining (DAPT)** on the **DerivedFunction/sec-filings-snippets-10K** dataset. This dataset consists of curated excerpts from SEC 10‑K filings, enabling the model to better capture the specialized vocabulary, syntax, and discourse patterns of financial regulatory documents.

Key characteristics:

- **Base model**: RoBERTa‑base (general‑purpose pretrained transformer)
- **Adaptation method**: Domain‑Adaptive Pretraining (DAPT)
- **Domain corpus**: SEC 10‑K filings (snippets)
- **Language**: English
- **License**: Apache 2.0

### šŸ” Intended Use

- A **foundation for downstream tasks** in financial NLP (e.g., classification, extraction, summarization); see the usage sketch below
- Research into domain adaptation techniques and their impact on language model performance
- Benchmarking DAPT workflows for financial/legal text corpora

### āš–ļø Limitations

- Not fine‑tuned for any specific task (classification, QA, summarization); further task‑level adaptation is required
- Inherits biases from both the RoBERTa pretraining corpus and SEC filings
- Not suitable for predictive financial advice or trading decisions
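
### 🧪 Example Usage

A minimal sketch of loading the model for masked‑token prediction via `transformers`. The repo id below is a placeholder, not the published path; substitute the actual Hub location of this checkpoint.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Placeholder repo id -- replace with the actual Hub path for FinRoBERTa.
model_id = "your-username/FinRoBERTa"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# RoBERTa models use "<mask>" as the mask token.
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for pred in fill("The company recognized revenue in accordance with <mask> 606."):
    print(pred["token_str"], round(pred["score"], 4))
```

Because the checkpoint carries only the pretrained language‑model head, downstream use typically means attaching a fresh task head and fine‑tuning it, e.g.:

```python
from transformers import AutoModelForSequenceClassification

# Loads the DAPT backbone with a newly initialized classification head;
# num_labels is task-specific, and the head must be fine-tuned before use.
clf = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)
```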