---
license: apache-2.0
datasets:
- DerivedFunction/sec-filings-snippets-10K
language:
- en
base_model:
- FacebookAI/roberta-base
---
### 📘 Model Description
**FinRoBERTa** is a domain‑adapted variant of **RoBERTa‑base**, produced by continuing masked‑language‑model pretraining (**Domain‑Adaptive Pretraining, DAPT**) on the **DerivedFunction/sec-filings-snippets-10K** dataset. The dataset consists of curated excerpts from SEC 10‑K filings, so the model better captures the specialized vocabulary, syntax, and discourse patterns of financial regulatory documents.
Key characteristics:
- **Base model**: RoBERTa‑base (general‑purpose pretrained transformer)
- **Adaptation method**: Domain‑Adaptive Pretraining (DAPT)
- **Domain corpus**: SEC 10‑K filings (snippets)
- **Language**: English
- **License**: Apache 2.0
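DAPT amounts to continuing RoBERTa's masked‑language‑model objective on in‑domain text: a fraction of tokens is hidden and the model is trained to recover them. The masking step can be sketched in plain Python (a minimal illustration, not this model's training code; a real run would use the `transformers` data collator, and the helper names below are made up):

```python
import random

MASK = "<mask>"  # RoBERTa's mask token

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace ~mask_prob of tokens with <mask>; labels keep the originals.

    Positions that are not masked get a None label, meaning they
    contribute no loss -- the model only learns to predict hidden tokens.
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)    # target: recover the original token
        else:
            inputs.append(tok)
            labels.append(None)   # ignored by the MLM loss
    return inputs, labels

# Toy whitespace "tokenization" of a filing-style snippet
snippet = "net revenue increased due to higher product sales and services".split()
masked, targets = mask_tokens(snippet, mask_prob=0.3)
print(masked)
```

Training on SEC snippets with this objective shifts the model's token distributions toward filing language without changing the architecture.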
### 🔍 Intended Use
- As a **foundation for downstream tasks** in financial NLP (e.g., classification, extraction, summarization)
- As a research artifact for studying domain adaptation techniques and their impact on language‑model performance
- As a baseline for benchmarking DAPT workflows on financial/legal text corpora
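For the first use case, "foundation for downstream tasks" concretely means adding a small task head on top of the encoder's pooled representation and fine‑tuning. A toy sketch of such a head (plain Python with made‑up numbers; a real setup would instead load the model with `AutoModelForSequenceClassification` from `transformers`):

```python
import math

def linear_head(pooled, weights, bias):
    """Classification head: logits[k] = weights[k] . pooled + bias[k]."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-class head over a 3-dim "pooled" vector (a stand-in for the
# encoder's 768-dim output); only the head is new -- the encoder would
# be the pretrained FinRoBERTa weights.
pooled = [0.2, -0.5, 1.0]
weights = [[0.1, 0.0, 0.4], [-0.3, 0.2, 0.1]]
bias = [0.0, 0.1]
probs = softmax(linear_head(pooled, weights, bias))
print(probs)
```

During fine‑tuning, gradients flow through both the head and the encoder, adapting the domain‑pretrained representations to the target task.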
### ⚖️ Limitations
- Not fine‑tuned for any specific task (classification, QA, summarization); task‑level performance requires further fine‑tuning
- Inherits biases from both the RoBERTa base corpus and SEC filings
- Not suitable for predictive financial advice or trading decisions