---
license: apache-2.0
datasets:
- kritsadaK/EDGAR-CORPUS-Financial-Summarization
language:
- en
metrics:
- rouge
base_model:
- facebook/bart-large-cnn
---

# **BART Financial Summarization Model**

**Model Name:** `kritsadaK/bart-financial-summarization`

**Base Model:** `facebook/bart-large-cnn`

**Task:** Financial Text Summarization

**Dataset:** `kritsadaK/EDGAR-CORPUS-Financial-Summarization`

**Techniques:**

- Fine-tuned using the Hugging Face `Trainer` API
- Tokenized with `AutoTokenizer` (max length 1024 for input, 256 for summary)
- Optimized with AdamW, learning rate `2e-5`, batch size `2`, `fp16` enabled
- Evaluated using ROUGE scores
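
The setup listed above could be wired together roughly as follows. This is a minimal sketch, not the original training script: the column names `input` and `summary`, the output directory, the helper names `preprocess` and `build_trainer`, and the mapping of "batch size 2" to the per-device setting are all assumptions. `Trainer` uses AdamW by default, so no optimizer is passed explicitly.

```python
# Hyperparameters stated in this model card. Treating "batch size 2" as the
# per-device batch size is an assumption.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 2,
    "num_train_epochs": 3,
    "fp16": True,  # mixed precision; needs a CUDA GPU
}
MAX_INPUT_LEN = 1024    # tokens kept from each source document
MAX_SUMMARY_LEN = 256   # tokens kept from each reference summary


def preprocess(batch, tokenizer, text_col="input", summary_col="summary"):
    """Tokenize one batch. text_col and summary_col are hypothetical column names."""
    model_inputs = tokenizer(batch[text_col], max_length=MAX_INPUT_LEN, truncation=True)
    labels = tokenizer(text_target=batch[summary_col], max_length=MAX_SUMMARY_LEN, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def build_trainer(train_ds, eval_ds, output_dir="bart-financial-summarization"):
    """Assemble a Trainer from two `datasets.Dataset` splits (heavy imports kept local)."""
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              DataCollatorForSeq2Seq, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

    def tokenize(batch):
        return preprocess(batch, tokenizer)

    return Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, **HYPERPARAMS),
        # Drop the raw text columns so only tokenized fields reach the model.
        train_dataset=train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names),
        eval_dataset=eval_ds.map(tokenize, batched=True, remove_columns=eval_ds.column_names),
        # Pads inputs and labels dynamically per batch.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
```

Calling `build_trainer(train_ds, eval_ds).train()` would then start fine-tuning, provided the splits actually contain the assumed columns.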

**Evaluation Results:**

- **Loss:** 1.18
- **Runtime:** 18.9 seconds
- **Samples per second:** 56.1
- **Steps per second:** 28.1
- **Epochs:** 3
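
ROUGE, the metric this card names for evaluation, scores a generated summary by its n-gram overlap with a reference summary. For intuition, here is a simplified pure-Python sketch of ROUGE-1 F1 (unigram overlap). It is not the implementation used for this model: it tokenizes on whitespace only and skips stemming, so real evaluation would more likely rely on a maintained package such as `rouge_score` or `evaluate`. The function name `rouge1_f1` and the example sentences are illustrative.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "revenue rose ten percent in 2023"
candidate = "revenue rose in 2023"
print(round(rouge1_f1(reference, candidate), 2))  # 0.8
```

In the example, all four candidate words appear in the six-word reference, so precision is 1.0, recall is 4/6, and F1 is 0.8.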

**Usage Example (Python):**

```python
from transformers import pipeline

# Load the fine-tuned model as a summarization pipeline.
summarizer = pipeline("summarization", model="kritsadaK/bart-financial-summarization")

text = "Your financial document text here..."

# truncation=True clips inputs longer than the model's 1024-token limit.
summary = summarizer(text, max_length=256, min_length=50, do_sample=False, truncation=True)

# The pipeline returns a list with one dict per input.
print(summary[0]["summary_text"])
```
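
Financial filings are often far longer than the 1024-token input limit, and anything beyond that limit is not seen by the model. One possible workaround, sketched below, is to split the text into chunks, summarize each chunk, and join the partial summaries. The helper names `split_into_chunks` and `summarize_long` are hypothetical, and the 600-word chunk size is an assumed value meant to stay under the token limit for typical English text, not a tested setting.

```python
def split_into_chunks(text: str, max_words: int = 600) -> list[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarizer, max_words: int = 600) -> str:
    """Summarize each chunk with the given pipeline and join the partial summaries."""
    parts = []
    for chunk in split_into_chunks(text, max_words):
        result = summarizer(chunk, max_length=256, min_length=50,
                            do_sample=False, truncation=True)
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Here `summarizer` would be the pipeline object created in the usage example, for instance `summarize_long(long_text, summarizer)`. Splitting on word boundaries can cut a chunk mid-sentence, so splitting on sentences or filing sections may give cleaner results.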

The **Financial Statements Summary 10K Dataset** was developed as part of the **CSX4210: Natural Language Processing** project at **Assumption University**.