---
license: cc-by-4.0
language:
- en
tags:
- summarization
- text-generation
- NLP
- transformers
datasets:
- your-dataset-name
---

# BART Fine-Tuned Summarization Model

This repository hosts a **BART-based model fine-tuned for text summarization** on a custom dataset of articles and highlights. The model is suitable for **generating concise summaries from long-form text**.

---

## Model Overview

- **Base Model:** `facebook/bart-large-cnn`
- **Task:** Text Summarization
- **Fine-Tuning Dataset:** Custom CSV dataset containing `document` and `summary` columns
- **Dataset Size:** Varies depending on your CSV file
- **Framework:** Hugging Face Transformers
- **Language:** English

---

## Dataset Preparation

1. Load your CSV dataset. It should contain an `article` column (renamed to `document`) and a `highlights` column (renamed to `summary`).
2. Clean the dataset by removing rows with missing or non-string entries.
3. Split the dataset into **train** and **validation** sets (80/20 split).

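Steps 1 and 2 can be sketched roughly as follows. This is a minimal example, assuming pandas is installed and the CSV uses the `article` and `highlights` column names listed above. The helper name `load_and_clean` is illustrative and not part of this repository.

```python
import pandas as pd


def load_and_clean(csv_source):
    """Load the CSV, rename its columns, and drop unusable rows.

    `csv_source` may be a file path or a file-like object.
    This helper is a hypothetical illustration, not the original training code.
    """
    df = pd.read_csv(csv_source)

    # Step 1: rename the raw columns to the names used for fine-tuning.
    df = df.rename(columns={"article": "document", "highlights": "summary"})
    df = df[["document", "summary"]]

    # Step 2: drop rows with missing values, then rows whose fields are not strings.
    df = df.dropna(subset=["document", "summary"])
    is_text = df["document"].map(lambda v: isinstance(v, str)) & df["summary"].map(
        lambda v: isinstance(v, str)
    )
    df = df[is_text]

    # A fresh 0..n-1 index avoids carrying the old row labels forward.
    return df.reset_index(drop=True)
```

For example, `df = load_and_clean("train.csv")` (a hypothetical path) returns a DataFrame that can be passed to `Dataset.from_pandas`. Resetting the index keeps `Dataset.from_pandas` from adding an extra index column for the filtered rows.
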
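```python
from datasets import Dataset

# `df` is the cleaned pandas DataFrame with `document` and `summary` columns
dataset = Dataset.from_pandas(df)
# 80/20 split; the held-out 20% is stored under the "test" key and serves as validation
dataset = dataset.train_test_split(test_size=0.2, seed=42)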