BART Fine-Tuned Summarization Model
This repository hosts a BART-based model fine-tuned for text summarization on a custom dataset of articles and highlights. The model is suitable for generating concise summaries from long-form text.
Model Overview
- Base Model: facebook/bart-large-cnn
- Task: Text Summarization
- Fine-Tuning Dataset: Custom CSV dataset containing document and summary columns
- Dataset Size: Varies depending on your CSV file
- Framework: Hugging Face Transformers
- Language: English
Dataset Preparation
- Load your CSV dataset containing columns: article (renamed to document) and highlights (renamed to summary).
- Clean the dataset by removing missing or non-string entries.
- Split the dataset into train and validation sets (80/20 split).
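from datasets import Dataset

# df is the cleaned pandas DataFrame with document and summary columns
dataset = Dataset.from_pandas(df)
# Hold out 20% for validation; train_test_split stores the held-out part under the "test" key
dataset = dataset.train_test_split(test_size=0.2, seed=42)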
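The snippet above starts from an existing DataFrame df. A minimal sketch of how the loading, renaming, and cleaning steps might produce it is shown below. The helper name load_and_clean and the inline sample CSV are hypothetical and used only for illustration; the repository's actual preprocessing may differ.

```python
import io

import pandas as pd


def load_and_clean(csv_source):
    """Load a CSV with 'article' and 'highlights' columns and return a cleaned DataFrame.

    Hypothetical helper: a sketch of the preparation steps, not the repository's code.
    """
    df = pd.read_csv(csv_source)

    # Rename the raw columns to the names used during fine-tuning.
    df = df.rename(columns={"article": "document", "highlights": "summary"})

    # Keep only the two relevant columns and drop rows with missing values.
    df = df[["document", "summary"]].dropna()

    # Drop rows where either field is not a string (e.g. parsed as a number).
    is_text = df["document"].map(lambda v: isinstance(v, str)) & df["summary"].map(
        lambda v: isinstance(v, str)
    )
    df = df[is_text]

    # Reset the index so the filtered rows are numbered contiguously from 0.
    return df.reset_index(drop=True)


# Tiny in-memory CSV standing in for a real file path (illustrative data only).
sample_csv = io.StringIO(
    "article,highlights\n"
    "First long article text.,First summary.\n"
    ",Summary without an article.\n"
    "Second long article text.,Second summary.\n"
)

df = load_and_clean(sample_csv)
print(df)
```

The resulting df can then be passed to Dataset.from_pandas as in the snippet above. Resetting the index after filtering is a deliberate choice here: with a non-default index, Dataset.from_pandas may carry the old index along as an extra column.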