
Dataset Card for Custom Text Dataset

Dataset Name

Custom CNN/DailyMail Text Summarization Dataset

Overview

This dataset is a custom subset and extension of the CNN/DailyMail dataset, consisting of news articles and their corresponding summaries.

Composition

Train Dataset: A custom train dataset consisting of one long news article with its manually written summary.

Test Dataset: A test dataset sampled from the original CNN/DailyMail dataset, consisting of 100 articles and their corresponding highlights.
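This card does not say how the 100 test articles were selected. As a minimal sketch only, one reproducible way to draw such a sample is to pick row indices with a fixed seed. The function name `sample_indices`, the seed value, and the source size of 1000 are hypothetical and do not describe how this dataset was actually built.

```python
import random


def sample_indices(n_total: int, k: int = 100, seed: int = 42) -> list[int]:
    """Pick k distinct row indices from range(n_total), reproducibly."""
    if k > n_total:
        raise ValueError("cannot sample more rows than the source contains")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return sorted(rng.sample(range(n_total), k))


# Hypothetical source size; in practice this would be len(source_split).
indices = sample_indices(n_total=1000)
print(len(indices))
```

With the `datasets` library, a list of indices like this can be passed to `Dataset.select` to materialize the subset.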

Collection Process

The custom train dataset was crafted using news articles from the CNN/DailyMail dataset.

Preprocessing

The input text was tokenized.

How to Use

from datasets import load_from_disk

# Load the custom dataset
train_dataset = load_from_disk("./results/custom_dataset/train")
test_dataset = load_from_disk("./results/custom_dataset/test")
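After loading, it can help to confirm that every record has the fields a summarization pipeline expects. The sketch below assumes the columns are named `article` and `highlights`, as in the original CNN/DailyMail dataset; this card does not confirm the column names of the custom splits, so adjust `EXPECTED_FIELDS` as needed. The helper `missing_fields` is hypothetical and works on any iterable of dict-like rows, which is what iterating over a loaded dataset yields.

```python
EXPECTED_FIELDS = ("article", "highlights")  # assumed column names


def missing_fields(records, expected=EXPECTED_FIELDS):
    """Return (row_index, field) pairs where a field is absent or not a non-empty string."""
    problems = []
    for i, record in enumerate(records):
        for field in expected:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, field))
    return problems


# Stand-in rows for illustration; pass train_dataset or test_dataset instead.
rows = [
    {"article": "Some news text.", "highlights": "A short summary."},
    {"article": "Another article."},
]
print(missing_fields(rows))
```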

Evaluation

Summaries generated by a model on this dataset can be evaluated against the reference summaries using metrics such as ROUGE or BLEU.
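To make the metric concrete, here is a simplified ROUGE-1 F1 score written in plain Python. It is a sketch under stated assumptions: tokens are lowercased and split on whitespace, with no stemming or punctuation handling, so its numbers will differ from those of standard ROUGE packages. The function name `rouge1_f1` is illustrative.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat")
print(round(score, 3))
```

For reported results, an established ROUGE implementation is preferable so that scores are comparable with other work.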

Limitations

The train dataset consists of only one example.

Ethical Considerations

The data originates from news sources, which may contain sensitive or politically biased content.