# Dataset Card for Custom Text Dataset
## Dataset Name
Custom CNN/DailyMail Text Summarization Dataset
## Overview
This dataset is a custom subset and extension of the CNN/DailyMail dataset, consisting of news articles and their corresponding summaries.
## Composition
Train Dataset: One long news article paired with a manually written summary.
Test Dataset: 100 articles sampled from the original CNN/DailyMail dataset, each with its corresponding highlights.
## Collection Process
The custom train dataset was crafted using news articles from the CNN/DailyMail dataset.
## Preprocessing
The input text was tokenized.
## How to Use
```python
from datasets import load_from_disk
# Load the custom dataset
train_dataset = load_from_disk("./results/custom_dataset/train")
test_dataset = load_from_disk("./results/custom_dataset/test")
```
## Evaluation
Summaries that a model generates for the test articles can be scored against the reference highlights with metrics such as ROUGE or BLEU.
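To give a rough idea of what a ROUGE-style score measures, here is a minimal sketch of a simplified ROUGE-1 F1: clipped unigram overlap between a generated summary and a reference. It is an illustration only, assuming lowercased whitespace tokenization with no stemming, so its numbers will not match an official ROUGE implementation. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on lowercased, whitespace-split tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Placeholder strings, not taken from the dataset.
score = rouge1_f1("the cat", "the cat sat on the mat")
print(round(score, 3))
```

For reported results, an established ROUGE implementation is preferable, since details such as stemming and tokenization change the scores.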
## Limitations
The train dataset consists of only one example.
## Ethical Considerations
The data originates from news sources, which may contain sensitive or politically biased content.