chell9999
/

custom_summarization_dataset


Custom dataset CNN/DailyNews

7f07224 almost 2 years ago



	# Dataset Card for Custom Text Dataset

	## Dataset Name
	Custom CNN/DailyMail Text Summarization Dataset

	## Overview
	This dataset is a custom subset and extension of the CNN/DailyMail dataset, consisting of news articles and their corresponding summaries.

	## Composition
	- Train Dataset: A custom train dataset consisting of one long news article with its manually written summary.
	- Test Dataset: A test dataset sampled from the original CNN/DailyMail dataset, consisting of 100 articles and their corresponding highlights.

	## Collection Process
	The custom train dataset was crafted using news articles from the CNN/DailyMail dataset.

	## Preprocessing
	The input text was tokenized.

	## How to Use
	```python
	from datasets import load_from_disk

	# Load the custom dataset
	train_dataset = load_from_disk("./results/custom_dataset/train")
	test_dataset = load_from_disk("./results/custom_dataset/test")
	```

	## Evaluation
	Summaries generated by a model can be scored against the reference summaries in this dataset using metrics such as ROUGE or BLEU.
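
As a rough illustration of what a ROUGE-style score measures, the sketch below computes a simplified ROUGE-1 F1 (clipped unigram overlap) between a generated summary and a reference. It is not the official ROUGE implementation: it assumes lowercased whitespace tokenization and applies no stemming, and the function name `rouge1_f1` and the example strings are hypothetical, chosen only for this example.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on lowercased whitespace tokens (illustrative only)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical strings, not taken from the dataset.
generated = "the cat sat on the mat"
reference = "the cat lay on the mat"
score = rouge1_f1(generated, reference)
print(f"ROUGE-1 F1: {score:.3f}")
```

For reported results, an established ROUGE implementation is preferable, since tokenization and stemming choices change the scores.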

	## Limitations
	The train dataset consists of only one example.

	## Ethical Considerations
	The data originates from news sources, which may contain sensitive or politically biased content.