# Dataset Card for Custom Text Dataset
## Dataset Name
Custom CNN/DailyMail Text Summarization Dataset
## Overview
This dataset is a custom subset and extension of the CNN/DailyMail dataset, consisting of news articles and their corresponding summaries.
## Composition
Train Dataset: A custom train dataset consisting of a single long news article paired with a manually written summary.

Test Dataset: A test dataset of 100 articles and their corresponding highlights, sampled from the original CNN/DailyMail dataset.
## Collection Process
The custom train dataset was crafted using news articles from the CNN/DailyMail dataset.
## Preprocessing
The input text was tokenized.
## How to Use
```python
from datasets import load_from_disk

# Load the custom train and test splits from their saved directories
train_dataset = load_from_disk("./results/custom_dataset/train")
test_dataset = load_from_disk("./results/custom_dataset/test")
```
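Once the splits are loaded, it can help to look at a record before training or evaluating. The helper below is a minimal sketch, not part of the dataset itself: `preview` is a hypothetical name, and it assumes each record exposes the `article` and `highlights` columns used by the original CNN/DailyMail dataset. If the saved splits use different column names, adjust the keys accordingly.

```python
from typing import Mapping


def preview(example: Mapping[str, str], max_chars: int = 80) -> str:
    """Return a short, one-line preview of an article/summary pair.

    Assumes the record has "article" and "highlights" keys (an assumption
    based on the original CNN/DailyMail column names).
    """
    # Collapse newlines and repeated spaces so the preview fits on one line.
    article = " ".join(example["article"].split())
    summary = " ".join(example["highlights"].split())
    return f"ARTICLE: {article[:max_chars]} | SUMMARY: {summary[:max_chars]}"


# Stand-in record for illustration; with the real data, pass test_dataset[0].
sample = {
    "article": "Some  long\nnews article text.",
    "highlights": "A short summary.",
}
print(preview(sample))
```

With the loaded data, `preview(test_dataset[0])` should work the same way, since indexing a split returns a dictionary-like record.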
## Evaluation
Summaries generated for the test articles can be scored against the reference highlights using metrics such as ROUGE or BLEU.
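To make the ROUGE idea concrete, the sketch below computes a simplified ROUGE-1 F1 score, that is, unigram overlap between a generated summary and a reference. It is an illustration only: `rouge1_f1` is a hypothetical helper, and it assumes lowercased whitespace tokenization with no stemming, so its scores will not exactly match those of established ROUGE packages.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference."""
    # Assumption: lowercased whitespace tokens stand in for real tokenization.
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Example: a two-word candidate fully contained in a six-word reference.
score = rouge1_f1("the cat", "the cat sat on the mat")
print(f"ROUGE-1 F1: {score:.2f}")
```

For reported results, a maintained ROUGE implementation is the safer choice, because details such as stemming and tokenization affect the final numbers.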
## Limitations
The train dataset consists of only one example.
## Ethical Considerations
The data originates from news sources, which may contain sensitive or politically biased content.