ccc / preprocess_data.py
import datasets
# Read the text file
with open('data/novel.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
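# Each line of the file (trailing newline included) becomes one row in the 'text' column
data = {'text': lines}
dataset = datasets.Dataset.from_dict(data)
# Write the dataset to disk in Arrow format
dataset.save_to_disk('models/processed_dataset')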
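
Because `readlines()` keeps each line's trailing newline and also returns blank lines, the `text` column built above may contain empty or whitespace-padded rows. The following is a minimal sketch of one way to tidy the lines before building the dict. The helper name `clean_lines` and the sample strings are hypothetical, not part of the original script, and whether blank lines should be dropped depends on how the dataset is used downstream.

```python
def clean_lines(lines):
    """Strip surrounding whitespace from each line and drop lines left empty."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line]


# Hypothetical sample standing in for the contents of data/novel.txt
sample = ["Chapter 1\n", "\n", "  It was a dark night.  \n"]

# Same dict shape the script passes to datasets.Dataset.from_dict
data = {"text": clean_lines(sample)}
print(data)
```

In the script itself, this would amount to passing `clean_lines(lines)` instead of `lines` when building `data`.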