Update train.py
Browse files
Fix the tokenizer learning
train.py
CHANGED
@@ -10,7 +10,7 @@ with open("dataset.json", "r") as f:
     dset = json.load(f)

     tokenizer = Tokenizer()
-    tokenizer.fit_on_texts(dset)
+    tokenizer.fit_on_texts(list(dset.keys()))

     emb_size = 128 # how big are the word vectors in the input (how much information can be fit into one word)
     vocab_size = len(tokenizer.get_vocabulary())