prajwath committed on
Commit 6bd1ee8 · verified · 1 Parent(s): eec604d

Delete Task 2

Task 2/Eng_French.ipynb DELETED
@@ -1,868 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "metadata": {
6
- "collapsed": true
7
- },
8
- "source": [
9
- "# Machine Translation Project (English to French)"
10
- ]
11
- },
12
- {
13
- "cell_type": "code",
14
- "execution_count": 2,
15
- "metadata": {},
16
- "outputs": [],
17
- "source": [
18
- "import collections\n",
19
- "import numpy as np\n",
20
- "import json\n",
21
- "\n",
22
- "from keras.preprocessing.text import Tokenizer\n",
23
- "from keras.utils import pad_sequences\n",
24
- "from keras.models import Model, Sequential\n",
25
- "from keras.layers import Input, Dense, Embedding, GRU, LSTM, Bidirectional, Dropout, Activation, TimeDistributed, RepeatVector\n",
26
- "from keras.optimizers import Adam\n",
27
- "from keras.losses import sparse_categorical_crossentropy"
28
- ]
29
- },
30
- {
31
- "cell_type": "markdown",
32
- "metadata": {},
33
- "source": [
34
- "### Verify access to the GPU"
35
- ]
36
- },
37
- {
38
- "cell_type": "code",
39
- "execution_count": 3,
40
- "metadata": {},
41
- "outputs": [
42
- {
43
- "name": "stdout",
44
- "output_type": "stream",
45
- "text": [
46
- "[name: \"/device:CPU:0\"\n",
47
- "device_type: \"CPU\"\n",
48
- "memory_limit: 268435456\n",
49
- "locality {\n",
50
- "}\n",
51
- "incarnation: 8951901467623568752\n",
52
- "xla_global_id: -1\n",
53
- ", name: \"/device:GPU:0\"\n",
54
- "device_type: \"GPU\"\n",
55
- "memory_limit: 1733715559\n",
56
- "locality {\n",
57
- " bus_id: 1\n",
58
- " links {\n",
59
- " }\n",
60
- "}\n",
61
- "incarnation: 7542354691675806642\n",
62
- "physical_device_desc: \"device: 0, name: NVIDIA GeForce RTX 2050, pci bus id: 0000:01:00.0, compute capability: 8.6\"\n",
63
- "xla_global_id: 416903419\n",
64
- "]\n"
65
- ]
66
- }
67
- ],
68
- "source": [
69
- "from tensorflow.python.client import device_lib\n",
70
- "print(device_lib.list_local_devices())"
71
- ]
72
- },
73
- {
74
- "cell_type": "markdown",
75
- "metadata": {},
76
- "source": [
77
- "## Dataset\n",
78
- "For our machine translation project, we opt for a dataset featuring a limited vocabulary, specifically designed to facilitate a more manageable and efficient training process. Unlike the extensive [WMT](http://www.statmt.org/) datasets, our chosen dataset ensures a quicker training time and demands fewer computational resources. This strategic decision aims to balance the learning experience while still achieving meaningful results within practical time constraints.\n",
79
- "### Load Data"
80
- ]
81
- },
82
- {
83
- "cell_type": "code",
84
- "execution_count": 4,
85
- "metadata": {},
86
- "outputs": [],
87
- "source": [
88
- "def load_data(path):\n",
89
- " input_file = path\n",
90
- " with open(input_file, \"r\") as f:\n",
91
- " data = f.read()\n",
92
- " return data.split('\\n')\n",
93
- "\n",
94
- "english_sentences = load_data('data/english')\n",
95
- "french_sentences = load_data('data/french')"
96
- ]
97
- },
98
- {
99
- "cell_type": "markdown",
100
- "metadata": {},
101
- "source": [
102
- "### Sample Data"
103
- ]
104
- },
105
- {
106
- "cell_type": "code",
107
- "execution_count": 5,
108
- "metadata": {},
109
- "outputs": [
110
- {
111
- "data": {
112
- "text/plain": [
113
- "['new jersey is sometimes quiet during autumn , and it is snowy in april .',\n",
114
- " 'the united states is usually chilly during july , and it is usually freezing in november .',\n",
115
- " 'california is usually quiet during march , and it is usually hot in june .',\n",
116
- " 'the united states is sometimes mild during june , and it is cold in september .',\n",
117
- " 'your least liked fruit is the grape , but my least liked is the apple .']"
118
- ]
119
- },
120
- "execution_count": 5,
121
- "metadata": {},
122
- "output_type": "execute_result"
123
- }
124
- ],
125
- "source": [
126
- "english_sentences[:5]"
127
- ]
128
- },
129
- {
130
- "cell_type": "markdown",
131
- "metadata": {},
132
- "source": [
133
- "By examining the sentences, it's apparent that they have undergone preprocessing: punctuation has been delimited with spaces, and all the text has been converted to lowercase. This preprocessing serves a crucial purpose in text preparation. Firstly, delimiting punctuation with spaces ensures that each punctuation mark is treated as a separate token, aiding the model in understanding sentence structure. Secondly, converting the entire text to lowercase standardizes the input, preventing the model from distinguishing between words solely based on their casing. This uniformity facilitates more effective training and generalization, enhancing the model's ability to grasp patterns and generate accurate translations."
134
- ]
135
- },
136
- {
137
- "cell_type": "markdown",
138
- "metadata": {},
139
- "source": [
140
- "Structure of the Dataset"
141
- ]
142
- },
143
- {
144
- "cell_type": "code",
145
- "execution_count": 6,
146
- "metadata": {},
147
- "outputs": [
148
- {
149
- "name": "stdout",
150
- "output_type": "stream",
151
- "text": [
152
- "1823250 English words.\n",
153
- "227 unique English words.\n",
154
- "10 Most common words in the English dataset:\n",
155
- "\"is\" \",\" \".\" \"in\" \"it\" \"during\" \"the\" \"but\" \"and\" \"sometimes\"\n",
156
- "\n",
157
- "1961295 French words.\n",
158
- "355 unique French words.\n",
159
- "10 Most common words in the French dataset:\n",
160
- "\"est\" \".\" \",\" \"en\" \"il\" \"les\" \"mais\" \"et\" \"la\" \"parfois\"\n"
161
- ]
162
- }
163
- ],
164
- "source": [
165
- "english_words_counter = collections.Counter([word for sentence in english_sentences for word in sentence.split()])\n",
166
- "french_words_counter = collections.Counter([word for sentence in french_sentences for word in sentence.split()])\n",
167
- "\n",
168
- "print('{} English words.'.format(len([word for sentence in english_sentences for word in sentence.split()])))\n",
169
- "print('{} unique English words.'.format(len(english_words_counter)))\n",
170
- "print('10 Most common words in the English dataset:')\n",
171
- "print('\"' + '\" \"'.join(list(zip(*english_words_counter.most_common(10)))[0]) + '\"')\n",
172
- "\n",
173
- "print()\n",
174
- "print('{} French words.'.format(len([word for sentence in french_sentences for word in sentence.split()])))\n",
175
- "print('{} unique French words.'.format(len(french_words_counter)))\n",
176
- "print('10 Most common words in the French dataset:')\n",
177
- "print('\"' + '\" \"'.join(list(zip(*french_words_counter.most_common(10)))[0]) + '\"')"
178
- ]
179
- },
180
- {
181
- "cell_type": "markdown",
182
- "metadata": {},
183
- "source": [
184
- "### Preprocess\n",
185
- "1. Tokenize the words into ids\n",
186
- "2. Add padding to make all the sequences the same length."
187
- ]
188
- },
189
- {
190
- "cell_type": "code",
191
- "execution_count": 7,
192
- "metadata": {},
193
- "outputs": [
194
- {
195
- "name": "stdout",
196
- "output_type": "stream",
197
- "text": [
198
- "{'the': 1, 'quick': 2, 'a': 3, 'brown': 4, 'fox': 5, 'jumps': 6, 'over': 7, 'lazy': 8, 'dog': 9, 'by': 10, 'jove': 11, 'my': 12, 'study': 13, 'of': 14, 'lexicography': 15, 'won': 16, 'prize': 17, 'this': 18, 'is': 19, 'short': 20, 'sentence': 21}\n",
199
- "\n",
200
- "Sequence 1 in x\n",
201
- " Input: The quick brown fox jumps over the lazy dog .\n",
202
- " Output: [1, 2, 4, 5, 6, 7, 1, 8, 9]\n",
203
- "Sequence 2 in x\n",
204
- " Input: By Jove , my quick study of lexicography won a prize .\n",
205
- " Output: [10, 11, 12, 2, 13, 14, 15, 16, 3, 17]\n",
206
- "Sequence 3 in x\n",
207
- " Input: This is a short sentence .\n",
208
- " Output: [18, 19, 3, 20, 21]\n"
209
- ]
210
- }
211
- ],
212
- "source": [
213
- "def tokenize(x):\n",
214
- " tokenizer = Tokenizer()\n",
215
- " tokenizer.fit_on_texts(x)\n",
216
- " return tokenizer.texts_to_sequences(x), tokenizer\n",
217
- "\n",
218
- "text_sentences = [\n",
219
- " 'The quick brown fox jumps over the lazy dog .',\n",
220
- " 'By Jove , my quick study of lexicography won a prize .',\n",
221
- " 'This is a short sentence .']\n",
222
- "\n",
223
- "text_tokenized, text_tokenizer = tokenize(text_sentences)\n",
224
- "print(text_tokenizer.word_index)\n",
225
- "print()\n",
226
- "for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):\n",
227
- " print('Sequence {} in x'.format(sample_i + 1))\n",
228
- " print(' Input: {}'.format(sent))\n",
229
- " print(' Output: {}'.format(token_sent))"
230
- ]
231
- },
232
- {
233
- "cell_type": "code",
234
- "execution_count": 8,
235
- "metadata": {},
236
- "outputs": [
237
- {
238
- "name": "stdout",
239
- "output_type": "stream",
240
- "text": [
241
- "Sequence 1 in x\n",
242
- " Input: [1 2 4 5 6 7 1 8 9]\n",
243
- " Output: [1 2 4 5 6 7 1 8 9 0]\n",
244
- "Sequence 2 in x\n",
245
- " Input: [10 11 12 2 13 14 15 16 3 17]\n",
246
- " Output: [10 11 12 2 13 14 15 16 3 17]\n",
247
- "Sequence 3 in x\n",
248
- " Input: [18 19 3 20 21]\n",
249
- " Output: [18 19 3 20 21 0 0 0 0 0]\n"
250
- ]
251
- }
252
- ],
253
- "source": [
254
- "def pad(x, length=None):\n",
255
- " if length is None:\n",
256
- " length = max([len(sentence) for sentence in x])\n",
257
- " return pad_sequences(x, maxlen=length, padding='post')\n",
258
- "\n",
259
- "test_pad = pad(text_tokenized)\n",
260
- "for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):\n",
261
- " print('Sequence {} in x'.format(sample_i + 1))\n",
262
- " print(' Input: {}'.format(np.array(token_sent)))\n",
263
- " print(' Output: {}'.format(pad_sent))"
264
- ]
265
- },
266
- {
267
- "cell_type": "code",
268
- "execution_count": 9,
269
- "metadata": {},
270
- "outputs": [
271
- {
272
- "name": "stdout",
273
- "output_type": "stream",
274
- "text": [
275
- "Data Preprocessed\n",
276
- "Max English sentence length: 15\n",
277
- "Max French sentence length: 21\n",
278
- "English vocabulary size: 199\n",
279
- "French vocabulary size: 344\n"
280
- ]
281
- }
282
- ],
283
- "source": [
284
- "def preprocess(x,y):\n",
285
- " preprocess_x, x_tk = tokenize(x)\n",
286
- " preprocess_y, y_tk = tokenize(y)\n",
287
- " \n",
288
- " preprocess_x = pad(preprocess_x)\n",
289
- " preprocess_y = pad(preprocess_y)\n",
290
- " \n",
291
- " preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)\n",
292
- " \n",
293
- " return preprocess_x, preprocess_y, x_tk, y_tk\n",
294
- "\n",
295
- "preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer = preprocess(english_sentences, french_sentences)\n",
296
- "\n",
297
- "max_english_sequence_length = preproc_english_sentences.shape[1]\n",
298
- "max_french_sequence_length = preproc_french_sentences.shape[1]\n",
299
- "english_vocab_size = len(english_tokenizer.word_index)\n",
300
- "french_vocab_size = len(french_tokenizer.word_index)\n",
301
- "\n",
302
- "print('Data Preprocessed')\n",
303
- "print(\"Max English sentence length:\", max_english_sequence_length)\n",
304
- "print(\"Max French sentence length:\", max_french_sequence_length)\n",
305
- "print(\"English vocabulary size:\", english_vocab_size)\n",
306
- "print(\"French vocabulary size:\", french_vocab_size)"
307
- ]
308
- },
309
- {
310
- "cell_type": "markdown",
311
- "metadata": {},
312
- "source": [
313
- "## Models\n",
314
- "- Model 1 is a simple RNN\n",
315
- "- Model 2 is a Bidirectional RNN\n",
316
- "- Model 3 is an Embedding RNN\n",
317
- "\n",
318
- "### Ids Back to Text\n",
319
- "The neural network will be translating the input to words ids, which isn't the final form we want. We want the French translation. The function `logits_to_text` will bridge the gab between the logits from the neural network to the French translation. You'll be using this function to better understand the output of the neural network."
320
- ]
321
- },
322
- {
323
- "cell_type": "code",
324
- "execution_count": 10,
325
- "metadata": {},
326
- "outputs": [],
327
- "source": [
328
- "def logits_to_text(logits, tokenizer):\n",
329
- " index_to_words = {id: word for word, id in tokenizer.word_index.items()}\n",
330
- " index_to_words[0] = '<PAD>'\n",
331
- " \n",
332
- " return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])"
333
- ]
334
- },
335
- {
336
- "cell_type": "markdown",
337
- "metadata": {},
338
- "source": [
339
- "### Model 1: RNN\n",
340
- "![RNN](images/rnn.png)\n",
341
- "A basic RNN model is a good baseline for sequence data. In this model, you'll build a RNN that translates English to French."
342
- ]
343
- },
344
- {
345
- "cell_type": "code",
346
- "execution_count": 11,
347
- "metadata": {},
348
- "outputs": [
349
- {
350
- "name": "stdout",
351
- "output_type": "stream",
352
- "text": [
353
- "Epoch 1/20\n",
354
- "108/108 [==============================] - 17s 90ms/step - loss: 1.9094 - accuracy: 0.5446 - val_loss: nan - val_accuracy: 0.6307\n",
355
- "Epoch 2/20\n",
356
- "108/108 [==============================] - 9s 84ms/step - loss: 1.2243 - accuracy: 0.6429 - val_loss: nan - val_accuracy: 0.6716\n",
357
- "Epoch 3/20\n",
358
- "108/108 [==============================] - 9s 83ms/step - loss: 1.0848 - accuracy: 0.6683 - val_loss: nan - val_accuracy: 0.6864\n",
359
- "Epoch 4/20\n",
360
- "108/108 [==============================] - 9s 83ms/step - loss: 1.0057 - accuracy: 0.6832 - val_loss: nan - val_accuracy: 0.6927\n",
361
- "Epoch 5/20\n",
362
- "108/108 [==============================] - 9s 83ms/step - loss: 0.9484 - accuracy: 0.6931 - val_loss: nan - val_accuracy: 0.7108\n",
363
- "Epoch 6/20\n",
364
- "108/108 [==============================] - 9s 83ms/step - loss: 0.9019 - accuracy: 0.7036 - val_loss: nan - val_accuracy: 0.7007\n",
365
- "Epoch 7/20\n",
366
- "108/108 [==============================] - 9s 84ms/step - loss: 0.8916 - accuracy: 0.6999 - val_loss: nan - val_accuracy: 0.7244\n",
367
- "Epoch 8/20\n",
368
- "108/108 [==============================] - 9s 83ms/step - loss: 0.8407 - accuracy: 0.7178 - val_loss: nan - val_accuracy: 0.7567\n",
369
- "Epoch 9/20\n",
370
- "108/108 [==============================] - 9s 83ms/step - loss: 0.7807 - accuracy: 0.7405 - val_loss: nan - val_accuracy: 0.7405\n",
371
- "Epoch 10/20\n",
372
- "108/108 [==============================] - 9s 84ms/step - loss: 0.7474 - accuracy: 0.7496 - val_loss: nan - val_accuracy: 0.7721\n",
373
- "Epoch 11/20\n",
374
- "108/108 [==============================] - 9s 83ms/step - loss: 0.7739 - accuracy: 0.7392 - val_loss: nan - val_accuracy: 0.7392\n",
375
- "Epoch 12/20\n",
376
- "108/108 [==============================] - 9s 84ms/step - loss: 0.7552 - accuracy: 0.7420 - val_loss: nan - val_accuracy: 0.7851\n",
377
- "Epoch 13/20\n",
378
- "108/108 [==============================] - 9s 83ms/step - loss: 0.7238 - accuracy: 0.7550 - val_loss: nan - val_accuracy: 0.7937\n",
379
- "Epoch 14/20\n",
380
- "108/108 [==============================] - 9s 83ms/step - loss: 0.7126 - accuracy: 0.7568 - val_loss: nan - val_accuracy: 0.7830\n",
381
- "Epoch 15/20\n",
382
- "108/108 [==============================] - 9s 84ms/step - loss: 0.6838 - accuracy: 0.7650 - val_loss: nan - val_accuracy: 0.7976\n",
383
- "Epoch 16/20\n",
384
- "108/108 [==============================] - 9s 83ms/step - loss: 0.6577 - accuracy: 0.7776 - val_loss: nan - val_accuracy: 0.7995\n",
385
- "Epoch 17/20\n",
386
- "108/108 [==============================] - 9s 83ms/step - loss: 0.6447 - accuracy: 0.7821 - val_loss: nan - val_accuracy: 0.8072\n",
387
- "Epoch 18/20\n",
388
- "108/108 [==============================] - 9s 83ms/step - loss: 0.6309 - accuracy: 0.7858 - val_loss: nan - val_accuracy: 0.8100\n",
389
- "Epoch 19/20\n",
390
- "108/108 [==============================] - 9s 84ms/step - loss: 0.6073 - accuracy: 0.7930 - val_loss: nan - val_accuracy: 0.8111\n",
391
- "Epoch 20/20\n",
392
- "108/108 [==============================] - 9s 83ms/step - loss: 0.6100 - accuracy: 0.7912 - val_loss: nan - val_accuracy: 0.8150\n"
393
- ]
394
- },
395
- {
396
- "data": {
397
- "text/plain": [
398
- "<keras.callbacks.History at 0x2573cec49a0>"
399
- ]
400
- },
401
- "execution_count": 11,
402
- "metadata": {},
403
- "output_type": "execute_result"
404
- }
405
- ],
406
- "source": [
407
- "def simple_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):\n",
408
- " \n",
409
- " #Hyperparameters\n",
410
- " learning_rate = 0.005\n",
411
- " \n",
412
- " # Build the layers\n",
413
- " model = Sequential()\n",
414
- " model.add(GRU(256, input_shape=input_shape[1:], return_sequences=True))\n",
415
- " model.add(TimeDistributed(Dense(1024, activation='relu')))\n",
416
- " model.add(Dropout(0.5))\n",
417
- " model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))\n",
418
- " \n",
419
- " # Compile model\n",
420
- " model.compile(loss = sparse_categorical_crossentropy,\n",
421
- " optimizer = Adam(learning_rate),\n",
422
- " metrics = ['accuracy'])\n",
423
- " \n",
424
- " return model\n",
425
- "\n",
426
- "tmp_x = pad(preproc_english_sentences, max_french_sequence_length)\n",
427
- "tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))\n",
428
- "\n",
429
- "#Train the neural network\n",
430
- "simple_rnn_model = simple_model(\n",
431
- " tmp_x.shape,\n",
432
- " max_french_sequence_length,\n",
433
- " english_vocab_size,\n",
434
- " french_vocab_size)\n",
435
- "\n",
436
- "simple_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)"
437
- ]
438
- },
439
- {
440
- "cell_type": "code",
441
- "execution_count": 12,
442
- "metadata": {},
443
- "outputs": [
444
- {
445
- "name": "stdout",
446
- "output_type": "stream",
447
- "text": [
448
- "Prediciton:\n",
449
- "1/1 [==============================] - 0s 259ms/step\n",
450
- "new jersey est parfois calme en mois et il est il est en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>\n",
451
- "\n",
452
- "Correct Translation:\n",
453
- "[\"new jersey est parfois calme pendant l' automne , et il est neigeux en avril .\"]\n",
454
- "\n",
455
- "Original text:\n",
456
- "['new jersey is sometimes quiet during autumn , and it is snowy in april .']\n"
457
- ]
458
- }
459
- ],
460
- "source": [
461
- "# Print prediction(s)\n",
462
- "print(\"Prediciton:\")\n",
463
- "print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))\n",
464
- "\n",
465
- "print(\"\\nCorrect Translation:\")\n",
466
- "print(french_sentences[:1])\n",
467
- "\n",
468
- "print('\\nOriginal text:')\n",
469
- "print(english_sentences[:1])"
470
- ]
471
- },
472
- {
473
- "cell_type": "markdown",
474
- "metadata": {},
475
- "source": [
476
- "### Model 2: Bidirectional RNNs\n",
477
- "![RNN](images/bidirectional.png)\n",
478
- "One restriction of a RNN is that it can't see the future input, only the past. This is where bidirectional recurrent neural networks come in. They are able to see the future data."
479
- ]
480
- },
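A quick standalone shape probe (a sketch, not part of this notebook) confirms that wrapping a 128-unit GRU in Bidirectional concatenates the forward and backward states, doubling the feature dimension to the 256 seen in the model summary below:

from keras.models import Sequential
from keras.layers import GRU, Bidirectional

# 21 timesteps of 1 feature, matching the padded inputs used in this notebook
probe = Sequential([Bidirectional(GRU(128, return_sequences=True), input_shape=(21, 1))])
print(probe.output_shape)  # (None, 21, 256): two 128-unit directions concatenated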
481
- {
482
- "cell_type": "code",
483
- "execution_count": 13,
484
- "metadata": {},
485
- "outputs": [
486
- {
487
- "name": "stdout",
488
- "output_type": "stream",
489
- "text": [
490
- "Model: \"sequential_1\"\n",
491
- "_________________________________________________________________\n",
492
- " Layer (type) Output Shape Param # \n",
493
- "=================================================================\n",
494
- " bidirectional (Bidirectiona (None, 21, 256) 100608 \n",
495
- " l) \n",
496
- " \n",
497
- " time_distributed_2 (TimeDis (None, 21, 1024) 263168 \n",
498
- " tributed) \n",
499
- " \n",
500
- " dropout_1 (Dropout) (None, 21, 1024) 0 \n",
501
- " \n",
502
- " time_distributed_3 (TimeDis (None, 21, 344) 352600 \n",
503
- " tributed) \n",
504
- " \n",
505
- "=================================================================\n",
506
- "Total params: 716,376\n",
507
- "Trainable params: 716,376\n",
508
- "Non-trainable params: 0\n",
509
- "_________________________________________________________________\n",
510
- "None\n",
511
- "Epoch 1/20\n",
512
- "108/108 [==============================] - 12s 90ms/step - loss: 1.7553 - accuracy: 0.5756 - val_loss: nan - val_accuracy: 0.6505\n",
513
- "Epoch 2/20\n",
514
- "108/108 [==============================] - 9s 85ms/step - loss: 1.1655 - accuracy: 0.6550 - val_loss: nan - val_accuracy: 0.6802\n",
515
- "Epoch 3/20\n",
516
- "108/108 [==============================] - 9s 85ms/step - loss: 1.0423 - accuracy: 0.6759 - val_loss: nan - val_accuracy: 0.6903\n",
517
- "Epoch 4/20\n",
518
- "108/108 [==============================] - 9s 85ms/step - loss: 0.9663 - accuracy: 0.6880 - val_loss: nan - val_accuracy: 0.7003\n",
519
- "Epoch 5/20\n",
520
- "108/108 [==============================] - 9s 85ms/step - loss: 0.9119 - accuracy: 0.6974 - val_loss: nan - val_accuracy: 0.7207\n",
521
- "Epoch 6/20\n",
522
- "108/108 [==============================] - 9s 85ms/step - loss: 0.8700 - accuracy: 0.7059 - val_loss: nan - val_accuracy: 0.7287\n",
523
- "Epoch 7/20\n",
524
- "108/108 [==============================] - 9s 85ms/step - loss: 0.8359 - accuracy: 0.7129 - val_loss: nan - val_accuracy: 0.7301\n",
525
- "Epoch 8/20\n",
526
- "108/108 [==============================] - 9s 85ms/step - loss: 0.8495 - accuracy: 0.7090 - val_loss: nan - val_accuracy: 0.7300\n",
527
- "Epoch 9/20\n",
528
- "108/108 [==============================] - 9s 85ms/step - loss: 0.8025 - accuracy: 0.7197 - val_loss: nan - val_accuracy: 0.7386\n",
529
- "Epoch 10/20\n",
530
- "108/108 [==============================] - 9s 85ms/step - loss: 0.7839 - accuracy: 0.7228 - val_loss: nan - val_accuracy: 0.7429\n",
531
- "Epoch 11/20\n",
532
- "108/108 [==============================] - 9s 85ms/step - loss: 0.7671 - accuracy: 0.7248 - val_loss: nan - val_accuracy: 0.7461\n",
533
- "Epoch 12/20\n",
534
- "108/108 [==============================] - 9s 86ms/step - loss: 0.7490 - accuracy: 0.7278 - val_loss: nan - val_accuracy: 0.7487\n",
535
- "Epoch 13/20\n",
536
- "108/108 [==============================] - 9s 85ms/step - loss: 0.7341 - accuracy: 0.7307 - val_loss: nan - val_accuracy: 0.7473\n",
537
- "Epoch 14/20\n",
538
- "108/108 [==============================] - 9s 85ms/step - loss: 0.7183 - accuracy: 0.7363 - val_loss: nan - val_accuracy: 0.7614\n",
539
- "Epoch 15/20\n",
540
- "108/108 [==============================] - 9s 86ms/step - loss: 0.6998 - accuracy: 0.7427 - val_loss: nan - val_accuracy: 0.7594\n",
541
- "Epoch 16/20\n",
542
- "108/108 [==============================] - 9s 85ms/step - loss: 0.7086 - accuracy: 0.7361 - val_loss: nan - val_accuracy: 0.7596\n",
543
- "Epoch 17/20\n",
544
- "108/108 [==============================] - 9s 85ms/step - loss: 0.6889 - accuracy: 0.7424 - val_loss: nan - val_accuracy: 0.7679\n",
545
- "Epoch 18/20\n",
546
- "108/108 [==============================] - 9s 85ms/step - loss: 0.6780 - accuracy: 0.7493 - val_loss: nan - val_accuracy: 0.7763\n",
547
- "Epoch 19/20\n",
548
- "108/108 [==============================] - 9s 85ms/step - loss: 0.6625 - accuracy: 0.7535 - val_loss: nan - val_accuracy: 0.7771\n",
549
- "Epoch 20/20\n",
550
- "108/108 [==============================] - 9s 85ms/step - loss: 0.6572 - accuracy: 0.7553 - val_loss: nan - val_accuracy: 0.7560\n"
551
- ]
552
- },
553
- {
554
- "data": {
555
- "text/plain": [
556
- "<keras.callbacks.History at 0x2573f78f0a0>"
557
- ]
558
- },
559
- "execution_count": 13,
560
- "metadata": {},
561
- "output_type": "execute_result"
562
- }
563
- ],
564
- "source": [
565
- "def bd_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):\n",
566
- " \n",
567
- " #Hyperparameters\n",
568
- " learning_rate = 0.005\n",
569
- " \n",
570
- " # Build the layers\n",
571
- " model = Sequential()\n",
572
- " model.add(Bidirectional(GRU(128, return_sequences=True), input_shape=input_shape[1:]))\n",
573
- " model.add(TimeDistributed(Dense(1024, activation='relu')))\n",
574
- " model.add(Dropout(0.5))\n",
575
- " model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))\n",
576
- " \n",
577
- " # Compile model\n",
578
- " model.compile(loss = sparse_categorical_crossentropy,\n",
579
- " optimizer = Adam(learning_rate),\n",
580
- " metrics = ['accuracy'])\n",
581
- " \n",
582
- " return model\n",
583
- "\n",
584
- "tmp_x = pad(preproc_english_sentences, max_french_sequence_length)\n",
585
- "tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))\n",
586
- "\n",
587
- "# Train the neural network\n",
588
- "bd_rnn_model = bd_model(\n",
589
- " tmp_x.shape,\n",
590
- " max_french_sequence_length,\n",
591
- " english_vocab_size,\n",
592
- " french_vocab_size)\n",
593
- "\n",
594
- "print(bd_rnn_model.summary())\n",
595
- "\n",
596
- "bd_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)"
597
- ]
598
- },
599
- {
600
- "cell_type": "code",
601
- "execution_count": 14,
602
- "metadata": {},
603
- "outputs": [
604
- {
605
- "name": "stdout",
606
- "output_type": "stream",
607
- "text": [
608
- "Prediciton:\n",
609
- "1/1 [==============================] - 1s 544ms/step\n",
610
- "new jersey est parfois chaud en mois et il et il est en en <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>\n",
611
- "\n",
612
- "Correct Translation:\n",
613
- "[\"new jersey est parfois calme pendant l' automne , et il est neigeux en avril .\"]\n",
614
- "\n",
615
- "Original text:\n",
616
- "['new jersey is sometimes quiet during autumn , and it is snowy in april .']\n"
617
- ]
618
- }
619
- ],
620
- "source": [
621
- "# Print prediction(s)\n",
622
- "print(\"Prediciton:\")\n",
623
- "print(logits_to_text(bd_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))\n",
624
- "\n",
625
- "print(\"\\nCorrect Translation:\")\n",
626
- "print(french_sentences[:1])\n",
627
- "\n",
628
- "print('\\nOriginal text:')\n",
629
- "print(english_sentences[:1])"
630
- ]
631
- },
632
- {
633
- "cell_type": "markdown",
634
- "metadata": {},
635
- "source": [
636
- "### Model 3: Embedding\n",
637
- "![RNN](images/embedding-words.png)\n",
638
- "You've turned the words into ids, but there's a better representation of a word. This is called word embeddings. An embedding is a vector representation of the word that is close to similar words in n-dimensional space, where the n represents the size of the embedding vectors."
639
- ]
640
- },
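As a minimal sketch of the idea (illustrative sizes, not the notebook's exact model), an Embedding layer looks up each integer word id as a dense vector, so a batch of id sequences becomes a batch of vector sequences:

import numpy as np
from keras.layers import Embedding

embed = Embedding(input_dim=200, output_dim=256)  # 200 possible ids -> 256-d vectors
ids = np.array([[17, 4, 0]])                      # one padded sentence of 3 word ids
print(embed(ids).shape)                           # (1, 3, 256)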
641
- {
642
- "cell_type": "code",
643
- "execution_count": 15,
644
- "metadata": {},
645
- "outputs": [
646
- {
647
- "name": "stdout",
648
- "output_type": "stream",
649
- "text": [
650
- "Model: \"sequential_2\"\n",
651
- "_________________________________________________________________\n",
652
- " Layer (type) Output Shape Param # \n",
653
- "=================================================================\n",
654
- " embedding (Embedding) (None, 21, 256) 50944 \n",
655
- " \n",
656
- " bidirectional_1 (Bidirectio (None, 21, 512) 789504 \n",
657
- " nal) \n",
658
- " \n",
659
- " time_distributed_4 (TimeDis (None, 21, 1024) 525312 \n",
660
- " tributed) \n",
661
- " \n",
662
- " dropout_2 (Dropout) (None, 21, 1024) 0 \n",
663
- " \n",
664
- " time_distributed_5 (TimeDis (None, 21, 344) 352600 \n",
665
- " tributed) \n",
666
- " \n",
667
- "=================================================================\n",
668
- "Total params: 1,718,360\n",
669
- "Trainable params: 1,718,360\n",
670
- "Non-trainable params: 0\n",
671
- "_________________________________________________________________\n",
672
- "None\n",
673
- "Epoch 1/20\n",
674
- "108/108 [==============================] - 17s 130ms/step - loss: 1.3473 - accuracy: 0.6924 - val_loss: nan - val_accuracy: 0.8697\n",
675
- "Epoch 2/20\n",
676
- "108/108 [==============================] - 14s 126ms/step - loss: 0.3152 - accuracy: 0.9003 - val_loss: nan - val_accuracy: 0.9346\n",
677
- "Epoch 3/20\n",
678
- "108/108 [==============================] - 14s 126ms/step - loss: 0.1808 - accuracy: 0.9434 - val_loss: nan - val_accuracy: 0.9578\n",
679
- "Epoch 4/20\n",
680
- "108/108 [==============================] - 14s 126ms/step - loss: 0.1291 - accuracy: 0.9601 - val_loss: nan - val_accuracy: 0.9702\n",
681
- "Epoch 5/20\n",
682
- "108/108 [==============================] - 14s 126ms/step - loss: 0.1022 - accuracy: 0.9688 - val_loss: nan - val_accuracy: 0.9737\n",
683
- "Epoch 6/20\n",
684
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0854 - accuracy: 0.9739 - val_loss: nan - val_accuracy: 0.9772\n",
685
- "Epoch 7/20\n",
686
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0763 - accuracy: 0.9767 - val_loss: nan - val_accuracy: 0.9780\n",
687
- "Epoch 8/20\n",
688
- "108/108 [==============================] - 14s 127ms/step - loss: 0.0658 - accuracy: 0.9798 - val_loss: nan - val_accuracy: 0.9798\n",
689
- "Epoch 9/20\n",
690
- "108/108 [==============================] - 14s 127ms/step - loss: 0.0604 - accuracy: 0.9815 - val_loss: nan - val_accuracy: 0.9816\n",
691
- "Epoch 10/20\n",
692
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0572 - accuracy: 0.9825 - val_loss: nan - val_accuracy: 0.9823\n",
693
- "Epoch 11/20\n",
694
- "108/108 [==============================] - 14s 127ms/step - loss: 0.0511 - accuracy: 0.9842 - val_loss: nan - val_accuracy: 0.9836\n",
695
- "Epoch 12/20\n",
696
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0519 - accuracy: 0.9840 - val_loss: nan - val_accuracy: 0.9839\n",
697
- "Epoch 13/20\n",
698
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0494 - accuracy: 0.9850 - val_loss: nan - val_accuracy: 0.9813\n",
699
- "Epoch 14/20\n",
700
- "108/108 [==============================] - 14s 127ms/step - loss: 0.0499 - accuracy: 0.9847 - val_loss: nan - val_accuracy: 0.9837\n",
701
- "Epoch 15/20\n",
702
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0471 - accuracy: 0.9856 - val_loss: nan - val_accuracy: 0.9838\n",
703
- "Epoch 16/20\n",
704
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0411 - accuracy: 0.9874 - val_loss: nan - val_accuracy: 0.9843\n",
705
- "Epoch 17/20\n",
706
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0378 - accuracy: 0.9883 - val_loss: nan - val_accuracy: 0.9851\n",
707
- "Epoch 18/20\n",
708
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0368 - accuracy: 0.9886 - val_loss: nan - val_accuracy: 0.9853\n",
709
- "Epoch 19/20\n",
710
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0386 - accuracy: 0.9883 - val_loss: nan - val_accuracy: 0.9845\n",
711
- "Epoch 20/20\n",
712
- "108/108 [==============================] - 14s 126ms/step - loss: 0.0459 - accuracy: 0.9863 - val_loss: nan - val_accuracy: 0.9843\n"
713
- ]
714
- },
715
- {
716
- "data": {
717
- "text/plain": [
718
- "<keras.callbacks.History at 0x2573f340d90>"
719
- ]
720
- },
721
- "execution_count": 15,
722
- "metadata": {},
723
- "output_type": "execute_result"
724
- }
725
- ],
726
- "source": [
727
- "def bidirectional_embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):\n",
728
- " \n",
729
- " # Hyperparameters\n",
730
- " learning_rate = 0.005\n",
731
- " \n",
732
- " # Build the layers\n",
733
- " model = Sequential()\n",
734
- " model.add(Embedding(english_vocab_size, 256, input_length=input_shape[1], input_shape=input_shape[1:]))\n",
735
- " model.add(Bidirectional(GRU(256, return_sequences=True)))\n",
736
- " model.add(TimeDistributed(Dense(1024, activation='relu')))\n",
737
- " model.add(Dropout(0.5))\n",
738
- " model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))\n",
739
- " \n",
740
- " # Compile model\n",
741
- " model.compile(loss = sparse_categorical_crossentropy,\n",
742
- " optimizer = Adam(learning_rate),\n",
743
- " metrics = ['accuracy'])\n",
744
- " \n",
745
- " return model\n",
746
- "\n",
747
- "tmp_x = pad(preproc_english_sentences, max_french_sequence_length)\n",
748
- "tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))\n",
749
- "\n",
750
- "# Build the model\n",
751
- "embed_rnn_model = bidirectional_embed_model(\n",
752
- " tmp_x.shape,\n",
753
- " max_french_sequence_length,\n",
754
- " english_vocab_size,\n",
755
- " french_vocab_size)\n",
756
- "\n",
757
- "print(embed_rnn_model.summary())\n",
758
- "\n",
759
- "embed_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)\n",
760
- " "
761
- ]
762
- },
763
- {
764
- "cell_type": "code",
765
- "execution_count": 16,
766
- "metadata": {},
767
- "outputs": [
768
- {
769
- "name": "stdout",
770
- "output_type": "stream",
771
- "text": [
772
- "Prediciton:\n",
773
- "1/1 [==============================] - 0s 410ms/step\n",
774
- "new jersey est parfois calme pendant l' automne et il est neigeux en avril <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>\n",
775
- "\n",
776
- "Correct Translation:\n",
777
- "[\"new jersey est parfois calme pendant l' automne , et il est neigeux en avril .\"]\n",
778
- "\n",
779
- "Original text:\n",
780
- "['new jersey is sometimes quiet during autumn , and it is snowy in april .']\n"
781
- ]
782
- }
783
- ],
784
- "source": [
785
- "# Print prediction(s)\n",
786
- "print(\"Prediciton:\")\n",
787
- "print(logits_to_text(embed_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))\n",
788
- "\n",
789
- "print(\"\\nCorrect Translation:\")\n",
790
- "print(french_sentences[:1])\n",
791
- "\n",
792
- "print('\\nOriginal text:')\n",
793
- "print(english_sentences[:1])"
794
- ]
795
- },
796
- {
797
- "cell_type": "code",
798
- "execution_count": 17,
799
- "metadata": {},
800
- "outputs": [
801
- {
802
- "name": "stderr",
803
- "output_type": "stream",
804
- "text": [
805
- "WARNING:absl:Found untraced functions such as gru_cell_5_layer_call_fn, gru_cell_5_layer_call_and_return_conditional_losses, gru_cell_6_layer_call_fn, gru_cell_6_layer_call_and_return_conditional_losses while saving (showing 4 of 4). These functions will not be directly callable after loading.\n"
806
- ]
807
- },
808
- {
809
- "name": "stdout",
810
- "output_type": "stream",
811
- "text": [
812
- "INFO:tensorflow:Assets written to: english_to_french_model\\assets\n"
813
- ]
814
- },
815
- {
816
- "name": "stderr",
817
- "output_type": "stream",
818
- "text": [
819
- "INFO:tensorflow:Assets written to: english_to_french_model\\assets\n"
820
- ]
821
- }
822
- ],
823
- "source": [
824
- "embed_rnn_model.save('english_to_french_model')\n",
825
- "# Serialize English Tokenizer to JSON\n",
826
- "with open('english_tokenizer.json', 'w', encoding='utf8') as f:\n",
827
- " f.write(json.dumps(english_tokenizer.to_json(), ensure_ascii=False))\n",
828
- " \n",
829
- "# Serialize French Tokenizer to JSON\n",
830
- "with open('french_tokenizer.json', 'w', encoding='utf8') as f:\n",
831
- " f.write(json.dumps(french_tokenizer.to_json(), ensure_ascii=False))\n",
832
- " \n",
833
- "# Save max lengths\n",
834
- "max_french_sequence_length_json = max_french_sequence_length\n",
835
- "with open('sequence_length.json', 'w', encoding='utf8') as f:\n",
836
- " f.write(json.dumps(max_french_sequence_length_json, ensure_ascii=False))"
837
- ]
838
- },
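The cell above only writes the artifacts out; a hedged sketch of loading them back (assuming the same Keras version that saved them):

import json
from keras.models import load_model
from keras.preprocessing.text import tokenizer_from_json

model = load_model('english_to_french_model')

# json.load() undoes the outer json.dumps(), returning the tokenizer's own JSON string
with open('english_tokenizer.json', encoding='utf8') as f:
    english_tokenizer = tokenizer_from_json(json.load(f))

with open('sequence_length.json', encoding='utf8') as f:
    max_french_sequence_length = json.load(f)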
839
- {
840
- "cell_type": "code",
841
- "execution_count": null,
842
- "metadata": {},
843
- "outputs": [],
844
- "source": []
845
- }
846
- ],
847
- "metadata": {
848
- "kernelspec": {
849
- "display_name": "Python 3",
850
- "language": "python",
851
- "name": "python3"
852
- },
853
- "language_info": {
854
- "codemirror_mode": {
855
- "name": "ipython",
856
- "version": 3
857
- },
858
- "file_extension": ".py",
859
- "mimetype": "text/x-python",
860
- "name": "python",
861
- "nbconvert_exporter": "python",
862
- "pygments_lexer": "ipython3",
863
- "version": "3.9.19"
864
- }
865
- },
866
- "nbformat": 4,
867
- "nbformat_minor": 1
868
- }
Task 2/Eng_Spanish.ipynb DELETED
@@ -1,782 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "metadata": {},
6
- "source": [
7
- "# Machine Translation Project (English to Spanish)"
8
- ]
9
- },
10
- {
11
- "cell_type": "code",
12
- "execution_count": 1,
13
- "metadata": {},
14
- "outputs": [],
15
- "source": [
16
- "import pathlib\n",
17
- "import random\n",
18
- "import string\n",
19
- "import tensorflow.strings as tf_strings\n",
20
- "import tensorflow.data as tf_data\n",
21
- "import re\n",
22
- "from keras.layers import TextVectorization\n",
23
- "import keras\n",
24
- "import tensorflow as tf\n",
25
- "from keras import layers\n",
26
- "import json"
27
- ]
28
- },
29
- {
30
- "cell_type": "markdown",
31
- "metadata": {},
32
- "source": [
33
- "### Verify access to the GPU"
34
- ]
35
- },
36
- {
37
- "cell_type": "code",
38
- "execution_count": 2,
39
- "metadata": {},
40
- "outputs": [
41
- {
42
- "name": "stdout",
43
- "output_type": "stream",
44
- "text": [
45
- "[name: \"/device:CPU:0\"\n",
46
- "device_type: \"CPU\"\n",
47
- "memory_limit: 268435456\n",
48
- "locality {\n",
49
- "}\n",
50
- "incarnation: 16791471205212918184\n",
51
- "xla_global_id: -1\n",
52
- ", name: \"/device:GPU:0\"\n",
53
- "device_type: \"GPU\"\n",
54
- "memory_limit: 1733715559\n",
55
- "locality {\n",
56
- " bus_id: 1\n",
57
- " links {\n",
58
- " }\n",
59
- "}\n",
60
- "incarnation: 6643307082616730570\n",
61
- "physical_device_desc: \"device: 0, name: NVIDIA GeForce RTX 2050, pci bus id: 0000:01:00.0, compute capability: 8.6\"\n",
62
- "xla_global_id: 416903419\n",
63
- "]\n"
64
- ]
65
- }
66
- ],
67
- "source": [
68
- "from tensorflow.python.client import device_lib\n",
69
- "print(device_lib.list_local_devices())"
70
- ]
71
- },
72
- {
73
- "cell_type": "markdown",
74
- "metadata": {},
75
- "source": [
76
- "### Download and prepare the data\n",
77
- "source :\"http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip\""
78
- ]
79
- },
80
- {
81
- "cell_type": "code",
82
- "execution_count": 3,
83
- "metadata": {},
84
- "outputs": [
85
- {
86
- "name": "stdout",
87
- "output_type": "stream",
88
- "text": [
89
- "Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip\n",
90
- "2638744/2638744 [==============================] - 11s 4us/step\n"
91
- ]
92
- }
93
- ],
94
- "source": [
95
- "text_file = keras.utils.get_file(\n",
96
- " fname = \"spa-eng.zip\",\n",
97
- " origin = \"http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip\",\n",
98
- " extract = True,\n",
99
- ")\n",
100
- "\n",
101
- "text_file = pathlib.Path(text_file).parent / \"spa-eng\" / \"spa.txt\"\n",
102
- "\n",
103
- "with open(text_file, \"r\") as f:\n",
104
- " lines = f.read().split(\"\\n\")[:-1]\n",
105
- " \n",
106
- "text_pairs = []\n",
107
- "\n",
108
- "for line in lines:\n",
109
- " eng, spa = line.split(\"\\t\")\n",
110
- " spa = \"[start] \" + spa + \" [end]\"\n",
111
- " text_pairs.append((eng, spa))"
112
- ]
113
- },
114
- {
115
- "cell_type": "code",
116
- "execution_count": 4,
117
- "metadata": {},
118
- "outputs": [],
119
- "source": [
120
- "random.shuffle(text_pairs)"
121
- ]
122
- },
123
- {
124
- "cell_type": "code",
125
- "execution_count": 5,
126
- "metadata": {},
127
- "outputs": [
128
- {
129
- "name": "stdout",
130
- "output_type": "stream",
131
- "text": [
132
- "('Please remind me to phone him tomorrow.', '[start] Por favor, recordadme que le llame mañana. [end]')\n",
133
- "('These apples taste good.', '[start] Estas manzanas están buenas. [end]')\n",
134
- "('Tom is on his own now.', '[start] Tom es independiente ahora. [end]')\n",
135
- "('Hey, you want to have a lot of fun? Come with us.', '[start] Oye, ¿quieres entretenerte?, acompáñanos. [end]')\n",
136
- "('You have to remain detached.', '[start] Tú tienes que permanecer independiente. [end]')\n"
137
- ]
138
- }
139
- ],
140
- "source": [
141
- "for i in range(5):\n",
142
- " print(text_pairs[i])"
143
- ]
144
- },
145
- {
146
- "cell_type": "markdown",
147
- "metadata": {},
148
- "source": [
149
- "Structure of the Dataset"
150
- ]
151
- },
152
- {
153
- "cell_type": "code",
154
- "execution_count": 6,
155
- "metadata": {},
156
- "outputs": [
157
- {
158
- "name": "stdout",
159
- "output_type": "stream",
160
- "text": [
161
- "118964 total pairs\n",
162
- "83276 training pairs\n",
163
- "17844 validation pairs\n",
164
- "17844 test pairs\n"
165
- ]
166
- }
167
- ],
168
- "source": [
169
- "num_val_samples = int(0.15 * len(text_pairs))\n",
170
- "num_train_samples = len(text_pairs) - 2 * num_val_samples\n",
171
- "train_pairs = text_pairs[:num_train_samples]\n",
172
- "val_pairs = text_pairs[num_train_samples:num_train_samples + num_val_samples]\n",
173
- "test_pairs = text_pairs[num_train_samples + num_val_samples:]\n",
174
- "\n",
175
- "print(f\"{len(text_pairs)} total pairs\")\n",
176
- "print(f\"{len(train_pairs)} training pairs\")\n",
177
- "print(f\"{len(val_pairs)} validation pairs\")\n",
178
- "print(f\"{len(test_pairs)} test pairs\")"
179
- ]
180
- },
181
- {
182
- "cell_type": "code",
183
- "execution_count": 7,
184
- "metadata": {},
185
- "outputs": [],
186
- "source": [
187
- "# parameters\n",
188
- "strip_chars = string.punctuation + \"¿\"\n",
189
- "strip_chars = strip_chars.replace(\"[\", \"\")\n",
190
- "strip_chars = strip_chars.replace(\"]\", \"\")\n",
191
- "\n",
192
- "vocab_size = 15000\n",
193
- "sequence_length = 20\n",
194
- "batch_size = 64"
195
- ]
196
- },
197
- {
198
- "cell_type": "markdown",
199
- "metadata": {},
200
- "source": [
201
- "## Vectorize the data"
202
- ]
203
- },
204
- {
205
- "cell_type": "code",
206
- "execution_count": 8,
207
- "metadata": {},
208
- "outputs": [],
209
- "source": [
210
- "def custom_standardization(input_string):\n",
211
- " lowercase = tf_strings.lower(input_string)\n",
212
- " return tf_strings.regex_replace(lowercase, f\"[{re.escape(strip_chars)}]\", \"\")\n",
213
- "\n",
214
- "# vectorization\n",
215
- "eng_vectorization = TextVectorization(\n",
216
- " max_tokens = vocab_size,\n",
217
- " output_mode = \"int\",\n",
218
- " output_sequence_length = sequence_length,\n",
219
- ")\n",
220
- "\n",
221
- "spa_vectorization = TextVectorization(\n",
222
- " max_tokens = vocab_size,\n",
223
- " output_mode = \"int\",\n",
224
- " output_sequence_length = sequence_length + 1,\n",
225
- " standardize = custom_standardization,\n",
226
- ")\n",
227
- "\n",
228
- "train_eng_texts = [pair[0] for pair in train_pairs]\n",
229
- "train_spa_texts = [pair[1] for pair in train_pairs]\n",
230
- "\n",
231
- "eng_vectorization.adapt(train_eng_texts)\n",
232
- "spa_vectorization.adapt(train_spa_texts)\n",
233
- "\n",
234
- "#save the vectorization layers\n",
235
- "eng_vectorization_config = eng_vectorization.get_config()\n",
236
- "eng_vectorization_config.pop('standardize', None)\n",
237
- "eng_vocab = eng_vectorization.get_vocabulary()\n",
238
- "with open('eng_vectorization_config.json', 'w', encoding='utf-8') as f:\n",
239
- " json.dump(eng_vectorization_config, f)\n",
240
- " \n",
241
- "with open('eng_vocab.json', 'w', encoding='utf-8') as f:\n",
242
- " json.dump(eng_vocab, f)\n",
243
- " \n",
244
- "spa_vectorization_config = spa_vectorization.get_config()\n",
245
- "spa_vectorization_config.pop('standardize', None)\n",
246
- "spa_vocab = spa_vectorization.get_vocabulary()\n",
247
- "with open('spa_vectorization_config.json', 'w', encoding='utf-8') as f:\n",
248
- " json.dump(spa_vectorization_config, f)\n",
249
- " \n",
250
- "with open('spa_vocab.json', 'w', encoding='utf-8') as f:\n",
251
- " json.dump(spa_vocab, f)\n",
252
- " \n",
253
- "\n",
254
- "def format_dataset(eng, spa):\n",
255
- " eng = eng_vectorization(eng)\n",
256
- " spa = spa_vectorization(spa)\n",
257
- " return (\n",
258
- " {\n",
259
- " \"encoder_inputs\": eng,\n",
260
- " \"decoder_inputs\": spa[:, :-1],\n",
261
- " },\n",
262
- " spa[:, 1:],\n",
263
- " )\n",
264
- " \n",
265
- "def make_dataset(pairs):\n",
266
- " eng_texts, spa_texts = zip(*pairs)\n",
267
- " eng_texts = list(eng_texts)\n",
268
- " spa_texts = list(spa_texts)\n",
269
- " dataset = tf_data.Dataset.from_tensor_slices((eng_texts, spa_texts))\n",
270
- " dataset = dataset.batch(batch_size)\n",
271
- " dataset = dataset.map(format_dataset)\n",
272
- " return dataset.cache().shuffle(2048).prefetch(16)\n",
273
- "\n",
274
- "train_ds = make_dataset(train_pairs)\n",
275
- "val_ds = make_dataset(val_pairs)\n",
276
- " "
277
- ]
278
- },
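A sketch of restoring the saved vectorization layers later (an assumed reload path, not shown in this notebook): rebuild each layer from its config and re-attach its vocabulary. The Spanish layer would also need standardize=custom_standardization supplied again, since 'standardize' was popped before saving.

import json
from keras.layers import TextVectorization

with open('eng_vectorization_config.json', encoding='utf-8') as f:
    eng_config = json.load(f)
with open('eng_vocab.json', encoding='utf-8') as f:
    eng_vocab = json.load(f)

# 'standardize' was removed from the saved config, so the default standardization applies
eng_vectorization = TextVectorization.from_config(eng_config)
eng_vectorization.set_vocabulary(eng_vocab)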
279
- {
280
- "cell_type": "code",
281
- "execution_count": 9,
282
- "metadata": {},
283
- "outputs": [
284
- {
285
- "name": "stdout",
286
- "output_type": "stream",
287
- "text": [
288
- "(64, 20)\n",
289
- "(64, 20)\n"
290
- ]
291
- }
292
- ],
293
- "source": [
294
- "for inputs,targets in train_ds.take(1):\n",
295
- " print(inputs[\"encoder_inputs\"].shape)\n",
296
- " print(targets.shape)"
297
- ]
298
- },
299
- {
300
- "cell_type": "markdown",
301
- "metadata": {},
302
- "source": [
303
- "### Model Architecture\n",
304
- "![Encoder-Decoder](images/encoder-decoder-context.png)\n",
305
- "![Encoder-Decoder](images/encoder-decoder-translation.png)\n",
306
- "![Attention Mechanism](images/attention.png)"
307
- ]
308
- },
309
- {
310
- "cell_type": "code",
311
- "execution_count": 10,
312
- "metadata": {},
313
- "outputs": [],
314
- "source": [
315
- "# Creating an Encoder\n",
316
- "class TransformerEncoder(layers.Layer):\n",
317
- " def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):\n",
318
- " super().__init__(**kwargs)\n",
319
- " self.embed_dim = embed_dim\n",
320
- " self.dense_dim = dense_dim\n",
321
- " self.num_heads = num_heads\n",
322
- " self.attention = layers.MultiHeadAttention(\n",
323
- " num_heads = num_heads, key_dim = embed_dim\n",
324
- " )\n",
325
- " self.dense_proj = keras.Sequential(\n",
326
- " [\n",
327
- " layers.Dense(dense_dim, activation = \"relu\"),\n",
328
- " layers.Dense(embed_dim),\n",
329
- " ]\n",
330
- " )\n",
331
- " self.layernorm_1 = layers.LayerNormalization()\n",
332
- " self.layernorm_2 = layers.LayerNormalization()\n",
333
- " self.supports_masking = True\n",
334
- " \n",
335
- " def call(self, inputs, mask=None):\n",
336
- " if mask is not None:\n",
337
- " padding_mask = tf.cast(mask[:, None, :], dtype = tf.int32)\n",
338
- " else:\n",
339
- " padding_mask = None\n",
340
- " \n",
341
- " attention_output = self.attention(\n",
342
- " query = inputs,\n",
343
- " value = inputs,\n",
344
- " key = inputs,\n",
345
- " attention_mask = padding_mask,\n",
346
- " )\n",
347
- " proj_input = self.layernorm_1(inputs + attention_output)\n",
348
- " proj_output = self.dense_proj(proj_input)\n",
349
- " return self.layernorm_2(proj_input + proj_output)\n",
350
- " \n",
351
- " def get_config(self):\n",
352
- " config = super().get_config()\n",
353
- " config.update({\n",
354
- " \"embed_dim\": self.embed_dim,\n",
355
- " \"dense_dim\": self.dense_dim,\n",
356
- " \"num_heads\": self.num_heads,\n",
357
- " })\n",
358
- " return config\n",
359
- " \n",
360
- "# Creating a Positional Embedding\n",
361
- "class PositionalEmbedding(layers.Layer):\n",
362
- " def __init__(self, sequence_length, vocab_size, embed_dim, **kwargs):\n",
363
- " super().__init__(**kwargs)\n",
364
- " self.token_embeddings = layers.Embedding(\n",
365
- " input_dim = vocab_size, output_dim = embed_dim\n",
366
- " )\n",
367
- " self.position_embeddings = layers.Embedding(\n",
368
- " input_dim = sequence_length, output_dim = embed_dim\n",
369
- " )\n",
370
- " self.sequence_length = sequence_length\n",
371
- " self.vocab_size = vocab_size\n",
372
- " self.embed_dim = embed_dim\n",
373
- " \n",
374
- " def call(self, inputs):\n",
375
- " length = tf.shape(inputs)[-1]\n",
376
- " positions = tf.range(start = 0, limit = length, delta = 1)\n",
377
- " embedded_tokens = self.token_embeddings(inputs)\n",
378
- " embedded_positions = self.position_embeddings(positions)\n",
379
- " return embedded_tokens + embedded_positions\n",
380
- " \n",
381
- " def compute_mask(self, inputs, mask=None):\n",
382
- " if mask is not None:\n",
383
- " return tf.not_equal(inputs, 0)\n",
384
- " else:\n",
385
- " return None\n",
386
- " \n",
387
- " def get_config(self):\n",
388
- " config = super().get_config()\n",
389
- " config.update({\n",
390
- " \"vocab_size\": self.vocab_size,\n",
391
- " \"sequence_length\": self.sequence_length,\n",
392
- " \"embed_dim\": self.embed_dim,\n",
393
- " })\n",
394
- " return config\n",
395
- " \n",
396
- "# Creating a Decoder\n",
397
- "class TransformerDecoder(layers.Layer):\n",
398
- " def __init__(self, embed_dim, latent_dim, num_heads, **kwargs):\n",
399
- " super().__init__(**kwargs)\n",
400
- " self.embed_dim = embed_dim\n",
401
- " self.latent_dim = latent_dim\n",
402
- " self.num_heads = num_heads\n",
403
- " self.attention_1 = layers.MultiHeadAttention(\n",
404
- " num_heads = num_heads, key_dim = embed_dim\n",
405
- " )\n",
406
- " self.attention_2 = layers.MultiHeadAttention(\n",
407
- " num_heads = num_heads, key_dim = embed_dim\n",
408
- " )\n",
409
- " self.dense_proj = keras.Sequential(\n",
410
- " [\n",
411
- " layers.Dense(latent_dim, activation = \"relu\"),\n",
412
- " layers.Dense(embed_dim),\n",
413
- " ]\n",
414
- " )\n",
415
- " self.layernorm_1 = layers.LayerNormalization()\n",
416
- " self.layernorm_2 = layers.LayerNormalization()\n",
417
- " self.layernorm_3 = layers.LayerNormalization()\n",
418
- " self.supports_masking = True\n",
419
- " \n",
420
- " def call(self, inputs, encoder_outputs, mask=None):\n",
421
- " casual_mask = self.get_causal_attention_mask(inputs)\n",
422
- " if mask is not None:\n",
423
- " padding_mask = tf.cast(mask[:, None, :], dtype = tf.int32)\n",
424
- " padding_mask = tf.minimum(padding_mask, casual_mask)\n",
425
- " else:\n",
426
- " padding_mask = None\n",
427
- " \n",
428
- " attention_output_1 = self.attention_1(\n",
429
- " query = inputs,\n",
430
- " value = inputs,\n",
431
- " key = inputs,\n",
432
- " attention_mask = casual_mask,\n",
433
- " )\n",
434
- " out_1 = self.layernorm_1(inputs + attention_output_1)\n",
435
- " \n",
436
- " attention_output_2 = self.attention_2(\n",
437
- " query = out_1,\n",
438
- " value = encoder_outputs,\n",
439
- " key = encoder_outputs,\n",
440
- " attention_mask = padding_mask,\n",
441
- " )\n",
442
- " \n",
443
- " out_2 = self.layernorm_2(out_1 + attention_output_2)\n",
444
- " proj_output = self.dense_proj(out_2)\n",
445
- " \n",
446
- " return self.layernorm_3(out_2 + proj_output)\n",
447
- " \n",
448
- " def get_causal_attention_mask(self, inputs):\n",
449
- " input_shape = tf.shape(inputs)\n",
450
- " batch_size, sequence_length = input_shape[0], input_shape[1]\n",
451
- " i = tf.range(sequence_length)[:, None]\n",
452
- " j = tf.range(sequence_length)\n",
453
- " mask = tf.cast(i >= j, tf.int32)\n",
454
- " mask = tf.reshape(mask,(1, input_shape[1], input_shape[1]))\n",
455
- " mult = tf.concat(\n",
456
- " [\n",
457
- " tf.expand_dims(batch_size, -1),\n",
458
- " tf.convert_to_tensor([1, 1]),\n",
459
- " ],\n",
460
- " axis = 0,\n",
461
- " )\n",
462
- " return tf.tile(mask, mult)\n",
463
- " \n",
464
- " def get_config(self):\n",
465
- " config = super().get_config()\n",
466
- " config.update({\n",
467
- " \"embed_dim\": self.embed_dim,\n",
468
- " \"latent_dim\": self.latent_dim,\n",
469
- " \"num_heads\": self.num_heads,\n",
470
- " })\n",
471
- " return config\n"
472
- ]
473
- },
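The mask built by get_causal_attention_mask can be checked in isolation; a minimal sketch for sequence length 4 shows the lower-triangular pattern that stops each position from attending to later ones:

import tensorflow as tf

i = tf.range(4)[:, None]
j = tf.range(4)
print(tf.cast(i >= j, tf.int32).numpy())
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]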
474
- {
475
- "cell_type": "code",
476
- "execution_count": 11,
477
- "metadata": {},
478
- "outputs": [],
479
- "source": [
480
- "# define emmbedding dimensions, latent dimensions, and number of heads\n",
481
- "embed_dim = 256\n",
482
- "latent_dim = 2048\n",
483
- "num_heads = 8\n",
484
- "\n",
485
- "#Encoder\n",
486
- "encoder_inputs = keras.Input(shape = (None,), dtype = \"int64\", name = \"encoder_inputs\")\n",
487
- "\n",
488
- "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)\n",
489
- "\n",
490
- "encoder_outputs = TransformerEncoder(embed_dim, latent_dim, num_heads)(x)\n",
491
- "\n",
492
- "encoder = keras.Model(encoder_inputs, encoder_outputs, name = \"encoder\")\n",
493
- "\n",
494
- "#Decoder\n",
495
- "decoder_inputs = keras.Input(shape = (None,), dtype = \"int64\", name = \"decoder_inputs\")\n",
496
- "encoder_seq_inputs = keras.Input(shape = (None, embed_dim), name = \"encoder_seq_inputs\")\n",
497
- "\n",
498
- "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)\n",
499
- "\n",
500
- "x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, encoder_seq_inputs)\n",
501
- "\n",
502
- "x = layers.Dropout(0.5)(x)\n",
503
- "\n",
504
- "decoder_outputs = layers.Dense(vocab_size, activation = \"softmax\")(x)\n",
505
- "\n",
506
- "decoder = keras.Model([decoder_inputs, encoder_seq_inputs], decoder_outputs, name = \"decoder\")\n",
507
- "\n",
508
- "# Define the final model\n",
509
- "decoder_outputs = decoder([decoder_inputs, encoder_outputs])\n",
510
- "\n",
511
- "transformer = keras.Model(\n",
512
- " [encoder_inputs, decoder_inputs], decoder_outputs, name = \"transformer\"\n",
513
- ")\n"
514
- ]
515
- },
516
- {
517
- "cell_type": "code",
518
- "execution_count": 12,
519
- "metadata": {},
520
- "outputs": [
521
- {
522
- "name": "stdout",
523
- "output_type": "stream",
524
- "text": [
525
- "Model: \"transformer\"\n",
526
- "__________________________________________________________________________________________________\n",
527
- " Layer (type) Output Shape Param # Connected to \n",
528
- "==================================================================================================\n",
529
- " encoder_inputs (InputLayer) [(None, None)] 0 [] \n",
530
- " \n",
531
- " positional_embedding (Position (None, None, 256) 3845120 ['encoder_inputs[0][0]'] \n",
532
- " alEmbedding) \n",
533
- " \n",
534
- " decoder_inputs (InputLayer) [(None, None)] 0 [] \n",
535
- " \n",
536
- " transformer_encoder (Transform (None, None, 256) 3155456 ['positional_embedding[0][0]'] \n",
537
- " erEncoder) \n",
538
- " \n",
539
- " decoder (Functional) (None, None, 15000) 12959640 ['decoder_inputs[0][0]', \n",
540
- " 'transformer_encoder[0][0]'] \n",
541
- " \n",
542
- "==================================================================================================\n",
543
- "Total params: 19,960,216\n",
544
- "Trainable params: 19,960,216\n",
545
- "Non-trainable params: 0\n",
546
- "__________________________________________________________________________________________________\n",
547
- "Epoch 1/20\n",
548
- "1302/1302 [==============================] - 132s 95ms/step - loss: 1.9854 - accuracy: 0.7200 - val_loss: 1.7019 - val_accuracy: 0.7395\n",
549
- "Epoch 2/20\n",
550
- "1302/1302 [==============================] - 123s 94ms/step - loss: 1.7172 - accuracy: 0.7499 - val_loss: 1.5479 - val_accuracy: 0.7610\n",
551
- "Epoch 3/20\n",
552
- "1302/1302 [==============================] - 123s 95ms/step - loss: 1.5636 - accuracy: 0.7712 - val_loss: 1.4142 - val_accuracy: 0.7838\n",
553
- "Epoch 4/20\n",
554
- "1302/1302 [==============================] - 124s 96ms/step - loss: 1.4229 - accuracy: 0.7906 - val_loss: 1.2981 - val_accuracy: 0.8027\n",
555
- "Epoch 5/20\n",
556
- "1302/1302 [==============================] - 125s 96ms/step - loss: 1.3049 - accuracy: 0.8079 - val_loss: 1.2207 - val_accuracy: 0.8173\n",
557
- "Epoch 6/20\n",
558
- "1302/1302 [==============================] - 125s 96ms/step - loss: 1.2307 - accuracy: 0.8232 - val_loss: 1.1703 - val_accuracy: 0.8294\n",
559
- "Epoch 7/20\n",
560
- "1302/1302 [==============================] - 125s 96ms/step - loss: 1.1852 - accuracy: 0.8348 - val_loss: 1.1304 - val_accuracy: 0.8376\n",
561
- "Epoch 8/20\n",
562
- "1302/1302 [==============================] - 124s 95ms/step - loss: 1.1455 - accuracy: 0.8431 - val_loss: 1.1064 - val_accuracy: 0.8436\n",
563
- "Epoch 9/20\n",
564
- "1302/1302 [==============================] - 124s 95ms/step - loss: 1.1154 - accuracy: 0.8496 - val_loss: 1.0878 - val_accuracy: 0.8461\n",
565
- "Epoch 10/20\n",
566
- "1302/1302 [==============================] - 125s 96ms/step - loss: 1.0901 - accuracy: 0.8545 - val_loss: 1.0737 - val_accuracy: 0.8498\n",
567
- "Epoch 11/20\n",
568
- "1302/1302 [==============================] - 124s 95ms/step - loss: 1.0690 - accuracy: 0.8583 - val_loss: 1.0697 - val_accuracy: 0.8472\n",
569
- "Epoch 12/20\n",
570
- "1302/1302 [==============================] - 124s 95ms/step - loss: 1.0495 - accuracy: 0.8616 - val_loss: 1.0458 - val_accuracy: 0.8543\n",
571
- "Epoch 13/20\n",
572
- "1302/1302 [==============================] - 124s 95ms/step - loss: 1.0332 - accuracy: 0.8648 - val_loss: 1.0387 - val_accuracy: 0.8548\n",
573
- "Epoch 14/20\n",
574
- "1302/1302 [==============================] - 123s 95ms/step - loss: 1.0180 - accuracy: 0.8673 - val_loss: 1.0458 - val_accuracy: 0.8550\n",
575
- "Epoch 15/20\n",
576
- "1302/1302 [==============================] - 125s 96ms/step - loss: 1.0036 - accuracy: 0.8695 - val_loss: 1.0303 - val_accuracy: 0.8569\n",
577
- "Epoch 16/20\n",
578
- "1302/1302 [==============================] - 125s 96ms/step - loss: 0.9891 - accuracy: 0.8720 - val_loss: 1.0184 - val_accuracy: 0.8586\n",
579
- "Epoch 17/20\n",
580
- "1302/1302 [==============================] - 125s 96ms/step - loss: 0.9779 - accuracy: 0.8738 - val_loss: 1.0313 - val_accuracy: 0.8567\n",
581
- "Epoch 18/20\n",
582
- "1302/1302 [==============================] - 126s 96ms/step - loss: 0.9668 - accuracy: 0.8754 - val_loss: 1.0106 - val_accuracy: 0.8614\n",
583
- "Epoch 19/20\n",
584
- "1302/1302 [==============================] - 125s 96ms/step - loss: 0.9543 - accuracy: 0.8772 - val_loss: 1.0144 - val_accuracy: 0.8598\n",
585
- "Epoch 20/20\n",
586
- "1302/1302 [==============================] - 124s 95ms/step - loss: 0.9415 - accuracy: 0.8791 - val_loss: 1.0139 - val_accuracy: 0.8617\n"
587
- ]
588
- },
589
- {
590
- "data": {
591
- "text/plain": [
592
- "<keras.callbacks.History at 0x24df73cf5e0>"
593
- ]
594
- },
595
- "execution_count": 12,
596
- "metadata": {},
597
- "output_type": "execute_result"
598
- }
599
- ],
600
- "source": [
601
- "epochs = 20\n",
602
- "\n",
603
- "transformer.summary()\n",
604
- "\n",
605
- "transformer.compile(\n",
606
- " \"rmsprop\", loss = \"sparse_categorical_crossentropy\", metrics = [\"accuracy\"]\n",
607
- ")\n",
608
- "\n",
609
- "transformer.fit(train_ds, epochs = epochs, validation_data = val_ds)"
610
- ]
611
- },
612
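The parameter counts in the summary above can be checked by hand. The PositionalEmbedding layer is a token embedding plus a learned position embedding; assuming vocab_size = 15000 and sequence_length = 20 (the max_tokens and output_sequence_length stored in the vectorization config deleted later in this diff), the arithmetic reproduces the reported numbers:

    # Sanity-checking the model summary above (values assumed from the
    # vectorization config: vocab_size = 15000, sequence_length = 20).
    vocab_size, sequence_length, embed_dim = 15000, 20, 256
    pos_embed = vocab_size * embed_dim + sequence_length * embed_dim
    print(pos_embed)                         # 3845120, as reported
    print(pos_embed + 3155456 + 12959640)    # 19960216, the reported total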
- {
613
- "cell_type": "code",
614
- "execution_count": 13,
615
- "metadata": {},
616
- "outputs": [
617
- {
618
- "name": "stderr",
619
- "output_type": "stream",
620
- "text": [
621
- "WARNING:absl:Found untraced functions such as embedding_layer_call_fn, embedding_layer_call_and_return_conditional_losses, embedding_1_layer_call_fn, embedding_1_layer_call_and_return_conditional_losses, multi_head_attention_layer_call_fn while saving (showing 5 of 60). These functions will not be directly callable after loading.\n"
622
- ]
623
- },
624
- {
625
- "name": "stdout",
626
- "output_type": "stream",
627
- "text": [
628
- "INFO:tensorflow:Assets written to: transformer_model\\assets\n"
629
- ]
630
- },
631
- {
632
- "name": "stderr",
633
- "output_type": "stream",
634
- "text": [
635
- "INFO:tensorflow:Assets written to: transformer_model\\assets\n"
636
- ]
637
- }
638
- ],
639
- "source": [
640
- "transformer.save(\"transformer_model\")"
641
- ]
642
- },
643
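Because the model contains custom layers, reloading the SavedModel written above works most reliably with those classes in scope. A minimal reload sketch, assuming PositionalEmbedding, TransformerEncoder, and TransformerDecoder are importable in the loading session:

    from tensorflow import keras

    # With the SavedModel format Keras can often rebuild the traced graph on
    # its own, but passing custom_objects restores the original layer classes.
    reloaded = keras.models.load_model(
        "transformer_model",
        custom_objects={
            "PositionalEmbedding": PositionalEmbedding,
            "TransformerEncoder": TransformerEncoder,
            "TransformerDecoder": TransformerDecoder,
        },
    )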
- {
644
- "cell_type": "code",
645
- "execution_count": 14,
646
- "metadata": {},
647
- "outputs": [
648
- {
649
- "name": "stdout",
650
- "output_type": "stream",
651
- "text": [
652
- "input: a few passengers went on board the plane\n",
653
- "translated: [start] unos dos te [UNK] en los años [end]\n",
654
- "\n",
655
- "input: i think shes an honest woman\n",
656
- "translated: [start] creo que es una mujer [UNK] [end]\n",
657
- "\n",
658
- "input: im old enough to do that on my own\n",
659
- "translated: [start] soy lo suficientemente viejo para hacer por mi [end]\n",
660
- "\n",
661
- "input: youre too drunk to drive\n",
662
- "translated: [start] eres demasiado de conducir [end]\n",
663
- "\n",
664
- "input: id like to go to hawaii\n",
665
- "translated: [start] quisiera ir a china [end]\n",
666
- "\n"
667
- ]
668
- }
669
- ],
670
- "source": [
671
- "spa_vocab = spa_vectorization.get_vocabulary()\n",
672
- "spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))\n",
673
- "max_decoded_sentence_length = sequence_length\n",
674
- "\n",
675
- "def decode_sentence(input_sentence):\n",
676
- " tokenized_input_sentence = eng_vectorization([input_sentence])\n",
677
- " decoded_sentence = \"[start]\"\n",
678
- " for i in range(max_decoded_sentence_length):\n",
679
- " tokenized_target_sentence = spa_vectorization([decoded_sentence])[:, :-1]\n",
680
- " predictions = transformer([tokenized_input_sentence, tokenized_target_sentence])\n",
681
- " sampled_token_index = tf.argmax(predictions[0, i, :]).numpy().item(0)\n",
682
- " sampled_token = spa_index_lookup[sampled_token_index]\n",
683
- " decoded_sentence += \" \" + sampled_token\n",
684
- " if sampled_token == \"[end]\":\n",
685
- " break\n",
686
- " return decoded_sentence\n",
687
- "\n",
688
- "test_eng_texts = [pair[0] for pair in test_pairs]\n",
689
- "for _ in range(5):\n",
690
- " input_sentence = random.choice(test_eng_texts)\n",
691
- " input_sentence = input_sentence.lower()\n",
692
- " input_sentence = input_sentence.translate(str.maketrans('', '', strip_chars))\n",
693
- " translated = decode_sentence(input_sentence)\n",
694
- " print(f\"input: {input_sentence}\")\n",
695
- " print(f\"translated: {translated}\")\n",
696
- " print()"
697
- ]
698
- },
699
- {
700
- "cell_type": "code",
701
- "execution_count": 2,
702
- "metadata": {},
703
- "outputs": [
704
- {
705
- "name": "stdout",
706
- "output_type": "stream",
707
- "text": [
708
- "Requirement already satisfied: tensorflow in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (2.10.1)Note: you may need to restart the kernel to use updated packages.\n",
709
- "\n",
710
- "Requirement already satisfied: absl-py>=1.0.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (2.1.0)\n",
711
- "Requirement already satisfied: astunparse>=1.6.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (1.6.3)\n",
712
- "Requirement already satisfied: flatbuffers>=2.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (24.3.25)\n",
713
- "Requirement already satisfied: gast<=0.4.0,>=0.2.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (0.4.0)\n",
714
- "Requirement already satisfied: google-pasta>=0.1.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (0.2.0)\n",
715
- "Requirement already satisfied: h5py>=2.9.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (3.11.0)\n",
716
- "Requirement already satisfied: keras-preprocessing>=1.1.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (1.1.2)\n",
717
- "Requirement already satisfied: libclang>=13.0.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (18.1.1)\n",
718
- "Requirement already satisfied: numpy>=1.20 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (1.26.4)\n",
719
- "Requirement already satisfied: opt-einsum>=2.3.2 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (3.3.0)\n",
720
- "Requirement already satisfied: packaging in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (24.0)\n",
721
- "Requirement already satisfied: protobuf<3.20,>=3.9.2 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (3.19.6)\n",
722
- "Requirement already satisfied: setuptools in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (69.5.1)\n",
723
- "Requirement already satisfied: six>=1.12.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (1.16.0)\n",
724
- "Requirement already satisfied: termcolor>=1.1.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (2.4.0)\n",
725
- "Requirement already satisfied: typing-extensions>=3.6.6 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (4.12.2)\n",
726
- "Requirement already satisfied: wrapt>=1.11.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (1.16.0)\n",
727
- "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (0.31.0)\n",
728
- "Requirement already satisfied: grpcio<2.0,>=1.24.3 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (1.64.1)\n",
729
- "Requirement already satisfied: tensorboard<2.11,>=2.10 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (2.10.1)\n",
730
- "Requirement already satisfied: tensorflow-estimator<2.11,>=2.10.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (2.10.0)\n",
731
- "Requirement already satisfied: keras<2.11,>=2.10.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorflow) (2.10.0)\n",
732
- "Requirement already satisfied: wheel<1.0,>=0.23.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from astunparse>=1.6.0->tensorflow) (0.43.0)\n",
733
- "Requirement already satisfied: google-auth<3,>=1.6.3 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (2.30.0)\n",
734
- "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (0.4.6)\n",
735
- "Requirement already satisfied: markdown>=2.6.8 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (3.6)\n",
736
- "Requirement already satisfied: requests<3,>=2.21.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (2.32.3)\n",
737
- "Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (0.6.1)\n",
738
- "Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (1.8.1)\n",
739
- "Requirement already satisfied: werkzeug>=1.0.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from tensorboard<2.11,>=2.10->tensorflow) (3.0.3)\n",
740
- "Requirement already satisfied: cachetools<6.0,>=2.0.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.11,>=2.10->tensorflow) (5.3.3)\n",
741
- "Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.11,>=2.10->tensorflow) (0.4.0)\n",
742
- "Requirement already satisfied: rsa<5,>=3.1.4 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from google-auth<3,>=1.6.3->tensorboard<2.11,>=2.10->tensorflow) (4.9)\n",
743
- "Requirement already satisfied: requests-oauthlib>=0.7.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.11,>=2.10->tensorflow) (2.0.0)\n",
744
- "Requirement already satisfied: importlib-metadata>=4.4 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from markdown>=2.6.8->tensorboard<2.11,>=2.10->tensorflow) (7.1.0)\n",
745
- "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.11,>=2.10->tensorflow) (3.3.2)\n",
746
- "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.11,>=2.10->tensorflow) (3.7)\n",
747
- "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.11,>=2.10->tensorflow) (2.2.1)\n",
748
- "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from requests<3,>=2.21.0->tensorboard<2.11,>=2.10->tensorflow) (2024.6.2)\n",
749
- "Requirement already satisfied: MarkupSafe>=2.1.1 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from werkzeug>=1.0.1->tensorboard<2.11,>=2.10->tensorflow) (2.1.5)\n",
750
- "Requirement already satisfied: zipp>=0.5 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.11,>=2.10->tensorflow) (3.19.2)\n",
751
- "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.11,>=2.10->tensorflow) (0.6.0)\n",
752
- "Requirement already satisfied: oauthlib>=3.0.0 in c:\\users\\prajw\\anaconda3\\envs\\testnullclass\\lib\\site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.11,>=2.10->tensorflow) (3.2.2)\n"
753
- ]
754
- }
755
- ],
756
- "source": [
757
- "pip install tensorflow\n"
758
- ]
759
- }
760
- ],
761
- "metadata": {
762
- "kernelspec": {
763
- "display_name": "base",
764
- "language": "python",
765
- "name": "python3"
766
- },
767
- "language_info": {
768
- "codemirror_mode": {
769
- "name": "ipython",
770
- "version": 3
771
- },
772
- "file_extension": ".py",
773
- "mimetype": "text/x-python",
774
- "name": "python",
775
- "nbconvert_exporter": "python",
776
- "pygments_lexer": "ipython3",
777
- "version": "3.9.19"
778
- }
779
- },
780
- "nbformat": 4,
781
- "nbformat_minor": 2
782
- }
 
Task 2/data/english DELETED
The diff for this file is too large to render. See raw diff
 
Task 2/data/french DELETED
The diff for this file is too large to render. See raw diff
 
Task 2/eng_vectorization_config.json DELETED
@@ -1 +0,0 @@
1
- {"name": "text_vectorization", "trainable": true, "batch_input_shape": [null], "dtype": "string", "max_tokens": 15000, "split": "whitespace", "ngrams": null, "output_mode": "int", "output_sequence_length": 20, "pad_to_max_tokens": false, "sparse": false, "ragged": false, "vocabulary": null, "idf_weights": null}
 
 
Task 2/eng_vocab.json DELETED
The diff for this file is too large to render. See raw diff
 
Task 2/english_to_french_model/keras_metadata.pb DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:18727f19a8f2e298e734019630e718acf71e07ef341ce0398a88208498c47dfd
3
- size 21276
 
Task 2/english_to_french_model/saved_model.pb DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:0781d40fb91f9326102fd3c6574611bfc035bff26a81a4d69cedf1ef0b0aa1e5
3
- size 2021585
 
Task 2/english_to_french_model/variables/variables.data-00000-of-00001 DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:a3c3bfe842a940826fd00c874b3f94b44abdf96953f8d695a1caefbfc1ad9608
3
- size 20633894
 
Task 2/english_to_french_model/variables/variables.index DELETED
Binary file (2.91 kB)
 
Task 2/english_tokenizer.json DELETED
@@ -1 +0,0 @@
1
- "{\"class_name\": \"Tokenizer\", \"config\": {\"num_words\": null, \"filters\": \"!\\\"#$%&()*+,-./:;<=>?@[\\\\]^_`{|}~\\t\\n\", \"lower\": true, \"split\": \" \", \"char_level\": false, \"oov_token\": null, \"document_count\": 137861, \"word_counts\": \"{\\\"new\\\": 12197, \\\"jersey\\\": 11225, \\\"is\\\": 205858, \\\"sometimes\\\": 37746, \\\"quiet\\\": 8693, \\\"during\\\": 74933, \\\"autumn\\\": 9004, \\\"and\\\": 59850, \\\"it\\\": 75137, \\\"snowy\\\": 8898, \\\"in\\\": 75525, \\\"april\\\": 8954, \\\"the\\\": 67628, \\\"united\\\": 11270, \\\"states\\\": 11270, \\\"usually\\\": 37507, \\\"chilly\\\": 8770, \\\"july\\\": 8956, \\\"freezing\\\": 8928, \\\"november\\\": 8951, \\\"california\\\": 11250, \\\"march\\\": 9023, \\\"hot\\\": 8639, \\\"june\\\": 9133, \\\"mild\\\": 8743, \\\"cold\\\": 8878, \\\"september\\\": 8958, \\\"your\\\": 9734, \\\"least\\\": 27564, \\\"liked\\\": 14046, \\\"fruit\\\": 27192, \\\"grape\\\": 4848, \\\"but\\\": 63987, \\\"my\\\": 9700, \\\"apple\\\": 4848, \\\"his\\\": 9700, \\\"favorite\\\": 28332, \\\"orange\\\": 4848, \\\"paris\\\": 11334, \\\"relaxing\\\": 8696, \\\"december\\\": 8945, \\\"busy\\\": 8791, \\\"spring\\\": 9102, \\\"never\\\": 37500, \\\"our\\\": 8932, \\\"lemon\\\": 4848, \\\"january\\\": 9090, \\\"warm\\\": 8890, \\\"lime\\\": 4848, \\\"her\\\": 9700, \\\"banana\\\": 4848, \\\"he\\\": 10786, \\\"saw\\\": 648, \\\"a\\\": 1944, \\\"old\\\": 972, \\\"yellow\\\": 972, \\\"truck\\\": 1944, \\\"india\\\": 11277, \\\"rainy\\\": 8761, \\\"that\\\": 2712, \\\"cat\\\": 192, \\\"was\\\": 1867, \\\"most\\\": 14934, \\\"loved\\\": 14166, \\\"animal\\\": 2304, \\\"dislikes\\\": 7314, \\\"grapefruit\\\": 10692, \\\"limes\\\": 5844, \\\"lemons\\\": 5844, \\\"february\\\": 8942, \\\"china\\\": 10953, \\\"pleasant\\\": 8916, \\\"october\\\": 8910, \\\"wonderful\\\": 8808, \\\"nice\\\": 8984, \\\"summer\\\": 8948, \\\"france\\\": 11170, \\\"may\\\": 8995, \\\"grapes\\\": 5844, \\\"mangoes\\\": 5844, \\\"their\\\": 8932, \\\"mango\\\": 4848, \\\"pear\\\": 4848, \\\"august\\\": 8789, \\\"beautiful\\\": 8915, \\\"apples\\\": 5844, \\\"peaches\\\": 5844, \\\"feared\\\": 768, \\\"shark\\\": 192, \\\"wet\\\": 8726, \\\"dry\\\": 8794, \\\"we\\\": 2532, \\\"like\\\": 4588, \\\"oranges\\\": 5844, \\\"they\\\": 3222, \\\"pears\\\": 5844, \\\"she\\\": 10786, \\\"little\\\": 1016, \\\"red\\\": 972, \\\"winter\\\": 9038, \\\"disliked\\\": 648, \\\"rusty\\\": 972, \\\"car\\\": 1944, \\\"strawberries\\\": 5844, \\\"i\\\": 2664, \\\"strawberry\\\": 4848, \\\"bananas\\\": 5844, \\\"going\\\": 666, \\\"to\\\": 5166, \\\"next\\\": 1666, \\\"plan\\\": 714, \\\"visit\\\": 1224, \\\"elephants\\\": 64, \\\"were\\\": 384, \\\"animals\\\": 768, \\\"are\\\": 870, \\\"likes\\\": 7314, \\\"dislike\\\": 4444, \\\"fall\\\": 9134, \\\"driving\\\": 1296, \\\"peach\\\": 4848, \\\"drives\\\": 648, \\\"blue\\\": 972, \\\"you\\\": 2414, \\\"bird\\\": 192, \\\"horses\\\": 64, \\\"mouse\\\": 192, \\\"went\\\": 378, \\\"last\\\": 781, \\\"horse\\\": 192, \\\"automobile\\\": 1944, \\\"dogs\\\": 64, \\\"white\\\": 972, \\\"elephant\\\": 192, \\\"black\\\": 972, \\\"think\\\": 240, \\\"difficult\\\": 260, \\\"translate\\\": 480, \\\"between\\\": 540, \\\"spanish\\\": 312, \\\"portuguese\\\": 312, \\\"big\\\": 1016, \\\"green\\\": 972, \\\"translating\\\": 300, \\\"fun\\\": 260, \\\"where\\\": 12, \\\"dog\\\": 192, \\\"why\\\": 240, \\\"might\\\": 378, \\\"go\\\": 1386, \\\"this\\\": 768, \\\"drove\\\": 648, \\\"shiny\\\": 972, \\\"sharks\\\": 64, \\\"monkey\\\": 192, \\\"how\\\": 67, 
\\\"weather\\\": 33, \\\"lion\\\": 192, \\\"plans\\\": 476, \\\"bear\\\": 192, \\\"rabbit\\\": 192, \\\"it's\\\": 240, \\\"chinese\\\": 312, \\\"when\\\": 144, \\\"eiffel\\\": 57, \\\"tower\\\": 57, \\\"did\\\": 204, \\\"grocery\\\": 57, \\\"store\\\": 57, \\\"wanted\\\": 378, \\\"does\\\": 24, \\\"football\\\": 57, \\\"field\\\": 57, \\\"wants\\\": 252, \\\"didn't\\\": 60, \\\"snake\\\": 192, \\\"snakes\\\": 64, \\\"do\\\": 84, \\\"easy\\\": 260, \\\"thinks\\\": 360, \\\"english\\\": 312, \\\"french\\\": 312, \\\"would\\\": 48, \\\"aren't\\\": 36, \\\"cats\\\": 64, \\\"rabbits\\\": 64, \\\"has\\\": 24, \\\"been\\\": 36, \\\"monkeys\\\": 64, \\\"lake\\\": 57, \\\"bears\\\": 64, \\\"school\\\": 57, \\\"birds\\\": 64, \\\"want\\\": 126, \\\"isn't\\\": 24, \\\"lions\\\": 64, \\\"am\\\": 24, \\\"mice\\\": 64, \\\"have\\\": 12}\", \"word_docs\": \"{\\\"april\\\": 8954, \\\"sometimes\\\": 32946, \\\"new\\\": 12197, \\\"autumn\\\": 9004, \\\"it\\\": 75137, \\\"snowy\\\": 8898, \\\"and\\\": 59850, \\\"in\\\": 75525, \\\"quiet\\\": 8693, \\\"is\\\": 104561, \\\"jersey\\\": 11225, \\\"during\\\": 74933, \\\"united\\\": 11270, \\\"states\\\": 11270, \\\"chilly\\\": 8770, \\\"freezing\\\": 8928, \\\"usually\\\": 32846, \\\"november\\\": 8951, \\\"the\\\": 41221, \\\"july\\\": 8956, \\\"march\\\": 9023, \\\"june\\\": 9133, \\\"california\\\": 11250, \\\"hot\\\": 8639, \\\"september\\\": 8958, \\\"mild\\\": 8743, \\\"cold\\\": 8878, \\\"liked\\\": 7446, \\\"grape\\\": 4848, \\\"but\\\": 63987, \\\"least\\\": 14364, \\\"apple\\\": 4848, \\\"my\\\": 9700, \\\"your\\\": 9734, \\\"fruit\\\": 27192, \\\"his\\\": 9700, \\\"favorite\\\": 15132, \\\"orange\\\": 4848, \\\"december\\\": 8945, \\\"relaxing\\\": 8696, \\\"paris\\\": 11334, \\\"spring\\\": 9102, \\\"never\\\": 32834, \\\"busy\\\": 8791, \\\"our\\\": 8932, \\\"lemon\\\": 4848, \\\"warm\\\": 8890, \\\"january\\\": 9090, \\\"lime\\\": 4848, \\\"her\\\": 9700, \\\"banana\\\": 4848, \\\"old\\\": 972, \\\"truck\\\": 1944, \\\"a\\\": 1944, \\\"saw\\\": 648, \\\"he\\\": 10786, \\\"yellow\\\": 972, \\\"india\\\": 11277, \\\"rainy\\\": 8761, \\\"most\\\": 8334, \\\"animal\\\": 2304, \\\"was\\\": 1867, \\\"cat\\\": 192, \\\"that\\\": 2712, \\\"loved\\\": 7566, \\\"grapefruit\\\": 10692, \\\"limes\\\": 5844, \\\"dislikes\\\": 7314, \\\"lemons\\\": 5844, \\\"february\\\": 8942, \\\"pleasant\\\": 8916, \\\"october\\\": 8910, \\\"china\\\": 10953, \\\"wonderful\\\": 8808, \\\"nice\\\": 8984, \\\"summer\\\": 8948, \\\"france\\\": 11170, \\\"may\\\": 8995, \\\"mangoes\\\": 5844, \\\"grapes\\\": 5844, \\\"mango\\\": 4848, \\\"pear\\\": 4848, \\\"their\\\": 8932, \\\"august\\\": 8789, \\\"beautiful\\\": 8915, \\\"apples\\\": 5844, \\\"peaches\\\": 5844, \\\"feared\\\": 768, \\\"shark\\\": 192, \\\"dry\\\": 8794, \\\"wet\\\": 8726, \\\"we\\\": 2532, \\\"like\\\": 4588, \\\"oranges\\\": 5844, \\\"they\\\": 3222, \\\"pears\\\": 5844, \\\"red\\\": 972, \\\"little\\\": 1016, \\\"she\\\": 10786, \\\"winter\\\": 9038, \\\"rusty\\\": 972, \\\"disliked\\\": 648, \\\"car\\\": 1944, \\\"strawberries\\\": 5844, \\\"i\\\": 2664, \\\"strawberry\\\": 4848, \\\"bananas\\\": 5844, \\\"to\\\": 4170, \\\"going\\\": 666, \\\"next\\\": 1666, \\\"visit\\\": 1224, \\\"plan\\\": 714, \\\"animals\\\": 768, \\\"were\\\": 384, \\\"elephants\\\": 64, \\\"are\\\": 870, \\\"likes\\\": 7314, \\\"dislike\\\": 4444, \\\"fall\\\": 9134, \\\"driving\\\": 1296, \\\"peach\\\": 4848, \\\"drives\\\": 648, \\\"blue\\\": 972, \\\"you\\\": 2414, \\\"bird\\\": 192, \\\"horses\\\": 64, \\\"mouse\\\": 192, 
\\\"went\\\": 378, \\\"last\\\": 781, \\\"horse\\\": 192, \\\"automobile\\\": 1944, \\\"dogs\\\": 64, \\\"white\\\": 972, \\\"elephant\\\": 192, \\\"black\\\": 972, \\\"translate\\\": 480, \\\"think\\\": 240, \\\"portuguese\\\": 312, \\\"difficult\\\": 260, \\\"spanish\\\": 312, \\\"between\\\": 540, \\\"big\\\": 1016, \\\"green\\\": 972, \\\"fun\\\": 260, \\\"translating\\\": 300, \\\"where\\\": 12, \\\"dog\\\": 192, \\\"why\\\": 240, \\\"go\\\": 1386, \\\"might\\\": 378, \\\"this\\\": 768, \\\"drove\\\": 648, \\\"shiny\\\": 972, \\\"sharks\\\": 64, \\\"monkey\\\": 192, \\\"weather\\\": 33, \\\"how\\\": 67, \\\"lion\\\": 192, \\\"plans\\\": 476, \\\"bear\\\": 192, \\\"rabbit\\\": 192, \\\"chinese\\\": 312, \\\"it's\\\": 240, \\\"tower\\\": 57, \\\"when\\\": 144, \\\"eiffel\\\": 57, \\\"grocery\\\": 57, \\\"did\\\": 204, \\\"store\\\": 57, \\\"wanted\\\": 378, \\\"football\\\": 57, \\\"field\\\": 57, \\\"does\\\": 24, \\\"wants\\\": 252, \\\"didn't\\\": 60, \\\"snake\\\": 192, \\\"snakes\\\": 64, \\\"easy\\\": 260, \\\"do\\\": 84, \\\"french\\\": 312, \\\"thinks\\\": 360, \\\"english\\\": 312, \\\"would\\\": 48, \\\"aren't\\\": 36, \\\"cats\\\": 64, \\\"rabbits\\\": 64, \\\"has\\\": 24, \\\"been\\\": 36, \\\"monkeys\\\": 64, \\\"lake\\\": 57, \\\"bears\\\": 64, \\\"school\\\": 57, \\\"birds\\\": 64, \\\"want\\\": 126, \\\"isn't\\\": 24, \\\"lions\\\": 64, \\\"am\\\": 24, \\\"mice\\\": 64, \\\"have\\\": 12}\", \"index_docs\": \"{\\\"44\\\": 8954, \\\"8\\\": 32946, \\\"17\\\": 12197, \\\"39\\\": 9004, \\\"3\\\": 75137, \\\"55\\\": 8898, \\\"7\\\": 59850, \\\"2\\\": 75525, \\\"67\\\": 8693, \\\"1\\\": 104561, \\\"23\\\": 11225, \\\"4\\\": 74933, \\\"20\\\": 11270, \\\"21\\\": 11270, \\\"62\\\": 8770, \\\"51\\\": 8928, \\\"9\\\": 32846, \\\"45\\\": 8951, \\\"5\\\": 41221, \\\"43\\\": 8956, \\\"38\\\": 9023, \\\"34\\\": 9133, \\\"22\\\": 11250, \\\"68\\\": 8639, \\\"42\\\": 8958, \\\"64\\\": 8743, \\\"57\\\": 8878, \\\"16\\\": 7446, \\\"82\\\": 4848, \\\"6\\\": 63987, \\\"12\\\": 14364, \\\"83\\\": 4848, \\\"30\\\": 9700, \\\"29\\\": 9734, \\\"13\\\": 27192, \\\"31\\\": 9700, \\\"11\\\": 15132, \\\"84\\\": 4848, \\\"47\\\": 8945, \\\"66\\\": 8696, \\\"18\\\": 11334, \\\"35\\\": 9102, \\\"10\\\": 32834, \\\"60\\\": 8791, \\\"49\\\": 8932, \\\"85\\\": 4848, \\\"56\\\": 8890, \\\"36\\\": 9090, \\\"86\\\": 4848, \\\"32\\\": 9700, \\\"87\\\": 4848, \\\"111\\\": 972, \\\"101\\\": 1944, \\\"100\\\": 1944, \\\"127\\\": 648, \\\"26\\\": 10786, \\\"112\\\": 972, \\\"19\\\": 11277, \\\"63\\\": 8761, \\\"14\\\": 8334, \\\"99\\\": 2304, \\\"104\\\": 1867, \\\"153\\\": 192, \\\"95\\\": 2712, \\\"15\\\": 7566, \\\"28\\\": 10692, \\\"71\\\": 5844, \\\"69\\\": 7314, \\\"72\\\": 5844, \\\"48\\\": 8942, \\\"52\\\": 8916, \\\"54\\\": 8910, \\\"25\\\": 10953, \\\"58\\\": 8808, \\\"41\\\": 8984, \\\"46\\\": 8948, \\\"24\\\": 11170, \\\"40\\\": 8995, \\\"74\\\": 5844, \\\"73\\\": 5844, \\\"88\\\": 4848, \\\"89\\\": 4848, \\\"50\\\": 8932, \\\"61\\\": 8789, \\\"53\\\": 8915, \\\"75\\\": 5844, \\\"76\\\": 5844, \\\"122\\\": 768, \\\"154\\\": 192, \\\"59\\\": 8794, \\\"65\\\": 8726, \\\"97\\\": 2532, \\\"92\\\": 4588, \\\"77\\\": 5844, \\\"94\\\": 3222, \\\"78\\\": 5844, \\\"113\\\": 972, \\\"109\\\": 1016, \\\"27\\\": 10786, \\\"37\\\": 9038, \\\"114\\\": 972, \\\"128\\\": 648, \\\"102\\\": 1944, \\\"79\\\": 5844, \\\"96\\\": 2664, \\\"90\\\": 4848, \\\"80\\\": 5844, \\\"81\\\": 4170, \\\"126\\\": 666, \\\"105\\\": 1666, \\\"108\\\": 1224, \\\"125\\\": 714, \\\"123\\\": 768, \\\"134\\\": 384, \\\"169\\\": 64, 
\\\"120\\\": 870, \\\"70\\\": 7314, \\\"93\\\": 4444, \\\"33\\\": 9134, \\\"107\\\": 1296, \\\"91\\\": 4848, \\\"129\\\": 648, \\\"115\\\": 972, \\\"98\\\": 2414, \\\"155\\\": 192, \\\"170\\\": 64, \\\"156\\\": 192, \\\"135\\\": 378, \\\"121\\\": 781, \\\"157\\\": 192, \\\"103\\\": 1944, \\\"171\\\": 64, \\\"116\\\": 972, \\\"158\\\": 192, \\\"117\\\": 972, \\\"132\\\": 480, \\\"149\\\": 240, \\\"140\\\": 312, \\\"145\\\": 260, \\\"139\\\": 312, \\\"131\\\": 540, \\\"110\\\": 1016, \\\"118\\\": 972, \\\"146\\\": 260, \\\"144\\\": 300, \\\"198\\\": 12, \\\"159\\\": 192, \\\"150\\\": 240, \\\"106\\\": 1386, \\\"136\\\": 378, \\\"124\\\": 768, \\\"130\\\": 648, \\\"119\\\": 972, \\\"172\\\": 64, \\\"160\\\": 192, \\\"193\\\": 33, \\\"168\\\": 67, \\\"161\\\": 192, \\\"133\\\": 476, \\\"162\\\": 192, \\\"163\\\": 192, \\\"141\\\": 312, \\\"151\\\": 240, \\\"183\\\": 57, \\\"165\\\": 144, \\\"182\\\": 57, \\\"184\\\": 57, \\\"152\\\": 204, \\\"185\\\": 57, \\\"137\\\": 378, \\\"186\\\": 57, \\\"187\\\": 57, \\\"194\\\": 24, \\\"148\\\": 252, \\\"181\\\": 60, \\\"164\\\": 192, \\\"173\\\": 64, \\\"147\\\": 260, \\\"167\\\": 84, \\\"143\\\": 312, \\\"138\\\": 360, \\\"142\\\": 312, \\\"190\\\": 48, \\\"191\\\": 36, \\\"174\\\": 64, \\\"175\\\": 64, \\\"195\\\": 24, \\\"192\\\": 36, \\\"176\\\": 64, \\\"188\\\": 57, \\\"177\\\": 64, \\\"189\\\": 57, \\\"178\\\": 64, \\\"166\\\": 126, \\\"196\\\": 24, \\\"179\\\": 64, \\\"197\\\": 24, \\\"180\\\": 64, \\\"199\\\": 12}\", \"index_word\": \"{\\\"1\\\": \\\"is\\\", \\\"2\\\": \\\"in\\\", \\\"3\\\": \\\"it\\\", \\\"4\\\": \\\"during\\\", \\\"5\\\": \\\"the\\\", \\\"6\\\": \\\"but\\\", \\\"7\\\": \\\"and\\\", \\\"8\\\": \\\"sometimes\\\", \\\"9\\\": \\\"usually\\\", \\\"10\\\": \\\"never\\\", \\\"11\\\": \\\"favorite\\\", \\\"12\\\": \\\"least\\\", \\\"13\\\": \\\"fruit\\\", \\\"14\\\": \\\"most\\\", \\\"15\\\": \\\"loved\\\", \\\"16\\\": \\\"liked\\\", \\\"17\\\": \\\"new\\\", \\\"18\\\": \\\"paris\\\", \\\"19\\\": \\\"india\\\", \\\"20\\\": \\\"united\\\", \\\"21\\\": \\\"states\\\", \\\"22\\\": \\\"california\\\", \\\"23\\\": \\\"jersey\\\", \\\"24\\\": \\\"france\\\", \\\"25\\\": \\\"china\\\", \\\"26\\\": \\\"he\\\", \\\"27\\\": \\\"she\\\", \\\"28\\\": \\\"grapefruit\\\", \\\"29\\\": \\\"your\\\", \\\"30\\\": \\\"my\\\", \\\"31\\\": \\\"his\\\", \\\"32\\\": \\\"her\\\", \\\"33\\\": \\\"fall\\\", \\\"34\\\": \\\"june\\\", \\\"35\\\": \\\"spring\\\", \\\"36\\\": \\\"january\\\", \\\"37\\\": \\\"winter\\\", \\\"38\\\": \\\"march\\\", \\\"39\\\": \\\"autumn\\\", \\\"40\\\": \\\"may\\\", \\\"41\\\": \\\"nice\\\", \\\"42\\\": \\\"september\\\", \\\"43\\\": \\\"july\\\", \\\"44\\\": \\\"april\\\", \\\"45\\\": \\\"november\\\", \\\"46\\\": \\\"summer\\\", \\\"47\\\": \\\"december\\\", \\\"48\\\": \\\"february\\\", \\\"49\\\": \\\"our\\\", \\\"50\\\": \\\"their\\\", \\\"51\\\": \\\"freezing\\\", \\\"52\\\": \\\"pleasant\\\", \\\"53\\\": \\\"beautiful\\\", \\\"54\\\": \\\"october\\\", \\\"55\\\": \\\"snowy\\\", \\\"56\\\": \\\"warm\\\", \\\"57\\\": \\\"cold\\\", \\\"58\\\": \\\"wonderful\\\", \\\"59\\\": \\\"dry\\\", \\\"60\\\": \\\"busy\\\", \\\"61\\\": \\\"august\\\", \\\"62\\\": \\\"chilly\\\", \\\"63\\\": \\\"rainy\\\", \\\"64\\\": \\\"mild\\\", \\\"65\\\": \\\"wet\\\", \\\"66\\\": \\\"relaxing\\\", \\\"67\\\": \\\"quiet\\\", \\\"68\\\": \\\"hot\\\", \\\"69\\\": \\\"dislikes\\\", \\\"70\\\": \\\"likes\\\", \\\"71\\\": \\\"limes\\\", \\\"72\\\": \\\"lemons\\\", \\\"73\\\": \\\"grapes\\\", \\\"74\\\": \\\"mangoes\\\", \\\"75\\\": \\\"apples\\\", 
\\\"76\\\": \\\"peaches\\\", \\\"77\\\": \\\"oranges\\\", \\\"78\\\": \\\"pears\\\", \\\"79\\\": \\\"strawberries\\\", \\\"80\\\": \\\"bananas\\\", \\\"81\\\": \\\"to\\\", \\\"82\\\": \\\"grape\\\", \\\"83\\\": \\\"apple\\\", \\\"84\\\": \\\"orange\\\", \\\"85\\\": \\\"lemon\\\", \\\"86\\\": \\\"lime\\\", \\\"87\\\": \\\"banana\\\", \\\"88\\\": \\\"mango\\\", \\\"89\\\": \\\"pear\\\", \\\"90\\\": \\\"strawberry\\\", \\\"91\\\": \\\"peach\\\", \\\"92\\\": \\\"like\\\", \\\"93\\\": \\\"dislike\\\", \\\"94\\\": \\\"they\\\", \\\"95\\\": \\\"that\\\", \\\"96\\\": \\\"i\\\", \\\"97\\\": \\\"we\\\", \\\"98\\\": \\\"you\\\", \\\"99\\\": \\\"animal\\\", \\\"100\\\": \\\"a\\\", \\\"101\\\": \\\"truck\\\", \\\"102\\\": \\\"car\\\", \\\"103\\\": \\\"automobile\\\", \\\"104\\\": \\\"was\\\", \\\"105\\\": \\\"next\\\", \\\"106\\\": \\\"go\\\", \\\"107\\\": \\\"driving\\\", \\\"108\\\": \\\"visit\\\", \\\"109\\\": \\\"little\\\", \\\"110\\\": \\\"big\\\", \\\"111\\\": \\\"old\\\", \\\"112\\\": \\\"yellow\\\", \\\"113\\\": \\\"red\\\", \\\"114\\\": \\\"rusty\\\", \\\"115\\\": \\\"blue\\\", \\\"116\\\": \\\"white\\\", \\\"117\\\": \\\"black\\\", \\\"118\\\": \\\"green\\\", \\\"119\\\": \\\"shiny\\\", \\\"120\\\": \\\"are\\\", \\\"121\\\": \\\"last\\\", \\\"122\\\": \\\"feared\\\", \\\"123\\\": \\\"animals\\\", \\\"124\\\": \\\"this\\\", \\\"125\\\": \\\"plan\\\", \\\"126\\\": \\\"going\\\", \\\"127\\\": \\\"saw\\\", \\\"128\\\": \\\"disliked\\\", \\\"129\\\": \\\"drives\\\", \\\"130\\\": \\\"drove\\\", \\\"131\\\": \\\"between\\\", \\\"132\\\": \\\"translate\\\", \\\"133\\\": \\\"plans\\\", \\\"134\\\": \\\"were\\\", \\\"135\\\": \\\"went\\\", \\\"136\\\": \\\"might\\\", \\\"137\\\": \\\"wanted\\\", \\\"138\\\": \\\"thinks\\\", \\\"139\\\": \\\"spanish\\\", \\\"140\\\": \\\"portuguese\\\", \\\"141\\\": \\\"chinese\\\", \\\"142\\\": \\\"english\\\", \\\"143\\\": \\\"french\\\", \\\"144\\\": \\\"translating\\\", \\\"145\\\": \\\"difficult\\\", \\\"146\\\": \\\"fun\\\", \\\"147\\\": \\\"easy\\\", \\\"148\\\": \\\"wants\\\", \\\"149\\\": \\\"think\\\", \\\"150\\\": \\\"why\\\", \\\"151\\\": \\\"it's\\\", \\\"152\\\": \\\"did\\\", \\\"153\\\": \\\"cat\\\", \\\"154\\\": \\\"shark\\\", \\\"155\\\": \\\"bird\\\", \\\"156\\\": \\\"mouse\\\", \\\"157\\\": \\\"horse\\\", \\\"158\\\": \\\"elephant\\\", \\\"159\\\": \\\"dog\\\", \\\"160\\\": \\\"monkey\\\", \\\"161\\\": \\\"lion\\\", \\\"162\\\": \\\"bear\\\", \\\"163\\\": \\\"rabbit\\\", \\\"164\\\": \\\"snake\\\", \\\"165\\\": \\\"when\\\", \\\"166\\\": \\\"want\\\", \\\"167\\\": \\\"do\\\", \\\"168\\\": \\\"how\\\", \\\"169\\\": \\\"elephants\\\", \\\"170\\\": \\\"horses\\\", \\\"171\\\": \\\"dogs\\\", \\\"172\\\": \\\"sharks\\\", \\\"173\\\": \\\"snakes\\\", \\\"174\\\": \\\"cats\\\", \\\"175\\\": \\\"rabbits\\\", \\\"176\\\": \\\"monkeys\\\", \\\"177\\\": \\\"bears\\\", \\\"178\\\": \\\"birds\\\", \\\"179\\\": \\\"lions\\\", \\\"180\\\": \\\"mice\\\", \\\"181\\\": \\\"didn't\\\", \\\"182\\\": \\\"eiffel\\\", \\\"183\\\": \\\"tower\\\", \\\"184\\\": \\\"grocery\\\", \\\"185\\\": \\\"store\\\", \\\"186\\\": \\\"football\\\", \\\"187\\\": \\\"field\\\", \\\"188\\\": \\\"lake\\\", \\\"189\\\": \\\"school\\\", \\\"190\\\": \\\"would\\\", \\\"191\\\": \\\"aren't\\\", \\\"192\\\": \\\"been\\\", \\\"193\\\": \\\"weather\\\", \\\"194\\\": \\\"does\\\", \\\"195\\\": \\\"has\\\", \\\"196\\\": \\\"isn't\\\", \\\"197\\\": \\\"am\\\", \\\"198\\\": \\\"where\\\", \\\"199\\\": \\\"have\\\"}\", \"word_index\": \"{\\\"is\\\": 1, \\\"in\\\": 2, \\\"it\\\": 3, \\\"during\\\": 4, 
\\\"the\\\": 5, \\\"but\\\": 6, \\\"and\\\": 7, \\\"sometimes\\\": 8, \\\"usually\\\": 9, \\\"never\\\": 10, \\\"favorite\\\": 11, \\\"least\\\": 12, \\\"fruit\\\": 13, \\\"most\\\": 14, \\\"loved\\\": 15, \\\"liked\\\": 16, \\\"new\\\": 17, \\\"paris\\\": 18, \\\"india\\\": 19, \\\"united\\\": 20, \\\"states\\\": 21, \\\"california\\\": 22, \\\"jersey\\\": 23, \\\"france\\\": 24, \\\"china\\\": 25, \\\"he\\\": 26, \\\"she\\\": 27, \\\"grapefruit\\\": 28, \\\"your\\\": 29, \\\"my\\\": 30, \\\"his\\\": 31, \\\"her\\\": 32, \\\"fall\\\": 33, \\\"june\\\": 34, \\\"spring\\\": 35, \\\"january\\\": 36, \\\"winter\\\": 37, \\\"march\\\": 38, \\\"autumn\\\": 39, \\\"may\\\": 40, \\\"nice\\\": 41, \\\"september\\\": 42, \\\"july\\\": 43, \\\"april\\\": 44, \\\"november\\\": 45, \\\"summer\\\": 46, \\\"december\\\": 47, \\\"february\\\": 48, \\\"our\\\": 49, \\\"their\\\": 50, \\\"freezing\\\": 51, \\\"pleasant\\\": 52, \\\"beautiful\\\": 53, \\\"october\\\": 54, \\\"snowy\\\": 55, \\\"warm\\\": 56, \\\"cold\\\": 57, \\\"wonderful\\\": 58, \\\"dry\\\": 59, \\\"busy\\\": 60, \\\"august\\\": 61, \\\"chilly\\\": 62, \\\"rainy\\\": 63, \\\"mild\\\": 64, \\\"wet\\\": 65, \\\"relaxing\\\": 66, \\\"quiet\\\": 67, \\\"hot\\\": 68, \\\"dislikes\\\": 69, \\\"likes\\\": 70, \\\"limes\\\": 71, \\\"lemons\\\": 72, \\\"grapes\\\": 73, \\\"mangoes\\\": 74, \\\"apples\\\": 75, \\\"peaches\\\": 76, \\\"oranges\\\": 77, \\\"pears\\\": 78, \\\"strawberries\\\": 79, \\\"bananas\\\": 80, \\\"to\\\": 81, \\\"grape\\\": 82, \\\"apple\\\": 83, \\\"orange\\\": 84, \\\"lemon\\\": 85, \\\"lime\\\": 86, \\\"banana\\\": 87, \\\"mango\\\": 88, \\\"pear\\\": 89, \\\"strawberry\\\": 90, \\\"peach\\\": 91, \\\"like\\\": 92, \\\"dislike\\\": 93, \\\"they\\\": 94, \\\"that\\\": 95, \\\"i\\\": 96, \\\"we\\\": 97, \\\"you\\\": 98, \\\"animal\\\": 99, \\\"a\\\": 100, \\\"truck\\\": 101, \\\"car\\\": 102, \\\"automobile\\\": 103, \\\"was\\\": 104, \\\"next\\\": 105, \\\"go\\\": 106, \\\"driving\\\": 107, \\\"visit\\\": 108, \\\"little\\\": 109, \\\"big\\\": 110, \\\"old\\\": 111, \\\"yellow\\\": 112, \\\"red\\\": 113, \\\"rusty\\\": 114, \\\"blue\\\": 115, \\\"white\\\": 116, \\\"black\\\": 117, \\\"green\\\": 118, \\\"shiny\\\": 119, \\\"are\\\": 120, \\\"last\\\": 121, \\\"feared\\\": 122, \\\"animals\\\": 123, \\\"this\\\": 124, \\\"plan\\\": 125, \\\"going\\\": 126, \\\"saw\\\": 127, \\\"disliked\\\": 128, \\\"drives\\\": 129, \\\"drove\\\": 130, \\\"between\\\": 131, \\\"translate\\\": 132, \\\"plans\\\": 133, \\\"were\\\": 134, \\\"went\\\": 135, \\\"might\\\": 136, \\\"wanted\\\": 137, \\\"thinks\\\": 138, \\\"spanish\\\": 139, \\\"portuguese\\\": 140, \\\"chinese\\\": 141, \\\"english\\\": 142, \\\"french\\\": 143, \\\"translating\\\": 144, \\\"difficult\\\": 145, \\\"fun\\\": 146, \\\"easy\\\": 147, \\\"wants\\\": 148, \\\"think\\\": 149, \\\"why\\\": 150, \\\"it's\\\": 151, \\\"did\\\": 152, \\\"cat\\\": 153, \\\"shark\\\": 154, \\\"bird\\\": 155, \\\"mouse\\\": 156, \\\"horse\\\": 157, \\\"elephant\\\": 158, \\\"dog\\\": 159, \\\"monkey\\\": 160, \\\"lion\\\": 161, \\\"bear\\\": 162, \\\"rabbit\\\": 163, \\\"snake\\\": 164, \\\"when\\\": 165, \\\"want\\\": 166, \\\"do\\\": 167, \\\"how\\\": 168, \\\"elephants\\\": 169, \\\"horses\\\": 170, \\\"dogs\\\": 171, \\\"sharks\\\": 172, \\\"snakes\\\": 173, \\\"cats\\\": 174, \\\"rabbits\\\": 175, \\\"monkeys\\\": 176, \\\"bears\\\": 177, \\\"birds\\\": 178, \\\"lions\\\": 179, \\\"mice\\\": 180, \\\"didn't\\\": 181, \\\"eiffel\\\": 182, \\\"tower\\\": 183, \\\"grocery\\\": 184, 
\\\"store\\\": 185, \\\"football\\\": 186, \\\"field\\\": 187, \\\"lake\\\": 188, \\\"school\\\": 189, \\\"would\\\": 190, \\\"aren't\\\": 191, \\\"been\\\": 192, \\\"weather\\\": 193, \\\"does\\\": 194, \\\"has\\\": 195, \\\"isn't\\\": 196, \\\"am\\\": 197, \\\"where\\\": 198, \\\"have\\\": 199}\"}}"
 
 
Task 2/french_tokenizer.json DELETED
@@ -1 +0,0 @@
1
- "{\"class_name\": \"Tokenizer\", \"config\": {\"num_words\": null, \"filters\": \"!\\\"#$%&()*+,-./:;<=>?@[\\\\]^_`{|}~\\t\\n\", \"lower\": true, \"split\": \" \", \"char_level\": false, \"oov_token\": null, \"document_count\": 137861, \"word_counts\": \"{\\\"new\\\": 11047, \\\"jersey\\\": 11052, \\\"est\\\": 196821, \\\"parfois\\\": 37746, \\\"calme\\\": 7256, \\\"pendant\\\": 10741, \\\"l'\\\": 32917, \\\"automne\\\": 14727, \\\"et\\\": 59851, \\\"il\\\": 84115, \\\"neigeux\\\": 1867, \\\"en\\\": 105768, \\\"avril\\\": 8954, \\\"les\\\": 65255, \\\"\\\\u00e9tats\\\": 11267, \\\"unis\\\": 11270, \\\"g\\\\u00e9n\\\\u00e9ralement\\\": 31292, \\\"froid\\\": 16794, \\\"juillet\\\": 8956, \\\"g\\\\u00e8le\\\": 3622, \\\"habituellement\\\": 6215, \\\"novembre\\\": 8951, \\\"california\\\": 3061, \\\"mars\\\": 9023, \\\"chaud\\\": 16405, \\\"juin\\\": 9133, \\\"l\\\\u00e9g\\\\u00e8re\\\": 63, \\\"fait\\\": 2916, \\\"septembre\\\": 8958, \\\"votre\\\": 9368, \\\"moins\\\": 27557, \\\"aim\\\\u00e9\\\": 25852, \\\"fruit\\\": 23626, \\\"le\\\": 35306, \\\"raisin\\\": 4852, \\\"mais\\\": 63987, \\\"mon\\\": 9403, \\\"la\\\": 49861, \\\"pomme\\\": 4848, \\\"son\\\": 16496, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9\\\": 23305, \\\"l'orange\\\": 4848, \\\"paris\\\": 11334, \\\"relaxant\\\": 8458, \\\"d\\\\u00e9cembre\\\": 8945, \\\"occup\\\\u00e9\\\": 7782, \\\"au\\\": 25738, \\\"printemps\\\": 9100, \\\"jamais\\\": 37215, \\\"chaude\\\": 1124, \\\"notre\\\": 8319, \\\"citron\\\": 4848, \\\"janvier\\\": 9090, \\\"chaux\\\": 4848, \\\"des\\\": 2435, \\\"fruits\\\": 3566, \\\"banane\\\": 4848, \\\"a\\\": 1356, \\\"vu\\\": 645, \\\"un\\\": 698, \\\"vieux\\\": 325, \\\"camion\\\": 1944, \\\"jaune\\\": 972, \\\"inde\\\": 11277, \\\"pluvieux\\\": 7658, \\\"ce\\\": 1572, \\\"chat\\\": 192, \\\"\\\\u00e9tait\\\": 1198, \\\"animal\\\": 2248, \\\"plus\\\": 14934, \\\"n'aime\\\": 3131, \\\"pamplemousse\\\": 10140, \\\"citrons\\\": 11679, \\\"verts\\\": 5835, \\\"californie\\\": 8189, \\\"ne\\\": 2715, \\\"f\\\\u00e9vrier\\\": 8942, \\\"gel\\\": 4886, \\\"chine\\\": 10936, \\\"agr\\\\u00e9able\\\": 17751, \\\"octobre\\\": 8911, \\\"merveilleux\\\": 8704, \\\"doux\\\": 8458, \\\"tranquille\\\": 1437, \\\"\\\\u00e0\\\": 13870, \\\"l'automne\\\": 3411, \\\"\\\\u00e9t\\\\u00e9\\\": 8999, \\\"france\\\": 11170, \\\"mois\\\": 14350, \\\"de\\\": 15070, \\\"mai\\\": 8995, \\\"frisquet\\\": 834, \\\"d\\\\u00e9teste\\\": 3743, \\\"raisins\\\": 5780, \\\"mangues\\\": 5774, \\\"leur\\\": 7855, \\\"mangue\\\": 4899, \\\"poire\\\": 4848, \\\"ao\\\\u00fbt\\\": 8789, \\\"beau\\\": 6387, \\\"pommes\\\": 5844, \\\"p\\\\u00eaches\\\": 5844, \\\"redout\\\\u00e9\\\": 576, \\\"que\\\": 667, \\\"requin\\\": 192, \\\"humide\\\": 8446, \\\"d'\\\": 5100, \\\"sec\\\": 7957, \\\"enneig\\\\u00e9e\\\": 4008, \\\"nous\\\": 2520, \\\"aimons\\\": 1111, \\\"oranges\\\": 5844, \\\"ils\\\": 3221, \\\"aiment\\\": 1126, \\\"poires\\\": 5844, \\\"elle\\\": 12080, \\\"petit\\\": 324, \\\"rouge\\\": 972, \\\"cher\\\": 1308, \\\"aim\\\\u00e9e\\\": 105, \\\"neige\\\": 3016, \\\"trop\\\": 173, \\\"monde\\\": 173, \\\"hiver\\\": 9038, \\\"sont\\\": 1018, \\\"n'aimait\\\": 561, \\\"pas\\\": 4495, \\\"une\\\": 1278, \\\"voiture\\\": 3510, \\\"rouill\\\\u00e9e\\\": 486, \\\"fraises\\\": 5844, \\\"cours\\\": 1927, \\\"j'aime\\\": 966, \\\"fraise\\\": 4848, \\\"bananes\\\": 5844, \\\"va\\\": 355, \\\"aux\\\": 392, \\\"prochain\\\": 1666, \\\"je\\\": 1548, \\\"pr\\\\u00e9vois\\\": 233, \\\"visiter\\\": 908, \\\"belle\\\": 2726, \\\"\\\\u00e9l\\\\u00e9phants\\\": 64, 
\\\"\\\\u00e9taient\\\": 357, \\\"ses\\\": 402, \\\"animaux\\\": 768, \\\"redout\\\\u00e9s\\\": 190, \\\"vont\\\": 168, \\\"aime\\\": 8870, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9e\\\": 770, \\\"n'aiment\\\": 1111, \\\"i\\\": 150, \\\"comme\\\": 259, \\\"conduit\\\": 1706, \\\"p\\\\u00eache\\\": 4848, \\\"nouvelle\\\": 648, \\\"bleue\\\": 504, \\\"vous\\\": 2541, \\\"aimez\\\": 1053, \\\"cet\\\": 286, \\\"oiseau\\\": 128, \\\"pamplemousses\\\": 552, \\\"pleut\\\": 562, \\\"magnifique\\\": 104, \\\"favori\\\": 3857, \\\"vos\\\": 225, \\\"aim\\\\u00e9s\\\": 237, \\\"chevaux\\\": 64, \\\"n'aimez\\\": 1094, \\\"n'aimons\\\": 97, \\\"souris\\\": 256, \\\"d\\\\u00e9testons\\\": 1001, \\\"all\\\\u00e9\\\": 187, \\\"dernier\\\": 757, \\\"conduisait\\\": 673, \\\"petite\\\": 615, \\\"glaciales\\\": 307, \\\"cheval\\\": 192, \\\"vieille\\\": 647, \\\"chiens\\\": 64, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9s\\\": 383, \\\"blanche\\\": 579, \\\"occup\\\\u00e9e\\\": 836, \\\"nos\\\": 613, \\\"l'\\\\u00e9l\\\\u00e9phant\\\": 64, \\\"nouveau\\\": 502, \\\"noire\\\": 602, \\\"pluies\\\": 367, \\\"pense\\\": 540, \\\"qu'il\\\": 393, \\\"difficile\\\": 260, \\\"traduire\\\": 501, \\\"entre\\\": 540, \\\"espagnol\\\": 312, \\\"portugais\\\": 312, \\\"bleu\\\": 468, \\\"rouill\\\\u00e9\\\": 454, \\\"aimait\\\": 707, \\\"grande\\\": 459, \\\"verte\\\": 628, \\\"traduction\\\": 277, \\\"amusant\\\": 260, \\\"cette\\\": 1239, \\\"vert\\\": 344, \\\"grand\\\": 81, \\\"blanc\\\": 393, \\\"volant\\\": 165, \\\"gros\\\": 258, \\\"o\\\\u00f9\\\": 12, \\\"chien\\\": 192, \\\"leurs\\\": 1072, \\\"pourquoi\\\": 240, \\\"l'automobile\\\": 100, \\\"pourrait\\\": 252, \\\"se\\\": 461, \\\"rendre\\\": 350, \\\"pr\\\\u00e9voyons\\\": 232, \\\"maillot\\\": 173, \\\"grosse\\\": 185, \\\"brillant\\\": 587, \\\"pr\\\\u00e9voient\\\": 75, \\\"mouill\\\\u00e9e\\\": 7, \\\"lui\\\": 70, \\\"d\\\\u00e9tendre\\\": 111, \\\"automobile\\\": 278, \\\"pourraient\\\": 126, \\\"aller\\\": 1180, \\\"mes\\\": 297, \\\"s\\\\u00e8che\\\": 837, \\\"l'oiseau\\\": 64, \\\"pluie\\\": 174, \\\"requins\\\": 64, \\\"noir\\\": 370, \\\"singe\\\": 192, \\\"d\\\\u00e9testait\\\": 87, \\\"comment\\\": 67, \\\"temps\\\": 33, \\\"dans\\\": 12, \\\"lion\\\": 192, \\\"pr\\\\u00e9voit\\\": 75, \\\"ours\\\": 192, \\\"porcelaine\\\": 17, \\\"cl\\\\u00e9mentes\\\": 200, \\\"pla\\\\u00eet\\\": 13, \\\"proches\\\": 20, \\\"brillante\\\": 385, \\\"lapin\\\": 192, \\\"l'ours\\\": 64, \\\"chinois\\\": 312, \\\"quand\\\": 144, \\\"tour\\\": 57, \\\"eiffel\\\": 57, \\\"allons\\\": 45, \\\"l'\\\\u00e9picerie\\\": 57, \\\"voulait\\\": 252, \\\"c\\\\u00e9page\\\": 60, \\\"t\\\": 18, \\\"terrain\\\": 57, \\\"football\\\": 57, \\\"du\\\": 39, \\\"veut\\\": 252, \\\"\\\\u00e9l\\\\u00e9phant\\\": 128, \\\"gel\\\\u00e9\\\": 94, \\\"bien\\\": 77, \\\"enneig\\\\u00e9\\\": 7, \\\"gel\\\\u00e9s\\\": 5, \\\"serpent\\\": 192, \\\"all\\\\u00e9s\\\": 150, \\\"all\\\\u00e9e\\\": 150, \\\"envisage\\\": 360, \\\"peu\\\": 41, \\\"mouill\\\\u00e9\\\": 273, \\\"serpents\\\": 64, \\\"pensez\\\": 60, \\\"facile\\\": 260, \\\"anglais\\\": 312, \\\"fran\\\\u00e7ais\\\": 312, \\\"voulez\\\": 12, \\\"grandes\\\": 16, \\\"avez\\\": 162, \\\"aimeraient\\\": 12, \\\"allez\\\": 45, \\\"chats\\\": 64, \\\"lapins\\\": 64, \\\"visite\\\": 68, \\\"ont\\\": 194, \\\"intention\\\": 206, \\\"n'est\\\": 47, \\\"derni\\\\u00e8re\\\": 24, \\\"voulaient\\\": 126, \\\"singes\\\": 64, \\\"\\\\u00eates\\\": 24, \\\"qu'elle\\\": 26, \\\"vers\\\": 76, \\\"lac\\\": 57, \\\"pousse\\\": 41, \\\"d\\\\u00e9testez\\\": 17, 
\\\"manguiers\\\": 19, \\\"grands\\\": 9, \\\"l'\\\\u00e9cole\\\": 57, \\\"l'animal\\\": 56, \\\"at\\\": 32, \\\"oiseaux\\\": 64, \\\"ressort\\\": 2, \\\"petits\\\": 10, \\\"n'a\\\": 12, \\\"veulent\\\": 126, \\\"rouille\\\": 32, \\\"frais\\\": 20, \\\"limes\\\": 9, \\\"lions\\\": 64, \\\"douce\\\": 14, \\\"envisagent\\\": 9, \\\"petites\\\": 26, \\\"vais\\\": 24, \\\"durant\\\": 14, \\\"c'est\\\": 17, \\\"cong\\\\u00e9lation\\\": 14, \\\"allions\\\": 1, \\\"voudrait\\\": 24, \\\"d\\\\u00e9tend\\\": 2, \\\"trouv\\\\u00e9\\\": 1, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9es\\\": 16, \\\"conduite\\\": 6, \\\"grosses\\\": 8, \\\"b\\\\u00e9nigne\\\": 8, \\\"avons\\\": 19, \\\"sur\\\": 28, \\\"redout\\\\u00e9e\\\": 2, \\\"etats\\\": 3, \\\"moindres\\\": 7, \\\"n'\\\\u00eates\\\": 3, \\\"vit\\\": 3, \\\"as\\\": 1, \\\"tu\\\": 2, \\\"qui\\\": 2, \\\"faire\\\": 1, \\\"traduis\\\": 2, \\\"favoris\\\": 1, \\\"souvent\\\": 1, \\\"es\\\": 1, \\\"appr\\\\u00e9ci\\\\u00e9\\\": 2, \\\"moteur\\\": 1, \\\"tout\\\": 4}\", \"word_docs\": \"{\\\"l'\\\": 28828, \\\"en\\\": 72048, \\\"automne\\\": 14398, \\\"calme\\\": 7256, \\\"et\\\": 59851, \\\"neigeux\\\": 1867, \\\"est\\\": 104279, \\\"il\\\": 84114, \\\"pendant\\\": 10741, \\\"new\\\": 11047, \\\"parfois\\\": 32946, \\\"jersey\\\": 11052, \\\"avril\\\": 8954, \\\"unis\\\": 11270, \\\"froid\\\": 16323, \\\"les\\\": 33404, \\\"juillet\\\": 8956, \\\"\\\\u00e9tats\\\": 11267, \\\"habituellement\\\": 6112, \\\"g\\\\u00e8le\\\": 3622, \\\"novembre\\\": 8951, \\\"g\\\\u00e9n\\\\u00e9ralement\\\": 28101, \\\"mars\\\": 9023, \\\"chaud\\\": 15908, \\\"california\\\": 3061, \\\"juin\\\": 9133, \\\"l\\\\u00e9g\\\\u00e8re\\\": 63, \\\"fait\\\": 2916, \\\"septembre\\\": 8958, \\\"mon\\\": 9403, \\\"moins\\\": 14357, \\\"votre\\\": 9368, \\\"mais\\\": 63987, \\\"la\\\": 39038, \\\"raisin\\\": 4852, \\\"fruit\\\": 23626, \\\"aim\\\\u00e9\\\": 14038, \\\"pomme\\\": 4848, \\\"le\\\": 28877, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9\\\": 14722, \\\"l'orange\\\": 4848, \\\"son\\\": 15184, \\\"relaxant\\\": 8458, \\\"paris\\\": 11334, \\\"d\\\\u00e9cembre\\\": 8945, \\\"printemps\\\": 9100, \\\"occup\\\\u00e9\\\": 7782, \\\"au\\\": 24675, \\\"jamais\\\": 32618, \\\"chaude\\\": 1124, \\\"citron\\\": 4848, \\\"notre\\\": 8319, \\\"janvier\\\": 9090, \\\"banane\\\": 4848, \\\"fruits\\\": 3566, \\\"chaux\\\": 4848, \\\"des\\\": 2435, \\\"a\\\": 1356, \\\"jaune\\\": 972, \\\"vu\\\": 645, \\\"camion\\\": 1944, \\\"vieux\\\": 325, \\\"un\\\": 698, \\\"inde\\\": 11277, \\\"pluvieux\\\": 7658, \\\"animal\\\": 2248, \\\"\\\\u00e9tait\\\": 1198, \\\"plus\\\": 8334, \\\"chat\\\": 192, \\\"ce\\\": 1572, \\\"pamplemousse\\\": 10140, \\\"n'aime\\\": 3131, \\\"verts\\\": 5835, \\\"citrons\\\": 10568, \\\"gel\\\": 4886, \\\"f\\\\u00e9vrier\\\": 8942, \\\"californie\\\": 8189, \\\"ne\\\": 2707, \\\"chine\\\": 10936, \\\"agr\\\\u00e9able\\\": 17151, \\\"octobre\\\": 8910, \\\"merveilleux\\\": 8704, \\\"doux\\\": 8458, \\\"tranquille\\\": 1437, \\\"\\\\u00e0\\\": 13381, \\\"l'automne\\\": 3411, \\\"\\\\u00e9t\\\\u00e9\\\": 8999, \\\"france\\\": 11170, \\\"mois\\\": 14311, \\\"mai\\\": 8995, \\\"frisquet\\\": 834, \\\"de\\\": 14740, \\\"mangues\\\": 5774, \\\"d\\\\u00e9teste\\\": 3743, \\\"raisins\\\": 5780, \\\"mangue\\\": 4899, \\\"poire\\\": 4848, \\\"leur\\\": 7855, \\\"ao\\\\u00fbt\\\": 8789, \\\"beau\\\": 6387, \\\"pommes\\\": 5844, \\\"p\\\\u00eaches\\\": 5844, \\\"requin\\\": 192, \\\"redout\\\\u00e9\\\": 576, \\\"que\\\": 667, \\\"d'\\\": 5100, \\\"sec\\\": 7957, \\\"humide\\\": 8446, 
\\\"enneig\\\\u00e9e\\\": 4008, \\\"aimons\\\": 1111, \\\"nous\\\": 2520, \\\"oranges\\\": 5844, \\\"poires\\\": 5844, \\\"ils\\\": 3221, \\\"aiment\\\": 1126, \\\"petit\\\": 324, \\\"rouge\\\": 972, \\\"elle\\\": 12080, \\\"aim\\\\u00e9e\\\": 105, \\\"cher\\\": 1308, \\\"monde\\\": 173, \\\"neige\\\": 3016, \\\"trop\\\": 173, \\\"hiver\\\": 9038, \\\"sont\\\": 1018, \\\"n'aimait\\\": 561, \\\"voiture\\\": 3510, \\\"une\\\": 1278, \\\"pas\\\": 4495, \\\"rouill\\\\u00e9e\\\": 486, \\\"fraises\\\": 5844, \\\"cours\\\": 1927, \\\"j'aime\\\": 966, \\\"fraise\\\": 4848, \\\"bananes\\\": 5844, \\\"va\\\": 355, \\\"aux\\\": 392, \\\"prochain\\\": 1666, \\\"visiter\\\": 908, \\\"pr\\\\u00e9vois\\\": 233, \\\"je\\\": 1548, \\\"belle\\\": 2725, \\\"ses\\\": 402, \\\"animaux\\\": 768, \\\"redout\\\\u00e9s\\\": 190, \\\"\\\\u00e9l\\\\u00e9phants\\\": 64, \\\"\\\\u00e9taient\\\": 357, \\\"vont\\\": 168, \\\"aime\\\": 8870, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9e\\\": 770, \\\"n'aiment\\\": 1111, \\\"i\\\": 150, \\\"comme\\\": 259, \\\"conduit\\\": 1706, \\\"p\\\\u00eache\\\": 4848, \\\"bleue\\\": 504, \\\"nouvelle\\\": 648, \\\"vous\\\": 2541, \\\"aimez\\\": 1053, \\\"cet\\\": 286, \\\"oiseau\\\": 128, \\\"pamplemousses\\\": 552, \\\"pleut\\\": 562, \\\"magnifique\\\": 104, \\\"favori\\\": 3857, \\\"aim\\\\u00e9s\\\": 237, \\\"vos\\\": 225, \\\"chevaux\\\": 64, \\\"n'aimez\\\": 1094, \\\"n'aimons\\\": 97, \\\"souris\\\": 256, \\\"d\\\\u00e9testons\\\": 1001, \\\"dernier\\\": 757, \\\"all\\\\u00e9\\\": 187, \\\"petite\\\": 615, \\\"conduisait\\\": 673, \\\"glaciales\\\": 307, \\\"cheval\\\": 192, \\\"vieille\\\": 647, \\\"chiens\\\": 64, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9s\\\": 383, \\\"blanche\\\": 579, \\\"occup\\\\u00e9e\\\": 836, \\\"nos\\\": 613, \\\"l'\\\\u00e9l\\\\u00e9phant\\\": 64, \\\"nouveau\\\": 502, \\\"noire\\\": 602, \\\"pluies\\\": 367, \\\"traduire\\\": 501, \\\"difficile\\\": 260, \\\"pense\\\": 540, \\\"entre\\\": 540, \\\"qu'il\\\": 393, \\\"portugais\\\": 312, \\\"espagnol\\\": 312, \\\"rouill\\\\u00e9\\\": 454, \\\"bleu\\\": 468, \\\"verte\\\": 628, \\\"grande\\\": 459, \\\"aimait\\\": 707, \\\"amusant\\\": 260, \\\"traduction\\\": 277, \\\"cette\\\": 1239, \\\"vert\\\": 344, \\\"grand\\\": 81, \\\"blanc\\\": 393, \\\"volant\\\": 165, \\\"gros\\\": 258, \\\"o\\\\u00f9\\\": 12, \\\"chien\\\": 192, \\\"leurs\\\": 1072, \\\"pourquoi\\\": 240, \\\"l'automobile\\\": 100, \\\"pourrait\\\": 252, \\\"rendre\\\": 350, \\\"se\\\": 461, \\\"maillot\\\": 173, \\\"pr\\\\u00e9voyons\\\": 232, \\\"grosse\\\": 185, \\\"brillant\\\": 587, \\\"pr\\\\u00e9voient\\\": 75, \\\"mouill\\\\u00e9e\\\": 7, \\\"lui\\\": 70, \\\"d\\\\u00e9tendre\\\": 111, \\\"automobile\\\": 278, \\\"aller\\\": 1180, \\\"pourraient\\\": 126, \\\"mes\\\": 297, \\\"s\\\\u00e8che\\\": 837, \\\"l'oiseau\\\": 64, \\\"pluie\\\": 174, \\\"requins\\\": 64, \\\"noir\\\": 370, \\\"singe\\\": 192, \\\"d\\\\u00e9testait\\\": 87, \\\"dans\\\": 12, \\\"temps\\\": 33, \\\"comment\\\": 67, \\\"lion\\\": 192, \\\"pr\\\\u00e9voit\\\": 75, \\\"ours\\\": 192, \\\"porcelaine\\\": 17, \\\"cl\\\\u00e9mentes\\\": 200, \\\"pla\\\\u00eet\\\": 13, \\\"proches\\\": 20, \\\"brillante\\\": 385, \\\"lapin\\\": 192, \\\"l'ours\\\": 64, \\\"chinois\\\": 312, \\\"quand\\\": 144, \\\"tour\\\": 57, \\\"eiffel\\\": 57, \\\"allons\\\": 45, \\\"l'\\\\u00e9picerie\\\": 57, \\\"voulait\\\": 252, \\\"c\\\\u00e9page\\\": 60, \\\"football\\\": 57, \\\"t\\\": 18, \\\"terrain\\\": 57, \\\"du\\\": 39, \\\"veut\\\": 252, \\\"\\\\u00e9l\\\\u00e9phant\\\": 128, 
\\\"gel\\\\u00e9\\\": 94, \\\"bien\\\": 77, \\\"enneig\\\\u00e9\\\": 7, \\\"gel\\\\u00e9s\\\": 5, \\\"serpent\\\": 192, \\\"all\\\\u00e9s\\\": 150, \\\"all\\\\u00e9e\\\": 150, \\\"envisage\\\": 360, \\\"peu\\\": 41, \\\"mouill\\\\u00e9\\\": 273, \\\"serpents\\\": 64, \\\"facile\\\": 260, \\\"pensez\\\": 60, \\\"anglais\\\": 312, \\\"fran\\\\u00e7ais\\\": 312, \\\"voulez\\\": 12, \\\"grandes\\\": 16, \\\"avez\\\": 162, \\\"aimeraient\\\": 12, \\\"allez\\\": 45, \\\"chats\\\": 64, \\\"lapins\\\": 64, \\\"visite\\\": 68, \\\"intention\\\": 206, \\\"ont\\\": 194, \\\"n'est\\\": 47, \\\"derni\\\\u00e8re\\\": 24, \\\"voulaient\\\": 126, \\\"singes\\\": 64, \\\"\\\\u00eates\\\": 24, \\\"qu'elle\\\": 26, \\\"vers\\\": 76, \\\"lac\\\": 57, \\\"pousse\\\": 41, \\\"d\\\\u00e9testez\\\": 17, \\\"manguiers\\\": 19, \\\"grands\\\": 9, \\\"l'\\\\u00e9cole\\\": 57, \\\"l'animal\\\": 56, \\\"at\\\": 32, \\\"oiseaux\\\": 64, \\\"ressort\\\": 2, \\\"petits\\\": 10, \\\"n'a\\\": 12, \\\"veulent\\\": 126, \\\"rouille\\\": 32, \\\"frais\\\": 20, \\\"limes\\\": 9, \\\"lions\\\": 64, \\\"douce\\\": 14, \\\"envisagent\\\": 9, \\\"petites\\\": 26, \\\"vais\\\": 24, \\\"durant\\\": 14, \\\"c'est\\\": 17, \\\"cong\\\\u00e9lation\\\": 14, \\\"allions\\\": 1, \\\"voudrait\\\": 24, \\\"d\\\\u00e9tend\\\": 2, \\\"trouv\\\\u00e9\\\": 1, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9es\\\": 16, \\\"conduite\\\": 6, \\\"grosses\\\": 8, \\\"b\\\\u00e9nigne\\\": 8, \\\"avons\\\": 19, \\\"sur\\\": 28, \\\"redout\\\\u00e9e\\\": 2, \\\"etats\\\": 3, \\\"moindres\\\": 7, \\\"n'\\\\u00eates\\\": 3, \\\"vit\\\": 3, \\\"as\\\": 1, \\\"tu\\\": 2, \\\"qui\\\": 2, \\\"faire\\\": 1, \\\"traduis\\\": 2, \\\"favoris\\\": 1, \\\"souvent\\\": 1, \\\"es\\\": 1, \\\"appr\\\\u00e9ci\\\\u00e9\\\": 1, \\\"moteur\\\": 1, \\\"tout\\\": 4}\", \"index_docs\": \"{\\\"11\\\": 28828, \\\"2\\\": 72048, \\\"24\\\": 14398, \\\"67\\\": 7256, \\\"6\\\": 59851, \\\"112\\\": 1867, \\\"1\\\": 104279, \\\"3\\\": 84114, \\\"37\\\": 10741, \\\"35\\\": 11047, \\\"8\\\": 32946, \\\"34\\\": 11052, \\\"50\\\": 8954, \\\"31\\\": 11270, \\\"19\\\": 16323, \\\"4\\\": 33404, \\\"49\\\": 8956, \\\"32\\\": 11267, \\\"69\\\": 6112, \\\"95\\\": 3622, \\\"51\\\": 8951, \\\"12\\\": 28101, \\\"45\\\": 9023, \\\"21\\\": 15908, \\\"101\\\": 3061, \\\"41\\\": 9133, \\\"269\\\": 63, \\\"103\\\": 2916, \\\"48\\\": 8958, \\\"39\\\": 9403, \\\"13\\\": 14357, \\\"40\\\": 9368, \\\"5\\\": 63987, \\\"7\\\": 39038, \\\"82\\\": 4852, \\\"16\\\": 23626, \\\"14\\\": 14038, \\\"83\\\": 4848, \\\"10\\\": 28877, \\\"17\\\": 14722, \\\"84\\\": 4848, \\\"20\\\": 15184, \\\"58\\\": 8458, \\\"29\\\": 11334, \\\"52\\\": 8945, \\\"42\\\": 9100, \\\"65\\\": 7782, \\\"15\\\": 24675, \\\"9\\\": 32618, \\\"125\\\": 1124, \\\"85\\\": 4848, \\\"61\\\": 8319, \\\"43\\\": 9090, \\\"87\\\": 4848, \\\"96\\\": 3566, \\\"86\\\": 4848, \\\"108\\\": 2435, \\\"118\\\": 1356, \\\"133\\\": 972, \\\"149\\\": 645, \\\"110\\\": 1944, \\\"183\\\": 325, \\\"144\\\": 698, \\\"30\\\": 11277, \\\"66\\\": 7658, \\\"109\\\": 2248, \\\"122\\\": 1198, \\\"23\\\": 8334, \\\"213\\\": 192, \\\"115\\\": 1572, \\\"38\\\": 10140, \\\"100\\\": 3131, \\\"76\\\": 5835, \\\"28\\\": 10568, \\\"81\\\": 4886, \\\"53\\\": 8942, \\\"62\\\": 8189, \\\"105\\\": 2707, \\\"36\\\": 10936, \\\"18\\\": 17151, \\\"54\\\": 8910, \\\"57\\\": 8704, \\\"59\\\": 8458, \\\"117\\\": 1437, \\\"26\\\": 13381, \\\"98\\\": 3411, \\\"46\\\": 8999, \\\"33\\\": 11170, \\\"25\\\": 14311, \\\"47\\\": 8995, \\\"139\\\": 834, \\\"22\\\": 14740, \\\"78\\\": 5774, \\\"94\\\": 3743, 
\\\"77\\\": 5780, \\\"80\\\": 4899, \\\"88\\\": 4848, \\\"64\\\": 7855, \\\"56\\\": 8789, \\\"68\\\": 6387, \\\"70\\\": 5844, \\\"71\\\": 5844, \\\"214\\\": 192, \\\"156\\\": 576, \\\"146\\\": 667, \\\"79\\\": 5100, \\\"63\\\": 7957, \\\"60\\\": 8446, \\\"92\\\": 4008, \\\"126\\\": 1111, \\\"107\\\": 2520, \\\"72\\\": 5844, \\\"73\\\": 5844, \\\"99\\\": 3221, \\\"124\\\": 1126, \\\"184\\\": 324, \\\"134\\\": 972, \\\"27\\\": 12080, \\\"242\\\": 105, \\\"119\\\": 1308, \\\"227\\\": 173, \\\"102\\\": 3016, \\\"226\\\": 173, \\\"44\\\": 9038, \\\"131\\\": 1018, \\\"158\\\": 561, \\\"97\\\": 3510, \\\"120\\\": 1278, \\\"91\\\": 4495, \\\"165\\\": 486, \\\"74\\\": 5844, \\\"111\\\": 1927, \\\"135\\\": 966, \\\"89\\\": 4848, \\\"75\\\": 5844, \\\"180\\\": 355, \\\"173\\\": 392, \\\"114\\\": 1666, \\\"136\\\": 908, \\\"207\\\": 233, \\\"116\\\": 1548, \\\"104\\\": 2725, \\\"170\\\": 402, \\\"141\\\": 768, \\\"222\\\": 190, \\\"256\\\": 64, \\\"179\\\": 357, \\\"229\\\": 168, \\\"55\\\": 8870, \\\"140\\\": 770, \\\"127\\\": 1111, \\\"232\\\": 150, \\\"199\\\": 259, \\\"113\\\": 1706, \\\"90\\\": 4848, \\\"162\\\": 504, \\\"147\\\": 648, \\\"106\\\": 2541, \\\"130\\\": 1053, \\\"192\\\": 286, \\\"236\\\": 128, \\\"159\\\": 552, \\\"157\\\": 562, \\\"243\\\": 104, \\\"93\\\": 3857, \\\"206\\\": 237, \\\"209\\\": 225, \\\"257\\\": 64, \\\"128\\\": 1094, \\\"245\\\": 97, \\\"201\\\": 256, \\\"132\\\": 1001, \\\"142\\\": 757, \\\"223\\\": 187, \\\"151\\\": 615, \\\"145\\\": 673, \\\"190\\\": 307, \\\"215\\\": 192, \\\"148\\\": 647, \\\"258\\\": 64, \\\"175\\\": 383, \\\"155\\\": 579, \\\"138\\\": 836, \\\"152\\\": 613, \\\"259\\\": 64, \\\"163\\\": 502, \\\"153\\\": 602, \\\"177\\\": 367, \\\"164\\\": 501, \\\"196\\\": 260, \\\"160\\\": 540, \\\"161\\\": 540, \\\"171\\\": 393, \\\"186\\\": 312, \\\"185\\\": 312, \\\"169\\\": 454, \\\"166\\\": 468, \\\"150\\\": 628, \\\"168\\\": 459, \\\"143\\\": 707, \\\"197\\\": 260, \\\"194\\\": 277, \\\"121\\\": 1239, \\\"182\\\": 344, \\\"248\\\": 81, \\\"172\\\": 393, \\\"230\\\": 165, \\\"200\\\": 258, \\\"310\\\": 12, \\\"216\\\": 192, \\\"129\\\": 1072, \\\"205\\\": 240, \\\"244\\\": 100, \\\"202\\\": 252, \\\"181\\\": 350, \\\"167\\\": 461, \\\"228\\\": 173, \\\"208\\\": 232, \\\"224\\\": 185, \\\"154\\\": 587, \\\"251\\\": 75, \\\"321\\\": 7, \\\"253\\\": 70, \\\"241\\\": 111, \\\"193\\\": 278, \\\"123\\\": 1180, \\\"238\\\": 126, \\\"191\\\": 297, \\\"137\\\": 837, \\\"260\\\": 64, \\\"225\\\": 174, \\\"261\\\": 64, \\\"176\\\": 370, \\\"217\\\": 192, \\\"247\\\": 87, \\\"311\\\": 12, \\\"286\\\": 33, \\\"255\\\": 67, \\\"218\\\": 192, \\\"252\\\": 75, \\\"219\\\": 192, \\\"301\\\": 17, \\\"211\\\": 200, \\\"309\\\": 13, \\\"296\\\": 20, \\\"174\\\": 385, \\\"220\\\": 192, \\\"262\\\": 64, \\\"187\\\": 312, \\\"235\\\": 144, \\\"272\\\": 57, \\\"273\\\": 57, \\\"281\\\": 45, \\\"274\\\": 57, \\\"203\\\": 252, \\\"270\\\": 60, \\\"276\\\": 57, \\\"300\\\": 18, \\\"275\\\": 57, \\\"285\\\": 39, \\\"204\\\": 252, \\\"237\\\": 128, \\\"246\\\": 94, \\\"249\\\": 77, \\\"322\\\": 7, \\\"325\\\": 5, \\\"221\\\": 192, \\\"233\\\": 150, \\\"234\\\": 150, \\\"178\\\": 360, \\\"283\\\": 41, \\\"195\\\": 273, \\\"263\\\": 64, \\\"198\\\": 260, \\\"271\\\": 60, \\\"188\\\": 312, \\\"189\\\": 312, \\\"312\\\": 12, \\\"304\\\": 16, \\\"231\\\": 162, \\\"313\\\": 12, \\\"282\\\": 45, \\\"264\\\": 64, \\\"265\\\": 64, \\\"254\\\": 68, \\\"210\\\": 206, \\\"212\\\": 194, \\\"280\\\": 47, \\\"292\\\": 24, \\\"239\\\": 126, \\\"266\\\": 64, \\\"293\\\": 24, \\\"290\\\": 26, 
\\\"250\\\": 76, \\\"277\\\": 57, \\\"284\\\": 41, \\\"302\\\": 17, \\\"298\\\": 19, \\\"316\\\": 9, \\\"278\\\": 57, \\\"279\\\": 56, \\\"287\\\": 32, \\\"267\\\": 64, \\\"330\\\": 2, \\\"315\\\": 10, \\\"314\\\": 12, \\\"240\\\": 126, \\\"288\\\": 32, \\\"297\\\": 20, \\\"317\\\": 9, \\\"268\\\": 64, \\\"306\\\": 14, \\\"318\\\": 9, \\\"291\\\": 26, \\\"294\\\": 24, \\\"307\\\": 14, \\\"303\\\": 17, \\\"308\\\": 14, \\\"337\\\": 1, \\\"295\\\": 24, \\\"331\\\": 2, \\\"338\\\": 1, \\\"305\\\": 16, \\\"324\\\": 6, \\\"319\\\": 8, \\\"320\\\": 8, \\\"299\\\": 19, \\\"289\\\": 28, \\\"332\\\": 2, \\\"327\\\": 3, \\\"323\\\": 7, \\\"328\\\": 3, \\\"329\\\": 3, \\\"339\\\": 1, \\\"333\\\": 2, \\\"334\\\": 2, \\\"340\\\": 1, \\\"335\\\": 2, \\\"341\\\": 1, \\\"342\\\": 1, \\\"343\\\": 1, \\\"336\\\": 1, \\\"344\\\": 1, \\\"326\\\": 4}\", \"index_word\": \"{\\\"1\\\": \\\"est\\\", \\\"2\\\": \\\"en\\\", \\\"3\\\": \\\"il\\\", \\\"4\\\": \\\"les\\\", \\\"5\\\": \\\"mais\\\", \\\"6\\\": \\\"et\\\", \\\"7\\\": \\\"la\\\", \\\"8\\\": \\\"parfois\\\", \\\"9\\\": \\\"jamais\\\", \\\"10\\\": \\\"le\\\", \\\"11\\\": \\\"l'\\\", \\\"12\\\": \\\"g\\\\u00e9n\\\\u00e9ralement\\\", \\\"13\\\": \\\"moins\\\", \\\"14\\\": \\\"aim\\\\u00e9\\\", \\\"15\\\": \\\"au\\\", \\\"16\\\": \\\"fruit\\\", \\\"17\\\": \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9\\\", \\\"18\\\": \\\"agr\\\\u00e9able\\\", \\\"19\\\": \\\"froid\\\", \\\"20\\\": \\\"son\\\", \\\"21\\\": \\\"chaud\\\", \\\"22\\\": \\\"de\\\", \\\"23\\\": \\\"plus\\\", \\\"24\\\": \\\"automne\\\", \\\"25\\\": \\\"mois\\\", \\\"26\\\": \\\"\\\\u00e0\\\", \\\"27\\\": \\\"elle\\\", \\\"28\\\": \\\"citrons\\\", \\\"29\\\": \\\"paris\\\", \\\"30\\\": \\\"inde\\\", \\\"31\\\": \\\"unis\\\", \\\"32\\\": \\\"\\\\u00e9tats\\\", \\\"33\\\": \\\"france\\\", \\\"34\\\": \\\"jersey\\\", \\\"35\\\": \\\"new\\\", \\\"36\\\": \\\"chine\\\", \\\"37\\\": \\\"pendant\\\", \\\"38\\\": \\\"pamplemousse\\\", \\\"39\\\": \\\"mon\\\", \\\"40\\\": \\\"votre\\\", \\\"41\\\": \\\"juin\\\", \\\"42\\\": \\\"printemps\\\", \\\"43\\\": \\\"janvier\\\", \\\"44\\\": \\\"hiver\\\", \\\"45\\\": \\\"mars\\\", \\\"46\\\": \\\"\\\\u00e9t\\\\u00e9\\\", \\\"47\\\": \\\"mai\\\", \\\"48\\\": \\\"septembre\\\", \\\"49\\\": \\\"juillet\\\", \\\"50\\\": \\\"avril\\\", \\\"51\\\": \\\"novembre\\\", \\\"52\\\": \\\"d\\\\u00e9cembre\\\", \\\"53\\\": \\\"f\\\\u00e9vrier\\\", \\\"54\\\": \\\"octobre\\\", \\\"55\\\": \\\"aime\\\", \\\"56\\\": \\\"ao\\\\u00fbt\\\", \\\"57\\\": \\\"merveilleux\\\", \\\"58\\\": \\\"relaxant\\\", \\\"59\\\": \\\"doux\\\", \\\"60\\\": \\\"humide\\\", \\\"61\\\": \\\"notre\\\", \\\"62\\\": \\\"californie\\\", \\\"63\\\": \\\"sec\\\", \\\"64\\\": \\\"leur\\\", \\\"65\\\": \\\"occup\\\\u00e9\\\", \\\"66\\\": \\\"pluvieux\\\", \\\"67\\\": \\\"calme\\\", \\\"68\\\": \\\"beau\\\", \\\"69\\\": \\\"habituellement\\\", \\\"70\\\": \\\"pommes\\\", \\\"71\\\": \\\"p\\\\u00eaches\\\", \\\"72\\\": \\\"oranges\\\", \\\"73\\\": \\\"poires\\\", \\\"74\\\": \\\"fraises\\\", \\\"75\\\": \\\"bananes\\\", \\\"76\\\": \\\"verts\\\", \\\"77\\\": \\\"raisins\\\", \\\"78\\\": \\\"mangues\\\", \\\"79\\\": \\\"d'\\\", \\\"80\\\": \\\"mangue\\\", \\\"81\\\": \\\"gel\\\", \\\"82\\\": \\\"raisin\\\", \\\"83\\\": \\\"pomme\\\", \\\"84\\\": \\\"l'orange\\\", \\\"85\\\": \\\"citron\\\", \\\"86\\\": \\\"chaux\\\", \\\"87\\\": \\\"banane\\\", \\\"88\\\": \\\"poire\\\", \\\"89\\\": \\\"fraise\\\", \\\"90\\\": \\\"p\\\\u00eache\\\", \\\"91\\\": \\\"pas\\\", \\\"92\\\": \\\"enneig\\\\u00e9e\\\", \\\"93\\\": \\\"favori\\\", 
\\\"94\\\": \\\"d\\\\u00e9teste\\\", \\\"95\\\": \\\"g\\\\u00e8le\\\", \\\"96\\\": \\\"fruits\\\", \\\"97\\\": \\\"voiture\\\", \\\"98\\\": \\\"l'automne\\\", \\\"99\\\": \\\"ils\\\", \\\"100\\\": \\\"n'aime\\\", \\\"101\\\": \\\"california\\\", \\\"102\\\": \\\"neige\\\", \\\"103\\\": \\\"fait\\\", \\\"104\\\": \\\"belle\\\", \\\"105\\\": \\\"ne\\\", \\\"106\\\": \\\"vous\\\", \\\"107\\\": \\\"nous\\\", \\\"108\\\": \\\"des\\\", \\\"109\\\": \\\"animal\\\", \\\"110\\\": \\\"camion\\\", \\\"111\\\": \\\"cours\\\", \\\"112\\\": \\\"neigeux\\\", \\\"113\\\": \\\"conduit\\\", \\\"114\\\": \\\"prochain\\\", \\\"115\\\": \\\"ce\\\", \\\"116\\\": \\\"je\\\", \\\"117\\\": \\\"tranquille\\\", \\\"118\\\": \\\"a\\\", \\\"119\\\": \\\"cher\\\", \\\"120\\\": \\\"une\\\", \\\"121\\\": \\\"cette\\\", \\\"122\\\": \\\"\\\\u00e9tait\\\", \\\"123\\\": \\\"aller\\\", \\\"124\\\": \\\"aiment\\\", \\\"125\\\": \\\"chaude\\\", \\\"126\\\": \\\"aimons\\\", \\\"127\\\": \\\"n'aiment\\\", \\\"128\\\": \\\"n'aimez\\\", \\\"129\\\": \\\"leurs\\\", \\\"130\\\": \\\"aimez\\\", \\\"131\\\": \\\"sont\\\", \\\"132\\\": \\\"d\\\\u00e9testons\\\", \\\"133\\\": \\\"jaune\\\", \\\"134\\\": \\\"rouge\\\", \\\"135\\\": \\\"j'aime\\\", \\\"136\\\": \\\"visiter\\\", \\\"137\\\": \\\"s\\\\u00e8che\\\", \\\"138\\\": \\\"occup\\\\u00e9e\\\", \\\"139\\\": \\\"frisquet\\\", \\\"140\\\": \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9e\\\", \\\"141\\\": \\\"animaux\\\", \\\"142\\\": \\\"dernier\\\", \\\"143\\\": \\\"aimait\\\", \\\"144\\\": \\\"un\\\", \\\"145\\\": \\\"conduisait\\\", \\\"146\\\": \\\"que\\\", \\\"147\\\": \\\"nouvelle\\\", \\\"148\\\": \\\"vieille\\\", \\\"149\\\": \\\"vu\\\", \\\"150\\\": \\\"verte\\\", \\\"151\\\": \\\"petite\\\", \\\"152\\\": \\\"nos\\\", \\\"153\\\": \\\"noire\\\", \\\"154\\\": \\\"brillant\\\", \\\"155\\\": \\\"blanche\\\", \\\"156\\\": \\\"redout\\\\u00e9\\\", \\\"157\\\": \\\"pleut\\\", \\\"158\\\": \\\"n'aimait\\\", \\\"159\\\": \\\"pamplemousses\\\", \\\"160\\\": \\\"pense\\\", \\\"161\\\": \\\"entre\\\", \\\"162\\\": \\\"bleue\\\", \\\"163\\\": \\\"nouveau\\\", \\\"164\\\": \\\"traduire\\\", \\\"165\\\": \\\"rouill\\\\u00e9e\\\", \\\"166\\\": \\\"bleu\\\", \\\"167\\\": \\\"se\\\", \\\"168\\\": \\\"grande\\\", \\\"169\\\": \\\"rouill\\\\u00e9\\\", \\\"170\\\": \\\"ses\\\", \\\"171\\\": \\\"qu'il\\\", \\\"172\\\": \\\"blanc\\\", \\\"173\\\": \\\"aux\\\", \\\"174\\\": \\\"brillante\\\", \\\"175\\\": \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9s\\\", \\\"176\\\": \\\"noir\\\", \\\"177\\\": \\\"pluies\\\", \\\"178\\\": \\\"envisage\\\", \\\"179\\\": \\\"\\\\u00e9taient\\\", \\\"180\\\": \\\"va\\\", \\\"181\\\": \\\"rendre\\\", \\\"182\\\": \\\"vert\\\", \\\"183\\\": \\\"vieux\\\", \\\"184\\\": \\\"petit\\\", \\\"185\\\": \\\"espagnol\\\", \\\"186\\\": \\\"portugais\\\", \\\"187\\\": \\\"chinois\\\", \\\"188\\\": \\\"anglais\\\", \\\"189\\\": \\\"fran\\\\u00e7ais\\\", \\\"190\\\": \\\"glaciales\\\", \\\"191\\\": \\\"mes\\\", \\\"192\\\": \\\"cet\\\", \\\"193\\\": \\\"automobile\\\", \\\"194\\\": \\\"traduction\\\", \\\"195\\\": \\\"mouill\\\\u00e9\\\", \\\"196\\\": \\\"difficile\\\", \\\"197\\\": \\\"amusant\\\", \\\"198\\\": \\\"facile\\\", \\\"199\\\": \\\"comme\\\", \\\"200\\\": \\\"gros\\\", \\\"201\\\": \\\"souris\\\", \\\"202\\\": \\\"pourrait\\\", \\\"203\\\": \\\"voulait\\\", \\\"204\\\": \\\"veut\\\", \\\"205\\\": \\\"pourquoi\\\", \\\"206\\\": \\\"aim\\\\u00e9s\\\", \\\"207\\\": \\\"pr\\\\u00e9vois\\\", \\\"208\\\": \\\"pr\\\\u00e9voyons\\\", \\\"209\\\": \\\"vos\\\", \\\"210\\\": \\\"intention\\\", 
\\\"211\\\": \\\"cl\\\\u00e9mentes\\\", \\\"212\\\": \\\"ont\\\", \\\"213\\\": \\\"chat\\\", \\\"214\\\": \\\"requin\\\", \\\"215\\\": \\\"cheval\\\", \\\"216\\\": \\\"chien\\\", \\\"217\\\": \\\"singe\\\", \\\"218\\\": \\\"lion\\\", \\\"219\\\": \\\"ours\\\", \\\"220\\\": \\\"lapin\\\", \\\"221\\\": \\\"serpent\\\", \\\"222\\\": \\\"redout\\\\u00e9s\\\", \\\"223\\\": \\\"all\\\\u00e9\\\", \\\"224\\\": \\\"grosse\\\", \\\"225\\\": \\\"pluie\\\", \\\"226\\\": \\\"trop\\\", \\\"227\\\": \\\"monde\\\", \\\"228\\\": \\\"maillot\\\", \\\"229\\\": \\\"vont\\\", \\\"230\\\": \\\"volant\\\", \\\"231\\\": \\\"avez\\\", \\\"232\\\": \\\"i\\\", \\\"233\\\": \\\"all\\\\u00e9s\\\", \\\"234\\\": \\\"all\\\\u00e9e\\\", \\\"235\\\": \\\"quand\\\", \\\"236\\\": \\\"oiseau\\\", \\\"237\\\": \\\"\\\\u00e9l\\\\u00e9phant\\\", \\\"238\\\": \\\"pourraient\\\", \\\"239\\\": \\\"voulaient\\\", \\\"240\\\": \\\"veulent\\\", \\\"241\\\": \\\"d\\\\u00e9tendre\\\", \\\"242\\\": \\\"aim\\\\u00e9e\\\", \\\"243\\\": \\\"magnifique\\\", \\\"244\\\": \\\"l'automobile\\\", \\\"245\\\": \\\"n'aimons\\\", \\\"246\\\": \\\"gel\\\\u00e9\\\", \\\"247\\\": \\\"d\\\\u00e9testait\\\", \\\"248\\\": \\\"grand\\\", \\\"249\\\": \\\"bien\\\", \\\"250\\\": \\\"vers\\\", \\\"251\\\": \\\"pr\\\\u00e9voient\\\", \\\"252\\\": \\\"pr\\\\u00e9voit\\\", \\\"253\\\": \\\"lui\\\", \\\"254\\\": \\\"visite\\\", \\\"255\\\": \\\"comment\\\", \\\"256\\\": \\\"\\\\u00e9l\\\\u00e9phants\\\", \\\"257\\\": \\\"chevaux\\\", \\\"258\\\": \\\"chiens\\\", \\\"259\\\": \\\"l'\\\\u00e9l\\\\u00e9phant\\\", \\\"260\\\": \\\"l'oiseau\\\", \\\"261\\\": \\\"requins\\\", \\\"262\\\": \\\"l'ours\\\", \\\"263\\\": \\\"serpents\\\", \\\"264\\\": \\\"chats\\\", \\\"265\\\": \\\"lapins\\\", \\\"266\\\": \\\"singes\\\", \\\"267\\\": \\\"oiseaux\\\", \\\"268\\\": \\\"lions\\\", \\\"269\\\": \\\"l\\\\u00e9g\\\\u00e8re\\\", \\\"270\\\": \\\"c\\\\u00e9page\\\", \\\"271\\\": \\\"pensez\\\", \\\"272\\\": \\\"tour\\\", \\\"273\\\": \\\"eiffel\\\", \\\"274\\\": \\\"l'\\\\u00e9picerie\\\", \\\"275\\\": \\\"terrain\\\", \\\"276\\\": \\\"football\\\", \\\"277\\\": \\\"lac\\\", \\\"278\\\": \\\"l'\\\\u00e9cole\\\", \\\"279\\\": \\\"l'animal\\\", \\\"280\\\": \\\"n'est\\\", \\\"281\\\": \\\"allons\\\", \\\"282\\\": \\\"allez\\\", \\\"283\\\": \\\"peu\\\", \\\"284\\\": \\\"pousse\\\", \\\"285\\\": \\\"du\\\", \\\"286\\\": \\\"temps\\\", \\\"287\\\": \\\"at\\\", \\\"288\\\": \\\"rouille\\\", \\\"289\\\": \\\"sur\\\", \\\"290\\\": \\\"qu'elle\\\", \\\"291\\\": \\\"petites\\\", \\\"292\\\": \\\"derni\\\\u00e8re\\\", \\\"293\\\": \\\"\\\\u00eates\\\", \\\"294\\\": \\\"vais\\\", \\\"295\\\": \\\"voudrait\\\", \\\"296\\\": \\\"proches\\\", \\\"297\\\": \\\"frais\\\", \\\"298\\\": \\\"manguiers\\\", \\\"299\\\": \\\"avons\\\", \\\"300\\\": \\\"t\\\", \\\"301\\\": \\\"porcelaine\\\", \\\"302\\\": \\\"d\\\\u00e9testez\\\", \\\"303\\\": \\\"c'est\\\", \\\"304\\\": \\\"grandes\\\", \\\"305\\\": \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9es\\\", \\\"306\\\": \\\"douce\\\", \\\"307\\\": \\\"durant\\\", \\\"308\\\": \\\"cong\\\\u00e9lation\\\", \\\"309\\\": \\\"pla\\\\u00eet\\\", \\\"310\\\": \\\"o\\\\u00f9\\\", \\\"311\\\": \\\"dans\\\", \\\"312\\\": \\\"voulez\\\", \\\"313\\\": \\\"aimeraient\\\", \\\"314\\\": \\\"n'a\\\", \\\"315\\\": \\\"petits\\\", \\\"316\\\": \\\"grands\\\", \\\"317\\\": \\\"limes\\\", \\\"318\\\": \\\"envisagent\\\", \\\"319\\\": \\\"grosses\\\", \\\"320\\\": \\\"b\\\\u00e9nigne\\\", \\\"321\\\": \\\"mouill\\\\u00e9e\\\", \\\"322\\\": \\\"enneig\\\\u00e9\\\", \\\"323\\\": 
\\\"moindres\\\", \\\"324\\\": \\\"conduite\\\", \\\"325\\\": \\\"gel\\\\u00e9s\\\", \\\"326\\\": \\\"tout\\\", \\\"327\\\": \\\"etats\\\", \\\"328\\\": \\\"n'\\\\u00eates\\\", \\\"329\\\": \\\"vit\\\", \\\"330\\\": \\\"ressort\\\", \\\"331\\\": \\\"d\\\\u00e9tend\\\", \\\"332\\\": \\\"redout\\\\u00e9e\\\", \\\"333\\\": \\\"tu\\\", \\\"334\\\": \\\"qui\\\", \\\"335\\\": \\\"traduis\\\", \\\"336\\\": \\\"appr\\\\u00e9ci\\\\u00e9\\\", \\\"337\\\": \\\"allions\\\", \\\"338\\\": \\\"trouv\\\\u00e9\\\", \\\"339\\\": \\\"as\\\", \\\"340\\\": \\\"faire\\\", \\\"341\\\": \\\"favoris\\\", \\\"342\\\": \\\"souvent\\\", \\\"343\\\": \\\"es\\\", \\\"344\\\": \\\"moteur\\\"}\", \"word_index\": \"{\\\"est\\\": 1, \\\"en\\\": 2, \\\"il\\\": 3, \\\"les\\\": 4, \\\"mais\\\": 5, \\\"et\\\": 6, \\\"la\\\": 7, \\\"parfois\\\": 8, \\\"jamais\\\": 9, \\\"le\\\": 10, \\\"l'\\\": 11, \\\"g\\\\u00e9n\\\\u00e9ralement\\\": 12, \\\"moins\\\": 13, \\\"aim\\\\u00e9\\\": 14, \\\"au\\\": 15, \\\"fruit\\\": 16, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9\\\": 17, \\\"agr\\\\u00e9able\\\": 18, \\\"froid\\\": 19, \\\"son\\\": 20, \\\"chaud\\\": 21, \\\"de\\\": 22, \\\"plus\\\": 23, \\\"automne\\\": 24, \\\"mois\\\": 25, \\\"\\\\u00e0\\\": 26, \\\"elle\\\": 27, \\\"citrons\\\": 28, \\\"paris\\\": 29, \\\"inde\\\": 30, \\\"unis\\\": 31, \\\"\\\\u00e9tats\\\": 32, \\\"france\\\": 33, \\\"jersey\\\": 34, \\\"new\\\": 35, \\\"chine\\\": 36, \\\"pendant\\\": 37, \\\"pamplemousse\\\": 38, \\\"mon\\\": 39, \\\"votre\\\": 40, \\\"juin\\\": 41, \\\"printemps\\\": 42, \\\"janvier\\\": 43, \\\"hiver\\\": 44, \\\"mars\\\": 45, \\\"\\\\u00e9t\\\\u00e9\\\": 46, \\\"mai\\\": 47, \\\"septembre\\\": 48, \\\"juillet\\\": 49, \\\"avril\\\": 50, \\\"novembre\\\": 51, \\\"d\\\\u00e9cembre\\\": 52, \\\"f\\\\u00e9vrier\\\": 53, \\\"octobre\\\": 54, \\\"aime\\\": 55, \\\"ao\\\\u00fbt\\\": 56, \\\"merveilleux\\\": 57, \\\"relaxant\\\": 58, \\\"doux\\\": 59, \\\"humide\\\": 60, \\\"notre\\\": 61, \\\"californie\\\": 62, \\\"sec\\\": 63, \\\"leur\\\": 64, \\\"occup\\\\u00e9\\\": 65, \\\"pluvieux\\\": 66, \\\"calme\\\": 67, \\\"beau\\\": 68, \\\"habituellement\\\": 69, \\\"pommes\\\": 70, \\\"p\\\\u00eaches\\\": 71, \\\"oranges\\\": 72, \\\"poires\\\": 73, \\\"fraises\\\": 74, \\\"bananes\\\": 75, \\\"verts\\\": 76, \\\"raisins\\\": 77, \\\"mangues\\\": 78, \\\"d'\\\": 79, \\\"mangue\\\": 80, \\\"gel\\\": 81, \\\"raisin\\\": 82, \\\"pomme\\\": 83, \\\"l'orange\\\": 84, \\\"citron\\\": 85, \\\"chaux\\\": 86, \\\"banane\\\": 87, \\\"poire\\\": 88, \\\"fraise\\\": 89, \\\"p\\\\u00eache\\\": 90, \\\"pas\\\": 91, \\\"enneig\\\\u00e9e\\\": 92, \\\"favori\\\": 93, \\\"d\\\\u00e9teste\\\": 94, \\\"g\\\\u00e8le\\\": 95, \\\"fruits\\\": 96, \\\"voiture\\\": 97, \\\"l'automne\\\": 98, \\\"ils\\\": 99, \\\"n'aime\\\": 100, \\\"california\\\": 101, \\\"neige\\\": 102, \\\"fait\\\": 103, \\\"belle\\\": 104, \\\"ne\\\": 105, \\\"vous\\\": 106, \\\"nous\\\": 107, \\\"des\\\": 108, \\\"animal\\\": 109, \\\"camion\\\": 110, \\\"cours\\\": 111, \\\"neigeux\\\": 112, \\\"conduit\\\": 113, \\\"prochain\\\": 114, \\\"ce\\\": 115, \\\"je\\\": 116, \\\"tranquille\\\": 117, \\\"a\\\": 118, \\\"cher\\\": 119, \\\"une\\\": 120, \\\"cette\\\": 121, \\\"\\\\u00e9tait\\\": 122, \\\"aller\\\": 123, \\\"aiment\\\": 124, \\\"chaude\\\": 125, \\\"aimons\\\": 126, \\\"n'aiment\\\": 127, \\\"n'aimez\\\": 128, \\\"leurs\\\": 129, \\\"aimez\\\": 130, \\\"sont\\\": 131, \\\"d\\\\u00e9testons\\\": 132, \\\"jaune\\\": 133, \\\"rouge\\\": 134, \\\"j'aime\\\": 135, \\\"visiter\\\": 136, 
\\\"s\\\\u00e8che\\\": 137, \\\"occup\\\\u00e9e\\\": 138, \\\"frisquet\\\": 139, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9e\\\": 140, \\\"animaux\\\": 141, \\\"dernier\\\": 142, \\\"aimait\\\": 143, \\\"un\\\": 144, \\\"conduisait\\\": 145, \\\"que\\\": 146, \\\"nouvelle\\\": 147, \\\"vieille\\\": 148, \\\"vu\\\": 149, \\\"verte\\\": 150, \\\"petite\\\": 151, \\\"nos\\\": 152, \\\"noire\\\": 153, \\\"brillant\\\": 154, \\\"blanche\\\": 155, \\\"redout\\\\u00e9\\\": 156, \\\"pleut\\\": 157, \\\"n'aimait\\\": 158, \\\"pamplemousses\\\": 159, \\\"pense\\\": 160, \\\"entre\\\": 161, \\\"bleue\\\": 162, \\\"nouveau\\\": 163, \\\"traduire\\\": 164, \\\"rouill\\\\u00e9e\\\": 165, \\\"bleu\\\": 166, \\\"se\\\": 167, \\\"grande\\\": 168, \\\"rouill\\\\u00e9\\\": 169, \\\"ses\\\": 170, \\\"qu'il\\\": 171, \\\"blanc\\\": 172, \\\"aux\\\": 173, \\\"brillante\\\": 174, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9s\\\": 175, \\\"noir\\\": 176, \\\"pluies\\\": 177, \\\"envisage\\\": 178, \\\"\\\\u00e9taient\\\": 179, \\\"va\\\": 180, \\\"rendre\\\": 181, \\\"vert\\\": 182, \\\"vieux\\\": 183, \\\"petit\\\": 184, \\\"espagnol\\\": 185, \\\"portugais\\\": 186, \\\"chinois\\\": 187, \\\"anglais\\\": 188, \\\"fran\\\\u00e7ais\\\": 189, \\\"glaciales\\\": 190, \\\"mes\\\": 191, \\\"cet\\\": 192, \\\"automobile\\\": 193, \\\"traduction\\\": 194, \\\"mouill\\\\u00e9\\\": 195, \\\"difficile\\\": 196, \\\"amusant\\\": 197, \\\"facile\\\": 198, \\\"comme\\\": 199, \\\"gros\\\": 200, \\\"souris\\\": 201, \\\"pourrait\\\": 202, \\\"voulait\\\": 203, \\\"veut\\\": 204, \\\"pourquoi\\\": 205, \\\"aim\\\\u00e9s\\\": 206, \\\"pr\\\\u00e9vois\\\": 207, \\\"pr\\\\u00e9voyons\\\": 208, \\\"vos\\\": 209, \\\"intention\\\": 210, \\\"cl\\\\u00e9mentes\\\": 211, \\\"ont\\\": 212, \\\"chat\\\": 213, \\\"requin\\\": 214, \\\"cheval\\\": 215, \\\"chien\\\": 216, \\\"singe\\\": 217, \\\"lion\\\": 218, \\\"ours\\\": 219, \\\"lapin\\\": 220, \\\"serpent\\\": 221, \\\"redout\\\\u00e9s\\\": 222, \\\"all\\\\u00e9\\\": 223, \\\"grosse\\\": 224, \\\"pluie\\\": 225, \\\"trop\\\": 226, \\\"monde\\\": 227, \\\"maillot\\\": 228, \\\"vont\\\": 229, \\\"volant\\\": 230, \\\"avez\\\": 231, \\\"i\\\": 232, \\\"all\\\\u00e9s\\\": 233, \\\"all\\\\u00e9e\\\": 234, \\\"quand\\\": 235, \\\"oiseau\\\": 236, \\\"\\\\u00e9l\\\\u00e9phant\\\": 237, \\\"pourraient\\\": 238, \\\"voulaient\\\": 239, \\\"veulent\\\": 240, \\\"d\\\\u00e9tendre\\\": 241, \\\"aim\\\\u00e9e\\\": 242, \\\"magnifique\\\": 243, \\\"l'automobile\\\": 244, \\\"n'aimons\\\": 245, \\\"gel\\\\u00e9\\\": 246, \\\"d\\\\u00e9testait\\\": 247, \\\"grand\\\": 248, \\\"bien\\\": 249, \\\"vers\\\": 250, \\\"pr\\\\u00e9voient\\\": 251, \\\"pr\\\\u00e9voit\\\": 252, \\\"lui\\\": 253, \\\"visite\\\": 254, \\\"comment\\\": 255, \\\"\\\\u00e9l\\\\u00e9phants\\\": 256, \\\"chevaux\\\": 257, \\\"chiens\\\": 258, \\\"l'\\\\u00e9l\\\\u00e9phant\\\": 259, \\\"l'oiseau\\\": 260, \\\"requins\\\": 261, \\\"l'ours\\\": 262, \\\"serpents\\\": 263, \\\"chats\\\": 264, \\\"lapins\\\": 265, \\\"singes\\\": 266, \\\"oiseaux\\\": 267, \\\"lions\\\": 268, \\\"l\\\\u00e9g\\\\u00e8re\\\": 269, \\\"c\\\\u00e9page\\\": 270, \\\"pensez\\\": 271, \\\"tour\\\": 272, \\\"eiffel\\\": 273, \\\"l'\\\\u00e9picerie\\\": 274, \\\"terrain\\\": 275, \\\"football\\\": 276, \\\"lac\\\": 277, \\\"l'\\\\u00e9cole\\\": 278, \\\"l'animal\\\": 279, \\\"n'est\\\": 280, \\\"allons\\\": 281, \\\"allez\\\": 282, \\\"peu\\\": 283, \\\"pousse\\\": 284, \\\"du\\\": 285, \\\"temps\\\": 286, \\\"at\\\": 287, \\\"rouille\\\": 288, \\\"sur\\\": 289, 
\\\"qu'elle\\\": 290, \\\"petites\\\": 291, \\\"derni\\\\u00e8re\\\": 292, \\\"\\\\u00eates\\\": 293, \\\"vais\\\": 294, \\\"voudrait\\\": 295, \\\"proches\\\": 296, \\\"frais\\\": 297, \\\"manguiers\\\": 298, \\\"avons\\\": 299, \\\"t\\\": 300, \\\"porcelaine\\\": 301, \\\"d\\\\u00e9testez\\\": 302, \\\"c'est\\\": 303, \\\"grandes\\\": 304, \\\"pr\\\\u00e9f\\\\u00e9r\\\\u00e9es\\\": 305, \\\"douce\\\": 306, \\\"durant\\\": 307, \\\"cong\\\\u00e9lation\\\": 308, \\\"pla\\\\u00eet\\\": 309, \\\"o\\\\u00f9\\\": 310, \\\"dans\\\": 311, \\\"voulez\\\": 312, \\\"aimeraient\\\": 313, \\\"n'a\\\": 314, \\\"petits\\\": 315, \\\"grands\\\": 316, \\\"limes\\\": 317, \\\"envisagent\\\": 318, \\\"grosses\\\": 319, \\\"b\\\\u00e9nigne\\\": 320, \\\"mouill\\\\u00e9e\\\": 321, \\\"enneig\\\\u00e9\\\": 322, \\\"moindres\\\": 323, \\\"conduite\\\": 324, \\\"gel\\\\u00e9s\\\": 325, \\\"tout\\\": 326, \\\"etats\\\": 327, \\\"n'\\\\u00eates\\\": 328, \\\"vit\\\": 329, \\\"ressort\\\": 330, \\\"d\\\\u00e9tend\\\": 331, \\\"redout\\\\u00e9e\\\": 332, \\\"tu\\\": 333, \\\"qui\\\": 334, \\\"traduis\\\": 335, \\\"appr\\\\u00e9ci\\\\u00e9\\\": 336, \\\"allions\\\": 337, \\\"trouv\\\\u00e9\\\": 338, \\\"as\\\": 339, \\\"faire\\\": 340, \\\"favoris\\\": 341, \\\"souvent\\\": 342, \\\"es\\\": 343, \\\"moteur\\\": 344}\"}}"
 
 
Task 2/gui.py DELETED
@@ -1,218 +0,0 @@
- # Import libraries
- import tkinter as tk
- from tkinter import ttk, messagebox
-
- from keras.layers import TextVectorization
- import re
- import tensorflow.strings as tf_strings
- import json
- import string
- from keras.models import load_model
- import tensorflow as tf
- from keras.preprocessing.text import tokenizer_from_json
- from keras.utils import pad_sequences
- import numpy as np
- import difflib
-
- # English to Spanish translation
- strip_chars = string.punctuation + "¿"
- strip_chars = strip_chars.replace("[", "")
- strip_chars = strip_chars.replace("]", "")
-
- def custom_standardization(input_string):
-     lowercase = tf_strings.lower(input_string)
-     return tf_strings.regex_replace(lowercase, f"[{re.escape(strip_chars)}]", "")
-
- # Load the English vectorization layer configuration
- with open('eng_vectorization_config.json') as json_file:
-     eng_vectorization_config = json.load(json_file)
-
- # Recreate the English vectorization layer with basic configuration
- eng_vectorization = TextVectorization(
-     max_tokens=eng_vectorization_config['max_tokens'],
-     output_mode=eng_vectorization_config['output_mode'],
-     output_sequence_length=eng_vectorization_config['output_sequence_length']
- )
-
- # Apply the custom standardization function
- eng_vectorization.standardize = custom_standardization
-
- # Load the Spanish vectorization layer configuration
- with open('spa_vectorization_config.json') as json_file:
-     spa_vectorization_config = json.load(json_file)
-
- # Recreate the Spanish vectorization layer with basic configuration
- spa_vectorization = TextVectorization(
-     max_tokens=spa_vectorization_config['max_tokens'],
-     output_mode=spa_vectorization_config['output_mode'],
-     output_sequence_length=spa_vectorization_config['output_sequence_length'],
-     standardize=custom_standardization
- )
-
- # Load and set the English vocabulary
- with open('eng_vocab.json') as json_file:
-     eng_vocab = json.load(json_file)
- eng_vectorization.set_vocabulary(eng_vocab)
-
- # Load and set the Spanish vocabulary
- with open('spa_vocab.json') as json_file:
-     spa_vocab = json.load(json_file)
- spa_vectorization.set_vocabulary(spa_vocab)
-
- # Load the Spanish model
- transformer = load_model('transformer_model')
-
- spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
- max_decoded_sentence_length = 20
-
- # Initialize list to track incorrect words
- incorrect_words = []
-
- def beam_search_decode(input_sentence, beam_width=3):
-     tokenized_input_sentence = eng_vectorization([input_sentence])
-     decoded_sentences = [("[start]", 0.0)]
-
-     for i in range(max_decoded_sentence_length):
-         all_candidates = []
-         for decoded_sentence, score in decoded_sentences:
-             tokenized_target_sentence = spa_vectorization([decoded_sentence])[:, :-1]
-             predictions = transformer([tokenized_input_sentence, tokenized_target_sentence])
-             top_k = tf.math.top_k(predictions[0, i, :], k=beam_width)
-
-             for j in range(beam_width):
-                 predicted_token_index = top_k.indices[j].numpy()
-                 predicted_token = spa_index_lookup[predicted_token_index]
-                 candidate = (decoded_sentence + " " + predicted_token, score + top_k.values[j].numpy())
-                 all_candidates.append(candidate)
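-                 # Note: the beam score accumulates raw top-k values; if the
-                 # model ends in a softmax these are probabilities, and summing
-                 # log-probabilities (e.g. score + np.log(value)) is the usual,
-                 # better-behaved scoring choice.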
-
-         ordered = sorted(all_candidates, key=lambda x: x[1], reverse=True)
-         decoded_sentences = ordered[:beam_width]
-
-         if all(sentence[0].endswith("[end]") for sentence in decoded_sentences):
-             break
-
-     return decoded_sentences[0][0]
-
- # English to French translation
- # Load French model
- model = load_model('english_to_french_model')
-
- # Load Tokenizer
- with open('english_tokenizer.json') as f:
-     data = json.load(f)
-     english_tokenizer = tokenizer_from_json(data)
-
- with open('french_tokenizer.json') as f:
-     data = json.load(f)
-     french_tokenizer = tokenizer_from_json(data)
-
- # Load max length
- with open('sequence_length.json') as f:
-     max_length = json.load(f)
-
- def pad(x, length=None):
-     return pad_sequences(x, maxlen=length, padding='post')
-
- def translate_to_french(english_sentence):
-     english_sentence = english_sentence.lower()
-     english_sentence = re.sub(r'[.?!,]', '', english_sentence)
-     english_sentence = english_tokenizer.texts_to_sequences([english_sentence])
-     english_sentence = pad(english_sentence, max_length)
-     english_sentence = english_sentence.reshape((-1, max_length))
-
-     french_sentence = model.predict(english_sentence)[0]
-     french_sentence = [np.argmax(word) for word in french_sentence]
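-     # (greedy decoding: keep the single most probable French token at each
-     # output position; beam search above is only used for Spanish)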
-     french_sentence = french_tokenizer.sequences_to_texts([french_sentence])[0]
-
-     return french_sentence
-
- def get_word_suggestions(word, vocab):
-     return difflib.get_close_matches(word, vocab, n=3, cutoff=0.6)
-
- def check_and_correct_sentence(sentence, vocab):
-     words = sentence.split()
-     incorrect_words.clear()
-     corrected_sentence = []
-     for word in words:
-         if word not in vocab:
-             suggestions = get_word_suggestions(word, vocab)
-             incorrect_words.append((word, suggestions))
-         else:
-             corrected_sentence.append(word)
-
-     if incorrect_words:
-         message = f"Incorrect word(s) detected: {', '.join([w[0] for w in incorrect_words])}\n"
-         for word, suggestions in incorrect_words:
-             message += f"Suggestions for '{word}': {', '.join(suggestions) if suggestions else 'No suggestions available'}\n"
-         messagebox.showerror("Error", message)
-         return False
-     return True
-
- def translate_to_spanish(english_sentence):
-     if not check_and_correct_sentence(english_sentence, eng_vocab):
-         return ""
-     spanish_sentence = beam_search_decode(english_sentence)
-     return spanish_sentence.replace("[start]", "").replace("[end]", "").strip()
-
- # Function to handle translation request based on selected language
- def handle_translate():
-     selected_language = language_var.get()
-     english_sentence = text_input.get("1.0", "end-1c").strip()
-
-     if not english_sentence:
-         messagebox.showwarning("Warning", "Please enter a sentence to translate.")
-         return
-
-     if selected_language == "French":
-         translation = translate_to_french(english_sentence)
-     elif selected_language == "Spanish":
-         translation = translate_to_spanish(english_sentence)
-     else:
-         # Guard against an empty selection, which would otherwise leave
-         # `translation` undefined below
-         messagebox.showwarning("Warning", "Please select a language to translate to.")
-         return
-
-     translation_output.delete("1.0", "end")
-     translation_output.insert("end", f"{selected_language} translation: {translation}")
-
- # Setting up the main window
- root = tk.Tk()
- root.title("Language Translator")
- root.geometry("550x600")
-
- # Font configuration
- font_style = "Times New Roman"
- font_size = 14
-
- # Frame for input
- input_frame = tk.Frame(root)
- input_frame.pack(pady=10)
-
- # Heading for input
- input_heading = tk.Label(input_frame, text="Enter the text to be translated", font=(font_style, font_size, 'bold'))
- input_heading.pack()
-
- # Text input for English sentence
- text_input = tk.Text(input_frame, height=5, width=50, font=(font_style, font_size))
- text_input.pack()
-
- # Language selection
- language_var = tk.StringVar()
- language_label = tk.Label(root, text="Select the language to translate to", font=(font_style, font_size, 'bold'))
- language_label.pack()
- language_select = ttk.Combobox(root, textvariable=language_var, values=["French", "Spanish"], font=(font_style, font_size), state="readonly")
- language_select.pack()
-
- # Submit button
- submit_button = ttk.Button(root, text="Translate", command=handle_translate)
- submit_button.pack(pady=10)
-
- # Frame for output
- output_frame = tk.Frame(root)
- output_frame.pack(pady=10)
-
- # Heading for output
- output_heading = tk.Label(output_frame, text="Translation: ", font=(font_style, font_size, 'bold'))
- output_heading.pack()
-
- # Text output for translations
- translation_output = tk.Text(output_frame, height=10, width=50, font=(font_style, font_size))
- translation_output.pack()
-
- # Running the application
- root.mainloop()
 
 
Task 2/images/attention.png DELETED
Binary file (208 kB)
 
Task 2/images/bidirectional.png DELETED
Binary file (47.8 kB)
 
Task 2/images/embedding-words.png DELETED
Binary file (115 kB)
 
Task 2/images/encoder-decoder-context.png DELETED
Binary file (146 kB)
 
Task 2/images/encoder-decoder-translation.png DELETED
Binary file (30.9 kB)
 
Task 2/images/rnn.png DELETED
Binary file (22.3 kB)
 
Task 2/sequence_length.json DELETED
@@ -1 +0,0 @@
- 21
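gui.py pads every tokenized sentence to this stored length before calling the French model. A minimal sketch of the same pad_sequences call (the token ids are made up for illustration; the 21 comes from the file above):

    from keras.utils import pad_sequences

    seq = [[35, 34, 1]]  # e.g. a short tokenized sentence
    padded = pad_sequences(seq, maxlen=21, padding='post')
    print(padded.shape)  # (1, 21)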
 
 
Task 2/spa_vectorization_config.json DELETED
@@ -1 +0,0 @@
- {"name": "text_vectorization_1", "trainable": true, "batch_input_shape": [null], "dtype": "string", "max_tokens": 15000, "split": "whitespace", "ngrams": null, "output_mode": "int", "output_sequence_length": 21, "pad_to_max_tokens": false, "sparse": false, "ragged": false, "vocabulary": null, "idf_weights": null}
 
 
Task 2/spa_vocab.json DELETED
The diff for this file is too large to render.
 
Task 2/transformer_model/keras_metadata.pb DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:c57e5b1c49a163881bb486d7ab440406af67981fb4d43c1b293d4f3e8be60b6b
- size 58739
 
 
Task 2/transformer_model/saved_model.pb DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:08e90ad65260d2a5f6b86eb2222070bfff801829a65b959df096fdf5609f4979
- size 1305169
 
 
Task 2/transformer_model/variables/variables.data-00000-of-00001 DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1dee90376f3c6db9eac3b062c85fb1490b06871eaf8b078b2b312b0071e20033
- size 159724466
 
 
Task 2/transformer_model/variables/variables.index DELETED
Binary file (7.6 kB)