wesfggfd committed
Commit 9f2ab4b · verified · 1 Parent(s): 9ede3a2

Upload 145 files

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.
Files changed (50)
  1. .gitattributes +18 -0
  2. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/.ipynb_checkpoints/C4W1_Basic_Attention-checkpoint.ipynb +302 -0
  3. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/.ipynb_checkpoints/C4W1_Bleu_Score-checkpoint.ipynb +585 -0
  4. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/.ipynb_checkpoints/C4W1_QKV_Attention-checkpoint.ipynb +270 -0
  5. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/C4W1_Basic_Attention.ipynb +324 -0
  6. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/C4W1_Bleu_Score.ipynb +585 -0
  7. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/C4W1_QKV_Attention.ipynb +281 -0
  8. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/embeddings_en.npz +3 -0
  9. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/embeddings_fr.npz +3 -0
  10. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/wmt19_can.txt +1 -0
  11. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/wmt19_ref.txt +1 -0
  12. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/wmt19_src.txt +1 -0
  13. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/word2int_en.pkl +3 -0
  14. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/word2int_fr.pkl +3 -0
  15. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/images/alignment.png +0 -0
  16. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/images/alignment_model_3.jpg +0 -0
  17. NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/images/attention.png +0 -0
  18. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/.ipynb_checkpoints/C4W1_Assignment-checkpoint.ipynb +1994 -0
  19. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/.ipynb_checkpoints/w1_unittest-checkpoint.py +654 -0
  20. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/C4W1_Assignment.ipynb +2312 -0
  21. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/ult.cpython-38.pyc +0 -0
  22. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/utils.cpython-311.pyc +0 -0
  23. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/utils.cpython-38.pyc +0 -0
  24. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/w1_unittest.cpython-311.pyc +0 -0
  25. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/w1_unittest.cpython-37.pyc +0 -0
  26. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/w1_unittest.cpython-38.pyc +0 -0
  27. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/NMTModel.png +3 -0
  28. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/att.png +3 -0
  29. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/attention.png +3 -0
  30. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/attention_overview.png +0 -0
  31. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/input_encoder.png +0 -0
  32. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/plain_rnn.png +0 -0
  33. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/pre_attention_decoder.png +0 -0
  34. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/por-eng/por.txt +3 -0
  35. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/utils.py +114 -0
  36. NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/w1_unittest.py +702 -0
  37. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/.ipynb_checkpoints/C4W3_SentencePiece_and_BPE-checkpoint.ipynb +633 -0
  38. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/C4W3_SentencePiece_and_BPE.ipynb +724 -0
  39. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/data.txt +5 -0
  40. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/example.txt +30 -0
  41. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/example_bpe.model +3 -0
  42. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/example_bpe.vocab +450 -0
  43. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/sentencepiece.model +3 -0
  44. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/example.txt +30 -0
  45. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/example_bpe.model +3 -0
  46. NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/example_bpe.vocab +450 -0
  47. NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/.ipynb_checkpoints/C4W3_HF_Lab1_QA_BERT-checkpoint.ipynb +2110 -0
  48. NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/.ipynb_checkpoints/C4W3_HF_Lab2_QA_BERT-checkpoint.ipynb +644 -0
  49. NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/C4W3_HF_Lab1_QA_BERT.ipynb +2110 -0
  50. NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/C4W3_HF_Lab2_QA_BERT.ipynb +644 -0
.gitattributes CHANGED
@@ -111,3 +111,21 @@ Transformer[[:space:]]Mechanism/Transformer_Implementation/home/jovyan/work/W4A1
  Transformer[[:space:]]Mechanism/Transformer_Implementation/home/jovyan/work/W4A1/transformer.png filter=lfs diff=lfs merge=lfs -text
  Transformer[[:space:]]Mechanism/Transformer[[:space:]]Pre-Processing/home/jovyan/work/W4A4_UGL_POS/glove/glove.6B.100d.txt filter=lfs diff=lfs merge=lfs -text
  Transformer[[:space:]]Mechanism/Transformer[[:space:]]Pre-Processing/home/jovyan/work/W4A4_UGL_POS/preprocessing.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/NMT_with_Attention/NMT[[:space:]]with[[:space:]]MBR/Files/tf/images/att.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/NMT_with_Attention/NMT[[:space:]]with[[:space:]]MBR/Files/tf/images/attention.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/NMT_with_Attention/NMT[[:space:]]with[[:space:]]MBR/Files/tf/images/NMTModel.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/NMT_with_Attention/NMT[[:space:]]with[[:space:]]MBR/Files/tf/por-eng/por.txt filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/data/c4-en-10k.json filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/data/c4-en-10k.jsonl filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/data/train-v2.0.json filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/images/colab_help_2.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/images/fulltransformer.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/images/qa.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/pretrained_models/model_c4.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/QA/QA_T5/Files/tf/pretrained_models/model_qa3.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/Text_Summarization/Summarization/tf/images/decoder_layer.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/Text_Summarization/Summarization/tf/images/decoder.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/Text_Summarization/Summarization/tf/images/encoder_layer.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/Text_Summarization/Summarization/tf/images/encoder.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/Text_Summarization/Summarization/tf/images/self-attention.png filter=lfs diff=lfs merge=lfs -text
+ NLP[[:space:]]with[[:space:]]Attention[[:space:]]Models/Text_Summarization/Summarization/tf/images/transformer.png filter=lfs diff=lfs merge=lfs -text
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/.ipynb_checkpoints/C4W1_Basic_Attention-checkpoint.ipynb ADDED
@@ -0,0 +1,302 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "9c74bac5",
+ "metadata": {},
+ "source": [
+ "# Basic Attention Operation: Ungraded Lab\n",
+ "\n",
+ "As you've learned, attention allows a seq2seq decoder to use information from each encoder step instead of just the final encoder hidden state. In the attention operation, the encoder outputs are weighted based on the decoder hidden state, then combined into one context vector. This vector is then used as input to the decoder to predict the next output step.\n",
+ "\n",
+ "In this ungraded lab, you'll implement a basic attention operation as described in [Bahdanau, et al. (2014)](https://arxiv.org/abs/1409.0473) using NumPy.\n",
+ "\n",
+ "This is a practice notebook where you can try writing the code yourself. All of the solutions are provided at the end of the notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a5288920",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the libraries and define the functions you will need for this lab\n",
+ "import numpy as np\n",
+ "\n",
+ "def softmax(x, axis=0):\n",
+ " \"\"\" Calculate softmax function for an array x along a specified axis\n",
+ " \n",
+ " axis=0 calculates softmax across rows which means each column sums to 1 \n",
+ " axis=1 calculates softmax across columns which means each row sums to 1\n",
+ " \"\"\"\n",
+ " return np.exp(x) / np.expand_dims(np.sum(np.exp(x), axis=axis), axis)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a6e0293",
+ "metadata": {},
+ "source": [
+ "## 1: Calculating alignment scores\n",
+ "\n",
+ "The first step is to calculate the alignment scores. This is a measure of similarity between the decoder hidden state and each encoder hidden state. From the paper, this operation looks like\n",
+ "\n",
+ "$$\n",
+ "\\large e_{ij} = v_a^\\top \\tanh{\\left(W_a s_{i-1} + U_a h_j\\right)}\n",
+ "$$\n",
+ "\n",
+ "where $W_a \\in \\mathbb{R}^{n\\times m}$, $U_a \\in \\mathbb{R}^{n \\times m}$, and $v_a \\in \\mathbb{R}^m$\n",
+ "are the weight matrices and $n$ is the hidden state size. In practice, this is implemented as a feedforward neural network with two layers, where $m$ is the size of the layers in the alignment network. It looks something like:\n",
+ "\n",
+ "![alignment model](./images/alignment_model_3.jpg)\n",
+ "\n",
+ "Here $h_j$ are the encoder hidden states for each input step $j$ and $s_{i - 1}$ is the decoder hidden state of the previous step. The first layer corresponds to $W_a$ and $U_a$, while the second layer corresponds to $v_a$.\n",
+ "\n",
+ "To implement this, first concatenate the encoder and decoder hidden states to produce an array with size $K \\times 2n$ where $K$ is the number of encoder states/steps. For this, use `np.concatenate` ([docs](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html)). Note that there is only one decoder state so you'll need to reshape it to successfully concatenate the arrays. The easiest way is to use `decoder_state.repeat` ([docs](https://numpy.org/doc/stable/reference/generated/numpy.repeat.html#numpy.repeat)) to match the hidden state array size.\n",
+ "\n",
+ "Then, apply the first layer as a matrix multiplication between the weights and the concatenated input. Use the tanh function to get the activations. Finally, compute the matrix multiplication of the second layer weights and the activations. This returns the alignment scores."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "72857076",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "hidden_size = 16\n",
+ "attention_size = 10\n",
+ "input_length = 5\n",
+ "\n",
+ "np.random.seed(42)\n",
+ "\n",
+ "# Synthetic vectors used to test\n",
+ "encoder_states = np.random.randn(input_length, hidden_size)\n",
+ "decoder_state = np.random.randn(1, hidden_size)\n",
+ "\n",
+ "# Weights for the neural network, these are typically learned through training\n",
+ "# Use these in the alignment function below as the layer weights\n",
+ "layer_1 = np.random.randn(2 * hidden_size, attention_size)\n",
+ "layer_2 = np.random.randn(attention_size, 1)\n",
+ "\n",
+ "# Implement this function. Replace None with your code. Solution at the bottom of the notebook\n",
+ "def alignment(encoder_states, decoder_state):\n",
+ " # First, concatenate the encoder states and the decoder state\n",
+ " inputs = None\n",
+ " assert inputs.shape == (input_length, 2 * hidden_size)\n",
+ " \n",
+ " # Matrix multiplication of the concatenated inputs and layer_1, with tanh activation\n",
+ " activations = None\n",
+ " assert activations.shape == (input_length, attention_size)\n",
+ " \n",
+ " # Matrix multiplication of the activations with layer_2. Remember that you don't need tanh here\n",
+ " scores = None\n",
+ " assert scores.shape == (input_length, 1)\n",
+ " \n",
+ " return scores"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb638355",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Run this to test your alignment function\n",
+ "scores = alignment(encoder_states, decoder_state)\n",
+ "print(scores)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f26aae76",
+ "metadata": {},
+ "source": [
+ "If you implemented the function correctly, you should get these scores:\n",
+ "\n",
+ "```python\n",
+ "[[4.35790943]\n",
+ " [5.92373433]\n",
+ " [4.18673175]\n",
+ " [2.11437202]\n",
+ " [0.95767155]]\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "58b8cfa9",
+ "metadata": {},
+ "source": [
+ "## 2: Turning alignment into weights\n",
+ "\n",
+ "The next step is to calculate the weights from the alignment scores. These weights determine which encoder outputs are the most important for the decoder output. These weights should be between 0 and 1. You can use the softmax function (which is already implemented above) to get these weights from the attention scores. Pass the attention scores vector to the softmax function to get the weights. Mathematically,\n",
+ "\n",
+ "$$\n",
+ "\\large \\alpha_{ij} = \\frac{\\exp{\\left(e_{ij}\\right)}}{\\sum_{k=1}^K \\exp{\\left(e_{ik}\\right)}}\n",
+ "$$\n",
+ "\n",
+ "\n",
+ "\n",
+ "## 3: Weight the encoder output vectors and sum\n",
+ "\n",
+ "The weights tell you the importance of each input word with respect to the decoder state. In this step, you use the weights to modulate the magnitude of the encoder vectors. Words with little importance will be scaled down relative to important words. Multiply each encoder vector by its respective weight to get the alignment vectors, then sum up the weighted alignment vectors to get the context vector. Mathematically,\n",
+ "\n",
+ "$$\n",
+ "\\large c_i = \\sum_{j=1}^K\\alpha_{ij} h_{j}\n",
+ "$$\n",
+ "\n",
+ "Implement these steps in the `attention` function below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4546cbb5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Implement this function. Replace None with your code.\n",
+ "def attention(encoder_states, decoder_state):\n",
+ " \"\"\" Example function that calculates attention, returns the context vector \n",
+ " \n",
+ " Arguments:\n",
+ " encoder_vectors: NxM numpy array, where N is the number of vectors and M is the vector length\n",
+ " decoder_vector: 1xM numpy array, M is the vector length, must be the same M as encoder_vectors\n",
+ " \"\"\" \n",
+ " \n",
+ " # First, calculate the alignment scores\n",
+ " scores = None\n",
+ " \n",
+ " # Then take the softmax of the alignment scores to get a weight distribution\n",
+ " weights = None\n",
+ " \n",
+ " # Multiply each encoder state by its respective weight\n",
+ " weighted_scores = None\n",
+ " \n",
+ " # Sum up weighted alignment vectors to get the context vector and return it\n",
+ " context = None\n",
+ " return context\n",
+ "\n",
+ "context_vector = attention(encoder_states, decoder_state)\n",
+ "print(context_vector)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5d9f3df4",
+ "metadata": {},
+ "source": [
+ "If you implemented the `attention` function correctly, the context vector should be\n",
+ "\n",
+ "```python\n",
+ "[-0.63514569 0.04917298 -0.43930867 -0.9268003 1.01903919 -0.43181409\n",
+ " 0.13365099 -0.84746874 -0.37572203 0.18279832 -0.90452701 0.17872958\n",
+ " -0.58015282 -0.58294027 -0.75457577 1.32985756]\n",
+ "```\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4210899c",
+ "metadata": {},
+ "source": [
+ "## See below for solutions"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ba0d629",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "# Solution\n",
+ "def alignment(encoder_states, decoder_state):\n",
+ " # First, concatenate the encoder states and the decoder state.\n",
+ " inputs = np.concatenate((encoder_states, decoder_state.repeat(input_length, axis=0)), axis=1)\n",
+ " assert inputs.shape == (input_length, 2*hidden_size)\n",
+ " \n",
+ " # Matrix multiplication of the concatenated inputs and the first layer, with tanh activation\n",
+ " activations = np.tanh(np.matmul(inputs, layer_1))\n",
+ " assert activations.shape == (input_length, attention_size)\n",
+ " \n",
+ " # Matrix multiplication of the activations with the second layer. Remember that you don't need tanh here\n",
+ " scores = np.matmul(activations, layer_2)\n",
+ " assert scores.shape == (input_length, 1)\n",
+ " \n",
+ " return scores\n",
+ "\n",
+ "# Run this to test your alignment function\n",
+ "scores = alignment(encoder_states, decoder_state)\n",
+ "print(scores)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f80faecb",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "# Solution\n",
+ "def attention(encoder_states, decoder_state):\n",
+ " \"\"\" Example function that calculates attention, returns the context vector \n",
+ " \n",
+ " Arguments:\n",
+ " encoder_vectors: NxM numpy array, where N is the number of vectors and M is the vector length\n",
+ " decoder_vector: 1xM numpy array, M is the vector length, must be the same M as encoder_vectors\n",
+ " \"\"\" \n",
+ " \n",
+ " # First, calculate the alignment scores\n",
+ " scores = alignment(encoder_states, decoder_state)\n",
+ " \n",
+ " # Then take the softmax of those scores to get a weight distribution\n",
+ " weights = softmax(scores)\n",
+ " \n",
+ " # Multiply each encoder state by its respective weight\n",
+ " weighted_scores = encoder_states * weights\n",
+ " \n",
+ " # Sum up the weighted encoder states\n",
+ " context = np.sum(weighted_scores, axis=0)\n",
+ " \n",
+ " return context\n",
+ "\n",
+ "context_vector = attention(encoder_states, decoder_state)\n",
+ "print(context_vector)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "16a6caa8",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }
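For readers skimming this diff: the solution cells in the notebook above assemble into one short, runnable script. The following NumPy sketch simply consolidates them (same seed, shapes, and weight-initialization order as the notebook), so it should reproduce the alignment scores and context vector shown in the expected-output cells; nothing here is new material.

```python
import numpy as np

def softmax(x, axis=0):
    # Normalize scores into weights that sum to 1 along the given axis.
    return np.exp(x) / np.expand_dims(np.sum(np.exp(x), axis=axis), axis)

hidden_size = 16
attention_size = 10
input_length = 5

np.random.seed(42)
encoder_states = np.random.randn(input_length, hidden_size)  # h_j, one row per input step
decoder_state = np.random.randn(1, hidden_size)              # s_{i-1}

# Randomly initialized alignment-network weights (learned in a real model).
layer_1 = np.random.randn(2 * hidden_size, attention_size)
layer_2 = np.random.randn(attention_size, 1)

def alignment(encoder_states, decoder_state):
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), implemented as a two-layer
    # feedforward network over the concatenation [h_j; s_{i-1}].
    inputs = np.concatenate(
        (encoder_states, decoder_state.repeat(input_length, axis=0)), axis=1)
    activations = np.tanh(np.matmul(inputs, layer_1))
    return np.matmul(activations, layer_2)  # (input_length, 1) alignment scores

def attention(encoder_states, decoder_state):
    scores = alignment(encoder_states, decoder_state)
    weights = softmax(scores)            # alpha_ij, sums to 1 over the steps
    weighted = encoder_states * weights  # scale each h_j by its weight
    return np.sum(weighted, axis=0)      # context vector c_i

print(attention(encoder_states, decoder_state))
```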
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/.ipynb_checkpoints/C4W1_Bleu_Score-checkpoint.ipynb ADDED
@@ -0,0 +1,585 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Calculating the Bilingual Evaluation Understudy (BLEU) score: Ungraded Lab"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this ungraded lab, you will implement a popular metric for evaluating the quality of machine-translated text: the BLEU score proposed by Kishore Papineni, et al. in their 2002 paper [\"BLEU: a Method for Automatic Evaluation of Machine Translation\"](https://www.aclweb.org/anthology/P02-1040.pdf). The BLEU score works by comparing a \"candidate\" text to one or more \"reference\" texts. The higher the score, the better the result. In the following sections you will calculate this value using your own implementation as well as using functions from a library."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 1. Importing the Libraries\n",
+ "\n",
+ "You will start by importing the Python libraries. First, you will implement your own version of the BLEU score using NumPy. To verify that your implementation is correct, you will compare the results with those generated by the [SacreBLEU library](https://github.com/mjpost/sacrebleu). This package provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. It also knows all the standard test sets and handles downloading, processing, and tokenization."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...\n",
+ "[nltk_data] Package punkt is already up-to-date!\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: sacrebleu in /opt/conda/lib/python3.10/site-packages (2.3.1)\n",
+ "Requirement already satisfied: portalocker in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (2.8.2)\n",
+ "Requirement already satisfied: regex in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (2023.10.3)\n",
+ "Requirement already satisfied: tabulate>=0.8.9 in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (0.9.0)\n",
+ "Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (1.24.3)\n",
+ "Requirement already satisfied: colorama in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (0.4.6)\n",
+ "Requirement already satisfied: lxml in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (4.9.3)\n"
+ ]
+ }
+ ],
+ "source": [
+ "import numpy as np # import numpy to perform numerical computations.\n",
+ "import nltk # import NLTK to handle simple NL tasks like tokenization.\n",
+ "nltk.download(\"punkt\")\n",
+ "from nltk.util import ngrams\n",
+ "from collections import Counter # import a counter.\n",
+ "!pip3 install 'sacrebleu' # install the sacrebleu package.\n",
+ "import sacrebleu # import sacrebleu in order to compute the BLEU score.\n",
+ "import matplotlib.pyplot as plt # import pyplot in order to make some illustrations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 2. BLEU score\n",
+ "\n",
+ "## 2.1 Definitions and formulas\n",
+ "\n",
+ "You have seen how to calculate the BLEU score in this week's lectures. Formally, you can express the BLEU score as:\n",
+ "\n",
+ "$$BLEU = BP\\times\\Bigl(\\prod_{i=1}^{n}precision_i\\Bigr)^{(1/n)}.\\tag{1}$$\n",
+ "\n",
+ "\n",
+ "The BLEU score depends on $BP$, which stands for Brevity Penalty, and on the geometric mean of the precisions for different lengths of n-grams, both of which are described below. The product runs from $i=1$ to $i=n$ to account for 1-grams up to n-grams, and the exponent of $1/n$ computes the geometric average. In this notebook, you will use $n=4$.\n",
+ "\n",
+ "The **Brevity Penalty** is defined as an exponential decay:\n",
+ "\n",
+ "$$BP = min\\Bigl(1, e^{(1-({len(ref)}/{len(cand)}))}\\Bigr),\\tag{2}$$\n",
+ "\n",
+ "where ${len(ref)}$ and ${len(cand)}$ refer to the length or count of words in the reference and candidate translations. The brevity penalty helps to handle very short translations. \n",
+ "\n",
+ "The **precision** is defined as:\n",
+ "\n",
+ "$$precision_i = \\frac {\\sum_{s_i \\in{cand}}min\\Bigl(C(s_i, cand), C(s_i, ref)\\Bigr)}{\\sum_{s_i \\in{cand}} C(s_i, cand)}.\\tag{3}$$\n",
+ "\n",
+ "The sum goes over all the i-grams $s_i$ in the candidate sentence $cand$. $C(s_i, cand)$ and $C(s_i, ref)$ are the counts of the i-grams in the candidate and reference sentences, respectively. So the sum counts all the i-grams in the candidate sentence that also appear in the reference sentence, but only counts them as many times as they appear in the reference sentence and not more. This is then divided by the total number of i-grams in the candidate sentence."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.2 Visualizing the BLEU score\n",
+ "\n",
+ "### Brevity Penalty:\n",
+ "The brevity penalty penalizes generated translations that are shorter than the reference sentence. It compensates for the fact that the BLEU score has no recall term."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "<base64-encoded PNG omitted: plot illustrating the brevity penalty>",
+ "text/plain": [
+ "<Figure size 640x480 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
118
+ }
119
+ ],
120
+ "source": [
121
+ "reference_length = 1\n",
122
+ "candidate_length = np.linspace(1.5, 0.5, 100)\n",
123
+ "\n",
124
+ "length_ratio = reference_length / candidate_length\n",
125
+ "BP = np.minimum(1, np.exp(1 - length_ratio))\n",
126
+ "\n",
127
+ "# Plot the data\n",
128
+ "fig, ax = plt.subplots(1)\n",
129
+ "lines = ax.plot(length_ratio, BP)\n",
130
+ "ax.set(\n",
131
+ " xlabel=\"Ratio of the length of the reference to the candidate text\",\n",
132
+ " ylabel=\"Brevity Penalty\",\n",
133
+ ")\n",
134
+ "plt.show()"
135
+ ]
136
+ },
137
+ {
138
+ "cell_type": "markdown",
139
+ "metadata": {},
140
+ "source": [
141
+ "### N-Gram Precision:\n",
142
+ "The n-gram precision counts how many n-grams (in your case unigrams, bigrams, trigrams, and four-grams for i =1 , ... , 4) match their n-gram counterpart in the reference translations. This term acts as a precision metric. Unigrams account for adequacy while longer n-grams account for fluency of the translation. To avoid overcounting, the n-gram counts are clipped to the maximal n-gram count occurring in the reference ($m_{n}^{ref}$). Typically precision shows exponential decay with the degree of the n-gram."
143
+ ]
144
+ },
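To make the clipping rule concrete, here is a minimal sketch with a made-up candidate/reference pair (not part of the lab's data): the candidate repeats one word far more often than the reference contains it, and clipping caps the credit it receives.

```python
from collections import Counter

# Hypothetical example: a degenerate candidate that repeats a single word.
candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()

cand_counts = Counter(candidate)   # {'the': 7}
ref_counts = Counter(reference)    # {'the': 2, 'cat': 1, ...}

# Clip each candidate count to its count in the reference.
clipped = {w: min(c, ref_counts[w]) for w, c in cand_counts.items()}

# Clipped unigram precision: 2/7 instead of a misleading unclipped 7/7.
print(sum(clipped.values()) / len(candidate))  # 0.2857...
```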
145
+ {
146
+ "cell_type": "code",
147
+ "execution_count": 3,
148
+ "metadata": {},
149
+ "outputs": [
150
+ {
151
+ "data": {
152
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGdCAYAAADuR1K7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAseElEQVR4nO3de1BV9d7H8c8WZGMomJB4aYuUphTZMbBzwDyWF4qcTj3TxW6aCnMiyiTqlGTlrRN2I7pB+qjHPJVxyuwykrVPpWJkTyJOPWlXtU20kcAOoBYErOcPxz3PDlSWbNiwfL9m1kzrt35rre/2N46ffutmMwzDEAAAgEX08HcBAAAAvkS4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlhLo7wI6W3Nzs3788Uf16dNHNpvN3+UAAIA2MAxDdXV1GjRokHr0OPbczEkXbn788Uc5HA5/lwEAAE5AWVmZTj/99GP2OenCTZ8+fSQd/sMJDQ31czUAAKAtamtr5XA4PP+OH8tJF26OXIoKDQ0l3AAA0M205ZYSbigGAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACW4vdwk5eXp+joaAUHBysuLk5FRUXH7P/SSy/pvPPO0ymnnKKBAwdq5syZqq6u7qRqAQBAV+fXcFNQUKCMjAzNmzdPpaWlGjdunJKTk+VyuVrtv2XLFk2fPl0pKSn64osv9Oqrr+rTTz9VampqJ1cOAAC6Kr+Gm5ycHKWkpCg1NVUxMTHKzc2Vw+FQfn5+q/23bt2qoUOH6o477lB0dLQuvPBC3XLLLdq2bVsnVw4AALoqv4WbhoYGlZSUKCkpyas9KSlJxcXFre6TmJioH374QYWFhTIMQ/v27dNrr72mKVOmHPU89fX1qq2t9VoAAIB1BfrrxFVVVWpqalJkZKRXe2RkpCoqKlrdJzExUS+99JKmTp2qX3/9VY2NjfrLX/6iZ5555qjnyc7O1sKFC31a+7EMnbu+084Fb3uXHD3kAgBOHn6/odhms3mtG4bRou2InTt36o477tCDDz6okpISbdiwQXv27FFaWtpRj5+VlaWamhrPUlZW5tP6AQBA1+K3mZuIiAgFBAS0mKWprKxsMZtzRHZ2tsaOHau//e1vkqRRo0YpJCRE48aN00MPPaSBAwe22Mdut8tut/v+BwAAgC7JbzM3QUFBiouLk9Pp9Gp3Op1KTExsdZ9Dhw6pRw/vkgMCAiQdnvEBAADw62WpzMxMLV++XCtXrtSuXbt05513yuVyeS4zZWVlafr06Z7+l19+uV5//XXl5+dr9+7d+uijj3THHXfoggsu0KBBg/z1MwAAQBfit8tSkjR16lRVV1dr0aJFcrvdio2NVWFhoaKioiRJbrfb6503M2bMUF1dnZ599lnddddd6tu3ryZMmKBHHnnEXz8BAAB0MTbjJLueU1tbq7CwMNXU1Cg0NNTnx+dpKf/haSkAsC4z/377/WkpAAAAXyLcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAAS/F7uMnLy1N0dLSCg4MVFxenoqKio/adMWOGbDZbi+Wcc87pxIoBAEBX5tdwU1BQoIyMDM2bN0+lpaUaN26ckpOT5XK5Wu3/1FNPye12e5aysjL169dP11xzTSdXDgAAuiq/hpucnBylpKQoNTVVMTExys3NlcPhUH5+fqv9w8LCNGDAAM+ybds2/fzzz5o5c2YnVw4AALoqv4WbhoYGlZSUKCkpyas9KSlJxcXFbTrGihUrNGnSJEVFRR21T319vWpra70WAABgXYH+OnFVVZWampoUGRnp1R4ZGamKiorj7u92u/XOO+/o5ZdfPma/7OxsLVy4sF21ApI0dO56f5dw0tq7ZIq/SwDQjfj9hmKbzea1bhhGi7bWrFq1Sn379tWVV155zH5ZWVmqqanxLGVlZe0pFwAAdHF+m7mJiIhQQEBAi1maysrKFrM5v2cYhlauXKlp06YpKCjomH3tdrvsdnu76wUAAN2D32ZugoKCFBcXJ6fT6dXudDqVmJh4zH03bdqkb7/9VikpKR1ZIgAA6Ib8NnMjSZmZmZo2bZri4+OVkJCgZcuWyeVyKS0tTdLhS0rl5eVavXq1134rVqzQH//4R8XGxvqjbAAA0IX5NdxMnTpV1dXVWrRokdxut2JjY1VYWOh5+sntdrd4501NTY3Wrl2rp556yh8lAwCALs6v4UaS0tPTlZ6e3uq2VatWtWgLCwvToUOHOrgqAADQXfn9aSkAAABfItwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABL8Xu4ycvLU3R0tIKDgxUXF6eioqJj9q+vr9e8efMUFRUlu92uM888UytXruykagEAQFcX6M+TFxQUKCMjQ3l5eRo7dqyWLl2q5ORk7dy5U0OGDGl1n2uvvVb79u3TihUrNGzYMFVWVqqxsbGTKwcAAF2VX8NNTk6OUlJSlJqaKknKzc3Vu+++q/z8fGVnZ7fov2HDBm3atEm7d+9Wv379JElDhw7tzJIBAEAX57fLUg0NDSopKVFSUpJXe1JSkoqLi1vd56233lJ8fLweffRRDR48WGeddZbuvvtu/fLLL0c9T319vWpra70WAABgXX6buamqqlJTU5MiIyO92iMjI1VRUdHqPrt379aWLVsUHBysdevWqaqqSunp6dq/f/9R77vJzs7WwoULfV4/AGsYOne9v0s4ae1dMsXfJcCi/H5Dsc1m81o3DKNF2xHNzc2y2Wx66aWXdMEFF+iyyy5TTk6OVq1addTZm6ysLNXU1HiWsrIyn/8GAADQdfht5iYiIkIBAQEtZmkqKytbzOYcMXDgQA0ePFhhYWGetpiYGBmGoR9++EHDhw9vsY/dbpfdbvdt8QAAoMvy28xNUFCQ4uLi5HQ6vdqdTqcSExNb3Wfs2LH68ccfdeDAAU/b119/rR49euj000/v0HoBAED34NfLUpmZmVq+fLlWrlypXbt26c4775TL5VJaWpqkw5eUpk+f7ul/ww03KDw8XDNnztTOnTu1efNm/e1
vf9OsWbPUq1cvf/0MAADQhfj1UfCpU6equrpaixYtktvtVmxsrAoLCxUVFSVJcrvdcrlcnv69e/eW0+nU7NmzFR8fr/DwcF177bV66KGH/PUTAABAF+PXcCNJ6enpSk9Pb3XbqlWrWrSNHDmyxaUsAACAI/z+tBQAAIAvEW4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClBJrd4eDBg1qyZInef/99VVZWqrm52Wv77t27fVYcAACAWabDTWpqqjZt2qRp06Zp4MCBstlsHVEXAADACTEdbt555x2tX79eY8eO7Yh6AAAA2sX0PTennnqq+vXr1xG1AAAAtJvpcLN48WI9+OCDOnToUEfUAwAA0C6mL0s98cQT+u677xQZGamhQ4eqZ8+eXtu3b9/us+IAAADMMh1urrzyyg4oAwAAwDdMh5v58+d3RB0AAAA+YTrcHFFSUqJdu3bJZrPp7LPP1ujRo31ZFwAAwAkxHW4qKyt13XXXaePGjerbt68Mw1BNTY0uvvhivfLKKzrttNM6ok4AAIA2Mf201OzZs1VbW6svvvhC+/fv188//6z//d//VW1tre64446OqBEAAKDNTM/cbNiwQf/+978VExPjaTv77LP13HPPKSkpyafFAQAAmGV65qa5ubnF49+S1LNnzxbfmQIAAOhspsPNhAkTNGfOHP3444+etvLyct15552aOHGiT4sDAAAwy3S4efbZZ1VXV6ehQ4fqzDPP1LBhwxQdHa26ujo988wzHVEjAABAm5m+58bhcGj79u1yOp368ssvZRiGzj77bE2aNKkj6gMAADDlhN9zM3nyZE2ePNmXtQAAALRbm8LN008/rb/+9a8KDg7W008/fcy+PA4OAAD8qU3h5sknn9SNN96o4OBgPfnkk0ftZ7PZTIebvLw8PfbYY3K73TrnnHOUm5urcePGtdp348aNuvjii1u079q1SyNHjjR1XgAAYE1tCjd79uxp9b/bq6CgQBkZGcrLy9PYsWO1dOlSJScna+fOnRoyZMhR9/vqq68UGhrqWeetyAAA4AjTT0v9XlNTk3bs2KGff/7Z9L45OTlKSUlRamqqYmJilJubK4fDofz8/GPu179/fw0YMMCzBAQEnGj5AADAYkyHm4yMDK1YsULS4WDz5z//Weeff74cDoc2btzY5uM0NDSopKSkxVuNk5KSVFxcfMx9R48erYEDB2rixIn68MMPj9m3vr5etbW1XgsAALAu009Lvfbaa7rpppskSW+//bb27t2rL7/8UqtXr9a8efP00Ucftek4VVVVampqUmRkpFd7ZGSkKioqWt1n4MCBWrZsmeLi4lRfX69//vOfmjhxojZu3Kg///nPre6TnZ2thQsXmviFAAArGDp3vb9LOGntXTLFr+c3HW6qqqo0YMAASVJhYaGuueYanXXWWUpJSTnuk1StsdlsXuuGYbRoO2LEiBEaMWKEZz0hIUFlZWV6/PHHjxpusrKylJmZ6Vmvra2Vw+EwXScAAOgeTF+WioyM1M6dO9XU1KQNGzZ4Xt536NAhU/e+REREKCAgoMUsTWVlZYvZnGP505/+pG+++eao2+12u0JDQ70WAABgXabDzcyZM3XttdcqNjZWNpvN8yK/Tz75xNTj2EFBQYqLi5PT6fRqdzqdSkxMbPNxSktLNXDgwDb3BwAA1mb6stSCBQsUGxursrIyXXPNNbLb7ZKkgIAAzZ0719SxMjMzNW3aNMXHxyshIUHLli2Ty+VSWlqapMOXlMrLy7V69WpJUm5uroYOHapzzjlHDQ0NevHFF7V27VqtXbvW7M8AAAAWdUKfX7j66qtbtN18882mjzN16lRVV1dr0aJFcrvdio2NVWFhoaKioiRJbrdbLpfL07+hoUF33323ysvL1atXL51zzjlav369LrvsshP5GQAAwIL8/vmF9PR0paent7pt1apVXuv33HOP7rnnHlPHBwAAJxe/f34BAADAl/z6+QUAAABfa/fnFwAAALoS0+Hm6quv1pIlS1q0P/bYY7rmmmt8UhQAAMCJMh1uNm3apClTWr5W+dJLL9XmzZt9UhQAAMCJMh1uDhw4oKCgoBbtPXv25KOUAADA70yHm9jYWBUUFLRof+WVV3T22Wf7pCgAAIATZfolfg888ICuuuoqfffdd5owYYIk6f3339eaNWv06quv+rxAAAAAM0yHm7/85S9644039PDDD+u1115Tr169NGrUKP373//W+PHjO6JGAACANjuhzy9MmTKl1ZuKAQAA/O2E3nPzn//8R8uXL9d9992n/fv3S5K2b9+u8vJynxYHAABglumZm88++0yTJk1SWFiY9u7dq9TUVPXr10/r1q3T999/7/mCNwAAgD+YnrnJzMzUjBkz9M033yg4ONjTnpyczHtuAACA35kON59++qluueWWFu2DBw9WRUWFT4oCAAA4UabDTXBwcKsv6/vqq6902mmn+aQoAACAE2U63FxxxRVatGiRfvvtN0mSzWaTy+XS3LlzddVVV/m8QAAAADNMh5vHH39cP/30k/r3769ffvlF48eP17Bhw9SnTx/9/e9/74gaAQAA2sz001KhoaHasmWLPvjgA23fvl3Nzc06//zzNWnSpI6oDwAAwBRT4aaxsVHBwcHasWOHJkyY4Pn8AgAAQFdh6rJUYGCgoqKi1NTU1FH1AAAAtIvpe27uv/9+ZWVled5MDAAA0JWYvufm6aef1rfffqtBgwYpKipKISEhXtu3b9/us+IAAADMMh1urrzyyg4oAwAAwDdMh5v58+d3RB0AAAA+YTrcHLFt2zbt2rVLNptNMTExiouL82VdAAAAJ8R0uPnhhx90/fXX66OPPlLfvn0lSf/5z3+UmJioNWvWyOFw+LpGAACANjP9tNSsWbP022+/adeuXdq/f7/279+vXbt2yTAMpaSkdESNAAAAbWZ65qaoqEjFxcUaMWKEp23EiBF65plnNHbsWJ8WBwAAYJbpmZshQ4Z4Ppr5/zU2Nmrw4ME+KQoAAOBEmQ43jz76qGbPnq1t27bJMAxJh28unjNnjh5//HGfFwgAAGCG6ctSM2bM0KFDh/THP/5RgYGHd29sbFRgYKBmzZqlWbNmefryFmMAANDZTIeb3NzcDigDAADAN0yHm5tvvrkj6gAAAPAJ0/fc+FpeXp6io6MVHBysuLg4FRUVtWm/jz76SIGBgfrDH/7QsQUCAIBuxa/hpqCgQBkZGZo3b55KS0s1btw4JScny+VyHXO/mpoaTZ8+XRMnTuykSgEAQHfh13CTk5OjlJQUpaamKiYmRrm5uXI4HMrPzz/mfrfccotuuOEGJSQkdFKlAACgu/BbuGloaFBJSYmSkpK82pOSklRcXHzU/f7xj3/ou+++a/MHPOvr61VbW+u1AAAA6/JbuKmqqlJTU5MiIyO92iMjI1VRUdHqPt98843mzp2rl156yfMY+vFkZ2crLCzMs/DtKwAArM
3001K//vqrnnnmGX344YeqrKxUc3Oz1/bt27ebOp7NZvNaNwyjRZskNTU16YYbbtDChQt11llntfn4WVlZyszM9KzX1tYScAAAsDDT4WbWrFlyOp26+uqrdcEFF7QaRNoiIiJCAQEBLWZpKisrW8zmSFJdXZ22bdum0tJS3X777ZKk5uZmGYahwMBAvffee5owYUKL/ex2u+x2+wnVCAAAuh/T4Wb9+vUqLCxs90cyg4KCFBcXJ6fTqf/6r//ytDudTl1xxRUt+oeGhurzzz/3asvLy9MHH3yg1157TdHR0e2qBwAAWIPpcDN48GD16dPHJyfPzMzUtGnTFB8fr4SEBC1btkwul0tpaWmSDl9SKi8v1+rVq9WjRw/FxsZ67d+/f38FBwe3aAcAACcv0+HmiSee0L333qvnn39eUVFR7Tr51KlTVV1drUWLFsntdis2NlaFhYWe47rd7uO+8wYAAOD/Mx1u4uPj9euvv+qMM87QKaecop49e3ptN/uxzPT0dKWnp7e6bdWqVcfcd8GCBVqwYIGp8wEAAGszHW6uv/56lZeX6+GHH1ZkZOQJ31AMAADQEUyHm+LiYn388cc677zzOqIeAACAdjH9Er+RI0fql19+6YhaAAAA2s10uFmyZInuuusubdy4UdXV1XzaAAAAdCmmL0tdeumlktTii9xH3izc1NTkm8oAAABOgOlw8+GHH3ZEHQAAAD5hOtyMHz++I+oAAADwCdPh5ohDhw7J5XKpoaHBq33UqFHtLgoAAOBEmQ43P/30k2bOnKl33nmn1e3ccwMAAPzJ9NNSGRkZ+vnnn7V161b16tVLGzZs0AsvvKDhw4frrbfe6ogaAQAA2sz0zM0HH3ygN998U2PGjFGPHj0UFRWlyZMnKzQ0VNnZ2ZoyZUpH1AkAANAmpmduDh48qP79+0uS+vXrp59++kmSdO6552r79u2+rQ4AAMAk0+FmxIgR+uqrryRJf/jDH7R06VKVl5fr+eef18CBA31eIAAAgBmmL0tlZGTI7XZLkubPn69LLrlEL730koKCgo77FW8AAICOZjrc3HjjjZ7/Hj16tPbu3asvv/xSQ4YMUUREhE+LAwAAMMvUZanffvtNZ5xxhnbu3OlpO+WUU3T++ecTbAAAQJdgKtz07NlT9fX1stlsHVUPAABAu5i+oXj27Nl65JFH1NjY2BH1AAAAtIvpe24++eQTvf/++3rvvfd07rnnKiQkxGv766+/7rPiAAAAzDIdbvr27aurrrqqI2oBAABoN9Ph5h//+EdH1AEAAOATpu+5AQAA6MpMz9yMHj261aelbDabgoODNWzYMM2YMUMXX3yxTwoEAAAww/TMzaWXXqrdu3crJCREF198sS666CL17t1b3333ncaMGSO3261JkybpzTff7Ih6AQAAjsn0zE1VVZXuuusuPfDAA17tDz30kL7//nu99957mj9/vhYvXqwrrrjCZ4UCAAC0hemZm3/961+6/vrrW7Rfd911+te//iVJuv766z0f1wQAAOhMpsNNcHCwiouLW7QXFxcrODhYktTc3Cy73d7+6gAAAEwyfVlq9uzZSktLU0lJicaMGSObzab/+Z//0fLly3XfffdJkt59912NHj3a58UCAAAcj+lwc//99ys6OlrPPvus/vnPf0qSRowYof/+7//WDTfcIElKS0vTrbfe6ttKAQAA2sB0uJGkG2+8UTfeeONRt/fq1euECwIAAGiPdr3ELz09XVVVVb6qBQAAoN3aFW5efPFF1dbW+qoWAACAdmtXuDEMw1d1AAAA+ITfvy2Vl5en6OhoBQcHKy4uTkVFRUftu2XLFo0dO1bh4eHq1auXRo4cqSeffLITqwUAAF3dCd1QfERdXV27Tl5QUKCMjAzl5eVp7NixWrp0qZKTk7Vz504NGTKkRf+QkBDdfvvtGjVqlEJCQrRlyxbdcsstCgkJ0V//+td21QIAAKzBrzM3OTk5SklJUWpqqmJiYpSbmyuHw6H8/PxW+48ePVrXX3+9zjnnHA0dOlQ33XSTLrnkkmPO9gAAgJNLm8NNjx49FBAQcMwlMLDtE0ENDQ0qKSlRUlKSV3tSUlKrb0BuTWlpqYqLizV+/Pij9qmvr1dtba3XAgAArKvNaWTdunVH3VZcXKxnnnnG1A3GVVVVampqUmRkpFd7ZGSkKioqjrnv6aefrp9++kmNjY1asGCBUlNTj9o3OztbCxcubHNdAACge2tzuGntC99ffvmlsrKy9Pbbb+vGG2/U4sWLTRdgs9m81g3DaNH2e0VFRTpw4IC2bt2quXPnatiwYa1+zFOSsrKylJmZ6Vmvra2Vw+EwXScAAOgeTuiG4h9//FHz58/XCy+8oEsuuUQ7duxQbGysqWNEREQoICCgxSxNZWVli9mc34uOjpYknXvuudq3b58WLFhw1HBjt9v5iCcAACcRUzcU19TU6N5779WwYcP0xRdf6P3339fbb79tOthIUlBQkOLi4uR0Or3anU6nEhMT23wcwzBUX19v+vwAAMCa2jxz8+ijj+qRRx7RgAEDtGbNmlYvU5mVmZmpadOmKT4+XgkJCVq2bJlcLpfS0tIkHb6kVF5ertWrV0uSnnvuOQ0ZMkQjR46UdPi9N48//rhmz57d7loAAIA1tDnczJ07V7169dKwYcP0wgsv6IUXXmi13+uvv97mk0+dOlXV1dVatGiR3G63YmNjVVhYqKioKEmS2+2Wy+Xy9G9ublZWVpb27NmjwMBAnXnmmVqyZIluueWWNp8TAABYW5vDzfTp0497o++JSE9PV3p6eqvbVq1a5bU+e/ZsZmkAAMAxtTnc/D5oAAAAdEV+/7YUAACALxFuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApfg93OTl5Sk6OlrBwcGKi4tTUVHRUfu+/vrrmjx5sk477TSFhoYqISFB7777bidWCwAAujq/hpuCggJlZGRo3rx5Ki0t1bhx45ScnCyXy9Vq/82bN2vy5MkqLCxUSUmJLr74Yl1++eUqLS3t5MoBAEBX5ddwk5OTo5SUFKWmpiomJka5ublyOBzKz89vtX9ubq7uuecejRkzRsOHD9fDDz+s4cOH6+233+7kygEAQFflt3DT0NCgkpISJSUlebUnJSWpuLi4Tcdobm5WXV2d+vXrd9Q+9fX1qq2t9VoAAIB1+S3cVFVVqampSZGRkV7tkZGRqqioaNMxnnjiCR08eFDXXnvtUftkZ2crLCzMszgcjnbVDQAAuja/31Bss9m81g3DaNHWmjVr1mjBggUqKChQ//79j9ovKytLNTU1nqWsrKzdNQMAgK4r0F8njoiIUEBAQItZm
srKyhazOb9XUFCglJQUvfrqq5o0adIx+9rtdtnt9nbXCwAAuge/zdwEBQUpLi5OTqfTq93pdCoxMfGo+61Zs0YzZszQyy+/rClTpnR0mQAAoJvx28yNJGVmZmratGmKj49XQkKCli1bJpfLpbS0NEmHLymVl5dr9erVkg4Hm+nTp+upp57Sn/70J8+sT69evRQWFua33wEAALoOv4abqVOnqrq6WosWLZLb7VZsbKwKCwsVFRUlSXK73V7vvFm6dKkaGxt122236bbbbvO033zzzVq1alVnlw8AALogv4YbSUpPT1d6enqr234fWDZu3NjxBQEAgG7N709LAQAA+BLhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWIrfw01eXp6io6MVHBysuLg4FRUVHbWv2+3WDTfcoBEjRqhHjx7KyMjovEIBAEC34NdwU1BQoIyMDM2bN0+lpaUaN26ckpOT5XK5Wu1fX1+v0047TfPmzdN5553XydUCAIDuwK/hJicnRykpKUpNTVVMTIxyc3PlcDiUn5/fav+hQ4fqqaee0vTp0xUWFtbJ1QIAgO7Ab+GmoaFBJSUlSkpK8mpPSkpScXGxz85TX1+v2tparwUAAFiX38JNVVWVmpqaFBkZ6dUeGRmpiooKn50nOztbYWFhnsXhcPjs2AAAoOvx+w3FNpvNa90wjBZt7ZGVlaWamhrPUlZW5rNjAwCArifQXyeOiIhQQEBAi1maysrKFrM57WG322W32312PAAA0LX5beYmKChIcXFxcjqdXu1Op1OJiYl+qgoAAHR3fpu5kaTMzExNmzZN8fHxSkhI0LJly+RyuZSWlibp8CWl8vJyrV692rPPjh07JEkHDhzQTz/9pB07digoKEhnn322P34CAADoYvwabqZOnarq6motWrRIbrdbsbGxKiwsVFRUlKTDL+37/TtvRo8e7fnvkpISvfzyy4qKitLevXs7s3QAANBF+TXcSFJ6errS09Nb3bZq1aoWbYZhdHBFAACgO/P701IAAAC+RLgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACW4vdwk5eXp+joaAUHBysuLk5FRUXH7L9p0ybFxcUpODhYZ5xxhp5//vlOqhQAAHQHfg03BQUFysjI0Lx581RaWqpx48YpOTlZLper1f579uzRZZddpnHjxqm0tFT33Xef7rjjDq1du7aTKwcAAF2VX8NNTk6OUlJSlJqaqpiYGOXm5srhcCg/P7/V/s8//7yGDBmi3NxcxcTEKDU1VbNmzdLjjz/eyZUDAICuKtBfJ25oaFBJSYnmzp3r1Z6UlKTi4uJW9/n444+VlJTk1XbJJZdoxYoV+u2339SzZ88W+9TX16u+vt6zXlNTI0mqra1t709oVXP9oQ45Lo6vo8b0CMbWfzpybBlX/+HvrHV1xNgeOaZhGMft67dwU1VVpaamJkVGRnq1R0ZGqqKiotV9KioqWu3f2NioqqoqDRw4sMU+2dnZWrhwYYt2h8PRjurRFYXl+rsCdBTG1poYV+vqyLGtq6tTWFjYMfv4LdwcYbPZvNYNw2jRdrz+rbUfkZWVpczMTM96c3Oz9u/fr/Dw8GOe52RTW1srh8OhsrIyhYaG+rsc+BBja12MrTUxrq0zDEN1dXUaNGjQcfv6LdxEREQoICCgxSxNZWVli9mZIwYMGNBq/8DAQIWHh7e6j91ul91u92rr27fviRducaGhofxlsijG1roYW2tiXFs63ozNEX67oTgoKEhxcXFyOp1e7U6nU4mJia3uk5CQ0KL/e++9p/j4+FbvtwEAACcfvz4tlZmZqeXLl2vlypXatWuX7rzzTrlcLqWlpUk6fElp+vTpnv5paWn6/vvvlZmZqV27dmnlypVasWKF7r77bn/9BAAA0MX49Z6bqVOnqrq6WosWLZLb7VZsbKwKCwsVFRUlSXK73V7vvImOjlZhYaHuvPNOPffccxo0aJCefvppXXXVVf76CZZht9s1f/78Fpfw0P0xttbF2FoT49p+NqMtz1QBAAB0E37//AIAAIAvEW4AAIClEG4AAIClEG4AAIClEG66mc2bN+vyyy/XoEGDZLPZ9MYbb/i7JPhAdna2xowZoz59+qh///668sor9dVXX/m7LPhAfn6+Ro0a5XkhW0JCgt555x1/lwUfy87Ols1mU0ZGhr9LgQg33c7Bgwd13nnn6dlnn+3Q8/z2228denx427Rpk2677TZt3bpVTqdTjY2NSkpK0sGDB31+Lsa2c51++ulasmSJtm3bpm3btmnChAm64oor9MUXX/j0PIyr/3z66adatmyZRo0a1SHHZ2xPgIFuS5Kxbt264/bbtWuXMXbsWMNutxsxMTGG0+n02nfPnj2GJKOgoMAYP368YbfbjZUrVxpVVVXGddddZwwePNjo1auXERsba7z88stexx4/frxx++23G3PmzDH69u1r9O/f31i6dKlx4MABY8aMGUbv3r2NM844wygsLOyAPwHrqqysNCQZmzZtOmY/xrZ7OvXUU43ly5cfdTvj2n3U1dUZw4cPN5xOpzF+/Hhjzpw5x+zP2HYOwk031pZw09TUZIwYMcKYPHmysWPHDqOoqMi44IILWv3LNHToUGPt2rXG7t27jfLycuOHH34wHnvsMaO0tNT47rvvjKefftoICAgwtm7d6jn++PHjjT59+hiLFy82vv76a2Px4sVGjx49jOTkZGPZsmXG119/bdx6661GeHi4cfDgwQ7807CWb775xpBkfP7550ftw9h2P42NjcaaNWuMoKAg44svvmi1D+PavUyfPt3IyMgwDMM4brhhbDsP4aYba0u4eeedd4zAwEDD7XZ72o72fwq5ubnHPedll11m3HXXXZ718ePHGxdeeKFnvbGx0QgJCTGmTZvmaXO73YYk4+OPP27jLzu5NTc3G5dffrnXn2trGNvu47PPPjNCQkKMgIAAIywszFi/fv1R+zKu3ceaNWuM2NhY45dffjEM4/jhhrHtPNxzYyEPP/ywevfu7VlcLpe++uorORwODRgwwNPvggsuaHX/+Ph4r/Wmpib9/e9/16hRoxQeHq7evXvrvffe8/okhiSv68wBAQEKDw/Xueee
62k78pX3ysrKdv/Gk8Htt9+uzz77TGvWrPG0Mbbd24gRI7Rjxw5t3bpVt956q26++Wbt3LmTce3GysrKNGfOHL344osKDg5usZ2x9S+/flsKvpWWlqZrr73Wsz5o0CAZhiGbzdam/UNCQrzWn3jiCT355JPKzc3Vueeeq5CQEGVkZKihocGr3++/yG6z2bzajpy/ubnZ1O85Gc2ePVtvvfWWNm/erNNPP93Tzth2b0FBQRo2bJikw/9offrpp3rqqaeUnZ3NuHZTJSUlqqysVFxcnKetqalJmzdv1rPPPqt9+/Yxtn5EuLGQfv36qV+/fl5tI0eOlMvl0r59+zyJ/dNPP23T8YqKinTFFVfopptuknT4L8M333yjmJgY3xYOGYah2bNna926ddq4caOio6O9tjO21mIYhurr6xnXbmzixIn6/PPPvdpmzpypkSNH6t5771V4eLjCw8O9tjO2nYdw080cOHBA3377rWd9z5492rFjh/r166chQ4a06D958mSdeeaZuvnmm/Xoo4+qrq5O8+bNk6Tj/h/EsGHDtHbtWhUXF+vUU09VTk6OKioq+MvUAW677Ta9/PLLevPNN9WnTx9VVFRIksLCwtSrV69W92Fsu4f77rtPycnJcjgcqqur0yuvvKKNGzdqw4YNrfZnXLuHPn36KDY21qstJCRE4eHhLdqPYGw7D/fcdDPbtm3T6NGjNXr0aElSZmamRo8erQcffLDV/gEBAXrjjTd04MABjRkzRqmpqbr//vslqdXrxP/fAw88oPPPP1+XXHKJLrroIg0YMEBXXnmlT38PDsvPz1dNTY0uuugiDRw40LMUFBQcdR/GtnvYt2+fpk2bphEjRmjixIn65JNPtGHDBk2ePLnV/oyrdTG2ncdmGIbh7yLQuT766CNdeOGF+vbbb3XmmWf6uxz4EGNrTYyrdTG2HYNwcxJYt26devfureHDh+vbb7/VnDlzdOqpp2rLli3+Lg3txNhaE+NqXYxt5+Cem5NAXV2d7rnnHpWVlSkiIkKTJk3SE0884e+y4AOMrTUxrtbF2HYOZm4AAIClcEMxAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwlP8DyN1FcQKWPaEAAAAASUVORK5CYII=",
153
+ "text/plain": [
154
+ "<Figure size 640x480 with 1 Axes>"
155
+ ]
156
+ },
157
+ "metadata": {},
158
+ "output_type": "display_data"
159
+ }
160
+ ],
161
+ "source": [
162
+ "# Mocked dataset showing the precision for different n-grams\n",
163
+ "data = {\"1-gram\": 0.8, \"2-gram\": 0.7, \"3-gram\": 0.6, \"4-gram\": 0.5}\n",
164
+ "\n",
165
+ "# Plot the datapoints defined above\n",
166
+ "fig, ax = plt.subplots(1)\n",
167
+ "bars = ax.bar(*zip(*data.items()))\n",
168
+ "ax.set(ylabel=\"N-gram precision\")\n",
169
+ "plt.show()"
170
+ ]
171
+ },
172
+ {
173
+ "cell_type": "markdown",
174
+ "metadata": {},
175
+ "source": [
176
+ "### N-gram BLEU score:\n",
177
+ "When the n-gram precision is normalized by the brevity penalty (BP), then the exponential decay of n-grams is almost fully compensated. The BLEU score corresponds to a geometric average of this modified n-gram precision."
178
+ ]
179
+ },
180
+ {
181
+ "cell_type": "code",
182
+ "execution_count": 4,
183
+ "metadata": {},
184
+ "outputs": [
185
+ {
186
+ "data": {
187
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGdCAYAAADuR1K7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAyCklEQVR4nO3de1RU9f7/8deIMpgKKiR5GZHUDCU9Bl3wkt2ksFVZrbJMTYVOSppIZZp1SutEV8TqgFKax2+ldMJON9Kmm5fMSsJuVlpakA0SWOClIGD//nA5vzMBOhsGB3bPx1p7Leczn8/e7/GzXL367JvNMAxDAAAAFtHG3wUAAAD4EuEGAABYCuEGAABYCuEGAABYCuEGAABYCuEGAABYCuEGAABYCuEGAABYSlt/F3C81dbW6qefflKnTp1ks9n8XQ4AAPCCYRjav3+/evTooTZtjr4285cLNz/99JMcDoe/ywAAAI1QVFSkXr16HbXPXy7cdOrUSdLhv5zg4GA/VwMAALxRUVEhh8Ph/u/40fzlws2RU1HBwcGEGwAAWhlvLinhgmIAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGApfg83mZmZioyMVFBQkGJiYrRx48aj9n/uuec0ZMgQnXDCCerevbumTJmisrKy41QtAABo6fwabnJycpSSkqL58+eroKBAI0eOVEJCggoLC+vtv2nTJk2aNEmJiYn68ssv9Z///Ecff/yxkpKSjnPlAACgpfJruElPT1diYqKSkpIUFRWljIwMORwOZWVl1dt/y5Yt6tOnj2655RZFRkZqxIgRuummm7R169bjXDkAAGip/BZuqqqqlJ+fr/j4eI/2+Ph4bd68ud4xw4YN048//qi8vDwZhqG9e/fqxRdf1CWXXNLgcSorK1VRUeGxAQAA62rrrwOXlpaqpqZG4eHhHu3h4eEqLi6ud8ywYcP03HPPady4cfr9999VXV2tyy67TE888USDx0lLS9OCBQt8WvvR9Jn7+nE7Fjx9/2DDIRcA8Nfh9wuKbTabx2fDMOq0HbF9+3bdcsst+sc//qH8/HytXbtWu3fv1rRp0xrc/7x581ReXu7eioqKfFo/AABoWfy2chMWFqaAgIA6qzQlJSV1VnOOSEtL0/Dhw3X77bdLkgYPHqwOHTpo5MiRuv/++9W9e/c6Y+x2u+x2u+9/AAAAaJH8tnITGBiomJgYOZ1Oj3an06lhw4bVO+bQoUNq08az5ICAAEmHV3wAAAD8eloqNTVVTz/9tJYvX66vvvpKs2fPVmFhofs007x58zRp0iR3/0svvVRr1qxRVlaWdu3apffff1+33HKLzjzzTPXo0cNfPwMAALQgfjstJUnjxo1TWVmZFi5cKJfLpejoaOXl5SkiIkKS5HK5PJ55M3nyZO3fv19PPvmkbr31VnXu3Fnnn3++HnroIX/9BAAA0MLYjL/Y+ZyKigqFhISovLxcwcHBPt8/d0v5D3dLAYB1mfnvt9/vlgIAAPAlwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUvz7nBmhNuM3ff7jNH4AZrNwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABL4a3gAP7SeNu7//C2dzQXVm4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAICl+P3FmZmZmXrkkUfkcrk0aNAgZWRkaOTIkfX2nTx5sv7973/XaR84cKC+/PLL5i4VANCK8FJU//H3S1H9unKTk5OjlJQUzZ8/XwUFBRo5cqQSEhJUWFhYb//FixfL5XK5t6KiInXt2lVXX331ca4cAAC0VH4NN+np6UpMTFRSUpKioqKUkZEhh8OhrKysevuHhITopJNOcm9bt27VL7/8oilTphznygEAQEvlt3BTVVWl/Px8xcfHe7THx8dr8+bNXu1j2bJluvDCCxUREdFgn8rKSlVUVHhsAADAuvwWbkpLS1VTU6Pw8HCP9vDwcBUXFx9zvMvl0htvvKGkpKSj9ktLS1NISIh7czgcTaobAAC0bH6/W8pms3l8NgyjTlt9VqxYoc6dO2vs2LFH7Tdv3jyVl5e7t6KioqaUCwAAWji/3S0VFhamgICAOqs0JSUldVZz/swwDC1fvlwTJ05UYGDgUfva7XbZ7fYm1wsAAFoHv63cBAYGKiYmRk6n06Pd6XRq2LBhRx27fv16ffvtt0pMTGzOEgEAQCvk1+fcpKamauLEiYqNjVVcXJyys7NVWFioadOmSTp8SmnPnj1auXKlx7hly5bprLPOUnR0tD/KBgAALZhfw824ceNUVlamhQsXyuVyKTo6Wnl5ee67n1wuV51n3pSXlys3N1eLFy/2R8kAAKCF8/sTipOTk5WcnFzvdytWrKjTFhISokOHDjVzVQAAoLXy+91SAAAAvkS4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAltLW7ICamhqtWLFCb7/9tkpKSlRbW+vx/TvvvOOz4gAAAMwyHW5mzZqlFStW6JJLLlF0dLRsNltz1AUAANAopsPN6tWr9cILL2jMmDHNUQ8AAECTmL7mJjAwUP369WuOWgAAAJrMdLi59dZbtXjxYhmG0Rz1AAAANInp01KbNm3Su+++qzfeeEODBg1Su3btPL5fs2aNz4oDAAAwy/TKTefOnXXFFVdo1KhRCgsLU0hIiMdmVmZmpiIjIxUUFKSYmBht3LjxqP0rKys1f/58RUREyG63q2/fvlq+fLnp4wIAAGsyvXLzzDPP+OzgOTk5SklJUWZmpoYPH66lS5cqISFB27dvV+/evesdc80112jv3r1atmyZ+vXrp5KSElVXV/usJgAA0LqZDjdH/Pzzz/rmm29ks9l0yimn6MQTTzS9j/T0dCUmJiopKUmSlJGRoXXr1ikrK0tpaWl1+q9du1br16/Xrl271LVrV0lSnz59GvsTAACABZk+LXXw4EFNnTpV3bt31znnnKORI0eqR48eSkxM1KFDh7zeT1VVlfLz8xUfH+/RHh8fr82bN9c75pVXXlFsbKwefvhh9ezZU6eccopuu+02/fbbbw0ep7KyUhUVFR4bAACwLtPhJjU1VevXr9err76qX3/9Vb/++qtefvllrV+/XrfeeqvX+yktLVVNTY3Cw8M92sPDw1VcXFzvmF27dmnTpk364osv9NJLLykjI0Mvvviibr755gaPk5aW5nF
NkMPh8LpGAADQ+pgON7m5uVq2bJkSEhIUHBys4OBgjRkzRk899ZRefPFF0wX8+QnHhmE0+NTj2tpa2Ww2PffcczrzzDM1ZswYpaena8WKFQ2u3sybN0/l5eXuraioyHSNAACg9TB9zc2hQ4fqrLZIUrdu3UydlgoLC1NAQECdVZqSkpJ69y9J3bt3V8+ePT3uyoqKipJhGPrxxx/Vv3//OmPsdrvsdrvXdQEAgNbN9MpNXFyc7rnnHv3+++/utt9++00LFixQXFyc1/sJDAxUTEyMnE6nR7vT6dSwYcPqHTN8+HD99NNPOnDggLttx44datOmjXr16mXylwAAACsyvXKzePFiXXzxxerVq5eGDBkim82mbdu2KSgoSOvWrTO1r9TUVE2cOFGxsbGKi4tTdna2CgsLNW3aNEmHTynt2bNHK1eulCSNHz9e9913n6ZMmaIFCxaotLRUt99+u6ZOnar27dub/SkAAMCCTIeb6Oho7dy5U88++6y+/vprGYaha6+9Vtdff73pgDFu3DiVlZVp4cKFcrlcio6OVl5eniIiIiRJLpdLhYWF7v4dO3aU0+nUzJkzFRsbq9DQUF1zzTW6//77zf4MAABgUY16zk379u114403+qSA5ORkJScn1/vdihUr6rSdeuqpdU5lAQAAHOFVuHnllVeUkJCgdu3a6ZVXXjlq38suu8wnhQEAADSGV+Fm7NixKi4uVrdu3TR27NgG+9lsNtXU1PiqNgAAANO8Cje1tbX1/hkAAKClMX0reH1+/fVXX+wGAACgyUyHm4ceekg5OTnuz1dffbW6du2qnj176tNPP/VpcQAAAGaZDjdLly51v5/J6XTqrbfe0tq1a5WQkKDbb7/d5wUCAACYYfpWcJfL5Q43r732mq655hrFx8erT58+Ouuss3xeIAAAgBmmV266dOnifvnk2rVrdeGFF0o6/MJL7pQCAAD+Znrl5sorr9T48ePVv39/lZWVKSEhQZK0bds29evXz+cFAgAAmGE63CxatEh9+vRRUVGRHn74YXXs2FHS4dNVDT1pGAAA4HgxHW7atWun2267rU57SkqKL+oBAABoEl6/AAAALIXXLwAAAEvh9QsAAMBSfPL6BQAAgJbCdLi55ZZb9Pjjj9dpf/LJJ7moGAAA+J3pcJObm6vhw4fXaR82bJhefPFFnxQFAADQWKbDTVlZmUJCQuq0BwcHq7S01CdFAQAANJbpcNOvXz+tXbu2Tvsbb7yhk08+2SdFAQAANJbph/ilpqZqxowZ+vnnn3X++edLkt5++2099thjysjI8HV9AAAAppgON1OnTlVlZaX++c9/6r777pMk9enTR1lZWZo0aZLPCwQAADDDdLiRpOnTp2v69On6+eef1b59e/f7pQAAAPytUc+5qa6u1ltvvaU1a9bIMAxJ0k8//aQDBw74tDgAAACzTK/c/PDDD7r44otVWFioyspKjR49Wp06ddLDDz+s33//XUuWLGmOOgEAALxieuVm1qxZio2N1S+//KL27du726+44gq9/fbbPi0OAADALNMrN5s2bdL777+vwMBAj/aIiAjt2bPHZ4UBAAA0humVm9ra2nrf/P3jjz+qU6dOPikKAACgsUyHm9GjR3s8z8Zms+nAgQO65557NGbMGF/WBgAAYJrp01Lp6ek6//zzNXDgQP3+++8aP368du7cqbCwMK1atao5agQAAPCa6XDTs2dPbdu2TatXr1Z+fr5qa2uVmJio66+/3uMCYwAAAH8wFW7++OMPDRgwQK+99pqmTJmiKVOmNFddAAAAjWLqmpt27dqpsrJSNputueoBAABoEtMXFM+cOVMPPfSQqqurm6MeAACAJjEdbj788EOtWbNGvXv31kUXXaQrr7zSYzMrMzNTkZGRCgoKUkxMjDZu3Nhg3/fee082m63O9vXXX5s+LgAAsCbTFxR37txZV111lU8OnpOTo5SUFGVmZmr48OFaunSpEhIStH37dvXu3bvBcd98842Cg4Pdn0888USf1AMAAFo/0+HmmWee8dnB09PTlZiYqKSkJElSRkaG1q1bp6ysLKWlpTU4rlu3burcubPP6gAAANbRqLeCS1JJSYk2btyoTZs2qaSkxPT4qqoq5efnKz4+3qM9Pj5emzdvPurYoUOHqnv37rrgggv07rvvHrVvZWWlKioqPDYAAGBdpsNNRUWFJk6cqJ49e2rUqFE655xz1LNnT02YMEHl5eVe76e0tFQ1NTUKDw/3aA8PD1dxcXG9Y7p3767s7Gzl5uZqzZo1GjBggC644AJt2LChweOkpaUpJCTEvTkcDq9rBAAArY/pcJOUlKQPP/xQr732mn799VeVl5frtdde09atW3XjjTeaLuDPt5UbhtHgreYDBgzQjTfeqNNPP11xcXHKzMzUJZdcokcffbTB/c+bN0/l5eXuraioyHSNAACg9TB9zc3rr7+udevWacSIEe62iy66SE899ZQuvvhir/cTFhamgICAOqs0JSUldVZzjubss8/Ws88+2+D3drtddrvd6/0BAIDWzfTKTWhoqEJCQuq0h4SEqEuXLl7vJzAwUDExMXI6nR7tTqdTw4YN83o/BQUF6t69u9f9AQCAtZleubnrrruUmpqqlStXukNFcXGxbr/9dt19992m9pWamqqJEycqNjZWcXFxys7OVmFhoaZNmybp8CmlPXv2aOXKlZIO303Vp08fDRo0SFVVVXr22WeVm5ur3Nxcsz8DAABYlOlwk5WVpW+//VYRERHuZ9EUFhbKbrfr559/1tKlS919P/nkk6Pua9y4cSorK9PChQvlcrkUHR2tvLw8RURESJJcLpcKCwvd/auqqnTbbbdpz549at++vQYNGqTXX39dY8aMMfszAACARZkON2PHjvVpAcnJyUpOTq73uxUrVnh8njNnjubMmePT4wMAAGsxHW7uueee5qgDAADAJxr9ED8AAICWiHADAAAshXADAAAshXADAAAshXADAAAsxfTdUoZh6MUXX9S7776rkpIS1dbWeny/Zs0anxUHAABglulwM2vWLGVnZ+u8885TeHh4gy+5BAAA8AfT4ebZZ5/VmjVreCowAABokUxfcxMSEqKTTz65OWoBAABoMtPh5t5779WCBQv022+/NUc9AAAATWL6tNTVV1+tVatWqVu3burTp4/atWvn8f2xXpYJAADQnEyHm8mTJys/P18TJkzggmIAANDimA43r7/+utatW6cRI0Y0Rz0AAABNYvqaG4fDoeDg4OaoBQAAoMlMh5vHHntMc+bM0ffff98M5QAAADSN6dNSEyZM0KFDh9S3b1+dcMIJdS4o3rdvn8+KAwAAMMt0uMnIyGiGMgAAAHzDdLi54YYbmqMOAAAAnzAdbv7Xb7/9pj/++MOjjYuNAQCAP5m+oPjgwYOaMWOGunXrpo4dO6pLly4eGwAAgD+ZDjdz5szRO++8o8zMTNntdj399NNasGCBevTooZUrVzZHjQAAAF4zfVrq1Vdf1cqVK3Xuuedq6tSpGjlypPr166eIiAg999xzuv7665ujTg
AAAK+YXrnZt2+fIiMjJR2+vubIrd8jRozQhg0bfFsdAACASabDzcknn+x+gN/AgQP1wgsvSDq8otO5c2df1gYAAGCa6XAzZcoUffrpp5KkefPmua+9mT17tm6//XafFwgAAGCG6WtuZs+e7f7zeeedp6+//lpbt25V3759NWTIEJ8WBwAAYJaplZs//vhD5513nnbs2OFu6927t6688kqCDQAAaBFMhZt27drpiy++kM1ma656AAAAmsT0NTeTJk3SsmXLmqMWAACAJjN9zU1VVZWefvppOZ1OxcbGqkOHDh7fp6en+6w4AAAAs0yHmy+++EKnn366JHlceyOJ01UAAMDvTIebd999tznqAAAA8AnT19z4WmZmpiIjIxUUFKSYmBht3LjRq3Hvv/++2rZtq7/97W/NWyAAAGhVTK/cXHHFFfWefrLZbAoKClK/fv00fvx4DRgw4Jj7ysnJUUpKijIzMzV8+HAtXbpUCQkJ2r59u3r37t3guPLyck2aNEkXXHCB9u7da/YnAAAACzO9chMSEqJ33nlHn3zyiTvkFBQU6J133lF1dbVycnI0ZMgQvf/++8fcV3p6uhITE5WUlKSoqChlZGTI4XAoKyvrqONuuukmjR8/XnFxcWbLBwAAFmc63Jx00kkaP368du3apdzcXK1Zs0bfffedJkyYoL59++qrr77SDTfcoDvuuOOo+6mqqlJ+fr7i4+M92uPj47V58+YGxz3zzDP67rvvdM8993hVb2VlpSoqKjw2AABgXabDzbJly5SSkqI2bf7/0DZt2mjmzJnKzs6WzWbTjBkz9MUXXxx1P6WlpaqpqVF4eLhHe3h4uIqLi+sds3PnTs2dO1fPPfec2rb17oxaWlqaQkJC3JvD4fBqHAAAaJ1Mh5vq6mp9/fXXddq//vpr1dTUSJKCgoK8vi38z/0Mw6h3bE1NjcaPH68FCxbolFNO8breefPmqby83L0VFRV5PRYAALQ+pi8onjhxohITE3XnnXfqjDPOkM1m00cffaQHHnhAkyZNkiStX79egwYNOup+wsLCFBAQUGeVpqSkpM5qjiTt379fW7duVUFBgWbMmCFJqq2tlWEYatu2rd58802df/75dcbZ7XbZ7XazPxMAALRSpsPNokWLFB4erocffth9p1J4eLhmz57tvs4mPj5eF1988VH3ExgYqJiYGDmdTl1xxRXudqfTqcsvv7xO/+DgYH3++ecebZmZmXrnnXf04osvKjIy0uxPAQAAFmQ63AQEBGj+/PmaP3++++Lc4OBgjz5Hu437f6WmpmrixImKjY1VXFycsrOzVVhYqGnTpkk6fEppz549Wrlypdq0aaPo6GiP8d26dVNQUFCddgAA8NdlOtz8r8zMTHcQaYxx48aprKxMCxculMvlUnR0tPLy8hQRESFJcrlcKiwsbEqJAADgL6ZJTyh+4IEHtG/fviYVkJycrO+//16VlZXKz8/XOeec4/5uxYoVeu+99xoce++992rbtm1NOj4AALCWJoUbwzB8VQcAAIBP+P3dUgAAAL7UpGtutm/frh49eviqFgAAgCZrUrjhab8AAKCl8TrcREZGHvOpwzabTd99912TiwIAAGgsr8NNSkpKg999//33Wrp0qSorK31REwAAQKN5HW5mzZpVp23fvn267777lJWVpbPOOksPPfSQT4sDAAAwq1HX3Pz2229KT0/XI488oj59+mjNmjUaM2aMr2sDAAAwzVS4qamp0VNPPaUFCxYoKChITzzxhCZMmOD1G8ABAACam9fh5oUXXtBdd92l8vJy3XnnnZo+fboCAwObszYAAADTvA431157rdq3b6/rrrtOP/zwg+bOnVtvv/T0dJ8VBwAAYJbX4eacc8455q3enJ4CAAD+5nW4OdoLLAEAAFoK3i0FAAAshXADAAAshXADAAAshXADAAAshXADAAAsxau7pT777DOvdzh48OBGFwMAANBUXoWbv/3tb7LZbDIM45jPsqmpqfFJYQAAAI3h1Wmp3bt3a9euXdq9e7dyc3MVGRmpzMxMFRQUqKCgQJmZmerbt69yc3Obu14AAICj8mrlJiIiwv3nq6++Wo8//rjHW8AHDx4sh8Ohu+++W2PHjvV5kQAAAN4yfUHx559/rsjIyDrtkZGR2r59u0+KAgAAaCzT4SYqKkr333+/fv/9d3dbZWWl7r//fkVFRfm0OAAAALO8frfUEUuWLNGll14qh8OhIUOGSJI+/fRT2Ww2vfbaaz4vEAAAwAzT4ebMM8/U7t279eyzz+rrr7+WYRgaN26cxo8frw4dOjRHjQAAAF4zHW4k6YQTTtDf//53X9cCAADQZI16QvH//d//acSIEerRo4d++OEHSdKiRYv08ssv+7Q4AAAAs0yHm6ysLKWmpiohIUG//PKL+6F9Xbp0UUZGhq/rAwAAMMV0uHniiSf01FNPaf78+Wrb9v+f1YqNjdXnn3/u0+IAAADMMh1udu/eraFDh9Zpt9vtOnjwoE+KAgAAaCzT4SYyMlLbtm2r0/7GG29o4MCBvqgJAACg0UzfLXX77bfr5ptv1u+//y7DMPTRRx9p1apVSktL09NPP90cNQIAAHjN9MrNlClTdM8992jOnDk6dOiQxo8fryVLlmjx4sW69tprTReQmZmpyMhIBQUFKSYmRhs3bmyw76ZNmzR8+HCFhoaqffv2OvXUU7Vo0SLTxwQAANbVqOfc3HjjjbrxxhtVWlqq2tpadevWrVEHz8nJUUpKijIzMzV8+HAtXbpUCQkJ2r59u3r37l2nf4cOHTRjxgwNHjxYHTp00KZNm3TTTTepQ4cOPHcHAABIauRzbo4ICwtrdLCRpPT0dCUmJiopKUlRUVHKyMiQw+FQVlZWvf2HDh2q6667ToMGDVKfPn00YcIEXXTRRUdd7QEAAH8tXq3cnH766Xr77bfVpUsXDR06VDabrcG+n3zyiVcHrqqqUn5+vubOnevRHh8fr82bN3u1j4KCAm3evFn3339/g30qKytVWVnp/lxRUeHVvgEAQOvkVbi5/PLLZbfbJUljx471yYFLS0tVU1Oj8PBwj/bw8HAVFxcfdWyvXr30888/q7q6Wvfee6+SkpIa7JuWlqYFCxb4pGYAANDyeRVuunTpojZtDp/BmjJlinr16uX+3FR/XgUyDOOoK0OStHHjRh04cEBbtmzR3Llz1a9fP1133XX19p03b55SU1PdnysqKuRwOJpeOAAAaJG8Cjepqam69tprFRQUpMjISLlcriZdayMdvl4nICCgzipNSUlJndWcP4uMjJQknXbaadq7d6/uvffeBsON3W53rzoBAADr82r5pUePHsrNzdUPP/wgwzD0448/qrCwsN7NW4GBgYqJiZHT6fRodzqdGjZsmNf7MQzD45oaAADw1+bVys1dd92lmTNnasaMGbLZbDrjjDPq9DlyOunIizS9kZqaqokTJyo2NlZxcXHKzs5WYWGhpk2bJunwKaU9e/Zo5cqVkqR//etf6t27t0499VRJh5978+ijj2rmzJleHxMAAFibV+Hm73//u
6677jr98MMPGjx4sN566y2FhoY2+eDjxo1TWVmZFi5cKJfLpejoaOXl5SkiIkKS5HK5PFaDamtrNW/ePO3evVtt27ZV37599eCDD+qmm25qci0AAMAavH6IX6dOnRQdHa1nnnlGw4cP99l1LMnJyUpOTq73uxUrVnh8njlzJqs0AADgqEw/ofiGG25ojjoAAAB8wqtw07VrV+3YsUNhYWHq0qXLUW/V3rdvn8+KAwAAMMurcLNo0SJ16tTJ/edjPYcGAADAX7wKN/97Kmry5MnNVQsAAECTeRVuzLyPKTg4uNHFAAAANJVX4aZz585en4oy85wbAAAAX/Mq3Lz77rvuP3///feaO3euJk+erLi4OEnSBx98oH//+99KS0trnioBAAC85FW4GTVqlPvPCxcuVHp6use7nC677DKddtppys7O5lZxAADgV6Zf7f3BBx8oNja2TntsbKw++ugjnxQFAADQWKbDjcPh0JIlS+q0L126VA6HwydFAQAANJbpJxQvWrRIV111ldatW6ezzz5bkrRlyxZ99913ys3N9XmBAAAAZpheuRkzZox27typyy67TPv27VNZWZkuv/xy7dixQ2PGjGmOGgEAALxmeuVGknr16qUHHnjA17UAAAA0WaPCza+//qply5bpq6++ks1m08CBAzV16lSFhIT4uj4AAABTTJ+W2rp1q/r27atFixZp3759Ki0tVXp6uvr27atPPvmkOWoEAADwmumVm9mzZ+uyyy7TU089pbZtDw+vrq5WUlKSUlJStGHDBp8XCQAA4C3T4Wbr1q0ewUaS2rZtqzlz5tT7/BsAAIDjyfRpqeDgYBUWFtZpLyoqUqdOnXxSFAAAQGOZDjfjxo1TYmKicnJyVFRUpB9//FGrV69WUlKSxysZAAAA/MH0aalHH31UNptNkyZNUnV1tSSpXbt2mj59uh588EGfFwgAAGCG6XATGBioxYsXKy0tTd99950Mw1C/fv10wgknNEd9AAAApjTqOTeSdMIJJ+i0007zZS0AAABN5nW4mTp1qlf9li9f3uhiAAAAmsrrcLNixQpFRERo6NChMgyjOWsCAABoNK/DzbRp07R69Wrt2rVLU6dO1YQJE9S1a9fmrA0AAMA0r28Fz8zMlMvl0h133KFXX31VDodD11xzjdatW8dKDgAAaDFMPefGbrfruuuuk9Pp1Pbt2zVo0CAlJycrIiJCBw4caK4aAQAAvGb6IX5H2Gw22Ww2GYah2tpaX9YEAADQaKbCTWVlpVatWqXRo0drwIAB+vzzz/Xkk0+qsLBQHTt2bK4aAQAAvOb1BcXJyclavXq1evfurSlTpmj16tUKDQ1tztoAAABM8zrcLFmyRL1791ZkZKTWr1+v9evX19tvzZo1PisOAADALK/DzaRJk2Sz2ZqzFgAAgCYz9RA/AACAlq7Rd0v5SmZmpiIjIxUUFKSYmBht3Lixwb5r1qzR6NGjdeKJJyo4OFhxcXFat27dcawWAAC0dH4NNzk5OUpJSdH8+fNVUFCgkSNHKiEhQYWFhfX237Bhg0aPHq28vDzl5+frvPPO06WXXqqCgoLjXDkAAGip/Bpu0tPTlZiYqKSkJEVFRSkjI0MOh0NZWVn19s/IyNCcOXN0xhlnqH///nrggQfUv39/vfrqq8e5cgAA0FL5LdxUVVUpPz9f8fHxHu3x8fHavHmzV/uora3V/v37j/qOq8rKSlVUVHhsAADAuvwWbkpLS1VTU6Pw8HCP9vDwcBUXF3u1j8cee0wHDx7UNddc02CftLQ0hYSEuDeHw9GkugEAQMvm9wuK/3x7uWEYXt1yvmrVKt17773KyclRt27dGuw3b948lZeXu7eioqIm1wwAAFour28F97WwsDAFBATUWaUpKSmps5rzZzk5OUpMTNR//vMfXXjhhUfta7fbZbfbm1wvAABoHfy2chMYGKiYmBg5nU6PdqfTqWHDhjU4btWqVZo8ebKef/55XXLJJc1dJgAAaGX8tnIjSampqZo4caJiY2MVFxen7OxsFRYWatq0aZIOn1Las2ePVq5cKelwsJk0aZIWL16ss88+273q0759e4WEhPjtdwAAgJbDr+Fm3LhxKisr08KFC+VyuRQdHa28vDxFRERIklwul8czb5YuXarq6mrdfPPNuvnmm93tN9xwA09QBgAAkvwcbqTDbxtPTk6u97s/B5b33nuv+QsCAACtmt/vlgIAAPAlwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUv4ebzMxMRUZGKigoSDExMdq4cWODfV0ul8aPH68BAwaoTZs2SklJOX6FAgCAVsGv4SYnJ0cpKSmaP3++CgoKNHLkSCUkJKiwsLDe/pWVlTrxxBM1f/58DRky5DhXCwAAWgO/hpv09HQlJiYqKSlJUVFRysjIkMPhUFZWVr39+/Tpo8WLF2vSpEkKCQk5ztUCAIDWwG/hpqqqSvn5+YqPj/doj4+P1+bNm312nMrKSlVUVHhsAADAuvwWbkpLS1VTU6Pw8HCP9vDwcBUXF/vsOGlpaQoJCXFvDofDZ/sGAAAtj98vKLbZbB6fDcOo09YU8+bNU3l5uXsrKiry2b4BAEDL09ZfBw4LC1NAQECdVZqSkpI6qzlNYbfbZbfbfbY/AADQsvlt5SYwMFAxMTFyOp0e7U6nU8OGDfNTVQAAoLXz28qNJKWmpmrixImKjY1VXFycsrOzVVhYqGnTpkk6fEppz549WrlypXvMtm3bJEkHDhzQzz//rG3btikwMFADBw70x08AAAAtjF/Dzbhx41RWVqaFCxfK5XIpOjpaeXl5ioiIkHT4oX1/fubN0KFD3X/Oz8/X888/r4iICH3//ffHs3QAANBC+TXcSFJycrKSk5Pr/W7FihV12gzDaOaKAABAa+b3u6UAAAB8iXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAsxe/hJjMzU5GRkQoKClJMTIw2btx41P7r169XTEyMgoKCdPLJJ2vJkiXHqVIAANAa+DXc5OTkKCUlRfPnz1dBQYFGjhyphIQEFRYW1tt/9+7dGjNmjEaOHKmCggLdeeeduuWWW5Sbm3ucKwcAAC2VX8NNenq6EhMT
lZSUpKioKGVkZMjhcCgrK6ve/kuWLFHv3r2VkZGhqKgoJSUlaerUqXr00UePc+UAAKClauuvA1dVVSk/P19z5871aI+Pj9fmzZvrHfPBBx8oPj7eo+2iiy7SsmXL9Mcff6hdu3Z1xlRWVqqystL9uby8XJJUUVHR1J9Qr9rKQ82yXxxbc83pEcyt/zTn3DKv/sO/Wetqjrk9sk/DMI7Z12/hprS0VDU1NQoPD/doDw8PV3Fxcb1jiouL6+1fXV2t0tJSde/evc6YtLQ0LViwoE67w+FoQvVoiUIy/F0Bmgtza03Mq3U159zu379fISEhR+3jt3BzhM1m8/hsGEadtmP1r6/9iHnz5ik1NdX9uba2Vvv27VNoaOhRj/NXU1FRIYfDoaKiIgUHB/u7HPgQc2tdzK01Ma/1MwxD+/fvV48ePY7Z12/hJiwsTAEBAXVWaUpKSuqszhxx0kkn1du/bdu2Cg0NrXeM3W6X3W73aOvcuXPjC7e44OBg/jFZFHNrXcytNTGvdR1rxeYIv11QHBgYqJiYGDmdTo92p9OpYcOG1TsmLi6uTv8333xTsbGx9V5vAwAA/nr8erdUamqqnn76aS1fvlxfffWVZs+ercLCQk2bNk3S4VNKkyZNcvefNm2afvjhB6Wmpuqrr77S8uXLtWzZMt12223++gkAAKCF8es1N+PGjVNZWZkWLlwol8ul6Oho5eXlKSIiQpLkcrk8nnkTGRmpvLw8zZ49W//617/Uo0cPPf7447rqqqv89RMsw26365577qlzCg+tH3NrXcytNTGvTWczvLmnCgAAoJXw++sXAAAAfIlwAwAALIVwAwAALIVwAwAALIVw08ps2LBBl156qXr06CGbzab//ve//i4JPpCWlqYzzjhDnTp1Urdu3TR27Fh98803/i4LPpCVlaXBgwe7H8gWFxenN954w99lwcfS0tJks9mUkpLi71Igwk2rc/DgQQ0ZMkRPPvlksx7njz/+aNb9w9P69et18803a8uWLXI6naqurlZ8fLwOHjzo82Mxt8dXr1699OCDD2rr1q3aunWrzj//fF1++eX68ssvfXoc5tV/Pv74Y2VnZ2vw4MHNsn/mthEMtFqSjJdeeumY/b766itj+PDhht1uN6Kiogyn0+kxdvfu3YYkIycnxxg1apRht9uN5cuXG6Wlpca1115r9OzZ02jfvr0RHR1tPP/88x77HjVqlDFjxgxj1qxZRufOnY1u3boZS5cuNQ4cOGBMnjzZ6Nixo3HyyScbeXl5zfA3YF0lJSWGJGP9+vVH7cfctk5dunQxnn766Qa/Z15bj/379xv9+/c3nE6nMWrUKGPWrFlH7c/cHh+Em1bMm3BTU1NjDBgwwBg9erSxbds2Y+PGjcaZZ55Z7z+mPn36GLm5ucauXbuMPXv2GD/++KPxyCOPGAUFBcZ3331nPP7440ZAQICxZcsW9/5HjRpldOrUybjvvvuMHTt2GPfdd5/Rpk0bIyEhwcjOzjZ27NhhTJ8+3QgNDTUOHjzYjH8b1rJz505DkvH555832Ie5bX2qq6uNVatWGYGBgcaXX35Zbx/mtXWZNGmSkZKSYhiGccxww9weP4SbVsybcPPGG28Ybdu2NVwul7utof9TyMjIOOYxx4wZY9x6663uz6NGjTJGjBjh/lxdXW106NDBmDhxorvN5XIZkowPPvjAy1/211ZbW2tceumlHn+v9WFuW4/PPvvM6NChgxEQEGCEhIQYr7/+eoN9mdfWY9WqVUZ0dLTx22+/GYZx7HDD3B4/XHNjIQ888IA6duzo3goLC/XNN9/I4XDopJNOcvc788wz6x0fGxvr8bmmpkb//Oc/NXjwYIWGhqpjx4568803PV6JIcnjPHNAQIBCQ0N12mmnuduOvOW9pKSkyb/xr2DGjBn67LPPtGrVKncbc9u6DRgwQNu2bdOWLVs0ffp03XDDDdq+fTvz2ooVFRVp1qxZevbZZxUUFFTne+bWv/z6bin41rRp03TNNde4P/fo0UOGYchms3k1vkOHDh6fH3vsMS1atEgZGRk67bTT1KFDB6WkpKiqqsqj35/fyG6z2Tzajhy/trbW1O/5K5o5c6ZeeeUVbdiwQb169XK3M7etW2BgoPr16yfp8H+0Pv74Yy1evFhpaWnMayuVn5+vkpISxcTEuNtqamq0YcMGPfnkk9q7dy9z60eEGwvp2rWrunbt6tF26qmnqrCwUHv37nUn9o8//tir/W3cuFGXX365JkyYIOnwP4adO3cqKirKt4VDhmFo5syZeumll/Tee+8pMjLS43vm1loMw1BlZSXz2opdcMEF+vzzzz3apkyZolNPPVV33HGHQkNDFRoa6vE9c3v8EG5amQMHDujbb791f969e7e2bdumrl27qnfv3nX6jx49Wn379tUNN9yghx9+WPv379f8+fMl6Zj/B9GvXz/l5uZq8+bN6tKli9LT01VcXMw/pmZw88036/nnn9fLL7+sTp06qbi4WJIUEhKi9u3b1zuGuW0d7rzzTiUkJMjhcGj//v1avXq13nvvPa1du7be/sxr69CpUydFR0d7tHXo0EGhoaF12o9gbo8frrlpZbZu3aqhQ4dq6NChkqTU1FQNHTpU//jHP+rtHxAQoP/+9786cOCAzjjjDCUlJemuu+6SpHrPE/+vu+++W6effrouuuginXvuuTrppJM0duxYn/4eHJaVlaXy8nKde+656t69u3vLyclpcAxz2zrs3btXEydO1IABA3TBBRfoww8/1Nq1azV69Oh6+zOv1sXcHj82wzAMfxeB4+v999/XiBEj9O2336pv377+Lgc+xNxaE/NqXcxt8yDc/AW89NJL6tixo/r3769vv/1Ws2bNUpcuXbRp0yZ/l4YmYm6tiXm1Lub2+OCam7+A/fv3a86cOSoqKlJYWJguvPBCPfbYY/4uCz7A3FoT82pdzO3xwcoNAACwFC4oBgAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlvL/AEGIHigLKKERAAAAAElFTkSuQmCC",
188
+ "text/plain": [
189
+ "<Figure size 640x480 with 1 Axes>"
190
+ ]
191
+ },
192
+ "metadata": {},
193
+ "output_type": "display_data"
194
+ }
195
+ ],
196
+ "source": [
197
+ "# Mocked dataset showing the precision multiplied by the BP for different n-grams\n",
198
+ "data = {\"1-gram\": 0.8, \"2-gram\": 0.77, \"3-gram\": 0.74, \"4-gram\": 0.71}\n",
199
+ "\n",
200
+ "# Plot the datapoints defined above\n",
201
+ "fig, ax = plt.subplots(1)\n",
202
+ "bars = ax.bar(*zip(*data.items()))\n",
203
+ "ax.set(ylabel=\"Modified N-gram precision\")\n",
204
+ "plt.show()"
205
+ ]
206
+ },
207
+ {
208
+ "cell_type": "markdown",
209
+ "metadata": {},
210
+ "source": [
211
+ "# 3. Example Calculations of the BLEU score\n",
212
+ "\n",
213
+ "In this example you will have a reference sentence and 2 candidate sentences. You will tokenize all sentences using the NLTK package. Then you will compare the two candidates to the reference using BLEU score.\n",
214
+ "\n",
215
+ "First you define and tokenize the sentences."
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": 16,
221
+ "metadata": {
222
+ "tags": []
223
+ },
224
+ "outputs": [
225
+ {
226
+ "name": "stdout",
227
+ "output_type": "stream",
228
+ "text": [
229
+ "The NASA Opportunity rover is battling a massive dust storm on planet Mars. -> ['the', 'nasa', 'opportunity', 'rover', 'is', 'battling', 'a', 'massive', 'dust', 'storm', 'on', 'planet', 'mars', '.']\n",
230
+ "\n",
231
+ "\n",
232
+ "The Opportunity rover is combating a big sandstorm on planet Mars. -> ['the', 'opportunity', 'rover', 'is', 'combating', 'a', 'big', 'sandstorm', 'on', 'planet', 'mars', '.']\n",
233
+ "\n",
234
+ "\n",
235
+ "A NASA rover is fighting a massive storm on planet Mars. -> ['a', 'nasa', 'rover', 'is', 'fighting', 'a', 'massive', 'storm', 'on', 'planet', 'mars', '.']\n"
236
+ ]
237
+ }
238
+ ],
239
+ "source": [
240
+ "reference = \"The NASA Opportunity rover is battling a massive dust storm on planet Mars.\"\n",
241
+ "candidate_1 = \"The Opportunity rover is combating a big sandstorm on planet Mars.\"\n",
242
+ "candidate_2 = \"A NASA rover is fighting a massive storm on planet Mars.\"\n",
243
+ "\n",
244
+ "tokenized_ref = nltk.word_tokenize(reference.lower())\n",
245
+ "tokenized_cand_1 = nltk.word_tokenize(candidate_1.lower())\n",
246
+ "tokenized_cand_2 = nltk.word_tokenize(candidate_2.lower())\n",
247
+ "\n",
248
+ "print(f\"{reference} -> {tokenized_ref}\")\n",
249
+ "print(\"\\n\")\n",
250
+ "print(f\"{candidate_1} -> {tokenized_cand_1}\")\n",
251
+ "print(\"\\n\")\n",
252
+ "print(f\"{candidate_2} -> {tokenized_cand_2}\")"
253
+ ]
254
+ },
255
+ {
256
+ "cell_type": "markdown",
257
+ "metadata": {},
258
+ "source": [
259
+ "## 3.1 Define the functions to calculate the BLEU score\n",
260
+ "\n",
261
+ "### Computing the Brevity Penalty\n",
262
+ "You will start by defining the function for brevity penalty according to the equation (2) in section 2.1."
263
+ ]
264
+ },
265
+ {
266
+ "cell_type": "code",
267
+ "execution_count": 6,
268
+ "metadata": {},
269
+ "outputs": [],
270
+ "source": [
271
+ "def brevity_penalty(candidate, reference):\n",
272
+ " \"\"\"\n",
273
+ " Calculates the brevity penalty given the candidate and reference sentences.\n",
274
+ " \"\"\"\n",
275
+ " reference_length = len(reference)\n",
276
+ " candidate_length = len(candidate)\n",
277
+ "\n",
278
+ " if reference_length < candidate_length:\n",
279
+ " BP = 1\n",
280
+ " else:\n",
281
+ " penalty = 1 - (reference_length / candidate_length)\n",
282
+ " BP = np.exp(penalty)\n",
283
+ "\n",
284
+ " return BP"
285
+ ]
286
+ },
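A quick sanity check of the function above, assuming the previous cell has been run. The token lists are made up for illustration; only their lengths matter here.

```python
import numpy as np  # already imported earlier in the notebook

reference = ["tok"] * 10   # hypothetical 10-token reference
short_cand = ["tok"] * 7   # shorter candidate -> penalized
long_cand = ["tok"] * 12   # longer candidate -> no penalty

print(brevity_penalty(short_cand, reference))  # exp(1 - 10/7), roughly 0.65
print(brevity_penalty(long_cand, reference))   # 1
```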
287
+ {
288
+ "cell_type": "markdown",
289
+ "metadata": {},
290
+ "source": [
291
+ "### Computing the clipped Precision\n",
292
+ "Next, you need to define a function to calculate the geometrically averaged clipped precision. This function calculates how many of the n-grams in the candidate sentence actually appear in the reference sentence. The clipping takes care of overcounting. For example if a certain n-gram appears five times in the candidate sentence, but only twice in the reference, the value is clipped to two."
293
+ ]
294
+ },
295
+ {
296
+ "cell_type": "code",
297
+ "execution_count": 17,
298
+ "metadata": {},
299
+ "outputs": [],
300
+ "source": [
301
+ "def average_clipped_precision(candidate, reference):\n",
302
+ " \"\"\"\n",
303
+ " Calculates the precision given the candidate and reference sentences.\n",
304
+ " \"\"\"\n",
305
+ "\n",
306
+ " clipped_precision_score = []\n",
307
+ " \n",
308
+ " # Loop through values 1, 2, 3, 4. This is the length of n-grams\n",
309
+ " for n_gram_length in range(1, 5):\n",
310
+ " reference_n_gram_counts = Counter(ngrams(reference, n_gram_length)) \n",
311
+ " candidate_n_gram_counts = Counter(ngrams(candidate, n_gram_length)) \n",
312
+ "\n",
313
+ " total_candidate_ngrams = sum(candidate_n_gram_counts.values()) \n",
314
+ " \n",
315
+ " for ngram in candidate_n_gram_counts: \n",
316
+ " # check if it is in the reference n-gram\n",
317
+ " if ngram in reference_n_gram_counts:\n",
318
+ " # if the count of the candidate n-gram is bigger than the corresponding\n",
319
+ " # count in the reference n-gram, then set the count of the candidate n-gram \n",
320
+ " # to be equal to the reference n-gram\n",
321
+ " \n",
322
+ " if candidate_n_gram_counts[ngram] > reference_n_gram_counts[ngram]: \n",
323
+ " candidate_n_gram_counts[ngram] = reference_n_gram_counts[ngram] # t\n",
324
+ " \n",
325
+ " else:\n",
326
+ " candidate_n_gram_counts[ngram] = 0 # else set the candidate n-gram equal to zero\n",
327
+ "\n",
328
+ " clipped_candidate_ngrams = sum(candidate_n_gram_counts.values())\n",
329
+ " \n",
330
+ " clipped_precision_score.append(clipped_candidate_ngrams / total_candidate_ngrams)\n",
331
+ " \n",
332
+ " # Calculate the geometric average: take the mean of elemntwise log, then exponentiate\n",
333
+ " # This is equivalent to taking the n-th root of the product as shown in equation (1) above\n",
334
+ " s = np.exp(np.mean(np.log(clipped_precision_score)))\n",
335
+ " \n",
336
+ " return s\n"
337
+ ]
338
+ },
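The last two lines of the function rely on the identity exp(mean(log(x))) = (x_1 * ... * x_n)^(1/n). A quick numerical check with hypothetical precision values:

```python
import numpy as np

precisions = [0.8, 0.7, 0.6, 0.5]  # hypothetical clipped n-gram precisions

geo_via_logs = np.exp(np.mean(np.log(precisions)))           # form used above
geo_via_root = np.prod(precisions) ** (1 / len(precisions))  # n-th root of the product

print(geo_via_logs, geo_via_root)  # both roughly 0.6403
```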
339
+ {
340
+ "cell_type": "markdown",
341
+ "metadata": {},
342
+ "source": [
343
+ "### Computing the BLEU score\n",
344
+ "Finally, you can compute the BLEU score using the above two functions."
345
+ ]
346
+ },
347
+ {
348
+ "cell_type": "code",
349
+ "execution_count": 18,
350
+ "metadata": {},
351
+ "outputs": [],
352
+ "source": [
353
+ "def bleu_score(candidate, reference):\n",
354
+ " BP = brevity_penalty(candidate, reference) \n",
355
+ " geometric_average_precision = average_clipped_precision(candidate, reference) \n",
356
+ " return BP * geometric_average_precision"
357
+ ]
358
+ },
359
+ {
360
+ "cell_type": "markdown",
361
+ "metadata": {},
362
+ "source": [
363
+ "## 3.2 Testing the functions\n",
364
+ "Now you can test the functions with your Example Reference and Candidates Sentences."
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "code",
369
+ "execution_count": 19,
370
+ "metadata": {
371
+ "tags": []
372
+ },
373
+ "outputs": [
374
+ {
375
+ "name": "stdout",
376
+ "output_type": "stream",
377
+ "text": [
378
+ "BLEU score of reference versus candidate 1: 27.6\n",
379
+ "BLEU score of reference versus candidate 2: 35.3\n"
380
+ ]
381
+ }
382
+ ],
383
+ "source": [
384
+ "result_candidate_1 = round(bleu_score(tokenized_cand_1, tokenized_ref) * 100, 1)\n",
385
+ "print(f\"BLEU score of reference versus candidate 1: {result_candidate_1}\")\n",
386
+ "result_candidate_2 = round(bleu_score(tokenized_cand_2, tokenized_ref) * 100, 1)\n",
387
+ "print(f\"BLEU score of reference versus candidate 2: {result_candidate_2}\")"
388
+ ]
389
+ },
390
+ {
391
+ "cell_type": "markdown",
392
+ "metadata": {},
393
+ "source": [
394
+ "## 3.3 Comparing the Results from your Code with the Sacrebleu Library\n",
395
+ "Below you will do the same calculation, but using the `sacrebleu` library. Compare them with your implementation above."
396
+ ]
397
+ },
398
+ {
399
+ "cell_type": "code",
400
+ "execution_count": 20,
401
+ "metadata": {
402
+ "scrolled": true
403
+ },
404
+ "outputs": [
405
+ {
406
+ "name": "stdout",
407
+ "output_type": "stream",
408
+ "text": [
409
+ "BLEU score of reference versus candidate 1: 27.6\n",
410
+ "BLEU score of reference versus candidate 2: 35.3\n"
411
+ ]
412
+ }
413
+ ],
414
+ "source": [
415
+ "result_candidate_1 = round(sacrebleu.sentence_bleu(candidate_1, [reference]).score, 1)\n",
416
+ "print(f\"BLEU score of reference versus candidate 1: {result_candidate_1}\")\n",
417
+ "result_candidate_2 = round(sacrebleu.sentence_bleu(candidate_2, [reference]).score, 1)\n",
418
+ "print(f\"BLEU score of reference versus candidate 2: {result_candidate_2}\")"
419
+ ]
420
+ },
421
+ {
422
+ "cell_type": "markdown",
423
+ "metadata": {},
424
+ "source": [
425
+ "# 4. BLEU computation on a corpus\n",
426
+ "\n",
427
+ "## 4.1 Loading Datasets for Evaluation Using the BLEU Score\n",
428
+ "\n",
429
+ "In this section, you will use a simple pipeline for evaluating machine translated text. You will use English to German translations generated by [Google Translate](https://translate.google.com). There are three files you will need:\n",
430
+ "\n",
431
+ "1. A source text in English. In this lab, you will use the first 1671 words of the [wmt19](http://statmt.org/wmt19/translation-task.html) evaluation dataset downloaded via SacreBLEU.\n",
432
+ "2. A reference translation to German of the corresponding first 1671 words from the original English text. This is also provided by SacreBLEU.\n",
433
+ "3. A candidate machine translation to German from the same 1671 words. This is generated by Google Translate.\n",
434
+ "\n",
435
+ "With that, you can now compare the reference and candidate translation to get the BLEU Score."
436
+ ]
437
+ },
438
+ {
439
+ "cell_type": "code",
440
+ "execution_count": 21,
441
+ "metadata": {},
442
+ "outputs": [],
443
+ "source": [
444
+ "# Loading the raw data\n",
445
+ "wmt19_src = open(\"data/wmt19_src.txt\", \"r\")\n",
446
+ "wmt19_src_1 = wmt19_src.read()\n",
447
+ "wmt19_src.close()\n",
448
+ "\n",
449
+ "wmt19_ref = open(\"data/wmt19_ref.txt\", \"r\")\n",
450
+ "wmt19_ref_1 = wmt19_ref.read()\n",
451
+ "wmt19_ref.close()\n",
452
+ "\n",
453
+ "wmt19_can = open(\"data/wmt19_can.txt\", \"r\")\n",
454
+ "wmt19_can_1 = wmt19_can.read()\n",
455
+ "wmt19_can.close()\n",
456
+ "\n",
457
+ "tokenized_corpus_src = nltk.word_tokenize(wmt19_src_1.lower())\n",
458
+ "tokenized_corpus_ref = nltk.word_tokenize(wmt19_ref_1.lower())\n",
459
+ "tokenized_corpus_cand = nltk.word_tokenize(wmt19_can_1.lower())"
460
+ ]
461
+ },
462
+ {
463
+ "cell_type": "markdown",
464
+ "metadata": {},
465
+ "source": [
466
+ "Now that you have your data loaded, you can inspect the first sentence of each dataset."
467
+ ]
468
+ },
469
+ {
470
+ "cell_type": "code",
471
+ "execution_count": 22,
472
+ "metadata": {
473
+ "tags": []
474
+ },
475
+ "outputs": [
476
+ {
477
+ "name": "stdout",
478
+ "output_type": "stream",
479
+ "text": [
480
+ "English source text:\n",
481
+ "\n",
482
+ "Welsh AMs worried about 'looking like muppets'\n",
483
+ "There is consternation among some AMs at a suggestion their title should change to MWPs (Member of the Welsh Parliament).\n",
484
+ " -> ['\\ufeffwelsh', 'ams', 'worried', 'about', \"'looking\", 'like', \"muppets'\", 'there', 'is', 'consternation', 'among', 'some', 'ams', 'at', 'a', 'suggestion', 'their', 'title', 'should', 'change', 'to', 'mwps', '(', 'member', 'of', 'the', 'welsh', 'parliament', ')', '.']\n",
485
+ "\n",
486
+ "\n",
487
+ "German reference translation:\n",
488
+ "\n",
489
+ "Walisische Ageordnete sorgen sich \"wie Dödel auszusehen\"\n",
490
+ "Es herrscht Bestürzung unter einigen Mitgliedern der Versammlung über einen Vorschlag, der ihren Titel zu MWPs (Mitglied der walisischen Parlament) ändern soll.\n",
491
+ " -> ['\\ufeffwalisische', 'ageordnete', 'sorgen', 'sich', '``', 'wie', 'dödel', 'auszusehen', \"''\", 'es', 'herrscht', 'bestürzung', 'unter', 'einigen', 'mitgliedern', 'der', 'versammlung', 'über', 'einen', 'vorschlag', ',', 'der', 'ihren', 'titel', 'zu', 'mwps', '(', 'mitglied', 'der', 'walisischen', 'parlament', ')', 'ändern', 'soll', '.']\n",
492
+ "\n",
493
+ "\n",
494
+ "German machine translation:\n",
495
+ "\n",
496
+ "Walisische AMs machten sich Sorgen, dass sie wie Muppets aussehen könnten\n",
497
+ "Einige AMs sind bestürzt über den Vorschlag, ihren Titel in MWPs (Mitglied des walisischen Parlaments) zu ändern.\n",
498
+ "Es ist aufg -> ['walisische', 'ams', 'machten', 'sich', 'sorgen', ',', 'dass', 'sie', 'wie', 'muppets', 'aussehen', 'könnten', 'einige', 'ams', 'sind', 'bestürzt', 'über', 'den', 'vorschlag', ',', 'ihren', 'titel', 'in', 'mwps', '(', 'mitglied', 'des', 'walisischen', 'parlaments']\n"
499
+ ]
500
+ }
501
+ ],
502
+ "source": [
503
+ "print(\"English source text:\\n\")\n",
504
+ "print(f\"{wmt19_src_1[0:170]} -> {tokenized_corpus_src[0:30]}\\n\\n\")\n",
505
+ "print(\"German reference translation:\\n\")\n",
506
+ "print(f\"{wmt19_ref_1[0:219]} -> {tokenized_corpus_ref[0:35]}\\n\\n\")\n",
507
+ "print(\"German machine translation:\\n\")\n",
508
+ "print(f\"{wmt19_can_1[0:199]} -> {tokenized_corpus_cand[0:29]}\")"
509
+ ]
510
+ },
511
+ {
512
+ "cell_type": "markdown",
513
+ "metadata": {},
514
+ "source": [
515
+ "And lastly, you can calculate the BLEU score of the translation."
516
+ ]
517
+ },
518
+ {
519
+ "cell_type": "code",
520
+ "execution_count": 23,
521
+ "metadata": {
522
+ "tags": []
523
+ },
524
+ "outputs": [
525
+ {
526
+ "name": "stdout",
527
+ "output_type": "stream",
528
+ "text": [
529
+ "BLEU score of the reference versus candidate translation: 43.2\n"
530
+ ]
531
+ }
532
+ ],
533
+ "source": [
534
+ "result = round(sacrebleu.sentence_bleu(wmt19_can_1, [wmt19_ref_1]).score, 1)\n",
535
+ "print(f\"BLEU score of the reference versus candidate translation: {result}\")"
536
+ ]
537
+ },
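As a side note, when scoring many segments it is standard to pool the n-gram statistics over the whole corpus rather than averaging per-sentence scores. A minimal sketch with sacrebleu's `corpus_bleu`; the two sentence pairs are made up for illustration:

```python
import sacrebleu

# Hypothetical two-segment corpus; each reference stream is parallel to the hypotheses.
hypotheses = ["The cat sat on the mat.", "It rained all day."]
references = [["The cat is sitting on the mat.", "It was raining all day long."]]

# corpus_bleu pools n-gram counts across segments before computing the score.
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```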
538
+ {
539
+ "cell_type": "markdown",
540
+ "metadata": {},
541
+ "source": [
542
+ "## 4.2 BLEU Score Interpretation on a Corpus\n",
543
+ "The table below (taken from [here](https://cloud.google.com/translate/automl/docs/evaluate)) shows the typical values of BLEU score. You can see that the translation above is of high quality according to this table and in comparison to the given reference sentence. (*if you see \"Hard to get the gist\", please open your workspace, delete `wmt19_can.txt` and get the latest version via the Lab Help button*)\n",
544
+ "\n",
545
+ "|Score | Interpretation |\n",
546
+ "|:---------:|:-------------------------------------------------------------:|\n",
547
+ "| < 10 | Almost useless |\n",
548
+ "| 10 - 19 | Hard to get the gist |\n",
549
+ "| 20 - 29 | The gist is clear, but has significant grammatical errors |\n",
550
+ "| 30 - 40 | Understandable to good translations |\n",
551
+ "| 40 - 50 | High quality translations |\n",
552
+ "| 50 - 60 | Very high quality, adequate, and fluent translations |\n",
553
+ "| > 60 | Quality often better than human |"
554
+ ]
555
+ },
556
+ {
557
+ "cell_type": "code",
558
+ "execution_count": null,
559
+ "metadata": {},
560
+ "outputs": [],
561
+ "source": []
562
+ }
563
+ ],
564
+ "metadata": {
565
+ "kernelspec": {
566
+ "display_name": "Python 3 (ipykernel)",
567
+ "language": "python",
568
+ "name": "python3"
569
+ },
570
+ "language_info": {
571
+ "codemirror_mode": {
572
+ "name": "ipython",
573
+ "version": 3
574
+ },
575
+ "file_extension": ".py",
576
+ "mimetype": "text/x-python",
577
+ "name": "python",
578
+ "nbconvert_exporter": "python",
579
+ "pygments_lexer": "ipython3",
580
+ "version": "3.10.11"
581
+ }
582
+ },
583
+ "nbformat": 4,
584
+ "nbformat_minor": 4
585
+ }
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/.ipynb_checkpoints/C4W1_QKV_Attention-checkpoint.ipynb ADDED
@@ -0,0 +1,270 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "707052ae",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Scaled Dot-Product Attention: Ungraded Lab\n",
9
+ "\n",
10
+ "The 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) introduced the Transformer model and scaled dot-product attention, sometimes also called QKV (**Q**ueries, **K**eys, **V**alues) attention. Since then, Transformers have come to dominate large-scale natural language applications. Scaled dot-product attention can be used to improve seq2seq models as well. In this ungraded lab, you'll implement a simplified version of scaled dot-product attention and replicate word alignment between English and French, as shown in [Bhadanau, et al. (2014)](https://arxiv.org/abs/1409.0473).\n",
11
+ "\n",
12
+ "The Transformer model learns how to align words in different languages. You won't be training any weights here, so instead you will use [pre-trained aligned word embeddings from here](https://fasttext.cc/docs/en/aligned-vectors.html). Run the cell below to load the embeddings and set up the rest of the notebook.\n",
13
+ "\n",
14
+ "This is a practice notebook, where you can train writing your code. All of the solutions are provided at the end of the notebook."
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "code",
19
+ "execution_count": 1,
20
+ "id": "aa4d9f30",
21
+ "metadata": {},
22
+ "outputs": [],
23
+ "source": [
24
+ "# Import the libraries\n",
25
+ "import pickle\n",
26
+ "import matplotlib.pyplot as plt\n",
27
+ "import numpy as np\n",
28
+ "\n",
29
+ "# Load the word2int dictionaries\n",
30
+ "with open(\"./data/word2int_en.pkl\", \"rb\") as f:\n",
31
+ " en_words = pickle.load(f)\n",
32
+ " \n",
33
+ "with open(\"./data/word2int_fr.pkl\", \"rb\") as f:\n",
34
+ " fr_words = pickle.load(f)\n",
35
+ "\n",
36
+ "# Load the word embeddings\n",
37
+ "en_embeddings = np.load(\"./data/embeddings_en.npz\")[\"embeddings\"]\n",
38
+ "fr_embeddings = np.load(\"./data/embeddings_fr.npz\")[\"embeddings\"]"
39
+ ]
40
+ },
41
+ {
42
+ "cell_type": "code",
43
+ "execution_count": 2,
44
+ "id": "a6914081",
45
+ "metadata": {},
46
+ "outputs": [],
47
+ "source": [
48
+ "# Define some helper functions\n",
49
+ "\n",
50
+ "def tokenize(sentence, token_mapping):\n",
51
+ " tokenized = []\n",
52
+ " \n",
53
+ " for word in sentence.lower().split(\" \"):\n",
54
+ " try:\n",
55
+ " tokenized.append(token_mapping[word])\n",
56
+ " except KeyError:\n",
57
+ " # Using -1 to indicate an unknown word\n",
58
+ " tokenized.append(-1)\n",
59
+ " \n",
60
+ " return tokenized\n",
61
+ "\n",
62
+ "\n",
63
+ "def embed(tokens, embeddings):\n",
64
+ " embed_size = embeddings.shape[1]\n",
65
+ " \n",
66
+ " output = np.zeros((len(tokens), embed_size))\n",
67
+ " for i, token in enumerate(tokens):\n",
68
+ " if token == -1:\n",
69
+ " output[i] = np.zeros((1, embed_size))\n",
70
+ " else:\n",
71
+ " output[i] = embeddings[token]\n",
72
+ " \n",
73
+ " return output"
74
+ ]
75
+ },
76
+ {
77
+ "cell_type": "markdown",
78
+ "id": "6153d4b2",
79
+ "metadata": {},
80
+ "source": [
81
+ "Scaled dot-product attention consists of two matrix multiplications and a softmax scaling, as shown in the diagram below from [Vaswani, et al. (2017)](https://arxiv.org/abs/1706.03762). It takes three input matrices: the queries, keys, and values.\n",
82
+ "\n",
83
+ "![scaled-dot product attention diagram](./images/attention.png)\n",
84
+ "\n",
85
+ "Mathematically, this is expressed as\n",
86
+ "\n",
87
+ "$$ \n",
88
+ "\\large \\mathrm{Attention}\\left(Q, K, V\\right) = \\mathrm{softmax}\\left(\\frac{QK^{\\top}}{\\sqrt{d_k}}\\right)V\n",
89
+ "$$\n",
90
+ "\n",
91
+ "where $Q$, $K$, and $V$ are the queries, keys, and values matrices respectively, and $d_k$ is the dimension of the keys. In practice, Q, K, and V all have the same dimensions. This form of attention is faster and more space-efficient than what you implemented before since it consists of only matrix multiplications instead of a learned feed-forward layer.\n",
92
+ "\n",
93
+ "Conceptually, the first matrix multiplication is a measure of the similarity between the queries and the keys. This is transformed into weights using the softmax function. These weights are then applied to the values with the second matrix multiplication resulting in output attention vectors. Typically, decoder states are used as the queries while encoder states are the keys and values.\n",
94
+ "\n",
95
+ "### Exercise 1\n",
96
+ "Implement the softmax function with Numpy and use it to calculate the weights from the queries and keys. Assume the queries and keys are 2D arrays (matrices). Note that since the dot-product of Q and K will be a matrix, you'll need to calculate softmax over a specific axis. See the end of the notebook for solutions."
97
+ ]
98
+ },
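As a quick numerical sanity check of the formula above, here is a minimal, self-contained sketch (not part of the graded lab; the shapes and random data are illustrative assumptions only) that evaluates softmax(QK^T / sqrt(d_k))V on toy matrices:

```python
import numpy as np

np.random.seed(0)
d_k = 4
Q = np.random.randn(3, d_k)   # 3 query vectors (e.g., decoder states)
K = np.random.randn(5, d_k)   # 5 key vectors (e.g., encoder states)
V = np.random.randn(5, d_k)   # one value vector per key

scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
output = weights @ V                            # weighted average of the values

print(output.shape)  # (3, 4): one attention vector per query
```

Note that in a full Transformer the only learned components would be the projections that produce Q, K, and V; the attention operation itself is exactly these two matrix products and a softmax.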
99
+ {
100
+ "cell_type": "code",
101
+ "execution_count": null,
102
+ "id": "3932b927",
103
+ "metadata": {},
104
+ "outputs": [],
105
+ "source": [
106
+ "def softmax(x, axis=0): \n",
107
+ " \"\"\" Calculate softmax function for an array x\n",
108
+ "\n",
109
+ " axis=0 calculates softmax across rows which means each column sums to 1 \n",
110
+ " axis=1 calculates softmax across columns which means each row sums to 1\n",
111
+ " \"\"\"\n",
112
+ " # Replace pass with your code.\n",
113
+ " pass\n",
114
+ "\n",
115
+ "def calculate_weights(queries, keys):\n",
116
+ " \"\"\" Calculate the weights for scaled dot-product attention\"\"\"\n",
117
+ " # Replace None with your code.\n",
118
+ " dot = None\n",
119
+ " weights = softmax(dot, axis=1)\n",
120
+ " \n",
121
+ " assert weights.sum(axis=1)[0] == 1, \"Each row in weights must sum to 1\"\n",
122
+ " \n",
123
+ " # Replace pass with your code.\n",
124
+ " pass"
125
+ ]
126
+ },
127
+ {
128
+ "cell_type": "code",
129
+ "execution_count": null,
130
+ "id": "51f47450",
131
+ "metadata": {},
132
+ "outputs": [],
133
+ "source": [
134
+ "# Tokenize example sentences in English and French, then get their embeddings\n",
135
+ "sentence_en = \"The agreement on the European Economic Area was signed in August 1992 .\"\n",
136
+ "tokenized_en = tokenize(sentence_en, en_words)\n",
137
+ "embedded_en = embed(tokenized_en, en_embeddings)\n",
138
+ "\n",
139
+ "sentence_fr = \"L accord sur la zone économique européenne a été signé en août 1992 .\"\n",
140
+ "tokenized_fr = tokenize(sentence_fr, fr_words)\n",
141
+ "embedded_fr = embed(tokenized_fr, fr_embeddings)\n",
142
+ "\n",
143
+ "# These weights indicate alignment between words in English and French\n",
144
+ "alignment = calculate_weights(embedded_fr, embedded_en)\n",
145
+ "\n",
146
+ "# Visualize weights to check for alignment\n",
147
+ "fig, ax = plt.subplots(figsize=(7,7))\n",
148
+ "ax.imshow(alignment, cmap='gray')\n",
149
+ "ax.xaxis.tick_top()\n",
150
+ "ax.set_xticks(np.arange(alignment.shape[1]))\n",
151
+ "ax.set_xticklabels(sentence_en.split(\" \"), rotation=90, size=16);\n",
152
+ "ax.set_yticks(np.arange(alignment.shape[0]));\n",
153
+ "ax.set_yticklabels(sentence_fr.split(\" \"), size=16);"
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "markdown",
158
+ "id": "d634f0ec",
159
+ "metadata": {},
160
+ "source": [
161
+ "If you implemented the weights calculations correctly, the alignment matrix should look like this:\n",
162
+ "\n",
163
+ "![alignment visualization](./images/alignment.png)\n",
164
+ "\n",
165
+ "This is a demonstration of alignment where the model has learned which words in English correspond to words in French. For example, the words *signed* and *signé* have a large weight because they have the same meaning. Typically, these alignments are learned using linear layers in the model, but you've used pre-trained embeddings here.\n",
166
+ "\n",
167
+ "### Exercise 2\n",
168
+ "Complete the implementation of scaled dot-product attention using your `calculate_weights` function (ignore the mask)."
169
+ ]
170
+ },
171
+ {
172
+ "cell_type": "code",
173
+ "execution_count": null,
174
+ "id": "fbfc157e",
175
+ "metadata": {},
176
+ "outputs": [],
177
+ "source": [
178
+ "def attention_qkv(queries, keys, values):\n",
179
+ " \"\"\" Calculate scaled dot-product attention from queries, keys, and values matrices \"\"\"\n",
180
+ " \n",
181
+ " # Replace pass with your code.\n",
182
+ " pass\n",
183
+ "\n",
184
+ "\n",
185
+ "attention_qkv_result = attention_qkv(embedded_fr, embedded_en, embedded_en)\n",
186
+ "\n",
187
+ "print(f\"The shape of the attention_qkv function is {attention_qkv_result.shape}\")\n",
188
+ "print(f\"Some elements of the attention_qkv function are \\n{attention_qkv_result[0:2,:10]}\")"
189
+ ]
190
+ },
191
+ {
192
+ "cell_type": "markdown",
193
+ "id": "f98335f0",
194
+ "metadata": {},
195
+ "source": [
196
+ "**Expected output**\n",
197
+ "\n",
198
+ "The shape of the attention_qkv function is `(14, 300)`\n",
199
+ "\n",
200
+ "Some elements of the attention_qkv function are \n",
201
+ "```python\n",
202
+ "[[-0.04039161 -0.00275749 0.00389873 0.04842744 -0.02472726 0.01435613\n",
203
+ " -0.00370253 -0.0619686 -0.00206159 0.01615228]\n",
204
+ " [-0.04083253 -0.00245985 0.00409068 0.04830341 -0.02479128 0.01447497\n",
205
+ " -0.00355203 -0.06196036 -0.00241327 0.01582606]]\n",
206
+ "```"
207
+ ]
208
+ },
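One practical caveat worth knowing: a softmax computed as a plain `np.exp(x)`, as in this lab, can overflow for large scores. A common numerically stable variant (a sketch, not this notebook's provided solution) subtracts the maximum along the softmax axis first, which leaves the result mathematically unchanged:

```python
import numpy as np

def softmax_stable(x, axis=0):
    # Subtracting the max along the axis does not change the softmax value,
    # but it keeps np.exp from overflowing on large inputs.
    x_shifted = x - np.max(x, axis=axis, keepdims=True)
    y = np.exp(x_shifted)
    return y / np.sum(y, axis=axis, keepdims=True)
```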
209
+ {
210
+ "cell_type": "markdown",
211
+ "id": "f87131fb",
212
+ "metadata": {},
213
+ "source": [
214
+ "## Solutions"
215
+ ]
216
+ },
217
+ {
218
+ "cell_type": "markdown",
219
+ "id": "8470a024",
220
+ "metadata": {},
221
+ "source": [
222
+ "```python\n",
223
+ "def softmax(x, axis=0):\n",
224
+ " \"\"\" Calculate softmax function for an array x\n",
225
+ " \n",
226
+ " axis=0 calculates softmax across rows which means each column sums to 1 \n",
227
+ " axis=1 calculates softmax across columns which means each row sums to 1\n",
228
+ " \"\"\"\n",
229
+ " y = np.exp(x) \n",
230
+ " return y / np.expand_dims(np.sum(y, axis=axis), axis)\n",
231
+ "\n",
232
+ "def calculate_weights(queries, keys):\n",
233
+ " \"\"\" Calculate the weights for scaled dot-product attention\"\"\"\n",
234
+ " dot = np.matmul(queries, keys.T)/np.sqrt(keys.shape[1])\n",
235
+ " weights = softmax(dot, axis=1)\n",
236
+ " \n",
237
+ " assert weights.sum(axis=1)[0] == 1, \"Each row in weights must sum to 1\"\n",
238
+ " \n",
239
+ " return weights\n",
240
+ "\n",
241
+ "def attention_qkv(queries, keys, values):\n",
242
+ " \"\"\" Calculate scaled dot-product attention from queries, keys, and values matrices \"\"\"\n",
243
+ " weights = calculate_weights(queries, keys)\n",
244
+ " return np.matmul(weights, values)\n",
245
+ "```"
246
+ ]
247
+ }
248
+ ],
249
+ "metadata": {
250
+ "kernelspec": {
251
+ "display_name": "Python 3 (ipykernel)",
252
+ "language": "python",
253
+ "name": "python3"
254
+ },
255
+ "language_info": {
256
+ "codemirror_mode": {
257
+ "name": "ipython",
258
+ "version": 3
259
+ },
260
+ "file_extension": ".py",
261
+ "mimetype": "text/x-python",
262
+ "name": "python",
263
+ "nbconvert_exporter": "python",
264
+ "pygments_lexer": "ipython3",
265
+ "version": "3.10.11"
266
+ }
267
+ },
268
+ "nbformat": 4,
269
+ "nbformat_minor": 5
270
+ }
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/C4W1_Basic_Attention.ipynb ADDED
@@ -0,0 +1,324 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "9c74bac5",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Basic Attention Operation: Ungraded Lab\n",
9
+ "\n",
10
+ "As you've learned, attention allows a seq2seq decoder to use information from each encoder step instead of just the final encoder hidden state. In the attention operation, the encoder outputs are weighted based on the decoder hidden state, then combined into one context vector. This vector is then used as input to the decoder to predict the next output step.\n",
11
+ "\n",
12
+ "In this ungraded lab, you'll implement a basic attention operation as described in [Bahdanau, et al. (2014)](https://arxiv.org/abs/1409.0473) using NumPy.\n",
13
+ "\n",
14
+ "This is a practice notebook where you can try writing the code yourself. All of the solutions are provided at the end of the notebook."
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "code",
19
+ "execution_count": 1,
20
+ "id": "a5288920",
21
+ "metadata": {},
22
+ "outputs": [],
23
+ "source": [
24
+ "# Import the libraries and define the functions you will need for this lab\n",
25
+ "import numpy as np\n",
26
+ "\n",
27
+ "def softmax(x, axis=0):\n",
28
+ " \"\"\" Calculate softmax function for an array x along specified axis\n",
29
+ " \n",
30
+ " axis=0 calculates softmax across rows which means each column sums to 1 \n",
31
+ " axis=1 calculates softmax across columns which means each row sums to 1\n",
32
+ " \"\"\"\n",
33
+ " return np.exp(x) / np.expand_dims(np.sum(np.exp(x), axis=axis), axis)"
34
+ ]
35
+ },
36
+ {
37
+ "cell_type": "markdown",
38
+ "id": "9a6e0293",
39
+ "metadata": {},
40
+ "source": [
41
+ "## 1: Calculating alignment scores\n",
42
+ "\n",
43
+ "The first step is to calculate the alignment scores. This is a measure of similarity between the decoder hidden state and each encoder hidden state. From the paper, this operation looks like\n",
44
+ "\n",
45
+ "$$\n",
46
+ "\\large e_{ij} = v_a^\\top \\tanh{\\left(W_a s_{i-1} + U_a h_j\\right)}\n",
47
+ "$$\n",
48
+ "\n",
49
+ "where $W_a \\in \\mathbb{R}^{n\\times m}$, $U_a \\in \\mathbb{R}^{n \\times m}$, and $v_a \\in \\mathbb{R}^m$\n",
50
+ "are the weight matrices and $n$ is the hidden state size. In practice, this is implemented as a feedforward neural network with two layers, where $m$ is the size of the layers in the alignment network. It looks something like:\n",
51
+ "\n",
52
+ "![alignment model](./images/alignment_model_3.jpg)\n",
53
+ "\n",
54
+ "Here $h_j$ are the encoder hidden states for each input step $j$ and $s_{i - 1}$ is the decoder hidden state of the previous step. The first layer corresponds to $W_a$ and $U_a$, while the second layer corresponds to $v_a$.\n",
55
+ "\n",
56
+ "To implement this, first concatenate the encoder and decoder hidden states to produce an array with size $K \\times 2n$ where $K$ is the number of encoder states/steps. For this, use `np.concatenate` ([docs](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html)). Note that there is only one decoder state so you'll need to reshape it to successfully concatenate the arrays. The easiest way is to use `decoder_state.repeat` ([docs](https://numpy.org/doc/stable/reference/generated/numpy.repeat.html#numpy.repeat)) to match the hidden state array size.\n",
57
+ "\n",
58
+ "Then, apply the first layer as a matrix multiplication between the weights and the concatenated input. Use the tanh function to get the activations. Finally, compute the matrix multiplication of the second layer weights and the activations. This returns the alignment scores."
59
+ ]
60
+ },
61
+ {
62
+ "cell_type": "code",
63
+ "execution_count": 19,
64
+ "id": "72857076",
65
+ "metadata": {},
66
+ "outputs": [],
67
+ "source": [
68
+ "hidden_size = 16\n",
69
+ "attention_size = 10\n",
70
+ "input_length = 5\n",
71
+ "\n",
72
+ "np.random.seed(42)\n",
73
+ "\n",
74
+ "# Synthetic vectors used to test\n",
75
+ "encoder_states = np.random.randn(input_length, hidden_size)\n",
76
+ "decoder_state = np.random.randn(1, hidden_size)\n",
77
+ "\n",
78
+ "# Weights for the neural network, these are typically learned through training\n",
79
+ "# Use these in the alignment function below as the layer weights\n",
80
+ "layer_1 = np.random.randn(2 * hidden_size, attention_size)\n",
81
+ "layer_2 = np.random.randn(attention_size, 1)\n",
82
+ "\n",
83
+ "# Implement this function. Replace None with your code. Solution at the bottom of the notebook\n",
84
+ "def alignment(encoder_states, decoder_state):\n",
85
+ " # First, concatenate the encoder states and the decoder state\n",
86
+ " inputs = np.concatenate((encoder_states, np.repeat(decoder_state, input_length, axis=0)),axis=1)\n",
87
+ " assert inputs.shape == (input_length, 2 * hidden_size)\n",
88
+ " \n",
89
+ " # Matrix multiplication of the concatenated inputs and layer_1, with tanh activation\n",
90
+ " activations = np.tanh(np.dot(inputs,layer_1))\n",
91
+ " assert activations.shape == (input_length, attention_size)\n",
92
+ " \n",
93
+ " # Matrix multiplication of the activations with layer_2. Remember that you don't need tanh here\n",
94
+ " scores = np.dot(activations, layer_2)\n",
95
+ " assert scores.shape == (input_length, 1)\n",
96
+ " \n",
97
+ " return scores"
98
+ ]
99
+ },
100
+ {
101
+ "cell_type": "code",
102
+ "execution_count": 20,
103
+ "id": "fb638355",
104
+ "metadata": {},
105
+ "outputs": [
106
+ {
107
+ "name": "stdout",
108
+ "output_type": "stream",
109
+ "text": [
110
+ "[[4.35790943]\n",
111
+ " [5.92373433]\n",
112
+ " [4.18673175]\n",
113
+ " [2.11437202]\n",
114
+ " [0.95767155]]\n"
115
+ ]
116
+ }
117
+ ],
118
+ "source": [
119
+ "# Run this to test your alignment function\n",
120
+ "scores = alignment(encoder_states, decoder_state)\n",
121
+ "print(scores)"
122
+ ]
123
+ },
124
+ {
125
+ "cell_type": "markdown",
126
+ "id": "f26aae76",
127
+ "metadata": {},
128
+ "source": [
129
+ "If you implemented the function correctly, you should get these scores:\n",
130
+ "\n",
131
+ "```python\n",
132
+ "[[4.35790943]\n",
133
+ " [5.92373433]\n",
134
+ " [4.18673175]\n",
135
+ " [2.11437202]\n",
136
+ " [0.95767155]]\n",
137
+ "```"
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "markdown",
142
+ "id": "58b8cfa9",
143
+ "metadata": {},
144
+ "source": [
145
+ "## 2: Turning alignment into weights\n",
146
+ "\n",
147
+ "The next step is to calculate the weights from the alignment scores. These weights determine the encoder outputs that are the most important for the decoder output. These weights should be between 0 and 1. You can use the softmax function (which is already implemented above) to get these weights from the attention scores. Pass the attention scores vector to the softmax function to get the weights. Mathematically,\n",
148
+ "\n",
149
+ "$$\n",
150
+ "\\large \\alpha_{ij} = \\frac{\\exp{\\left(e_{ij}\\right)}}{\\sum_{k=1}^K \\exp{\\left(e_{ik}\\right)}}\n",
151
+ "$$\n",
152
+ "\n",
153
+ "\n",
154
+ "\n",
155
+ "## 3: Weight the encoder output vectors and sum\n",
156
+ "\n",
157
+ "The weights tell you the importance of each input word with respect to the decoder state. In this step, you use the weights to modulate the magnitude of the encoder vectors. Words with little importance will be scaled down relative to important words. Multiply each encoder vector by its respective weight to get the alignment vectors, then sum up the weighted alignment vectors to get the context vector. Mathematically,\n",
158
+ "\n",
159
+ "$$\n",
160
+ "\\large c_i = \\sum_{j=1}^K\\alpha_{ij} h_{j}\n",
161
+ "$$\n",
162
+ "\n",
163
+ "Implement these steps in the `attention` function below."
164
+ ]
165
+ },
166
+ {
167
+ "cell_type": "code",
168
+ "execution_count": 21,
169
+ "id": "4546cbb5",
170
+ "metadata": {},
171
+ "outputs": [
172
+ {
173
+ "name": "stdout",
174
+ "output_type": "stream",
175
+ "text": [
176
+ "[-0.63514569 0.04917298 -0.43930867 -0.9268003 1.01903919 -0.43181409\n",
177
+ " 0.13365099 -0.84746874 -0.37572203 0.18279832 -0.90452701 0.17872958\n",
178
+ " -0.58015282 -0.58294027 -0.75457577 1.32985756]\n"
179
+ ]
180
+ }
181
+ ],
182
+ "source": [
183
+ "# Implement this function. Replace None with your code.\n",
184
+ "def attention(encoder_states, decoder_state):\n",
185
+ " \"\"\" Example function that calculates attention, returns the context vector \n",
186
+ " \n",
187
+ " Arguments:\n",
188
+ " encoder_vectors: NxM numpy array, where N is the number of vectors and M is the vector length\n",
189
+ " decoder_vector: 1xM numpy array, M is the vector length, much be the same M as encoder_vectors\n",
190
+ " \"\"\" \n",
191
+ " \n",
192
+ " # First, calculate the alignment scores\n",
193
+ " scores = alignment(encoder_states, decoder_state)\n",
194
+ " \n",
195
+ " # Then take the softmax of the alignment scores to get a weight distribution\n",
196
+ " weights = softmax(scores)\n",
197
+ " \n",
198
+ " # Multiply each encoder state by its respective weight\n",
199
+ " weighted_scores = weights * encoder_states\n",
200
+ " \n",
201
+ " # Sum up weighted alignment vectors to get the context vector and return it\n",
202
+ " context = np.sum(weighted_scores,axis=0)\n",
203
+ " return context\n",
204
+ "\n",
205
+ "context_vector = attention(encoder_states, decoder_state)\n",
206
+ "print(context_vector)"
207
+ ]
208
+ },
209
+ {
210
+ "cell_type": "markdown",
211
+ "id": "5d9f3df4",
212
+ "metadata": {},
213
+ "source": [
214
+ "If you implemented the `attention` function correctly, the context vector should be\n",
215
+ "\n",
216
+ "```python\n",
217
+ "[-0.63514569 0.04917298 -0.43930867 -0.9268003 1.01903919 -0.43181409\n",
218
+ " 0.13365099 -0.84746874 -0.37572203 0.18279832 -0.90452701 0.17872958\n",
219
+ " -0.58015282 -0.58294027 -0.75457577 1.32985756]\n",
220
+ "```\n",
221
+ "\n"
222
+ ]
223
+ },
224
+ {
225
+ "cell_type": "markdown",
226
+ "id": "4210899c",
227
+ "metadata": {},
228
+ "source": [
229
+ "## See below for solutions"
230
+ ]
231
+ },
232
+ {
233
+ "cell_type": "markdown",
234
+ "id": "3ba0d629",
235
+ "metadata": {},
236
+ "source": [
237
+ "```python\n",
238
+ "# Solution\n",
239
+ "def alignment(encoder_states, decoder_state):\n",
240
+ " # First, concatenate the encoder states and the decoder state.\n",
241
+ " inputs = np.concatenate((encoder_states, decoder_state.repeat(input_length, axis=0)), axis=1)\n",
242
+ " assert inputs.shape == (input_length, 2*hidden_size)\n",
243
+ " \n",
244
+ " # Matrix multiplication of the concatenated inputs and the first layer, with tanh activation\n",
245
+ " activations = np.tanh(np.matmul(inputs, layer_1))\n",
246
+ " assert activations.shape == (input_length, attention_size)\n",
247
+ " \n",
248
+ " # Matrix multiplication of the activations with the second layer. Remember that you don't need tanh here\n",
249
+ " scores = np.matmul(activations, layer_2)\n",
250
+ " assert scores.shape == (input_length, 1)\n",
251
+ " \n",
252
+ " return scores\n",
253
+ "\n",
254
+ "# Run this to test your alignment function\n",
255
+ "scores = alignment(encoder_states, decoder_state)\n",
256
+ "print(scores)\n",
257
+ "```"
258
+ ]
259
+ },
260
+ {
261
+ "cell_type": "markdown",
262
+ "id": "f80faecb",
263
+ "metadata": {},
264
+ "source": [
265
+ "```python\n",
266
+ "# Solution\n",
267
+ "def attention(encoder_states, decoder_state):\n",
268
+ " \"\"\" Example function that calculates attention, returns the context vector \n",
269
+ " \n",
270
+ " Arguments:\n",
271
+ " encoder_vectors: NxM numpy array, where N is the number of vectors and M is the vector length\n",
272
+ " decoder_vector: 1xM numpy array, M is the vector length, much be the same M as encoder_vectors\n",
273
+ " \"\"\" \n",
274
+ " \n",
275
+ " # First, calculate the dot product of each encoder vector with the decoder vector\n",
276
+ " scores = alignment(encoder_states, decoder_state)\n",
277
+ " \n",
278
+ " # Then take the softmax of those scores to get a weight distribution\n",
279
+ " weights = softmax(scores)\n",
280
+ " \n",
281
+ " # Multiply each encoder state by its respective weight\n",
282
+ " weighted_scores = encoder_states * weights\n",
283
+ " \n",
284
+ " # Sum up the weights encoder states\n",
285
+ " context = np.sum(weighted_scores, axis=0)\n",
286
+ " \n",
287
+ " return context\n",
288
+ "\n",
289
+ "context_vector = attention(encoder_states, decoder_state)\n",
290
+ "print(context_vector)\n",
291
+ "```"
292
+ ]
293
+ },
294
+ {
295
+ "cell_type": "code",
296
+ "execution_count": null,
297
+ "id": "16a6caa8",
298
+ "metadata": {},
299
+ "outputs": [],
300
+ "source": []
301
+ }
302
+ ],
303
+ "metadata": {
304
+ "kernelspec": {
305
+ "display_name": "Python 3 (ipykernel)",
306
+ "language": "python",
307
+ "name": "python3"
308
+ },
309
+ "language_info": {
310
+ "codemirror_mode": {
311
+ "name": "ipython",
312
+ "version": 3
313
+ },
314
+ "file_extension": ".py",
315
+ "mimetype": "text/x-python",
316
+ "name": "python",
317
+ "nbconvert_exporter": "python",
318
+ "pygments_lexer": "ipython3",
319
+ "version": "3.10.11"
320
+ }
321
+ },
322
+ "nbformat": 4,
323
+ "nbformat_minor": 5
324
+ }
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/C4W1_Bleu_Score.ipynb ADDED
@@ -0,0 +1,585 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Calculating the Bilingual Evaluation Understudy (BLEU) score: Ungraded Lab"
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "markdown",
12
+ "metadata": {},
13
+ "source": [
14
+ "In this ungraded lab, you will implement a popular metric for evaluating the quality of machine-translated text: the BLEU score proposed by Kishore Papineni, et al. in their 2002 paper [\"BLEU: a Method for Automatic Evaluation of Machine Translation\"](https://www.aclweb.org/anthology/P02-1040.pdf). The BLEU score works by comparing a \"candidate\" text to one or more \"reference\" texts. The higher the score, the better the translation. In the following sections, you will calculate this value using your own implementation as well as with functions from a library."
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {},
20
+ "source": [
21
+ "# 1. Importing the Libraries\n",
22
+ "\n",
23
+ "You will start by importing the Python libraries. First, you will implement your own version of the BLEU Score using NumPy. To verify that your implementation is correct, you will compare the results with those generated by the [SacreBLEU library](https://github.com/mjpost/sacrebleu). This package provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. It also knows all the standard test sets and handles downloading, processing, and tokenization."
24
+ ]
25
+ },
26
+ {
27
+ "cell_type": "code",
28
+ "execution_count": 1,
29
+ "metadata": {
30
+ "tags": []
31
+ },
32
+ "outputs": [
33
+ {
34
+ "name": "stderr",
35
+ "output_type": "stream",
36
+ "text": [
37
+ "[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...\n",
38
+ "[nltk_data] Package punkt is already up-to-date!\n"
39
+ ]
40
+ },
41
+ {
42
+ "name": "stdout",
43
+ "output_type": "stream",
44
+ "text": [
45
+ "Requirement already satisfied: sacrebleu in /opt/conda/lib/python3.10/site-packages (2.3.1)\n",
46
+ "Requirement already satisfied: portalocker in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (2.8.2)\n",
47
+ "Requirement already satisfied: regex in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (2023.10.3)\n",
48
+ "Requirement already satisfied: tabulate>=0.8.9 in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (0.9.0)\n",
49
+ "Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (1.24.3)\n",
50
+ "Requirement already satisfied: colorama in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (0.4.6)\n",
51
+ "Requirement already satisfied: lxml in /opt/conda/lib/python3.10/site-packages (from sacrebleu) (4.9.3)\n"
52
+ ]
53
+ }
54
+ ],
55
+ "source": [
56
+ "import numpy as np # import numpy to make numerical computations.\n",
57
+ "import nltk # import NLTK to handle simple NL tasks like tokenization.\n",
58
+ "nltk.download(\"punkt\")\n",
59
+ "from nltk.util import ngrams\n",
60
+ "from collections import Counter # import a counter.\n",
61
+ "!pip3 install 'sacrebleu' # install the sacrebleu package.\n",
62
+ "import sacrebleu # import sacrebleu in order compute the BLEU score.\n",
63
+ "import matplotlib.pyplot as plt # import pyplot in order to make some illustrations."
64
+ ]
65
+ },
66
+ {
67
+ "cell_type": "markdown",
68
+ "metadata": {},
69
+ "source": [
70
+ "# 2. BLEU score\n",
71
+ "\n",
72
+ "## 2.1 Definitions and formulas\n",
73
+ "\n",
74
+ "You have seen how to calculate the BLEU score in this week's lectures. Formally, you can express the BLEU score as:\n",
75
+ "\n",
76
+ "$$BLEU = BP\\times\\Bigl(\\prod_{i=1}^{n}precision_i\\Bigr)^{(1/n)}.\\tag{1}$$\n",
77
+ "\n",
78
+ "\n",
79
+ "The BLEU score depends on the $BP$, which stands for Brevity Penalty, and the weighted geometric mean precision for different lengths of n-grams, both of which are described below. The product runs from $i=1$ to $i=n$ to account for 1-grams to n-grams and the exponent of $1/n$ is there to calculate the geometrical average. In this notebook, you will use $n=4$\n",
80
+ "\n",
81
+ "The **Brevity Penalty** is defined as an exponential decay:\n",
82
+ "\n",
83
+ "$$BP = min\\Bigl(1, e^{(1-({len(ref)}/{len(cand)}))}\\Bigr),\\tag{2}$$\n",
84
+ "\n",
85
+ "where ${len(ref)}$ and ${len(cand)}$ refer to the length or count of words in the reference and candidate translations. The brevity penalty helps to handle very short translations. \n",
86
+ "\n",
87
+ "The **precision** is defined as :\n",
88
+ "\n",
89
+ "$$precision_i = \\frac {\\sum_{s_i \\in{cand}}min\\Bigl(C(s_i, cand), C(s_i, ref)\\Bigr)}{\\sum_{s_i \\in{cand}} C(s_i, cand)}.\\tag{3}$$\n",
90
+ "\n",
91
+ "The sum goes over all the i-grams $s_i$ in the candidate sentence $cand$. $C(s_i, cand)$ and $C(s_i, ref)$ are the counts of the i-grams in the candidate and reference sentences respectively. So the sum counts all the n-grams in the candidate sentence that also appear in the reference sentence, but only counts them as many times as they appear in the reference sentence and not more. This is then divided by the total number of i-grams in the candidate sentence."
92
+ ]
93
+ },
94
+ {
95
+ "cell_type": "markdown",
96
+ "metadata": {},
97
+ "source": [
98
+ "## 2.2 Visualizing the BLEU score\n",
99
+ "\n",
100
+ "### Brevity Penalty:\n",
101
+ "The brevity penalty penalizes generated translations that are shorter than the reference sentence. It compensates for the fact that the BLEU score has no recall term."
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "code",
106
+ "execution_count": 2,
107
+ "metadata": {},
108
+ "outputs": [
109
+ {
110
+ "data": {
111
+ "image/png": "<base64-encoded PNG omitted: brevity penalty curve>",
112
+ "text/plain": [
113
+ "<Figure size 640x480 with 1 Axes>"
114
+ ]
115
+ },
116
+ "metadata": {},
117
+ "output_type": "display_data"
118
+ }
119
+ ],
120
+ "source": [
121
+ "reference_length = 1\n",
122
+ "candidate_length = np.linspace(1.5, 0.5, 100)\n",
123
+ "\n",
124
+ "length_ratio = reference_length / candidate_length\n",
125
+ "BP = np.minimum(1, np.exp(1 - length_ratio))\n",
126
+ "\n",
127
+ "# Plot the data\n",
128
+ "fig, ax = plt.subplots(1)\n",
129
+ "lines = ax.plot(length_ratio, BP)\n",
130
+ "ax.set(\n",
131
+ " xlabel=\"Ratio of the length of the reference to the candidate text\",\n",
132
+ " ylabel=\"Brevity Penalty\",\n",
133
+ ")\n",
134
+ "plt.show()"
135
+ ]
136
+ },
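To make the curve concrete, here is a small worked sketch of the same formula on two hypothetical length pairs (the numbers follow directly from the `BP = np.minimum(1, np.exp(1 - length_ratio))` expression used above):

```python
import numpy as np

# Candidate shorter than the reference (c=20, r=25): penalized
bp_short = np.minimum(1, np.exp(1 - 25 / 20))  # exp(-0.25) ~ 0.78

# Candidate at least as long as the reference (c=30, r=25): no penalty
bp_long = np.minimum(1, np.exp(1 - 25 / 30))   # clipped to 1.0

print(bp_short, bp_long)
```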
137
+ {
138
+ "cell_type": "markdown",
139
+ "metadata": {},
140
+ "source": [
141
+ "### N-Gram Precision:\n",
142
+ "The n-gram precision counts how many n-grams (in your case unigrams, bigrams, trigrams, and four-grams for i =1 , ... , 4) match their n-gram counterpart in the reference translations. This term acts as a precision metric. Unigrams account for adequacy while longer n-grams account for fluency of the translation. To avoid overcounting, the n-gram counts are clipped to the maximal n-gram count occurring in the reference ($m_{n}^{ref}$). Typically precision shows exponential decay with the degree of the n-gram."
143
+ ]
144
+ },
145
+ {
146
+ "cell_type": "code",
147
+ "execution_count": 3,
148
+ "metadata": {},
149
+ "outputs": [
150
+ {
151
+ "data": {
152
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGdCAYAAADuR1K7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAseElEQVR4nO3de1BV9d7H8c8WZGMomJB4aYuUphTZMbBzwDyWF4qcTj3TxW6aCnMiyiTqlGTlrRN2I7pB+qjHPJVxyuwykrVPpWJkTyJOPWlXtU20kcAOoBYErOcPxz3PDlSWbNiwfL9m1kzrt35rre/2N46ffutmMwzDEAAAgEX08HcBAAAAvkS4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlhLo7wI6W3Nzs3788Uf16dNHNpvN3+UAAIA2MAxDdXV1GjRokHr0OPbczEkXbn788Uc5HA5/lwEAAE5AWVmZTj/99GP2OenCTZ8+fSQd/sMJDQ31czUAAKAtamtr5XA4PP+OH8tJF26OXIoKDQ0l3AAA0M205ZYSbigGAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACW4vdwk5eXp+joaAUHBysuLk5FRUXH7P/SSy/pvPPO0ymnnKKBAwdq5syZqq6u7qRqAQBAV+fXcFNQUKCMjAzNmzdPpaWlGjdunJKTk+VyuVrtv2XLFk2fPl0pKSn64osv9Oqrr+rTTz9VampqJ1cOAAC6Kr+Gm5ycHKWkpCg1NVUxMTHKzc2Vw+FQfn5+q/23bt2qoUOH6o477lB0dLQuvPBC3XLLLdq2bVsnVw4AALoqv4WbhoYGlZSUKCkpyas9KSlJxcXFre6TmJioH374QYWFhTIMQ/v27dNrr72mKVOmHPU89fX1qq2t9VoAAIB1BfrrxFVVVWpqalJkZKRXe2RkpCoqKlrdJzExUS+99JKmTp2qX3/9VY2NjfrLX/6iZ5555qjnyc7O1sKFC31a+7EMnbu+084Fb3uXHD3kAgBOHn6/odhms3mtG4bRou2InTt36o477tCDDz6okpISbdiwQXv27FFaWtpRj5+VlaWamhrPUlZW5tP6AQBA1+K3mZuIiAgFBAS0mKWprKxsMZtzRHZ2tsaOHau//e1vkqRRo0YpJCRE48aN00MPPaSBAwe22Mdut8tut/v+BwAAgC7JbzM3QUFBiouLk9Pp9Gp3Op1KTExsdZ9Dhw6pRw/vkgMCAiQdnvEBAADw62WpzMxMLV++XCtXrtSuXbt05513yuVyeS4zZWVlafr06Z7+l19+uV5//XXl5+dr9+7d+uijj3THHXfoggsu0KBBg/z1MwAAQBfit8tSkjR16lRVV1dr0aJFcrvdio2NVWFhoaKioiRJbrfb6503M2bMUF1dnZ599lnddddd6tu3ryZMmKBHHnnEXz8BAAB0MTbjJLueU1tbq7CwMNXU1Cg0NNTnx+dpKf/haSkAsC4z/377/WkpAAAAXyLcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAASyHcAAAAS/F7uMnLy1N0dLSCg4MVFxenoqKio/adMWOGbDZbi+Wcc87pxIoBAEBX5tdwU1BQoIyMDM2bN0+lpaUaN26ckpOT5XK5Wu3/1FNPye12e5aysjL169dP11xzTSdXDgAAuiq/hpucnBylpKQoNTVVMTExys3NlcPhUH5+fqv9w8LCNGDAAM+ybds2/fzzz5o5c2YnVw4AALoqv4WbhoYGlZSUKCkpyas9KSlJxcXFbTrGihUrNGnSJEVFRR21T319vWpra70WAABgXYH+OnFVVZWampoUGRnp1R4ZGamKiorj7u92u/XOO+/o5ZdfPma/7OxsLVy4sF21ApI0dO56f5dw0tq7ZIq/SwDQjfj9hmKbzea1bhhGi7bWrFq1Sn379tWVV155zH5ZWVmqqanxLGVlZe0pFwAAdHF+m7mJiIhQQEBAi1maysrKFrM5v2cYhlauXKlp06YpKCjomH3tdrvsdnu76wUAAN2D32ZugoKCFBcXJ6fT6dXudDqVmJh4zH03bdqkb7/9VikpKR1ZIgAA6Ib8NnMjSZmZmZo2bZri4+OVkJCgZcuWyeVyKS0tTdLhS0rl5eVavXq1134rVqzQH//4R8XGxvqjbAAA0IX5NdxMnTpV1dXVWrRokdxut2JjY1VYWOh5+sntdrd4501NTY3Wrl2rp556yh8lAwCALs6v4UaS0tPTlZ6e3uq2VatWtWgLCwvToUOHOrgqAADQXfn9aSkAAABfItwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABL8Xu4ycvLU3R0tIKDgxUXF6eioqJj9q+vr9e8efMUFRUlu92uM888UytXruykagEAQFcX6M+TFxQUKCMjQ3l5eRo7dqyWLl2q5ORk7dy5U0OGDGl1n2uvvVb79u3TihUrNGzYMFVWVqqxsbGTKwcAAF2VX8NNTk6OUlJSlJqaKknKzc3Vu+++q/z8fGVnZ7fov2HDBm3atEm7d+9Wv379JElDhw7tzJIBAEAX57fLUg0NDSopKVFSUpJXe1JSkoqLi1vd56233lJ8fLweffRRDR48WGeddZbuvvtu/fLLL0c9T319vWpra70WAABgXX6buamqqlJTU5MiIyO92iMjI1VRUdHqPrt379aWLVsUHBysdevWqaqqSunp6dq/f/9R77vJzs7WwoULfV4/AGsYOne9v0s4ae1dMsXfJcCi/H5Dsc1m81o3DKNF2xHNzc2y2Wx66aWXdMEFF+iyyy5TTk6OVq1addTZm6ysLNXU1HiWsrIyn/8GAADQdfht5iYiIkIBAQEtZmkqKytbzOYcMXDgQA0ePFhhYWGetpiYGBmGoR9++EHDhw9vsY/dbpfdbvdt8QAAoMvy28xNUFCQ4uLi5HQ6vdqdTqcSExNb3Wfs2LH68ccfdeDAAU/b119/rR49euj000/v0HoBAED34NfLUpmZmVq+fLlWrlypXbt26c4775TL5VJaWpqkw5eUpk+f7ul/ww03KDw8XDNnztTOnTu1efNm/e1
vf9OsWbPUq1cvf/0MAADQhfj1UfCpU6equrpaixYtktvtVmxsrAoLCxUVFSVJcrvdcrlcnv69e/eW0+nU7NmzFR8fr/DwcF177bV66KGH/PUTAABAF+PXcCNJ6enpSk9Pb3XbqlWrWrSNHDmyxaUsAACAI/z+tBQAAIAvEW4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClBJrd4eDBg1qyZInef/99VVZWqrm52Wv77t27fVYcAACAWabDTWpqqjZt2qRp06Zp4MCBstlsHVEXAADACTEdbt555x2tX79eY8eO7Yh6AAAA2sX0PTennnqq+vXr1xG1AAAAtJvpcLN48WI9+OCDOnToUEfUAwAA0C6mL0s98cQT+u677xQZGamhQ4eqZ8+eXtu3b9/us+IAAADMMh1urrzyyg4oAwAAwDdMh5v58+d3RB0AAAA+YTrcHFFSUqJdu3bJZrPp7LPP1ujRo31ZFwAAwAkxHW4qKyt13XXXaePGjerbt68Mw1BNTY0uvvhivfLKKzrttNM6ok4AAIA2Mf201OzZs1VbW6svvvhC+/fv188//6z//d//VW1tre64446OqBEAAKDNTM/cbNiwQf/+978VExPjaTv77LP13HPPKSkpyafFAQAAmGV65qa5ubnF49+S1LNnzxbfmQIAAOhspsPNhAkTNGfOHP3444+etvLyct15552aOHGiT4sDAAAwy3S4efbZZ1VXV6ehQ4fqzDPP1LBhwxQdHa26ujo988wzHVEjAABAm5m+58bhcGj79u1yOp368ssvZRiGzj77bE2aNKkj6gMAADDlhN9zM3nyZE2ePNmXtQAAALRbm8LN008/rb/+9a8KDg7W008/fcy+PA4OAAD8qU3h5sknn9SNN96o4OBgPfnkk0ftZ7PZTIebvLw8PfbYY3K73TrnnHOUm5urcePGtdp348aNuvjii1u079q1SyNHjjR1XgAAYE1tCjd79uxp9b/bq6CgQBkZGcrLy9PYsWO1dOlSJScna+fOnRoyZMhR9/vqq68UGhrqWeetyAAA4AjTT0v9XlNTk3bs2KGff/7Z9L45OTlKSUlRamqqYmJilJubK4fDofz8/GPu179/fw0YMMCzBAQEnGj5AADAYkyHm4yMDK1YsULS4WDz5z//Weeff74cDoc2btzY5uM0NDSopKSkxVuNk5KSVFxcfMx9R48erYEDB2rixIn68MMPj9m3vr5etbW1XgsAALAu009Lvfbaa7rpppskSW+//bb27t2rL7/8UqtXr9a8efP00Ucftek4VVVVampqUmRkpFd7ZGSkKioqWt1n4MCBWrZsmeLi4lRfX69//vOfmjhxojZu3Kg///nPre6TnZ2thQsXmviFAAArGDp3vb9LOGntXTLFr+c3HW6qqqo0YMAASVJhYaGuueYanXXWWUpJSTnuk1StsdlsXuuGYbRoO2LEiBEaMWKEZz0hIUFlZWV6/PHHjxpusrKylJmZ6Vmvra2Vw+EwXScAAOgeTF+WioyM1M6dO9XU1KQNGzZ4Xt536NAhU/e+REREKCAgoMUsTWVlZYvZnGP505/+pG+++eao2+12u0JDQ70WAABgXabDzcyZM3XttdcqNjZWNpvN8yK/Tz75xNTj2EFBQYqLi5PT6fRqdzqdSkxMbPNxSktLNXDgwDb3BwAA1mb6stSCBQsUGxursrIyXXPNNbLb7ZKkgIAAzZ0719SxMjMzNW3aNMXHxyshIUHLli2Ty+VSWlqapMOXlMrLy7V69WpJUm5uroYOHapzzjlHDQ0NevHFF7V27VqtXbvW7M8AAAAWdUKfX7j66qtbtN18882mjzN16lRVV1dr0aJFcrvdio2NVWFhoaKioiRJbrdbLpfL07+hoUF33323ysvL1atXL51zzjlav369LrvsshP5GQAAwIL8/vmF9PR0paent7pt1apVXuv33HOP7rnnHlPHBwAAJxe/f34BAADAl/z6+QUAAABfa/fnFwAAALoS0+Hm6quv1pIlS1q0P/bYY7rmmmt8UhQAAMCJMh1uNm3apClTWr5W+dJLL9XmzZt9UhQAAMCJMh1uDhw4oKCgoBbtPXv25KOUAADA70yHm9jYWBUUFLRof+WVV3T22Wf7pCgAAIATZfolfg888ICuuuoqfffdd5owYYIk6f3339eaNWv06quv+rxAAAAAM0yHm7/85S9644039PDDD+u1115Tr169NGrUKP373//W+PHjO6JGAACANjuhzy9MmTKl1ZuKAQAA/O2E3nPzn//8R8uXL9d9992n/fv3S5K2b9+u8vJynxYHAABglumZm88++0yTJk1SWFiY9u7dq9TUVPXr10/r1q3T999/7/mCNwAAgD+YnrnJzMzUjBkz9M033yg4ONjTnpyczHtuAACA35kON59++qluueWWFu2DBw9WRUWFT4oCAAA4UabDTXBwcKsv6/vqq6902mmn+aQoAACAE2U63FxxxRVatGiRfvvtN0mSzWaTy+XS3LlzddVVV/m8QAAAADNMh5vHH39cP/30k/r3769ffvlF48eP17Bhw9SnTx/9/e9/74gaAQAA2sz001KhoaHasmWLPvjgA23fvl3Nzc06//zzNWnSpI6oDwAAwBRT4aaxsVHBwcHasWOHJkyY4Pn8AgAAQFdh6rJUYGCgoqKi1NTU1FH1AAAAtIvpe27uv/9+ZWVled5MDAAA0JWYvufm6aef1rfffqtBgwYpKipKISEhXtu3b9/us+IAAADMMh1urrzyyg4oAwAAwDdMh5v58+d3RB0AAAA+YTrcHLFt2zbt2rVLNptNMTExiouL82VdAAAAJ8R0uPnhhx90/fXX66OPPlLfvn0lSf/5z3+UmJioNWvWyOFw+LpGAACANjP9tNSsWbP022+/adeuXdq/f7/279+vXbt2yTAMpaSkdESNAAAAbWZ65qaoqEjFxcUaMWKEp23EiBF65plnNHbsWJ8WBwAAYJbpmZshQ4Z4Ppr5/zU2Nmrw4ME+KQoAAOBEmQ43jz76qGbPnq1t27bJMAxJh28unjNnjh5//HGfFwgAAGCG6ctSM2bM0KFDh/THP/5RgYGHd29sbFRgYKBmzZqlWbNmefryFmMAANDZTIeb3NzcDigDAADAN0yHm5tvvrkj6gAAAPAJ0/fc+FpeXp6io6MVHBysuLg4FRUVtWm/jz76SIGBgfrDH/7QsQUCAIBuxa/hpqCgQBkZGZo3b55KS0s1btw4JScny+VyHXO/mpoaTZ8+XRMnTuykSgEAQHfh13CTk5OjlJQUpaamKiYmRrm5uXI4HMrPzz/mfrfccotuuOEGJSQkdFKlAACgu/BbuGloaFBJSYmSkpK82pOSklRcXHzU/f7xj3/ou+++a/MHPOvr61VbW+u1AAAA6/JbuKmqqlJTU5MiIyO92iMjI1VRUdHqPt98843mzp2rl156yfMY+vFkZ2crLCzMs/DtKwAArM
3001K//vqrnnnmGX344YeqrKxUc3Oz1/bt27ebOp7NZvNaNwyjRZskNTU16YYbbtDChQt11llntfn4WVlZyszM9KzX1tYScAAAsDDT4WbWrFlyOp26+uqrdcEFF7QaRNoiIiJCAQEBLWZpKisrW8zmSFJdXZ22bdum0tJS3X777ZKk5uZmGYahwMBAvffee5owYUKL/ex2u+x2+wnVCAAAuh/T4Wb9+vUqLCxs90cyg4KCFBcXJ6fTqf/6r//ytDudTl1xxRUt+oeGhurzzz/3asvLy9MHH3yg1157TdHR0e2qBwAAWIPpcDN48GD16dPHJyfPzMzUtGnTFB8fr4SEBC1btkwul0tpaWmSDl9SKi8v1+rVq9WjRw/FxsZ67d+/f38FBwe3aAcAACcv0+HmiSee0L333qvnn39eUVFR7Tr51KlTVV1drUWLFsntdis2NlaFhYWe47rd7uO+8wYAAOD/Mx1u4uPj9euvv+qMM87QKaecop49e3ptN/uxzPT0dKWnp7e6bdWqVcfcd8GCBVqwYIGp8wEAAGszHW6uv/56lZeX6+GHH1ZkZOQJ31AMAADQEUyHm+LiYn388cc677zzOqIeAACAdjH9Er+RI0fql19+6YhaAAAA2s10uFmyZInuuusubdy4UdXV1XzaAAAAdCmmL0tdeumlktTii9xH3izc1NTkm8oAAABOgOlw8+GHH3ZEHQAAAD5hOtyMHz++I+oAAADwCdPh5ohDhw7J5XKpoaHBq33UqFHtLgoAAOBEmQ43P/30k2bOnKl33nmn1e3ccwMAAPzJ9NNSGRkZ+vnnn7V161b16tVLGzZs0AsvvKDhw4frrbfe6ogaAQAA2sz0zM0HH3ygN998U2PGjFGPHj0UFRWlyZMnKzQ0VNnZ2ZoyZUpH1AkAANAmpmduDh48qP79+0uS+vXrp59++kmSdO6552r79u2+rQ4AAMAk0+FmxIgR+uqrryRJf/jDH7R06VKVl5fr+eef18CBA31eIAAAgBmmL0tlZGTI7XZLkubPn69LLrlEL730koKCgo77FW8AAICOZjrc3HjjjZ7/Hj16tPbu3asvv/xSQ4YMUUREhE+LAwAAMMvUZanffvtNZ5xxhnbu3OlpO+WUU3T++ecTbAAAQJdgKtz07NlT9fX1stlsHVUPAABAu5i+oXj27Nl65JFH1NjY2BH1AAAAtIvpe24++eQTvf/++3rvvfd07rnnKiQkxGv766+/7rPiAAAAzDIdbvr27aurrrqqI2oBAABoN9Ph5h//+EdH1AEAAOATpu+5AQAA6MpMz9yMHj261aelbDabgoODNWzYMM2YMUMXX3yxTwoEAAAww/TMzaWXXqrdu3crJCREF198sS666CL17t1b3333ncaMGSO3261JkybpzTff7Ih6AQAAjsn0zE1VVZXuuusuPfDAA17tDz30kL7//nu99957mj9/vhYvXqwrrrjCZ4UCAAC0hemZm3/961+6/vrrW7Rfd911+te//iVJuv766z0f1wQAAOhMpsNNcHCwiouLW7QXFxcrODhYktTc3Cy73d7+6gAAAEwyfVlq9uzZSktLU0lJicaMGSObzab/+Z//0fLly3XfffdJkt59912NHj3a58UCAAAcj+lwc//99ys6OlrPPvus/vnPf0qSRowYof/+7//WDTfcIElKS0vTrbfe6ttKAQAA2sB0uJGkG2+8UTfeeONRt/fq1euECwIAAGiPdr3ELz09XVVVVb6qBQAAoN3aFW5efPFF1dbW+qoWAACAdmtXuDEMw1d1AAAA+ITfvy2Vl5en6OhoBQcHKy4uTkVFRUftu2XLFo0dO1bh4eHq1auXRo4cqSeffLITqwUAAF3dCd1QfERdXV27Tl5QUKCMjAzl5eVp7NixWrp0qZKTk7Vz504NGTKkRf+QkBDdfvvtGjVqlEJCQrRlyxbdcsstCgkJ0V//+td21QIAAKzBrzM3OTk5SklJUWpqqmJiYpSbmyuHw6H8/PxW+48ePVrXX3+9zjnnHA0dOlQ33XSTLrnkkmPO9gAAgJNLm8NNjx49FBAQcMwlMLDtE0ENDQ0qKSlRUlKSV3tSUlKrb0BuTWlpqYqLizV+/Pij9qmvr1dtba3XAgAArKvNaWTdunVH3VZcXKxnnnnG1A3GVVVVampqUmRkpFd7ZGSkKioqjrnv6aefrp9++kmNjY1asGCBUlNTj9o3OztbCxcubHNdAACge2tzuGntC99ffvmlsrKy9Pbbb+vGG2/U4sWLTRdgs9m81g3DaNH2e0VFRTpw4IC2bt2quXPnatiwYa1+zFOSsrKylJmZ6Vmvra2Vw+EwXScAAOgeTuiG4h9//FHz58/XCy+8oEsuuUQ7duxQbGysqWNEREQoICCgxSxNZWVli9mc34uOjpYknXvuudq3b58WLFhw1HBjt9v5iCcAACcRUzcU19TU6N5779WwYcP0xRdf6P3339fbb79tOthIUlBQkOLi4uR0Or3anU6nEhMT23wcwzBUX19v+vwAAMCa2jxz8+ijj+qRRx7RgAEDtGbNmlYvU5mVmZmpadOmKT4+XgkJCVq2bJlcLpfS0tIkHb6kVF5ertWrV0uSnnvuOQ0ZMkQjR46UdPi9N48//rhmz57d7loAAIA1tDnczJ07V7169dKwYcP0wgsv6IUXXmi13+uvv97mk0+dOlXV1dVatGiR3G63YmNjVVhYqKioKEmS2+2Wy+Xy9G9ublZWVpb27NmjwMBAnXnmmVqyZIluueWWNp8TAABYW5vDzfTp0497o++JSE9PV3p6eqvbVq1a5bU+e/ZsZmkAAMAxtTnc/D5oAAAAdEV+/7YUAACALxFuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApRBuAACApfg93OTl5Sk6OlrBwcGKi4tTUVHRUfu+/vrrmjx5sk477TSFhoYqISFB7777bidWCwAAujq/hpuCggJlZGRo3rx5Ki0t1bhx45ScnCyXy9Vq/82bN2vy5MkqLCxUSUmJLr74Yl1++eUqLS3t5MoBAEBX5ddwk5OTo5SUFKWmpiomJka5ublyOBzKz89vtX9ubq7uuecejRkzRsOHD9fDDz+s4cOH6+233+7kygEAQFflt3DT0NCgkpISJSUlebUnJSWpuLi4Tcdobm5WXV2d+vXrd9Q+9fX1qq2t9VoAAIB1+S3cVFVVqampSZGRkV7tkZGRqqioaNMxnnjiCR08eFDXXnvtUftkZ2crLCzMszgcjnbVDQAAuja/31Bss9m81g3DaNHWmjVr1mjBggUqKChQ//79j9ovKytLNTU1nqWsrKzdNQMAgK4r0F8njoiIUEBAQItZm
srKyhazOb9XUFCglJQUvfrqq5o0adIx+9rtdtnt9nbXCwAAuge/zdwEBQUpLi5OTqfTq93pdCoxMfGo+61Zs0YzZszQyy+/rClTpnR0mQAAoJvx28yNJGVmZmratGmKj49XQkKCli1bJpfLpbS0NEmHLymVl5dr9erVkg4Hm+nTp+upp57Sn/70J8+sT69evRQWFua33wEAALoOv4abqVOnqrq6WosWLZLb7VZsbKwKCwsVFRUlSXK73V7vvFm6dKkaGxt122236bbbbvO033zzzVq1alVnlw8AALogv4YbSUpPT1d6enqr234fWDZu3NjxBQEAgG7N709LAQAA+BLhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWArhBgAAWIrfw01eXp6io6MVHBysuLg4FRUVHbWv2+3WDTfcoBEjRqhHjx7KyMjovEIBAEC34NdwU1BQoIyMDM2bN0+lpaUaN26ckpOT5XK5Wu1fX1+v0047TfPmzdN5553XydUCAIDuwK/hJicnRykpKUpNTVVMTIxyc3PlcDiUn5/fav+hQ4fqqaee0vTp0xUWFtbJ1QIAgO7Ab+GmoaFBJSUlSkpK8mpPSkpScXGxz85TX1+v2tparwUAAFiX38JNVVWVmpqaFBkZ6dUeGRmpiooKn50nOztbYWFhnsXhcPjs2AAAoOvx+w3FNpvNa90wjBZt7ZGVlaWamhrPUlZW5rNjAwCArifQXyeOiIhQQEBAi1maysrKFrM57WG322W32312PAAA0LX5beYmKChIcXFxcjqdXu1Op1OJiYl+qgoAAHR3fpu5kaTMzExNmzZN8fHxSkhI0LJly+RyuZSWlibp8CWl8vJyrV692rPPjh07JEkHDhzQTz/9pB07digoKEhnn322P34CAADoYvwabqZOnarq6motWrRIbrdbsbGxKiwsVFRUlKTDL+37/TtvRo8e7fnvkpISvfzyy4qKitLevXs7s3QAANBF+TXcSFJ6errS09Nb3bZq1aoWbYZhdHBFAACgO/P701IAAAC+RLgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACWQrgBAACW4vdwk5eXp+joaAUHBysuLk5FRUXH7L9p0ybFxcUpODhYZ5xxhp5//vlOqhQAAHQHfg03BQUFysjI0Lx581RaWqpx48YpOTlZLper1f579uzRZZddpnHjxqm0tFT33Xef7rjjDq1du7aTKwcAAF2VX8NNTk6OUlJSlJqaqpiYGOXm5srhcCg/P7/V/s8//7yGDBmi3NxcxcTEKDU1VbNmzdLjjz/eyZUDAICuKtBfJ25oaFBJSYnmzp3r1Z6UlKTi4uJW9/n444+VlJTk1XbJJZdoxYoV+u2339SzZ88W+9TX16u+vt6zXlNTI0mqra1t709oVXP9oQ45Lo6vo8b0CMbWfzpybBlX/+HvrHV1xNgeOaZhGMft67dwU1VVpaamJkVGRnq1R0ZGqqKiotV9KioqWu3f2NioqqoqDRw4sMU+2dnZWrhwYYt2h8PRjurRFYXl+rsCdBTG1poYV+vqyLGtq6tTWFjYMfv4LdwcYbPZvNYNw2jRdrz+rbUfkZWVpczMTM96c3Oz9u/fr/Dw8GOe52RTW1srh8OhsrIyhYaG+rsc+BBja12MrTUxrq0zDEN1dXUaNGjQcfv6LdxEREQoICCgxSxNZWVli9mZIwYMGNBq/8DAQIWHh7e6j91ul91u92rr27fviRducaGhofxlsijG1roYW2tiXFs63ozNEX67oTgoKEhxcXFyOp1e7U6nU4mJia3uk5CQ0KL/e++9p/j4+FbvtwEAACcfvz4tlZmZqeXLl2vlypXatWuX7rzzTrlcLqWlpUk6fElp+vTpnv5paWn6/vvvlZmZqV27dmnlypVasWKF7r77bn/9BAAA0MX49Z6bqVOnqrq6WosWLZLb7VZsbKwKCwsVFRUlSXK73V7vvImOjlZhYaHuvPNOPffccxo0aJCefvppXXXVVf76CZZht9s1f/78Fpfw0P0xttbF2FoT49p+NqMtz1QBAAB0E37//AIAAIAvEW4AAIClEG4AAIClEG4AAIClEG66mc2bN+vyyy/XoEGDZLPZ9MYbb/i7JPhAdna2xowZoz59+qh///668sor9dVXX/m7LPhAfn6+Ro0a5XkhW0JCgt555x1/lwUfy87Ols1mU0ZGhr9LgQg33c7Bgwd13nnn6dlnn+3Q8/z2228denx427Rpk2677TZt3bpVTqdTjY2NSkpK0sGDB31+Lsa2c51++ulasmSJtm3bpm3btmnChAm64oor9MUXX/j0PIyr/3z66adatmyZRo0a1SHHZ2xPgIFuS5Kxbt264/bbtWuXMXbsWMNutxsxMTGG0+n02nfPnj2GJKOgoMAYP368YbfbjZUrVxpVVVXGddddZwwePNjo1auXERsba7z88stexx4/frxx++23G3PmzDH69u1r9O/f31i6dKlx4MABY8aMGUbv3r2NM844wygsLOyAPwHrqqysNCQZmzZtOmY/xrZ7OvXUU43ly5cfdTvj2n3U1dUZw4cPN5xOpzF+/Hhjzpw5x+zP2HYOwk031pZw09TUZIwYMcKYPHmysWPHDqOoqMi44IILWv3LNHToUGPt2rXG7t27jfLycuOHH34wHnvsMaO0tNT47rvvjKefftoICAgwtm7d6jn++PHjjT59+hiLFy82vv76a2Px4sVGjx49jOTkZGPZsmXG119/bdx6661GeHi4cfDgwQ7807CWb775xpBkfP7550ftw9h2P42NjcaaNWuMoKAg44svvmi1D+PavUyfPt3IyMgwDMM4brhhbDsP4aYba0u4eeedd4zAwEDD7XZ72o72fwq5ubnHPedll11m3HXXXZ718ePHGxdeeKFnvbGx0QgJCTGmTZvmaXO73YYk4+OPP27jLzu5NTc3G5dffrnXn2trGNvu47PPPjNCQkKMgIAAIywszFi/fv1R+zKu3ceaNWuM2NhY45dffjEM4/jhhrHtPNxzYyEPP/ywevfu7VlcLpe++uorORwODRgwwNPvggsuaHX/+Ph4r/Wmpib9/e9/16hRoxQeHq7evXvrvffe8/okhiSv68wBAQEKDw/Xueee
62k78pX3ysrKdv/Gk8Htt9+uzz77TGvWrPG0Mbbd24gRI7Rjxw5t3bpVt956q26++Wbt3LmTce3GysrKNGfOHL344osKDg5usZ2x9S+/flsKvpWWlqZrr73Wsz5o0CAZhiGbzdam/UNCQrzWn3jiCT355JPKzc3Vueeeq5CQEGVkZKihocGr3++/yG6z2bzajpy/ubnZ1O85Gc2ePVtvvfWWNm/erNNPP93Tzth2b0FBQRo2bJikw/9offrpp3rqqaeUnZ3NuHZTJSUlqqysVFxcnKetqalJmzdv1rPPPqt9+/Yxtn5EuLGQfv36qV+/fl5tI0eOlMvl0r59+zyJ/dNPP23T8YqKinTFFVfopptuknT4L8M333yjmJgY3xYOGYah2bNna926ddq4caOio6O9tjO21mIYhurr6xnXbmzixIn6/PPPvdpmzpypkSNH6t5771V4eLjCw8O9tjO2nYdw080cOHBA3377rWd9z5492rFjh/r166chQ4a06D958mSdeeaZuvnmm/Xoo4+qrq5O8+bNk6Tj/h/EsGHDtHbtWhUXF+vUU09VTk6OKioq+MvUAW677Ta9/PLLevPNN9WnTx9VVFRIksLCwtSrV69W92Fsu4f77rtPycnJcjgcqqur0yuvvKKNGzdqw4YNrfZnXLuHPn36KDY21qstJCRE4eHhLdqPYGw7D/fcdDPbtm3T6NGjNXr0aElSZmamRo8erQcffLDV/gEBAXrjjTd04MABjRkzRqmpqbr//vslqdXrxP/fAw88oPPPP1+XXHKJLrroIg0YMEBXXnmlT38PDsvPz1dNTY0uuugiDRw40LMUFBQcdR/GtnvYt2+fpk2bphEjRmjixIn65JNPtGHDBk2ePLnV/oyrdTG2ncdmGIbh7yLQuT766CNdeOGF+vbbb3XmmWf6uxz4EGNrTYyrdTG2HYNwcxJYt26devfureHDh+vbb7/VnDlzdOqpp2rLli3+Lg3txNhaE+NqXYxt5+Cem5NAXV2d7rnnHpWVlSkiIkKTJk3SE0884e+y4AOMrTUxrtbF2HYOZm4AAIClcEMxAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwFMINAACwlP8DyN1FcQKWPaEAAAAASUVORK5CYII=",
153
+ "text/plain": [
154
+ "<Figure size 640x480 with 1 Axes>"
155
+ ]
156
+ },
157
+ "metadata": {},
158
+ "output_type": "display_data"
159
+ }
160
+ ],
161
+ "source": [
162
+ "# Mocked dataset showing the precision for different n-grams\n",
163
+ "data = {\"1-gram\": 0.8, \"2-gram\": 0.7, \"3-gram\": 0.6, \"4-gram\": 0.5}\n",
164
+ "\n",
165
+ "# Plot the datapoints defined above\n",
166
+ "fig, ax = plt.subplots(1)\n",
167
+ "bars = ax.bar(*zip(*data.items()))\n",
168
+ "ax.set(ylabel=\"N-gram precision\")\n",
169
+ "plt.show()"
170
+ ]
171
+ },
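To see clipping in action on a toy pair of sentences, here is a minimal sketch; `Counter` and `ngrams` come from the standard library and NLTK, the same utilities used by the implementation later in this notebook, and are imported here so the snippet runs on its own:

```python
from collections import Counter
from nltk.util import ngrams

reference = "the cat is on the mat".split()
candidate = "the the the mat".split()

ref_counts = Counter(ngrams(reference, 1))
cand_counts = Counter(ngrams(candidate, 1))

# Clip each candidate count to the count observed in the reference
clipped = {g: min(c, ref_counts.get(g, 0)) for g, c in cand_counts.items()}
precision = sum(clipped.values()) / sum(cand_counts.values())
print(precision)  # 0.75: 'the' is clipped from 3 to 2, 'mat' matches once
```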
172
+ {
173
+ "cell_type": "markdown",
174
+ "metadata": {},
175
+ "source": [
176
+ "### N-gram BLEU score:\n",
177
+ "When the n-gram precision is normalized by the brevity penalty (BP), then the exponential decay of n-grams is almost fully compensated. The BLEU score corresponds to a geometric average of this modified n-gram precision."
178
+ ]
179
+ },
180
+ {
181
+ "cell_type": "code",
182
+ "execution_count": 4,
183
+ "metadata": {},
184
+ "outputs": [
185
+ {
186
+ "data": {
187
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAGdCAYAAADuR1K7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAyCklEQVR4nO3de1RU9f7/8deIMpgKKiR5GZHUDCU9Bl3wkt2ksFVZrbJMTYVOSppIZZp1SutEV8TqgFKax2+ldMJON9Kmm5fMSsJuVlpakA0SWOClIGD//nA5vzMBOhsGB3bPx1p7Leczn8/e7/GzXL367JvNMAxDAAAAFtHG3wUAAAD4EuEGAABYCuEGAABYCuEGAABYCuEGAABYCuEGAABYCuEGAABYCuEGAABYSlt/F3C81dbW6qefflKnTp1ks9n8XQ4AAPCCYRjav3+/evTooTZtjr4285cLNz/99JMcDoe/ywAAAI1QVFSkXr16HbXPXy7cdOrUSdLhv5zg4GA/VwMAALxRUVEhh8Ph/u/40fzlws2RU1HBwcGEGwAAWhlvLinhgmIAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGAphBsAAGApfg83mZmZioyMVFBQkGJiYrRx48aj9n/uuec0ZMgQnXDCCerevbumTJmisrKy41QtAABo6fwabnJycpSSkqL58+eroKBAI0eOVEJCggoLC+vtv2nTJk2aNEmJiYn68ssv9Z///Ecff/yxkpKSjnPlAACgpfJruElPT1diYqKSkpIUFRWljIwMORwOZWVl1dt/y5Yt6tOnj2655RZFRkZqxIgRuummm7R169bjXDkAAGip/BZuqqqqlJ+fr/j4eI/2+Ph4bd68ud4xw4YN048//qi8vDwZhqG9e/fqxRdf1CWXXNLgcSorK1VRUeGxAQAA62rrrwOXlpaqpqZG4eHhHu3h4eEqLi6ud8ywYcP03HPPady4cfr9999VXV2tyy67TE888USDx0lLS9OCBQt8WvvR9Jn7+nE7Fjx9/2DDIRcA8Nfh9wuKbTabx2fDMOq0HbF9+3bdcsst+sc//qH8/HytXbtWu3fv1rRp0xrc/7x581ReXu7eioqKfFo/AABoWfy2chMWFqaAgIA6qzQlJSV1VnOOSEtL0/Dhw3X77bdLkgYPHqwOHTpo5MiRuv/++9W9e/c6Y+x2u+x2u+9/AAAAaJH8tnITGBiomJgYOZ1Oj3an06lhw4bVO+bQoUNq08az5ICAAEmHV3wAAAD8eloqNTVVTz/9tJYvX66vvvpKs2fPVmFhofs007x58zRp0iR3/0svvVRr1qxRVlaWdu3apffff1+33HKLzjzzTPXo0cNfPwMAALQgfjstJUnjxo1TWVmZFi5cKJfLpejoaOXl5SkiIkKS5HK5PJ55M3nyZO3fv19PPvmkbr31VnXu3Fnnn3++HnroIX/9BAAA0MLYjL/Y+ZyKigqFhISovLxcwcHBPt8/d0v5D3dLAYB1mfnvt9/vlgIAAPAlwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUvz7nBmhNuM3ff7jNH4AZrNwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABLIdwAAABL4a3gAP7SeNu7//C2dzQXVm4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAIClEG4AAICl+P3FmZmZmXrkkUfkcrk0aNAgZWRkaOTIkfX2nTx5sv7973/XaR84cKC+/PLL5i4VANCK8FJU//H3S1H9unKTk5OjlJQUzZ8/XwUFBRo5cqQSEhJUWFhYb//FixfL5XK5t6KiInXt2lVXX331ca4cAAC0VH4NN+np6UpMTFRSUpKioqKUkZEhh8OhrKysevuHhITopJNOcm9bt27VL7/8oilTphznygEAQEvlt3BTVVWl/Px8xcfHe7THx8dr8+bNXu1j2bJluvDCCxUREdFgn8rKSlVUVHhsAADAuvwWbkpLS1VTU6Pw8HCP9vDwcBUXFx9zvMvl0htvvKGkpKSj9ktLS1NISIh7czgcTaobAAC0bH6/W8pms3l8NgyjTlt9VqxYoc6dO2vs2LFH7Tdv3jyVl5e7t6KioqaUCwAAWji/3S0VFhamgICAOqs0JSUldVZz/swwDC1fvlwTJ05UYGDgUfva7XbZ7fYm1wsAAFoHv63cBAYGKiYmRk6n06Pd6XRq2LBhRx27fv16ffvtt0pMTGzOEgEAQCvk1+fcpKamauLEiYqNjVVcXJyys7NVWFioadOmSTp8SmnPnj1auXKlx7hly5bprLPOUnR0tD/KBgAALZhfw824ceNUVlamhQsXyuVyKTo6Wnl5ee67n1wuV51n3pSXlys3N1eLFy/2R8kAAKCF8/sTipOTk5WcnFzvdytWrKjTFhISokOHDjVzVQAAoLXy+91SAAAAvkS4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAltLW7ICamhqtWLFCb7/9tkpKSlRbW+vx/TvvvOOz4gAAAMwyHW5mzZqlFStW6JJLLlF0dLRsNltz1AUAANAopsPN6tWr9cILL2jMmDHNUQ8AAECTmL7mJjAwUP369WuOWgAAAJrMdLi59dZbtXjxYhmG0Rz1AAAANInp01KbNm3Su+++qzfeeEODBg1Su3btPL5fs2aNz4oDAAAwy/TKTefOnXXFFVdo1KhRCgsLU0hIiMdmVmZmpiIjIxUUFKSYmBht3LjxqP0rKys1f/58RUREyG63q2/fvlq+fLnp4wIAAGsyvXLzzDPP+OzgOTk5SklJUWZmpoYPH66lS5cqISFB27dvV+/evesdc80112jv3r1atmyZ+vXrp5KSElVXV/usJgAA0LqZDjdH/Pzzz/rmm29ks9l0yimn6MQTTzS9j/T0dCUmJiopKUmSlJGRoXXr1ikrK0tpaWl1+q9du1br16/Xrl271LVrV0lSnz59GvsTAACABZk+LXXw4EFNnTpV3bt31znnnKORI0eqR48eSkxM1KFDh7zeT1VVlfLz8xUfH+/RHh8fr82bN9c75pVXXlFsbKwefvhh9ezZU6eccopuu+02/fbbbw0ep7KyUhUVFR4bAACwLtPhJjU1VevXr9err76qX3/9Vb/++qtefvllrV+/XrfeeqvX+yktLVVNTY3Cw8M92sPDw1VcXFzvmF27dmnTpk364osv9NJLLykjI0Mvvviibr755gaPk5aW5nF
NkMPh8LpGAADQ+pgON7m5uVq2bJkSEhIUHBys4OBgjRkzRk899ZRefPFF0wX8+QnHhmE0+NTj2tpa2Ww2PffcczrzzDM1ZswYpaena8WKFQ2u3sybN0/l5eXuraioyHSNAACg9TB9zc2hQ4fqrLZIUrdu3UydlgoLC1NAQECdVZqSkpJ69y9J3bt3V8+ePT3uyoqKipJhGPrxxx/Vv3//OmPsdrvsdrvXdQEAgNbN9MpNXFyc7rnnHv3+++/utt9++00LFixQXFyc1/sJDAxUTEyMnE6nR7vT6dSwYcPqHTN8+HD99NNPOnDggLttx44datOmjXr16mXylwAAACsyvXKzePFiXXzxxerVq5eGDBkim82mbdu2KSgoSOvWrTO1r9TUVE2cOFGxsbGKi4tTdna2CgsLNW3aNEmHTynt2bNHK1eulCSNHz9e9913n6ZMmaIFCxaotLRUt99+u6ZOnar27dub/SkAAMCCTIeb6Oho7dy5U88++6y+/vprGYaha6+9Vtdff73pgDFu3DiVlZVp4cKFcrlcio6OVl5eniIiIiRJLpdLhYWF7v4dO3aU0+nUzJkzFRsbq9DQUF1zzTW6//77zf4MAABgUY16zk379u114403+qSA5ORkJScn1/vdihUr6rSdeuqpdU5lAQAAHOFVuHnllVeUkJCgdu3a6ZVXXjlq38suu8wnhQEAADSGV+Fm7NixKi4uVrdu3TR27NgG+9lsNtXU1PiqNgAAANO8Cje1tbX1/hkAAKClMX0reH1+/fVXX+wGAACgyUyHm4ceekg5OTnuz1dffbW6du2qnj176tNPP/VpcQAAAGaZDjdLly51v5/J6XTqrbfe0tq1a5WQkKDbb7/d5wUCAACYYfpWcJfL5Q43r732mq655hrFx8erT58+Ouuss3xeIAAAgBmmV266dOnifvnk2rVrdeGFF0o6/MJL7pQCAAD+Znrl5sorr9T48ePVv39/lZWVKSEhQZK0bds29evXz+cFAgAAmGE63CxatEh9+vRRUVGRHn74YXXs2FHS4dNVDT1pGAAA4HgxHW7atWun2267rU57SkqKL+oBAABoEl6/AAAALIXXLwAAAEvh9QsAAMBSfPL6BQAAgJbCdLi55ZZb9Pjjj9dpf/LJJ7moGAAA+J3pcJObm6vhw4fXaR82bJhefPFFnxQFAADQWKbDTVlZmUJCQuq0BwcHq7S01CdFAQAANJbpcNOvXz+tXbu2Tvsbb7yhk08+2SdFAQAANJbph/ilpqZqxowZ+vnnn3X++edLkt5++2099thjysjI8HV9AAAAppgON1OnTlVlZaX++c9/6r777pMk9enTR1lZWZo0aZLPCwQAADDDdLiRpOnTp2v69On6+eef1b59e/f7pQAAAPytUc+5qa6u1ltvvaU1a9bIMAxJ0k8//aQDBw74tDgAAACzTK/c/PDDD7r44otVWFioyspKjR49Wp06ddLDDz+s33//XUuWLGmOOgEAALxieuVm1qxZio2N1S+//KL27du726+44gq9/fbbPi0OAADALNMrN5s2bdL777+vwMBAj/aIiAjt2bPHZ4UBAAA0humVm9ra2nrf/P3jjz+qU6dOPikKAACgsUyHm9GjR3s8z8Zms+nAgQO65557NGbMGF/WBgAAYJrp01Lp6ek6//zzNXDgQP3+++8aP368du7cqbCwMK1atao5agQAAPCa6XDTs2dPbdu2TatXr1Z+fr5qa2uVmJio66+/3uMCYwAAAH8wFW7++OMPDRgwQK+99pqmTJmiKVOmNFddAAAAjWLqmpt27dqpsrJSNputueoBAABoEtMXFM+cOVMPPfSQqqurm6MeAACAJjEdbj788EOtWbNGvXv31kUXXaQrr7zSYzMrMzNTkZGRCgoKUkxMjDZu3Nhg3/fee082m63O9vXXX5s+LgAAsCbTFxR37txZV111lU8OnpOTo5SUFGVmZmr48OFaunSpEhIStH37dvXu3bvBcd98842Cg4Pdn0888USf1AMAAFo/0+HmmWee8dnB09PTlZiYqKSkJElSRkaG1q1bp6ysLKWlpTU4rlu3burcubPP6gAAANbRqLeCS1JJSYk2btyoTZs2qaSkxPT4qqoq5efnKz4+3qM9Pj5emzdvPurYoUOHqnv37rrgggv07rvvHrVvZWWlKioqPDYAAGBdpsNNRUWFJk6cqJ49e2rUqFE655xz1LNnT02YMEHl5eVe76e0tFQ1NTUKDw/3aA8PD1dxcXG9Y7p3767s7Gzl5uZqzZo1GjBggC644AJt2LChweOkpaUpJCTEvTkcDq9rBAAArY/pcJOUlKQPP/xQr732mn799VeVl5frtdde09atW3XjjTeaLuDPt5UbhtHgreYDBgzQjTfeqNNPP11xcXHKzMzUJZdcokcffbTB/c+bN0/l5eXuraioyHSNAACg9TB9zc3rr7+udevWacSIEe62iy66SE899ZQuvvhir/cTFhamgICAOqs0JSUldVZzjubss8/Ws88+2+D3drtddrvd6/0BAIDWzfTKTWhoqEJCQuq0h4SEqEuXLl7vJzAwUDExMXI6nR7tTqdTw4YN83o/BQUF6t69u9f9AQCAtZleubnrrruUmpqqlStXukNFcXGxbr/9dt19992m9pWamqqJEycqNjZWcXFxys7OVmFhoaZNmybp8CmlPXv2aOXKlZIO303Vp08fDRo0SFVVVXr22WeVm5ur3Nxcsz8DAABYlOlwk5WVpW+//VYRERHuZ9EUFhbKbrfr559/1tKlS919P/nkk6Pua9y4cSorK9PChQvlcrkUHR2tvLw8RURESJJcLpcKCwvd/auqqnTbbbdpz549at++vQYNGqTXX39dY8aMMfszAACARZkON2PHjvVpAcnJyUpOTq73uxUrVnh8njNnjubMmePT4wMAAGsxHW7uueee5qgDAADAJxr9ED8AAICWiHADAAAshXADAAAshXADAAAshXADAAAsxfTdUoZh6MUXX9S7776rkpIS1dbWeny/Zs0anxUHAABglulwM2vWLGVnZ+u8885TeHh4gy+5BAAA8AfT4ebZZ5/VmjVreCowAABokUxfcxMSEqKTTz65OWoBAABoMtPh5t5779WCBQv022+/NUc9AAAATWL6tNTVV1+tVatWqVu3burTp4/atWvn8f2xXpYJAADQnEyHm8mTJys/P18TJkzggmIAANDimA43r7/+utatW6cRI0Y0Rz0AAABNYvqaG4fDoeDg4OaoBQAAoMlMh5vHHntMc+bM0ffff98M5QAAADSN6dNSEyZM0KFDh9S3b1+dcMIJdS4o3rdvn8+KAwAAMMt0uMnIyGiGMgAAAHzDdLi54YYbmqMOAAAAnzAdbv7Xb7/9pj/++MOjjYuNAQCAP5m+oPjgwYOaMWOGunXrpo4dO6pLly4eGwAAgD+ZDjdz5szRO++8o8zMTNntdj399NNasGCBevTooZUrVzZHjQAAAF4zfVrq1Vdf1cqVK3Xuuedq6tSpGjlypPr166eIiAg999xzuv7665ujTg
AAAK+YXrnZt2+fIiMjJR2+vubIrd8jRozQhg0bfFsdAACASabDzcknn+x+gN/AgQP1wgsvSDq8otO5c2df1gYAAGCa6XAzZcoUffrpp5KkefPmua+9mT17tm6//XafFwgAAGCG6WtuZs+e7f7zeeedp6+//lpbt25V3759NWTIEJ8WBwAAYJaplZs//vhD5513nnbs2OFu6927t6688kqCDQAAaBFMhZt27drpiy++kM1ma656AAAAmsT0NTeTJk3SsmXLmqMWAACAJjN9zU1VVZWefvppOZ1OxcbGqkOHDh7fp6en+6w4AAAAs0yHmy+++EKnn366JHlceyOJ01UAAMDvTIebd999tznqAAAA8AnT19z4WmZmpiIjIxUUFKSYmBht3LjRq3Hvv/++2rZtq7/97W/NWyAAAGhVTK/cXHHFFfWefrLZbAoKClK/fv00fvx4DRgw4Jj7ysnJUUpKijIzMzV8+HAtXbpUCQkJ2r59u3r37t3guPLyck2aNEkXXHCB9u7da/YnAAAACzO9chMSEqJ33nlHn3zyiTvkFBQU6J133lF1dbVycnI0ZMgQvf/++8fcV3p6uhITE5WUlKSoqChlZGTI4XAoKyvrqONuuukmjR8/XnFxcWbLBwAAFmc63Jx00kkaP368du3apdzcXK1Zs0bfffedJkyYoL59++qrr77SDTfcoDvuuOOo+6mqqlJ+fr7i4+M92uPj47V58+YGxz3zzDP67rvvdM8993hVb2VlpSoqKjw2AABgXabDzbJly5SSkqI2bf7/0DZt2mjmzJnKzs6WzWbTjBkz9MUXXxx1P6WlpaqpqVF4eLhHe3h4uIqLi+sds3PnTs2dO1fPPfec2rb17oxaWlqaQkJC3JvD4fBqHAAAaJ1Mh5vq6mp9/fXXddq//vpr1dTUSJKCgoK8vi38z/0Mw6h3bE1NjcaPH68FCxbolFNO8breefPmqby83L0VFRV5PRYAALQ+pi8onjhxohITE3XnnXfqjDPOkM1m00cffaQHHnhAkyZNkiStX79egwYNOup+wsLCFBAQUGeVpqSkpM5qjiTt379fW7duVUFBgWbMmCFJqq2tlWEYatu2rd58802df/75dcbZ7XbZ7XazPxMAALRSpsPNokWLFB4erocffth9p1J4eLhmz57tvs4mPj5eF1988VH3ExgYqJiYGDmdTl1xxRXudqfTqcsvv7xO/+DgYH3++ecebZmZmXrnnXf04osvKjIy0uxPAQAAFmQ63AQEBGj+/PmaP3++++Lc4OBgjz5Hu437f6WmpmrixImKjY1VXFycsrOzVVhYqGnTpkk6fEppz549Wrlypdq0aaPo6GiP8d26dVNQUFCddgAA8NdlOtz8r8zMTHcQaYxx48aprKxMCxculMvlUnR0tPLy8hQRESFJcrlcKiwsbEqJAADgL6ZJTyh+4IEHtG/fviYVkJycrO+//16VlZXKz8/XOeec4/5uxYoVeu+99xoce++992rbtm1NOj4AALCWJoUbwzB8VQcAAIBP+P3dUgAAAL7UpGtutm/frh49eviqFgAAgCZrUrjhab8AAKCl8TrcREZGHvOpwzabTd99912TiwIAAGgsr8NNSkpKg999//33Wrp0qSorK31REwAAQKN5HW5mzZpVp23fvn267777lJWVpbPOOksPPfSQT4sDAAAwq1HX3Pz2229KT0/XI488oj59+mjNmjUaM2aMr2sDAAAwzVS4qamp0VNPPaUFCxYoKChITzzxhCZMmOD1G8ABAACam9fh5oUXXtBdd92l8vJy3XnnnZo+fboCAwObszYAAADTvA431157rdq3b6/rrrtOP/zwg+bOnVtvv/T0dJ8VBwAAYJbX4eacc8455q3enJ4CAAD+5nW4OdoLLAEAAFoK3i0FAAAshXADAAAshXADAAAshXADAAAshXADAAAsxau7pT777DOvdzh48OBGFwMAANBUXoWbv/3tb7LZbDIM45jPsqmpqfFJYQAAAI3h1Wmp3bt3a9euXdq9e7dyc3MVGRmpzMxMFRQUqKCgQJmZmerbt69yc3Obu14AAICj8mrlJiIiwv3nq6++Wo8//rjHW8AHDx4sh8Ohu+++W2PHjvV5kQAAAN4yfUHx559/rsjIyDrtkZGR2r59u0+KAgAAaCzT4SYqKkr333+/fv/9d3dbZWWl7r//fkVFRfm0OAAAALO8frfUEUuWLNGll14qh8OhIUOGSJI+/fRT2Ww2vfbaaz4vEAAAwAzT4ebMM8/U7t279eyzz+rrr7+WYRgaN26cxo8frw4dOjRHjQAAAF4zHW4k6YQTTtDf//53X9cCAADQZI16QvH//d//acSIEerRo4d++OEHSdKiRYv08ssv+7Q4AAAAs0yHm6ysLKWmpiohIUG//PKL+6F9Xbp0UUZGhq/rAwAAMMV0uHniiSf01FNPaf78+Wrb9v+f1YqNjdXnn3/u0+IAAADMMh1udu/eraFDh9Zpt9vtOnjwoE+KAgAAaCzT4SYyMlLbtm2r0/7GG29o4MCBvqgJAACg0UzfLXX77bfr5ptv1u+//y7DMPTRRx9p1apVSktL09NPP90cNQIAAHjN9MrNlClTdM8992jOnDk6dOiQxo8fryVLlmjx4sW69tprTReQmZmpyMhIBQUFKSYmRhs3bmyw76ZNmzR8+HCFhoaqffv2OvXUU7Vo0SLTxwQAANbVqOfc3HjjjbrxxhtVWlqq2tpadevWrVEHz8nJUUpKijIzMzV8+HAtXbpUCQkJ2r59u3r37l2nf4cOHTRjxgwNHjxYHTp00KZNm3TTTTepQ4cOPHcHAABIauRzbo4ICwtrdLCRpPT0dCUmJiopKUlRUVHKyMiQw+FQVlZWvf2HDh2q6667ToMGDVKfPn00YcIEXXTRRUdd7QEAAH8tXq3cnH766Xr77bfVpUsXDR06VDabrcG+n3zyiVcHrqqqUn5+vubOnevRHh8fr82bN3u1j4KCAm3evFn3339/g30qKytVWVnp/lxRUeHVvgEAQOvkVbi5/PLLZbfbJUljx471yYFLS0tVU1Oj8PBwj/bw8HAVFxcfdWyvXr30888/q7q6Wvfee6+SkpIa7JuWlqYFCxb4pGYAANDyeRVuunTpojZtDp/BmjJlinr16uX+3FR/XgUyDOOoK0OStHHjRh04cEBbtmzR3Llz1a9fP1133XX19p03b55SU1PdnysqKuRwOJpeOAAAaJG8Cjepqam69tprFRQUpMjISLlcriZdayMdvl4nICCgzipNSUlJndWcP4uMjJQknXbaadq7d6/uvffeBsON3W53rzoBAADr82r5pUePHsrNzdUPP/wgwzD0448/qrCwsN7NW4GBgYqJiZHT6fRodzqdGjZsmNf7MQzD45oaAADw1+bVys1dd92lmTNnasaMGbLZbDrjjDPq9DlyOunIizS9kZqaqokTJyo2NlZxcXHKzs5WYWGhpk2bJunwKaU9e/Zo5cqVkqR//etf6t27t0499VRJh5978+ijj2rmzJleHxMAAFibV+Hm73//u
6677jr98MMPGjx4sN566y2FhoY2+eDjxo1TWVmZFi5cKJfLpejoaOXl5SkiIkKS5HK5PFaDamtrNW/ePO3evVtt27ZV37599eCDD+qmm25qci0AAMAavH6IX6dOnRQdHa1nnnlGw4cP99l1LMnJyUpOTq73uxUrVnh8njlzJqs0AADgqEw/ofiGG25ojjoAAAB8wqtw07VrV+3YsUNhYWHq0qXLUW/V3rdvn8+KAwAAMMurcLNo0SJ16tTJ/edjPYcGAADAX7wKN/97Kmry5MnNVQsAAECTeRVuzLyPKTg4uNHFAAAANJVX4aZz585en4oy85wbAAAAX/Mq3Lz77rvuP3///feaO3euJk+erLi4OEnSBx98oH//+99KS0trnioBAAC85FW4GTVqlPvPCxcuVHp6use7nC677DKddtppys7O5lZxAADgV6Zf7f3BBx8oNja2TntsbKw++ugjnxQFAADQWKbDjcPh0JIlS+q0L126VA6HwydFAQAANJbpJxQvWrRIV111ldatW6ezzz5bkrRlyxZ99913ys3N9XmBAAAAZpheuRkzZox27typyy67TPv27VNZWZkuv/xy7dixQ2PGjGmOGgEAALxmeuVGknr16qUHHnjA17UAAAA0WaPCza+//qply5bpq6++ks1m08CBAzV16lSFhIT4uj4AAABTTJ+W2rp1q/r27atFixZp3759Ki0tVXp6uvr27atPPvmkOWoEAADwmumVm9mzZ+uyyy7TU089pbZtDw+vrq5WUlKSUlJStGHDBp8XCQAA4C3T4Wbr1q0ewUaS2rZtqzlz5tT7/BsAAIDjyfRpqeDgYBUWFtZpLyoqUqdOnXxSFAAAQGOZDjfjxo1TYmKicnJyVFRUpB9//FGrV69WUlKSxysZAAAA/MH0aalHH31UNptNkyZNUnV1tSSpXbt2mj59uh588EGfFwgAAGCG6XATGBioxYsXKy0tTd99950Mw1C/fv10wgknNEd9AAAApjTqOTeSdMIJJ+i0007zZS0AAABN5nW4mTp1qlf9li9f3uhiAAAAmsrrcLNixQpFRERo6NChMgyjOWsCAABoNK/DzbRp07R69Wrt2rVLU6dO1YQJE9S1a9fmrA0AAMA0r28Fz8zMlMvl0h133KFXX31VDodD11xzjdatW8dKDgAAaDFMPefGbrfruuuuk9Pp1Pbt2zVo0CAlJycrIiJCBw4caK4aAQAAvGb6IX5H2Gw22Ww2GYah2tpaX9YEAADQaKbCTWVlpVatWqXRo0drwIAB+vzzz/Xkk0+qsLBQHTt2bK4aAQAAvOb1BcXJyclavXq1evfurSlTpmj16tUKDQ1tztoAAABM8zrcLFmyRL1791ZkZKTWr1+v9evX19tvzZo1PisOAADALK/DzaRJk2Sz2ZqzFgAAgCYz9RA/AACAlq7Rd0v5SmZmpiIjIxUUFKSYmBht3Lixwb5r1qzR6NGjdeKJJyo4OFhxcXFat27dcawWAAC0dH4NNzk5OUpJSdH8+fNVUFCgkSNHKiEhQYWFhfX237Bhg0aPHq28vDzl5+frvPPO06WXXqqCgoLjXDkAAGip/Bpu0tPTlZiYqKSkJEVFRSkjI0MOh0NZWVn19s/IyNCcOXN0xhlnqH///nrggQfUv39/vfrqq8e5cgAA0FL5LdxUVVUpPz9f8fHxHu3x8fHavHmzV/uora3V/v37j/qOq8rKSlVUVHhsAADAuvwWbkpLS1VTU6Pw8HCP9vDwcBUXF3u1j8cee0wHDx7UNddc02CftLQ0hYSEuDeHw9GkugEAQMvm9wuK/3x7uWEYXt1yvmrVKt17773KyclRt27dGuw3b948lZeXu7eioqIm1wwAAFour28F97WwsDAFBATUWaUpKSmps5rzZzk5OUpMTNR//vMfXXjhhUfta7fbZbfbm1wvAABoHfy2chMYGKiYmBg5nU6PdqfTqWHDhjU4btWqVZo8ebKef/55XXLJJc1dJgAAaGX8tnIjSampqZo4caJiY2MVFxen7OxsFRYWatq0aZIOn1Las2ePVq5cKelwsJk0aZIWL16ss88+273q0759e4WEhPjtdwAAgJbDr+Fm3LhxKisr08KFC+VyuRQdHa28vDxFRERIklwul8czb5YuXarq6mrdfPPNuvnmm93tN9xwA09QBgAAkvwcbqTDbxtPTk6u97s/B5b33nuv+QsCAACtmt/vlgIAAPAlwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUwg0AALAUv4ebzMxMRUZGKigoSDExMdq4cWODfV0ul8aPH68BAwaoTZs2SklJOX6FAgCAVsGv4SYnJ0cpKSmaP3++CgoKNHLkSCUkJKiwsLDe/pWVlTrxxBM1f/58DRky5DhXCwAAWgO/hpv09HQlJiYqKSlJUVFRysjIkMPhUFZWVr39+/Tpo8WLF2vSpEkKCQk5ztUCAIDWwG/hpqqqSvn5+YqPj/doj4+P1+bNm312nMrKSlVUVHhsAADAuvwWbkpLS1VTU6Pw8HCP9vDwcBUXF/vsOGlpaQoJCXFvDofDZ/sGAAAtj98vKLbZbB6fDcOo09YU8+bNU3l5uXsrKiry2b4BAEDL09ZfBw4LC1NAQECdVZqSkpI6qzlNYbfbZbfbfbY/AADQsvlt5SYwMFAxMTFyOp0e7U6nU8OGDfNTVQAAoLXz28qNJKWmpmrixImKjY1VXFycsrOzVVhYqGnTpkk6fEppz549WrlypXvMtm3bJEkHDhzQzz//rG3btikwMFADBw70x08AAAAtjF/Dzbhx41RWVqaFCxfK5XIpOjpaeXl5ioiIkHT4oX1/fubN0KFD3X/Oz8/X888/r4iICH3//ffHs3QAANBC+TXcSFJycrKSk5Pr/W7FihV12gzDaOaKAABAa+b3u6UAAAB8iXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAshXADAAAsxe/hJjMzU5GRkQoKClJMTIw2btx41P7r169XTEyMgoKCdPLJJ2vJkiXHqVIAANAa+DXc5OTkKCUlRfPnz1dBQYFGjhyphIQEFRYW1tt/9+7dGjNmjEaOHKmCggLdeeeduuWWW5Sbm3ucKwcAAC2VX8NNenq6EhMT
lZSUpKioKGVkZMjhcCgrK6ve/kuWLFHv3r2VkZGhqKgoJSUlaerUqXr00UePc+UAAKClauuvA1dVVSk/P19z5871aI+Pj9fmzZvrHfPBBx8oPj7eo+2iiy7SsmXL9Mcff6hdu3Z1xlRWVqqystL9uby8XJJUUVHR1J9Qr9rKQ82yXxxbc83pEcyt/zTn3DKv/sO/Wetqjrk9sk/DMI7Z12/hprS0VDU1NQoPD/doDw8PV3Fxcb1jiouL6+1fXV2t0tJSde/evc6YtLQ0LViwoE67w+FoQvVoiUIy/F0Bmgtza03Mq3U159zu379fISEhR+3jt3BzhM1m8/hsGEadtmP1r6/9iHnz5ik1NdX9uba2Vvv27VNoaOhRj/NXU1FRIYfDoaKiIgUHB/u7HPgQc2tdzK01Ma/1MwxD+/fvV48ePY7Z12/hJiwsTAEBAXVWaUpKSuqszhxx0kkn1du/bdu2Cg0NrXeM3W6X3W73aOvcuXPjC7e44OBg/jFZFHNrXcytNTGvdR1rxeYIv11QHBgYqJiYGDmdTo92p9OpYcOG1TsmLi6uTv8333xTsbGx9V5vAwAA/nr8erdUamqqnn76aS1fvlxfffWVZs+ercLCQk2bNk3S4VNKkyZNcvefNm2afvjhB6Wmpuqrr77S8uXLtWzZMt12223++gkAAKCF8es1N+PGjVNZWZkWLlwol8ul6Oho5eXlKSIiQpLkcrk8nnkTGRmpvLw8zZ49W//617/Uo0cPPf7447rqqqv89RMsw26365577qlzCg+tH3NrXcytNTGvTWczvLmnCgAAoJXw++sXAAAAfIlwAwAALIVwAwAALIVwAwAALIVw08ps2LBBl156qXr06CGbzab//ve//i4JPpCWlqYzzjhDnTp1Urdu3TR27Fh98803/i4LPpCVlaXBgwe7H8gWFxenN954w99lwcfS0tJks9mUkpLi71Igwk2rc/DgQQ0ZMkRPPvlksx7njz/+aNb9w9P69et18803a8uWLXI6naqurlZ8fLwOHjzo82Mxt8dXr1699OCDD2rr1q3aunWrzj//fF1++eX68ssvfXoc5tV/Pv74Y2VnZ2vw4MHNsn/mthEMtFqSjJdeeumY/b766itj+PDhht1uN6Kiogyn0+kxdvfu3YYkIycnxxg1apRht9uN5cuXG6Wlpca1115r9OzZ02jfvr0RHR1tPP/88x77HjVqlDFjxgxj1qxZRufOnY1u3boZS5cuNQ4cOGBMnjzZ6Nixo3HyyScbeXl5zfA3YF0lJSWGJGP9+vVH7cfctk5dunQxnn766Qa/Z15bj/379xv9+/c3nE6nMWrUKGPWrFlH7c/cHh+Em1bMm3BTU1NjDBgwwBg9erSxbds2Y+PGjcaZZ55Z7z+mPn36GLm5ucauXbuMPXv2GD/++KPxyCOPGAUFBcZ3331nPP7440ZAQICxZcsW9/5HjRpldOrUybjvvvuMHTt2GPfdd5/Rpk0bIyEhwcjOzjZ27NhhTJ8+3QgNDTUOHjzYjH8b1rJz505DkvH555832Ie5bX2qq6uNVatWGYGBgcaXX35Zbx/mtXWZNGmSkZKSYhiGccxww9weP4SbVsybcPPGG28Ybdu2NVwul7utof9TyMjIOOYxx4wZY9x6663uz6NGjTJGjBjh/lxdXW106NDBmDhxorvN5XIZkowPPvjAy1/211ZbW2tceumlHn+v9WFuW4/PPvvM6NChgxEQEGCEhIQYr7/+eoN9mdfWY9WqVUZ0dLTx22+/GYZx7HDD3B4/XHNjIQ888IA6duzo3goLC/XNN9/I4XDopJNOcvc788wz6x0fGxvr8bmmpkb//Oc/NXjwYIWGhqpjx4568803PV6JIcnjPHNAQIBCQ0N12mmnuduOvOW9pKSkyb/xr2DGjBn67LPPtGrVKncbc9u6DRgwQNu2bdOWLVs0ffp03XDDDdq+fTvz2ooVFRVp1qxZevbZZxUUFFTne+bWv/z6bin41rRp03TNNde4P/fo0UOGYchms3k1vkOHDh6fH3vsMS1atEgZGRk67bTT1KFDB6WkpKiqqsqj35/fyG6z2Tzajhy/trbW1O/5K5o5c6ZeeeUVbdiwQb169XK3M7etW2BgoPr16yfp8H+0Pv74Yy1evFhpaWnMayuVn5+vkpISxcTEuNtqamq0YcMGPfnkk9q7dy9z60eEGwvp2rWrunbt6tF26qmnqrCwUHv37nUn9o8//tir/W3cuFGXX365JkyYIOnwP4adO3cqKirKt4VDhmFo5syZeumll/Tee+8pMjLS43vm1loMw1BlZSXz2opdcMEF+vzzzz3apkyZolNPPVV33HGHQkNDFRoa6vE9c3v8EG5amQMHDujbb791f969e7e2bdumrl27qnfv3nX6jx49Wn379tUNN9yghx9+WPv379f8+fMl6Zj/B9GvXz/l5uZq8+bN6tKli9LT01VcXMw/pmZw88036/nnn9fLL7+sTp06qbi4WJIUEhKi9u3b1zuGuW0d7rzzTiUkJMjhcGj//v1avXq13nvvPa1du7be/sxr69CpUydFR0d7tHXo0EGhoaF12o9gbo8frrlpZbZu3aqhQ4dq6NChkqTU1FQNHTpU//jHP+rtHxAQoP/+9786cOCAzjjjDCUlJemuu+6SpHrPE/+vu+++W6effrouuuginXvuuTrppJM0duxYn/4eHJaVlaXy8nKde+656t69u3vLyclpcAxz2zrs3btXEydO1IABA3TBBRfoww8/1Nq1azV69Oh6+zOv1sXcHj82wzAMfxeB4+v999/XiBEj9O2336pv377+Lgc+xNxaE/NqXcxt8yDc/AW89NJL6tixo/r3769vv/1Ws2bNUpcuXbRp0yZ/l4YmYm6tiXm1Lub2+OCam7+A/fv3a86cOSoqKlJYWJguvPBCPfbYY/4uCz7A3FoT82pdzO3xwcoNAACwFC4oBgAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlkK4AQAAlvL/AEGIHigLKKERAAAAAElFTkSuQmCC",
188
+ "text/plain": [
189
+ "<Figure size 640x480 with 1 Axes>"
190
+ ]
191
+ },
192
+ "metadata": {},
193
+ "output_type": "display_data"
194
+ }
195
+ ],
196
+ "source": [
197
+ "# Mocked dataset showing the precision multiplied by the BP for different n-grams\n",
198
+ "data = {\"1-gram\": 0.8, \"2-gram\": 0.77, \"3-gram\": 0.74, \"4-gram\": 0.71}\n",
199
+ "\n",
200
+ "# Plot the datapoints defined above\n",
201
+ "fig, ax = plt.subplots(1)\n",
202
+ "bars = ax.bar(*zip(*data.items()))\n",
203
+ "ax.set(ylabel=\"Modified N-gram precision\")\n",
204
+ "plt.show()"
205
+ ]
206
+ },
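The geometric average underlying the score can be checked directly on the mocked values above (a quick sketch; the same log-mean-exp trick reappears in the implementation below):

```python
import numpy as np

modified_precisions = [0.8, 0.77, 0.74, 0.71]
geometric_average = np.exp(np.mean(np.log(modified_precisions)))
print(round(geometric_average, 3))  # ~0.754
```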
207
+ {
208
+ "cell_type": "markdown",
209
+ "metadata": {},
210
+ "source": [
211
+ "# 3. Example Calculations of the BLEU score\n",
212
+ "\n",
213
+ "In this example you will have a reference sentence and 2 candidate sentences. You will tokenize all sentences using the NLTK package. Then you will compare the two candidates to the reference using BLEU score.\n",
214
+ "\n",
215
+ "First you define and tokenize the sentences."
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": 16,
221
+ "metadata": {
222
+ "tags": []
223
+ },
224
+ "outputs": [
225
+ {
226
+ "name": "stdout",
227
+ "output_type": "stream",
228
+ "text": [
229
+ "The NASA Opportunity rover is battling a massive dust storm on planet Mars. -> ['the', 'nasa', 'opportunity', 'rover', 'is', 'battling', 'a', 'massive', 'dust', 'storm', 'on', 'planet', 'mars', '.']\n",
230
+ "\n",
231
+ "\n",
232
+ "The Opportunity rover is combating a big sandstorm on planet Mars. -> ['the', 'opportunity', 'rover', 'is', 'combating', 'a', 'big', 'sandstorm', 'on', 'planet', 'mars', '.']\n",
233
+ "\n",
234
+ "\n",
235
+ "A NASA rover is fighting a massive storm on planet Mars. -> ['a', 'nasa', 'rover', 'is', 'fighting', 'a', 'massive', 'storm', 'on', 'planet', 'mars', '.']\n"
236
+ ]
237
+ }
238
+ ],
239
+ "source": [
240
+ "reference = \"The NASA Opportunity rover is battling a massive dust storm on planet Mars.\"\n",
241
+ "candidate_1 = \"The Opportunity rover is combating a big sandstorm on planet Mars.\"\n",
242
+ "candidate_2 = \"A NASA rover is fighting a massive storm on planet Mars.\"\n",
243
+ "\n",
244
+ "tokenized_ref = nltk.word_tokenize(reference.lower())\n",
245
+ "tokenized_cand_1 = nltk.word_tokenize(candidate_1.lower())\n",
246
+ "tokenized_cand_2 = nltk.word_tokenize(candidate_2.lower())\n",
247
+ "\n",
248
+ "print(f\"{reference} -> {tokenized_ref}\")\n",
249
+ "print(\"\\n\")\n",
250
+ "print(f\"{candidate_1} -> {tokenized_cand_1}\")\n",
251
+ "print(\"\\n\")\n",
252
+ "print(f\"{candidate_2} -> {tokenized_cand_2}\")"
253
+ ]
254
+ },
255
+ {
256
+ "cell_type": "markdown",
257
+ "metadata": {},
258
+ "source": [
259
+ "## 3.1 Define the functions to calculate the BLEU score\n",
260
+ "\n",
261
+ "### Computing the Brevity Penalty\n",
262
+ "You will start by defining the function for brevity penalty according to the equation (2) in section 2.1."
263
+ ]
264
+ },
265
+ {
266
+ "cell_type": "code",
267
+ "execution_count": 6,
268
+ "metadata": {},
269
+ "outputs": [],
270
+ "source": [
271
+ "def brevity_penalty(candidate, reference):\n",
272
+ " \"\"\"\n",
273
+ " Calculates the brevity penalty given the candidate and reference sentences.\n",
274
+ " \"\"\"\n",
275
+ " reference_length = len(reference)\n",
276
+ " candidate_length = len(candidate)\n",
277
+ "\n",
278
+ " if reference_length < candidate_length:\n",
279
+ " BP = 1\n",
280
+ " else:\n",
281
+ " penalty = 1 - (reference_length / candidate_length)\n",
282
+ " BP = np.exp(penalty)\n",
283
+ "\n",
284
+ " return BP"
285
+ ]
286
+ },
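As a quick sanity check, you can apply the function to the tokenized sentences from above. Both candidates have 12 tokens against a 14-token reference, so both receive the same penalty of exp(1 - 14/12), about 0.85:

```python
print(brevity_penalty(tokenized_cand_1, tokenized_ref))  # ~0.846
print(brevity_penalty(tokenized_cand_2, tokenized_ref))  # same length, same penalty
```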
287
+ {
288
+ "cell_type": "markdown",
289
+ "metadata": {},
290
+ "source": [
291
+ "### Computing the clipped Precision\n",
292
+ "Next, you need to define a function to calculate the geometrically averaged clipped precision. This function calculates how many of the n-grams in the candidate sentence actually appear in the reference sentence. The clipping takes care of overcounting. For example if a certain n-gram appears five times in the candidate sentence, but only twice in the reference, the value is clipped to two."
293
+ ]
294
+ },
295
+ {
296
+ "cell_type": "code",
297
+ "execution_count": 17,
298
+ "metadata": {},
299
+ "outputs": [],
300
+ "source": [
301
+ "def average_clipped_precision(candidate, reference):\n",
302
+ " \"\"\"\n",
303
+ " Calculates the precision given the candidate and reference sentences.\n",
304
+ " \"\"\"\n",
305
+ "\n",
306
+ " clipped_precision_score = []\n",
307
+ " \n",
308
+ " # Loop through values 1, 2, 3, 4. This is the length of n-grams\n",
309
+ " for n_gram_length in range(1, 5):\n",
310
+ " reference_n_gram_counts = Counter(ngrams(reference, n_gram_length)) \n",
311
+ " candidate_n_gram_counts = Counter(ngrams(candidate, n_gram_length)) \n",
312
+ "\n",
313
+ " total_candidate_ngrams = sum(candidate_n_gram_counts.values()) \n",
314
+ " \n",
315
+ " for ngram in candidate_n_gram_counts: \n",
316
+ " # check if it is in the reference n-gram\n",
317
+ " if ngram in reference_n_gram_counts:\n",
318
+ " # if the count of the candidate n-gram is bigger than the corresponding\n",
319
+ " # count in the reference n-gram, then set the count of the candidate n-gram \n",
320
+ " # to be equal to the reference n-gram\n",
321
+ " \n",
322
+ " if candidate_n_gram_counts[ngram] > reference_n_gram_counts[ngram]: \n",
323
+ " candidate_n_gram_counts[ngram] = reference_n_gram_counts[ngram] # t\n",
324
+ " \n",
325
+ " else:\n",
326
+ " candidate_n_gram_counts[ngram] = 0 # else set the candidate n-gram equal to zero\n",
327
+ "\n",
328
+ " clipped_candidate_ngrams = sum(candidate_n_gram_counts.values())\n",
329
+ " \n",
330
+ " clipped_precision_score.append(clipped_candidate_ngrams / total_candidate_ngrams)\n",
331
+ " \n",
332
+ " # Calculate the geometric average: take the mean of elemntwise log, then exponentiate\n",
333
+ " # This is equivalent to taking the n-th root of the product as shown in equation (1) above\n",
334
+ " s = np.exp(np.mean(np.log(clipped_precision_score)))\n",
335
+ " \n",
336
+ " return s\n"
337
+ ]
338
+ },
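Before combining the two pieces into the final score, you can inspect the factors separately (a small sketch reusing the functions and tokenized sentences defined above):

```python
precision_2 = average_clipped_precision(tokenized_cand_2, tokenized_ref)
bp_2 = brevity_penalty(tokenized_cand_2, tokenized_ref)
print(precision_2, bp_2, precision_2 * bp_2)  # the product is the BLEU score computed next
```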
339
+ {
340
+ "cell_type": "markdown",
341
+ "metadata": {},
342
+ "source": [
343
+ "### Computing the BLEU score\n",
344
+ "Finally, you can compute the BLEU score using the above two functions."
345
+ ]
346
+ },
347
+ {
348
+ "cell_type": "code",
349
+ "execution_count": 18,
350
+ "metadata": {},
351
+ "outputs": [],
352
+ "source": [
353
+ "def bleu_score(candidate, reference):\n",
354
+ " BP = brevity_penalty(candidate, reference) \n",
355
+ " geometric_average_precision = average_clipped_precision(candidate, reference) \n",
356
+ " return BP * geometric_average_precision"
357
+ ]
358
+ },
359
+ {
360
+ "cell_type": "markdown",
361
+ "metadata": {},
362
+ "source": [
363
+ "## 3.2 Testing the functions\n",
364
+ "Now you can test the functions with your Example Reference and Candidates Sentences."
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "code",
369
+ "execution_count": 19,
370
+ "metadata": {
371
+ "tags": []
372
+ },
373
+ "outputs": [
374
+ {
375
+ "name": "stdout",
376
+ "output_type": "stream",
377
+ "text": [
378
+ "BLEU score of reference versus candidate 1: 27.6\n",
379
+ "BLEU score of reference versus candidate 2: 35.3\n"
380
+ ]
381
+ }
382
+ ],
383
+ "source": [
384
+ "result_candidate_1 = round(bleu_score(tokenized_cand_1, tokenized_ref) * 100, 1)\n",
385
+ "print(f\"BLEU score of reference versus candidate 1: {result_candidate_1}\")\n",
386
+ "result_candidate_2 = round(bleu_score(tokenized_cand_2, tokenized_ref) * 100, 1)\n",
387
+ "print(f\"BLEU score of reference versus candidate 2: {result_candidate_2}\")"
388
+ ]
389
+ },
390
+ {
391
+ "cell_type": "markdown",
392
+ "metadata": {},
393
+ "source": [
394
+ "## 3.3 Comparing the Results from your Code with the Sacrebleu Library\n",
395
+ "Below you will do the same calculation, but using the `sacrebleu` library. Compare them with your implementation above."
396
+ ]
397
+ },
398
+ {
399
+ "cell_type": "code",
400
+ "execution_count": 20,
401
+ "metadata": {
402
+ "scrolled": true
403
+ },
404
+ "outputs": [
405
+ {
406
+ "name": "stdout",
407
+ "output_type": "stream",
408
+ "text": [
409
+ "BLEU score of reference versus candidate 1: 27.6\n",
410
+ "BLEU score of reference versus candidate 2: 35.3\n"
411
+ ]
412
+ }
413
+ ],
414
+ "source": [
415
+ "result_candidate_1 = round(sacrebleu.sentence_bleu(candidate_1, [reference]).score, 1)\n",
416
+ "print(f\"BLEU score of reference versus candidate 1: {result_candidate_1}\")\n",
417
+ "result_candidate_2 = round(sacrebleu.sentence_bleu(candidate_2, [reference]).score, 1)\n",
418
+ "print(f\"BLEU score of reference versus candidate 2: {result_candidate_2}\")"
419
+ ]
420
+ },
421
+ {
422
+ "cell_type": "markdown",
423
+ "metadata": {},
424
+ "source": [
425
+ "# 4. BLEU computation on a corpus\n",
426
+ "\n",
427
+ "## 4.1 Loading Datasets for Evaluation Using the BLEU Score\n",
428
+ "\n",
429
+ "In this section, you will use a simple pipeline for evaluating machine translated text. You will use English to German translations generated by [Google Translate](https://translate.google.com). There are three files you will need:\n",
430
+ "\n",
431
+ "1. A source text in English. In this lab, you will use the first 1671 words of the [wmt19](http://statmt.org/wmt19/translation-task.html) evaluation dataset downloaded via SacreBLEU.\n",
432
+ "2. A reference translation to German of the corresponding first 1671 words from the original English text. This is also provided by SacreBLEU.\n",
433
+ "3. A candidate machine translation to German from the same 1671 words. This is generated by Google Translate.\n",
434
+ "\n",
435
+ "With that, you can now compare the reference and candidate translation to get the BLEU Score."
436
+ ]
437
+ },
438
+ {
439
+ "cell_type": "code",
440
+ "execution_count": 21,
441
+ "metadata": {},
442
+ "outputs": [],
443
+ "source": [
444
+ "# Loading the raw data\n",
445
+ "wmt19_src = open(\"data/wmt19_src.txt\", \"r\")\n",
446
+ "wmt19_src_1 = wmt19_src.read()\n",
447
+ "wmt19_src.close()\n",
448
+ "\n",
449
+ "wmt19_ref = open(\"data/wmt19_ref.txt\", \"r\")\n",
450
+ "wmt19_ref_1 = wmt19_ref.read()\n",
451
+ "wmt19_ref.close()\n",
452
+ "\n",
453
+ "wmt19_can = open(\"data/wmt19_can.txt\", \"r\")\n",
454
+ "wmt19_can_1 = wmt19_can.read()\n",
455
+ "wmt19_can.close()\n",
456
+ "\n",
457
+ "tokenized_corpus_src = nltk.word_tokenize(wmt19_src_1.lower())\n",
458
+ "tokenized_corpus_ref = nltk.word_tokenize(wmt19_ref_1.lower())\n",
459
+ "tokenized_corpus_cand = nltk.word_tokenize(wmt19_can_1.lower())"
460
+ ]
461
+ },
462
+ {
463
+ "cell_type": "markdown",
464
+ "metadata": {},
465
+ "source": [
466
+ "Now that you have your data loaded, you can inspect the first sentence of each dataset."
467
+ ]
468
+ },
469
+ {
470
+ "cell_type": "code",
471
+ "execution_count": 22,
472
+ "metadata": {
473
+ "tags": []
474
+ },
475
+ "outputs": [
476
+ {
477
+ "name": "stdout",
478
+ "output_type": "stream",
479
+ "text": [
480
+ "English source text:\n",
481
+ "\n",
482
+ "Welsh AMs worried about 'looking like muppets'\n",
483
+ "There is consternation among some AMs at a suggestion their title should change to MWPs (Member of the Welsh Parliament).\n",
484
+ " -> ['\\ufeffwelsh', 'ams', 'worried', 'about', \"'looking\", 'like', \"muppets'\", 'there', 'is', 'consternation', 'among', 'some', 'ams', 'at', 'a', 'suggestion', 'their', 'title', 'should', 'change', 'to', 'mwps', '(', 'member', 'of', 'the', 'welsh', 'parliament', ')', '.']\n",
485
+ "\n",
486
+ "\n",
487
+ "German reference translation:\n",
488
+ "\n",
489
+ "Walisische Ageordnete sorgen sich \"wie Dödel auszusehen\"\n",
490
+ "Es herrscht Bestürzung unter einigen Mitgliedern der Versammlung über einen Vorschlag, der ihren Titel zu MWPs (Mitglied der walisischen Parlament) ändern soll.\n",
491
+ " -> ['\\ufeffwalisische', 'ageordnete', 'sorgen', 'sich', '``', 'wie', 'dödel', 'auszusehen', \"''\", 'es', 'herrscht', 'bestürzung', 'unter', 'einigen', 'mitgliedern', 'der', 'versammlung', 'über', 'einen', 'vorschlag', ',', 'der', 'ihren', 'titel', 'zu', 'mwps', '(', 'mitglied', 'der', 'walisischen', 'parlament', ')', 'ändern', 'soll', '.']\n",
492
+ "\n",
493
+ "\n",
494
+ "German machine translation:\n",
495
+ "\n",
496
+ "Walisische AMs machten sich Sorgen, dass sie wie Muppets aussehen könnten\n",
497
+ "Einige AMs sind bestürzt über den Vorschlag, ihren Titel in MWPs (Mitglied des walisischen Parlaments) zu ändern.\n",
498
+ "Es ist aufg -> ['walisische', 'ams', 'machten', 'sich', 'sorgen', ',', 'dass', 'sie', 'wie', 'muppets', 'aussehen', 'könnten', 'einige', 'ams', 'sind', 'bestürzt', 'über', 'den', 'vorschlag', ',', 'ihren', 'titel', 'in', 'mwps', '(', 'mitglied', 'des', 'walisischen', 'parlaments']\n"
499
+ ]
500
+ }
501
+ ],
502
+ "source": [
503
+ "print(\"English source text:\\n\")\n",
504
+ "print(f\"{wmt19_src_1[0:170]} -> {tokenized_corpus_src[0:30]}\\n\\n\")\n",
505
+ "print(\"German reference translation:\\n\")\n",
506
+ "print(f\"{wmt19_ref_1[0:219]} -> {tokenized_corpus_ref[0:35]}\\n\\n\")\n",
507
+ "print(\"German machine translation:\\n\")\n",
508
+ "print(f\"{wmt19_can_1[0:199]} -> {tokenized_corpus_cand[0:29]}\")"
509
+ ]
510
+ },
511
+ {
512
+ "cell_type": "markdown",
513
+ "metadata": {},
514
+ "source": [
515
+ "And lastly, you can calculate the BLEU score of the translation."
516
+ ]
517
+ },
518
+ {
519
+ "cell_type": "code",
520
+ "execution_count": 23,
521
+ "metadata": {
522
+ "tags": []
523
+ },
524
+ "outputs": [
525
+ {
526
+ "name": "stdout",
527
+ "output_type": "stream",
528
+ "text": [
529
+ "BLEU score of the reference versus candidate translation: 43.2\n"
530
+ ]
531
+ }
532
+ ],
533
+ "source": [
534
+ "result = round(sacrebleu.sentence_bleu(wmt19_can_1, [wmt19_ref_1]).score, 1)\n",
535
+ "print(f\"BLEU score of the reference versus candidate translation: {result}\")"
536
+ ]
537
+ },
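Note that `sentence_bleu` is applied here to the whole text as a single string. For segment-level evaluation, `sacrebleu` also offers `corpus_bleu`; a hedged sketch (it assumes the reference and candidate files contain the same number of aligned lines, which may not hold for these truncated files) would be:

```python
candidate_lines = wmt19_can_1.splitlines()
reference_lines = wmt19_ref_1.splitlines()

# corpus_bleu takes a list of hypotheses and a list of reference streams
corpus_result = sacrebleu.corpus_bleu(candidate_lines, [reference_lines])
print(round(corpus_result.score, 1))
```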
538
+ {
539
+ "cell_type": "markdown",
540
+ "metadata": {},
541
+ "source": [
542
+ "## 4.2 BLEU Score Interpretation on a Corpus\n",
543
+ "The table below (taken from [here](https://cloud.google.com/translate/automl/docs/evaluate)) shows the typical values of BLEU score. You can see that the translation above is of high quality according to this table and in comparison to the given reference sentence. (*if you see \"Hard to get the gist\", please open your workspace, delete `wmt19_can.txt` and get the latest version via the Lab Help button*)\n",
544
+ "\n",
545
+ "|Score | Interpretation |\n",
546
+ "|:---------:|:-------------------------------------------------------------:|\n",
547
+ "| < 10 | Almost useless |\n",
548
+ "| 10 - 19 | Hard to get the gist |\n",
549
+ "| 20 - 29 | The gist is clear, but has significant grammatical errors |\n",
550
+ "| 30 - 40 | Understandable to good translations |\n",
551
+ "| 40 - 50 | High quality translations |\n",
552
+ "| 50 - 60 | Very high quality, adequate, and fluent translations |\n",
553
+ "| > 60 | Quality often better than human |"
554
+ ]
555
+ },
556
+ {
557
+ "cell_type": "code",
558
+ "execution_count": null,
559
+ "metadata": {},
560
+ "outputs": [],
561
+ "source": []
562
+ }
563
+ ],
564
+ "metadata": {
565
+ "kernelspec": {
566
+ "display_name": "Python 3 (ipykernel)",
567
+ "language": "python",
568
+ "name": "python3"
569
+ },
570
+ "language_info": {
571
+ "codemirror_mode": {
572
+ "name": "ipython",
573
+ "version": 3
574
+ },
575
+ "file_extension": ".py",
576
+ "mimetype": "text/x-python",
577
+ "name": "python",
578
+ "nbconvert_exporter": "python",
579
+ "pygments_lexer": "ipython3",
580
+ "version": "3.10.11"
581
+ }
582
+ },
583
+ "nbformat": 4,
584
+ "nbformat_minor": 4
585
+ }
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/C4W1_QKV_Attention.ipynb ADDED
@@ -0,0 +1,281 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "707052ae",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Scaled Dot-Product Attention: Ungraded Lab\n",
9
+ "\n",
10
+ "The 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) introduced the Transformer model and scaled dot-product attention, sometimes also called QKV (**Q**ueries, **K**eys, **V**alues) attention. Since then, Transformers have come to dominate large-scale natural language applications. Scaled dot-product attention can be used to improve seq2seq models as well. In this ungraded lab, you'll implement a simplified version of scaled dot-product attention and replicate word alignment between English and French, as shown in [Bhadanau, et al. (2014)](https://arxiv.org/abs/1409.0473).\n",
11
+ "\n",
12
+ "The Transformer model learns how to align words in different languages. You won't be training any weights here, so instead you will use [pre-trained aligned word embeddings from here](https://fasttext.cc/docs/en/aligned-vectors.html). Run the cell below to load the embeddings and set up the rest of the notebook.\n",
13
+ "\n",
14
+ "This is a practice notebook, where you can train writing your code. All of the solutions are provided at the end of the notebook."
15
+ ]
16
+ },
17
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "aa4d9f30",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the libraries\n",
+ "import pickle\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "\n",
+ "# Load the word2int dictionaries\n",
+ "with open(\"./data/word2int_en.pkl\", \"rb\") as f:\n",
+ "    en_words = pickle.load(f)\n",
+ "    \n",
+ "with open(\"./data/word2int_fr.pkl\", \"rb\") as f:\n",
+ "    fr_words = pickle.load(f)\n",
+ "\n",
+ "# Load the word embeddings\n",
+ "en_embeddings = np.load(\"./data/embeddings_en.npz\")[\"embeddings\"]\n",
+ "fr_embeddings = np.load(\"./data/embeddings_fr.npz\")[\"embeddings\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "a6914081",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Define some helper functions\n",
+ "\n",
+ "def tokenize(sentence, token_mapping):\n",
+ "    tokenized = []\n",
+ "    \n",
+ "    for word in sentence.lower().split(\" \"):\n",
+ "        try:\n",
+ "            tokenized.append(token_mapping[word])\n",
+ "        except KeyError:\n",
+ "            # Using -1 to indicate an unknown word\n",
+ "            tokenized.append(-1)\n",
+ "    \n",
+ "    return tokenized\n",
+ "\n",
+ "\n",
+ "def embed(tokens, embeddings):\n",
+ "    embed_size = embeddings.shape[1]\n",
+ "    \n",
+ "    output = np.zeros((len(tokens), embed_size))\n",
+ "    for i, token in enumerate(tokens):\n",
+ "        if token == -1:\n",
+ "            # Unknown words are embedded as the zero vector\n",
+ "            output[i] = np.zeros((1, embed_size))\n",
+ "        else:\n",
+ "            output[i] = embeddings[token]\n",
+ "    \n",
+ "    return output"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6153d4b2",
+ "metadata": {},
+ "source": [
+ "Scaled dot-product attention consists of two matrix multiplications and a softmax scaling, as shown in the diagram below from [Vaswani, et al. (2017)](https://arxiv.org/abs/1706.03762). It takes three input matrices: the queries, keys, and values.\n",
+ "\n",
+ "![scaled dot-product attention diagram](./images/attention.png)\n",
+ "\n",
+ "Mathematically, this is expressed as\n",
+ "\n",
+ "$$ \n",
+ "\\large \\mathrm{Attention}\\left(Q, K, V\\right) = \\mathrm{softmax}\\left(\\frac{QK^{\\top}}{\\sqrt{d_k}}\\right)V\n",
+ "$$\n",
+ "\n",
+ "where $Q$, $K$, and $V$ are the queries, keys, and values matrices respectively, and $d_k$ is the dimension of the keys. In practice, $Q$, $K$, and $V$ usually share the same embedding dimension, although the number of queries need not match the number of keys. This form of attention is faster and more space-efficient than the attention you implemented before, since it consists only of matrix multiplications rather than a learned feed-forward layer.\n",
+ "\n",
+ "Conceptually, the first matrix multiplication measures the similarity between the queries and the keys. The softmax function turns these similarities into weights, and the second matrix multiplication applies those weights to the values, producing the output attention vectors. Typically, decoder states are used as the queries while encoder states are the keys and values.\n",
+ "\n",
+ "### Exercise 1\n",
+ "Implement the softmax function with NumPy and use it to calculate the weights from the queries and keys. Assume the queries and keys are 2D arrays (matrices). Note that since the dot product of Q and K will be a matrix, you'll need to calculate softmax over a specific axis. See the end of the notebook for solutions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "3932b927",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def softmax(x, axis):\n",
+ "    \"\"\" Calculate softmax function for an array x\n",
+ "\n",
+ "    axis=0 calculates softmax across rows, so each column sums to 1\n",
+ "    axis=1 calculates softmax across columns, so each row sums to 1\n",
+ "    \"\"\"\n",
+ "    # Subtract the max before exponentiating for numerical stability\n",
+ "    e = np.exp(x - np.max(x, axis=axis, keepdims=True))\n",
+ "    return e / np.sum(e, axis=axis, keepdims=True)\n",
+ "\n",
+ "def calculate_weights(queries, keys):\n",
+ "    \"\"\" Calculate the weights for scaled dot-product attention \"\"\"\n",
+ "    # Scale the query-key dot products by the square root of the key dimension\n",
+ "    dot = queries.dot(keys.T) / np.sqrt(keys.shape[-1])\n",
+ "    weights = softmax(dot, axis=1)\n",
+ "    \n",
+ "    assert np.isclose(weights.sum(axis=1)[0], 1), \"Each row in weights must sum to 1\"\n",
+ "    \n",
+ "    return weights"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "51f47450",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "<base64-encoded PNG of the English-French alignment heatmap omitted>",
+ "text/plain": [
137
+ "<Figure size 700x700 with 1 Axes>"
138
+ ]
139
+ },
140
+ "metadata": {},
141
+ "output_type": "display_data"
142
+ }
143
+ ],
144
+ "source": [
145
+ "# Tokenize example sentences in English and French, then get their embeddings\n",
146
+ "sentence_en = \"The agreement on the European Economic Area was signed in August 1992 .\"\n",
147
+ "tokenized_en = tokenize(sentence_en, en_words)\n",
148
+ "embedded_en = embed(tokenized_en, en_embeddings)\n",
149
+ "\n",
150
+ "sentence_fr = \"L accord sur la zone économique européenne a été signé en août 1992 .\"\n",
151
+ "tokenized_fr = tokenize(sentence_fr, fr_words)\n",
152
+ "embedded_fr = embed(tokenized_fr, fr_embeddings)\n",
153
+ "\n",
154
+ "# These weights indicate alignment between words in English and French\n",
155
+ "alignment = calculate_weights(embedded_fr, embedded_en)\n",
156
+ "\n",
157
+ "# Visualize weights to check for alignment\n",
158
+ "fig, ax = plt.subplots(figsize=(7,7))\n",
159
+ "ax.imshow(alignment, cmap='gray')\n",
160
+ "ax.xaxis.tick_top()\n",
161
+ "ax.set_xticks(np.arange(alignment.shape[1]))\n",
162
+ "ax.set_xticklabels(sentence_en.split(\" \"), rotation=90, size=16);\n",
163
+ "ax.set_yticks(np.arange(alignment.shape[0]));\n",
164
+ "ax.set_yticklabels(sentence_fr.split(\" \"), size=16);"
165
+ ]
166
+ },
167
+ {
168
+ "cell_type": "markdown",
169
+ "id": "d634f0ec",
170
+ "metadata": {},
171
+ "source": [
172
+ "If you implemented the weights calculations correctly, the alignment matrix should look like this:\n",
173
+ "\n",
174
+ "![alignment visualization](./images/alignment.png)\n",
175
+ "\n",
176
+ "This is a demonstration of alignment where the model has learned which words in English correspond to words in French. For example, the words *signed* and *signé* have a large weight because they have the same meaning. Typically, these alignments are learned using linear layers in the model, but you've used pre-trained embeddings here.\n",
177
+ "\n",
178
+ "### Exercise 2\n",
179
+ "Complete the implementation of scaled dot-product attention using your `calculate_weights` function (ignore the mask)."
180
+ ]
181
+ },
182
+ {
183
+ "cell_type": "code",
184
+ "execution_count": null,
185
+ "id": "fbfc157e",
186
+ "metadata": {},
187
+ "outputs": [],
188
+ "source": [
189
+ "def attention_qkv(queries, keys, values):\n",
190
+ " \"\"\" Calculate scaled dot-product attention from queries, keys, and values matrices \"\"\"\n",
191
+ " \n",
192
+ " # Replace pass with your code.\n",
193
+ " return calculate_weights(queries, keys).dot(values)\n",
194
+ "\n",
195
+ "\n",
196
+ "attention_qkv_result = attention_qkv(embedded_fr, embedded_en, embedded_en)\n",
197
+ "\n",
198
+ "print(f\"The shape of the attention_qkv function is {attention_qkv_result.shape}\")\n",
199
+ "print(f\"Some elements of the attention_qkv function are \\n{attention_qkv_result[0:2,:10]}\")"
200
+ ]
201
+ },
202
+ {
203
+ "cell_type": "markdown",
204
+ "id": "f98335f0",
205
+ "metadata": {},
206
+ "source": [
207
+ "**Expected output**\n",
208
+ "\n",
209
+ "The shape of the attention_qkv function is `(14, 300)`\n",
210
+ "\n",
211
+ "Some elements of the attention_qkv function are \n",
212
+ "```python\n",
213
+ "[[-0.04039161 -0.00275749 0.00389873 0.04842744 -0.02472726 0.01435613\n",
214
+ " -0.00370253 -0.0619686 -0.00206159 0.01615228]\n",
215
+ " [-0.04083253 -0.00245985 0.00409068 0.04830341 -0.02479128 0.01447497\n",
216
+ " -0.00355203 -0.06196036 -0.00241327 0.01582606]]\n",
217
+ "```"
218
+ ]
219
+ },
220
+ {
221
+ "cell_type": "markdown",
222
+ "id": "f87131fb",
223
+ "metadata": {},
224
+ "source": [
225
+ "## Solutions"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "markdown",
230
+ "id": "8470a024",
231
+ "metadata": {},
232
+ "source": [
233
+ "```python\n",
234
+ "def softmax(x, axis=0):\n",
235
+ " \"\"\" Calculate softmax function for an array x\n",
236
+ " \n",
237
+ " axis=0 calculates softmax across rows which means each column sums to 1 \n",
238
+ " axis=1 calculates softmax across columns which means each row sums to 1\n",
239
+ " \"\"\"\n",
240
+ " y = np.exp(x) \n",
241
+ " return y / np.expand_dims(np.sum(y, axis=axis), axis)\n",
242
+ "\n",
243
+ "def calculate_weights(queries, keys):\n",
244
+ " \"\"\" Calculate the weights for scaled dot-product attention\"\"\"\n",
245
+ " dot = np.matmul(queries, keys.T)/np.sqrt(keys.shape[1])\n",
246
+ " weights = softmax(dot, axis=1)\n",
247
+ " \n",
248
+ " assert weights.sum(axis=1)[0] == 1, \"Each row in weights must sum to 1\"\n",
249
+ " \n",
250
+ " return weights\n",
251
+ "\n",
252
+ "def attention_qkv(queries, keys, values):\n",
253
+ " \"\"\" Calculate scaled dot-product attention from queries, keys, and values matrices \"\"\"\n",
254
+ " weights = calculate_weights(queries, keys)\n",
255
+ " return np.matmul(weights, values)\n",
256
+ "```"
257
+ ]
258
+ }
259
+ ],
260
+ "metadata": {
261
+ "kernelspec": {
262
+ "display_name": "Python 3 (ipykernel)",
263
+ "language": "python",
264
+ "name": "python3"
265
+ },
266
+ "language_info": {
267
+ "codemirror_mode": {
268
+ "name": "ipython",
269
+ "version": 3
270
+ },
271
+ "file_extension": ".py",
272
+ "mimetype": "text/x-python",
273
+ "name": "python",
274
+ "nbconvert_exporter": "python",
275
+ "pygments_lexer": "ipython3",
276
+ "version": "3.10.11"
277
+ }
278
+ },
279
+ "nbformat": 4,
280
+ "nbformat_minor": 5
281
+ }
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/embeddings_en.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0dde813af6f3db6e520041567fe87619549e9a966515c534f050fe08413e0bc5
3
+ size 6681346
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/embeddings_fr.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02aeb7d162339f0440134f501e867b15203f0dfac2f4cde6c7333840642b03ad
3
+ size 6685746
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/wmt19_can.txt ADDED
@@ -0,0 +1 @@
1
+ Walisische AMs machten sich Sorgen, dass sie wie Muppets aussehen könnten
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/wmt19_ref.txt ADDED
@@ -0,0 +1 @@
1
+ Walisische Ageordnete sorgen sich "wie Dödel auszusehen"
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/wmt19_src.txt ADDED
@@ -0,0 +1 @@
1
+ Welsh AMs worried about 'looking like muppets'
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/word2int_en.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:42b2af079f58c498b5f437e1b6922d486a8baa721f254c15ccaba44e5a7c8165
3
+ size 127796
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/data/word2int_fr.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:16e1f4953589bfcad3daae8002476a5727f9902dc8f48f513a20bcb1c94f81dd
3
+ size 133064
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/images/alignment.png ADDED
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/images/alignment_model_3.jpg ADDED
NLP with Attention Models/NMT_with_Attention/Basic Attention-BLEU-QKV Attention/Files/home/jovyan/work/images/attention.png ADDED
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/.ipynb_checkpoints/C4W1_Assignment-checkpoint.ipynb ADDED
@@ -0,0 +1,1994 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "9cb49525",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Assignment 1: Neural Machine Translation\n",
9
+ "\n",
10
+ "Welcome to the first assignment of Course 4. Here, you will build an English-to-Portuguese neural machine translation (NMT) model using Long Short-Term Memory (LSTM) networks with attention. Machine translation is an important task in natural language processing and could be useful not only for translating one language to another but also for word sense disambiguation (e.g. determining whether the word \"bank\" refers to the financial bank, or the land alongside a river). Implementing this using just a Recurrent Neural Network (RNN) with LSTMs can work for short to medium length sentences but can result in vanishing gradients for very long sequences. To help with this, you will be adding an attention mechanism to allow the decoder to access all relevant parts of the input sentence regardless of its length. By completing this assignment, you will:\n",
11
+ "\n",
12
+ "- Implement an encoder-decoder system with attention\n",
13
+ "- Build the NMT model from scratch using Tensorflow\n",
14
+ "- Generate translations using greedy and Minimum Bayes Risk (MBR) decoding\n",
15
+ "\n",
16
+ "## Table of Contents\n",
17
+ "- [1 - Data Preparation](#1)\n",
18
+ "- [2 - NMT model with attention](#2)\n",
19
+ " - [Exercise 1 - Encoder](#ex1)\n",
20
+ " - [Exercise 2 - CrossAttention](#ex2)\n",
21
+ " - [Exercise 3 - Decoder](#ex3) \n",
22
+ " - [Exercise 4 - Translator](#ex4)\n",
23
+ "- [3 - Training](#3)\n",
24
+ "- [4 - Using the model for inference ](#4)\n",
25
+ " - [Exercise 5 - translate](#ex5)\n",
26
+ "- [5 - Minimum Bayes-Risk Decoding](#5)\n",
27
+ " - [Exercise 6 - rouge1_similarity](#ex6)\n",
28
+ " - [Exercise 7 - average_overlap](#ex7)\n"
29
+ ]
30
+ },
31
+ {
32
+ "cell_type": "code",
33
+ "execution_count": null,
34
+ "id": "f9ef370d",
35
+ "metadata": {
36
+ "deletable": false,
37
+ "editable": false,
38
+ "tags": [
39
+ "graded"
40
+ ]
41
+ },
42
+ "outputs": [],
43
+ "source": [
44
+ "import os\n",
45
+ "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # Setting this env variable prevents TF warnings from showing up\n",
46
+ "\n",
47
+ "import numpy as np\n",
48
+ "import tensorflow as tf\n",
49
+ "from collections import Counter\n",
50
+ "from utils import (sentences, train_data, val_data, english_vectorizer, portuguese_vectorizer, \n",
51
+ " masked_loss, masked_acc, tokens_to_text)"
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "code",
56
+ "execution_count": null,
57
+ "id": "8adb8fd6",
58
+ "metadata": {
59
+ "deletable": false,
60
+ "editable": false,
61
+ "tags": []
62
+ },
63
+ "outputs": [],
64
+ "source": [
65
+ "import w1_unittest"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "markdown",
70
+ "id": "e76be1dc",
71
+ "metadata": {},
72
+ "source": [
73
+ "<a name=\"1\"></a>\n",
74
+ "## 1. Data Preparation\n",
75
+ "\n",
76
+ "The text pre-processing bits have already been taken care of (if you are interested in this be sure to check the `utils.py` file). The steps performed can be summarized as:\n",
77
+ "\n",
78
+ "- Reading the raw data from the text files\n",
79
+ "- Cleaning the data (using lowercase, adding space around punctuation, trimming whitespaces, etc)\n",
80
+ "- Splitting it into training and validation sets\n",
81
+ "- Adding the start-of-sentence and end-of-sentence tokens to every sentence\n",
82
+ "- Tokenizing the sentences\n",
83
+ "- Creating a Tensorflow dataset out of the tokenized sentences\n",
84
+ "\n",
85
+ "Take a moment to inspect the raw sentences:"
86
+ ]
87
+ },
88
+ {
89
+ "cell_type": "code",
90
+ "execution_count": null,
91
+ "id": "226033a1",
92
+ "metadata": {
93
+ "deletable": false,
94
+ "editable": false,
95
+ "tags": [
96
+ "graded"
97
+ ]
98
+ },
99
+ "outputs": [],
100
+ "source": [
101
+ "portuguese_sentences, english_sentences = sentences\n",
102
+ "\n",
103
+ "print(f\"English (to translate) sentence:\\n\\n{english_sentences[-5]}\\n\")\n",
104
+ "print(f\"Portuguese (translation) sentence:\\n\\n{portuguese_sentences[-5]}\")"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "markdown",
109
+ "id": "5ba90eb9",
110
+ "metadata": {},
111
+ "source": [
112
+ "You don't have much use for the raw sentences so delete them to save memory:"
113
+ ]
114
+ },
115
+ {
116
+ "cell_type": "code",
117
+ "execution_count": null,
118
+ "id": "d9f081b0",
119
+ "metadata": {
120
+ "deletable": false,
121
+ "editable": false,
122
+ "tags": [
123
+ "graded"
124
+ ]
125
+ },
126
+ "outputs": [],
127
+ "source": [
128
+ "del portuguese_sentences\n",
129
+ "del english_sentences\n",
130
+ "del sentences"
131
+ ]
132
+ },
133
+ {
134
+ "cell_type": "markdown",
135
+ "id": "a2ff83d2",
136
+ "metadata": {},
137
+ "source": [
138
+ "Notice that you imported an `english_vectorizer` and a `portuguese_vectorizer` from `utils.py`. These were created using [tf.keras.layers.TextVectorization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization) and they provide interesting features such as ways to visualize the vocabulary and convert text into tokenized ids and vice versa. In fact, you can inspect the first ten words of the vocabularies for both languages:"
139
+ ]
140
+ },
141
+ {
142
+ "cell_type": "code",
143
+ "execution_count": null,
144
+ "id": "2c1cfc17",
145
+ "metadata": {
146
+ "deletable": false,
147
+ "editable": false,
148
+ "tags": [
149
+ "graded"
150
+ ]
151
+ },
152
+ "outputs": [],
153
+ "source": [
154
+ "print(f\"First 10 words of the english vocabulary:\\n\\n{english_vectorizer.get_vocabulary()[:10]}\\n\")\n",
155
+ "print(f\"First 10 words of the portuguese vocabulary:\\n\\n{portuguese_vectorizer.get_vocabulary()[:10]}\")"
156
+ ]
157
+ },
158
+ {
159
+ "cell_type": "markdown",
160
+ "id": "3152b075",
161
+ "metadata": {},
162
+ "source": [
163
+ "Notice that the first 4 words are reserved for special words. In order, these are:\n",
164
+ "\n",
165
+ "- the empty string\n",
166
+ "- a special token to represent an unknown word\n",
167
+ "- a special token to represent the start of a sentence\n",
168
+ "- a special token to represent the end of a sentence\n",
169
+ "\n",
170
+ "You can see how many words are in a vocabulary by using the `vocabulary_size` method:"
171
+ ]
172
+ },
173
+ {
174
+ "cell_type": "code",
175
+ "execution_count": null,
176
+ "id": "5facaa0c",
177
+ "metadata": {
178
+ "deletable": false,
179
+ "editable": false,
180
+ "slideshow": {
181
+ "slide_type": ""
182
+ },
183
+ "tags": [
184
+ "graded"
185
+ ]
186
+ },
187
+ "outputs": [],
188
+ "source": [
189
+ "# Size of the vocabulary\n",
190
+ "vocab_size_por = portuguese_vectorizer.vocabulary_size()\n",
191
+ "vocab_size_eng = english_vectorizer.vocabulary_size()\n",
192
+ "\n",
193
+ "print(f\"Portuguese vocabulary is made up of {vocab_size_por} words\")\n",
194
+ "print(f\"English vocabulary is made up of {vocab_size_eng} words\")"
195
+ ]
196
+ },
197
+ {
198
+ "cell_type": "markdown",
199
+ "id": "53e4b615",
200
+ "metadata": {
201
+ "editable": true,
202
+ "slideshow": {
203
+ "slide_type": ""
204
+ },
205
+ "tags": []
206
+ },
207
+ "source": [
208
+ "You can define [tf.keras.layers.StringLookup](https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup) objects that will help you map from words to ids and vice versa. Do this for the portuguese vocabulary since this will be useful later on when you decode the predictions from your model:"
209
+ ]
210
+ },
211
+ {
212
+ "cell_type": "code",
213
+ "execution_count": null,
214
+ "id": "218f7a36",
215
+ "metadata": {
216
+ "deletable": false,
217
+ "editable": false,
218
+ "tags": [
219
+ "graded"
220
+ ]
221
+ },
222
+ "outputs": [],
223
+ "source": [
224
+ "# This helps you convert from words to ids\n",
225
+ "word_to_id = tf.keras.layers.StringLookup(\n",
226
+ " vocabulary=portuguese_vectorizer.get_vocabulary(), \n",
227
+ " mask_token=\"\", \n",
228
+ " oov_token=\"[UNK]\"\n",
229
+ ")\n",
230
+ "\n",
231
+ "# This helps you convert from ids to words\n",
232
+ "id_to_word = tf.keras.layers.StringLookup(\n",
233
+ " vocabulary=portuguese_vectorizer.get_vocabulary(),\n",
234
+ " mask_token=\"\",\n",
235
+ " oov_token=\"[UNK]\",\n",
236
+ " invert=True,\n",
237
+ ")"
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "markdown",
242
+ "id": "4af8b623",
243
+ "metadata": {},
244
+ "source": [
245
+ "Try it out for the special tokens and a random word:"
246
+ ]
247
+ },
248
+ {
249
+ "cell_type": "code",
250
+ "execution_count": null,
251
+ "id": "20076b9a",
252
+ "metadata": {
253
+ "deletable": false,
254
+ "editable": false,
255
+ "tags": [
256
+ "graded"
257
+ ]
258
+ },
259
+ "outputs": [],
260
+ "source": [
261
+ "unk_id = word_to_id(\"[UNK]\")\n",
262
+ "sos_id = word_to_id(\"[SOS]\")\n",
263
+ "eos_id = word_to_id(\"[EOS]\")\n",
264
+ "baunilha_id = word_to_id(\"baunilha\")\n",
265
+ "\n",
266
+ "print(f\"The id for the [UNK] token is {unk_id}\")\n",
267
+ "print(f\"The id for the [SOS] token is {sos_id}\")\n",
268
+ "print(f\"The id for the [EOS] token is {eos_id}\")\n",
269
+ "print(f\"The id for baunilha (vanilla) is {baunilha_id}\")"
270
+ ]
271
+ },
272
+ {
273
+ "cell_type": "markdown",
274
+ "id": "2f1d744c",
275
+ "metadata": {},
276
+ "source": [
277
+ "Finally take a look at how the data that is going to be fed to the neural network looks like. Both `train_data` and `val_data` are of type `tf.data.Dataset` and are already arranged in batches of 64 examples. To get the first batch out of a tf dataset you can use the `take` method. To get the first example out of the batch you can slice the tensor and use the `numpy` method for nicer printing:"
278
+ ]
279
+ },
280
+ {
281
+ "cell_type": "code",
282
+ "execution_count": null,
283
+ "id": "739777eb",
284
+ "metadata": {
285
+ "deletable": false,
286
+ "editable": false,
287
+ "tags": [
288
+ "graded"
289
+ ]
290
+ },
291
+ "outputs": [],
292
+ "source": [
293
+ "for (to_translate, sr_translation), translation in train_data.take(1):\n",
294
+ " print(f\"Tokenized english sentence:\\n{to_translate[0, :].numpy()}\\n\\n\")\n",
295
+ " print(f\"Tokenized portuguese sentence (shifted to the right):\\n{sr_translation[0, :].numpy()}\\n\\n\")\n",
296
+ " print(f\"Tokenized portuguese sentence:\\n{translation[0, :].numpy()}\\n\\n\")"
297
+ ]
298
+ },
299
+ {
300
+ "cell_type": "markdown",
301
+ "id": "bdd9ee3c",
302
+ "metadata": {
303
+ "editable": true,
304
+ "slideshow": {
305
+ "slide_type": ""
306
+ },
307
+ "tags": []
308
+ },
309
+ "source": [
310
+ "There are a couple of important details to notice.\n",
311
+ "\n",
312
+ "- Padding has already been applied to the tensors and the value used for this is 0\n",
313
+ "- Each example consists of 3 different tensors:\n",
314
+ " - The sentence to translate\n",
315
+ " - The shifted-to-the-right translation\n",
316
+ " - The translation\n",
317
+ " \n",
318
+ "The first two can be considered as the features, while the third one as the target. By doing this your model can perform Teacher Forcing as you saw in the lectures.\n",
319
+ "\n",
320
+ "Now it is time to begin coding!"
321
+ ]
322
+ },
323
+ {
324
+ "cell_type": "markdown",
325
+ "id": "dd41cb52",
326
+ "metadata": {
327
+ "editable": true,
328
+ "slideshow": {
329
+ "slide_type": ""
330
+ },
331
+ "tags": []
332
+ },
333
+ "source": [
334
+ "<a name=\"2\"></a>\n",
335
+ "## 2. NMT model with attention\n",
336
+ "\n",
337
+ "The model you will build uses an encoder-decoder architecture. This Recurrent Neural Network (RNN) takes in a tokenized version of a sentence in its encoder, then passes it on to the decoder for translation. As mentioned in the lectures, just using a a regular sequence-to-sequence model with LSTMs will work effectively for short to medium sentences but will start to degrade for longer ones. You can picture it like the figure below where all of the context of the input sentence is compressed into one vector that is passed into the decoder block. You can see how this will be an issue for very long sentences (e.g. 100 tokens or more) because the context of the first parts of the input will have very little effect on the final vector passed to the decoder.\n",
338
+ "\n",
339
+ "<img src='images/plain_rnn.png'>\n",
340
+ "\n",
341
+ "Adding an attention layer to this model avoids this problem by giving the decoder access to all parts of the input sentence. To illustrate, let's just use a 4-word input sentence as shown below. Remember that a hidden state is produced at each timestep of the encoder (represented by the orange rectangles). These are all passed to the attention layer and each are given a score given the current activation (i.e. hidden state) of the decoder. For instance, let's consider the figure below where the first prediction \"como\" is already made. To produce the next prediction, the attention layer will first receive all the encoder hidden states (i.e. orange rectangles) as well as the decoder hidden state when producing the word \"como\" (i.e. first green rectangle). Given this information, it will score each of the encoder hidden states to know which one the decoder should focus on to produce the next word. As a result of training, the model might have learned that it should align to the second encoder hidden state and subsequently assigns a high probability to the word \"você\". If we are using greedy decoding, we will output the said word as the next symbol, then restart the process to produce the next word until we reach an end-of-sentence prediction.\n",
342
+ "\n",
343
+ "<img src='images/attention_overview.png'>\n",
344
+ "\n",
345
+ "\n",
346
+ "There are different ways to implement attention and the one we'll use for this assignment is the Scaled Dot Product Attention which has the form:\n",
347
+ "\n",
348
+ "$$Attention(Q, K, V) = softmax(\\frac{QK^T}{\\sqrt{d_k}})V$$\n",
349
+ "\n",
350
+ "You will dive deeper into this equation in the next week but for now, you can think of it as computing scores using queries (Q) and keys (K), followed by a multiplication of values (V) to get a context vector at a particular timestep of the decoder. This context vector is fed to the decoder RNN to get a set of probabilities for the next predicted word. The division by square root of the keys dimensionality ($\\sqrt{d_k}$) is for improving model performance and you'll also learn more about it next week. For our machine translation application, the encoder activations (i.e. encoder hidden states) will be the keys and values, while the decoder activations (i.e. decoder hidden states) will be the queries.\n",
351
+ "\n",
352
+ "You will see in the upcoming sections that this complex architecture and mechanism can be implemented with just a few lines of code. \n",
353
+ "\n",
354
+ "First you will define two important global variables:\n",
355
+ "\n",
356
+ "- The size of the vocabulary\n",
357
+ "- The number of units in the LSTM layers (the same number will be used for all LSTM layers)\n",
358
+ "\n",
359
+ "In this assignment, the vocabulary sizes for English and Portuguese are the same. Therefore, we use a single constant VOCAB_SIZE throughout the notebook. While in other settings, vocabulary sizes could differ, that is not the case in our assignment."
360
+ ]
361
+ },
362
+ {
363
+ "cell_type": "code",
364
+ "execution_count": null,
365
+ "id": "2e484abf",
366
+ "metadata": {
367
+ "deletable": false,
368
+ "editable": false,
369
+ "slideshow": {
370
+ "slide_type": ""
371
+ },
372
+ "tags": [
373
+ "graded"
374
+ ]
375
+ },
376
+ "outputs": [],
377
+ "source": [
378
+ "VOCAB_SIZE = 12000\n",
379
+ "UNITS = 256"
380
+ ]
381
+ },
382
+ {
383
+ "cell_type": "markdown",
384
+ "id": "cc251965",
385
+ "metadata": {},
386
+ "source": [
387
+ "<a name=\"ex1\"></a>\n",
388
+ "## Exercise 1 - Encoder\n",
389
+ "\n",
390
+ "Your first exercise is to code the encoder part of the neural network. For this, complete the `Encoder` class below. Notice that in the constructor (the `__init__` method) you need to define all of the sublayers of the encoder and then use these sublayers during the forward pass (the `call` method).\n",
391
+ "\n",
392
+ "The encoder consists of the following layers:\n",
393
+ "\n",
394
+ "- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding). For this layer you need to define the appropriate `input_dim` and `output_dim` and let it know that you are using '0' as padding, which can be done by using the appropriate value for the `mask_zero` parameter.\n",
395
+ " \n",
396
+ "+ [Bidirectional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional) [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM). In TF you can implement bidirectional behaviour for RNN-like layers. This part is already taken care of but you will need to specify the appropriate type of layer as well as its parameters. In particular you need to set the appropriate number of units and make sure that the LSTM returns the full sequence and not only the last output, which can be done by using the appropriate value for the `return_sequences` parameter.\n",
397
+ "\n",
398
+ "\n",
399
+ "You need to define the forward pass using the syntax of TF's [functional API](https://www.tensorflow.org/guide/keras/functional_api). What this means is that you chain function calls together to define your network like this:\n",
400
+ "\n",
401
+ "```python\n",
402
+ "encoder_input = keras.Input(shape=(28, 28, 1), name=\"original_img\")\n",
403
+ "x = layers.Conv2D(16, 3, activation=\"relu\")(encoder_input)\n",
404
+ "x = layers.MaxPooling2D(3)(x)\n",
405
+ "x = layers.Conv2D(16, 3, activation=\"relu\")(x)\n",
406
+ "encoder_output = layers.GlobalMaxPooling2D()(x)\n",
407
+ "```"
408
+ ]
409
+ },
410
+ {
411
+ "cell_type": "code",
412
+ "execution_count": null,
413
+ "id": "b1db0a1d",
414
+ "metadata": {
415
+ "deletable": false,
416
+ "tags": [
417
+ "graded"
418
+ ]
419
+ },
420
+ "outputs": [],
421
+ "source": [
422
+ "# GRADED CLASS: Encoder\n",
423
+ "class Encoder(tf.keras.layers.Layer):\n",
424
+ " def __init__(self, vocab_size, units):\n",
425
+ " \"\"\"Initializes an instance of this class\n",
426
+ "\n",
427
+ " Args:\n",
428
+ " vocab_size (int): Size of the vocabulary\n",
429
+ " units (int): Number of units in the LSTM layer\n",
430
+ " \"\"\"\n",
431
+ " super(Encoder, self).__init__()\n",
432
+ "\n",
433
+ " ### START CODE HERE ###\n",
434
+ "\n",
435
+ " self.embedding = tf.keras.layers.Embedding( \n",
436
+ " input_dim=None,\n",
437
+ " output_dim=None,\n",
438
+ " mask_zero=None\n",
439
+ " ) \n",
440
+ "\n",
441
+ " self.rnn = tf.keras.layers.Bidirectional( \n",
442
+ " merge_mode=\"sum\", \n",
443
+ " layer=tf.keras.layers.None(\n",
444
+ " units=None,\n",
445
+ " return_sequences=None\n",
446
+ " ), \n",
447
+ " ) \n",
448
+ "\n",
449
+ " ### END CODE HERE ###\n",
450
+ "\n",
451
+ " def call(self, context):\n",
452
+ " \"\"\"Forward pass of this layer\n",
453
+ "\n",
454
+ " Args:\n",
455
+ " context (tf.Tensor): The sentence to translate\n",
456
+ "\n",
457
+ " Returns:\n",
458
+ " tf.Tensor: Encoded sentence to translate\n",
459
+ " \"\"\"\n",
460
+ "\n",
461
+ " ### START CODE HERE ###\n",
462
+ "\n",
463
+ " # Pass the context through the embedding layer\n",
464
+ " x = None\n",
465
+ "\n",
466
+ " # Pass the output of the embedding through the RNN\n",
467
+ " x = None\n",
468
+ "\n",
469
+ " ### END CODE HERE ###\n",
470
+ "\n",
471
+ " return x"
472
+ ]
473
+ },
474
+ {
475
+ "cell_type": "code",
476
+ "execution_count": null,
477
+ "id": "65034ffd",
478
+ "metadata": {
479
+ "deletable": false,
480
+ "editable": false,
481
+ "tags": [
482
+ "graded"
483
+ ]
484
+ },
485
+ "outputs": [],
486
+ "source": [
487
+ "# Do a quick check of your implementation\n",
488
+ "\n",
489
+ "# Create an instance of your class\n",
490
+ "encoder = Encoder(VOCAB_SIZE, UNITS)\n",
491
+ "\n",
492
+ "# Pass a batch of sentences to translate from english to portuguese\n",
493
+ "encoder_output = encoder(to_translate)\n",
494
+ "\n",
495
+ "print(f'Tensor of sentences in english has shape: {to_translate.shape}\\n')\n",
496
+ "print(f'Encoder output has shape: {encoder_output.shape}')"
497
+ ]
498
+ },
499
+ {
500
+ "cell_type": "markdown",
501
+ "id": "a909aea1",
502
+ "metadata": {},
503
+ "source": [
504
+ "##### __Expected Output__\n",
505
+ "\n",
506
+ "```\n",
507
+ "Tensor of sentences in english has shape: (64, 14)\n",
508
+ "\n",
509
+ "Encoder output has shape: (64, 14, 256)\n",
510
+ "```"
511
+ ]
512
+ },
513
+ {
514
+ "cell_type": "code",
515
+ "execution_count": null,
516
+ "id": "3031bb14",
517
+ "metadata": {
518
+ "deletable": false,
519
+ "editable": false,
520
+ "tags": []
521
+ },
522
+ "outputs": [],
523
+ "source": [
524
+ "# Test your code!\n",
525
+ "\n",
526
+ "w1_unittest.test_encoder(Encoder)"
527
+ ]
528
+ },
529
+ {
530
+ "cell_type": "markdown",
531
+ "id": "1afe83f4",
532
+ "metadata": {},
533
+ "source": [
534
+ "<a name=\"ex2\"></a>\n",
535
+ "## Exercise 2 - CrossAttention\n",
536
+ "\n",
537
+ "Your next exercise is to code the layer that will perform cross attention between the original sentences and the translations. For this, complete the `CrossAttention` class below. Notice that in the constructor (the `__init__` method) you need to define all of the sublayers and then use these sublayers during the forward pass (the `call` method). For this particular case some of these bits are already taken care of.\n",
538
+ "\n",
539
+ "The cross attention consists of the following layers:\n",
540
+ "\n",
541
+ "- [MultiHeadAttention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention). For this layer you need to define the appropriate `key_dim`, which is the size of the key and query tensors. You will also need to set the number of heads to 1 since you aren't implementing multi head attention but attention between two tensors. The reason why this layer is preferred over [Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention) is that it allows simpler code during the forward pass.\n",
542
+ " \n",
543
+ "A couple of things to notice:\n",
544
+ "- You need a way to pass both the output of the attention alongside the shifted-to-the-right translation (since this cross attention happens in the decoder side). For this you will use an [Add](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add) layer so that the original dimension is preserved, which would not happen if you use something like a [Concatenate](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate) layer.\n",
545
+ "\n",
546
+ "+ Layer normalization is also performed for better stability of the network by using a [LayerNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization) layer.\n",
547
+ "\n",
548
+ "- You don't need to worry about these last steps as these are already solved.\n",
549
+ "\n"
550
+ ]
551
+ },
552
+ {
553
+ "cell_type": "code",
554
+ "execution_count": null,
555
+ "id": "74e71f3d",
556
+ "metadata": {
557
+ "deletable": false,
558
+ "tags": [
559
+ "graded"
560
+ ]
561
+ },
562
+ "outputs": [],
563
+ "source": [
564
+ "# GRADED CLASS: CrossAttention\n",
565
+ "class CrossAttention(tf.keras.layers.Layer):\n",
566
+ " def __init__(self, units):\n",
567
+ " \"\"\"Initializes an instance of this class\n",
568
+ "\n",
569
+ " Args:\n",
570
+ " units (int): Number of units in the LSTM layer\n",
571
+ " \"\"\"\n",
572
+ " super().__init__()\n",
573
+ "\n",
574
+ " ### START CODE HERE ###\n",
575
+ "\n",
576
+ " self.mha = ( \n",
577
+ " tf.keras.layers.None(\n",
578
+ " key_dim=None,\n",
579
+ " num_heads=None\n",
580
+ " ) \n",
581
+ " ) \n",
582
+ "\n",
583
+ " ### END CODE HERE ###\n",
584
+ "\n",
585
+ " self.layernorm = tf.keras.layers.LayerNormalization()\n",
586
+ " self.add = tf.keras.layers.Add()\n",
587
+ "\n",
588
+ " def call(self, context, target):\n",
589
+ " \"\"\"Forward pass of this layer\n",
590
+ "\n",
591
+ " Args:\n",
592
+ " context (tf.Tensor): Encoded sentence to translate\n",
593
+ " target (tf.Tensor): The embedded shifted-to-the-right translation\n",
594
+ "\n",
595
+ " Returns:\n",
596
+ " tf.Tensor: Cross attention between context and target\n",
597
+ " \"\"\"\n",
598
+ " ### START CODE HERE ###\n",
599
+ "\n",
600
+ " # Call the MH attention by passing in the query and value\n",
601
+ " # For this case the query should be the translation and the value the encoded sentence to translate\n",
602
+ " # Hint: Check the call arguments of MultiHeadAttention in the docs\n",
603
+ " attn_output = None(\n",
604
+ " query=None,\n",
605
+ " value=None\n",
606
+ " ) \n",
607
+ "\n",
608
+ " ### END CODE HERE ###\n",
609
+ "\n",
610
+ " x = self.add([target, attn_output])\n",
611
+ "\n",
612
+ " x = self.layernorm(x)\n",
613
+ "\n",
614
+ " return x"
615
+ ]
616
+ },
617
+ {
618
+ "cell_type": "code",
619
+ "execution_count": null,
620
+ "id": "4c62796f",
621
+ "metadata": {
622
+ "deletable": false,
623
+ "editable": false,
624
+ "tags": [
625
+ "graded"
626
+ ]
627
+ },
628
+ "outputs": [],
629
+ "source": [
630
+ "# Do a quick check of your implementation\n",
631
+ "\n",
632
+ "# Create an instance of your class\n",
633
+ "attention_layer = CrossAttention(UNITS)\n",
634
+ "\n",
635
+ "# The attention layer expects the embedded sr-translation and the context\n",
636
+ "# The context (encoder_output) is already embedded so you need to do this for sr_translation:\n",
637
+ "sr_translation_embed = tf.keras.layers.Embedding(VOCAB_SIZE, output_dim=UNITS, mask_zero=True)(sr_translation)\n",
638
+ "\n",
639
+ "# Compute the cross attention\n",
640
+ "attention_result = attention_layer(encoder_output, sr_translation_embed)\n",
641
+ "\n",
642
+ "print(f'Tensor of contexts has shape: {encoder_output.shape}')\n",
643
+ "print(f'Tensor of translations has shape: {sr_translation_embed.shape}')\n",
644
+ "print(f'Tensor of attention scores has shape: {attention_result.shape}')"
645
+ ]
646
+ },
647
+ {
648
+ "cell_type": "markdown",
649
+ "id": "41d4f99a",
650
+ "metadata": {},
651
+ "source": [
652
+ "##### __Expected Output__\n",
653
+ "\n",
654
+ "```\n",
655
+ "Tensor of contexts has shape: (64, 14, 256)\n",
656
+ "Tensor of translations has shape: (64, 15, 256)\n",
657
+ "Tensor of attention scores has shape: (64, 15, 256)\n",
658
+ "```"
659
+ ]
660
+ },
661
+ {
662
+ "cell_type": "code",
663
+ "execution_count": null,
664
+ "id": "4f658975",
665
+ "metadata": {
666
+ "deletable": false,
667
+ "editable": false,
668
+ "tags": []
669
+ },
670
+ "outputs": [],
671
+ "source": [
672
+ "# Test your code!\n",
673
+ "\n",
674
+ "w1_unittest.test_cross_attention(CrossAttention)"
675
+ ]
676
+ },
677
+ {
678
+ "cell_type": "markdown",
679
+ "id": "aa296ee2",
680
+ "metadata": {},
681
+ "source": [
682
+ "<a name=\"ex3\"></a>\n",
683
+ "## Exercise 3 - Decoder\n",
684
+ "\n",
685
+ "\n",
686
+ "Now you will implement the decoder part of the neural network by completing the `Decoder` class below. Notice that in the constructor (the `__init__` method) you need to define all of the sublayers of the decoder and then use these sublayers during the forward pass (the `call` method).\n",
687
+ "\n",
688
+ "The decoder consists of the following layers:\n",
689
+ "\n",
690
+ "- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding). For this layer you need to define the appropriate `input_dim` and `output_dim` and let it know that you are using '0' as padding, which can be done by using the appropriate value for the `mask_zero` parameter.\n",
691
+ " \n",
692
+ " \n",
693
+ "+ Pre-attention [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM). Unlike in the encoder in which you used a Bidirectional LSTM, here you will use a vanilla LSTM. Don't forget to set the appropriate number of units and make sure that the LSTM returns the full sequence and not only the last output, which can be done by using the appropriate value for the `return_sequences` parameter. It is very important that this layer returns the state since this will be needed for inference so make sure to set the `return_state` parameter accordingly. Notice that LSTM layers return state as a tuple of two tensors called `memory_state` and `carry_state`, **however these names have been changed to better reflect what you have seen in the lectures to `hidden_state` and `cell_state` respectively**.\n",
694
+ "\n",
695
+ "- The attention layer that performs cross attention between the sentence to translate and the right-shifted translation. Here you need to use the `CrossAttention` layer you defined in the previous exercise.\n",
696
+ "\n",
697
+ "+ Post-attention [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM). Another LSTM layer. For this one you don't need it to return the state.\n",
698
+ "\n",
699
+ "- Finally a [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer. This one should have the same number of units as the size of the vocabulary since you expect it to compute the logits for every possible word in the vocabulary. Make sure to use a `logsoftmax` activation function for this one, which you can get as [tf.nn.log_softmax](https://www.tensorflow.org/api_docs/python/tf/nn/log_softmax).\n",
700
+ "\n"
701
+ ]
702
+ },
703
+ {
704
+ "cell_type": "code",
705
+ "execution_count": null,
706
+ "id": "e9639bdb",
707
+ "metadata": {
708
+ "deletable": false,
709
+ "tags": [
710
+ "graded"
711
+ ]
712
+ },
713
+ "outputs": [],
714
+ "source": [
715
+ "# GRADED CLASS: Decoder\n",
716
+ "class Decoder(tf.keras.layers.Layer):\n",
717
+ " def __init__(self, vocab_size, units):\n",
718
+ " \"\"\"Initializes an instance of this class\n",
719
+ "\n",
720
+ " Args:\n",
721
+ " vocab_size (int): Size of the vocabulary\n",
722
+ " units (int): Number of units in the LSTM layer\n",
723
+ " \"\"\"\n",
724
+ " super(Decoder, self).__init__()\n",
725
+ "\n",
726
+ " ### START CODE HERE ###\n",
727
+ "\n",
728
+ " # The embedding layer\n",
729
+ " self.embedding = tf.keras.layers.None(\n",
730
+ " input_dim=None,\n",
731
+ " output_dim=None,\n",
732
+ " mask_zero=None\n",
733
+ " ) \n",
734
+ "\n",
735
+ " # The RNN before attention\n",
736
+ " self.pre_attention_rnn = tf.keras.layers.None(\n",
737
+ " units=None,\n",
738
+ " return_sequences=None,\n",
739
+ " return_state=None\n",
740
+ " ) \n",
741
+ "\n",
742
+ " # The attention layer\n",
743
+ " self.attention = None(None)\n",
744
+ "\n",
745
+ " # The RNN after attention\n",
746
+ " self.post_attention_rnn = tf.keras.layers.None(\n",
747
+ " units=None,\n",
748
+ " return_sequences=None\n",
749
+ " ) \n",
750
+ "\n",
751
+ " # The dense layer with logsoftmax activation\n",
752
+ " self.output_layer = tf.keras.layers.None(\n",
753
+ " units=None,\n",
754
+ " activation=None\n",
755
+ " ) \n",
756
+ "\n",
757
+ " ### END CODE HERE ###\n",
758
+ "\n",
759
+ " def call(self, context, target, state=None, return_state=False):\n",
760
+ " \"\"\"Forward pass of this layer\n",
761
+ "\n",
762
+ " Args:\n",
763
+ " context (tf.Tensor): Encoded sentence to translate\n",
764
+ " target (tf.Tensor): The shifted-to-the-right translation\n",
765
+ " state (list[tf.Tensor, tf.Tensor], optional): Hidden state of the pre-attention LSTM. Defaults to None.\n",
766
+ " return_state (bool, optional): If set to true return the hidden states of the LSTM. Defaults to False.\n",
767
+ "\n",
768
+ " Returns:\n",
769
+ " tf.Tensor: The log_softmax probabilities of predicting a particular token\n",
770
+ " \"\"\"\n",
771
+ " ### START CODE HERE ###\n",
772
+ "\n",
773
+ " # Get the embedding of the input\n",
774
+ " x = self.None(None)\n",
775
+ "\n",
776
+ " # Pass the embedded input into the pre attention LSTM\n",
777
+ " # Hints:\n",
778
+ " # - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)\n",
779
+ " # - Pass in the state to the LSTM (needed for inference)\n",
780
+ " x, hidden_state, cell_state = self.None(None, initial_state=None)\n",
781
+ "\n",
782
+ " # Perform cross attention between the context and the output of the LSTM (in that order)\n",
783
+ " x = self.None(None, None)\n",
784
+ "\n",
785
+ " # Do a pass through the post attention LSTM\n",
786
+ " x = self.None(None)\n",
787
+ "\n",
788
+ " # Compute the logits\n",
789
+ " logits = self.None(None)\n",
790
+ "\n",
791
+ " ### END CODE HERE ###\n",
792
+ "\n",
793
+ " if return_state:\n",
794
+ " return logits, [hidden_state, cell_state]\n",
795
+ "\n",
796
+ " return logits"
797
+ ]
798
+ },
799
+ {
800
+ "cell_type": "code",
801
+ "execution_count": null,
802
+ "id": "f6165cf2",
803
+ "metadata": {
804
+ "deletable": false,
805
+ "editable": false,
806
+ "tags": [
807
+ "graded"
808
+ ]
809
+ },
810
+ "outputs": [],
811
+ "source": [
812
+ "# Do a quick check of your implementation\n",
813
+ "\n",
814
+ "# Create an instance of your class\n",
815
+ "decoder = Decoder(VOCAB_SIZE, UNITS)\n",
816
+ "\n",
817
+ "# Notice that you don't need the embedded version of sr_translation since this is done inside the class\n",
818
+ "logits = decoder(encoder_output, sr_translation)\n",
819
+ "\n",
820
+ "print(f'Tensor of contexts has shape: {encoder_output.shape}')\n",
821
+ "print(f'Tensor of right-shifted translations has shape: {sr_translation.shape}')\n",
822
+ "print(f'Tensor of logits has shape: {logits.shape}')"
823
+ ]
824
+ },
825
+ {
826
+ "cell_type": "markdown",
827
+ "id": "6f2b5d7d",
828
+ "metadata": {},
829
+ "source": [
830
+ "##### __Expected Output__\n",
831
+ "\n",
832
+ "```\n",
833
+ "Tensor of contexts has shape: (64, 14, 256)\n",
834
+ "Tensor of right-shifted translations has shape: (64, 15)\n",
835
+ "Tensor of logits has shape: (64, 15, 12000)\n",
836
+ "```"
837
+ ]
838
+ },
839
+ {
840
+ "cell_type": "code",
841
+ "execution_count": null,
842
+ "id": "1b61093a",
843
+ "metadata": {
844
+ "deletable": false,
845
+ "editable": false,
846
+ "tags": []
847
+ },
848
+ "outputs": [],
849
+ "source": [
850
+ "# Test your code!\n",
851
+ "\n",
852
+ "w1_unittest.test_decoder(Decoder, CrossAttention)"
853
+ ]
854
+ },
855
+ {
856
+ "cell_type": "markdown",
857
+ "id": "9dcce3a7",
858
+ "metadata": {},
859
+ "source": [
860
+ "<a name=\"ex4\"></a>\n",
861
+ "## Exercise 4 - Translator\n",
862
+ "\n",
863
+ "Now you have to put together all of the layers you previously coded into an actual model. For this, complete the `Translator` class below. Notice how unlike the Encoder and Decoder classes inherited from `tf.keras.layers.Layer`, the Translator class inherits from `tf.keras.Model`.\n",
864
+ "\n",
865
+ "Remember that `train_data` will yield a tuple with the sentence to translate and the shifted-to-the-right translation, which are the \"features\" of the model. This means that the inputs of your network will be tuples containing context and targets."
866
+ ]
867
+ },
868
+ {
869
+ "cell_type": "code",
870
+ "execution_count": null,
871
+ "id": "205fcf31",
872
+ "metadata": {
873
+ "deletable": false,
874
+ "tags": [
875
+ "graded"
876
+ ]
877
+ },
878
+ "outputs": [],
879
+ "source": [
880
+ "# GRADED CLASS: Translator\n",
881
+ "class Translator(tf.keras.Model):\n",
882
+ " def __init__(self, vocab_size, units):\n",
883
+ " \"\"\"Initializes an instance of this class\n",
884
+ "\n",
885
+ " Args:\n",
886
+ " vocab_size (int): Size of the vocabulary\n",
887
+ " units (int): Number of units in the LSTM layer\n",
888
+ " \"\"\"\n",
889
+ " super().__init__()\n",
890
+ "\n",
891
+ " ### START CODE HERE ###\n",
892
+ "\n",
893
+ " # Define the encoder with the appropriate vocab_size and number of units\n",
894
+ " self.encoder = None\n",
895
+ "\n",
896
+ " # Define the decoder with the appropriate vocab_size and number of units\n",
897
+ " self.decoder = None\n",
898
+ "\n",
899
+ " ### END CODE HERE ###\n",
900
+ "\n",
901
+ " def call(self, inputs):\n",
902
+ " \"\"\"Forward pass of this layer\n",
903
+ "\n",
904
+ " Args:\n",
905
+ " inputs (tuple(tf.Tensor, tf.Tensor)): Tuple containing the context (sentence to translate) and the target (shifted-to-the-right translation)\n",
906
+ "\n",
907
+ " Returns:\n",
908
+ " tf.Tensor: The log_softmax probabilities of predicting a particular token\n",
909
+ " \"\"\"\n",
910
+ "\n",
911
+ " ### START CODE HERE ###\n",
912
+ "\n",
913
+ " # In this case inputs is a tuple consisting of the context and the target, unpack it into single variables\n",
914
+ " context, target = None\n",
915
+ "\n",
916
+ " # Pass the context through the encoder\n",
917
+ " encoded_context = None\n",
918
+ "\n",
919
+ " # Compute the logits by passing the encoded context and the target to the decoder\n",
920
+ " logits = None\n",
921
+ "\n",
922
+ " ### END CODE HERE ###\n",
923
+ "\n",
924
+ " return logits"
925
+ ]
926
+ },
927
+ {
928
+ "cell_type": "code",
929
+ "execution_count": null,
930
+ "id": "4d4a231c",
931
+ "metadata": {
932
+ "deletable": false,
933
+ "editable": false,
934
+ "tags": [
935
+ "graded"
936
+ ]
937
+ },
938
+ "outputs": [],
939
+ "source": [
940
+ "# Do a quick check of your implementation\n",
941
+ "\n",
942
+ "# Create an instance of your class\n",
943
+ "translator = Translator(VOCAB_SIZE, UNITS)\n",
944
+ "\n",
945
+ "# Compute the logits for every word in the vocabulary\n",
946
+ "logits = translator((to_translate, sr_translation))\n",
947
+ "\n",
948
+ "print(f'Tensor of sentences to translate has shape: {to_translate.shape}')\n",
949
+ "print(f'Tensor of right-shifted translations has shape: {sr_translation.shape}')\n",
950
+ "print(f'Tensor of logits has shape: {logits.shape}')"
951
+ ]
952
+ },
953
+ {
954
+ "cell_type": "markdown",
955
+ "id": "e3a162dd",
956
+ "metadata": {},
957
+ "source": [
958
+ "##### __Expected Output__\n",
959
+ "\n",
960
+ "```\n",
961
+ "Tensor of sentences to translate has shape: (64, 14)\n",
962
+ "Tensor of right-shifted translations has shape: (64, 15)\n",
963
+ "Tensor of logits has shape: (64, 15, 12000)\n",
964
+ "```"
965
+ ]
966
+ },
967
+ {
968
+ "cell_type": "code",
969
+ "execution_count": null,
970
+ "id": "37009022",
971
+ "metadata": {
972
+ "deletable": false,
973
+ "editable": false,
974
+ "tags": []
975
+ },
976
+ "outputs": [],
977
+ "source": [
978
+ "w1_unittest.test_translator(Translator, Encoder, Decoder)"
979
+ ]
980
+ },
981
+ {
982
+ "cell_type": "markdown",
983
+ "id": "f81bc228",
984
+ "metadata": {},
985
+ "source": [
986
+ "<a name=\"3\"></a>\n",
987
+ "## 3. Training\n",
988
+ "\n",
989
+ "Now that you have an untrained instance of the NMT model, it is time to train it. You can use the `compile_and_train` function below to achieve this:"
990
+ ]
991
+ },
992
+ {
993
+ "cell_type": "code",
994
+ "execution_count": null,
995
+ "id": "8a61ef65",
996
+ "metadata": {
997
+ "deletable": false,
998
+ "editable": false,
999
+ "tags": [
1000
+ "graded"
1001
+ ]
1002
+ },
1003
+ "outputs": [],
1004
+ "source": [
1005
+ "def compile_and_train(model, epochs=20, steps_per_epoch=500):\n",
1006
+ " model.compile(optimizer=\"adam\", loss=masked_loss, metrics=[masked_acc, masked_loss])\n",
1007
+ "\n",
1008
+ " history = model.fit(\n",
1009
+ " train_data.repeat(),\n",
1010
+ " epochs=epochs,\n",
1011
+ " steps_per_epoch=steps_per_epoch,\n",
1012
+ " validation_data=val_data,\n",
1013
+ " validation_steps=50,\n",
1014
+ " callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],\n",
1015
+ " )\n",
1016
+ "\n",
1017
+ " return model, history"
1018
+ ]
1019
+ },
1020
+ {
1021
+ "cell_type": "code",
1022
+ "execution_count": null,
1023
+ "id": "87d9bf9f",
1024
+ "metadata": {
1025
+ "deletable": false,
1026
+ "editable": false,
1027
+ "tags": []
1028
+ },
1029
+ "outputs": [],
1030
+ "source": [
1031
+ "# Train the translator (this takes some minutes so feel free to take a break)\n",
1032
+ "\n",
1033
+ "trained_translator, history = compile_and_train(translator)"
1034
+ ]
1035
+ },
1036
+ {
1037
+ "cell_type": "markdown",
1038
+ "id": "d23b9301",
1039
+ "metadata": {},
1040
+ "source": [
1041
+ "<a name=\"4\"></a>\n",
1042
+ "## 4. Using the model for inference \n",
1043
+ "\n",
1044
+ "\n",
1045
+ "Now that your model is trained you can use it for inference. To help you with this the `generate_next_token` function is provided. Notice that this function is meant to be used inside a for-loop, so you feed to it the information of the previous step to generate the information of the next step. In particular you need to keep track of the state of the pre-attention LSTM in the decoder and if you are done with the translation. Also notice that a `temperature` variable is introduced which determines how to select the next token given the predicted logits: "
1046
+ ]
1047
+ },
1048
+ {
1049
+ "cell_type": "code",
1050
+ "execution_count": null,
1051
+ "id": "522f6b6f",
1052
+ "metadata": {
1053
+ "deletable": false,
1054
+ "editable": false,
1055
+ "tags": [
1056
+ "graded"
1057
+ ]
1058
+ },
1059
+ "outputs": [],
1060
+ "source": [
1061
+ "def generate_next_token(decoder, context, next_token, done, state, temperature=0.0):\n",
1062
+ " \"\"\"Generates the next token in the sequence\n",
1063
+ "\n",
1064
+ " Args:\n",
1065
+ " decoder (Decoder): The decoder\n",
1066
+ " context (tf.Tensor): Encoded sentence to translate\n",
1067
+ " next_token (tf.Tensor): The predicted next token\n",
1068
+ " done (bool): True if the translation is complete\n",
1069
+ " state (list[tf.Tensor, tf.Tensor]): Hidden states of the pre-attention LSTM layer\n",
1070
+ " temperature (float, optional): The temperature that controls the randomness of the predicted tokens. Defaults to 0.0.\n",
1071
+ "\n",
1072
+ " Returns:\n",
1073
+ " tuple(tf.Tensor, np.float, list[tf.Tensor, tf.Tensor], bool): The next token, log prob of said token, hidden state of LSTM and if translation is done\n",
1074
+ " \"\"\"\n",
1075
+ " # Get the logits and state from the decoder\n",
1076
+ " logits, state = decoder(context, next_token, state=state, return_state=True)\n",
1077
+ " \n",
1078
+ " # Trim the intermediate dimension \n",
1079
+ " logits = logits[:, -1, :]\n",
1080
+ " \n",
1081
+ " # If temp is 0 then next_token is the argmax of logits\n",
1082
+ " if temperature == 0.0:\n",
1083
+ " next_token = tf.argmax(logits, axis=-1)\n",
1084
+ " \n",
1085
+ " # If temp is not 0 then next_token is sampled out of logits\n",
1086
+ " else:\n",
1087
+ " logits = logits / temperature\n",
1088
+ " next_token = tf.random.categorical(logits, num_samples=1)\n",
1089
+ " \n",
1090
+ " # Trim dimensions of size 1\n",
1091
+ " logits = tf.squeeze(logits)\n",
1092
+ " next_token = tf.squeeze(next_token)\n",
1093
+ " \n",
1094
+ " # Get the logit of the selected next_token\n",
1095
+ " logit = logits[next_token].numpy()\n",
1096
+ " \n",
1097
+ " # Reshape to (1,1) since this is the expected shape for text encoded as TF tensors\n",
1098
+ " next_token = tf.reshape(next_token, shape=(1,1))\n",
1099
+ " \n",
1100
+ " # If next_token is End-of-Sentence token you are done\n",
1101
+ " if next_token == eos_id:\n",
1102
+ " done = True\n",
1103
+ " \n",
1104
+ " return next_token, logit, state, done"
1105
+ ]
1106
+ },
1107
+ {
1108
+ "cell_type": "markdown",
1109
+ "id": "190d2d76",
1110
+ "metadata": {},
1111
+ "source": [
1112
+ "See how it works by running the following cell:"
1113
+ ]
1114
+ },
1115
+ {
1116
+ "cell_type": "code",
1117
+ "execution_count": null,
1118
+ "id": "9937547a",
1119
+ "metadata": {
1120
+ "deletable": false,
1121
+ "editable": false,
1122
+ "tags": [
1123
+ "graded"
1124
+ ]
1125
+ },
1126
+ "outputs": [],
1127
+ "source": [
1128
+ "# PROCESS SENTENCE TO TRANSLATE AND ENCODE\n",
1129
+ "\n",
1130
+ "# A sentence you wish to translate\n",
1131
+ "eng_sentence = \"I love languages\"\n",
1132
+ "\n",
1133
+ "# Convert it to a tensor\n",
1134
+ "texts = tf.convert_to_tensor(eng_sentence)[tf.newaxis]\n",
1135
+ "\n",
1136
+ "# Vectorize it and pass it through the encoder\n",
1137
+ "context = english_vectorizer(texts).to_tensor()\n",
1138
+ "context = encoder(context)\n",
1139
+ "\n",
1140
+ "# SET STATE OF THE DECODER\n",
1141
+ "\n",
1142
+ "# Next token is Start-of-Sentence since you are starting fresh\n",
1143
+ "next_token = tf.fill((1,1), sos_id)\n",
1144
+ "\n",
1145
+ "# Hidden and Cell states of the LSTM can be mocked using uniform samples\n",
1146
+ "state = [tf.random.uniform((1, UNITS)), tf.random.uniform((1, UNITS))]\n",
1147
+ "\n",
1148
+ "# You are not done until next token is EOS token\n",
1149
+ "done = False\n",
1150
+ "\n",
1151
+ "# Generate next token\n",
1152
+ "next_token, logit, state, done = generate_next_token(decoder, context, next_token, done, state, temperature=0.5)\n",
1153
+ "print(f\"Next token: {next_token}\\nLogit: {logit:.4f}\\nDone? {done}\")"
1154
+ ]
1155
+ },
1156
+ {
1157
+ "cell_type": "markdown",
1158
+ "id": "170323dd",
1159
+ "metadata": {},
1160
+ "source": [
1161
+ "<a name=\"ex5\"></a>\n",
1162
+ "## Exercise 5 - translate\n",
1163
+ "\n",
1164
+ "Now you can put everything together to translate a given sentence. For this, complete the `translate` function below. This function will take care of the following steps: \n",
1165
+ "- Process the sentence to translate and encode it\n",
1166
+ "\n",
1167
+ "+ Set the initial state of the decoder\n",
1168
+ "\n",
1169
+ "- Get predictions of the next token (starting with the \\<SOS> token) for a maximum of iterations (in case the \\<EOS> token is never returned)\n",
1170
+ " \n",
1171
+ "+ Return the translated text (as a string), the logit of the last iteration (this helps measure how certain was that the sequence was translated in its totality) and the translation in token format.\n",
1172
+ "\n",
1173
+ "\n",
1174
+ "Hints: \n",
1175
+ "\n",
1176
+ "- The previous cell provides a lot of insights on how this function should work, so if you get stuck refer to it.\n",
1177
+ "\n",
1178
+ "+ Some useful docs:\n",
1179
+ " + [tf.newaxis](https://www.tensorflow.org/api_docs/python/tf#newaxis)\n",
1180
+ "\n",
1181
+ " - [tf.fill](https://www.tensorflow.org/api_docs/python/tf/fill)\n",
1182
+ "\n",
1183
+ " + [tf.zeros](https://www.tensorflow.org/api_docs/python/tf/zeros)\n",
1184
+ "\n",
1185
+ "\n",
1186
+ "**IMPORTANT NOTE**: Due to randomness processes involving tensorflow training and weight initializing, the results below may vary a lot, even if you retrain your model in the same session. \n"
1187
+ ]
1188
+ },
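Before filling in the graded function, it may help to see the generic autoregressive decoding pattern in isolation. The sketch below is **not** the assignment solution: `fake_step` and the token ids are hypothetical stand-ins for `generate_next_token`, but the loop structure (start from SOS, iterate up to a maximum length, break on EOS, collect tokens along the way) is the same one `translate` needs.

```python
SOS, EOS = 1, 2                   # hypothetical special-token ids
scripted = iter([5, 7, 9, EOS])   # stand-in for the model's predictions

def fake_step(prev_token):
    # Stand-in for generate_next_token: returns (next_token, done)
    token = next(scripted)
    return token, token == EOS

tokens, next_token, done = [], SOS, False
for _ in range(50):               # max_length guard if EOS never appears
    next_token, done = fake_step(next_token)
    if done:                      # stop as soon as EOS is drawn
        break
    tokens.append(next_token)

print(tokens)                     # [5, 7, 9]
```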
1189
+ {
1190
+ "cell_type": "code",
1191
+ "execution_count": null,
1192
+ "id": "42c74f1f",
1193
+ "metadata": {
1194
+ "deletable": false,
1195
+ "tags": [
1196
+ "graded"
1197
+ ]
1198
+ },
1199
+ "outputs": [],
1200
+ "source": [
1201
+ "# GRADED FUNCTION: translate\n",
1202
+ "def translate(model, text, max_length=50, temperature=0.0):\n",
1203
+ " \"\"\"Translate a given sentence from English to Portuguese\n",
1204
+ "\n",
1205
+ " Args:\n",
1206
+ " model (tf.keras.Model): The trained translator\n",
1207
+ " text (string): The sentence to translate\n",
1208
+ " max_length (int, optional): The maximum length of the translation. Defaults to 50.\n",
1209
+ " temperature (float, optional): The temperature that controls the randomness of the predicted tokens. Defaults to 0.0.\n",
1210
+ "\n",
1211
+ " Returns:\n",
1212
+ " tuple(str, np.float, tf.Tensor): The translation, logit that predicted <EOS> token and the tokenized translation\n",
1213
+ " \"\"\"\n",
1214
+ " # Lists to save tokens and logits\n",
1215
+ " tokens, logits = [], []\n",
1216
+ "\n",
1217
+ " ### START CODE HERE ###\n",
1218
+ " \n",
1219
+ " # PROCESS THE SENTENCE TO TRANSLATE\n",
1220
+ " \n",
1221
+ " # Convert the original string into a tensor\n",
1222
+ " text = tf.None(None)[tf.None]\n",
1223
+ " \n",
1224
+ " # Vectorize the text using the correct vectorizer\n",
1225
+ " context = None(None).to_tensor()\n",
1226
+ " \n",
1227
+ " # Get the encoded context (pass the context through the encoder)\n",
1228
+ " # Hint: Remember you can get the encoder by using model.encoder\n",
1229
+ " context = None.None(None)\n",
1230
+ " \n",
1231
+ " # INITIAL STATE OF THE DECODER\n",
1232
+ " \n",
1233
+ " # First token should be SOS token with shape (1,1)\n",
1234
+ " next_token = tf.None((None, None), None)\n",
1235
+ " \n",
1236
+ " # Initial hidden and cell states should be tensors of zeros with shape (1, UNITS)\n",
1237
+ " state = [tf.None((None, None)), tf.None((None, None))]\n",
1238
+ " \n",
1239
+ " # You are done when you draw a EOS token as next token (initial state is False)\n",
1240
+ " done = None\n",
1241
+ "\n",
1242
+ " # Iterate for max_length iterations\n",
1243
+ " for None in None(None):\n",
1244
+ " # Generate the next token\n",
1245
+ " try:\n",
1246
+ " next_token, logit, state, done = None(\n",
1247
+ " decoder=None,\n",
1248
+ " context=None,\n",
1249
+ " next_token=None,\n",
1250
+ " done=None,\n",
1251
+ " state=None,\n",
1252
+ " temperature=None\n",
1253
+ " )\n",
1254
+ " except:\n",
1255
+ " raise Exception(\"Problem generating the next token\")\n",
1256
+ " \n",
1257
+ " # If done then break out of the loop\n",
1258
+ " if None:\n",
1259
+ " None\n",
1260
+ " \n",
1261
+ " # Add next_token to the list of tokens\n",
1262
+ " None.None(None)\n",
1263
+ " \n",
1264
+ " # Add logit to the list of logits\n",
1265
+ " None.None(None)\n",
1266
+ " \n",
1267
+ " ### END CODE HERE ###\n",
1268
+ " \n",
1269
+ " # Concatenate all tokens into a tensor\n",
1270
+ " tokens = tf.concat(tokens, axis=-1)\n",
1271
+ " \n",
1272
+ " # Convert the translated tokens into text\n",
1273
+ " translation = tf.squeeze(tokens_to_text(tokens, id_to_word))\n",
1274
+ " translation = translation.numpy().decode()\n",
1275
+ " \n",
1276
+ " return translation, logits[-1], tokens"
1277
+ ]
1278
+ },
1279
+ {
1280
+ "cell_type": "markdown",
1281
+ "id": "3525e8ba",
1282
+ "metadata": {},
1283
+ "source": [
1284
+ "Try your function with temperature of 0, which will yield a deterministic output and is equivalent to a greedy decoding:"
1285
+ ]
1286
+ },
1287
+ {
1288
+ "cell_type": "code",
1289
+ "execution_count": null,
1290
+ "id": "daaea8c5",
1291
+ "metadata": {
1292
+ "deletable": false,
1293
+ "editable": false,
1294
+ "tags": []
1295
+ },
1296
+ "outputs": [],
1297
+ "source": [
1298
+ "# Running this cell multiple times should return the same output since temp is 0\n",
1299
+ "\n",
1300
+ "temp = 0.0 \n",
1301
+ "original_sentence = \"I love languages\"\n",
1302
+ "\n",
1303
+ "translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)\n",
1304
+ "\n",
1305
+ "print(f\"Temperature: {temp}\\n\\nOriginal sentence: {original_sentence}\\nTranslation: {translation}\\nTranslation tokens:{tokens}\\nLogit: {logit:.3f}\")"
1306
+ ]
1307
+ },
1308
+ {
1309
+ "cell_type": "markdown",
1310
+ "id": "7d05129b",
1311
+ "metadata": {},
1312
+ "source": [
1313
+ "Try your function with temperature of 0.7 (stochastic output):"
1314
+ ]
1315
+ },
1316
+ {
1317
+ "cell_type": "code",
1318
+ "execution_count": null,
1319
+ "id": "0e0697db",
1320
+ "metadata": {
1321
+ "deletable": false,
1322
+ "editable": false,
1323
+ "tags": []
1324
+ },
1325
+ "outputs": [],
1326
+ "source": [
1327
+ "# Running this cell multiple times should return different outputs since temp is not 0\n",
1328
+ "# You can try different temperatures\n",
1329
+ "\n",
1330
+ "temp = 0.7\n",
1331
+ "original_sentence = \"I love languages\"\n",
1332
+ "\n",
1333
+ "translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)\n",
1334
+ "\n",
1335
+ "print(f\"Temperature: {temp}\\n\\nOriginal sentence: {original_sentence}\\nTranslation: {translation}\\nTranslation tokens:{tokens}\\nLogit: {logit:.3f}\")"
1336
+ ]
1337
+ },
1338
+ {
1339
+ "cell_type": "code",
1340
+ "execution_count": null,
1341
+ "id": "a3a9ea35",
1342
+ "metadata": {
1343
+ "deletable": false,
1344
+ "editable": false,
1345
+ "tags": []
1346
+ },
1347
+ "outputs": [],
1348
+ "source": [
1349
+ "w1_unittest.test_translate(translate, trained_translator)"
1350
+ ]
1351
+ },
1352
+ {
1353
+ "cell_type": "markdown",
1354
+ "id": "ba027524",
1355
+ "metadata": {},
1356
+ "source": [
1357
+ "<a name=\"5\"></a>\n",
1358
+ "## 5. Minimum Bayes-Risk Decoding\n",
1359
+ "\n",
1360
+ "As mentioned in the lectures, getting the most probable token at each step may not necessarily produce the best results. Another approach is to do Minimum Bayes Risk Decoding or MBR. The general steps to implement this are:\n",
1361
+ "\n",
1362
+ "- Take several random samples\n",
1363
+ "+ Score each sample against all other samples\n",
1364
+ "- Select the one with the highest score\n",
1365
+ "\n",
1366
+ "You will be building helper functions for these steps in the following sections.\n",
1367
+ "\n",
1368
+ "With the ability to generate different translations by setting different temperature values you can do what you saw in the lectures and generate a bunch of translations and then determine which one is the best candidate. You will now do this by using the provided `generate_samples` function. This function will return any desired number of candidate translations alongside the log-probability for each one:"
1369
+ ]
1370
+ },
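As a minimal sketch of the selection step (with made-up candidate strings and a toy word-overlap similarity, not the metrics you will implement below):

```python
samples = ["eu amo idiomas", "eu adoro idiomas", "gosto de idiomas"]

def toy_similarity(a, b):
    # Crude word-overlap score, just for this illustration
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

# Score each sample against all the others and keep the highest scorer
scores = [
    sum(toy_similarity(s, other) for j, other in enumerate(samples) if j != i)
    for i, s in enumerate(samples)
]
best = samples[scores.index(max(scores))]
print(best)  # the candidate that agrees most with the rest
```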
1371
+ {
1372
+ "cell_type": "code",
1373
+ "execution_count": null,
1374
+ "id": "62301cd5",
1375
+ "metadata": {
1376
+ "deletable": false,
1377
+ "editable": false,
1378
+ "tags": [
1379
+ "graded"
1380
+ ]
1381
+ },
1382
+ "outputs": [],
1383
+ "source": [
1384
+ "def generate_samples(model, text, n_samples=4, temperature=0.6):\n",
1385
+ " \n",
1386
+ " samples, log_probs = [], []\n",
1387
+ "\n",
1388
+ " # Iterate for n_samples iterations\n",
1389
+ " for _ in range(n_samples):\n",
1390
+ " \n",
1391
+ " # Save the logit and the translated tensor\n",
1392
+ " _, logp, sample = translate(model, text, temperature=temperature)\n",
1393
+ " \n",
1394
+ " # Save the translated tensors\n",
1395
+ " samples.append(np.squeeze(sample.numpy()).tolist())\n",
1396
+ " \n",
1397
+ " # Save the logits\n",
1398
+ " log_probs.append(logp)\n",
1399
+ " \n",
1400
+ " return samples, log_probs"
1401
+ ]
1402
+ },
1403
+ {
1404
+ "cell_type": "code",
1405
+ "execution_count": null,
1406
+ "id": "06bd792c",
1407
+ "metadata": {
1408
+ "deletable": false,
1409
+ "editable": false,
1410
+ "tags": []
1411
+ },
1412
+ "outputs": [],
1413
+ "source": [
1414
+ "samples, log_probs = generate_samples(trained_translator, 'I love languages')\n",
1415
+ "\n",
1416
+ "for s, l in zip(samples, log_probs):\n",
1417
+ " print(f\"Translated tensor: {s} has logit: {l:.3f}\")"
1418
+ ]
1419
+ },
1420
+ {
1421
+ "cell_type": "markdown",
1422
+ "id": "29b10677",
1423
+ "metadata": {},
1424
+ "source": [
1425
+ "## Comparing overlaps\n",
1426
+ "\n",
1427
+ "Now that you can generate multiple translations it is time to come up with a method to measure the goodness of each one. As you saw in the lectures, one way to achieve this is by comparing each sample against the others. \n",
1428
+ "\n",
1429
+ "There are several metrics you can use for this purpose, as shown in the lectures and you can try experimenting with any one of these. For this assignment, you will be calculating scores for **unigram overlaps**. \n",
1430
+ "\n",
1431
+ "One of these metrics is the widely used yet simple [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) which gets the intersection over union of two sets. The `jaccard_similarity` function returns this metric for any pair of candidate and reference translations:\n"
1432
+ ]
1433
+ },
1434
+ {
1435
+ "cell_type": "code",
1436
+ "execution_count": null,
1437
+ "id": "edb54a71",
1438
+ "metadata": {
1439
+ "deletable": false,
1440
+ "editable": false,
1441
+ "tags": [
1442
+ "graded"
1443
+ ]
1444
+ },
1445
+ "outputs": [],
1446
+ "source": [
1447
+ "def jaccard_similarity(candidate, reference):\n",
1448
+ " \n",
1449
+ " # Convert the lists to sets to get the unique tokens\n",
1450
+ " candidate_set = set(candidate)\n",
1451
+ " reference_set = set(reference)\n",
1452
+ " \n",
1453
+ " # Get the set of tokens common to both candidate and reference\n",
1454
+ " common_tokens = candidate_set.intersection(reference_set)\n",
1455
+ " \n",
1456
+ " # Get the set of all tokens found in either candidate or reference\n",
1457
+ " all_tokens = candidate_set.union(reference_set)\n",
1458
+ " \n",
1459
+ " # Compute the percentage of overlap (divide the number of common tokens by the number of all tokens)\n",
1460
+ " overlap = len(common_tokens) / len(all_tokens)\n",
1461
+ " \n",
1462
+ " return overlap"
1463
+ ]
1464
+ },
1465
+ {
1466
+ "cell_type": "code",
1467
+ "execution_count": null,
1468
+ "id": "fc3384bf",
1469
+ "metadata": {
1470
+ "deletable": false,
1471
+ "editable": false,
1472
+ "tags": [
1473
+ "graded"
1474
+ ]
1475
+ },
1476
+ "outputs": [],
1477
+ "source": [
1478
+ "l1 = [1, 2, 3]\n",
1479
+ "l2 = [1, 2, 3, 4]\n",
1480
+ "\n",
1481
+ "js = jaccard_similarity(l1, l2)\n",
1482
+ "\n",
1483
+ "print(f\"jaccard similarity between lists: {l1} and {l2} is {js:.3f}\")"
1484
+ ]
1485
+ },
1486
+ {
1487
+ "cell_type": "markdown",
1488
+ "id": "a6997662",
1489
+ "metadata": {},
1490
+ "source": [
1491
+ "##### __Expected Output__\n",
1492
+ "\n",
1493
+ "```\n",
1494
+ "jaccard similarity between tensors: [1, 2, 3] and [1, 2, 3, 4] is 0.750\n",
1495
+ "\n",
1496
+ "```"
1497
+ ]
1498
+ },
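This checks out by hand: the intersection of {1, 2, 3} and {1, 2, 3, 4} has 3 elements, their union has 4, and 3/4 = 0.750.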
1499
+ {
1500
+ "cell_type": "markdown",
1501
+ "id": "b2510e3d",
1502
+ "metadata": {},
1503
+ "source": [
1504
+ "<a name=\"ex6\"></a>\n",
1505
+ "## Exercise 6 - rouge1_similarity\n",
1506
+ "\n",
1507
+ "Jaccard similarity is good but a more commonly used metric in machine translation is the ROUGE score. For unigrams, this is called ROUGE-1 and as shown in the lectures, you can output the scores for both precision and recall when comparing two samples. To get the final score, you will want to compute the F1-score as given by:\n",
1508
+ "\n",
1509
+ "$$score = 2* \\frac{(precision * recall)}{(precision + recall)}$$\n",
1510
+ "\n",
1511
+ "For the implementation of the `rouge1_similarity` function you want to use the [Counter](https://docs.python.org/3/library/collections.html#collections.Counter) class from the Python standard library:"
1512
+ ]
1513
+ },
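To sanity-check an implementation by hand, consider the candidate [1, 2, 3] against the reference [1, 2, 3, 4]: every candidate token also appears in the reference, so the clipped overlap is 3, giving

$$precision = \frac{3}{3} = 1.0, \quad recall = \frac{3}{4} = 0.75, \quad score = 2 \times \frac{1.0 \times 0.75}{1.0 + 0.75} \approx 0.857$$

which matches the expected output of the test cell further below.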
1514
+ {
1515
+ "cell_type": "code",
1516
+ "execution_count": null,
1517
+ "id": "fb2e0a00",
1518
+ "metadata": {
1519
+ "deletable": false,
1520
+ "tags": [
1521
+ "graded"
1522
+ ]
1523
+ },
1524
+ "outputs": [],
1525
+ "source": [
1526
+ "# GRADED FUNCTION: rouge1_similarity\n",
1527
+ "def rouge1_similarity(candidate, reference):\n",
1528
+ " \"\"\"Computes the ROUGE 1 score between two token lists\n",
1529
+ "\n",
1530
+ " Args:\n",
1531
+ " candidate (list[int]): Tokenized candidate translation\n",
1532
+ " reference (list[int]): Tokenized reference translation\n",
1533
+ "\n",
1534
+ " Returns:\n",
1535
+ " float: Overlap between the two token lists\n",
1536
+ " \"\"\"\n",
1537
+ " ### START CODE HERE ###\n",
1538
+ " \n",
1539
+ " # Make a frequency table of the candidate and reference tokens\n",
1540
+ " # Hint: use the Counter class (already imported)\n",
1541
+ " candidate_word_counts = None\n",
1542
+ " reference_word_counts = None\n",
1543
+ " \n",
1544
+ " # Initialize overlap at 0\n",
1545
+ " overlap = None\n",
1546
+ " \n",
1547
+ " # Iterate over the tokens in the candidate frequency table\n",
1548
+ " # Hint: Counter is a subclass of dict and you can get the keys \n",
1549
+ " # out of a dict using the keys method like this: dict.keys()\n",
1550
+ " for token in None:\n",
1551
+ " \n",
1552
+ " # Get the count of the current token in the candidate frequency table\n",
1553
+ " # Hint: You can access the counts of a token as you would access values of a dictionary\n",
1554
+ " token_count_candidate = None\n",
1555
+ " \n",
1556
+ " # Get the count of the current token in the reference frequency table\n",
1557
+ " # Hint: You can access the counts of a token as you would access values of a dictionary\n",
1558
+ " token_count_reference = None\n",
1559
+ " \n",
1560
+ " # Update the overlap by getting the minimum between the two token counts above\n",
1561
+ " overlap += None\n",
1562
+ " \n",
1563
+ " # Compute the precision\n",
1564
+ " # Hint: precision = overlap / (number of tokens in candidate list) \n",
1565
+ " precision = None\n",
1566
+ " \n",
1567
+ " # Compute the recall\n",
1568
+ " # Hint: recall = overlap / (number of tokens in reference list) \n",
1569
+ " recall = None\n",
1570
+ " \n",
1571
+ " if precision + recall != 0:\n",
1572
+ " # Compute the Rouge1 Score\n",
1573
+ " # Hint: This is equivalent to the F1 score\n",
1574
+ " f1_score = None\n",
1575
+ " \n",
1576
+ " return f1_score\n",
1577
+ " \n",
1578
+ " ### END CODE HERE ###\n",
1579
+ " \n",
1580
+ " return 0 # If precision + recall = 0 then return 0"
1581
+ ]
1582
+ },
1583
+ {
1584
+ "cell_type": "code",
1585
+ "execution_count": null,
1586
+ "id": "14bb5295",
1587
+ "metadata": {
1588
+ "deletable": false,
1589
+ "editable": false,
1590
+ "tags": [
1591
+ "graded"
1592
+ ]
1593
+ },
1594
+ "outputs": [],
1595
+ "source": [
1596
+ "l1 = [1, 2, 3]\n",
1597
+ "l2 = [1, 2, 3, 4]\n",
1598
+ "\n",
1599
+ "r1s = rouge1_similarity(l1, l2)\n",
1600
+ "\n",
1601
+ "print(f\"rouge 1 similarity between lists: {l1} and {l2} is {r1s:.3f}\")"
1602
+ ]
1603
+ },
1604
+ {
1605
+ "cell_type": "markdown",
1606
+ "id": "afb8c61a",
1607
+ "metadata": {},
1608
+ "source": [
1609
+ "##### __Expected Output__\n",
1610
+ "\n",
1611
+ "```\n",
1612
+ "rouge 1 similarity between lists: [1, 2, 3] and [1, 2, 3, 4] is 0.857\n",
1613
+ "\n",
1614
+ "```"
1615
+ ]
1616
+ },
1617
+ {
1618
+ "cell_type": "code",
1619
+ "execution_count": null,
1620
+ "id": "a680132e",
1621
+ "metadata": {
1622
+ "deletable": false,
1623
+ "editable": false,
1624
+ "tags": []
1625
+ },
1626
+ "outputs": [],
1627
+ "source": [
1628
+ "w1_unittest.test_rouge1_similarity(rouge1_similarity)"
1629
+ ]
1630
+ },
1631
+ {
1632
+ "cell_type": "markdown",
1633
+ "id": "aaf8a058",
1634
+ "metadata": {},
1635
+ "source": [
1636
+ "## Computing the Overall Score\n",
1637
+ "\n",
1638
+ "\n",
1639
+ "You will now build a function to generate the overall score for a particular sample. As mentioned in the lectures, you need to compare each sample with all other samples. For instance, if we generated 30 sentences, we will need to compare sentence 1 to sentences 2 through 30. Then, we compare sentence 2 to sentences 1 and 3 through 30, and so forth. At each step, we get the average score of all comparisons to get the overall score for a particular sample. To illustrate, these will be the steps to generate the scores of a 4-sample list.\n",
1640
+ "\n",
1641
+ "- Get similarity score between sample 1 and sample 2\n",
1642
+ "+ Get similarity score between sample 1 and sample 3\n",
1643
+ "- Get similarity score between sample 1 and sample 4\n",
1644
+ "+ Get average score of the first 3 steps. This will be the overall score of sample 1\n",
1645
+ "- Iterate and repeat until samples 1 to 4 have overall scores.\n",
1646
+ "\n",
1647
+ "\n",
1648
+ "The results will be stored in a dictionary for easy lookups.\n",
1649
+ "\n",
1650
+ "<a name=\"ex7\"></a>\n",
1651
+ "## Exercise 7 - average_overlap\n",
1652
+ "\n",
1653
+ "Complete the `average_overlap` function below which should implement the process described above:"
1654
+ ]
1655
+ },
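Here is the hand-worked numeric check promised above, using Jaccard similarity on the samples [1, 2, 3], [1, 2, 4] and [1, 2, 4, 5]: sample 0 scores jaccard([1, 2, 3], [1, 2, 4]) = 2/4 = 0.5 against sample 1 and jaccard([1, 2, 3], [1, 2, 4, 5]) = 2/5 = 0.4 against sample 2, so its overall score is (0.5 + 0.4) / 2 = 0.45, matching the expected output {0: 0.45, 1: 0.625, 2: 0.575} further below.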
1656
+ {
1657
+ "cell_type": "code",
1658
+ "execution_count": null,
1659
+ "id": "142264ff",
1660
+ "metadata": {
1661
+ "deletable": false,
1662
+ "tags": [
1663
+ "graded"
1664
+ ]
1665
+ },
1666
+ "outputs": [],
1667
+ "source": [
1668
+ "# GRADED FUNCTION: average_overlap\n",
1669
+ "def average_overlap(samples, similarity_fn):\n",
1670
+ " \"\"\"Computes the arithmetic mean of each candidate sentence in the samples\n",
1671
+ "\n",
1672
+ " Args:\n",
1673
+ " samples (list[list[int]]): Tokenized version of translated sentences\n",
1674
+ " similarity_fn (Function): Similarity function used to compute the overlap\n",
1675
+ "\n",
1676
+ " Returns:\n",
1677
+ " dict[int, float]: A dictionary mapping the index of each translation to its score\n",
1678
+ " \"\"\"\n",
1679
+ " # Initialize dictionary\n",
1680
+ " scores = {}\n",
1681
+ " \n",
1682
+ " # Iterate through all samples (enumerate helps keep track of indexes)\n",
1683
+ " for index_candidate, candidate in enumerate(samples): \n",
1684
+ " \n",
1685
+ " ### START CODE HERE ###\n",
1686
+ " \n",
1687
+ " # Initially overlap is zero\n",
1688
+ " overlap = None\n",
1689
+ " \n",
1690
+ " # Iterate through all samples (enumerate helps keep track of indexes)\n",
1691
+ " for index_sample, sample in enumerate(samples):\n",
1692
+ "\n",
1693
+ " # Skip if the candidate index is the same as the sample index\n",
1694
+ " if None == None:\n",
1695
+ " None\n",
1696
+ " \n",
1697
+ " # Get the overlap between candidate and sample using the similarity function\n",
1698
+ " sample_overlap = None(None, None)\n",
1699
+ " \n",
1700
+ " # Add the sample overlap to the total overlap\n",
1701
+ " overlap += None\n",
1702
+ "\n",
1703
+ " ### END CODE HERE ###\n",
1704
+ " \n",
1705
+ " # Get the score for the candidate by computing the average\n",
1706
+ " score = overlap / (len(samples) - 1)\n",
1707
+ "\n",
1708
+ " # Only use 3 decimal points\n",
1709
+ " score = round(score, 3)\n",
1710
+ " \n",
1711
+ " # Save the score in the dictionary. use index as the key.\n",
1712
+ " scores[index_candidate] = score\n",
1713
+ " \n",
1714
+ " return scores"
1715
+ ]
1716
+ },
1717
+ {
1718
+ "cell_type": "code",
1719
+ "execution_count": null,
1720
+ "id": "f36cf403",
1721
+ "metadata": {
1722
+ "deletable": false,
1723
+ "editable": false,
1724
+ "tags": [
1725
+ "graded"
1726
+ ]
1727
+ },
1728
+ "outputs": [],
1729
+ "source": [
1730
+ "# Test with Jaccard similarity\n",
1731
+ "\n",
1732
+ "l1 = [1, 2, 3]\n",
1733
+ "l2 = [1, 2, 4]\n",
1734
+ "l3 = [1, 2, 4, 5]\n",
1735
+ "\n",
1736
+ "avg_ovlp = average_overlap([l1, l2, l3], jaccard_similarity)\n",
1737
+ "\n",
1738
+ "print(f\"average overlap between lists: {l1}, {l2} and {l3} using Jaccard similarity is:\\n\\n{avg_ovlp}\")"
1739
+ ]
1740
+ },
1741
+ {
1742
+ "cell_type": "markdown",
1743
+ "id": "e277aed2-a5c9-4ed0-9ee2-614939f2df7b",
1744
+ "metadata": {},
1745
+ "source": [
1746
+ "##### __Expected Output__\n",
1747
+ "\n",
1748
+ "```\n",
1749
+ "average overlap between lists: [1, 2, 3], [1, 2, 4] and [1, 2, 4, 5] using Jaccard similarity is:\n",
1750
+ "\n",
1751
+ "{0: 0.45, 1: 0.625, 2: 0.575}\n",
1752
+ "```"
1753
+ ]
1754
+ },
1755
+ {
1756
+ "cell_type": "code",
1757
+ "execution_count": null,
1758
+ "id": "d961a304-7c03-4ecb-ba5f-c8747ed3ec39",
1759
+ "metadata": {
1760
+ "deletable": false,
1761
+ "editable": false,
1762
+ "tags": [
1763
+ "graded"
1764
+ ]
1765
+ },
1766
+ "outputs": [],
1767
+ "source": [
1768
+ "# Test with Rouge1 similarity\n",
1769
+ "\n",
1770
+ "l1 = [1, 2, 3]\n",
1771
+ "l2 = [1, 4]\n",
1772
+ "l3 = [1, 2, 4, 5]\n",
1773
+ "l4 = [5,6]\n",
1774
+ "\n",
1775
+ "avg_ovlp = average_overlap([l1, l2, l3, l4], rouge1_similarity)\n",
1776
+ "\n",
1777
+ "print(f\"average overlap between lists: {l1}, {l2}, {l3} and {l4} using Rouge1 similarity is:\\n\\n{avg_ovlp}\")"
1778
+ ]
1779
+ },
1780
+ {
1781
+ "cell_type": "markdown",
1782
+ "id": "30adc749-ffcb-4e82-a8f0-c04a7e39da0a",
1783
+ "metadata": {},
1784
+ "source": [
1785
+ "##### __Expected Output__\n",
1786
+ "\n",
1787
+ "```\n",
1788
+ "average overlap between lists: [1, 2, 3], [1, 4], [1, 2, 4, 5] and [5, 6] using Rouge1 similarity is:\n",
1789
+ "\n",
1790
+ "{0: 0.324, 1: 0.356, 2: 0.524, 3: 0.111}\n",
1791
+ "```"
1792
+ ]
1793
+ },
1794
+ {
1795
+ "cell_type": "code",
1796
+ "execution_count": null,
1797
+ "id": "c41b1fba-fd0f-41e6-9b07-746f64030fe3",
1798
+ "metadata": {
1799
+ "deletable": false,
1800
+ "editable": false,
1801
+ "tags": []
1802
+ },
1803
+ "outputs": [],
1804
+ "source": [
1805
+ "w1_unittest.test_average_overlap(average_overlap)"
1806
+ ]
1807
+ },
1808
+ {
1809
+ "cell_type": "markdown",
1810
+ "id": "e4482249",
1811
+ "metadata": {},
1812
+ "source": [
1813
+ "In practice, it is also common to see the weighted mean being used to calculate the overall score instead of just the arithmetic mean. This is implemented in the `weighted_avg_overlap` function below and you can use it in your experiments to see which one will give better results:"
1814
+ ]
1815
+ },
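Concretely, if $p_j = e^{\log p_j}$ is the linear-scale probability of sample $j$, the weighted score of candidate $i$ is

$$score_i = \frac{\sum_{j \neq i} p_j \cdot sim(i, j)}{\sum_{j \neq i} p_j}$$

so samples that the model itself considered likely contribute more to each candidate's score.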
1816
+ {
1817
+ "cell_type": "code",
1818
+ "execution_count": null,
1819
+ "id": "398714be",
1820
+ "metadata": {
1821
+ "deletable": false,
1822
+ "editable": false,
1823
+ "tags": [
1824
+ "graded"
1825
+ ]
1826
+ },
1827
+ "outputs": [],
1828
+ "source": [
1829
+ "def weighted_avg_overlap(samples, log_probs, similarity_fn):\n",
1830
+ " \n",
1831
+ " # Scores dictionary\n",
1832
+ " scores = {}\n",
1833
+ " \n",
1834
+ " # Iterate over the samples\n",
1835
+ " for index_candidate, candidate in enumerate(samples): \n",
1836
+ " \n",
1837
+ " # Initialize overlap and weighted sum\n",
1838
+ " overlap, weight_sum = 0.0, 0.0\n",
1839
+ " \n",
1840
+ " # Iterate over all samples and log probabilities\n",
1841
+ " for index_sample, (sample, logp) in enumerate(zip(samples, log_probs)):\n",
1842
+ "\n",
1843
+ " # Skip if the candidate index is the same as the sample index \n",
1844
+ " if index_candidate == index_sample:\n",
1845
+ " continue\n",
1846
+ " \n",
1847
+ " # Convert log probability to linear scale\n",
1848
+ " sample_p = float(np.exp(logp))\n",
1849
+ "\n",
1850
+ " # Update the weighted sum\n",
1851
+ " weight_sum += sample_p\n",
1852
+ "\n",
1853
+ " # Get the unigram overlap between candidate and sample\n",
1854
+ " sample_overlap = similarity_fn(candidate, sample)\n",
1855
+ " \n",
1856
+ " # Update the overlap\n",
1857
+ " overlap += sample_p * sample_overlap\n",
1858
+ " \n",
1859
+ " # Compute the score for the candidate\n",
1860
+ " score = overlap / weight_sum\n",
1861
+ "\n",
1862
+ " # Only use 3 decimal points\n",
1863
+ " score = round(score, 3)\n",
1864
+ " \n",
1865
+ " # Save the score in the dictionary. use index as the key.\n",
1866
+ " scores[index_candidate] = score\n",
1867
+ " \n",
1868
+ " return scores"
1869
+ ]
1870
+ },
1871
+ {
1872
+ "cell_type": "code",
1873
+ "execution_count": null,
1874
+ "id": "e3dfd6d3",
1875
+ "metadata": {
1876
+ "deletable": false,
1877
+ "editable": false,
1878
+ "tags": [
1879
+ "graded"
1880
+ ]
1881
+ },
1882
+ "outputs": [],
1883
+ "source": [
1884
+ "l1 = [1, 2, 3]\n",
1885
+ "l2 = [1, 2, 4]\n",
1886
+ "l3 = [1, 2, 4, 5]\n",
1887
+ "log_probs = [0.4, 0.2, 0.5]\n",
1888
+ "\n",
1889
+ "w_avg_ovlp = weighted_avg_overlap([l1, l2, l3], log_probs, jaccard_similarity)\n",
1890
+ "\n",
1891
+ "print(f\"weighted average overlap using Jaccard similarity is:\\n\\n{w_avg_ovlp}\")"
1892
+ ]
1893
+ },
1894
+ {
1895
+ "cell_type": "markdown",
1896
+ "id": "cdb0b4db",
1897
+ "metadata": {},
1898
+ "source": [
1899
+ "## mbr_decode\n",
1900
+ "\n",
1901
+ "You will now put everything together in the the `mbr_decode` function below. This final step is not graded as this function is just a wrapper around all the cool stuff you have coded so far! \n",
1902
+ "\n",
1903
+ "You can use it to play around, trying different numbers of samples, temperatures and similarity functions!"
1904
+ ]
1905
+ },
1906
+ {
1907
+ "cell_type": "code",
1908
+ "execution_count": null,
1909
+ "id": "6fcfa640",
1910
+ "metadata": {
1911
+ "deletable": false,
1912
+ "editable": false,
1913
+ "tags": [
1914
+ "graded"
1915
+ ]
1916
+ },
1917
+ "outputs": [],
1918
+ "source": [
1919
+ "def mbr_decode(model, text, n_samples=5, temperature=0.6, similarity_fn=jaccard_similarity):\n",
1920
+ " \n",
1921
+ " # Generate samples\n",
1922
+ " samples, log_probs = generate_samples(model, text, n_samples=n_samples, temperature=temperature)\n",
1923
+ " \n",
1924
+ " # Compute the overlap scores\n",
1925
+ " scores = weighted_avg_overlap(samples, log_probs, similarity_fn)\n",
1926
+ "\n",
1927
+ " # Decode samples\n",
1928
+ " decoded_translations = [tokens_to_text(s, id_to_word).numpy().decode('utf-8') for s in samples]\n",
1929
+ " \n",
1930
+ " # Find the key with the highest score\n",
1931
+ " max_score_key = max(scores, key=lambda k: scores[k])\n",
1932
+ " \n",
1933
+ " # Get the translation \n",
1934
+ " translation = decoded_translations[max_score_key]\n",
1935
+ " \n",
1936
+ " return translation, decoded_translations"
1937
+ ]
1938
+ },
1939
+ {
1940
+ "cell_type": "code",
1941
+ "execution_count": null,
1942
+ "id": "99507fcc-7727-45e7-933b-d3a08034f731",
1943
+ "metadata": {
1944
+ "deletable": false,
1945
+ "editable": false,
1946
+ "tags": []
1947
+ },
1948
+ "outputs": [],
1949
+ "source": [
1950
+ "english_sentence = \"I love languages\"\n",
1951
+ "\n",
1952
+ "translation, candidates = mbr_decode(trained_translator, english_sentence, n_samples=10, temperature=0.6)\n",
1953
+ "\n",
1954
+ "print(\"Translation candidates:\")\n",
1955
+ "for c in candidates:\n",
1956
+ " print(c)\n",
1957
+ "\n",
1958
+ "print(f\"\\nSelected translation: {translation}\")"
1959
+ ]
1960
+ },
1961
+ {
1962
+ "cell_type": "markdown",
1963
+ "id": "801b193f-4ea6-4ca1-ae29-a506cce656d9",
1964
+ "metadata": {},
1965
+ "source": [
1966
+ "**Congratulations!** Next week, you'll dive deeper into attention models and study the Transformer architecture. You will build another network but without the recurrent part. It will show that attention is all you need! It should be fun!\n",
1967
+ "\n",
1968
+ "**Keep up the good work!**"
1969
+ ]
1970
+ }
1971
+ ],
1972
+ "metadata": {
1973
+ "grader_version": "1",
1974
+ "kernelspec": {
1975
+ "display_name": "Python 3 (ipykernel)",
1976
+ "language": "python",
1977
+ "name": "python3"
1978
+ },
1979
+ "language_info": {
1980
+ "codemirror_mode": {
1981
+ "name": "ipython",
1982
+ "version": 3
1983
+ },
1984
+ "file_extension": ".py",
1985
+ "mimetype": "text/x-python",
1986
+ "name": "python",
1987
+ "nbconvert_exporter": "python",
1988
+ "pygments_lexer": "ipython3",
1989
+ "version": "3.10.11"
1990
+ }
1991
+ },
1992
+ "nbformat": 4,
1993
+ "nbformat_minor": 5
1994
+ }
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/.ipynb_checkpoints/w1_unittest-checkpoint.py ADDED
@@ -0,0 +1,654 @@
1
+ import math
2
+ from itertools import combinations
3
+ import tensorflow as tf
4
+ import numpy as np
5
+ from dlai_grader.grading import test_case, print_feedback
6
+ from utils import train_data
7
+
8
+ VOCAB_SIZE = 12000
9
+ UNITS = 256
10
+
11
+
12
+ def test_encoder(encoder_to_test):
13
+ def g():
14
+ vocab_sizes = [5, 20, 1000, 15000]
15
+ units = [32, 64, 256, 512]
16
+
17
+ cases = []
18
+
19
+ vocab_size = 15000
20
+ n_units = 512
21
+ encoder = encoder_to_test(vocab_size, n_units)
22
+
23
+ t = test_case()
24
+ if encoder.embedding.mask_zero != True:
25
+ t.failed = True
26
+ t.msg = "Embedding layer has incorrect value for 'mask_zero' attribute"
27
+ t.want = True
28
+ t.got = encoder.embedding.mask_zero
29
+ cases.append(t)
30
+
31
+ for vs, u in zip(vocab_sizes, units):
32
+ encoder = encoder_to_test(vs, u)
33
+
34
+ t = test_case()
35
+ if encoder.embedding.input_dim != vs:
36
+ t.failed = True
37
+ t.msg = "Incorrect input dim of embedding layer"
38
+ t.want = vs
39
+ t.got = encoder.embedding.input_dim
40
+ cases.append(t)
41
+
42
+ t = test_case()
43
+ if encoder.embedding.output_dim != u:
44
+ t.failed = True
45
+ t.msg = "Incorrect output dim of embedding layer"
46
+ t.want = u
47
+ t.got = encoder.embedding.output_dim
48
+ cases.append(t)
49
+
50
+ t = test_case()
51
+ if not isinstance(encoder.rnn.layer, tf.keras.layers.LSTM):
52
+ t.failed = True
53
+ t.msg = "Incorrect type of layer inside Bidirectional"
54
+ t.want = tf.keras.layers.LSTM
55
+ t.got = type(encoder.rnn.layer)
56
+ return [t]
57
+
58
+ for u in units:
59
+ encoder = encoder_to_test(vocab_size, u)
60
+ t = test_case()
61
+ if encoder.rnn.layer.units != u:
62
+ t.failed = True
63
+ t.msg = "Incorrect number of units in LSTM layer"
64
+ t.want = u
65
+ t.got = encoder.rnn.layer.units
66
+ cases.append(t)
67
+
68
+ t = test_case()
69
+ if encoder.rnn.layer.return_sequences != True:
70
+ t.failed = True
71
+ t.msg = "LSTM layer has incorrect value for 'return_sequences' attribute"
72
+ t.want = True
73
+ t.got = encoder.rnn.layer.return_sequences
74
+ cases.append(t)
75
+
76
+ encoder = encoder_to_test(vocab_size, n_units)
77
+
78
+ for (to_translate, _), _ in train_data.take(3):
79
+ first_dim_in, second_dim_in = to_translate.shape
80
+ encoder_output = encoder(to_translate)
81
+ t = test_case()
82
+ if len(encoder_output.shape) != 3:
83
+ t.failed = True
84
+ t.msg = "Incorrect shape of encoder output"
85
+ t.want = "a shape with 3 dimensions"
86
+ t.got = encoder_output.shape
87
+ return [t]
88
+
89
+ first_dim_out, second_dim_out, third_dim_out = encoder_output.shape
90
+
91
+ t = test_case()
92
+ if first_dim_in != first_dim_out:
93
+ t.failed = True
94
+ t.msg = "Incorrect first dimension of encoder output"
95
+ t.want = first_dim_in
96
+ t.got = first_dim_out
97
+ cases.append(t)
98
+
99
+ t = test_case()
100
+ if second_dim_in != second_dim_out:
101
+ t.failed = True
102
+ t.msg = "Incorrect second dimension of encoder output"
103
+ t.want = second_dim_in
104
+ t.got = second_dim_out
105
+ cases.append(t)
106
+
107
+ t = test_case()
108
+ if third_dim_out != n_units:
109
+ t.failed = True
110
+ t.msg = "Incorrect third dimension of encoder output"
111
+ t.want = units
112
+ t.got = third_dim_out
113
+ cases.append(t)
114
+
115
+ return cases
116
+
117
+ cases = g()
118
+ print_feedback(cases)
119
+
120
+
121
+ def test_cross_attention(cross_attention_to_test):
122
+ def g():
123
+ units = [32, 64, 256, 512]
124
+
125
+ cases = []
126
+
127
+ n_units = 512
128
+ cross_attention = cross_attention_to_test(n_units)
129
+
130
+ t = test_case()
131
+ if not isinstance(cross_attention.mha, tf.keras.layers.MultiHeadAttention):
132
+ t.failed = True
133
+ t.msg = "Incorrect type of layer for Multi Head Attention"
134
+ t.want = tf.keras.layers.MultiHeadAttention
135
+ t.got = type(cross_attention.mha)
136
+ return [t]
137
+
138
+ # for u in units:
139
+ # cross_attention = cross_attention_to_test(u)
140
+
141
+ # t = test_case()
142
+ # if cross_attention.mha.key_dim != u:
143
+ # t.failed = True
144
+ # t.msg = "Incorrect key dim of Multi Head Attention layer"
145
+ # t.want = u
146
+ # t.got = cross_attention.mha.key_dim
147
+ # cases.append(t)
148
+
149
+ cross_attention = cross_attention_to_test(n_units)
150
+ embed = tf.keras.layers.Embedding(VOCAB_SIZE, output_dim=UNITS, mask_zero=True)
151
+
152
+ for (to_translate, sr_translation), _ in train_data.take(3):
153
+ sr_translation_embed = embed(sr_translation)
154
+ first_dim_in, second_dim_in, third_dim_in = sr_translation_embed.shape
155
+ dummy_encoder_output = np.random.rand(64, 14, 512)
156
+ cross_attention_output = cross_attention(
157
+ dummy_encoder_output, sr_translation_embed
158
+ )
159
+ # print(cross_attention_output.shape)
160
+
161
+ t = test_case()
162
+ if len(cross_attention_output.shape) != 3:
163
+ t.failed = True
164
+ t.msg = "Incorrect shape of cross_attention output"
165
+ t.want = "a shape with 3 dimensions"
166
+ t.got = cross_attention_output.shape
167
+ return [t]
168
+
169
+ first_dim_out, second_dim_out, third_dim_out = cross_attention_output.shape
170
+
171
+ t = test_case()
172
+ if first_dim_in != first_dim_out:
173
+ t.failed = True
174
+ t.msg = "Incorrect first dimension of cross_attention output"
175
+ t.want = first_dim_in
176
+ t.got = first_dim_out
177
+ cases.append(t)
178
+
179
+ t = test_case()
180
+ if second_dim_in != second_dim_out:
181
+ t.failed = True
182
+ t.msg = "Incorrect second dimension of cross_attention output"
183
+ t.want = second_dim_in
184
+ t.got = second_dim_out
185
+ cases.append(t)
186
+
187
+ t = test_case()
188
+ if third_dim_in != third_dim_out:
189
+ t.failed = True
190
+ t.msg = "Incorrect third dimension of cross_attention output"
191
+ t.want = third_dim_in
192
+ t.got = third_dim_out
193
+ cases.append(t)
194
+
195
+ _, n_heads, key_dim = cross_attention.mha.get_weights()[0].shape
196
+
197
+ t = test_case()
198
+ if n_heads != 1:
199
+ t.failed = True
200
+ t.msg = "Incorrect number of attention heads"
201
+ t.want = 1
202
+ t.got = n_heads
203
+ cases.append(t)
204
+
205
+ t = test_case()
206
+ if key_dim != n_units:
207
+ t.failed = True
208
+ t.msg = f"Incorrect size of query and key for every attention head when passing {n_units} units to the constructor"
209
+ t.want = n_units
210
+ t.got = key_dim
211
+ cases.append(t)
212
+
213
+ return cases
214
+
215
+ cases = g()
216
+ print_feedback(cases)
217
+
218
+
219
+ def test_decoder(decoder_to_test, CrossAttention):
220
+ def g():
221
+ vocab_sizes = [5, 20, 1000, 15000]
222
+ units = [32, 64, 256, 512]
223
+
224
+ cases = []
225
+
226
+ vocab_size = 10000
227
+ n_units = 512
228
+ decoder = decoder_to_test(vocab_size, n_units)
229
+
230
+ t = test_case()
231
+ if not isinstance(decoder.embedding, tf.keras.layers.Embedding):
232
+ t.failed = True
233
+ t.msg = "Incorrect type of embedding layer"
234
+ t.want = tf.keras.layers.Embedding
235
+ t.got = type(decoder.embedding)
236
+ return [t]
237
+
238
+ t = test_case()
239
+ if decoder.embedding.mask_zero != True:
240
+ t.failed = True
241
+ t.msg = "Embedding layer has incorrect value for 'mask_zero' attribute"
242
+ t.want = True
243
+ t.got = decoder.embedding.mask_zero
244
+ cases.append(t)
245
+
246
+ for vs, u in zip(vocab_sizes, units):
247
+ decoder = decoder_to_test(vs, u)
248
+
249
+ t = test_case()
250
+ if decoder.embedding.input_dim != vs:
251
+ t.failed = True
252
+ t.msg = "Incorrect input dim of embedding layer"
253
+ t.want = vs
254
+ t.got = decoder.embedding.input_dim
255
+ cases.append(t)
256
+
257
+ t = test_case()
258
+ if decoder.embedding.output_dim != u:
259
+ t.failed = True
260
+ t.msg = "Incorrect output dim of embedding layer"
261
+ t.want = u
262
+ t.got = decoder.embedding.output_dim
263
+ cases.append(t)
264
+
265
+ t = test_case()
266
+ if not isinstance(decoder.pre_attention_rnn, tf.keras.layers.LSTM):
267
+ t.failed = True
268
+ t.msg = "Incorrect type of pre_attention_rnn layer"
269
+ t.want = tf.keras.layers.LSTM
270
+ t.got = type(decoder.pre_attention_rnn)
271
+ return [t]
272
+
273
+ for u in units:
274
+ decoder = decoder_to_test(vocab_size, u)
275
+ t = test_case()
276
+ if decoder.pre_attention_rnn.units != u:
277
+ t.failed = True
278
+ t.msg = "Incorrect number of units in pre_attention_rnn layer"
279
+ t.want = u
280
+ t.got = decoder.pre_attention_rnn.units
281
+ cases.append(t)
282
+
283
+ # t = test_case()
284
+ # if decoder.attention.units != u:
285
+ # t.failed = True
286
+ # t.msg = "Incorrect number of units in attention layer"
287
+ # t.want = u
288
+ # t.got = decoder.attention.units
289
+ # cases.append(t)
290
+
291
+ t = test_case()
292
+ if decoder.post_attention_rnn.units != u:
293
+ t.failed = True
294
+ t.msg = "Incorrect number of units in post_attention_rnn layer"
295
+ t.want = u
296
+ t.got = decoder.post_attention_rnn.units
297
+ cases.append(t)
298
+
299
+ t = test_case()
300
+ if decoder.pre_attention_rnn.return_sequences != True:
301
+ t.failed = True
302
+ t.msg = "pre_attention_rnn layer has incorrect value for 'return_sequences' attribute"
303
+ t.want = True
304
+ t.got = decoder.pre_attention_rnn.return_sequences
305
+ cases.append(t)
306
+
307
+ t = test_case()
308
+ if decoder.pre_attention_rnn.return_state != True:
309
+ t.failed = True
310
+ t.msg = "pre_attention_rnn layer has incorrect value for 'return_state' attribute"
311
+ t.want = True
312
+ t.got = decoder.pre_attention_rnn.return_state
313
+ cases.append(t)
314
+
315
+ t = test_case()
316
+ if not isinstance(decoder.attention, CrossAttention):
317
+ t.failed = True
318
+ t.msg = "Incorrect type of attention layer"
319
+ t.want = CrossAttention
320
+ t.got = type(decoder.attention)
321
+ return [t]
322
+
323
+ t = test_case()
324
+ if decoder.post_attention_rnn.return_sequences != True:
325
+ t.failed = True
326
+ t.msg = "post_attention_rnn layer has incorrect value for 'return_sequences' attribute"
327
+ t.want = True
328
+ t.got = decoder.post_attention_rnn.return_sequences
329
+ cases.append(t)
330
+
331
+ t = test_case()
332
+ if not isinstance(decoder.post_attention_rnn, tf.keras.layers.LSTM):
333
+ t.failed = True
334
+ t.msg = "Incorrect type of pre_attention_rnn layer"
335
+ t.want = tf.keras.layers.LSTM
336
+ t.got = type(decoder.post_attention_rnn)
337
+ return [t]
338
+
339
+ t = test_case()
340
+ if not isinstance(decoder.output_layer, tf.keras.layers.Dense):
341
+ t.failed = True
342
+ t.msg = "Incorrect type of output_layer layer"
343
+ t.want = tf.keras.layers.Dense
344
+ t.got = type(decoder.output_layer)
345
+ return [t]
346
+
347
+ t = test_case()
348
+ if (
349
+ "log" not in decoder.output_layer.activation.__name__
350
+ or "softmax" not in decoder.output_layer.activation.__name__
351
+ ):
352
+ t.failed = True
353
+ t.msg = "output_layer layer has incorrect activation function"
354
+ t.want = "a log softmax activation function such as 'log_softmax_v2'"
355
+ t.got = decoder.output_layer.activation.__name__
356
+ cases.append(t)
357
+
358
+ vocab_size = 10000
359
+ n_units = 512
360
+ decoder = decoder_to_test(vocab_size, n_units)
361
+
362
+ for (_, sr_translation), _ in train_data.take(3):
363
+ encoder_output = np.random.rand(64, 15, 256)
364
+ decoder_output = decoder(encoder_output, sr_translation)
365
+
366
+ first_dim_in, second_dim_in = sr_translation.shape
367
+
368
+ t = test_case()
369
+ if len(decoder_output.shape) != 3:
370
+ t.failed = True
371
+ t.msg = "Incorrect shape of decoder output"
372
+ t.want = "a shape with 3 dimensions"
373
+ t.got = decoder_output.shape
374
+ return [t]
375
+
376
+ first_dim_out, second_dim_out, third_dim_out = decoder_output.shape
377
+
378
+ t = test_case()
379
+ if first_dim_in != first_dim_out:
380
+ t.failed = True
381
+ t.msg = "Incorrect first dimension of decoder output"
382
+ t.want = first_dim_in
383
+ t.got = first_dim_out
384
+ cases.append(t)
385
+
386
+ t = test_case()
387
+ if second_dim_in != second_dim_out:
388
+ t.failed = True
389
+ t.msg = "Incorrect second dimension of decoder output"
390
+ t.want = second_dim_in
391
+ t.got = second_dim_out
392
+ cases.append(t)
393
+
394
+ t = test_case()
395
+ if third_dim_out != vocab_size:
396
+ t.failed = True
397
+ t.msg = "Incorrect third dimension of decoder output"
398
+ t.want = vocab_size
399
+ t.got = third_dim_out
400
+ cases.append(t)
401
+ return cases
402
+
403
+ cases = g()
404
+ print_feedback(cases)
405
+
406
+
407
+ def test_translator(translator_to_test, Encoder, Decoder):
408
+ def g():
409
+ vocab_sizes = [5, 20, 1000, 15000]
410
+ units = [32, 64, 256, 512]
411
+
412
+ cases = []
413
+
414
+ vocab_size = 10000
415
+ n_units = 512
416
+ translator = translator_to_test(vocab_size, n_units)
417
+
418
+ t = test_case()
419
+ if not isinstance(translator.encoder, Encoder):
420
+ t.failed = True
421
+ t.msg = "Incorrect type of encoder layer"
422
+ t.want = Encoder
423
+ t.got = type(translator.encoder)
424
+ return [t]
425
+
426
+ t = test_case()
427
+ if not isinstance(translator.decoder, Decoder):
428
+ t.failed = True
429
+ t.msg = "Incorrect type of encoder layer"
430
+ t.want = Decoder
431
+ t.got = type(translator.decoder)
432
+ return [t]
433
+
434
+ translator = translator_to_test(vocab_size, n_units)
435
+
436
+ for (to_translate, sr_translation), _ in train_data.take(3):
437
+ first_dim_in, second_dim_in = sr_translation.shape
438
+ translator_output = translator((to_translate, sr_translation))
439
+ t = test_case()
440
+ if len(translator_output.shape) != 3:
441
+ t.failed = True
442
+ t.msg = "Incorrect shape of translator output"
443
+ t.want = "a shape with 3 dimensions"
444
+ t.got = translator_output.shape
445
+ return [t]
446
+
447
+ first_dim_out, second_dim_out, third_dim_out = translator_output.shape
448
+
449
+ t = test_case()
450
+ if first_dim_in != first_dim_out:
451
+ t.failed = True
452
+ t.msg = "Incorrect first dimension of translator output"
453
+ t.want = first_dim_in
454
+ t.got = first_dim_out
455
+ cases.append(t)
456
+
457
+ t = test_case()
458
+ if second_dim_in != second_dim_out:
459
+ t.failed = True
460
+ t.msg = "Incorrect second dimension of translator output"
461
+ t.want = second_dim_in
462
+ t.got = second_dim_out
463
+ cases.append(t)
464
+
465
+ t = test_case()
466
+ if third_dim_out != vocab_size:
467
+ t.failed = True
468
+ t.msg = "Incorrect third dimension of translator output"
469
+ t.want = vocab_size
470
+ t.got = third_dim_out
471
+ cases.append(t)
472
+
473
+ return cases
474
+
475
+ cases = g()
476
+ print_feedback(cases)
477
+
478
+
479
+
480
+ def test_translate(learner_func, model):
481
+ def g():
482
+
483
+ cases = []
484
+
485
+ txt = "Hi, my name is Younes"
486
+ try:
487
+ translation, logit, tokens = learner_func(model, txt, temperature=0.9)
488
+ except Exception as e:
489
+ t = test_case()
490
+ t.failed = True
491
+ t.msg = "There was an exception when running your function"
492
+ t.want = "No exceptions"
493
+ t.got = f"{str(e)}"
494
+ return [t]
495
+
496
+ txt = "Hi, my name is Alejandra"
497
+ translation, logit, tokens = learner_func(model, txt, temperature=0.0)
498
+
499
+ t = test_case()
500
+
501
+ if not isinstance(translation, str):
502
+ t.failed = True
503
+ t.msg = "'translation' has incorrect type"
504
+ t.want = str
505
+ t.got = type(translation)
506
+ cases.append(t)
507
+
508
+ if not isinstance(logit, np.number):
509
+ t.failed = True
510
+ t.msg = "'logit' has incorrect type"
511
+ t.want = np.number
512
+ t.got = type(logit)
513
+ cases.append(t)
514
+
515
+ if not isinstance(tokens, tf.Tensor):
516
+ t.failed = True
517
+ t.msg = "'tokens' has incorrect type"
518
+ t.want = tf.Tensor
519
+ t.got = type(tokens)
520
+ cases.append(t)
521
+
522
+ translation2, logit2, tokens2 = learner_func(model, txt, temperature=0.0)
523
+
524
+ t = test_case()
525
+ if translation != translation2:
526
+ t.failed = True
527
+ t.msg = "translate didn't return the same translation when using temperature of 0.0"
528
+ t.want = translation
529
+ t.got = translation2
530
+ cases.append(t)
531
+
532
+ t = test_case()
533
+ if logit != logit2:
534
+ t.failed = True
535
+ t.msg = "translate didn't return the same logit when using temperature of 0.0"
536
+ t.want = logit
537
+ t.got = logit2
538
+ cases.append(t)
539
+
540
+ t = test_case()
541
+ if not np.allclose(tokens, tokens2):
542
+ t.failed = True
543
+ t.msg = "translate didn't return the same tokens when using temperature of 0.0"
544
+ t.want = tokens
545
+ t.got = tokens2
546
+ cases.append(t)
547
+
548
+
549
+ return cases
550
+
551
+ cases = g()
552
+ print_feedback(cases)
553
+
554
+
555
+
556
+
557
+ def test_rouge1_similarity(learner_func):
558
+
559
+ def g():
560
+
561
+ tensors = [
562
+ [0],
563
+ [0, 1],
564
+ [0, 1, 2],
565
+ [1, 2, 4, 5],
566
+ [5, 5, 7, 0, 232]
567
+ ]
568
+
569
+ expected = [0.6666666666666666, 0.5, 0, 0.33333333333333337, 0.8, 0.3333333333333333, 0.28571428571428575, 0.5714285714285715, 0.25]
570
+
571
+ cases = []
572
+ pairs = list(combinations(tensors, 2))
573
+
574
+ for (candidate, reference), solution in zip(pairs, expected):
575
+ answer = learner_func(candidate, reference)
576
+ t = test_case()
577
+ if not math.isclose(answer, solution):
578
+ t.failed = True
579
+ t.msg = f"Incorrect similarity for candidate={candidate} and reference={reference}"
580
+ t.want = solution
581
+ t.got = answer
582
+ cases.append(t)
583
+
584
+ return cases
585
+
586
+ cases = g()
587
+ print_feedback(cases)
588
+
589
+
590
+ def test_average_overlap(learner_func):
591
+
592
+ def jaccard_similarity(candidate, reference):
593
+
594
+ # Convert the lists to sets to get the unique tokens
595
+ candidate_set = set(candidate)
596
+ reference_set = set(reference)
597
+
598
+ # Get the set of tokens common to both candidate and reference
599
+ common_tokens = candidate_set.intersection(reference_set)
600
+
601
+ # Get the set of all tokens found in either candidate or reference
602
+ all_tokens = candidate_set.union(reference_set)
603
+
604
+ # Compute the percentage of overlap (divide the number of common tokens by the number of all tokens)
605
+ overlap = len(common_tokens) / len(all_tokens)
606
+
607
+ return overlap
608
+
609
+ def g():
610
+
611
+ l1 = [1, 2, 3]
612
+ l2 = [1, 2, 4]
613
+ l3 = [1, 2, 4, 5]
614
+ l4 = [5,6]
615
+
616
+ elements = [l1, l2, l3, l4]
617
+
618
+ all_combinations = []
619
+
620
+ for r in range(2, len(elements) + 1):
621
+ # Generate combinations of length r
622
+ combinations_r = combinations(elements, r)
623
+
624
+ # Append the combinations to the result list
625
+ all_combinations.extend(combinations_r)
626
+
627
+ expected = [{0: 0.5, 1: 0.5},
628
+ {0: 0.4, 1: 0.4},
629
+ {0: 0.0, 1: 0.0},
630
+ {0: 0.75, 1: 0.75},
631
+ {0: 0.0, 1: 0.0},
632
+ {0: 0.2, 1: 0.2},
633
+ {0: 0.45, 1: 0.625, 2: 0.575},
634
+ {0: 0.25, 1: 0.25, 2: 0.0},
635
+ {0: 0.2, 1: 0.3, 2: 0.1},
636
+ {0: 0.375, 1: 0.475, 2: 0.1},
637
+ {0: 0.3, 1: 0.417, 2: 0.45, 3: 0.067}]
638
+
639
+ cases = []
640
+
641
+ for combination, solution in zip(all_combinations, expected):
642
+ answer = learner_func(combination, jaccard_similarity)
643
+ t = test_case()
644
+ if answer != solution:
645
+ t.failed = True
646
+ t.msg = f"Incorrect overlap for lists={combination}"
647
+ t.want = solution
648
+ t.got = answer
649
+ cases.append(t)
650
+
651
+ return cases
652
+
653
+ cases = g()
654
+ print_feedback(cases)
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/C4W1_Assignment.ipynb ADDED
@@ -0,0 +1,2312 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "9cb49525",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Assignment 1: Neural Machine Translation\n",
9
+ "\n",
10
+ "Welcome to the first assignment of Course 4. Here, you will build an English-to-Portuguese neural machine translation (NMT) model using Long Short-Term Memory (LSTM) networks with attention. Machine translation is an important task in natural language processing and could be useful not only for translating one language to another but also for word sense disambiguation (e.g. determining whether the word \"bank\" refers to the financial bank, or the land alongside a river). Implementing this using just a Recurrent Neural Network (RNN) with LSTMs can work for short to medium length sentences but can result in vanishing gradients for very long sequences. To help with this, you will be adding an attention mechanism to allow the decoder to access all relevant parts of the input sentence regardless of its length. By completing this assignment, you will:\n",
11
+ "\n",
12
+ "- Implement an encoder-decoder system with attention\n",
13
+ "- Build the NMT model from scratch using Tensorflow\n",
14
+ "- Generate translations using greedy and Minimum Bayes Risk (MBR) decoding\n",
15
+ "\n",
16
+ "## Table of Contents\n",
17
+ "- [1 - Data Preparation](#1)\n",
18
+ "- [2 - NMT model with attention](#2)\n",
19
+ " - [Exercise 1 - Encoder](#ex1)\n",
20
+ " - [Exercise 2 - CrossAttention](#ex2)\n",
21
+ " - [Exercise 3 - Decoder](#ex3) \n",
22
+ " - [Exercise 4 - Translator](#ex4)\n",
23
+ "- [3 - Training](#3)\n",
24
+ "- [4 - Using the model for inference ](#4)\n",
25
+ " - [Exercise 5 - translate](#ex5)\n",
26
+ "- [5 - Minimum Bayes-Risk Decoding](#5)\n",
27
+ " - [Exercise 6 - rouge1_similarity](#ex6)\n",
28
+ " - [Exercise 7 - average_overlap](#ex7)\n"
29
+ ]
30
+ },
31
+ {
32
+ "cell_type": "code",
33
+ "execution_count": 1,
34
+ "id": "f9ef370d",
35
+ "metadata": {
36
+ "deletable": false,
37
+ "editable": false,
38
+ "tags": [
39
+ "graded"
40
+ ]
41
+ },
42
+ "outputs": [],
43
+ "source": [
44
+ "import os\n",
45
+ "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # Setting this env variable prevents TF warnings from showing up\n",
46
+ "\n",
47
+ "import numpy as np\n",
48
+ "import tensorflow as tf\n",
49
+ "from collections import Counter\n",
50
+ "from utils import (sentences, train_data, val_data, english_vectorizer, portuguese_vectorizer, \n",
51
+ " masked_loss, masked_acc, tokens_to_text)"
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "code",
56
+ "execution_count": 2,
57
+ "id": "8adb8fd6",
58
+ "metadata": {
59
+ "deletable": false,
60
+ "editable": false,
61
+ "tags": []
62
+ },
63
+ "outputs": [],
64
+ "source": [
65
+ "import w1_unittest"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "markdown",
70
+ "id": "e76be1dc",
71
+ "metadata": {},
72
+ "source": [
73
+ "<a name=\"1\"></a>\n",
74
+ "## 1. Data Preparation\n",
75
+ "\n",
76
+ "The text pre-processing bits have already been taken care of (if you are interested in this be sure to check the `utils.py` file). The steps performed can be summarized as:\n",
77
+ "\n",
78
+ "- Reading the raw data from the text files\n",
79
+ "- Cleaning the data (using lowercase, adding space around punctuation, trimming whitespaces, etc)\n",
80
+ "- Splitting it into training and validation sets\n",
81
+ "- Adding the start-of-sentence and end-of-sentence tokens to every sentence\n",
82
+ "- Tokenizing the sentences\n",
83
+ "- Creating a Tensorflow dataset out of the tokenized sentences\n",
84
+ "\n",
85
+ "Take a moment to inspect the raw sentences:"
86
+ ]
87
+ },
88
+ {
89
+ "cell_type": "code",
90
+ "execution_count": 3,
91
+ "id": "226033a1",
92
+ "metadata": {
93
+ "deletable": false,
94
+ "editable": false,
95
+ "tags": [
96
+ "graded"
97
+ ]
98
+ },
99
+ "outputs": [
100
+ {
101
+ "name": "stdout",
102
+ "output_type": "stream",
103
+ "text": [
104
+ "English (to translate) sentence:\n",
105
+ "\n",
106
+ "No matter how much you try to convince people that chocolate is vanilla, it'll still be chocolate, even though you may manage to convince yourself and a few others that it's vanilla.\n",
107
+ "\n",
108
+ "Portuguese (translation) sentence:\n",
109
+ "\n",
110
+ "Não importa o quanto você tenta convencer os outros de que chocolate é baunilha, ele ainda será chocolate, mesmo que você possa convencer a si mesmo e poucos outros de que é baunilha.\n"
111
+ ]
112
+ }
113
+ ],
114
+ "source": [
115
+ "portuguese_sentences, english_sentences = sentences\n",
116
+ "\n",
117
+ "print(f\"English (to translate) sentence:\\n\\n{english_sentences[-5]}\\n\")\n",
118
+ "print(f\"Portuguese (translation) sentence:\\n\\n{portuguese_sentences[-5]}\")"
119
+ ]
120
+ },
121
+ {
122
+ "cell_type": "markdown",
123
+ "id": "5ba90eb9",
124
+ "metadata": {},
125
+ "source": [
126
+ "You don't have much use for the raw sentences so delete them to save memory:"
127
+ ]
128
+ },
129
+ {
130
+ "cell_type": "code",
131
+ "execution_count": 4,
132
+ "id": "d9f081b0",
133
+ "metadata": {
134
+ "deletable": false,
135
+ "editable": false,
136
+ "tags": [
137
+ "graded"
138
+ ]
139
+ },
140
+ "outputs": [],
141
+ "source": [
142
+ "del portuguese_sentences\n",
143
+ "del english_sentences\n",
144
+ "del sentences"
145
+ ]
146
+ },
147
+ {
148
+ "cell_type": "markdown",
149
+ "id": "a2ff83d2",
150
+ "metadata": {},
151
+ "source": [
152
+ "Notice that you imported an `english_vectorizer` and a `portuguese_vectorizer` from `utils.py`. These were created using [tf.keras.layers.TextVectorization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization) and they provide interesting features such as ways to visualize the vocabulary and convert text into tokenized ids and vice versa. In fact, you can inspect the first ten words of the vocabularies for both languages:"
153
+ ]
154
+ },
155
+ {
156
+ "cell_type": "code",
157
+ "execution_count": 5,
158
+ "id": "2c1cfc17",
159
+ "metadata": {
160
+ "deletable": false,
161
+ "editable": false,
162
+ "tags": [
163
+ "graded"
164
+ ]
165
+ },
166
+ "outputs": [
167
+ {
168
+ "name": "stdout",
169
+ "output_type": "stream",
170
+ "text": [
171
+ "First 10 words of the english vocabulary:\n",
172
+ "\n",
173
+ "['', '[UNK]', '[SOS]', '[EOS]', '.', 'tom', 'i', 'to', 'you', 'the']\n",
174
+ "\n",
175
+ "First 10 words of the portuguese vocabulary:\n",
176
+ "\n",
177
+ "['', '[UNK]', '[SOS]', '[EOS]', '.', 'tom', 'que', 'o', 'nao', 'eu']\n"
178
+ ]
179
+ }
180
+ ],
181
+ "source": [
182
+ "print(f\"First 10 words of the english vocabulary:\\n\\n{english_vectorizer.get_vocabulary()[:10]}\\n\")\n",
183
+ "print(f\"First 10 words of the portuguese vocabulary:\\n\\n{portuguese_vectorizer.get_vocabulary()[:10]}\")"
184
+ ]
185
+ },
186
+ {
187
+ "cell_type": "markdown",
188
+ "id": "3152b075",
189
+ "metadata": {},
190
+ "source": [
191
+ "Notice that the first 4 words are reserved for special words. In order, these are:\n",
192
+ "\n",
193
+ "- the empty string\n",
194
+ "- a special token to represent an unknown word\n",
195
+ "- a special token to represent the start of a sentence\n",
196
+ "- a special token to represent the end of a sentence\n",
197
+ "\n",
198
+ "You can see how many words are in a vocabulary by using the `vocabulary_size` method:"
199
+ ]
200
+ },
201
+ {
202
+ "cell_type": "code",
203
+ "execution_count": 6,
204
+ "id": "5facaa0c",
205
+ "metadata": {
206
+ "deletable": false,
207
+ "editable": false,
208
+ "slideshow": {
209
+ "slide_type": ""
210
+ },
211
+ "tags": [
212
+ "graded"
213
+ ]
214
+ },
215
+ "outputs": [
216
+ {
217
+ "name": "stdout",
218
+ "output_type": "stream",
219
+ "text": [
220
+ "Portuguese vocabulary is made up of 12000 words\n",
221
+ "English vocabulary is made up of 12000 words\n"
222
+ ]
223
+ }
224
+ ],
225
+ "source": [
226
+ "# Size of the vocabulary\n",
227
+ "vocab_size_por = portuguese_vectorizer.vocabulary_size()\n",
228
+ "vocab_size_eng = english_vectorizer.vocabulary_size()\n",
229
+ "\n",
230
+ "print(f\"Portuguese vocabulary is made up of {vocab_size_por} words\")\n",
231
+ "print(f\"English vocabulary is made up of {vocab_size_eng} words\")"
232
+ ]
233
+ },
234
+ {
235
+ "cell_type": "markdown",
236
+ "id": "53e4b615",
237
+ "metadata": {
238
+ "slideshow": {
239
+ "slide_type": ""
240
+ },
241
+ "tags": []
242
+ },
243
+ "source": [
244
+ "You can define [tf.keras.layers.StringLookup](https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup) objects that will help you map from words to ids and vice versa. Do this for the portuguese vocabulary since this will be useful later on when you decode the predictions from your model:"
245
+ ]
246
+ },
247
+ {
248
+ "cell_type": "code",
249
+ "execution_count": 7,
250
+ "id": "218f7a36",
251
+ "metadata": {
252
+ "deletable": false,
253
+ "editable": false,
254
+ "tags": [
255
+ "graded"
256
+ ]
257
+ },
258
+ "outputs": [],
259
+ "source": [
260
+ "# This helps you convert from words to ids\n",
261
+ "word_to_id = tf.keras.layers.StringLookup(\n",
262
+ " vocabulary=portuguese_vectorizer.get_vocabulary(), \n",
263
+ " mask_token=\"\", \n",
264
+ " oov_token=\"[UNK]\"\n",
265
+ ")\n",
266
+ "\n",
267
+ "# This helps you convert from ids to words\n",
268
+ "id_to_word = tf.keras.layers.StringLookup(\n",
269
+ " vocabulary=portuguese_vectorizer.get_vocabulary(),\n",
270
+ " mask_token=\"\",\n",
271
+ " oov_token=\"[UNK]\",\n",
272
+ " invert=True,\n",
273
+ ")"
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "markdown",
278
+ "id": "4af8b623",
279
+ "metadata": {},
280
+ "source": [
281
+ "Try it out for the special tokens and a random word:"
282
+ ]
283
+ },
284
+ {
285
+ "cell_type": "code",
286
+ "execution_count": 8,
287
+ "id": "20076b9a",
288
+ "metadata": {
289
+ "deletable": false,
290
+ "editable": false,
291
+ "tags": [
292
+ "graded"
293
+ ]
294
+ },
295
+ "outputs": [
296
+ {
297
+ "name": "stdout",
298
+ "output_type": "stream",
299
+ "text": [
300
+ "The id for the [UNK] token is 1\n",
301
+ "The id for the [SOS] token is 2\n",
302
+ "The id for the [EOS] token is 3\n",
303
+ "The id for baunilha (vanilla) is 7079\n"
304
+ ]
305
+ }
306
+ ],
307
+ "source": [
308
+ "unk_id = word_to_id(\"[UNK]\")\n",
309
+ "sos_id = word_to_id(\"[SOS]\")\n",
310
+ "eos_id = word_to_id(\"[EOS]\")\n",
311
+ "baunilha_id = word_to_id(\"baunilha\")\n",
312
+ "\n",
313
+ "print(f\"The id for the [UNK] token is {unk_id}\")\n",
314
+ "print(f\"The id for the [SOS] token is {sos_id}\")\n",
315
+ "print(f\"The id for the [EOS] token is {eos_id}\")\n",
316
+ "print(f\"The id for baunilha (vanilla) is {baunilha_id}\")"
317
+ ]
318
+ },
319
+ {
320
+ "cell_type": "markdown",
321
+ "id": "2f1d744c",
322
+ "metadata": {},
323
+ "source": [
324
+ "Finally take a look at how the data that is going to be fed to the neural network looks like. Both `train_data` and `val_data` are of type `tf.data.Dataset` and are already arranged in batches of 64 examples. To get the first batch out of a tf dataset you can use the `take` method. To get the first example out of the batch you can slice the tensor and use the `numpy` method for nicer printing:"
325
+ ]
326
+ },
327
+ {
328
+ "cell_type": "code",
329
+ "execution_count": 9,
330
+ "id": "739777eb",
331
+ "metadata": {
332
+ "deletable": false,
333
+ "editable": false,
334
+ "tags": [
335
+ "graded"
336
+ ]
337
+ },
338
+ "outputs": [
339
+ {
340
+ "name": "stdout",
341
+ "output_type": "stream",
342
+ "text": [
343
+ "Tokenized english sentence:\n",
344
+ "[ 2 210 9 146 123 38 9 1672 4 3 0 0 0 0]\n",
345
+ "\n",
346
+ "\n",
347
+ "Tokenized portuguese sentence (shifted to the right):\n",
348
+ "[ 2 1085 7 128 11 389 37 2038 4 0 0 0 0 0\n",
349
+ " 0]\n",
350
+ "\n",
351
+ "\n",
352
+ "Tokenized portuguese sentence:\n",
353
+ "[1085 7 128 11 389 37 2038 4 3 0 0 0 0 0\n",
354
+ " 0]\n",
355
+ "\n",
356
+ "\n"
357
+ ]
358
+ }
359
+ ],
360
+ "source": [
361
+ "for (to_translate, sr_translation), translation in train_data.take(1):\n",
362
+ " print(f\"Tokenized english sentence:\\n{to_translate[0, :].numpy()}\\n\\n\")\n",
363
+ " print(f\"Tokenized portuguese sentence (shifted to the right):\\n{sr_translation[0, :].numpy()}\\n\\n\")\n",
364
+ " print(f\"Tokenized portuguese sentence:\\n{translation[0, :].numpy()}\\n\\n\")"
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "markdown",
369
+ "id": "bdd9ee3c",
370
+ "metadata": {
371
+ "slideshow": {
372
+ "slide_type": ""
373
+ },
374
+ "tags": []
375
+ },
376
+ "source": [
377
+ "There are a couple of important details to notice.\n",
378
+ "\n",
379
+ "- Padding has already been applied to the tensors and the value used for this is 0\n",
380
+ "- Each example consists of 3 different tensors:\n",
381
+ " - The sentence to translate\n",
382
+ " - The shifted-to-the-right translation\n",
383
+ " - The translation\n",
384
+ " \n",
385
+ "The first two can be considered as the features, while the third one as the target. By doing this your model can perform Teacher Forcing as you saw in the lectures.\n",
386
+ "\n",
387
+ "Now it is time to begin coding!"
388
+ ]
389
+ },
390
+ {
391
+ "cell_type": "markdown",
392
+ "id": "dd41cb52",
393
+ "metadata": {
394
+ "slideshow": {
395
+ "slide_type": ""
396
+ },
397
+ "tags": []
398
+ },
399
+ "source": [
400
+ "<a name=\"2\"></a>\n",
401
+ "## 2. NMT model with attention\n",
402
+ "\n",
403
+ "The model you will build uses an encoder-decoder architecture. This Recurrent Neural Network (RNN) takes in a tokenized version of a sentence in its encoder, then passes it on to the decoder for translation. As mentioned in the lectures, just using a a regular sequence-to-sequence model with LSTMs will work effectively for short to medium sentences but will start to degrade for longer ones. You can picture it like the figure below where all of the context of the input sentence is compressed into one vector that is passed into the decoder block. You can see how this will be an issue for very long sentences (e.g. 100 tokens or more) because the context of the first parts of the input will have very little effect on the final vector passed to the decoder.\n",
404
+ "\n",
405
+ "<img src='images/plain_rnn.png'>\n",
406
+ "\n",
407
+ "Adding an attention layer to this model avoids this problem by giving the decoder access to all parts of the input sentence. To illustrate, let's just use a 4-word input sentence as shown below. Remember that a hidden state is produced at each timestep of the encoder (represented by the orange rectangles). These are all passed to the attention layer and each are given a score given the current activation (i.e. hidden state) of the decoder. For instance, let's consider the figure below where the first prediction \"como\" is already made. To produce the next prediction, the attention layer will first receive all the encoder hidden states (i.e. orange rectangles) as well as the decoder hidden state when producing the word \"como\" (i.e. first green rectangle). Given this information, it will score each of the encoder hidden states to know which one the decoder should focus on to produce the next word. As a result of training, the model might have learned that it should align to the second encoder hidden state and subsequently assigns a high probability to the word \"você\". If we are using greedy decoding, we will output the said word as the next symbol, then restart the process to produce the next word until we reach an end-of-sentence prediction.\n",
408
+ "\n",
409
+ "<img src='images/attention_overview.png'>\n",
410
+ "\n",
411
+ "\n",
412
+ "There are different ways to implement attention and the one we'll use for this assignment is the Scaled Dot Product Attention which has the form:\n",
413
+ "\n",
414
+ "$$Attention(Q, K, V) = softmax(\\frac{QK^T}{\\sqrt{d_k}})V$$\n",
415
+ "\n",
416
+ "You will dive deeper into this equation in the next week but for now, you can think of it as computing scores using queries (Q) and keys (K), followed by a multiplication of values (V) to get a context vector at a particular timestep of the decoder. This context vector is fed to the decoder RNN to get a set of probabilities for the next predicted word. The division by square root of the keys dimensionality ($\\sqrt{d_k}$) is for improving model performance and you'll also learn more about it next week. For our machine translation application, the encoder activations (i.e. encoder hidden states) will be the keys and values, while the decoder activations (i.e. decoder hidden states) will be the queries.\n",
417
+ "\n",
418
+ "You will see in the upcoming sections that this complex architecture and mechanism can be implemented with just a few lines of code. \n",
419
+ "\n",
420
+ "First you will define two important global variables:\n",
421
+ "\n",
422
+ "- The size of the vocabulary\n",
423
+ "- The number of units in the LSTM layers (the same number will be used for all LSTM layers)\n",
424
+ "\n",
425
+ "In this assignment, the vocabulary sizes for English and Portuguese are the same. Therefore, we use a single constant VOCAB_SIZE throughout the notebook. While in other settings, vocabulary sizes could differ, that is not the case in our assignment."
426
+ ]
427
+ },
428
+ {
429
+ "cell_type": "code",
430
+ "execution_count": 10,
431
+ "id": "2e484abf",
432
+ "metadata": {
433
+ "deletable": false,
434
+ "editable": false,
435
+ "slideshow": {
436
+ "slide_type": ""
437
+ },
438
+ "tags": [
439
+ "graded"
440
+ ]
441
+ },
442
+ "outputs": [],
443
+ "source": [
444
+ "VOCAB_SIZE = 12000\n",
445
+ "UNITS = 256"
446
+ ]
447
+ },
448
+ {
449
+ "cell_type": "markdown",
450
+ "id": "cc251965",
451
+ "metadata": {},
452
+ "source": [
453
+ "<a name=\"ex1\"></a>\n",
454
+ "## Exercise 1 - Encoder\n",
455
+ "\n",
456
+ "Your first exercise is to code the encoder part of the neural network. For this, complete the `Encoder` class below. Notice that in the constructor (the `__init__` method) you need to define all of the sublayers of the encoder and then use these sublayers during the forward pass (the `call` method).\n",
457
+ "\n",
458
+ "The encoder consists of the following layers:\n",
459
+ "\n",
460
+ "- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding). For this layer you need to define the appropriate `input_dim` and `output_dim` and let it know that you are using '0' as padding, which can be done by using the appropriate value for the `mask_zero` parameter.\n",
461
+ " \n",
462
+ "+ [Bidirectional](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Bidirectional) [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM). In TF you can implement bidirectional behaviour for RNN-like layers. This part is already taken care of but you will need to specify the appropriate type of layer as well as its parameters. In particular you need to set the appropriate number of units and make sure that the LSTM returns the full sequence and not only the last output, which can be done by using the appropriate value for the `return_sequences` parameter.\n",
463
+ "\n",
464
+ "\n",
465
+ "You need to define the forward pass using the syntax of TF's [functional API](https://www.tensorflow.org/guide/keras/functional_api). What this means is that you chain function calls together to define your network like this:\n",
466
+ "\n",
467
+ "```python\n",
468
+ "encoder_input = keras.Input(shape=(28, 28, 1), name=\"original_img\")\n",
469
+ "x = layers.Conv2D(16, 3, activation=\"relu\")(encoder_input)\n",
470
+ "x = layers.MaxPooling2D(3)(x)\n",
471
+ "x = layers.Conv2D(16, 3, activation=\"relu\")(x)\n",
472
+ "encoder_output = layers.GlobalMaxPooling2D()(x)\n",
473
+ "```"
474
+ ]
475
+ },
476
+ {
477
+ "cell_type": "code",
478
+ "execution_count": 20,
479
+ "id": "b1db0a1d",
480
+ "metadata": {
481
+ "deletable": false,
482
+ "tags": [
483
+ "graded"
484
+ ]
485
+ },
486
+ "outputs": [],
487
+ "source": [
488
+ "# GRADED CLASS: Encoder\n",
489
+ "class Encoder(tf.keras.layers.Layer):\n",
490
+ " def __init__(self, vocab_size, units):\n",
491
+ " \"\"\"Initializes an instance of this class\n",
492
+ "\n",
493
+ " Args:\n",
494
+ " vocab_size (int): Size of the vocabulary\n",
495
+ " units (int): Number of units in the LSTM layer\n",
496
+ " \"\"\"\n",
497
+ " super(Encoder, self).__init__()\n",
498
+ "\n",
499
+ " ### START CODE HERE ###\n",
500
+ "\n",
501
+ " self.embedding = tf.keras.layers.Embedding( \n",
502
+ " input_dim=vocab_size,\n",
503
+ " output_dim=units,\n",
504
+ " mask_zero=True\n",
505
+ " ) \n",
506
+ "\n",
507
+ " self.rnn = tf.keras.layers.Bidirectional( \n",
508
+ " merge_mode=\"sum\", \n",
509
+ " layer=tf.keras.layers.LSTM(\n",
510
+ " units=units,\n",
511
+ " return_sequences=True\n",
512
+ " ), \n",
513
+ " ) \n",
514
+ "\n",
515
+ " ### END CODE HERE ###\n",
516
+ "\n",
517
+ " def call(self, context):\n",
518
+ " \"\"\"Forward pass of this layer\n",
519
+ "\n",
520
+ " Args:\n",
521
+ " context (tf.Tensor): The sentence to translate\n",
522
+ "\n",
523
+ " Returns:\n",
524
+ " tf.Tensor: Encoded sentence to translate\n",
525
+ " \"\"\"\n",
526
+ "\n",
527
+ " ### START CODE HERE ###\n",
528
+ "\n",
529
+ " # Pass the context through the embedding layer\n",
530
+ " x = self.embedding(context)\n",
531
+ "\n",
532
+ " # Pass the output of the embedding through the RNN\n",
533
+ " x = self.rnn(x)\n",
534
+ "\n",
535
+ " ### END CODE HERE ###\n",
536
+ "\n",
537
+ " return x"
538
+ ]
539
+ },
540
+ {
541
+ "cell_type": "code",
542
+ "execution_count": 21,
543
+ "id": "65034ffd",
544
+ "metadata": {
545
+ "deletable": false,
546
+ "editable": false,
547
+ "tags": [
548
+ "graded"
549
+ ]
550
+ },
551
+ "outputs": [
552
+ {
553
+ "name": "stdout",
554
+ "output_type": "stream",
555
+ "text": [
556
+ "Tensor of sentences in english has shape: (64, 14)\n",
557
+ "\n",
558
+ "Encoder output has shape: (64, 14, 256)\n"
559
+ ]
560
+ }
561
+ ],
562
+ "source": [
563
+ "# Do a quick check of your implementation\n",
564
+ "\n",
565
+ "# Create an instance of your class\n",
566
+ "encoder = Encoder(VOCAB_SIZE, UNITS)\n",
567
+ "\n",
568
+ "# Pass a batch of sentences to translate from english to portuguese\n",
569
+ "encoder_output = encoder(to_translate)\n",
570
+ "\n",
571
+ "print(f'Tensor of sentences in english has shape: {to_translate.shape}\\n')\n",
572
+ "print(f'Encoder output has shape: {encoder_output.shape}')"
573
+ ]
574
+ },
575
+ {
576
+ "cell_type": "markdown",
577
+ "id": "a909aea1",
578
+ "metadata": {},
579
+ "source": [
580
+ "##### __Expected Output__\n",
581
+ "\n",
582
+ "```\n",
583
+ "Tensor of sentences in english has shape: (64, 14)\n",
584
+ "\n",
585
+ "Encoder output has shape: (64, 14, 256)\n",
586
+ "```"
587
+ ]
588
+ },
589
+ {
590
+ "cell_type": "code",
591
+ "execution_count": 22,
592
+ "id": "3031bb14",
593
+ "metadata": {
594
+ "deletable": false,
595
+ "editable": false,
596
+ "tags": []
597
+ },
598
+ "outputs": [
599
+ {
600
+ "name": "stdout",
601
+ "output_type": "stream",
602
+ "text": [
603
+ "\u001b[92m All tests passed!\n"
604
+ ]
605
+ }
606
+ ],
607
+ "source": [
608
+ "# Test your code!\n",
609
+ "\n",
610
+ "w1_unittest.test_encoder(Encoder)"
611
+ ]
612
+ },
613
+ {
614
+ "cell_type": "markdown",
615
+ "id": "1afe83f4",
616
+ "metadata": {},
617
+ "source": [
618
+ "<a name=\"ex2\"></a>\n",
619
+ "## Exercise 2 - CrossAttention\n",
620
+ "\n",
621
+ "Your next exercise is to code the layer that will perform cross attention between the original sentences and the translations. For this, complete the `CrossAttention` class below. Notice that in the constructor (the `__init__` method) you need to define all of the sublayers and then use these sublayers during the forward pass (the `call` method). For this particular case some of these bits are already taken care of.\n",
622
+ "\n",
623
+ "The cross attention consists of the following layers:\n",
624
+ "\n",
625
+ "- [MultiHeadAttention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention). For this layer you need to define the appropriate `key_dim`, which is the size of the key and query tensors. You will also need to set the number of heads to 1 since you aren't implementing multi head attention but attention between two tensors. The reason why this layer is preferred over [Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention) is that it allows simpler code during the forward pass.\n",
626
+ " \n",
627
+ "A couple of things to notice:\n",
628
+ "- You need a way to pass both the output of the attention alongside the shifted-to-the-right translation (since this cross attention happens in the decoder side). For this you will use an [Add](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Add) layer so that the original dimension is preserved, which would not happen if you use something like a [Concatenate](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate) layer.\n",
629
+ "\n",
630
+ "+ Layer normalization is also performed for better stability of the network by using a [LayerNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LayerNormalization) layer.\n",
631
+ "\n",
632
+ "- You don't need to worry about these last steps as these are already solved.\n",
633
+ "\n"
634
+ ]
635
+ },
636
+ {
637
+ "cell_type": "code",
638
+ "execution_count": 23,
639
+ "id": "74e71f3d",
640
+ "metadata": {
641
+ "deletable": false,
642
+ "tags": [
643
+ "graded"
644
+ ]
645
+ },
646
+ "outputs": [],
647
+ "source": [
648
+ "# GRADED CLASS: CrossAttention\n",
649
+ "class CrossAttention(tf.keras.layers.Layer):\n",
650
+ " def __init__(self, units):\n",
651
+ " \"\"\"Initializes an instance of this class\n",
652
+ "\n",
653
+ " Args:\n",
654
+ " units (int): Number of units in the LSTM layer\n",
655
+ " \"\"\"\n",
656
+ " super().__init__()\n",
657
+ "\n",
658
+ " ### START CODE HERE ###\n",
659
+ "\n",
660
+ " self.mha = ( \n",
661
+ " tf.keras.layers.MultiHeadAttention(\n",
662
+ " key_dim=units,\n",
663
+ " num_heads=1\n",
664
+ " ) \n",
665
+ " ) \n",
666
+ "\n",
667
+ " ### END CODE HERE ###\n",
668
+ "\n",
669
+ " self.layernorm = tf.keras.layers.LayerNormalization()\n",
670
+ " self.add = tf.keras.layers.Add()\n",
671
+ "\n",
672
+ " def call(self, context, target):\n",
673
+ " \"\"\"Forward pass of this layer\n",
674
+ "\n",
675
+ " Args:\n",
676
+ " context (tf.Tensor): Encoded sentence to translate\n",
677
+ " target (tf.Tensor): The embedded shifted-to-the-right translation\n",
678
+ "\n",
679
+ " Returns:\n",
680
+ " tf.Tensor: Cross attention between context and target\n",
681
+ " \"\"\"\n",
682
+ " ### START CODE HERE ###\n",
683
+ "\n",
684
+ " # Call the MH attention by passing in the query and value\n",
685
+ " # For this case the query should be the translation and the value the encoded sentence to translate\n",
686
+ " # Hint: Check the call arguments of MultiHeadAttention in the docs\n",
687
+ " attn_output =self.mha(\n",
688
+ " query=target,\n",
689
+ " value=context\n",
690
+ " ) \n",
691
+ "\n",
692
+ " ### END CODE HERE ###\n",
693
+ "\n",
694
+ " x = self.add([target, attn_output])\n",
695
+ "\n",
696
+ " x = self.layernorm(x)\n",
697
+ "\n",
698
+ " return x"
699
+ ]
700
+ },
701
+ {
702
+ "cell_type": "code",
703
+ "execution_count": 24,
704
+ "id": "4c62796f",
705
+ "metadata": {
706
+ "deletable": false,
707
+ "editable": false,
708
+ "tags": [
709
+ "graded"
710
+ ]
711
+ },
712
+ "outputs": [
713
+ {
714
+ "name": "stdout",
715
+ "output_type": "stream",
716
+ "text": [
717
+ "Tensor of contexts has shape: (64, 14, 256)\n",
718
+ "Tensor of translations has shape: (64, 15, 256)\n",
719
+ "Tensor of attention scores has shape: (64, 15, 256)\n"
720
+ ]
721
+ }
722
+ ],
723
+ "source": [
724
+ "# Do a quick check of your implementation\n",
725
+ "\n",
726
+ "# Create an instance of your class\n",
727
+ "attention_layer = CrossAttention(UNITS)\n",
728
+ "\n",
729
+ "# The attention layer expects the embedded sr-translation and the context\n",
730
+ "# The context (encoder_output) is already embedded so you need to do this for sr_translation:\n",
731
+ "sr_translation_embed = tf.keras.layers.Embedding(VOCAB_SIZE, output_dim=UNITS, mask_zero=True)(sr_translation)\n",
732
+ "\n",
733
+ "# Compute the cross attention\n",
734
+ "attention_result = attention_layer(encoder_output, sr_translation_embed)\n",
735
+ "\n",
736
+ "print(f'Tensor of contexts has shape: {encoder_output.shape}')\n",
737
+ "print(f'Tensor of translations has shape: {sr_translation_embed.shape}')\n",
738
+ "print(f'Tensor of attention scores has shape: {attention_result.shape}')"
739
+ ]
740
+ },
741
+ {
742
+ "cell_type": "markdown",
743
+ "id": "41d4f99a",
744
+ "metadata": {},
745
+ "source": [
746
+ "##### __Expected Output__\n",
747
+ "\n",
748
+ "```\n",
749
+ "Tensor of contexts has shape: (64, 14, 256)\n",
750
+ "Tensor of translations has shape: (64, 15, 256)\n",
751
+ "Tensor of attention scores has shape: (64, 15, 256)\n",
752
+ "```"
753
+ ]
754
+ },
755
+ {
756
+ "cell_type": "code",
757
+ "execution_count": 25,
758
+ "id": "4f658975",
759
+ "metadata": {
760
+ "deletable": false,
761
+ "editable": false,
762
+ "tags": []
763
+ },
764
+ "outputs": [
765
+ {
766
+ "name": "stdout",
767
+ "output_type": "stream",
768
+ "text": [
769
+ "\u001b[92m All tests passed!\n"
770
+ ]
771
+ }
772
+ ],
773
+ "source": [
774
+ "# Test your code!\n",
775
+ "\n",
776
+ "w1_unittest.test_cross_attention(CrossAttention)"
777
+ ]
778
+ },
779
+ {
780
+ "cell_type": "markdown",
781
+ "id": "aa296ee2",
782
+ "metadata": {},
783
+ "source": [
784
+ "<a name=\"ex3\"></a>\n",
785
+ "## Exercise 3 - Decoder\n",
786
+ "\n",
787
+ "\n",
788
+ "Now you will implement the decoder part of the neural network by completing the `Decoder` class below. Notice that in the constructor (the `__init__` method) you need to define all of the sublayers of the decoder and then use these sublayers during the forward pass (the `call` method).\n",
789
+ "\n",
790
+ "The decoder consists of the following layers:\n",
791
+ "\n",
792
+ "- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding). For this layer you need to define the appropriate `input_dim` and `output_dim` and let it know that you are using '0' as padding, which can be done by using the appropriate value for the `mask_zero` parameter.\n",
793
+ " \n",
794
+ " \n",
795
+ "+ Pre-attention [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM). Unlike in the encoder in which you used a Bidirectional LSTM, here you will use a vanilla LSTM. Don't forget to set the appropriate number of units and make sure that the LSTM returns the full sequence and not only the last output, which can be done by using the appropriate value for the `return_sequences` parameter. It is very important that this layer returns the state since this will be needed for inference so make sure to set the `return_state` parameter accordingly. Notice that LSTM layers return state as a tuple of two tensors called `memory_state` and `carry_state`, **however these names have been changed to better reflect what you have seen in the lectures to `hidden_state` and `cell_state` respectively**.\n",
796
+ "\n",
797
+ "- The attention layer that performs cross attention between the sentence to translate and the right-shifted translation. Here you need to use the `CrossAttention` layer you defined in the previous exercise.\n",
798
+ "\n",
799
+ "+ Post-attention [LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM). Another LSTM layer. For this one you don't need it to return the state.\n",
800
+ "\n",
801
+ "- Finally a [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer. This one should have the same number of units as the size of the vocabulary since you expect it to compute the logits for every possible word in the vocabulary. Make sure to use a `logsoftmax` activation function for this one, which you can get as [tf.nn.log_softmax](https://www.tensorflow.org/api_docs/python/tf/nn/log_softmax).\n",
802
+ "\n"
803
+ ]
804
+ },
805
+ {
806
+ "cell_type": "code",
807
+ "execution_count": 43,
808
+ "id": "e9639bdb",
809
+ "metadata": {
810
+ "deletable": false,
811
+ "tags": [
812
+ "graded"
813
+ ]
814
+ },
815
+ "outputs": [],
816
+ "source": [
817
+ "# GRADED CLASS: Decoder\n",
818
+ "class Decoder(tf.keras.layers.Layer):\n",
819
+ " def __init__(self, vocab_size, units):\n",
820
+ " \"\"\"Initializes an instance of this class\n",
821
+ "\n",
822
+ " Args:\n",
823
+ " vocab_size (int): Size of the vocabulary\n",
824
+ " units (int): Number of units in the LSTM layer\n",
825
+ " \"\"\"\n",
826
+ " super(Decoder, self).__init__()\n",
827
+ "\n",
828
+ " ### START CODE HERE ###\n",
829
+ "\n",
830
+ " # The embedding layer\n",
831
+ " self.embedding = tf.keras.layers.Embedding(\n",
832
+ " input_dim=vocab_size,\n",
833
+ " output_dim=units,\n",
834
+ " mask_zero=True\n",
835
+ " ) \n",
836
+ "\n",
837
+ " # The RNN before attention\n",
838
+ " self.pre_attention_rnn = tf.keras.layers.LSTM(\n",
839
+ " units=units,\n",
840
+ " return_sequences=True,\n",
841
+ " return_state=True\n",
842
+ " ) \n",
843
+ "\n",
844
+ " # The attention layer\n",
845
+ " self.attention = CrossAttention(units)\n",
846
+ "\n",
847
+ " # The RNN after attention\n",
848
+ " self.post_attention_rnn = tf.keras.layers.LSTM(\n",
849
+ " units=units,\n",
850
+ " return_sequences=True\n",
851
+ " ) \n",
852
+ "\n",
853
+ " # The dense layer with logsoftmax activation\n",
854
+ " self.output_layer = tf.keras.layers.Dense(\n",
855
+ " units=vocab_size,\n",
856
+ " activation=tf.nn.log_softmax\n",
857
+ " ) \n",
858
+ "\n",
859
+ " ### END CODE HERE ###\n",
860
+ "\n",
861
+ " def call(self, context, target, state=None, return_state=False):\n",
862
+ " \"\"\"Forward pass of this layer\n",
863
+ "\n",
864
+ " Args:\n",
865
+ " context (tf.Tensor): Encoded sentence to translate\n",
866
+ " target (tf.Tensor): The shifted-to-the-right translation\n",
867
+ " state (list[tf.Tensor, tf.Tensor], optional): Hidden state of the pre-attention LSTM. Defaults to None.\n",
868
+ " return_state (bool, optional): If set to true return the hidden states of the LSTM. Defaults to False.\n",
869
+ "\n",
870
+ " Returns:\n",
871
+ " tf.Tensor: The log_softmax probabilities of predicting a particular token\n",
872
+ " \"\"\"\n",
873
+ " ### START CODE HERE ###\n",
874
+ "\n",
875
+ " # Get the embedding of the input\n",
876
+ " x = self.embedding(target)\n",
877
+ "\n",
878
+ " # Pass the embedded input into the pre attention LSTM\n",
879
+ " # Hints:\n",
880
+ " # - The LSTM you defined earlier should return the output alongside the state (made up of two tensors)\n",
881
+ " # - Pass in the state to the LSTM (needed for inference)\n",
882
+ " x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=state)\n",
883
+ "\n",
884
+ " # Perform cross attention between the context and the output of the LSTM (in that order)\n",
885
+ " x = self.attention(context, x)\n",
886
+ "\n",
887
+ " # Do a pass through the post attention LSTM\n",
888
+ " x = self.post_attention_rnn(x)\n",
889
+ "\n",
890
+ " # Compute the logits\n",
891
+ " logits = self.output_layer(x)\n",
892
+ "\n",
893
+ " ### END CODE HERE ###\n",
894
+ "\n",
895
+ " if return_state:\n",
896
+ " return logits, [hidden_state, cell_state]\n",
897
+ "\n",
898
+ " return logits"
899
+ ]
900
+ },
901
+ {
902
+ "cell_type": "code",
903
+ "execution_count": 44,
904
+ "id": "f6165cf2",
905
+ "metadata": {
906
+ "deletable": false,
907
+ "editable": false,
908
+ "tags": [
909
+ "graded"
910
+ ]
911
+ },
912
+ "outputs": [
913
+ {
914
+ "name": "stdout",
915
+ "output_type": "stream",
916
+ "text": [
917
+ "Tensor of contexts has shape: (64, 14, 256)\n",
918
+ "Tensor of right-shifted translations has shape: (64, 15)\n",
919
+ "Tensor of logits has shape: (64, 15, 12000)\n"
920
+ ]
921
+ }
922
+ ],
923
+ "source": [
924
+ "# Do a quick check of your implementation\n",
925
+ "\n",
926
+ "# Create an instance of your class\n",
927
+ "decoder = Decoder(VOCAB_SIZE, UNITS)\n",
928
+ "\n",
929
+ "# Notice that you don't need the embedded version of sr_translation since this is done inside the class\n",
930
+ "logits = decoder(encoder_output, sr_translation)\n",
931
+ "\n",
932
+ "print(f'Tensor of contexts has shape: {encoder_output.shape}')\n",
933
+ "print(f'Tensor of right-shifted translations has shape: {sr_translation.shape}')\n",
934
+ "print(f'Tensor of logits has shape: {logits.shape}')"
935
+ ]
936
+ },
937
+ {
938
+ "cell_type": "markdown",
939
+ "id": "6f2b5d7d",
940
+ "metadata": {},
941
+ "source": [
942
+ "##### __Expected Output__\n",
943
+ "\n",
944
+ "```\n",
945
+ "Tensor of contexts has shape: (64, 14, 256)\n",
946
+ "Tensor of right-shifted translations has shape: (64, 15)\n",
947
+ "Tensor of logits has shape: (64, 15, 12000)\n",
948
+ "```"
949
+ ]
950
+ },
951
+ {
952
+ "cell_type": "code",
953
+ "execution_count": 45,
954
+ "id": "1b61093a",
955
+ "metadata": {
956
+ "deletable": false,
957
+ "editable": false,
958
+ "tags": []
959
+ },
960
+ "outputs": [
961
+ {
962
+ "name": "stdout",
963
+ "output_type": "stream",
964
+ "text": [
965
+ "\u001b[92m All tests passed!\n"
966
+ ]
967
+ }
968
+ ],
969
+ "source": [
970
+ "# Test your code!\n",
971
+ "\n",
972
+ "w1_unittest.test_decoder(Decoder, CrossAttention)"
973
+ ]
974
+ },
975
+ {
976
+ "cell_type": "markdown",
977
+ "id": "9dcce3a7",
978
+ "metadata": {},
979
+ "source": [
980
+ "<a name=\"ex4\"></a>\n",
981
+ "## Exercise 4 - Translator\n",
982
+ "\n",
983
+ "Now you have to put together all of the layers you previously coded into an actual model. For this, complete the `Translator` class below. Notice how unlike the Encoder and Decoder classes inherited from `tf.keras.layers.Layer`, the Translator class inherits from `tf.keras.Model`.\n",
984
+ "\n",
985
+ "Remember that `train_data` will yield a tuple with the sentence to translate and the shifted-to-the-right translation, which are the \"features\" of the model. This means that the inputs of your network will be tuples containing context and targets."
986
+ ]
987
+ },
988
+ {
989
+ "cell_type": "code",
990
+ "execution_count": 46,
991
+ "id": "205fcf31",
992
+ "metadata": {
993
+ "deletable": false,
994
+ "tags": [
995
+ "graded"
996
+ ]
997
+ },
998
+ "outputs": [],
999
+ "source": [
1000
+ "# GRADED CLASS: Translator\n",
1001
+ "class Translator(tf.keras.Model):\n",
1002
+ " def __init__(self, vocab_size, units):\n",
1003
+ " \"\"\"Initializes an instance of this class\n",
1004
+ "\n",
1005
+ " Args:\n",
1006
+ " vocab_size (int): Size of the vocabulary\n",
1007
+ " units (int): Number of units in the LSTM layer\n",
1008
+ " \"\"\"\n",
1009
+ " super().__init__()\n",
1010
+ "\n",
1011
+ " ### START CODE HERE ###\n",
1012
+ "\n",
1013
+ " # Define the encoder with the appropriate vocab_size and number of units\n",
1014
+ " self.encoder = Encoder(vocab_size,units)\n",
1015
+ "\n",
1016
+ " # Define the decoder with the appropriate vocab_size and number of units\n",
1017
+ " self.decoder = Decoder(vocab_size,units)\n",
1018
+ "\n",
1019
+ " ### END CODE HERE ###\n",
1020
+ "\n",
1021
+ " def call(self, inputs):\n",
1022
+ " \"\"\"Forward pass of this layer\n",
1023
+ "\n",
1024
+ " Args:\n",
1025
+ " inputs (tuple(tf.Tensor, tf.Tensor)): Tuple containing the context (sentence to translate) and the target (shifted-to-the-right translation)\n",
1026
+ "\n",
1027
+ " Returns:\n",
1028
+ " tf.Tensor: The log_softmax probabilities of predicting a particular token\n",
1029
+ " \"\"\"\n",
1030
+ "\n",
1031
+ " ### START CODE HERE ###\n",
1032
+ "\n",
1033
+ " # In this case inputs is a tuple consisting of the context and the target, unpack it into single variables\n",
1034
+ " context, target = inputs\n",
1035
+ "\n",
1036
+ " # Pass the context through the encoder\n",
1037
+ " encoded_context = self.encoder(context)\n",
1038
+ "\n",
1039
+ " # Compute the logits by passing the encoded context and the target to the decoder\n",
1040
+ " logits = self.decoder(target=target,context=encoded_context)\n",
1041
+ "\n",
1042
+ " ### END CODE HERE ###\n",
1043
+ "\n",
1044
+ " return logits"
1045
+ ]
1046
+ },
1047
+ {
1048
+ "cell_type": "code",
1049
+ "execution_count": 47,
1050
+ "id": "4d4a231c",
1051
+ "metadata": {
1052
+ "deletable": false,
1053
+ "editable": false,
1054
+ "tags": [
1055
+ "graded"
1056
+ ]
1057
+ },
1058
+ "outputs": [
1059
+ {
1060
+ "name": "stdout",
1061
+ "output_type": "stream",
1062
+ "text": [
1063
+ "Tensor of sentences to translate has shape: (64, 14)\n",
1064
+ "Tensor of right-shifted translations has shape: (64, 15)\n",
1065
+ "Tensor of logits has shape: (64, 15, 12000)\n"
1066
+ ]
1067
+ }
1068
+ ],
1069
+ "source": [
1070
+ "# Do a quick check of your implementation\n",
1071
+ "\n",
1072
+ "# Create an instance of your class\n",
1073
+ "translator = Translator(VOCAB_SIZE, UNITS)\n",
1074
+ "\n",
1075
+ "# Compute the logits for every word in the vocabulary\n",
1076
+ "logits = translator((to_translate, sr_translation))\n",
1077
+ "\n",
1078
+ "print(f'Tensor of sentences to translate has shape: {to_translate.shape}')\n",
1079
+ "print(f'Tensor of right-shifted translations has shape: {sr_translation.shape}')\n",
1080
+ "print(f'Tensor of logits has shape: {logits.shape}')"
1081
+ ]
1082
+ },
1083
+ {
1084
+ "cell_type": "markdown",
1085
+ "id": "e3a162dd",
1086
+ "metadata": {},
1087
+ "source": [
1088
+ "##### __Expected Output__\n",
1089
+ "\n",
1090
+ "```\n",
1091
+ "Tensor of sentences to translate has shape: (64, 14)\n",
1092
+ "Tensor of right-shifted translations has shape: (64, 15)\n",
1093
+ "Tensor of logits has shape: (64, 15, 12000)\n",
1094
+ "```"
1095
+ ]
1096
+ },
1097
+ {
1098
+ "cell_type": "code",
1099
+ "execution_count": 50,
1100
+ "id": "37009022",
1101
+ "metadata": {
1102
+ "deletable": false,
1103
+ "editable": false,
1104
+ "tags": []
1105
+ },
1106
+ "outputs": [
1107
+ {
1108
+ "name": "stdout",
1109
+ "output_type": "stream",
1110
+ "text": [
1111
+ "\u001b[92m All tests passed!\n"
1112
+ ]
1113
+ }
1114
+ ],
1115
+ "source": [
1116
+ "w1_unittest.test_translator(Translator, Encoder, Decoder)"
1117
+ ]
1118
+ },
1119
+ {
1120
+ "cell_type": "markdown",
1121
+ "id": "f81bc228",
1122
+ "metadata": {},
1123
+ "source": [
1124
+ "<a name=\"3\"></a>\n",
1125
+ "## 3. Training\n",
1126
+ "\n",
1127
+ "Now that you have an untrained instance of the NMT model, it is time to train it. You can use the `compile_and_train` function below to achieve this:"
1128
+ ]
1129
+ },
1130
+ {
1131
+ "cell_type": "code",
1132
+ "execution_count": 79,
1133
+ "id": "8a61ef65",
1134
+ "metadata": {
1135
+ "deletable": false,
1136
+ "editable": false,
1137
+ "tags": [
1138
+ "graded"
1139
+ ]
1140
+ },
1141
+ "outputs": [],
1142
+ "source": [
1143
+ "def compile_and_train(model, epochs=20, steps_per_epoch=500):\n",
1144
+ " model.compile(optimizer=\"adam\", loss=masked_loss, metrics=[masked_acc, masked_loss])\n",
1145
+ "\n",
1146
+ " history = model.fit(\n",
1147
+ " train_data.repeat(),\n",
1148
+ " epochs=epochs,\n",
1149
+ " steps_per_epoch=steps_per_epoch,\n",
1150
+ " validation_data=val_data,\n",
1151
+ " validation_steps=50,\n",
1152
+ " callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)],\n",
1153
+ " )\n",
1154
+ "\n",
1155
+ " return model, history"
1156
+ ]
1157
+ },
1158
+ {
1159
+ "cell_type": "code",
1160
+ "execution_count": 80,
1161
+ "id": "87d9bf9f",
1162
+ "metadata": {
1163
+ "deletable": false,
1164
+ "editable": false,
1165
+ "tags": []
1166
+ },
1167
+ "outputs": [
1168
+ {
1169
+ "name": "stdout",
1170
+ "output_type": "stream",
1171
+ "text": [
1172
+ "Epoch 1/20\n",
1173
+ "500/500 [==============================] - 44s 62ms/step - loss: 0.7642 - masked_acc: 0.8158 - masked_loss: 0.7650 - val_loss: 1.0111 - val_masked_acc: 0.7826 - val_masked_loss: 1.0127\n",
1174
+ "Epoch 2/20\n",
1175
+ "500/500 [==============================] - 17s 33ms/step - loss: 0.7811 - masked_acc: 0.8130 - masked_loss: 0.7816 - val_loss: 0.9974 - val_masked_acc: 0.7832 - val_masked_loss: 0.9979\n",
1176
+ "Epoch 3/20\n",
1177
+ "500/500 [==============================] - 16s 32ms/step - loss: 0.7887 - masked_acc: 0.8119 - masked_loss: 0.7894 - val_loss: 0.9753 - val_masked_acc: 0.7860 - val_masked_loss: 0.9780\n",
1178
+ "Epoch 4/20\n",
1179
+ "500/500 [==============================] - 15s 31ms/step - loss: 0.7835 - masked_acc: 0.8125 - masked_loss: 0.7841 - val_loss: 0.9936 - val_masked_acc: 0.7821 - val_masked_loss: 0.9955\n",
1180
+ "Epoch 5/20\n",
1181
+ "500/500 [==============================] - 16s 32ms/step - loss: 0.7452 - masked_acc: 0.8175 - masked_loss: 0.7461 - val_loss: 0.9643 - val_masked_acc: 0.7896 - val_masked_loss: 0.9638\n",
1182
+ "Epoch 6/20\n",
1183
+ "500/500 [==============================] - 15s 30ms/step - loss: 0.6538 - masked_acc: 0.8322 - masked_loss: 0.6543 - val_loss: 0.9582 - val_masked_acc: 0.7932 - val_masked_loss: 0.9586\n",
1184
+ "Epoch 7/20\n",
1185
+ "500/500 [==============================] - 15s 30ms/step - loss: 0.6596 - masked_acc: 0.8308 - masked_loss: 0.6605 - val_loss: 0.9581 - val_masked_acc: 0.7916 - val_masked_loss: 0.9594\n",
1186
+ "Epoch 8/20\n",
1187
+ "500/500 [==============================] - 15s 30ms/step - loss: 0.6746 - masked_acc: 0.8267 - masked_loss: 0.6754 - val_loss: 0.9448 - val_masked_acc: 0.7925 - val_masked_loss: 0.9467\n",
1188
+ "Epoch 9/20\n",
1189
+ "500/500 [==============================] - 16s 32ms/step - loss: 0.6785 - masked_acc: 0.8266 - masked_loss: 0.6788 - val_loss: 0.9292 - val_masked_acc: 0.7928 - val_masked_loss: 0.9295\n",
1190
+ "Epoch 10/20\n",
1191
+ "500/500 [==============================] - 15s 31ms/step - loss: 0.6287 - masked_acc: 0.8356 - masked_loss: 0.6292 - val_loss: 0.9324 - val_masked_acc: 0.7955 - val_masked_loss: 0.9324\n",
1192
+ "Epoch 11/20\n",
1193
+ "500/500 [==============================] - 15s 30ms/step - loss: 0.5875 - masked_acc: 0.8432 - masked_loss: 0.5880 - val_loss: 0.9407 - val_masked_acc: 0.7978 - val_masked_loss: 0.9401\n",
1194
+ "Epoch 12/20\n",
1195
+ "500/500 [==============================] - 15s 31ms/step - loss: 0.5988 - masked_acc: 0.8398 - masked_loss: 0.5992 - val_loss: 0.9546 - val_masked_acc: 0.7926 - val_masked_loss: 0.9546\n"
1196
+ ]
1197
+ }
1198
+ ],
1199
+ "source": [
1200
+ "# Train the translator (this takes some minutes so feel free to take a break)\n",
1201
+ "\n",
1202
+ "trained_translator, history = compile_and_train(translator)"
1203
+ ]
1204
+ },
1205
+ {
1206
+ "cell_type": "markdown",
1207
+ "id": "d23b9301",
1208
+ "metadata": {},
1209
+ "source": [
1210
+ "<a name=\"4\"></a>\n",
1211
+ "## 4. Using the model for inference \n",
1212
+ "\n",
1213
+ "\n",
1214
+ "Now that your model is trained you can use it for inference. To help you with this the `generate_next_token` function is provided. Notice that this function is meant to be used inside a for-loop, so you feed to it the information of the previous step to generate the information of the next step. In particular you need to keep track of the state of the pre-attention LSTM in the decoder and if you are done with the translation. Also notice that a `temperature` variable is introduced which determines how to select the next token given the predicted logits: "
1215
+ ]
1216
+ },
1217
+ {
1218
+ "cell_type": "code",
1219
+ "execution_count": 82,
1220
+ "id": "522f6b6f",
1221
+ "metadata": {
1222
+ "deletable": false,
1223
+ "editable": false,
1224
+ "tags": [
1225
+ "graded"
1226
+ ]
1227
+ },
1228
+ "outputs": [],
1229
+ "source": [
1230
+ "def generate_next_token(decoder, context, next_token, done, state, temperature=0.0):\n",
1231
+ " \"\"\"Generates the next token in the sequence\n",
1232
+ "\n",
1233
+ " Args:\n",
1234
+ " decoder (Decoder): The decoder\n",
1235
+ " context (tf.Tensor): Encoded sentence to translate\n",
1236
+ " next_token (tf.Tensor): The predicted next token\n",
1237
+ " done (bool): True if the translation is complete\n",
1238
+ " state (list[tf.Tensor, tf.Tensor]): Hidden states of the pre-attention LSTM layer\n",
1239
+ " temperature (float, optional): The temperature that controls the randomness of the predicted tokens. Defaults to 0.0.\n",
1240
+ "\n",
1241
+ " Returns:\n",
1242
+ " tuple(tf.Tensor, np.float, list[tf.Tensor, tf.Tensor], bool): The next token, log prob of said token, hidden state of LSTM and if translation is done\n",
1243
+ " \"\"\"\n",
1244
+ " # Get the logits and state from the decoder\n",
1245
+ " logits, state = decoder(context, next_token, state=state, return_state=True)\n",
1246
+ " \n",
1247
+ " # Trim the intermediate dimension \n",
1248
+ " logits = logits[:, -1, :]\n",
1249
+ " \n",
1250
+ " # If temp is 0 then next_token is the argmax of logits\n",
1251
+ " if temperature == 0.0:\n",
1252
+ " next_token = tf.argmax(logits, axis=-1)\n",
1253
+ " \n",
1254
+ " # If temp is not 0 then next_token is sampled out of logits\n",
1255
+ " else:\n",
1256
+ " logits = logits / temperature\n",
1257
+ " next_token = tf.random.categorical(logits, num_samples=1)\n",
1258
+ " \n",
1259
+ " # Trim dimensions of size 1\n",
1260
+ " logits = tf.squeeze(logits)\n",
1261
+ " next_token = tf.squeeze(next_token)\n",
1262
+ " \n",
1263
+ " # Get the logit of the selected next_token\n",
1264
+ " logit = logits[next_token].numpy()\n",
1265
+ " \n",
1266
+ " # Reshape to (1,1) since this is the expected shape for text encoded as TF tensors\n",
1267
+ " next_token = tf.reshape(next_token, shape=(1,1))\n",
1268
+ " \n",
1269
+ " # If next_token is End-of-Sentence token you are done\n",
1270
+ " if next_token == eos_id:\n",
1271
+ " done = True\n",
1272
+ " \n",
1273
+ " return next_token, logit, state, done"
1274
+ ]
1275
+ },
1276
+ {
1277
+ "cell_type": "markdown",
1278
+ "id": "190d2d76",
1279
+ "metadata": {},
1280
+ "source": [
1281
+ "See how it works by running the following cell:"
1282
+ ]
1283
+ },
1284
+ {
1285
+ "cell_type": "code",
1286
+ "execution_count": 83,
1287
+ "id": "9937547a",
1288
+ "metadata": {
1289
+ "deletable": false,
1290
+ "editable": false,
1291
+ "tags": [
1292
+ "graded"
1293
+ ]
1294
+ },
1295
+ "outputs": [
1296
+ {
1297
+ "name": "stdout",
1298
+ "output_type": "stream",
1299
+ "text": [
1300
+ "Next token: [[11484]]\n",
1301
+ "Logit: -18.7833\n",
1302
+ "Done? False\n"
1303
+ ]
1304
+ }
1305
+ ],
1306
+ "source": [
1307
+ "# PROCESS SENTENCE TO TRANSLATE AND ENCODE\n",
1308
+ "\n",
1309
+ "# A sentence you wish to translate\n",
1310
+ "eng_sentence = \"I love languages\"\n",
1311
+ "\n",
1312
+ "# Convert it to a tensor\n",
1313
+ "texts = tf.convert_to_tensor(eng_sentence)[tf.newaxis]\n",
1314
+ "\n",
1315
+ "# Vectorize it and pass it through the encoder\n",
1316
+ "context = english_vectorizer(texts).to_tensor()\n",
1317
+ "context = encoder(context)\n",
1318
+ "\n",
1319
+ "# SET STATE OF THE DECODER\n",
1320
+ "\n",
1321
+ "# Next token is Start-of-Sentence since you are starting fresh\n",
1322
+ "next_token = tf.fill((1,1), sos_id)\n",
1323
+ "\n",
1324
+ "# Hidden and Cell states of the LSTM can be mocked using uniform samples\n",
1325
+ "state = [tf.random.uniform((1, UNITS)), tf.random.uniform((1, UNITS))]\n",
1326
+ "\n",
1327
+ "# You are not done until next token is EOS token\n",
1328
+ "done = False\n",
1329
+ "\n",
1330
+ "# Generate next token\n",
1331
+ "next_token, logit, state, done = generate_next_token(decoder, context, next_token, done, state, temperature=0.5)\n",
1332
+ "print(f\"Next token: {next_token}\\nLogit: {logit:.4f}\\nDone? {done}\")"
1333
+ ]
1334
+ },
1335
+ {
1336
+ "cell_type": "markdown",
1337
+ "id": "170323dd",
1338
+ "metadata": {},
1339
+ "source": [
1340
+ "<a name=\"ex5\"></a>\n",
1341
+ "## Exercise 5 - translate\n",
1342
+ "\n",
1343
+ "Now you can put everything together to translate a given sentence. For this, complete the `translate` function below. This function will take care of the following steps: \n",
1344
+ "- Process the sentence to translate and encode it\n",
1345
+ "\n",
1346
+ "+ Set the initial state of the decoder\n",
1347
+ "\n",
1348
+ "- Get predictions of the next token (starting with the \\<SOS> token) for a maximum of iterations (in case the \\<EOS> token is never returned)\n",
1349
+ " \n",
1350
+ "+ Return the translated text (as a string), the logit of the last iteration (this helps measure how certain was that the sequence was translated in its totality) and the translation in token format.\n",
1351
+ "\n",
1352
+ "\n",
1353
+ "Hints: \n",
1354
+ "\n",
1355
+ "- The previous cell provides a lot of insights on how this function should work, so if you get stuck refer to it.\n",
1356
+ "\n",
1357
+ "+ Some useful docs:\n",
1358
+ " + [tf.newaxis](https://www.tensorflow.org/api_docs/python/tf#newaxis)\n",
1359
+ "\n",
1360
+ " - [tf.fill](https://www.tensorflow.org/api_docs/python/tf/fill)\n",
1361
+ "\n",
1362
+ " + [tf.zeros](https://www.tensorflow.org/api_docs/python/tf/zeros)\n",
1363
+ "\n",
1364
+ "\n",
1365
+ "**IMPORTANT NOTE**: Due to randomness processes involving tensorflow training and weight initializing, the results below may vary a lot, even if you retrain your model in the same session. \n"
1366
+ ]
1367
+ },
1368
+ {
1369
+ "cell_type": "code",
1370
+ "execution_count": 84,
1371
+ "id": "42c74f1f",
1372
+ "metadata": {
1373
+ "deletable": false,
1374
+ "tags": [
1375
+ "graded"
1376
+ ]
1377
+ },
1378
+ "outputs": [],
1379
+ "source": [
1380
+ "# GRADED FUNCTION: translate\n",
1381
+ "def translate(model, text, max_length=50, temperature=0.0):\n",
1382
+ " \"\"\"Translate a given sentence from English to Portuguese\n",
1383
+ "\n",
1384
+ " Args:\n",
1385
+ " model (tf.keras.Model): The trained translator\n",
1386
+ " text (string): The sentence to translate\n",
1387
+ " max_length (int, optional): The maximum length of the translation. Defaults to 50.\n",
1388
+ " temperature (float, optional): The temperature that controls the randomness of the predicted tokens. Defaults to 0.0.\n",
1389
+ "\n",
1390
+ " Returns:\n",
1391
+ " tuple(str, np.float, tf.Tensor): The translation, logit that predicted <EOS> token and the tokenized translation\n",
1392
+ " \"\"\"\n",
1393
+ " # Lists to save tokens and logits\n",
1394
+ " tokens, logits = [], []\n",
1395
+ "\n",
1396
+ " ### START CODE HERE ###\n",
1397
+ " \n",
1398
+ " # PROCESS THE SENTENCE TO TRANSLATE\n",
1399
+ " \n",
1400
+ " # Convert the original string into a tensor\n",
1401
+ " text = tf.convert_to_tensor(text)[tf.newaxis]\n",
1402
+ " \n",
1403
+ " # Vectorize the text using the correct vectorizer\n",
1404
+ " context = english_vectorizer(text).to_tensor()\n",
1405
+ " \n",
1406
+ " # Get the encoded context (pass the context through the encoder)\n",
1407
+ " # Hint: Remember you can get the encoder by using model.encoder\n",
1408
+ " context = model.encoder(context)\n",
1409
+ " \n",
1410
+ " # INITIAL STATE OF THE DECODER\n",
1411
+ " \n",
1412
+ " # First token should be SOS token with shape (1,1)\n",
1413
+ " next_token = tf.fill((1,1), sos_id)\n",
1414
+ " \n",
1415
+ " # Initial hidden and cell states should be tensors of zeros with shape (1, UNITS)\n",
1416
+ " state = [tf.random.uniform((1, UNITS)), tf.random.uniform((1, UNITS))]\n",
1417
+ " \n",
1418
+ " # You are done when you draw a EOS token as next token (initial state is False)\n",
1419
+ " done = False\n",
1420
+ "\n",
1421
+ " # Iterate for max_length iterations\n",
1422
+ " for _ in range(max_length):\n",
1423
+ " # Generate the next token\n",
1424
+ " try:\n",
1425
+ " next_token, logit, state, done = generate_next_token(\n",
1426
+ " decoder=model.decoder,\n",
1427
+ " context=context,\n",
1428
+ " next_token=next_token,\n",
1429
+ " done=done,\n",
1430
+ " state=state,\n",
1431
+ " temperature=temperature\n",
1432
+ " )\n",
1433
+ " except:\n",
1434
+ " raise Exception(\"Problem generating the next token\")\n",
1435
+ " \n",
1436
+ " # If done then break out of the loop\n",
1437
+ " if done:\n",
1438
+ " break\n",
1439
+ " \n",
1440
+ " # Add next_token to the list of tokens\n",
1441
+ " tokens.append(next_token)\n",
1442
+ " \n",
1443
+ " # Add logit to the list of logits\n",
1444
+ " logits.append(logit)\n",
1445
+ " \n",
1446
+ " ### END CODE HERE ###\n",
1447
+ " \n",
1448
+ " # Concatenate all tokens into a tensor\n",
1449
+ " tokens = tf.concat(tokens, axis=-1)\n",
1450
+ " \n",
1451
+ " # Convert the translated tokens into text\n",
1452
+ " translation = tf.squeeze(tokens_to_text(tokens, id_to_word))\n",
1453
+ " translation = translation.numpy().decode()\n",
1454
+ " \n",
1455
+ " return translation, logits[-1], tokens"
1456
+ ]
1457
+ },
1458
+ {
1459
+ "cell_type": "markdown",
1460
+ "id": "3525e8ba",
1461
+ "metadata": {},
1462
+ "source": [
1463
+ "Try your function with temperature of 0, which will yield a deterministic output and is equivalent to a greedy decoding:"
1464
+ ]
1465
+ },
1466
+ {
1467
+ "cell_type": "code",
1468
+ "execution_count": 85,
1469
+ "id": "daaea8c5",
1470
+ "metadata": {
1471
+ "deletable": false,
1472
+ "editable": false,
1473
+ "tags": []
1474
+ },
1475
+ "outputs": [
1476
+ {
1477
+ "name": "stdout",
1478
+ "output_type": "stream",
1479
+ "text": [
1480
+ "Temperature: 0.0\n",
1481
+ "\n",
1482
+ "Original sentence: I love languages\n",
1483
+ "Translation: eu adoro idiomas .\n",
1484
+ "Translation tokens:[[ 9 564 850 4]]\n",
1485
+ "Logit: -0.074\n"
1486
+ ]
1487
+ }
1488
+ ],
1489
+ "source": [
1490
+ "# Running this cell multiple times should return the same output since temp is 0\n",
1491
+ "\n",
1492
+ "temp = 0.0 \n",
1493
+ "original_sentence = \"I love languages\"\n",
1494
+ "\n",
1495
+ "translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)\n",
1496
+ "\n",
1497
+ "print(f\"Temperature: {temp}\\n\\nOriginal sentence: {original_sentence}\\nTranslation: {translation}\\nTranslation tokens:{tokens}\\nLogit: {logit:.3f}\")"
1498
+ ]
1499
+ },
1500
+ {
1501
+ "cell_type": "markdown",
1502
+ "id": "7d05129b",
1503
+ "metadata": {},
1504
+ "source": [
1505
+ "Try your function with temperature of 0.7 (stochastic output):"
1506
+ ]
1507
+ },
1508
+ {
1509
+ "cell_type": "code",
1510
+ "execution_count": 86,
1511
+ "id": "0e0697db",
1512
+ "metadata": {
1513
+ "deletable": false,
1514
+ "editable": false,
1515
+ "tags": []
1516
+ },
1517
+ "outputs": [
1518
+ {
1519
+ "name": "stdout",
1520
+ "output_type": "stream",
1521
+ "text": [
1522
+ "Temperature: 0.7\n",
1523
+ "\n",
1524
+ "Original sentence: I love languages\n",
1525
+ "Translation: eu adoro idiomas .\n",
1526
+ "Translation tokens:[[ 9 564 850 4]]\n",
1527
+ "Logit: -0.093\n"
1528
+ ]
1529
+ }
1530
+ ],
1531
+ "source": [
1532
+ "# Running this cell multiple times should return different outputs since temp is not 0\n",
1533
+ "# You can try different temperatures\n",
1534
+ "\n",
1535
+ "temp = 0.7\n",
1536
+ "original_sentence = \"I love languages\"\n",
1537
+ "\n",
1538
+ "translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)\n",
1539
+ "\n",
1540
+ "print(f\"Temperature: {temp}\\n\\nOriginal sentence: {original_sentence}\\nTranslation: {translation}\\nTranslation tokens:{tokens}\\nLogit: {logit:.3f}\")"
1541
+ ]
1542
+ },
1543
+ {
1544
+ "cell_type": "code",
1545
+ "execution_count": 87,
1546
+ "id": "a3a9ea35",
1547
+ "metadata": {
1548
+ "deletable": false,
1549
+ "editable": false,
1550
+ "tags": []
1551
+ },
1552
+ "outputs": [
1553
+ {
1554
+ "name": "stdout",
1555
+ "output_type": "stream",
1556
+ "text": [
1557
+ "\u001b[91mFailed test case: translate didn't return the same translation when using temperature of 0.0.\n",
1558
+ "Expected: o meu nome e [UNK] a [UNK] .\n",
1559
+ "Got: , meu nome e [UNK] a [UNK] .\n",
1560
+ "\n",
1561
+ "\u001b[91mFailed test case: translate didn't return the same logit when using temperature of 0.0.\n",
1562
+ "Expected: -0.5501561164855957\n",
1563
+ "Got: -0.6304512619972229\n",
1564
+ "\n",
1565
+ "\u001b[91mFailed test case: translate didn't return the same tokens when using temperature of 0.0.\n",
1566
+ "Expected: [[ 7 43 175 13 1 12 1 4]]\n",
1567
+ "Got: [[ 19 43 175 13 1 12 1 4]]\n",
1568
+ "\n",
1569
+ "\n"
1570
+ ]
1571
+ }
1572
+ ],
1573
+ "source": [
1574
+ "w1_unittest.test_translate(translate, trained_translator)"
1575
+ ]
1576
+ },
1577
+ {
1578
+ "cell_type": "markdown",
1579
+ "id": "ba027524",
1580
+ "metadata": {},
1581
+ "source": [
1582
+ "<a name=\"5\"></a>\n",
1583
+ "## 5. Minimum Bayes-Risk Decoding\n",
1584
+ "\n",
1585
+ "As mentioned in the lectures, getting the most probable token at each step may not necessarily produce the best results. Another approach is to do Minimum Bayes Risk Decoding or MBR. The general steps to implement this are:\n",
1586
+ "\n",
1587
+ "- Take several random samples\n",
1588
+ "+ Score each sample against all other samples\n",
1589
+ "- Select the one with the highest score\n",
1590
+ "\n",
1591
+ "You will be building helper functions for these steps in the following sections.\n",
1592
+ "\n",
1593
+ "With the ability to generate different translations by setting different temperature values you can do what you saw in the lectures and generate a bunch of translations and then determine which one is the best candidate. You will now do this by using the provided `generate_samples` function. This function will return any desired number of candidate translations alongside the log-probability for each one:"
1594
+ ]
1595
+ },
1596
+ {
1597
+ "cell_type": "code",
1598
+ "execution_count": 88,
1599
+ "id": "62301cd5",
1600
+ "metadata": {
1601
+ "deletable": false,
1602
+ "editable": false,
1603
+ "tags": [
1604
+ "graded"
1605
+ ]
1606
+ },
1607
+ "outputs": [],
1608
+ "source": [
1609
+ "def generate_samples(model, text, n_samples=4, temperature=0.6):\n",
1610
+ " \n",
1611
+ " samples, log_probs = [], []\n",
1612
+ "\n",
1613
+ " # Iterate for n_samples iterations\n",
1614
+ " for _ in range(n_samples):\n",
1615
+ " \n",
1616
+ " # Save the logit and the translated tensor\n",
1617
+ " _, logp, sample = translate(model, text, temperature=temperature)\n",
1618
+ " \n",
1619
+ " # Save the translated tensors\n",
1620
+ " samples.append(np.squeeze(sample.numpy()).tolist())\n",
1621
+ " \n",
1622
+ " # Save the logits\n",
1623
+ " log_probs.append(logp)\n",
1624
+ " \n",
1625
+ " return samples, log_probs"
1626
+ ]
1627
+ },
1628
+ {
1629
+ "cell_type": "code",
1630
+ "execution_count": 89,
1631
+ "id": "06bd792c",
1632
+ "metadata": {
1633
+ "deletable": false,
1634
+ "editable": false,
1635
+ "tags": []
1636
+ },
1637
+ "outputs": [
1638
+ {
1639
+ "name": "stdout",
1640
+ "output_type": "stream",
1641
+ "text": [
1642
+ "Translated tensor: [9, 81, 850, 4] has logit: -0.080\n",
1643
+ "Translated tensor: 4 has logit: -0.677\n",
1644
+ "Translated tensor: [9, 98, 11, 850, 4] has logit: -0.063\n",
1645
+ "Translated tensor: [9, 564, 850, 4] has logit: -0.110\n"
1646
+ ]
1647
+ }
1648
+ ],
1649
+ "source": [
1650
+ "samples, log_probs = generate_samples(trained_translator, 'I love languages')\n",
1651
+ "\n",
1652
+ "for s, l in zip(samples, log_probs):\n",
1653
+ " print(f\"Translated tensor: {s} has logit: {l:.3f}\")"
1654
+ ]
1655
+ },
1656
+ {
1657
+ "cell_type": "markdown",
1658
+ "id": "29b10677",
1659
+ "metadata": {},
1660
+ "source": [
1661
+ "## Comparing overlaps\n",
1662
+ "\n",
1663
+ "Now that you can generate multiple translations it is time to come up with a method to measure the goodness of each one. As you saw in the lectures, one way to achieve this is by comparing each sample against the others. \n",
1664
+ "\n",
1665
+ "There are several metrics you can use for this purpose, as shown in the lectures and you can try experimenting with any one of these. For this assignment, you will be calculating scores for **unigram overlaps**. \n",
1666
+ "\n",
1667
+ "One of these metrics is the widely used yet simple [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) which gets the intersection over union of two sets. The `jaccard_similarity` function returns this metric for any pair of candidate and reference translations:\n"
1668
+ ]
1669
+ },
1670
+ {
1671
+ "cell_type": "code",
1672
+ "execution_count": 90,
1673
+ "id": "edb54a71",
1674
+ "metadata": {
1675
+ "deletable": false,
1676
+ "editable": false,
1677
+ "tags": [
1678
+ "graded"
1679
+ ]
1680
+ },
1681
+ "outputs": [],
1682
+ "source": [
1683
+ "def jaccard_similarity(candidate, reference):\n",
1684
+ " \n",
1685
+ " # Convert the lists to sets to get the unique tokens\n",
1686
+ " candidate_set = set(candidate)\n",
1687
+ " reference_set = set(reference)\n",
1688
+ " \n",
1689
+ " # Get the set of tokens common to both candidate and reference\n",
1690
+ " common_tokens = candidate_set.intersection(reference_set)\n",
1691
+ " \n",
1692
+ " # Get the set of all tokens found in either candidate or reference\n",
1693
+ " all_tokens = candidate_set.union(reference_set)\n",
1694
+ " \n",
1695
+ " # Compute the percentage of overlap (divide the number of common tokens by the number of all tokens)\n",
1696
+ " overlap = len(common_tokens) / len(all_tokens)\n",
1697
+ " \n",
1698
+ " return overlap"
1699
+ ]
1700
+ },
1701
+ {
1702
+ "cell_type": "code",
1703
+ "execution_count": 91,
1704
+ "id": "fc3384bf",
1705
+ "metadata": {
1706
+ "deletable": false,
1707
+ "editable": false,
1708
+ "tags": [
1709
+ "graded"
1710
+ ]
1711
+ },
1712
+ "outputs": [
1713
+ {
1714
+ "name": "stdout",
1715
+ "output_type": "stream",
1716
+ "text": [
1717
+ "jaccard similarity between lists: [1, 2, 3] and [1, 2, 3, 4] is 0.750\n"
1718
+ ]
1719
+ }
1720
+ ],
1721
+ "source": [
1722
+ "l1 = [1, 2, 3]\n",
1723
+ "l2 = [1, 2, 3, 4]\n",
1724
+ "\n",
1725
+ "js = jaccard_similarity(l1, l2)\n",
1726
+ "\n",
1727
+ "print(f\"jaccard similarity between lists: {l1} and {l2} is {js:.3f}\")"
1728
+ ]
1729
+ },
1730
+ {
1731
+ "cell_type": "markdown",
1732
+ "id": "a6997662",
1733
+ "metadata": {},
1734
+ "source": [
1735
+ "##### __Expected Output__\n",
1736
+ "\n",
1737
+ "```\n",
1738
+ "jaccard similarity between tensors: [1, 2, 3] and [1, 2, 3, 4] is 0.750\n",
1739
+ "\n",
1740
+ "```"
1741
+ ]
1742
+ },
1743
+ {
1744
+ "cell_type": "markdown",
1745
+ "id": "b2510e3d",
1746
+ "metadata": {},
1747
+ "source": [
1748
+ "<a name=\"ex6\"></a>\n",
1749
+ "## Exercise 6 - rouge1_similarity\n",
1750
+ "\n",
1751
+ "Jaccard similarity is good but a more commonly used metric in machine translation is the ROUGE score. For unigrams, this is called ROUGE-1 and as shown in the lectures, you can output the scores for both precision and recall when comparing two samples. To get the final score, you will want to compute the F1-score as given by:\n",
1752
+ "\n",
1753
+ "$$score = 2* \\frac{(precision * recall)}{(precision + recall)}$$\n",
1754
+ "\n",
1755
+ "For the implementation of the `rouge1_similarity` function you want to use the [Counter](https://docs.python.org/3/library/collections.html#collections.Counter) class from the Python standard library:"
1756
+ ]
1757
+ },
1758
+ {
1759
+ "cell_type": "code",
1760
+ "execution_count": 92,
1761
+ "id": "fb2e0a00",
1762
+ "metadata": {
1763
+ "deletable": false,
1764
+ "tags": [
1765
+ "graded"
1766
+ ]
1767
+ },
1768
+ "outputs": [],
1769
+ "source": [
1770
+ "# GRADED FUNCTION: rouge1_similarity\n",
1771
+ "def rouge1_similarity(candidate, reference):\n",
1772
+ " \"\"\"Computes the ROUGE 1 score between two token lists\n",
1773
+ "\n",
1774
+ " Args:\n",
1775
+ " candidate (list[int]): Tokenized candidate translation\n",
1776
+ " reference (list[int]): Tokenized reference translation\n",
1777
+ "\n",
1778
+ " Returns:\n",
1779
+ " float: Overlap between the two token lists\n",
1780
+ " \"\"\"\n",
1781
+ " ### START CODE HERE ###\n",
1782
+ " \n",
1783
+ " # Make a frequency table of the candidate and reference tokens\n",
1784
+ " # Hint: use the Counter class (already imported)\n",
1785
+ " candidate_word_counts = Counter(candidate)\n",
1786
+ " reference_word_counts = Counter(reference)\n",
1787
+ " \n",
1788
+ " # Initialize overlap at 0\n",
1789
+ " overlap = 0\n",
1790
+ " \n",
1791
+ " # Iterate over the tokens in the candidate frequency table\n",
1792
+ " # Hint: Counter is a subclass of dict and you can get the keys \n",
1793
+ " # out of a dict using the keys method like this: dict.keys()\n",
1794
+ " for token in candidate_word_counts.keys():\n",
1795
+ " \n",
1796
+ " # Get the count of the current token in the candidate frequency table\n",
1797
+ " # Hint: You can access the counts of a token as you would access values of a dictionary\n",
1798
+ " token_count_candidate = candidate_word_counts[token]\n",
1799
+ " \n",
1800
+ " # Get the count of the current token in the reference frequency table\n",
1801
+ " # Hint: You can access the counts of a token as you would access values of a dictionary\n",
1802
+ " token_count_reference = reference_word_counts.get(token, 0)\n",
1803
+ " \n",
1804
+ " # Update the overlap by getting the minimum between the two token counts above\n",
1805
+ " overlap += min(token_count_candidate, token_count_reference)\n",
1806
+ " \n",
1807
+ " # Compute the precision\n",
1808
+ " # Hint: precision = overlap / (number of tokens in candidate list) \n",
1809
+ " precision = overlap / len(candidate) if len(candidate) > 0 else 0\n",
1810
+ " \n",
1811
+ " # Compute the recall\n",
1812
+ " # Hint: recall = overlap / (number of tokens in reference list) \n",
1813
+ " recall = overlap / len(reference) if len(reference) > 0 else 0\n",
1814
+ " \n",
1815
+ " if precision + recall != 0:\n",
1816
+ " # Compute the Rouge1 Score\n",
1817
+ " # Hint: This is equivalent to the F1 score\n",
1818
+ " f1_score = 2 * (precision * recall) / (precision + recall)\n",
1819
+ " \n",
1820
+ " return f1_score\n",
1821
+ " \n",
1822
+ " ### END CODE HERE ###\n",
1823
+ " \n",
1824
+ " return 0 # If precision + recall = 0 then return 0"
1825
+ ]
1826
+ },
1827
+ {
1828
+ "cell_type": "code",
1829
+ "execution_count": 93,
1830
+ "id": "14bb5295",
1831
+ "metadata": {
1832
+ "deletable": false,
1833
+ "editable": false,
1834
+ "tags": [
1835
+ "graded"
1836
+ ]
1837
+ },
1838
+ "outputs": [
1839
+ {
1840
+ "name": "stdout",
1841
+ "output_type": "stream",
1842
+ "text": [
1843
+ "rouge 1 similarity between lists: [1, 2, 3] and [1, 2, 3, 4] is 0.857\n"
1844
+ ]
1845
+ }
1846
+ ],
1847
+ "source": [
1848
+ "l1 = [1, 2, 3]\n",
1849
+ "l2 = [1, 2, 3, 4]\n",
1850
+ "\n",
1851
+ "r1s = rouge1_similarity(l1, l2)\n",
1852
+ "\n",
1853
+ "print(f\"rouge 1 similarity between lists: {l1} and {l2} is {r1s:.3f}\")"
1854
+ ]
1855
+ },
1856
+ {
1857
+ "cell_type": "markdown",
1858
+ "id": "afb8c61a",
1859
+ "metadata": {},
1860
+ "source": [
1861
+ "##### __Expected Output__\n",
1862
+ "\n",
1863
+ "```\n",
1864
+ "rouge 1 similarity between lists: [1, 2, 3] and [1, 2, 3, 4] is 0.857\n",
1865
+ "\n",
1866
+ "```"
1867
+ ]
1868
+ },
1869
+ {
1870
+ "cell_type": "code",
1871
+ "execution_count": 94,
1872
+ "id": "a680132e",
1873
+ "metadata": {
1874
+ "deletable": false,
1875
+ "editable": false,
1876
+ "tags": []
1877
+ },
1878
+ "outputs": [
1879
+ {
1880
+ "name": "stdout",
1881
+ "output_type": "stream",
1882
+ "text": [
1883
+ "\u001b[92m All tests passed!\n"
1884
+ ]
1885
+ }
1886
+ ],
1887
+ "source": [
1888
+ "w1_unittest.test_rouge1_similarity(rouge1_similarity)"
1889
+ ]
1890
+ },
1891
+ {
1892
+ "cell_type": "markdown",
1893
+ "id": "aaf8a058",
1894
+ "metadata": {},
1895
+ "source": [
1896
+ "## Computing the Overall Score\n",
1897
+ "\n",
1898
+ "\n",
1899
+ "You will now build a function to generate the overall score for a particular sample. As mentioned in the lectures, you need to compare each sample with all other samples. For instance, if we generated 30 sentences, we will need to compare sentence 1 to sentences 2 through 30. Then, we compare sentence 2 to sentences 1 and 3 through 30, and so forth. At each step, we get the average score of all comparisons to get the overall score for a particular sample. To illustrate, these will be the steps to generate the scores of a 4-sample list.\n",
1900
+ "\n",
1901
+ "- Get similarity score between sample 1 and sample 2\n",
1902
+ "+ Get similarity score between sample 1 and sample 3\n",
1903
+ "- Get similarity score between sample 1 and sample 4\n",
1904
+ "+ Get average score of the first 3 steps. This will be the overall score of sample 1\n",
1905
+ "- Iterate and repeat until samples 1 to 4 have overall scores.\n",
1906
+ "\n",
1907
+ "\n",
1908
+ "The results will be stored in a dictionary for easy lookups.\n",
1909
+ "\n",
1910
+ "<a name=\"ex7\"></a>\n",
1911
+ "## Exercise 7 - average_overlap\n",
1912
+ "\n",
1913
+ "Complete the `average_overlap` function below which should implement the process described above:"
1914
+ ]
1915
+ },
1916
+ {
1917
+ "cell_type": "code",
1918
+ "execution_count": 95,
1919
+ "id": "142264ff",
1920
+ "metadata": {
1921
+ "deletable": false,
1922
+ "tags": [
1923
+ "graded"
1924
+ ]
1925
+ },
1926
+ "outputs": [],
1927
+ "source": [
1928
+ "# GRADED FUNCTION: average_overlap\n",
1929
+ "def average_overlap(samples, similarity_fn):\n",
1930
+ " \"\"\"Computes the arithmetic mean of each candidate sentence in the samples\n",
1931
+ "\n",
1932
+ " Args:\n",
1933
+ " samples (list[list[int]]): Tokenized version of translated sentences\n",
1934
+ " similarity_fn (Function): Similarity function used to compute the overlap\n",
1935
+ "\n",
1936
+ " Returns:\n",
1937
+ " dict[int, float]: A dictionary mapping the index of each translation to its score\n",
1938
+ " \"\"\"\n",
1939
+ " # Initialize dictionary\n",
1940
+ " scores = {}\n",
1941
+ " \n",
1942
+ " # Iterate through all samples (enumerate helps keep track of indexes)\n",
1943
+ " for index_candidate, candidate in enumerate(samples): \n",
1944
+ " \n",
1945
+ " ### START CODE HERE ###\n",
1946
+ " \n",
1947
+ " # Initially overlap is zero\n",
1948
+ " overlap = 0.0\n",
1949
+ " \n",
1950
+ " # Iterate through all samples (enumerate helps keep track of indexes)\n",
1951
+ " for index_sample, sample in enumerate(samples):\n",
1952
+ "\n",
1953
+ " # Skip if the candidate index is the same as the sample index\n",
1954
+ " if index_candidate == index_sample:\n",
1955
+ " continue\n",
1956
+ " \n",
1957
+ " # Get the overlap between candidate and sample using the similarity function\n",
1958
+ " sample_overlap = similarity_fn(candidate, sample)\n",
1959
+ " \n",
1960
+ " # Add the sample overlap to the total overlap\n",
1961
+ " overlap += sample_overlap\n",
1962
+ "\n",
1963
+ " ### END CODE HERE ###\n",
1964
+ " \n",
1965
+ " # Get the score for the candidate by computing the average\n",
1966
+ " score = overlap / (len(samples) - 1)\n",
1967
+ "\n",
1968
+ " # Only use 3 decimal points\n",
1969
+ " score = round(score, 3)\n",
1970
+ " \n",
1971
+ " # Save the score in the dictionary. use index as the key.\n",
1972
+ " scores[index_candidate] = score\n",
1973
+ " \n",
1974
+ " return scores"
1975
+ ]
1976
+ },
1977
+ {
1978
+ "cell_type": "code",
1979
+ "execution_count": 96,
1980
+ "id": "f36cf403",
1981
+ "metadata": {
1982
+ "deletable": false,
1983
+ "editable": false,
1984
+ "tags": [
1985
+ "graded"
1986
+ ]
1987
+ },
1988
+ "outputs": [
1989
+ {
1990
+ "name": "stdout",
1991
+ "output_type": "stream",
1992
+ "text": [
1993
+ "average overlap between lists: [1, 2, 3], [1, 2, 4] and [1, 2, 4, 5] using Jaccard similarity is:\n",
1994
+ "\n",
1995
+ "{0: 0.45, 1: 0.625, 2: 0.575}\n"
1996
+ ]
1997
+ }
1998
+ ],
1999
+ "source": [
2000
+ "# Test with Jaccard similarity\n",
2001
+ "\n",
2002
+ "l1 = [1, 2, 3]\n",
2003
+ "l2 = [1, 2, 4]\n",
2004
+ "l3 = [1, 2, 4, 5]\n",
2005
+ "\n",
2006
+ "avg_ovlp = average_overlap([l1, l2, l3], jaccard_similarity)\n",
2007
+ "\n",
2008
+ "print(f\"average overlap between lists: {l1}, {l2} and {l3} using Jaccard similarity is:\\n\\n{avg_ovlp}\")"
2009
+ ]
2010
+ },
2011
+ {
2012
+ "cell_type": "markdown",
2013
+ "id": "e277aed2-a5c9-4ed0-9ee2-614939f2df7b",
2014
+ "metadata": {},
2015
+ "source": [
2016
+ "##### __Expected Output__\n",
2017
+ "\n",
2018
+ "```\n",
2019
+ "average overlap between lists: [1, 2, 3], [1, 2, 4] and [1, 2, 4, 5] using Jaccard similarity is:\n",
2020
+ "\n",
2021
+ "{0: 0.45, 1: 0.625, 2: 0.575}\n",
2022
+ "```"
2023
+ ]
2024
+ },
2025
+ {
2026
+ "cell_type": "code",
2027
+ "execution_count": 97,
2028
+ "id": "d961a304-7c03-4ecb-ba5f-c8747ed3ec39",
2029
+ "metadata": {
2030
+ "deletable": false,
2031
+ "editable": false,
2032
+ "tags": [
2033
+ "graded"
2034
+ ]
2035
+ },
2036
+ "outputs": [
2037
+ {
2038
+ "name": "stdout",
2039
+ "output_type": "stream",
2040
+ "text": [
2041
+ "average overlap between lists: [1, 2, 3], [1, 4], [1, 2, 4, 5] and [5, 6] using Rouge1 similarity is:\n",
2042
+ "\n",
2043
+ "{0: 0.324, 1: 0.356, 2: 0.524, 3: 0.111}\n"
2044
+ ]
2045
+ }
2046
+ ],
2047
+ "source": [
2048
+ "# Test with Rouge1 similarity\n",
2049
+ "\n",
2050
+ "l1 = [1, 2, 3]\n",
2051
+ "l2 = [1, 4]\n",
2052
+ "l3 = [1, 2, 4, 5]\n",
2053
+ "l4 = [5,6]\n",
2054
+ "\n",
2055
+ "avg_ovlp = average_overlap([l1, l2, l3, l4], rouge1_similarity)\n",
2056
+ "\n",
2057
+ "print(f\"average overlap between lists: {l1}, {l2}, {l3} and {l4} using Rouge1 similarity is:\\n\\n{avg_ovlp}\")"
2058
+ ]
2059
+ },
2060
+ {
2061
+ "cell_type": "markdown",
2062
+ "id": "30adc749-ffcb-4e82-a8f0-c04a7e39da0a",
2063
+ "metadata": {},
2064
+ "source": [
2065
+ "##### __Expected Output__\n",
2066
+ "\n",
2067
+ "```\n",
2068
+ "average overlap between lists: [1, 2, 3], [1, 4], [1, 2, 4, 5] and [5, 6] using Rouge1 similarity is:\n",
2069
+ "\n",
2070
+ "{0: 0.324, 1: 0.356, 2: 0.524, 3: 0.111}\n",
2071
+ "```"
2072
+ ]
2073
+ },
2074
+ {
2075
+ "cell_type": "code",
2076
+ "execution_count": 98,
2077
+ "id": "c41b1fba-fd0f-41e6-9b07-746f64030fe3",
2078
+ "metadata": {
2079
+ "deletable": false,
2080
+ "editable": false,
2081
+ "tags": []
2082
+ },
2083
+ "outputs": [
2084
+ {
2085
+ "name": "stdout",
2086
+ "output_type": "stream",
2087
+ "text": [
2088
+ "\u001b[92m All tests passed!\n"
2089
+ ]
2090
+ }
2091
+ ],
2092
+ "source": [
2093
+ "w1_unittest.test_average_overlap(average_overlap)"
2094
+ ]
2095
+ },
2096
+ {
2097
+ "cell_type": "markdown",
2098
+ "id": "e4482249",
2099
+ "metadata": {},
2100
+ "source": [
2101
+ "In practice, it is also common to see the weighted mean being used to calculate the overall score instead of just the arithmetic mean. This is implemented in the `weighted_avg_overlap` function below and you can use it in your experiments to see which one will give better results:"
2102
+ ]
2103
+ },
2104
+ {
2105
+ "cell_type": "code",
2106
+ "execution_count": 99,
2107
+ "id": "398714be",
2108
+ "metadata": {
2109
+ "deletable": false,
2110
+ "editable": false,
2111
+ "tags": [
2112
+ "graded"
2113
+ ]
2114
+ },
2115
+ "outputs": [],
2116
+ "source": [
2117
+ "def weighted_avg_overlap(samples, log_probs, similarity_fn):\n",
2118
+ " \n",
2119
+ " # Scores dictionary\n",
2120
+ " scores = {}\n",
2121
+ " \n",
2122
+ " # Iterate over the samples\n",
2123
+ " for index_candidate, candidate in enumerate(samples): \n",
2124
+ " \n",
2125
+ " # Initialize overlap and weighted sum\n",
2126
+ " overlap, weight_sum = 0.0, 0.0\n",
2127
+ " \n",
2128
+ " # Iterate over all samples and log probabilities\n",
2129
+ " for index_sample, (sample, logp) in enumerate(zip(samples, log_probs)):\n",
2130
+ "\n",
2131
+ " # Skip if the candidate index is the same as the sample index \n",
2132
+ " if index_candidate == index_sample:\n",
2133
+ " continue\n",
2134
+ " \n",
2135
+ " # Convert log probability to linear scale\n",
2136
+ " sample_p = float(np.exp(logp))\n",
2137
+ "\n",
2138
+ " # Update the weighted sum\n",
2139
+ " weight_sum += sample_p\n",
2140
+ "\n",
2141
+ " # Get the unigram overlap between candidate and sample\n",
2142
+ " sample_overlap = similarity_fn(candidate, sample)\n",
2143
+ " \n",
2144
+ " # Update the overlap\n",
2145
+ " overlap += sample_p * sample_overlap\n",
2146
+ " \n",
2147
+ " # Compute the score for the candidate\n",
2148
+ " score = overlap / weight_sum\n",
2149
+ "\n",
2150
+ " # Only use 3 decimal points\n",
2151
+ " score = round(score, 3)\n",
2152
+ " \n",
2153
+ " # Save the score in the dictionary. use index as the key.\n",
2154
+ " scores[index_candidate] = score\n",
2155
+ " \n",
2156
+ " return scores"
2157
+ ]
2158
+ },
2159
+ {
2160
+ "cell_type": "code",
2161
+ "execution_count": 100,
2162
+ "id": "e3dfd6d3",
2163
+ "metadata": {
2164
+ "deletable": false,
2165
+ "editable": false,
2166
+ "tags": [
2167
+ "graded"
2168
+ ]
2169
+ },
2170
+ "outputs": [
2171
+ {
2172
+ "name": "stdout",
2173
+ "output_type": "stream",
2174
+ "text": [
2175
+ "weighted average overlap using Jaccard similarity is:\n",
2176
+ "\n",
2177
+ "{0: 0.443, 1: 0.631, 2: 0.558}\n"
2178
+ ]
2179
+ }
2180
+ ],
2181
+ "source": [
2182
+ "l1 = [1, 2, 3]\n",
2183
+ "l2 = [1, 2, 4]\n",
2184
+ "l3 = [1, 2, 4, 5]\n",
2185
+ "log_probs = [0.4, 0.2, 0.5]\n",
2186
+ "\n",
2187
+ "w_avg_ovlp = weighted_avg_overlap([l1, l2, l3], log_probs, jaccard_similarity)\n",
2188
+ "\n",
2189
+ "print(f\"weighted average overlap using Jaccard similarity is:\\n\\n{w_avg_ovlp}\")"
2190
+ ]
2191
+ },
2192
+ {
2193
+ "cell_type": "markdown",
2194
+ "id": "cdb0b4db",
2195
+ "metadata": {},
2196
+ "source": [
2197
+ "## mbr_decode\n",
2198
+ "\n",
2199
+ "You will now put everything together in the the `mbr_decode` function below. This final step is not graded as this function is just a wrapper around all the cool stuff you have coded so far! \n",
2200
+ "\n",
2201
+ "You can use it to play around, trying different numbers of samples, temperatures and similarity functions!"
2202
+ ]
2203
+ },
2204
+ {
2205
+ "cell_type": "code",
2206
+ "execution_count": 101,
2207
+ "id": "6fcfa640",
2208
+ "metadata": {
2209
+ "deletable": false,
2210
+ "editable": false,
2211
+ "tags": [
2212
+ "graded"
2213
+ ]
2214
+ },
2215
+ "outputs": [],
2216
+ "source": [
2217
+ "def mbr_decode(model, text, n_samples=5, temperature=0.6, similarity_fn=jaccard_similarity):\n",
2218
+ " \n",
2219
+ " # Generate samples\n",
2220
+ " samples, log_probs = generate_samples(model, text, n_samples=n_samples, temperature=temperature)\n",
2221
+ " \n",
2222
+ " # Compute the overlap scores\n",
2223
+ " scores = weighted_avg_overlap(samples, log_probs, similarity_fn)\n",
2224
+ "\n",
2225
+ " # Decode samples\n",
2226
+ " decoded_translations = [tokens_to_text(s, id_to_word).numpy().decode('utf-8') for s in samples]\n",
2227
+ " \n",
2228
+ " # Find the key with the highest score\n",
2229
+ " max_score_key = max(scores, key=lambda k: scores[k])\n",
2230
+ " \n",
2231
+ " # Get the translation \n",
2232
+ " translation = decoded_translations[max_score_key]\n",
2233
+ " \n",
2234
+ " return translation, decoded_translations"
2235
+ ]
2236
+ },
2237
+ {
2238
+ "cell_type": "code",
2239
+ "execution_count": 102,
2240
+ "id": "99507fcc-7727-45e7-933b-d3a08034f731",
2241
+ "metadata": {
2242
+ "deletable": false,
2243
+ "editable": false,
2244
+ "tags": []
2245
+ },
2246
+ "outputs": [
2247
+ {
2248
+ "name": "stdout",
2249
+ "output_type": "stream",
2250
+ "text": [
2251
+ "Translation candidates:\n",
2252
+ "eu adoro idiomas .\n",
2253
+ "eu adoro idiomas .\n",
2254
+ "eu sinto idiomas .\n",
2255
+ "eu adoro idiomas .\n",
2256
+ "eu adoro idiomas .\n",
2257
+ "eu adoro idiomas .\n",
2258
+ "eu adoro idiomas .\n",
2259
+ "eu adoro idiomas .\n",
2260
+ "eu adoro idiomas .\n",
2261
+ "eu adoro idiomas .\n",
2262
+ "\n",
2263
+ "Selected translation: eu adoro idiomas .\n"
2264
+ ]
2265
+ }
2266
+ ],
2267
+ "source": [
2268
+ "english_sentence = \"I love languages\"\n",
2269
+ "\n",
2270
+ "translation, candidates = mbr_decode(trained_translator, english_sentence, n_samples=10, temperature=0.6)\n",
2271
+ "\n",
2272
+ "print(\"Translation candidates:\")\n",
2273
+ "for c in candidates:\n",
2274
+ " print(c)\n",
2275
+ "\n",
2276
+ "print(f\"\\nSelected translation: {translation}\")"
2277
+ ]
2278
+ },
2279
+ {
2280
+ "cell_type": "markdown",
2281
+ "id": "801b193f-4ea6-4ca1-ae29-a506cce656d9",
2282
+ "metadata": {},
2283
+ "source": [
2284
+ "**Congratulations!** Next week, you'll dive deeper into attention models and study the Transformer architecture. You will build another network but without the recurrent part. It will show that attention is all you need! It should be fun!\n",
2285
+ "\n",
2286
+ "**Keep up the good work!**"
2287
+ ]
2288
+ }
2289
+ ],
2290
+ "metadata": {
2291
+ "grader_version": "1",
2292
+ "kernelspec": {
2293
+ "display_name": "Python 3 (ipykernel)",
2294
+ "language": "python",
2295
+ "name": "python3"
2296
+ },
2297
+ "language_info": {
2298
+ "codemirror_mode": {
2299
+ "name": "ipython",
2300
+ "version": 3
2301
+ },
2302
+ "file_extension": ".py",
2303
+ "mimetype": "text/x-python",
2304
+ "name": "python",
2305
+ "nbconvert_exporter": "python",
2306
+ "pygments_lexer": "ipython3",
2307
+ "version": "3.8.10"
2308
+ }
2309
+ },
2310
+ "nbformat": 4,
2311
+ "nbformat_minor": 5
2312
+ }
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/ult.cpython-38.pyc ADDED
Binary file (4 kB). View file
 
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/utils.cpython-311.pyc ADDED
Binary file (6.48 kB). View file
 
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/utils.cpython-38.pyc ADDED
Binary file (3.26 kB). View file
 
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/w1_unittest.cpython-311.pyc ADDED
Binary file (24.8 kB). View file
 
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/w1_unittest.cpython-37.pyc ADDED
Binary file (19.9 kB). View file
 
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/__pycache__/w1_unittest.cpython-38.pyc ADDED
Binary file (13.2 kB). View file
 
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/NMTModel.png ADDED

Git LFS Details

  • SHA256: f9b251a60aedde3c2140fd11a35c464974072ad3797302a04904c7925f11e16e
  • Pointer size: 131 Bytes
  • Size of remote file: 131 kB
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/att.png ADDED

Git LFS Details

  • SHA256: 8e31fd29f7b79a45a65bae5b8a355857dd67bfcc5e01ea2d5f9cefa08e393044
  • Pointer size: 131 Bytes
  • Size of remote file: 135 kB
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/attention.png ADDED

Git LFS Details

  • SHA256: dda2f8d5bd98a195202b059c41088d33cbc430a9d5e5ad4bf8df4b4c205ed8a6
  • Pointer size: 131 Bytes
  • Size of remote file: 245 kB
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/attention_overview.png ADDED
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/input_encoder.png ADDED
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/plain_rnn.png ADDED
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/images/pre_attention_decoder.png ADDED
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/por-eng/por.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5307326410edc8f65c3d0213cf3e5544e41c400efed7fe9ba557c8847a6ee803
3
+ size 27856622
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/utils.py ADDED
@@ -0,0 +1,114 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import numpy as np
2
+ import tensorflow as tf
3
+ import tensorflow_text as tf_text
4
+ import pathlib
5
+
6
+ path_to_file = pathlib.Path("por-eng/por.txt")
7
+
8
+ np.random.seed(1234)
9
+ tf.random.set_seed(1234)
10
+
11
+ def load_data(path):
12
+ text = path.read_text(encoding="utf-8")
13
+
14
+ lines = text.splitlines()
15
+ pairs = [line.split("\t") for line in lines]
16
+
17
+ context = np.array([context for target, context, _ in pairs])
18
+ target = np.array([target for target, context, _ in pairs])
19
+
20
+ return context, target
21
+
22
+
23
+ portuguese_sentences, english_sentences = load_data(path_to_file)
24
+
25
+ sentences = (portuguese_sentences, english_sentences)
26
+
27
+ BUFFER_SIZE = len(english_sentences)
28
+ BATCH_SIZE = 64
29
+
30
+ is_train = np.random.uniform(size=(len(portuguese_sentences),)) < 0.8
31
+
32
+ train_raw = (
33
+ tf.data.Dataset.from_tensor_slices(
34
+ (english_sentences[is_train], portuguese_sentences[is_train])
35
+ )
36
+ .shuffle(BUFFER_SIZE)
37
+ .batch(BATCH_SIZE)
38
+ )
39
+ val_raw = (
40
+ tf.data.Dataset.from_tensor_slices(
41
+ (english_sentences[~is_train], portuguese_sentences[~is_train])
42
+ )
43
+ .shuffle(BUFFER_SIZE)
44
+ .batch(BATCH_SIZE)
45
+ )
46
+
47
+
48
+ def tf_lower_and_split_punct(text):
49
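+ # Normalize to NFKD, lowercase, keep only letters and basic punctuation,
+ # pad punctuation with spaces, strip, and wrap with [SOS]/[EOS] markers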
+ text = tf_text.normalize_utf8(text, "NFKD")
50
+ text = tf.strings.lower(text)
51
+ text = tf.strings.regex_replace(text, "[^ a-z.?!,¿]", "")
52
+ text = tf.strings.regex_replace(text, "[.?!,¿]", r" \0 ")
53
+ text = tf.strings.strip(text)
54
+ text = tf.strings.join(["[SOS]", text, "[EOS]"], separator=" ")
55
+ return text
56
+
57
+
58
+ max_vocab_size = 12000
59
+
60
+ english_vectorizer = tf.keras.layers.TextVectorization(
61
+ standardize=tf_lower_and_split_punct, max_tokens=max_vocab_size, ragged=True
62
+ )
63
+
64
+ english_vectorizer.adapt(train_raw.map(lambda context, target: context))
65
+
66
+ portuguese_vectorizer = tf.keras.layers.TextVectorization(
67
+ standardize=tf_lower_and_split_punct, max_tokens=max_vocab_size, ragged=True
68
+ )
69
+
70
+ portuguese_vectorizer.adapt(train_raw.map(lambda context, target: target))
71
+
72
+
73
+ def process_text(context, target):
74
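+ # Vectorize both sides; split the target into decoder input (all but the
+ # last token) and decoder label (all but the first token) for teacher forcing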
+ context = english_vectorizer(context).to_tensor()
75
+ target = portuguese_vectorizer(target)
76
+ targ_in = target[:, :-1].to_tensor()
77
+ targ_out = target[:, 1:].to_tensor()
78
+ return (context, targ_in), targ_out
79
+
80
+
81
+ train_data = train_raw.map(process_text, tf.data.AUTOTUNE)
82
+ val_data = val_raw.map(process_text, tf.data.AUTOTUNE)
83
+
84
+ del train_raw
85
+ del val_raw
86
+
87
+
88
+ def masked_loss(y_true, y_pred):
89
+
90
+ loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
91
+ loss = loss_fn(y_true, y_pred)
92
+
93
+ # Check which elements of y_true are padding
94
+ mask = tf.cast(y_true != 0, loss.dtype)
95
+
96
+ loss *= mask
97
+ # Return the mean loss over the non-padding tokens.
98
+ return tf.reduce_sum(loss)/tf.reduce_sum(mask)
99
+
100
+
101
+ def masked_acc(y_true, y_pred):
102
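+ # Accuracy of argmax predictions vs. targets, counted only on non-padding positions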
+ y_pred = tf.argmax(y_pred, axis=-1)
103
+ y_pred = tf.cast(y_pred, y_true.dtype)
104
+ match = tf.cast(y_true == y_pred, tf.float32)
105
+ mask = tf.cast(y_true != 0, tf.float32)
106
+ match *= mask
107
+
108
+ return tf.reduce_sum(match)/tf.reduce_sum(mask)
109
+
110
+
111
+ def tokens_to_text(tokens, id_to_word):
112
+ words = id_to_word(tokens)
113
+ result = tf.strings.reduce_join(words, axis=-1, separator=" ")
114
+ return result
NLP with Attention Models/NMT_with_Attention/NMT with MBR/Files/tf/w1_unittest.py ADDED
@@ -0,0 +1,702 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import math
2
+ from itertools import combinations
3
+ import tensorflow as tf
4
+ import numpy as np
5
+ from dlai_grader.grading import test_case, print_feedback
6
+ from utils import train_data
7
+
8
+ VOCAB_SIZE = 12000
9
+ UNITS = 256
10
+
11
+
12
+ def test_encoder(encoder_to_test):
13
+ def g():
14
+ vocab_sizes = [5, 20, 1000, 15000]
15
+ units = [32, 64, 256, 512]
16
+
17
+ cases = []
18
+
19
+ encoder = encoder_to_test(vocab_sizes[0], units[0])
20
+
21
+ t = test_case()
22
+ if encoder.embedding.mask_zero != True:
23
+ t.failed = True
24
+ t.msg = "Embedding layer has incorrect value for 'mask_zero' attribute"
25
+ t.want = True
26
+ t.got = encoder.embedding.mask_zero
27
+ cases.append(t)
28
+
29
+ for vs, u in zip(vocab_sizes, units):
30
+ encoder = encoder_to_test(vs, u)
31
+
32
+ t = test_case()
33
+ if encoder.embedding.input_dim != vs:
34
+ t.failed = True
35
+ t.msg = "Incorrect input dim of embedding layer"
36
+ t.want = vs
37
+ t.got = encoder.embedding.input_dim
38
+ cases.append(t)
39
+
40
+ t = test_case()
41
+ if encoder.embedding.output_dim != u:
42
+ t.failed = True
43
+ t.msg = "Incorrect output dim of embedding layer"
44
+ t.want = u
45
+ t.got = encoder.embedding.output_dim
46
+ cases.append(t)
47
+
48
+ t = test_case()
49
+ if not isinstance(encoder.rnn.layer, tf.keras.layers.LSTM):
50
+ t.failed = True
51
+ t.msg = "Incorrect type of layer inside Bidirectional"
52
+ t.want = tf.keras.layers.LSTM
53
+ t.got = type(encoder.rnn.layer)
54
+ return [t]
55
+
56
+ for u in units:
57
+ encoder = encoder_to_test(vocab_sizes[1], u)
58
+ t = test_case()
59
+ if encoder.rnn.layer.units != u:
60
+ t.failed = True
61
+ t.msg = "Incorrect number of units in LSTM layer"
62
+ t.want = u
63
+ t.got = encoder.rnn.layer.units
64
+ cases.append(t)
65
+
66
+ t = test_case()
67
+ if encoder.rnn.layer.return_sequences != True:
68
+ t.failed = True
69
+ t.msg = "LSTM layer has incorrect value for 'return_sequences' attribute"
70
+ t.want = True
71
+ t.got = encoder.rnn.layer.return_sequences
72
+ cases.append(t)
73
+
74
+ vocab_size = 16
75
+ n_units = 8
76
+ encoder = encoder_to_test(vocab_size, n_units)
77
+ to_translate = np.array([[1, 2, 3, 4, 5, 6, 14, 0, 0, 0],
78
+ [2, 1, 1, 1, 1, 1, 8, 0, 0, 0],
79
+ [5, 4, 2, 3, 3, 15, 11, 0, 0, 0]])
80
+ #for (to_translate, _), _ in train_data.take(3):
81
+
82
+ first_dim_in, second_dim_in = to_translate.shape
83
+ encoder_output = encoder(to_translate)
84
+ t = test_case()
85
+ if len(encoder_output.shape) != 3:
86
+ t.failed = True
87
+ t.msg = "Incorrect shape of encoder output"
88
+ t.want = "a shape with 3 dimensions"
89
+ t.got = encoder_output.shape
90
+ return [t]
91
+
92
+ first_dim_out, second_dim_out, third_dim_out = encoder_output.shape
93
+
94
+ t = test_case()
95
+ if first_dim_in != first_dim_out:
96
+ t.failed = True
97
+ t.msg = "Incorrect first dimension of encoder output"
98
+ t.want = first_dim_in
99
+ t.got = first_dim_out
100
+ cases.append(t)
101
+
102
+ t = test_case()
103
+ if second_dim_in != second_dim_out:
104
+ t.failed = True
105
+ t.msg = "Incorrect second dimension of encoder output"
106
+ t.want = second_dim_in
107
+ t.got = second_dim_out
108
+ cases.append(t)
109
+
110
+ t = test_case()
111
+ if third_dim_out != n_units:
112
+ t.failed = True
113
+ t.msg = "Incorrect third dimension of encoder output"
114
+ t.want = n_units
115
+ t.got = third_dim_out
116
+ cases.append(t)
117
+
118
+ return cases
119
+
120
+ cases = g()
121
+ print_feedback(cases)
122
+
123
+
124
+ def test_cross_attention(cross_attention_to_test):
125
+ def g():
126
+ units = [32, 64, 256, 512]
127
+
128
+ cases = []
129
+
130
+ n_units = 512
131
+ cross_attention = cross_attention_to_test(n_units)
132
+
133
+ t = test_case()
134
+ if not isinstance(cross_attention.mha, tf.keras.layers.MultiHeadAttention):
135
+ t.failed = True
136
+ t.msg = "Incorrect type of layer for Multi Head Attention"
137
+ t.want = tf.keras.layers.MultiHeadAttention
138
+ t.got = type(cross_attention.mha)
139
+ return [t]
140
+
141
+ # for u in units:
142
+ # cross_attention = cross_attention_to_test(u)
143
+
144
+ # t = test_case()
145
+ # if cross_attention.mha.key_dim != u:
146
+ # t.failed = True
147
+ # t.msg = "Incorrect key dim of Multi Head Attention layer"
148
+ # t.want = u
149
+ # t.got = cross_attention.mha.key_dim
150
+ # cases.append(t)
151
+
152
+ cross_attention = cross_attention_to_test(n_units)
153
+ embed = tf.keras.layers.Embedding(VOCAB_SIZE, output_dim=UNITS, mask_zero=True)
154
+
155
+ for (to_translate, sr_translation), _ in train_data.take(3):
156
+ sr_translation_embed = embed(sr_translation)
157
+ first_dim_in, second_dim_in, third_dim_in = sr_translation_embed.shape
158
+ dummy_encoder_output = np.random.rand(64, 14, 512)
159
+ cross_attention_output = cross_attention(
160
+ dummy_encoder_output, sr_translation_embed
161
+ )
162
+ # print(cross_attention_output.shape)
163
+
164
+ t = test_case()
165
+ if len(cross_attention_output.shape) != 3:
166
+ t.failed = True
167
+ t.msg = "Incorrect shape of cross_attention output"
168
+ t.want = "a shape with 3 dimensions"
169
+ t.got = cross_attention_output.shape
170
+ return [t]
171
+
172
+ first_dim_out, second_dim_out, third_dim_out = cross_attention_output.shape
173
+
174
+ t = test_case()
175
+ if first_dim_in != first_dim_out:
176
+ t.failed = True
177
+ t.msg = "Incorrect first dimension of cross_attention output"
178
+ t.want = first_dim_in
179
+ t.got = first_dim_out
180
+ cases.append(t)
181
+
182
+ t = test_case()
183
+ if second_dim_in != second_dim_out:
184
+ t.failed = True
185
+ t.msg = "Incorrect second dimension of cross_attention output"
186
+ t.want = second_dim_in
187
+ t.got = second_dim_out
188
+ cases.append(t)
189
+
190
+ t = test_case()
191
+ if third_dim_in != third_dim_out:
192
+ t.failed = True
193
+ t.msg = "Incorrect third dimension of cross_attention output"
194
+ t.want = third_dim_in
195
+ t.got = third_dim_out
196
+ cases.append(t)
197
+
198
+ _, n_heads, key_dim = cross_attention.mha.get_weights()[0].shape
199
+
200
+ t = test_case()
201
+ if n_heads != 1:
202
+ t.failed = True
203
+ t.msg = "Incorrect number of attention heads"
204
+ t.want = 1
205
+ t.got = n_heads
206
+ cases.append(t)
207
+
208
+ t = test_case()
209
+ if key_dim != n_units:
210
+ t.failed = True
211
+ t.msg = f"Incorrect size of query and key for every attention head when passing {n_units} units to the constructor"
212
+ t.want = n_units
213
+ t.got = key_dim
214
+ cases.append(t)
215
+
216
+ return cases
217
+
218
+ cases = g()
219
+ print_feedback(cases)
220
+
221
+
222
+ def test_decoder(decoder_to_test, CrossAttention):
223
+ def g():
224
+ vocab_sizes = [5, 20, 1000, 15000]
225
+ units = [32, 64, 256, 512]
226
+
227
+ cases = []
228
+
229
+ vocab_size = 10000
230
+ n_units = 512
231
+ decoder = decoder_to_test(vocab_size, n_units)
232
+
233
+ t = test_case()
234
+ if not isinstance(decoder.embedding, tf.keras.layers.Embedding):
235
+ t.failed = True
236
+ t.msg = "Incorrect type of embedding layer"
237
+ t.want = tf.keras.layers.Embedding
238
+ t.got = type(decoder.embedding)
239
+ return [t]
240
+
241
+ t = test_case()
242
+ if decoder.embedding.mask_zero != True:
243
+ t.failed = True
244
+ t.msg = "Embedding layer has incorrect value for 'mask_zero' attribute"
245
+ t.want = True
246
+ t.got = decoder.embedding.mask_zero
247
+ cases.append(t)
248
+
249
+ for vs, u in zip(vocab_sizes, units):
250
+ decoder = decoder_to_test(vs, u)
251
+
252
+ t = test_case()
253
+ if decoder.embedding.input_dim != vs:
254
+ t.failed = True
255
+ t.msg = "Incorrect input dim of embedding layer"
256
+ t.want = vs
257
+ t.got = decoder.embedding.input_dim
258
+ cases.append(t)
259
+
260
+ t = test_case()
261
+ if decoder.embedding.output_dim != u:
262
+ t.failed = True
263
+ t.msg = "Incorrect output dim of embedding layer"
264
+ t.want = u
265
+ t.got = decoder.embedding.output_dim
266
+ cases.append(t)
267
+
268
+ t = test_case()
269
+ if not isinstance(decoder.pre_attention_rnn, tf.keras.layers.LSTM):
270
+ t.failed = True
271
+ t.msg = "Incorrect type of pre_attention_rnn layer"
272
+ t.want = tf.keras.layers.LSTM
273
+ t.got = type(decoder.pre_attention_rnn)
274
+ return [t]
275
+
276
+ for u in units:
277
+ decoder = decoder_to_test(vocab_size, u)
278
+ t = test_case()
279
+ if decoder.pre_attention_rnn.units != u:
280
+ t.failed = True
281
+ t.msg = "Incorrect number of units in pre_attention_rnn layer"
282
+ t.want = u
283
+ t.got = decoder.pre_attention_rnn.units
284
+ cases.append(t)
285
+
286
+ # t = test_case()
287
+ # if decoder.attention.units != u:
288
+ # t.failed = True
289
+ # t.msg = "Incorrect number of units in attention layer"
290
+ # t.want = u
291
+ # t.got = decoder.attention.units
292
+ # cases.append(t)
293
+
294
+ t = test_case()
295
+ if decoder.post_attention_rnn.units != u:
296
+ t.failed = True
297
+ t.msg = "Incorrect number of units in post_attention_rnn layer"
298
+ t.want = u
299
+ t.got = decoder.post_attention_rnn.units
300
+ cases.append(t)
301
+
302
+ t = test_case()
303
+ if decoder.pre_attention_rnn.return_sequences != True:
304
+ t.failed = True
305
+ t.msg = "pre_attention_rnn layer has incorrect value for 'return_sequences' attribute"
306
+ t.want = True
307
+ t.got = decoder.pre_attention_rnn.return_sequences
308
+ cases.append(t)
309
+
310
+ t = test_case()
311
+ if decoder.pre_attention_rnn.return_state != True:
312
+ t.failed = True
313
+ t.msg = "pre_attention_rnn layer has incorrect value for 'return_state' attribute"
314
+ t.want = True
315
+ t.got = decoder.pre_attention_rnn.return_state
316
+ cases.append(t)
317
+
318
+ t = test_case()
319
+ if not isinstance(decoder.attention, CrossAttention):
320
+ t.failed = True
321
+ t.msg = "Incorrect type of attention layer"
322
+ t.want = CrossAttention
323
+ t.got = type(decoder.attention)
324
+ return [t]
325
+
326
+ t = test_case()
327
+ if decoder.post_attention_rnn.return_sequences != True:
328
+ t.failed = True
329
+ t.msg = "post_attention_rnn layer has incorrect value for 'return_sequences' attribute"
330
+ t.want = True
331
+ t.got = decoder.post_attention_rnn.return_sequences
332
+ cases.append(t)
333
+
334
+ t = test_case()
335
+ if not isinstance(decoder.post_attention_rnn, tf.keras.layers.LSTM):
336
+ t.failed = True
337
+ t.msg = "Incorrect type of pre_attention_rnn layer"
338
+ t.want = tf.keras.layers.LSTM
339
+ t.got = type(decoder.post_attention_rnn)
340
+ return [t]
341
+
342
+ t = test_case()
343
+ if not isinstance(decoder.output_layer, tf.keras.layers.Dense):
344
+ t.failed = True
345
+ t.msg = "Incorrect type of output_layer layer"
346
+ t.want = tf.keras.layers.Dense
347
+ t.got = type(decoder.output_layer)
348
+ return [t]
349
+
350
+ t = test_case()
351
+ if (
352
+ "log" not in decoder.output_layer.activation.__name__
353
+ or "softmax" not in decoder.output_layer.activation.__name__
354
+ ):
355
+ t.failed = True
356
+ t.msg = "output_layer layer has incorrect activation function"
357
+ t.want = "a log softmax activation function such as 'log_softmax_v2'"
358
+ t.got = decoder.output_layer.activation.__name__
359
+ cases.append(t)
360
+
361
+ vocab_size = 6
362
+ n_units = 4
363
+ decoder = decoder_to_test(vocab_size, n_units)
364
+ sr_translation = np.array([[3, 4, 5, 3, 3, 3, 5, 1, 1, 1, 1, 1],
365
+ [1, 2, 3, 4, 5, 1, 1, 0, 0, 0, 0, 0]])
366
+ encoder_output = np.random.rand(2, 10, n_units)
367
+ decoder_output = decoder(encoder_output, sr_translation)
368
+
369
+ first_dim_in, second_dim_in = sr_translation.shape
370
+
371
+ t = test_case()
372
+ if len(decoder_output.shape) != 3:
373
+ t.failed = True
374
+ t.msg = "Incorrect shape of decoder output"
375
+ t.want = "a shape with 3 dimensions"
376
+ t.got = decoder_output.shape
377
+ return [t]
378
+
379
+ first_dim_out, second_dim_out, third_dim_out = decoder_output.shape
380
+
381
+ t = test_case()
382
+ if first_dim_in != first_dim_out:
383
+ t.failed = True
384
+ t.msg = "Incorrect first dimension of decoder output"
385
+ t.want = first_dim_in
386
+ t.got = first_dim_out
387
+ cases.append(t)
388
+
389
+ t = test_case()
390
+ if second_dim_in != second_dim_out:
391
+ t.failed = True
392
+ t.msg = "Incorrect second dimension of decoder output"
393
+ t.want = second_dim_in
394
+ t.got = second_dim_out
395
+ cases.append(t)
396
+
397
+ t = test_case()
398
+ if third_dim_out != vocab_size:
399
+ t.failed = True
400
+ t.msg = "Incorrect third dimension of decoder output"
401
+ t.want = vocab_size
402
+ t.got = third_dim_out
403
+ cases.append(t)
404
+
405
+ return cases
406
+
407
+ cases = g()
408
+ print_feedback(cases)
409
+
410
+
411
+ def test_translator(translator_to_test, Encoder, Decoder):
412
+ def g():
413
+ vocab_sizes = [5, 20, 1000, 15000]
414
+ units = [32, 64, 256, 512]
415
+
416
+ cases = []
417
+
418
+ vocab_size = 10000
419
+ n_units = 512
420
+ translator = translator_to_test(vocab_size, n_units)
421
+
422
+ t = test_case()
423
+ if not isinstance(translator.encoder, Encoder):
424
+ t.failed = True
425
+ t.msg = "Incorrect type of encoder layer"
426
+ t.want = Encoder
427
+ t.got = type(translator.encoder)
428
+ return [t]
429
+
430
+ t = test_case()
431
+ if not isinstance(translator.decoder, Decoder):
432
+ t.failed = True
433
+ t.msg = "Incorrect type of encoder layer"
434
+ t.want = Decoder
435
+ t.got = type(translator.decoder)
436
+ return [t]
437
+
438
+ vocab_size = 16
439
+ n_units = 8
440
+ translator = translator_to_test(vocab_size, n_units)
441
+
442
+ to_translate = np.array([[1, 2, 3, 4, 5, 0, 0],
443
+ [5, 2, 3, 4, 5, 6, 0],
444
+ [6, 3, 3, 4, 5, 3, 3],
445
+ [7, 9, 9, 6, 5, 3, 3]])
446
+
447
+ sr_translation = np.array([[8, 1, 2, 3, 4, 5, 0, 0],
448
+ [9, 5, 2, 3, 4, 5, 6, 0],
449
+ [10, 6, 3, 3, 4, 5, 3, 3],
450
+ [11, 7, 9, 9, 6, 5, 3, 3]])
451
+
452
+ #for (to_translate, sr_translation), _ in train_data.take(3):
453
+ first_dim_in, second_dim_in = sr_translation.shape
454
+ translator_output = translator((to_translate, sr_translation))
455
+ t = test_case()
456
+ if len(translator_output.shape) != 3:
457
+ t.failed = True
458
+ t.msg = "Incorrect shape of translator output"
459
+ t.want = "a shape with 3 dimensions"
460
+ t.got = translator_output.shape
461
+ return [t]
462
+
463
+ first_dim_out, second_dim_out, third_dim_out = translator_output.shape
464
+
465
+ t = test_case()
466
+ if first_dim_in != first_dim_out:
467
+ t.failed = True
468
+ t.msg = "Incorrect first dimension of translator output"
469
+ t.want = first_dim_in
470
+ t.got = first_dim_out
471
+ cases.append(t)
472
+
473
+ t = test_case()
474
+ if second_dim_in != second_dim_out:
475
+ t.failed = True
476
+ t.msg = "Incorrect second dimension of translator output"
477
+ t.want = second_dim_in
478
+ t.got = second_dim_out
479
+ cases.append(t)
480
+
481
+ t = test_case()
482
+ if third_dim_out != vocab_size:
483
+ t.failed = True
484
+ t.msg = "Incorrect third dimension of translator output"
485
+ t.want = vocab_size
486
+ t.got = third_dim_out
487
+ cases.append(t)
488
+
489
+ return cases
490
+
491
+ cases = g()
492
+ print_feedback(cases)
493
+
494
+
495
+
496
+ def test_translate(learner_func, model):
497
+ def g():
498
+
499
+ cases = []
500
+
501
+ txt = "Hi, my name is Younes"
502
+ try:
503
+ translation, logit, tokens = learner_func(model, txt, temperature=0.9)
504
+ except Exception as e:
505
+ t = test_case()
506
+ t.failed = True
507
+ t.msg = "There was an exception when running your function"
508
+ t.want = "No exceptions"
509
+ t.got = f"{str(e)}"
510
+ return [t]
511
+
512
+ txt = "Hi, my name is Alejandra"
513
+ translation, logit, tokens = learner_func(model, txt, temperature=0.0)
514
+
515
+ t = test_case()
516
+
517
+ if not isinstance(translation, str):
518
+ t.failed = True
519
+ t.msg = "'translation' has incorrect type"
520
+ t.want = str
521
+ t.got = type(translation)
522
+ cases.append(t)
523
+
524
+ if not isinstance(logit, np.number):
525
+ t.failed = True
526
+ t.msg = "'logit' has incorrect type"
527
+ t.want = np.number
528
+ t.got = type(logit)
529
+ cases.append(t)
530
+
531
+ if not isinstance(tokens, tf.Tensor):
532
+ t.failed = True
533
+ t.msg = "'tokens' has incorrect type"
534
+ t.want = tf.Tensor
535
+ t.got = type(tokens)
536
+ cases.append(t)
537
+
538
+ translation2, logit2, tokens2 = learner_func(model, txt, temperature=0.0)
539
+
540
+ t = test_case()
541
+ if translation != translation2:
542
+ t.failed = True
543
+ t.msg = "translate didn't return the same translation when using temperature of 0.0"
544
+ t.want = translation
545
+ t.got = translation2
546
+ cases.append(t)
547
+
548
+ t = test_case()
549
+ if logit != logit2:
550
+ t.failed = True
551
+ t.msg = "translate didn't return the same logit when using temperature of 0.0"
552
+ t.want = logit
553
+ t.got = logit2
554
+ cases.append(t)
555
+
556
+ t = test_case()
557
+ if not np.allclose(tokens, tokens2):
558
+ t.failed = True
559
+ t.msg = "translate didn't return the same tokens when using temperature of 0.0"
560
+ t.want = tokens
561
+ t.got = tokens2
562
+ cases.append(t)
563
+
564
+ # Check that the function uses the model.decoder and model.encoder attributes
565
+ inputs = tf.keras.Input(shape=(37,))
566
+ outputs = tf.keras.layers.Dense(5, activation="softmax")(inputs)
567
+ model_fake = tf.keras.Model(inputs = inputs, outputs = outputs)
568
+
569
+ model_fake.encoder = model.encoder
570
+ model_fake.decoder = None
571
+ t = test_case()
572
+ try:
573
+ ff = learner_func(model_fake, "Hello world", temperature=0.0)
574
+ t.failed = True
575
+ t.msg = "The translator is not using the internal model.decoder. You are probably using a global variable"
576
+ t.want = "Fail translation"
577
+ t.got = "Succeed translation with wrong decoder"
578
+ except:
579
+ pass
580
+
581
+ cases.append(t)
582
+
583
+ model_fake.encoder = None
584
+ model_fake.decoder = model.decoder
585
+ t = test_case()
586
+ try:
587
+ ff = learner_func(model_fake, "Hello world", temperature=0.0)
588
+ t.failed = True
589
+ t.msg = "The translator is not using the internal model.encoder. You are probably using a global variable"
590
+ t.want = "Fail translation"
591
+ t.got = "Succeed translation with wrong encoder"
592
+ except:
593
+ pass
594
+
595
+ cases.append(t)
596
+
597
+ return cases
598
+
599
+ cases = g()
600
+ print_feedback(cases)
601
+
602
+
603
+
604
+
605
+ def test_rouge1_similarity(learner_func):
606
+
607
+ def g():
608
+
609
+ tensors = [
610
+ [0],
611
+ [0, 1],
612
+ [0, 1, 2],
613
+ [1, 2, 4, 5],
614
+ [5, 5, 7, 0, 232]
615
+ ]
616
+
617
+ expected = [0.6666666666666666, 0.5, 0, 0.33333333333333337, 0.8, 0.3333333333333333, 0.28571428571428575, 0.5714285714285715, 0.25]
618
+
619
+ cases = []
620
+ pairs = list(combinations(tensors, 2))
621
+
622
+ for (candidate, reference), solution in zip(pairs, expected):
623
+ answer = learner_func(candidate, reference)
624
+ t = test_case()
625
+ if not math.isclose(answer, solution):
626
+ t.failed = True
627
+ t.msg = f"Incorrect similarity for candidate={candidate} and reference={reference}"
628
+ t.want = solution
629
+ t.got = answer
630
+ cases.append(t)
631
+
632
+ return cases
633
+
634
+ cases = g()
635
+ print_feedback(cases)
636
+
637
+
638
+ def test_average_overlap(learner_func):
639
+
640
+ def jaccard_similarity(candidate, reference):
641
+
642
+ # Convert the lists to sets to get the unique tokens
643
+ candidate_set = set(candidate)
644
+ reference_set = set(reference)
645
+
646
+ # Get the set of tokens common to both candidate and reference
647
+ common_tokens = candidate_set.intersection(reference_set)
648
+
649
+ # Get the set of all tokens found in either candidate or reference
650
+ all_tokens = candidate_set.union(reference_set)
651
+
652
+ # Compute the percentage of overlap (divide the number of common tokens by the number of all tokens)
653
+ overlap = len(common_tokens) / len(all_tokens)
654
+
655
+ return overlap
656
+
657
+ def g():
658
+
659
+ l1 = [1, 2, 3]
660
+ l2 = [1, 2, 4]
661
+ l3 = [1, 2, 4, 5]
662
+ l4 = [5,6]
663
+
664
+ elements = [l1, l2, l3, l4]
665
+
666
+ all_combinations = []
667
+
668
+ for r in range(2, len(elements) + 1):
669
+ # Generate combinations of length r
670
+ combinations_r = combinations(elements, r)
671
+
672
+ # Append the combinations to the result list
673
+ all_combinations.extend(combinations_r)
674
+
675
+ expected = [{0: 0.5, 1: 0.5},
676
+ {0: 0.4, 1: 0.4},
677
+ {0: 0.0, 1: 0.0},
678
+ {0: 0.75, 1: 0.75},
679
+ {0: 0.0, 1: 0.0},
680
+ {0: 0.2, 1: 0.2},
681
+ {0: 0.45, 1: 0.625, 2: 0.575},
682
+ {0: 0.25, 1: 0.25, 2: 0.0},
683
+ {0: 0.2, 1: 0.3, 2: 0.1},
684
+ {0: 0.375, 1: 0.475, 2: 0.1},
685
+ {0: 0.3, 1: 0.417, 2: 0.45, 3: 0.067}]
686
+
687
+ cases = []
688
+
689
+ for combination, solution in zip(all_combinations, expected):
690
+ answer = learner_func(combination, jaccard_similarity)
691
+ t = test_case()
692
+ if answer != solution:
693
+ t.failed = True
694
+ t.msg = f"Incorrect overlap for lists={combination}"
695
+ t.want = solution
696
+ t.got = answer
697
+ cases.append(t)
698
+
699
+ return cases
700
+
701
+ cases = g()
702
+ print_feedback(cases)
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/.ipynb_checkpoints/C4W3_SentencePiece_and_BPE-checkpoint.ipynb ADDED
@@ -0,0 +1,633 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# SentencePiece and BPE "
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "markdown",
12
+ "metadata": {},
13
+ "source": [
14
+ "## Introduction to Tokenization"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {},
20
+ "source": [
21
+ "In order to process text in neural network models it is first required to **encode** text as numbers with ids, since the tensor operations act on numbers. Finally, if the output of the network is to be words, it is required to **decode** the predicted tokens ids back to text.\n",
22
+ "\n",
23
+ "To encode text, the first decision that has to be made is to what level of granularity are we going to consider the text? Because ultimately, from these **tokens**, features are going to be created about them. Many different experiments have been carried out using *words*, *morphological units*, *phonemic units* or *characters* as tokens. For example, \n",
24
+ "\n",
25
+ "- Tokens are tricky. (raw text)\n",
26
+ "- Tokens are tricky . ([words](https://arxiv.org/pdf/1301.3781))\n",
27
+ "- Token s _ are _ trick _ y . ([morphemes](https://arxiv.org/pdf/1907.02423.pdf))\n",
28
+ "- t oʊ k ə n z _ ɑː _ ˈt r ɪ k i. ([phonemes](https://www.aclweb.org/anthology/W18-5812.pdf), for STT)\n",
29
+ "- T o k e n s _ a r e _ t r i c k y . ([character](https://www.aclweb.org/anthology/C18-1139/))"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "markdown",
34
+ "metadata": {},
35
+ "source": [
36
+ "But how to identify these units, such as words, is largely determined by the language they come from. For example, in many European languages a space is used to separate words, while in some Asian languages there are no spaces between words. Compare English and Mandarin.\n",
37
+ "\n",
38
+ "- Tokens are tricky. (original sentence)\n",
39
+ "- 标记很棘手 (Mandarin)\n",
40
+ "- Biāojì hěn jíshǒu (pinyin)\n",
41
+ "- 标记 很 棘手 (Mandarin with spaces)\n",
42
+ "\n",
43
+ "\n",
44
+ "So, the ability to **tokenize**, i.e. split text into meaningful fundamental units, is not always straight-forward.\n",
45
+ "\n",
46
+ "Also, there are practical issues of how large our *vocabulary* of words, `vocab_size`, should be, considering memory limitations vs. coverage. A compromise may be need to be made between: \n",
47
+ "* the finest-grained models employing characters which can be memory intensive and \n",
48
+ "* more computationally efficient *subword* units such as [n-grams](https://arxiv.org/pdf/1712.09405) or larger units.\n",
49
+ "\n",
50
+ "In [SentencePiece](https://www.aclweb.org/anthology/D18-2012.pdf) unicode characters are grouped together using either a [unigram language model](https://www.aclweb.org/anthology/P18-1007.pdf) (used in this week's assignment) or [BPE](https://arxiv.org/pdf/1508.07909.pdf), **byte-pair encoding**. We will discuss BPE, since BERT and many of its variants use a modified version of BPE and its pseudocode is easy to implement and understand... hopefully!"
51
+ ]
52
+ },
53
+ {
54
+ "cell_type": "markdown",
55
+ "metadata": {},
56
+ "source": [
57
+ "## SentencePiece Preprocessing\n",
58
+ "### NFKC Normalization"
59
+ ]
60
+ },
61
+ {
62
+ "cell_type": "markdown",
63
+ "metadata": {},
64
+ "source": [
65
+ "Unsurprisingly, even using unicode to initially tokenize text can be ambiguous, e.g., "
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": 1,
71
+ "metadata": {},
72
+ "outputs": [
73
+ {
74
+ "name": "stdout",
75
+ "output_type": "stream",
76
+ "text": [
77
+ "é = é : False\n"
78
+ ]
79
+ }
80
+ ],
81
+ "source": [
82
+ "eaccent = '\\u00E9'\n",
83
+ "e_accent = '\\u0065\\u0301'\n",
84
+ "print(f'{eaccent} = {e_accent} : {eaccent == e_accent}')"
85
+ ]
86
+ },
87
+ {
88
+ "cell_type": "markdown",
89
+ "metadata": {},
90
+ "source": [
91
+ "SentencePiece uses the Unicode standard normalization form, [NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence), so this isn't an issue. Looking at the example from above but with normalization:"
92
+ ]
93
+ },
94
+ {
95
+ "cell_type": "code",
96
+ "execution_count": 2,
97
+ "metadata": {},
98
+ "outputs": [
99
+ {
100
+ "name": "stdout",
101
+ "output_type": "stream",
102
+ "text": [
103
+ "é = é : True\n"
104
+ ]
105
+ }
106
+ ],
107
+ "source": [
108
+ "from unicodedata import normalize\n",
109
+ "\n",
110
+ "norm_eaccent = normalize('NFKC', '\\u00E9')\n",
111
+ "norm_e_accent = normalize('NFKC', '\\u0065\\u0301')\n",
112
+ "print(f'{norm_eaccent} = {norm_e_accent} : {norm_eaccent == norm_e_accent}')"
113
+ ]
114
+ },
115
+ {
116
+ "cell_type": "markdown",
117
+ "metadata": {},
118
+ "source": [
119
+ "Normalization has actually changed the unicode code point (unicode unique id) for one of these two characters."
120
+ ]
121
+ },
122
+ {
123
+ "cell_type": "code",
124
+ "execution_count": 3,
125
+ "metadata": {},
126
+ "outputs": [],
127
+ "source": [
128
+ "def get_hex_encoding(s):\n",
129
+ " return ' '.join(hex(ord(c)) for c in s)\n",
130
+ "\n",
131
+ "def print_string_and_encoding(s):\n",
132
+ " print(f'{s} : {get_hex_encoding(s)}') "
133
+ ]
134
+ },
135
+ {
136
+ "cell_type": "code",
137
+ "execution_count": 4,
138
+ "metadata": {},
139
+ "outputs": [
140
+ {
141
+ "name": "stdout",
142
+ "output_type": "stream",
143
+ "text": [
144
+ "é : 0xe9\n",
145
+ "é : 0x65 0x301\n",
146
+ "é : 0xe9\n",
147
+ "é : 0xe9\n"
148
+ ]
149
+ }
150
+ ],
151
+ "source": [
152
+ "for s in [eaccent, e_accent, norm_eaccent, norm_e_accent]:\n",
153
+ " print_string_and_encoding(s)"
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "markdown",
158
+ "metadata": {},
159
+ "source": [
160
+ "This normalization has other side effects which may be considered useful such as converting curly quotes &ldquo; to \" their ASCII equivalent. (<sup>*</sup>Although we *now* lose directionality of the quote...)"
161
+ ]
162
+ },
163
+ {
164
+ "cell_type": "markdown",
165
+ "metadata": {},
166
+ "source": [
167
+ "### Lossless Tokenization\n",
168
+ "\n",
169
+ "SentencePiece also ensures that when you tokenize your data and detokenize your data the original position of white space is preserved. However, tabs and newlines are converted to spaces.\n",
170
+ "\n",
171
+ "To ensure this **lossless tokenization**, SentencePiece replaces white space with _ (U+2581). So that a simple join of the tokens by replacing underscores with spaces can restore the white space, even if there are consecutive symbols. But remember first to normalize and then replace spaces with _ (U+2581)."
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": 7,
177
+ "metadata": {},
178
+ "outputs": [],
179
+ "source": [
180
+ "s = 'Tokenization is hard.'\n",
181
+ "sn = normalize('NFKC', s)\n",
182
+ "sn_ = sn.replace(' ', '\\u2581')"
183
+ ]
184
+ },
185
+ {
186
+ "cell_type": "code",
187
+ "execution_count": 8,
188
+ "metadata": {},
189
+ "outputs": [
190
+ {
191
+ "name": "stdout",
192
+ "output_type": "stream",
193
+ "text": [
194
+ "0x54 0x6f 0x6b 0x65 0x6e 0x69 0x7a 0x61 0x74 0x69 0x6f 0x6e 0x20 0x69 0x73 0x20 0x68 0x61 0x72 0x64 0x2e\n",
195
+ "0x54 0x6f 0x6b 0x65 0x6e 0x69 0x7a 0x61 0x74 0x69 0x6f 0x6e 0x20 0x69 0x73 0x20 0x68 0x61 0x72 0x64 0x2e\n",
196
+ "0x54 0x6f 0x6b 0x65 0x6e 0x69 0x7a 0x61 0x74 0x69 0x6f 0x6e 0x2581 0x69 0x73 0x2581 0x68 0x61 0x72 0x64 0x2e\n"
197
+ ]
198
+ }
199
+ ],
200
+ "source": [
201
+ "print(get_hex_encoding(s))\n",
202
+ "print(get_hex_encoding(sn))\n",
203
+ "print(get_hex_encoding(sn_))"
204
+ ]
205
+ },
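+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see that this replacement is lossless, you can reverse it by hand: join the pieces and map ▁ (U+2581) back to spaces. A minimal sketch using the strings defined above:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Reverse the replacement: map \\u2581 back to a space\n",
+ "restored = sn_.replace('\\u2581', ' ')\n",
+ "print(restored)\n",
+ "print(f'Round trip successful: {restored == sn}')"
+ ]
+ },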
206
+ {
207
+ "cell_type": "markdown",
208
+ "metadata": {},
209
+ "source": [
210
+ "## BPE Algorithm\n",
211
+ "\n",
212
+ "After discussing the preprocessing that SentencePiece performs, you will get the data, preprocess it, and apply the BPE algorithm. You will see how this reproduces the tokenization produced by training SentencePiece on the example dataset (from this week's assignment).\n",
213
+ "\n",
214
+ "### Preparing our Data\n",
215
+ "First, you get the Squad data and process it as above."
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": null,
221
+ "metadata": {},
222
+ "outputs": [],
223
+ "source": [
224
+ "import ast\n",
225
+ "\n",
226
+ "def convert_json_examples_to_text(filepath):\n",
227
+ " example_jsons = list(map(ast.literal_eval, open(filepath))) # Read in the json from the example file\n",
228
+ " texts = [example_json['text'].decode('utf-8') for example_json in example_jsons] # Decode the byte sequences\n",
229
+ " text = '\\n\\n'.join(texts) # Separate different articles by two newlines\n",
230
+ " text = normalize('NFKC', text) # Normalize the text\n",
231
+ "\n",
232
+ " with open('example.txt', 'w') as fw:\n",
233
+ " fw.write(text)\n",
234
+ " \n",
235
+ " return text"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "code",
240
+ "execution_count": null,
241
+ "metadata": {},
242
+ "outputs": [],
243
+ "source": [
244
+ "text = convert_json_examples_to_text('./data/data.txt')\n",
245
+ "print(text[:900])"
246
+ ]
247
+ },
248
+ {
249
+ "cell_type": "markdown",
250
+ "metadata": {},
251
+ "source": [
252
+ "In the algorithm the `vocab` variable is actually a frequency dictionary of the words. Those words have been prepended with an *underscore* to indicate that they are the beginning of a word. Finally, the characters have been delimited by spaces so that the BPE algorithm can group the most common characters together in the dictionary in a greedy fashion. You will see how that is done shortly."
253
+ ]
254
+ },
255
+ {
256
+ "cell_type": "code",
257
+ "execution_count": null,
258
+ "metadata": {},
259
+ "outputs": [],
260
+ "source": [
261
+ "from collections import Counter\n",
262
+ "\n",
263
+ "vocab = Counter(['\\u2581' + word for word in text.split()])\n",
264
+ "vocab = {' '.join([l for l in word]): freq for word, freq in vocab.items()}"
265
+ ]
266
+ },
267
+ {
268
+ "cell_type": "code",
269
+ "execution_count": null,
270
+ "metadata": {},
271
+ "outputs": [],
272
+ "source": [
273
+ "def show_vocab(vocab, end='\\n', limit=20):\n",
274
+ " \"\"\"Show word frequencys in vocab up to the limit number of words\"\"\"\n",
275
+ " shown = 0\n",
276
+ " for word, freq in vocab.items():\n",
277
+ " print(f'{word}: {freq}', end=end)\n",
278
+ " shown +=1\n",
279
+ " if shown > limit:\n",
280
+ " break"
281
+ ]
282
+ },
283
+ {
284
+ "cell_type": "code",
285
+ "execution_count": null,
286
+ "metadata": {},
287
+ "outputs": [],
288
+ "source": [
289
+ "show_vocab(vocab)"
290
+ ]
291
+ },
292
+ {
293
+ "cell_type": "markdown",
294
+ "metadata": {},
295
+ "source": [
296
+ "You check the size of the vocabulary (frequency dictionary) because this is the one hyperparameter that BPE depends on crucially on how far it breaks up a word into SentencePieces. It turns out that for your trained model on the small dataset that 60% of 455 merges of the most frequent characters need to be done to reproduce the upperlimit of a 32K `vocab_size` over the entire corpus of examples."
297
+ ]
298
+ },
299
+ {
300
+ "cell_type": "code",
301
+ "execution_count": null,
302
+ "metadata": {},
303
+ "outputs": [],
304
+ "source": [
305
+ "print(f'Total number of unique words: {len(vocab)}')\n",
306
+ "print(f'Number of merges required to reproduce SentencePiece training on the whole corpus: {int(0.60*len(vocab))}')"
307
+ ]
308
+ },
309
+ {
310
+ "cell_type": "markdown",
311
+ "metadata": {},
312
+ "source": [
313
+ "### BPE Algorithm\n",
314
+ "Directly from the BPE paper you have the following algorithm. "
315
+ ]
316
+ },
317
+ {
318
+ "cell_type": "code",
319
+ "execution_count": null,
320
+ "metadata": {},
321
+ "outputs": [],
322
+ "source": [
323
+ "import re, collections\n",
324
+ "\n",
325
+ "def get_stats(vocab):\n",
326
+ " pairs = collections.defaultdict(int)\n",
327
+ " for word, freq in vocab.items():\n",
328
+ " symbols = word.split()\n",
329
+ " for i in range(len(symbols) - 1):\n",
330
+ " pairs[symbols[i], symbols[i+1]] += freq\n",
331
+ " return pairs\n",
332
+ "\n",
333
+ "def merge_vocab(pair, v_in):\n",
334
+ " v_out = {}\n",
335
+ " bigram = re.escape(' '.join(pair))\n",
336
+ " p = re.compile(r'(?<!\\S)' + bigram + r'(?!\\S)')\n",
337
+ " for word in v_in:\n",
338
+ " w_out = p.sub(''.join(pair), word)\n",
339
+ " v_out[w_out] = v_in[word]\n",
340
+ " return v_out\n",
341
+ "\n",
342
+ "def get_sentence_piece_vocab(vocab, frac_merges=0.60):\n",
343
+ " sp_vocab = vocab.copy()\n",
344
+ " num_merges = int(len(sp_vocab)*frac_merges)\n",
345
+ " \n",
346
+ " for i in range(num_merges):\n",
347
+ " pairs = get_stats(sp_vocab)\n",
348
+ " best = max(pairs, key=pairs.get)\n",
349
+ " sp_vocab = merge_vocab(best, sp_vocab)\n",
350
+ "\n",
351
+ " return sp_vocab"
352
+ ]
353
+ },
354
+ {
355
+ "cell_type": "markdown",
356
+ "metadata": {},
357
+ "source": [
358
+ "To understand what's going on first take a look at the third function `get_sentence_piece_vocab`. It takes in the current `vocab` word-frequency dictionary and the fraction, `frac_merges`, of the total `vocab_size` to merge characters in the words of the dictionary, `num_merges` times. Then for each *merge* operation it `get_stats` on how many of each pair of character sequences there are. It gets the most frequent *pair* of symbols as the `best` pair. Then it merges that pair of symbols (removes the space between them) in each word in the `vocab` that contains this `best` (= `pair`). Consequently, `merge_vocab` creates a new `vocab`, `v_out`. This process is repeated `num_merges` times and the result is the set of SentencePieces (keys of the final `sp_vocab`)."
359
+ ]
360
+ },
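+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make a single *merge* step concrete, here is a minimal sketch on a toy vocabulary (the words below are made up for illustration): `get_stats` finds the most frequent pair of symbols, and `merge_vocab` removes the space between them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Toy vocabulary (hypothetical words) to trace one merge step\n",
+ "toy_vocab = {'▁ l o w': 5, '▁ l o w e r': 2, '▁ n e w': 6}\n",
+ "\n",
+ "toy_pairs = get_stats(toy_vocab)\n",
+ "toy_best = max(toy_pairs, key=toy_pairs.get)\n",
+ "print(f'Best pair: {toy_best} with frequency {toy_pairs[toy_best]}')\n",
+ "print(merge_vocab(toy_best, toy_vocab))"
+ ]
+ },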
361
+ {
362
+ "cell_type": "markdown",
363
+ "metadata": {},
364
+ "source": [
365
+ "### Additional Discussion of BPE Algorithm"
366
+ ]
367
+ },
368
+ {
369
+ "cell_type": "markdown",
370
+ "metadata": {},
371
+ "source": [
372
+ "Please feel free to skip the below if the above description was enough.\n",
373
+ "\n",
374
+ "In a little more detail you can see in `get_stats` you initially create a list of bigram (two character sequence) frequencies from the vocabulary. Later, this may include trigrams, quadgrams, etc. Note that the key of the `pairs` frequency dictionary is actually a 2-tuple, which is just shorthand notation for a pair.\n",
375
+ "\n",
376
+ "In `merge_vocab` you take in an individual `pair` (of character sequences, note this is the most frequency `best` pair) and the current `vocab` as `v_in`. You create a new `vocab`, `v_out`, from the old by joining together the characters in the pair (removing the space), if they are present in a word of the dictionary.\n",
377
+ "\n",
378
+ "[Warning](https://regex101.com/): the expression `(?<!\\S)` means that either a whitespace character follows before the `bigram` or there is nothing before the bigram (it is the beginning of the word), similarly for `(?!\\S)` for preceding whitespace or the end of the word. "
379
+ ]
380
+ },
381
+ {
382
+ "cell_type": "code",
383
+ "execution_count": null,
384
+ "metadata": {},
385
+ "outputs": [],
386
+ "source": [
387
+ "sp_vocab = get_sentence_piece_vocab(vocab)\n",
388
+ "show_vocab(sp_vocab) "
389
+ ]
390
+ },
391
+ {
392
+ "cell_type": "markdown",
393
+ "metadata": {},
394
+ "source": [
395
+ "## Train SentencePiece BPE Tokenizer on Example Data\n",
396
+ "### Explore SentencePiece Model\n",
397
+ "First, explore the SentencePiece model provided with this week's assignment. Remember you can always use Python's built in `help` command to see the documentation for any object or method."
398
+ ]
399
+ },
400
+ {
401
+ "cell_type": "code",
402
+ "execution_count": null,
403
+ "metadata": {},
404
+ "outputs": [],
405
+ "source": [
406
+ "import sentencepiece as spm\n",
407
+ "sp = spm.SentencePieceProcessor(model_file='./data/sentencepiece.model')"
408
+ ]
409
+ },
410
+ {
411
+ "cell_type": "code",
412
+ "execution_count": null,
413
+ "metadata": {},
414
+ "outputs": [],
415
+ "source": [
416
+ "# help(sp)"
417
+ ]
418
+ },
419
+ {
420
+ "cell_type": "markdown",
421
+ "metadata": {},
422
+ "source": [
423
+ "Try it out on the first sentence of the example text."
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": null,
429
+ "metadata": {},
430
+ "outputs": [],
431
+ "source": [
432
+ "s0 = 'Beginners BBQ Class Taking Place in Missoula!'"
433
+ ]
434
+ },
435
+ {
436
+ "cell_type": "code",
437
+ "execution_count": null,
438
+ "metadata": {},
439
+ "outputs": [],
440
+ "source": [
441
+ "# encode: text => id\n",
442
+ "print(sp.encode_as_pieces(s0))\n",
443
+ "print(sp.encode_as_ids(s0))\n",
444
+ "\n",
445
+ "# decode: id => text\n",
446
+ "print(sp.decode_pieces(sp.encode_as_pieces(s0)))\n",
447
+ "print(sp.decode_ids([12847, 277]))"
448
+ ]
449
+ },
450
+ {
451
+ "cell_type": "markdown",
452
+ "metadata": {},
453
+ "source": [
454
+ "Notice how SentencePiece breaks the words into seemingly odd parts, but you have seen something similar with BPE. But how close was the model trained on the whole corpus of examples with a `vocab_size` of 32,000 instead of 455? Here you can also test what happens to white space, like '\\n'. \n",
455
+ "\n",
456
+ "But first note that SentencePiece encodes the SentencePieces, the tokens, and has reserved some of the ids as can be seen in this week's assignment."
457
+ ]
458
+ },
459
+ {
460
+ "cell_type": "code",
461
+ "execution_count": null,
462
+ "metadata": {},
463
+ "outputs": [],
464
+ "source": [
465
+ "uid = 15068\n",
466
+ "spiece = \"\\u2581BBQ\"\n",
467
+ "unknown = \"__MUST_BE_UNKNOWN__\"\n",
468
+ "\n",
469
+ "# id <=> piece conversion\n",
470
+ "print(f'SentencePiece for ID {uid}: {sp.id_to_piece(uid)}')\n",
471
+ "print(f'ID for Sentence Piece {spiece}: {sp.piece_to_id(spiece)}')\n",
472
+ "\n",
473
+ "# returns 0 for unknown tokens (we can change the id for UNK)\n",
474
+ "print(f'ID for unknown text {unknown}: {sp.piece_to_id(unknown)}')"
475
+ ]
476
+ },
477
+ {
478
+ "cell_type": "code",
479
+ "execution_count": null,
480
+ "metadata": {},
481
+ "outputs": [],
482
+ "source": [
483
+ "print(f'Beginning of sentence id: {sp.bos_id()}')\n",
484
+ "print(f'Pad id: {sp.pad_id()}')\n",
485
+ "print(f'End of sentence id: {sp.eos_id()}')\n",
486
+ "print(f'Unknown id: {sp.unk_id()}')\n",
487
+ "print(f'Vocab size: {sp.vocab_size()}')"
488
+ ]
489
+ },
490
+ {
491
+ "cell_type": "markdown",
492
+ "metadata": {},
493
+ "source": [
494
+ "You can also check what are the ids for the first part and last part of the vocabulary."
495
+ ]
496
+ },
497
+ {
498
+ "cell_type": "code",
499
+ "execution_count": null,
500
+ "metadata": {},
501
+ "outputs": [],
502
+ "source": [
503
+ "print('\\nId\\tSentP\\tControl?')\n",
504
+ "print('------------------------')\n",
505
+ "# <unk>, <s>, </s> are defined by default. Their ids are (0, 1, 2)\n",
506
+ "# <s> and </s> are defined as 'control' symbol.\n",
507
+ "for uid in range(10):\n",
508
+ " print(uid, sp.id_to_piece(uid), sp.is_control(uid), sep='\\t')\n",
509
+ " \n",
510
+ "# for uid in range(sp.vocab_size()-10,sp.vocab_size()):\n",
511
+ "# print(uid, sp.id_to_piece(uid), sp.is_control(uid), sep='\\t')"
512
+ ]
513
+ },
514
+ {
515
+ "cell_type": "markdown",
516
+ "metadata": {},
517
+ "source": [
518
+ "### Train SentencePiece BPE model with our example.txt"
519
+ ]
520
+ },
521
+ {
522
+ "cell_type": "markdown",
523
+ "metadata": {},
524
+ "source": [
525
+ "Finally, train your own BPE model directly from the SentencePiece library and compare it to the results of the implemention of the algorithm from the BPE paper itself."
526
+ ]
527
+ },
528
+ {
529
+ "cell_type": "code",
530
+ "execution_count": null,
531
+ "metadata": {},
532
+ "outputs": [],
533
+ "source": [
534
+ "spm.SentencePieceTrainer.train('--input=example.txt --model_prefix=example_bpe --vocab_size=450 --model_type=bpe')\n",
535
+ "sp_bpe = spm.SentencePieceProcessor()\n",
536
+ "sp_bpe.load('example_bpe.model')\n",
537
+ "\n",
538
+ "print('*** BPE ***')\n",
539
+ "print(sp_bpe.encode_as_pieces(s0))"
540
+ ]
541
+ },
542
+ {
543
+ "cell_type": "code",
544
+ "execution_count": null,
545
+ "metadata": {},
546
+ "outputs": [],
547
+ "source": [
548
+ "show_vocab(sp_vocab, end = ', ')"
549
+ ]
550
+ },
551
+ {
552
+ "cell_type": "markdown",
553
+ "metadata": {},
554
+ "source": [
555
+ "The implementation of BPE's code from the paper matches up pretty well with the library itself! The differences are probably accounted for by the `vocab_size`. There is also another technical difference in that in the SentencePiece implementation of BPE a priority queue is used to more efficiently keep track of the *best pairs*. Actually, there is a priority queue in the Python standard library called `heapq` if you would like to give that a try below! "
556
+ ]
557
+ },
558
+ {
559
+ "cell_type": "markdown",
560
+ "metadata": {},
561
+ "source": [
562
+ "## Optionally try to implement BPE using a priority queue below"
563
+ ]
564
+ },
565
+ {
566
+ "cell_type": "code",
567
+ "execution_count": null,
568
+ "metadata": {},
569
+ "outputs": [],
570
+ "source": [
571
+ "from heapq import heappush, heappop"
572
+ ]
573
+ },
574
+ {
575
+ "cell_type": "code",
576
+ "execution_count": null,
577
+ "metadata": {},
578
+ "outputs": [],
579
+ "source": [
580
+ "def heapsort(iterable):\n",
581
+ " h = []\n",
582
+ " for value in iterable:\n",
583
+ " heappush(h, value)\n",
584
+ " return [heappop(h) for i in range(len(h))]"
585
+ ]
586
+ },
587
+ {
588
+ "cell_type": "code",
589
+ "execution_count": null,
590
+ "metadata": {},
591
+ "outputs": [],
592
+ "source": [
593
+ "a = [1,4,3,1,3,2,1,4,2]\n",
594
+ "heapsort(a)"
595
+ ]
596
+ },
597
+ {
598
+ "cell_type": "markdown",
599
+ "metadata": {},
600
+ "source": [
601
+ "For a more extensive example consider looking at the [SentencePiece repo](https://github.com/google/sentencepiece/blob/master/python/sentencepiece_python_module_example.ipynb). The last few sections of this code were repurposed from that tutorial. Thanks for your participation! Next stop BERT and T5!"
602
+ ]
603
+ },
604
+ {
605
+ "cell_type": "code",
606
+ "execution_count": null,
607
+ "metadata": {},
608
+ "outputs": [],
609
+ "source": []
610
+ }
611
+ ],
612
+ "metadata": {
613
+ "kernelspec": {
614
+ "display_name": "Python 3 (ipykernel)",
615
+ "language": "python",
616
+ "name": "python3"
617
+ },
618
+ "language_info": {
619
+ "codemirror_mode": {
620
+ "name": "ipython",
621
+ "version": 3
622
+ },
623
+ "file_extension": ".py",
624
+ "mimetype": "text/x-python",
625
+ "name": "python",
626
+ "nbconvert_exporter": "python",
627
+ "pygments_lexer": "ipython3",
628
+ "version": "3.10.11"
629
+ }
630
+ },
631
+ "nbformat": 4,
632
+ "nbformat_minor": 4
633
+ }
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/C4W3_SentencePiece_and_BPE.ipynb ADDED
@@ -0,0 +1,724 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# SentencePiece and BPE "
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "markdown",
12
+ "metadata": {},
13
+ "source": [
14
+ "## Introduction to Tokenization"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {},
20
+ "source": [
21
+ "In order to process text in neural network models it is first required to **encode** text as numbers with ids, since the tensor operations act on numbers. Finally, if the output of the network is to be words, it is required to **decode** the predicted tokens ids back to text.\n",
22
+ "\n",
23
+ "To encode text, the first decision that has to be made is to what level of granularity are we going to consider the text? Because ultimately, from these **tokens**, features are going to be created about them. Many different experiments have been carried out using *words*, *morphological units*, *phonemic units* or *characters* as tokens. For example, \n",
24
+ "\n",
25
+ "- Tokens are tricky. (raw text)\n",
26
+ "- Tokens are tricky . ([words](https://arxiv.org/pdf/1301.3781))\n",
27
+ "- Token s _ are _ trick _ y . ([morphemes](https://arxiv.org/pdf/1907.02423.pdf))\n",
28
+ "- t oʊ k ə n z _ ɑː _ ˈt r ɪ k i. ([phonemes](https://www.aclweb.org/anthology/W18-5812.pdf), for STT)\n",
29
+ "- T o k e n s _ a r e _ t r i c k y . ([character](https://www.aclweb.org/anthology/C18-1139/))"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "markdown",
34
+ "metadata": {},
35
+ "source": [
36
+ "But how to identify these units, such as words, is largely determined by the language they come from. For example, in many European languages a space is used to separate words, while in some Asian languages there are no spaces between words. Compare English and Mandarin.\n",
37
+ "\n",
38
+ "- Tokens are tricky. (original sentence)\n",
39
+ "- 标记很棘手 (Mandarin)\n",
40
+ "- Biāojì hěn jíshǒu (pinyin)\n",
41
+ "- 标记 很 棘手 (Mandarin with spaces)\n",
42
+ "\n",
43
+ "\n",
44
+ "So, the ability to **tokenize**, i.e. split text into meaningful fundamental units, is not always straight-forward.\n",
45
+ "\n",
46
+ "Also, there are practical issues of how large our *vocabulary* of words, `vocab_size`, should be, considering memory limitations vs. coverage. A compromise may be need to be made between: \n",
47
+ "* the finest-grained models employing characters which can be memory intensive and \n",
48
+ "* more computationally efficient *subword* units such as [n-grams](https://arxiv.org/pdf/1712.09405) or larger units.\n",
49
+ "\n",
50
+ "In [SentencePiece](https://www.aclweb.org/anthology/D18-2012.pdf) unicode characters are grouped together using either a [unigram language model](https://www.aclweb.org/anthology/P18-1007.pdf) (used in this week's assignment) or [BPE](https://arxiv.org/pdf/1508.07909.pdf), **byte-pair encoding**. We will discuss BPE, since BERT and many of its variants use a modified version of BPE and its pseudocode is easy to implement and understand... hopefully!"
51
+ ]
52
+ },
53
+ {
54
+ "cell_type": "markdown",
55
+ "metadata": {},
56
+ "source": [
57
+ "## SentencePiece Preprocessing\n",
58
+ "### NFKC Normalization"
59
+ ]
60
+ },
61
+ {
62
+ "cell_type": "markdown",
63
+ "metadata": {},
64
+ "source": [
65
+ "Unsurprisingly, even using unicode to initially tokenize text can be ambiguous, e.g., "
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": 1,
71
+ "metadata": {},
72
+ "outputs": [
73
+ {
74
+ "name": "stdout",
75
+ "output_type": "stream",
76
+ "text": [
77
+ "é = é : False\n"
78
+ ]
79
+ }
80
+ ],
81
+ "source": [
82
+ "eaccent = '\\u00E9'\n",
83
+ "e_accent = '\\u0065\\u0301'\n",
84
+ "print(f'{eaccent} = {e_accent} : {eaccent == e_accent}')"
85
+ ]
86
+ },
87
+ {
88
+ "cell_type": "markdown",
89
+ "metadata": {},
90
+ "source": [
91
+ "SentencePiece uses the Unicode standard normalization form, [NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence), so this isn't an issue. Looking at the example from above but with normalization:"
92
+ ]
93
+ },
94
+ {
95
+ "cell_type": "code",
96
+ "execution_count": 2,
97
+ "metadata": {},
98
+ "outputs": [
99
+ {
100
+ "name": "stdout",
101
+ "output_type": "stream",
102
+ "text": [
103
+ "é = é : True\n"
104
+ ]
105
+ }
106
+ ],
107
+ "source": [
108
+ "from unicodedata import normalize\n",
109
+ "\n",
110
+ "norm_eaccent = normalize('NFKC', '\\u00E9')\n",
111
+ "norm_e_accent = normalize('NFKC', '\\u0065\\u0301')\n",
112
+ "print(f'{norm_eaccent} = {norm_e_accent} : {norm_eaccent == norm_e_accent}')"
113
+ ]
114
+ },
115
+ {
116
+ "cell_type": "markdown",
117
+ "metadata": {},
118
+ "source": [
119
+ "Normalization has actually changed the unicode code point (unicode unique id) for one of these two characters."
120
+ ]
121
+ },
122
+ {
123
+ "cell_type": "code",
124
+ "execution_count": 3,
125
+ "metadata": {},
126
+ "outputs": [],
127
+ "source": [
128
+ "def get_hex_encoding(s):\n",
129
+ " return ' '.join(hex(ord(c)) for c in s)\n",
130
+ "\n",
131
+ "def print_string_and_encoding(s):\n",
132
+ " print(f'{s} : {get_hex_encoding(s)}') "
133
+ ]
134
+ },
135
+ {
136
+ "cell_type": "code",
137
+ "execution_count": 4,
138
+ "metadata": {},
139
+ "outputs": [
140
+ {
141
+ "name": "stdout",
142
+ "output_type": "stream",
143
+ "text": [
144
+ "é : 0xe9\n",
145
+ "é : 0x65 0x301\n",
146
+ "é : 0xe9\n",
147
+ "é : 0xe9\n"
148
+ ]
149
+ }
150
+ ],
151
+ "source": [
152
+ "for s in [eaccent, e_accent, norm_eaccent, norm_e_accent]:\n",
153
+ " print_string_and_encoding(s)"
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "markdown",
158
+ "metadata": {},
159
+ "source": [
160
+ "This normalization has other side effects which may be considered useful such as converting curly quotes &ldquo; to \" their ASCII equivalent. (<sup>*</sup>Although we *now* lose directionality of the quote...)"
161
+ ]
162
+ },
163
+ {
164
+ "cell_type": "markdown",
165
+ "metadata": {},
166
+ "source": [
167
+ "### Lossless Tokenization\n",
168
+ "\n",
169
+ "SentencePiece also ensures that when you tokenize your data and detokenize your data the original position of white space is preserved. However, tabs and newlines are converted to spaces.\n",
170
+ "\n",
171
+ "To ensure this **lossless tokenization**, SentencePiece replaces white space with _ (U+2581). So that a simple join of the tokens by replacing underscores with spaces can restore the white space, even if there are consecutive symbols. But remember first to normalize and then replace spaces with _ (U+2581)."
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": 6,
177
+ "metadata": {},
178
+ "outputs": [],
179
+ "source": [
180
+ "s = 'Tokenization is hard.'\n",
181
+ "sn = normalize('NFKC', s)\n",
182
+ "sn_ = sn.replace(' ', '\\u2581')"
183
+ ]
184
+ },
185
+ {
186
+ "cell_type": "code",
187
+ "execution_count": 7,
188
+ "metadata": {},
189
+ "outputs": [
190
+ {
191
+ "name": "stdout",
192
+ "output_type": "stream",
193
+ "text": [
194
+ "0x54 0x6f 0x6b 0x65 0x6e 0x69 0x7a 0x61 0x74 0x69 0x6f 0x6e 0x20 0x69 0x73 0x20 0x68 0x61 0x72 0x64 0x2e\n",
195
+ "0x54 0x6f 0x6b 0x65 0x6e 0x69 0x7a 0x61 0x74 0x69 0x6f 0x6e 0x20 0x69 0x73 0x20 0x68 0x61 0x72 0x64 0x2e\n",
196
+ "0x54 0x6f 0x6b 0x65 0x6e 0x69 0x7a 0x61 0x74 0x69 0x6f 0x6e 0x2581 0x69 0x73 0x2581 0x68 0x61 0x72 0x64 0x2e\n"
197
+ ]
198
+ }
199
+ ],
200
+ "source": [
201
+ "print(get_hex_encoding(s))\n",
202
+ "print(get_hex_encoding(sn))\n",
203
+ "print(get_hex_encoding(sn_))"
204
+ ]
205
+ },
206
+ {
207
+ "cell_type": "markdown",
208
+ "metadata": {},
209
+ "source": [
210
+ "## BPE Algorithm\n",
211
+ "\n",
212
+ "After discussing the preprocessing that SentencePiece performs, you will get the data, preprocess it, and apply the BPE algorithm. You will see how this reproduces the tokenization produced by training SentencePiece on the example dataset (from this week's assignment).\n",
213
+ "\n",
214
+ "### Preparing our Data\n",
215
+ "First, you get the Squad data and process it as above."
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": 8,
221
+ "metadata": {},
222
+ "outputs": [],
223
+ "source": [
224
+ "import ast\n",
225
+ "\n",
226
+ "def convert_json_examples_to_text(filepath):\n",
227
+ " example_jsons = list(map(ast.literal_eval, open(filepath))) # Read in the json from the example file\n",
228
+ " texts = [example_json['text'].decode('utf-8') for example_json in example_jsons] # Decode the byte sequences\n",
229
+ " text = '\\n\\n'.join(texts) # Separate different articles by two newlines\n",
230
+ " text = normalize('NFKC', text) # Normalize the text\n",
231
+ "\n",
232
+ " with open('example.txt', 'w') as fw:\n",
233
+ " fw.write(text)\n",
234
+ " \n",
235
+ " return text"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "code",
240
+ "execution_count": 9,
241
+ "metadata": {},
242
+ "outputs": [
243
+ {
244
+ "name": "stdout",
245
+ "output_type": "stream",
246
+ "text": [
247
+ "Beginners BBQ Class Taking Place in Missoula!\n",
248
+ "Do you want to get better at making delicious BBQ? You will have the opportunity, put this on your calendar now. Thursday, September 22nd join World Class BBQ Champion, Tony Balay from Lonestar Smoke Rangers. He will be teaching a beginner level class for everyone who wants to get better with their culinary skills.\n",
249
+ "He will teach you everything you need to know to compete in a KCBS BBQ competition, including techniques, recipes, timelines, meat selection and trimming, plus smoker and fire information.\n",
250
+ "The cost to be in the class is $35 per person, and for spectators it is free. Included in the cost will be either a t-shirt or apron and you will be tasting samples of each meat that is prepared.\n",
251
+ "\n",
252
+ "Discussion in 'Mac OS X Lion (10.7)' started by axboi87, Jan 20, 2012.\n",
253
+ "I've got a 500gb internal drive and a 240gb SSD.\n",
254
+ "When trying to restore using di\n"
255
+ ]
256
+ }
257
+ ],
258
+ "source": [
259
+ "text = convert_json_examples_to_text('./data/data.txt')\n",
260
+ "print(text[:900])"
261
+ ]
262
+ },
263
+ {
264
+ "cell_type": "markdown",
265
+ "metadata": {},
266
+ "source": [
267
+ "In the algorithm the `vocab` variable is actually a frequency dictionary of the words. Those words have been prepended with an *underscore* to indicate that they are the beginning of a word. Finally, the characters have been delimited by spaces so that the BPE algorithm can group the most common characters together in the dictionary in a greedy fashion. You will see how that is done shortly."
268
+ ]
269
+ },
270
+ {
271
+ "cell_type": "code",
272
+ "execution_count": 10,
273
+ "metadata": {},
274
+ "outputs": [],
275
+ "source": [
276
+ "from collections import Counter\n",
277
+ "\n",
278
+ "vocab = Counter(['\\u2581' + word for word in text.split()])\n",
279
+ "vocab = {' '.join([l for l in word]): freq for word, freq in vocab.items()}"
280
+ ]
281
+ },
282
+ {
283
+ "cell_type": "code",
284
+ "execution_count": 11,
285
+ "metadata": {},
286
+ "outputs": [],
287
+ "source": [
288
+ "def show_vocab(vocab, end='\\n', limit=20):\n",
289
+ " \"\"\"Show word frequencys in vocab up to the limit number of words\"\"\"\n",
290
+ " shown = 0\n",
291
+ " for word, freq in vocab.items():\n",
292
+ " print(f'{word}: {freq}', end=end)\n",
293
+ " shown +=1\n",
294
+ " if shown > limit:\n",
295
+ " break"
296
+ ]
297
+ },
298
+ {
299
+ "cell_type": "code",
300
+ "execution_count": 12,
301
+ "metadata": {},
302
+ "outputs": [
303
+ {
304
+ "name": "stdout",
305
+ "output_type": "stream",
306
+ "text": [
307
+ "▁ B e g i n n e r s: 1\n",
308
+ "▁ B B Q: 3\n",
309
+ "▁ C l a s s: 2\n",
310
+ "▁ T a k i n g: 1\n",
311
+ "▁ P l a c e: 1\n",
312
+ "▁ i n: 15\n",
313
+ "▁ M i s s o u l a !: 1\n",
314
+ "▁ D o: 1\n",
315
+ "▁ y o u: 13\n",
316
+ "▁ w a n t: 1\n",
317
+ "▁ t o: 33\n",
318
+ "▁ g e t: 2\n",
319
+ "▁ b e t t e r: 2\n",
320
+ "▁ a t: 1\n",
321
+ "▁ m a k i n g: 2\n",
322
+ "▁ d e l i c i o u s: 1\n",
323
+ "▁ B B Q ?: 1\n",
324
+ "▁ Y o u: 1\n",
325
+ "▁ w i l l: 6\n",
326
+ "▁ h a v e: 4\n",
327
+ "▁ t h e: 31\n"
328
+ ]
329
+ }
330
+ ],
331
+ "source": [
332
+ "show_vocab(vocab)"
333
+ ]
334
+ },
335
+ {
336
+ "cell_type": "markdown",
337
+ "metadata": {},
338
+ "source": [
339
+ "You check the size of the vocabulary (frequency dictionary) because this is the one hyperparameter that BPE depends on crucially on how far it breaks up a word into SentencePieces. It turns out that for your trained model on the small dataset that 60% of 455 merges of the most frequent characters need to be done to reproduce the upperlimit of a 32K `vocab_size` over the entire corpus of examples."
340
+ ]
341
+ },
342
+ {
343
+ "cell_type": "code",
344
+ "execution_count": 13,
345
+ "metadata": {},
346
+ "outputs": [
347
+ {
348
+ "name": "stdout",
349
+ "output_type": "stream",
350
+ "text": [
351
+ "Total number of unique words: 455\n",
352
+ "Number of merges required to reproduce SentencePiece training on the whole corpus: 273\n"
353
+ ]
354
+ }
355
+ ],
356
+ "source": [
357
+ "print(f'Total number of unique words: {len(vocab)}')\n",
358
+ "print(f'Number of merges required to reproduce SentencePiece training on the whole corpus: {int(0.60*len(vocab))}')"
359
+ ]
360
+ },
361
+ {
362
+ "cell_type": "markdown",
363
+ "metadata": {},
364
+ "source": [
365
+ "### BPE Algorithm\n",
366
+ "Directly from the BPE paper you have the following algorithm. "
367
+ ]
368
+ },
369
+ {
370
+ "cell_type": "code",
371
+ "execution_count": 14,
372
+ "metadata": {},
373
+ "outputs": [],
374
+ "source": [
375
+ "import re, collections\n",
376
+ "\n",
377
+ "def get_stats(vocab):\n",
378
+ " pairs = collections.defaultdict(int)\n",
379
+ " for word, freq in vocab.items():\n",
380
+ " symbols = word.split()\n",
381
+ " for i in range(len(symbols) - 1):\n",
382
+ " pairs[symbols[i], symbols[i+1]] += freq\n",
383
+ " return pairs\n",
384
+ "\n",
385
+ "def merge_vocab(pair, v_in):\n",
386
+ " v_out = {}\n",
387
+ " bigram = re.escape(' '.join(pair))\n",
388
+ " p = re.compile(r'(?<!\\S)' + bigram + r'(?!\\S)')\n",
389
+ " for word in v_in:\n",
390
+ " w_out = p.sub(''.join(pair), word)\n",
391
+ " v_out[w_out] = v_in[word]\n",
392
+ " return v_out\n",
393
+ "\n",
394
+ "def get_sentence_piece_vocab(vocab, frac_merges=0.60):\n",
395
+ " sp_vocab = vocab.copy()\n",
396
+ " num_merges = int(len(sp_vocab)*frac_merges)\n",
397
+ " \n",
398
+ " for i in range(num_merges):\n",
399
+ " pairs = get_stats(sp_vocab)\n",
400
+ " best = max(pairs, key=pairs.get)\n",
401
+ " sp_vocab = merge_vocab(best, sp_vocab)\n",
402
+ "\n",
403
+ " return sp_vocab"
404
+ ]
405
+ },
406
+ {
407
+ "cell_type": "markdown",
408
+ "metadata": {},
409
+ "source": [
410
+ "To understand what's going on first take a look at the third function `get_sentence_piece_vocab`. It takes in the current `vocab` word-frequency dictionary and the fraction, `frac_merges`, of the total `vocab_size` to merge characters in the words of the dictionary, `num_merges` times. Then for each *merge* operation it `get_stats` on how many of each pair of character sequences there are. It gets the most frequent *pair* of symbols as the `best` pair. Then it merges that pair of symbols (removes the space between them) in each word in the `vocab` that contains this `best` (= `pair`). Consequently, `merge_vocab` creates a new `vocab`, `v_out`. This process is repeated `num_merges` times and the result is the set of SentencePieces (keys of the final `sp_vocab`)."
411
+ ]
412
+ },
413
+ {
414
+ "cell_type": "markdown",
415
+ "metadata": {},
416
+ "source": [
417
+ "### Additional Discussion of BPE Algorithm"
418
+ ]
419
+ },
420
+ {
421
+ "cell_type": "markdown",
422
+ "metadata": {},
423
+ "source": [
424
+ "Please feel free to skip the below if the above description was enough.\n",
425
+ "\n",
426
+ "In a little more detail you can see in `get_stats` you initially create a list of bigram (two character sequence) frequencies from the vocabulary. Later, this may include trigrams, quadgrams, etc. Note that the key of the `pairs` frequency dictionary is actually a 2-tuple, which is just shorthand notation for a pair.\n",
427
+ "\n",
428
+ "In `merge_vocab` you take in an individual `pair` (of character sequences, note this is the most frequency `best` pair) and the current `vocab` as `v_in`. You create a new `vocab`, `v_out`, from the old by joining together the characters in the pair (removing the space), if they are present in a word of the dictionary.\n",
429
+ "\n",
430
+ "[Warning](https://regex101.com/): the expression `(?<!\\S)` means that either a whitespace character follows before the `bigram` or there is nothing before the bigram (it is the beginning of the word), similarly for `(?!\\S)` for preceding whitespace or the end of the word. "
431
+ ]
432
+ },
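+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see these lookarounds in action, here is a minimal sketch with a made-up pair `('e', 'r')`: the pattern only matches where `e r` forms a whole pair of symbols, not where either side is part of a longer symbol."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical pair to illustrate the lookarounds\n",
+ "toy_bigram = re.escape(' '.join(('e', 'r')))\n",
+ "toy_p = re.compile(r'(?<!\\S)' + toy_bigram + r'(?!\\S)')\n",
+ "\n",
+ "print(toy_p.sub('er', '▁ l o w e r')) # whole pair -> merged into 'er'\n",
+ "print(toy_p.sub('er', '▁ l o w e rs')) # 'rs' is one symbol, so no match"
+ ]
+ },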
433
+ {
434
+ "cell_type": "code",
435
+ "execution_count": 15,
436
+ "metadata": {},
437
+ "outputs": [
438
+ {
439
+ "name": "stdout",
440
+ "output_type": "stream",
441
+ "text": [
442
+ "▁B e g in n ers: 1\n",
443
+ "▁BBQ: 3\n",
444
+ "▁Cl ass: 2\n",
445
+ "▁T ak ing: 1\n",
446
+ "▁P la ce: 1\n",
447
+ "▁in: 15\n",
448
+ "▁M is s ou la !: 1\n",
449
+ "▁D o: 1\n",
450
+ "▁you: 13\n",
451
+ "▁w an t: 1\n",
452
+ "▁to: 33\n",
453
+ "▁g et: 2\n",
454
+ "▁be t ter: 2\n",
455
+ "▁a t: 1\n",
456
+ "▁mak ing: 2\n",
457
+ "▁d e l ic i ou s: 1\n",
458
+ "▁BBQ ?: 1\n",
459
+ "▁ Y ou: 1\n",
460
+ "▁will: 6\n",
461
+ "▁have: 4\n",
462
+ "▁the: 31\n"
463
+ ]
464
+ }
465
+ ],
466
+ "source": [
467
+ "sp_vocab = get_sentence_piece_vocab(vocab)\n",
468
+ "show_vocab(sp_vocab) "
469
+ ]
470
+ },
471
+ {
472
+ "cell_type": "markdown",
473
+ "metadata": {},
474
+ "source": [
475
+ "## Train SentencePiece BPE Tokenizer on Example Data\n",
476
+ "### Explore SentencePiece Model\n",
477
+ "First, explore the SentencePiece model provided with this week's assignment. Remember you can always use Python's built in `help` command to see the documentation for any object or method."
478
+ ]
479
+ },
480
+ {
481
+ "cell_type": "code",
482
+ "execution_count": 16,
483
+ "metadata": {},
484
+ "outputs": [],
485
+ "source": [
486
+ "import sentencepiece as spm\n",
487
+ "sp = spm.SentencePieceProcessor(model_file='./data/sentencepiece.model')"
488
+ ]
489
+ },
490
+ {
491
+ "cell_type": "code",
492
+ "execution_count": 17,
493
+ "metadata": {},
494
+ "outputs": [],
495
+ "source": [
496
+ "# help(sp)"
497
+ ]
498
+ },
499
+ {
500
+ "cell_type": "markdown",
501
+ "metadata": {},
502
+ "source": [
503
+ "Try it out on the first sentence of the example text."
504
+ ]
505
+ },
506
+ {
507
+ "cell_type": "code",
508
+ "execution_count": 19,
509
+ "metadata": {},
510
+ "outputs": [],
511
+ "source": [
512
+ "s0 = 'Beginners BBQ Class Taking Place in Missoula!'"
513
+ ]
514
+ },
515
+ {
516
+ "cell_type": "code",
517
+ "execution_count": 20,
518
+ "metadata": {},
519
+ "outputs": [
520
+ {
521
+ "name": "stdout",
522
+ "output_type": "stream",
523
+ "text": [
524
+ "['▁Beginn', 'ers', '▁BBQ', '▁Class', '▁', 'Taking', '▁Place', '▁in', '▁Miss', 'oul', 'a', '!']\n",
525
+ "[12847, 277, 15068, 4501, 3, 12297, 3399, 16, 5964, 7115, 9, 55]\n",
526
+ "Beginners BBQ Class Taking Place in Missoula!\n",
527
+ "Beginners\n"
528
+ ]
529
+ }
530
+ ],
531
+ "source": [
532
+ "# encode: text => id\n",
533
+ "print(sp.encode_as_pieces(s0))\n",
534
+ "print(sp.encode_as_ids(s0))\n",
535
+ "\n",
536
+ "# decode: id => text\n",
537
+ "print(sp.decode_pieces(sp.encode_as_pieces(s0)))\n",
538
+ "print(sp.decode_ids([12847, 277]))"
539
+ ]
540
+ },
541
+ {
542
+ "cell_type": "markdown",
543
+ "metadata": {},
544
+ "source": [
545
+ "Notice how SentencePiece breaks the words into seemingly odd parts, but you have seen something similar with BPE. But how close was the model trained on the whole corpus of examples with a `vocab_size` of 32,000 instead of 455? Here you can also test what happens to white space, like '\\n'. \n",
546
+ "\n",
547
+ "But first note that SentencePiece encodes the SentencePieces, the tokens, and has reserved some of the ids as can be seen in this week's assignment."
548
+ ]
549
+ },
550
+ {
551
+ "cell_type": "code",
552
+ "execution_count": null,
553
+ "metadata": {},
554
+ "outputs": [],
555
+ "source": [
556
+ "uid = 15068\n",
557
+ "spiece = \"\\u2581BBQ\"\n",
558
+ "unknown = \"__MUST_BE_UNKNOWN__\"\n",
559
+ "\n",
560
+ "# id <=> piece conversion\n",
561
+ "print(f'SentencePiece for ID {uid}: {sp.id_to_piece(uid)}')\n",
562
+ "print(f'ID for Sentence Piece {spiece}: {sp.piece_to_id(spiece)}')\n",
563
+ "\n",
564
+ "# returns 0 for unknown tokens (we can change the id for UNK)\n",
565
+ "print(f'ID for unknown text {unknown}: {sp.piece_to_id(unknown)}')"
566
+ ]
567
+ },
568
+ {
569
+ "cell_type": "code",
570
+ "execution_count": null,
571
+ "metadata": {},
572
+ "outputs": [],
573
+ "source": [
574
+ "print(f'Beginning of sentence id: {sp.bos_id()}')\n",
575
+ "print(f'Pad id: {sp.pad_id()}')\n",
576
+ "print(f'End of sentence id: {sp.eos_id()}')\n",
577
+ "print(f'Unknown id: {sp.unk_id()}')\n",
578
+ "print(f'Vocab size: {sp.vocab_size()}')"
579
+ ]
580
+ },
581
+ {
582
+ "cell_type": "markdown",
583
+ "metadata": {},
584
+ "source": [
585
+ "You can also check what are the ids for the first part and last part of the vocabulary."
586
+ ]
587
+ },
588
+ {
589
+ "cell_type": "code",
590
+ "execution_count": null,
591
+ "metadata": {},
592
+ "outputs": [],
593
+ "source": [
594
+ "print('\\nId\\tSentP\\tControl?')\n",
595
+ "print('------------------------')\n",
596
+ "# <unk>, <s>, </s> are defined by default. Their ids are (0, 1, 2)\n",
597
+ "# <s> and </s> are defined as 'control' symbol.\n",
598
+ "for uid in range(10):\n",
599
+ " print(uid, sp.id_to_piece(uid), sp.is_control(uid), sep='\\t')\n",
600
+ " \n",
601
+ "# for uid in range(sp.vocab_size()-10,sp.vocab_size()):\n",
602
+ "# print(uid, sp.id_to_piece(uid), sp.is_control(uid), sep='\\t')"
603
+ ]
604
+ },
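+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now you can test what happens to other white space, like '\\n': per the note above, tabs and newlines are converted to plain spaces before tokenization."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Newlines and tabs are treated as plain white space\n",
+ "s1 = 'Tokens\\nare\\ttricky.'\n",
+ "print(sp.encode_as_pieces(s1))\n",
+ "print(sp.decode_pieces(sp.encode_as_pieces(s1)))"
+ ]
+ },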
605
+ {
606
+ "cell_type": "markdown",
607
+ "metadata": {},
608
+ "source": [
609
+ "### Train SentencePiece BPE model with our example.txt"
610
+ ]
611
+ },
612
+ {
613
+ "cell_type": "markdown",
614
+ "metadata": {},
615
+ "source": [
616
+ "Finally, train your own BPE model directly from the SentencePiece library and compare it to the results of the implemention of the algorithm from the BPE paper itself."
617
+ ]
618
+ },
619
+ {
620
+ "cell_type": "code",
621
+ "execution_count": null,
622
+ "metadata": {},
623
+ "outputs": [],
624
+ "source": [
625
+ "spm.SentencePieceTrainer.train('--input=example.txt --model_prefix=example_bpe --vocab_size=450 --model_type=bpe')\n",
626
+ "sp_bpe = spm.SentencePieceProcessor()\n",
627
+ "sp_bpe.load('example_bpe.model')\n",
628
+ "\n",
629
+ "print('*** BPE ***')\n",
630
+ "print(sp_bpe.encode_as_pieces(s0))"
631
+ ]
632
+ },
633
+ {
634
+ "cell_type": "code",
635
+ "execution_count": null,
636
+ "metadata": {},
637
+ "outputs": [],
638
+ "source": [
639
+ "show_vocab(sp_vocab, end = ', ')"
640
+ ]
641
+ },
642
+ {
643
+ "cell_type": "markdown",
644
+ "metadata": {},
645
+ "source": [
646
+ "The implementation of BPE's code from the paper matches up pretty well with the library itself! The differences are probably accounted for by the `vocab_size`. There is also another technical difference in that in the SentencePiece implementation of BPE a priority queue is used to more efficiently keep track of the *best pairs*. Actually, there is a priority queue in the Python standard library called `heapq` if you would like to give that a try below! "
647
+ ]
648
+ },
649
+ {
650
+ "cell_type": "markdown",
651
+ "metadata": {},
652
+ "source": [
653
+ "## Optionally try to implement BPE using a priority queue below"
654
+ ]
655
+ },
656
+ {
657
+ "cell_type": "code",
658
+ "execution_count": null,
659
+ "metadata": {},
660
+ "outputs": [],
661
+ "source": [
662
+ "from heapq import heappush, heappop"
663
+ ]
664
+ },
665
+ {
666
+ "cell_type": "code",
667
+ "execution_count": null,
668
+ "metadata": {},
669
+ "outputs": [],
670
+ "source": [
671
+ "def heapsort(iterable):\n",
672
+ " h = []\n",
673
+ " for value in iterable:\n",
674
+ " heappush(h, value)\n",
675
+ " return [heappop(h) for i in range(len(h))]"
676
+ ]
677
+ },
678
+ {
679
+ "cell_type": "code",
680
+ "execution_count": null,
681
+ "metadata": {},
682
+ "outputs": [],
683
+ "source": [
684
+ "a = [1,4,3,1,3,2,1,4,2]\n",
685
+ "heapsort(a)"
686
+ ]
687
+ },
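+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a starting point, here is a minimal sketch, one possible approach rather than the SentencePiece implementation: it uses a heap only to select the best pair each round and rebuilds it per merge, so it is illustrative rather than faster. Note that ties may be broken differently than with `max`, so a few merges can differ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def get_sentence_piece_vocab_pq(vocab, frac_merges=0.60):\n",
+ " sp_vocab = vocab.copy()\n",
+ " num_merges = int(len(sp_vocab)*frac_merges)\n",
+ "\n",
+ " for _ in range(num_merges):\n",
+ " pairs = get_stats(sp_vocab)\n",
+ " # Min-heap on negative frequency, so the most frequent pair pops first\n",
+ " h = []\n",
+ " for pair, freq in pairs.items():\n",
+ " heappush(h, (-freq, pair))\n",
+ " _, best = heappop(h)\n",
+ " sp_vocab = merge_vocab(best, sp_vocab)\n",
+ "\n",
+ " return sp_vocab\n",
+ "\n",
+ "sp_vocab_pq = get_sentence_piece_vocab_pq(vocab)\n",
+ "show_vocab(sp_vocab_pq)"
+ ]
+ },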
688
+ {
689
+ "cell_type": "markdown",
690
+ "metadata": {},
691
+ "source": [
692
+ "For a more extensive example consider looking at the [SentencePiece repo](https://github.com/google/sentencepiece/blob/master/python/sentencepiece_python_module_example.ipynb). The last few sections of this code were repurposed from that tutorial. Thanks for your participation! Next stop BERT and T5!"
693
+ ]
694
+ },
695
+ {
696
+ "cell_type": "code",
697
+ "execution_count": null,
698
+ "metadata": {},
699
+ "outputs": [],
700
+ "source": []
701
+ }
702
+ ],
703
+ "metadata": {
704
+ "kernelspec": {
705
+ "display_name": "Python 3 (ipykernel)",
706
+ "language": "python",
707
+ "name": "python3"
708
+ },
709
+ "language_info": {
710
+ "codemirror_mode": {
711
+ "name": "ipython",
712
+ "version": 3
713
+ },
714
+ "file_extension": ".py",
715
+ "mimetype": "text/x-python",
716
+ "name": "python",
717
+ "nbconvert_exporter": "python",
718
+ "pygments_lexer": "ipython3",
719
+ "version": "3.10.11"
720
+ }
721
+ },
722
+ "nbformat": 4,
723
+ "nbformat_minor": 4
724
+ }
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/data.txt ADDED
@@ -0,0 +1,5 @@
1
+ {'content-length': b'1970', 'content-type': b'text/plain', 'text': b'Beginners BBQ Class Taking Place in Missoula!\nDo you want to get better at making delicious BBQ? You will have the opportunity, put this on your calendar now. Thursday, September 22nd join World Class BBQ Champion, Tony Balay from Lonestar Smoke Rangers. He will be teaching a beginner level class for everyone who wants to get better with their culinary skills.\nHe will teach you everything you need to know to compete in a KCBS BBQ competition, including techniques, recipes, timelines, meat selection and trimming, plus smoker and fire information.\nThe cost to be in the class is $35 per person, and for spectators it is free. Included in the cost will be either a t-shirt or apron and you will be tasting samples of each meat that is prepared.', 'timestamp': b'2019-04-25T12:57:54Z', 'url': b'https://klyq.com/beginners-bbq-class-taking-place-in-missoula/'}
2
+ {'content-length': b'12064', 'content-type': b'text/plain', 'text': b'Discussion in \'Mac OS X Lion (10.7)\' started by axboi87, Jan 20, 2012.\nI\'ve got a 500gb internal drive and a 240gb SSD.\nWhen trying to restore using disk utility i\'m given the error "Not enough space on disk ____ to restore"\nBut I shouldn\'t have to do that!!!\nAny ideas or workarounds before resorting to the above?\nUse Carbon Copy Cloner to copy one drive to the other. I\'ve done this several times going from larger HDD to smaller SSD and I wound up with a bootable SSD drive. One step you have to remember not to skip is to use Disk Utility to partition the SSD as GUID partition scheme HFS+ before doing the clone. If it came Apple Partition Scheme, even if you let CCC do the clone, the resulting drive won\'t be bootable. CCC usually works in "file mode" and it can easily copy a larger drive (that\'s mostly empty) onto a smaller drive. If you tell CCC to clone a drive you did NOT boot from, it can work in block copy mode where the destination drive must be the same size or larger than the drive you are cloning from (if I recall).\nI\'ve actually done this somehow on Disk Utility several times (booting from a different drive (or even the dvd) so not running disk utility from the drive your cloning) and had it work just fine from larger to smaller bootable clone. Definitely format the drive cloning to first, as bootable Apple etc..\nThanks for pointing this out. My only experience using DU to go larger to smaller was when I was trying to make a Lion install stick and I was unable to restore InstallESD.dmg to a 4 GB USB stick but of course the reason that wouldn\'t fit is there was slightly more than 4 GB of data.', 'timestamp': b'2019-04-21T10:07:13Z', 'url':b'https://forums.macrumors.com/threads/restore-from-larger-disk-to-smaller-disk.1311329/'}
3
+ {'content-length': b'5235', 'content-type': b'text/plain', 'text': b'Foil plaid lycra and spandex shortall with metallic slinky insets. Attached metallic elastic belt with O-ring. Headband included. Great hip hop or jazz dance costume. Made in the USA.', 'timestamp': b'2019-04-25T10:40:23Z', 'url': b'https://awishcometrue.com/Catalogs/Clearance/Tweens/V1960-Find-A-Way'}
4
+ {'content-length': b'4967', 'content-type': b'text/plain', 'text': b"How many backlinks per day for new site?\nDiscussion in 'Black Hat SEO' started by Omoplata, Dec 3, 2010.\n1) for a newly created site, what's the max # backlinks per day I should do to be safe?\n2) how long do I have to let my site age before I can start making more blinks?\nI did about 6000 forum profiles every 24 hours for 10 days for one of my sites which had a brand new domain.\nThere is three backlinks for every of these forum profile so thats 18 000 backlinks every 24 hours and nothing happened in terms of being penalized or sandboxed. This is now maybe 3 months ago and the site is ranking on first page for a lot of my targeted keywords.\nbuild more you can in starting but do manual submission and not spammy type means manual + relevant to the post.. then after 1 month you can make a big blast..\nWow, dude, you built 18k backlinks a day on a brand new site? How quickly did you rank up? What kind of competition/searches did those keywords have?", 'timestamp': b'2019-04-21T12:46:19Z', 'url': b'https://www.blackhatworld.com/seo/how-many-backlinks-per-day-for-new-site.258615/'}
5
+ {'content-length': b'4499', 'content-type': b'text/plain', 'text': b'The Denver Board of Education opened the 2017-18 school year with an update on projects that include new construction, upgrades, heat mitigation and quality learning environments.\nWe are excited that Denver students will be the beneficiaries of a four year, $572 million General Obligation Bond. Since the passage of the bond, our construction team has worked to schedule the projects over the four-year term of the bond.\nDenver voters on Tuesday approved bond and mill funding measures for students in Denver Public Schools, agreeing to invest $572 million in bond funding to build and improve schools and $56.6 million in operating dollars to support proven initiatives, such as early literacy.\nDenver voters say yes to bond and mill levy funding support for DPS students and schools. Click to learn more about the details of the voter-approved bond measure.\nDenver voters on Nov. 8 approved bond and mill funding measures for DPS students and schools. Learn more about what\xe2\x80\x99s included in the mill levy measure.', 'timestamp': b'2019-04-20T14:33:21Z', 'url': b'http://bond.dpsk12.org/category/news/'}
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/example.txt ADDED
@@ -0,0 +1,30 @@
1
+ Beginners BBQ Class Taking Place in Missoula!
2
+ Do you want to get better at making delicious BBQ? You will have the opportunity, put this on your calendar now. Thursday, September 22nd join World Class BBQ Champion, Tony Balay from Lonestar Smoke Rangers. He will be teaching a beginner level class for everyone who wants to get better with their culinary skills.
3
+ He will teach you everything you need to know to compete in a KCBS BBQ competition, including techniques, recipes, timelines, meat selection and trimming, plus smoker and fire information.
4
+ The cost to be in the class is $35 per person, and for spectators it is free. Included in the cost will be either a t-shirt or apron and you will be tasting samples of each meat that is prepared.
5
+
6
+ Discussion in 'Mac OS X Lion (10.7)' started by axboi87, Jan 20, 2012.
7
+ I've got a 500gb internal drive and a 240gb SSD.
8
+ When trying to restore using disk utility i'm given the error "Not enough space on disk ____ to restore"
9
+ But I shouldn't have to do that!!!
10
+ Any ideas or workarounds before resorting to the above?
11
+ Use Carbon Copy Cloner to copy one drive to the other. I've done this several times going from larger HDD to smaller SSD and I wound up with a bootable SSD drive. One step you have to remember not to skip is to use Disk Utility to partition the SSD as GUID partition scheme HFS+ before doing the clone. If it came Apple Partition Scheme, even if you let CCC do the clone, the resulting drive won't be bootable. CCC usually works in "file mode" and it can easily copy a larger drive (that's mostly empty) onto a smaller drive. If you tell CCC to clone a drive you did NOT boot from, it can work in block copy mode where the destination drive must be the same size or larger than the drive you are cloning from (if I recall).
12
+ I've actually done this somehow on Disk Utility several times (booting from a different drive (or even the dvd) so not running disk utility from the drive your cloning) and had it work just fine from larger to smaller bootable clone. Definitely format the drive cloning to first, as bootable Apple etc..
13
+ Thanks for pointing this out. My only experience using DU to go larger to smaller was when I was trying to make a Lion install stick and I was unable to restore InstallESD.dmg to a 4 GB USB stick but of course the reason that wouldn't fit is there was slightly more than 4 GB of data.
14
+
15
+ Foil plaid lycra and spandex shortall with metallic slinky insets. Attached metallic elastic belt with O-ring. Headband included. Great hip hop or jazz dance costume. Made in the USA.
16
+
17
+ How many backlinks per day for new site?
18
+ Discussion in 'Black Hat SEO' started by Omoplata, Dec 3, 2010.
19
+ 1) for a newly created site, what's the max # backlinks per day I should do to be safe?
20
+ 2) how long do I have to let my site age before I can start making more blinks?
21
+ I did about 6000 forum profiles every 24 hours for 10 days for one of my sites which had a brand new domain.
22
+ There is three backlinks for every of these forum profile so thats 18 000 backlinks every 24 hours and nothing happened in terms of being penalized or sandboxed. This is now maybe 3 months ago and the site is ranking on first page for a lot of my targeted keywords.
23
+ build more you can in starting but do manual submission and not spammy type means manual + relevant to the post.. then after 1 month you can make a big blast..
24
+ Wow, dude, you built 18k backlinks a day on a brand new site? How quickly did you rank up? What kind of competition/searches did those keywords have?
25
+
26
+ The Denver Board of Education opened the 2017-18 school year with an update on projects that include new construction, upgrades, heat mitigation and quality learning environments.
27
+ We are excited that Denver students will be the beneficiaries of a four year, $572 million General Obligation Bond. Since the passage of the bond, our construction team has worked to schedule the projects over the four-year term of the bond.
28
+ Denver voters on Tuesday approved bond and mill funding measures for students in Denver Public Schools, agreeing to invest $572 million in bond funding to build and improve schools and $56.6 million in operating dollars to support proven initiatives, such as early literacy.
29
+ Denver voters say yes to bond and mill levy funding support for DPS students and schools. Click to learn more about the details of the voter-approved bond measure.
30
+ Denver voters on Nov. 8 approved bond and mill funding measures for DPS students and schools. Learn more about what’s included in the mill levy measure.
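The example.txt corpus above, together with the example_bpe.model / example_bpe.vocab pair added below, follows the usual SentencePiece BPE training flow. A minimal sketch of how such a pair is typically produced, assuming the `sentencepiece` Python package; `vocab_size=450` is inferred from the vocab file's `@@ -0,0 +1,450 @@` hunk header, while the other options are assumptions rather than the lab's recorded settings:

```python
# Minimal sketch: training a BPE model/vocab pair like the one added below.
# vocab_size=450 is inferred from the hunk header of example_bpe.vocab;
# everything else here is an assumption, not the lab's recorded settings.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="example.txt",         # the 30-line corpus added above
    model_prefix="example_bpe",  # writes example_bpe.model and example_bpe.vocab
    vocab_size=450,
    model_type="bpe",
)
```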
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/example_bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:72964ffad404e80f7c7450e333730318ecb18b36c94cd8191b23b4461283610b
3
+ size 243359
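The three lines above are a Git LFS pointer (spec version, sha256 oid, byte size), not the serialized model itself; the 243359-byte object has to be fetched (e.g. with `git lfs pull`) before it can be used. A minimal usage sketch, assuming the real bytes are present; the sample sentence is taken from example.txt and the comment about pieces is only illustrative:

```python
# Minimal sketch: loading the fetched model and encoding one line of
# example.txt. Requires the actual model bytes (git lfs pull), since the
# pointer file above only records the object's oid and size.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="data/example_bpe.model")
pieces = sp.encode("Beginners BBQ Class Taking Place in Missoula!", out_type=str)
print(pieces)  # subword pieces; '▁' marks a word-initial piece
```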
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/example_bpe.vocab ADDED
@@ -0,0 +1,450 @@
1
+ <unk> 0
2
+ <s> 0
3
+ </s> 0
4
+ ▁t -0
5
+ in -1
6
+ on -2
7
+ ▁a -3
8
+ er -4
9
+ ▁s -5
10
+ ▁th -6
11
+ or -7
12
+ ▁b -8
13
+ ▁d -9
14
+ ▁m -10
15
+ it -11
16
+ nd -12
17
+ ou -13
18
+ ▁f -14
19
+ ing -15
20
+ ▁the -16
21
+ ▁to -17
22
+ ve -18
23
+ ▁w -19
24
+ ar -20
25
+ ▁c -21
26
+ at -22
27
+ ll -23
28
+ ▁in -24
29
+ re -25
30
+ en -26
31
+ is -27
32
+ le -28
33
+ st -29
34
+ ion -30
35
+ ▁and -31
36
+ an -32
37
+ ▁p -33
38
+ ot -34
39
+ ▁y -35
40
+ as -36
41
+ ed -37
42
+ ▁o -38
43
+ ch -39
44
+ ro -40
45
+ ▁D -41
46
+ ▁I -42
47
+ ▁e -43
48
+ ▁be -44
49
+ ▁h -45
50
+ ▁for -46
51
+ ▁you -47
52
+ ill -48
53
+ ive -49
54
+ ver -50
55
+ ▁of -51
56
+ ▁n -52
57
+ all -53
58
+ ▁dr -54
59
+ ▁on -55
60
+ ▁drive -56
61
+ ck -57
62
+ es -58
63
+ ▁u -59
64
+ ore -60
65
+ ▁st -61
66
+ et -62
67
+ il -63
68
+ ud -64
69
+ ▁C -65
70
+ ▁S -66
71
+ ▁re -67
72
+ al -68
73
+ ay -69
74
+ pp -70
75
+ ▁2 -71
76
+ ▁B -72
77
+ ▁T -73
78
+ ▁l -74
79
+ lin -75
80
+ ▁cl -76
81
+ ▁co -77
82
+ ks -78
83
+ me -79
84
+ ow -80
85
+ ts -81
86
+ ▁H -82
87
+ ond -83
88
+ one -84
89
+ ▁do -85
90
+ ▁ha -86
91
+ ▁is -87
92
+ ly -88
93
+ mp -89
94
+ art -90
95
+ rom -91
96
+ ▁le -92
97
+ ▁me -93
98
+ ▁bond -94
99
+ ▁from -95
100
+ ▁mill -96
101
+ ic -97
102
+ id -98
103
+ la -99
104
+ se -100
105
+ ▁g -101
106
+ arg -102
107
+ ers -103
108
+ ite -104
109
+ ith -105
110
+ ity -106
111
+ oot -107
112
+ our -108
113
+ ▁Th -109
114
+ ▁ne -110
115
+ ▁wh -111
116
+ ▁Den -112
117
+ ▁sch -113
118
+ links -114
119
+ ▁that -115
120
+ ▁will -116
121
+ ▁Denver -117
122
+ SD -118
123
+ ab -119
124
+ ak -120
125
+ ce -121
126
+ cl -122
127
+ ct -123
128
+ ir -124
129
+ ol -125
130
+ ▁( -126
131
+ ▁1 -127
132
+ ▁G -128
133
+ ▁O -129
134
+ ▁U -130
135
+ ▁W -131
136
+ ack -132
137
+ and -133
138
+ ass -134
139
+ isk -135
140
+ ool -136
141
+ ort -137
142
+ ▁bu -138
143
+ ▁it -139
144
+ ▁or -140
145
+ ▁sm -141
146
+ ▁te -142
147
+ able -143
148
+ clud -144
149
+ ents -145
150
+ rove -146
151
+ very -147
152
+ ▁can -148
153
+ ▁new -149
154
+ ▁wor -150
155
+ arger -151
156
+ ation -152
157
+ ition -153
158
+ ▁back -154
159
+ ▁boot -155
160
+ ▁have -156
161
+ ▁more -157
162
+ ▁site -158
163
+ ▁with -159
164
+ ▁every -160
165
+ ▁larger -161
166
+ ▁backlinks -162
167
+ BQ -163
168
+ ig -164
169
+ ld -165
170
+ py -166
171
+ th -167
172
+ ▁$ -168
173
+ ▁A -169
174
+ ▁L -170
175
+ ▁k -171
176
+ ▁v -172
177
+ ach -173
178
+ asu -174
179
+ ear -175
180
+ ick -176
181
+ out -177
182
+ ter -178
183
+ til -179
184
+ und -180
185
+ ▁20 -181
186
+ ▁Cl -182
187
+ ▁ab -183
188
+ ▁sp -184
189
+ ▁su -185
190
+ ▁up -186
191
+ ools -187
192
+ ▁BBQ -188
193
+ ▁SSD -189
194
+ ▁day -190
195
+ ▁did -191
196
+ ▁mak -192
197
+ ▁not -193
198
+ ▁pro -194
199
+ ▁vot -195
200
+ ▁was -196
201
+ aller -197
202
+ asure -198
203
+ ▁fund -199
204
+ ▁stud -200
205
+ ▁this -201
206
+ ▁work -202
207
+ tility -203
208
+ ▁clone -204
209
+ ▁start -205
210
+ ▁includ -206
211
+ ▁funding -207
212
+ ▁measure -208
213
+ ▁smaller -209
214
+ ▁bootable -210
215
+ ▁students -211
216
+ .. -212
217
+ 00 -213
218
+ ad -214
219
+ ec -215
220
+ fi -216
221
+ ge -217
222
+ if -218
223
+ im -219
224
+ ip -220
225
+ qu -221
226
+ ru -222
227
+ us -223
228
+ ▁M -224
229
+ ▁P -225
230
+ ▁j -226
231
+ ere -227
232
+ ree -228
233
+ ▁$5 -229
234
+ ▁24 -230
235
+ ▁CC -231
236
+ ▁He -232
237
+ ▁as -233
238
+ ▁mo -234
239
+ ▁my -235
240
+ ▁sa -236
241
+ ▁se -237
242
+ ▁sh -238
243
+ ▁so -239
244
+ ▁tr -240
245
+ ▁us -241
246
+ file -242
247
+ fore -243
248
+ mpet -244
249
+ ould -245
250
+ sion -246
251
+ ▁201 -247
252
+ ▁CCC -248
253
+ ▁man -249
254
+ ▁per -250
255
+ ction -251
256
+ oning -252
257
+ pport -253
258
+ roved -254
259
+ store -255
260
+ ▁buil -256
261
+ ▁copy -257
262
+ ▁cost -258
263
+ ▁disk -259
264
+ ▁about -260
265
+ pproved -261
266
+ ▁before -262
267
+ ▁compet -263
268
+ ▁voters -264
269
+ artition -265
270
+ ▁cloning -266
271
+ ▁million -267
272
+ ▁restore -268
273
+ ▁schools -269
274
+ 0. -270
275
+ 72 -271
276
+ PS -272
277
+ __ -273
278
+ ac -274
279
+ am -275
280
+ bl -276
281
+ bo -277
282
+ de -278
283
+ ds -279
284
+ ef -280
285
+ ep -281
286
+ ey -282
287
+ gb -283
288
+ iz -284
289
+ lt -285
290
+ mb -286
291
+ mo -287
292
+ um -288
293
+ ut -289
294
+ vy -290
295
+ ▁" -291
296
+ ▁' -292
297
+ ▁3 -293
298
+ ▁4 -294
299
+ ▁N -295
300
+ ▁i -296
301
+ ▁r -297
302
+ 000 -298
303
+ age -299
304
+ ank -300
305
+ ant -301
306
+ arn -302
307
+ ata -303
308
+ cus -304
309
+ day -305
310
+ eme -306
311
+ erm -307
312
+ eyw -308
313
+ gin -309
314
+ ici -310
315
+ jec -311
316
+ oin -312
317
+ per -313
318
+ ual -314
319
+ ust -315
320
+ ven -316
321
+ ▁18 -317
322
+ ▁GB -318
323
+ ▁If -319
324
+ ▁In -320
325
+ ▁US -321
326
+ ▁Wh -322
327
+ ▁ag -323
328
+ ▁br -324
329
+ ▁by -325
330
+ ▁ca -326
331
+ ▁de -327
332
+ ▁en -328
333
+ ▁ex -329
334
+ ▁go -330
335
+ ▁qu -331
336
+ ▁sk -332
337
+ ally -333
338
+ ened -334
339
+ ginn -335
340
+ imes -336
341
+ irst -337
342
+ last -338
343
+ mber -339
344
+ onst -340
345
+ onth -341
346
+ ords -342
347
+ ound -343
348
+ ours -344
349
+ pple -345
350
+ reat -346
351
+ tter -347
352
+ ying -348
353
+ ▁DPS -349
354
+ ▁Dis -350
355
+ ▁How -351
356
+ ▁Sch -352
357
+ ▁The -353
358
+ ▁are -354
359
+ ▁but -355
360
+ ▁get -356
361
+ ▁had -357
362
+ ▁let -358
363
+ ▁met -359
364
+ ▁now -360
365
+ ▁one -361
366
+ ▁rec -362
367
+ ▁res -363
368
+ allic -364
369
+ jects -365
370
+ ouldn -366
371
+ stall -367
372
+ ually -368
373
+ veral -369
374
+ ▁$572 -370
375
+ ▁Disk -371
376
+ ▁Lion -372
377
+ ▁done -373
378
+ ▁ -374
379
+ e -375
380
+ o -376
381
+ t -377
382
+ n -378
383
+ a -379
384
+ i -380
385
+ r -381
386
+ s -382
387
+ l -383
388
+ d -384
389
+ h -385
390
+ u -386
391
+ c -387
392
+ m -388
393
+ y -389
394
+ p -390
395
+ b -391
396
+ f -392
397
+ g -393
398
+ v -394
399
+ w -395
400
+ k -396
401
+ . -397
402
+ , -398
403
+ D -399
404
+ S -400
405
+ B -401
406
+ C -402
407
+ I -403
408
+ 0 -404
409
+ ' -405
410
+ 2 -406
411
+ 1 -407
412
+ T -408
413
+ ? -409
414
+ H -410
415
+ ) -411
416
+ O -412
417
+ U -413
418
+ x -414
419
+ ( -415
420
+ - -416
421
+ 4 -417
422
+ 5 -418
423
+ 7 -419
424
+ 8 -420
425
+ A -421
426
+ G -422
427
+ P -423
428
+ W -424
429
+ j -425
430
+ ! -426
431
+ " -427
432
+ $ -428
433
+ L -429
434
+ M -430
435
+ Q -431
436
+ _ -432
437
+ z -433
438
+ 3 -434
439
+ 6 -435
440
+ E -436
441
+ N -437
442
+ q -438
443
+ + -439
444
+ F -440
445
+ # -441
446
+ / -442
447
+ J -443
448
+ K -444
449
+ R -445
450
+ X -446
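The listing above shows the standard SentencePiece `.vocab` layout: one piece per line, tab-separated from its score, with the special tokens `<unk>`, `<s>`, `</s>` pinned at score 0 and every BPE merge scored by its negative merge rank (`▁t -0` is the first merge, `in -1` the second, and so on). A minimal parsing sketch, assuming the file has been fetched from LFS:

```python
# Minimal sketch: parsing the .vocab listing above. Each line is
# "<piece>\t<score>"; for a BPE model the score is the negative merge rank,
# so earlier (more frequent) merges sit nearer the top of the file.
vocab = {}
with open("data/example_bpe.vocab", encoding="utf-8") as f:
    for rank, line in enumerate(f):
        piece, score = line.rstrip("\n").split("\t")
        vocab[piece] = (rank, float(score))

print(vocab["▁the"])  # (19, -16.0): the 20th piece overall, merge rank 16
```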
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/data/sentencepiece.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
3
+ size 791656
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/example.txt ADDED
@@ -0,0 +1,30 @@
1
+ Beginners BBQ Class Taking Place in Missoula!
2
+ Do you want to get better at making delicious BBQ? You will have the opportunity, put this on your calendar now. Thursday, September 22nd join World Class BBQ Champion, Tony Balay from Lonestar Smoke Rangers. He will be teaching a beginner level class for everyone who wants to get better with their culinary skills.
3
+ He will teach you everything you need to know to compete in a KCBS BBQ competition, including techniques, recipes, timelines, meat selection and trimming, plus smoker and fire information.
4
+ The cost to be in the class is $35 per person, and for spectators it is free. Included in the cost will be either a t-shirt or apron and you will be tasting samples of each meat that is prepared.
5
+
6
+ Discussion in 'Mac OS X Lion (10.7)' started by axboi87, Jan 20, 2012.
7
+ I've got a 500gb internal drive and a 240gb SSD.
8
+ When trying to restore using disk utility i'm given the error "Not enough space on disk ____ to restore"
9
+ But I shouldn't have to do that!!!
10
+ Any ideas or workarounds before resorting to the above?
11
+ Use Carbon Copy Cloner to copy one drive to the other. I've done this several times going from larger HDD to smaller SSD and I wound up with a bootable SSD drive. One step you have to remember not to skip is to use Disk Utility to partition the SSD as GUID partition scheme HFS+ before doing the clone. If it came Apple Partition Scheme, even if you let CCC do the clone, the resulting drive won't be bootable. CCC usually works in "file mode" and it can easily copy a larger drive (that's mostly empty) onto a smaller drive. If you tell CCC to clone a drive you did NOT boot from, it can work in block copy mode where the destination drive must be the same size or larger than the drive you are cloning from (if I recall).
12
+ I've actually done this somehow on Disk Utility several times (booting from a different drive (or even the dvd) so not running disk utility from the drive your cloning) and had it work just fine from larger to smaller bootable clone. Definitely format the drive cloning to first, as bootable Apple etc..
13
+ Thanks for pointing this out. My only experience using DU to go larger to smaller was when I was trying to make a Lion install stick and I was unable to restore InstallESD.dmg to a 4 GB USB stick but of course the reason that wouldn't fit is there was slightly more than 4 GB of data.
14
+
15
+ Foil plaid lycra and spandex shortall with metallic slinky insets. Attached metallic elastic belt with O-ring. Headband included. Great hip hop or jazz dance costume. Made in the USA.
16
+
17
+ How many backlinks per day for new site?
18
+ Discussion in 'Black Hat SEO' started by Omoplata, Dec 3, 2010.
19
+ 1) for a newly created site, what's the max # backlinks per day I should do to be safe?
20
+ 2) how long do I have to let my site age before I can start making more blinks?
21
+ I did about 6000 forum profiles every 24 hours for 10 days for one of my sites which had a brand new domain.
22
+ There is three backlinks for every of these forum profile so thats 18 000 backlinks every 24 hours and nothing happened in terms of being penalized or sandboxed. This is now maybe 3 months ago and the site is ranking on first page for a lot of my targeted keywords.
23
+ build more you can in starting but do manual submission and not spammy type means manual + relevant to the post.. then after 1 month you can make a big blast..
24
+ Wow, dude, you built 18k backlinks a day on a brand new site? How quickly did you rank up? What kind of competition/searches did those keywords have?
25
+
26
+ The Denver Board of Education opened the 2017-18 school year with an update on projects that include new construction, upgrades, heat mitigation and quality learning environments.
27
+ We are excited that Denver students will be the beneficiaries of a four year, $572 million General Obligation Bond. Since the passage of the bond, our construction team has worked to schedule the projects over the four-year term of the bond.
28
+ Denver voters on Tuesday approved bond and mill funding measures for students in Denver Public Schools, agreeing to invest $572 million in bond funding to build and improve schools and $56.6 million in operating dollars to support proven initiatives, such as early literacy.
29
+ Denver voters say yes to bond and mill levy funding support for DPS students and schools. Click to learn more about the details of the voter-approved bond measure.
30
+ Denver voters on Nov. 8 approved bond and mill funding measures for DPS students and schools. Learn more about what’s included in the mill levy measure.
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/example_bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e3c6c33b38015133abc0b477cfc51ef435a4741e2f907ae26760f61ce8ae85cb
3
+ size 243359
NLP with Attention Models/QA/BPE_algorithm/home/jovyan/work/example_bpe.vocab ADDED
@@ -0,0 +1,450 @@
1
+ <unk> 0
2
+ <s> 0
3
+ </s> 0
4
+ ▁t -0
5
+ in -1
6
+ on -2
7
+ ▁a -3
8
+ er -4
9
+ ▁s -5
10
+ ▁th -6
11
+ or -7
12
+ ▁b -8
13
+ ▁d -9
14
+ ▁m -10
15
+ it -11
16
+ nd -12
17
+ ou -13
18
+ ▁f -14
19
+ ing -15
20
+ ▁the -16
21
+ ▁to -17
22
+ ve -18
23
+ ▁w -19
24
+ ar -20
25
+ ▁c -21
26
+ at -22
27
+ ll -23
28
+ ▁in -24
29
+ re -25
30
+ en -26
31
+ is -27
32
+ le -28
33
+ st -29
34
+ ion -30
35
+ ▁and -31
36
+ an -32
37
+ ▁p -33
38
+ ot -34
39
+ ▁y -35
40
+ as -36
41
+ ed -37
42
+ ▁o -38
43
+ ch -39
44
+ ro -40
45
+ ▁D -41
46
+ ▁I -42
47
+ ▁e -43
48
+ ▁be -44
49
+ ▁h -45
50
+ ▁for -46
51
+ ▁you -47
52
+ ill -48
53
+ ive -49
54
+ ver -50
55
+ ▁of -51
56
+ ▁n -52
57
+ all -53
58
+ ▁dr -54
59
+ ▁on -55
60
+ ▁drive -56
61
+ ck -57
62
+ es -58
63
+ ▁u -59
64
+ ore -60
65
+ ▁st -61
66
+ et -62
67
+ il -63
68
+ ud -64
69
+ ▁C -65
70
+ ▁S -66
71
+ ▁re -67
72
+ al -68
73
+ ay -69
74
+ pp -70
75
+ ▁2 -71
76
+ ▁B -72
77
+ ▁T -73
78
+ ▁l -74
79
+ lin -75
80
+ ▁cl -76
81
+ ▁co -77
82
+ ks -78
83
+ me -79
84
+ ow -80
85
+ ts -81
86
+ ▁H -82
87
+ ond -83
88
+ one -84
89
+ ▁do -85
90
+ ▁ha -86
91
+ ▁is -87
92
+ ly -88
93
+ mp -89
94
+ art -90
95
+ rom -91
96
+ ▁le -92
97
+ ▁me -93
98
+ ▁bond -94
99
+ ▁from -95
100
+ ▁mill -96
101
+ ic -97
102
+ id -98
103
+ la -99
104
+ se -100
105
+ ▁g -101
106
+ arg -102
107
+ ers -103
108
+ ite -104
109
+ ith -105
110
+ ity -106
111
+ oot -107
112
+ our -108
113
+ ▁Th -109
114
+ ▁ne -110
115
+ ▁wh -111
116
+ ▁Den -112
117
+ ▁sch -113
118
+ links -114
119
+ ▁that -115
120
+ ▁will -116
121
+ ▁Denver -117
122
+ 00 -118
123
+ SD -119
124
+ ab -120
125
+ ak -121
126
+ ce -122
127
+ cl -123
128
+ ct -124
129
+ ir -125
130
+ ol -126
131
+ ▁( -127
132
+ ▁1 -128
133
+ ▁G -129
134
+ ▁O -130
135
+ ▁U -131
136
+ ▁W -132
137
+ ack -133
138
+ and -134
139
+ ass -135
140
+ isk -136
141
+ ool -137
142
+ ort -138
143
+ ▁bu -139
144
+ ▁it -140
145
+ ▁or -141
146
+ ▁sm -142
147
+ ▁te -143
148
+ able -144
149
+ clud -145
150
+ ents -146
151
+ rove -147
152
+ very -148
153
+ ▁can -149
154
+ ▁new -150
155
+ ▁wor -151
156
+ arger -152
157
+ ation -153
158
+ ition -154
159
+ ▁back -155
160
+ ▁boot -156
161
+ ▁have -157
162
+ ▁more -158
163
+ ▁site -159
164
+ ▁with -160
165
+ ▁every -161
166
+ ▁larger -162
167
+ ▁backlinks -163
168
+ BQ -164
169
+ ig -165
170
+ ld -166
171
+ py -167
172
+ th -168
173
+ ▁$ -169
174
+ ▁A -170
175
+ ▁L -171
176
+ ▁k -172
177
+ ▁v -173
178
+ ach -174
179
+ asu -175
180
+ ear -176
181
+ ick -177
182
+ out -178
183
+ ter -179
184
+ til -180
185
+ und -181
186
+ ▁20 -182
187
+ ▁Cl -183
188
+ ▁ab -184
189
+ ▁sp -185
190
+ ▁su -186
191
+ ▁up -187
192
+ ools -188
193
+ ▁BBQ -189
194
+ ▁SSD -190
195
+ ▁day -191
196
+ ▁did -192
197
+ ▁mak -193
198
+ ▁not -194
199
+ ▁pro -195
200
+ ▁vot -196
201
+ ▁was -197
202
+ aller -198
203
+ asure -199
204
+ ▁fund -200
205
+ ▁stud -201
206
+ ▁this -202
207
+ ▁work -203
208
+ tility -204
209
+ ▁clone -205
210
+ ▁start -206
211
+ ▁includ -207
212
+ ▁funding -208
213
+ ▁measure -209
214
+ ▁smaller -210
215
+ ▁bootable -211
216
+ ▁students -212
217
+ .. -213
218
+ CC -214
219
+ __ -215
220
+ ad -216
221
+ ec -217
222
+ fi -218
223
+ ge -219
224
+ if -220
225
+ im -221
226
+ ip -222
227
+ qu -223
228
+ ru -224
229
+ us -225
230
+ ▁M -226
231
+ ▁P -227
232
+ ▁j -228
233
+ ere -229
234
+ ree -230
235
+ ▁$5 -231
236
+ ▁24 -232
237
+ ▁He -233
238
+ ▁as -234
239
+ ▁mo -235
240
+ ▁my -236
241
+ ▁sa -237
242
+ ▁se -238
243
+ ▁sh -239
244
+ ▁so -240
245
+ ▁tr -241
246
+ ▁us -242
247
+ file -243
248
+ fore -244
249
+ mpet -245
250
+ ould -246
251
+ sion -247
252
+ ▁201 -248
253
+ ▁CCC -249
254
+ ▁man -250
255
+ ▁per -251
256
+ ction -252
257
+ oning -253
258
+ pport -254
259
+ roved -255
260
+ store -256
261
+ ▁buil -257
262
+ ▁copy -258
263
+ ▁cost -259
264
+ ▁disk -260
265
+ ▁about -261
266
+ pproved -262
267
+ ▁before -263
268
+ ▁compet -264
269
+ ▁voters -265
270
+ artition -266
271
+ ▁cloning -267
272
+ ▁million -268
273
+ ▁restore -269
274
+ ▁schools -270
275
+ !! -271
276
+ 0. -272
277
+ 72 -273
278
+ PS -274
279
+ ac -275
280
+ am -276
281
+ bl -277
282
+ bo -278
283
+ de -279
284
+ ds -280
285
+ ef -281
286
+ ep -282
287
+ ey -283
288
+ gb -284
289
+ iz -285
290
+ lt -286
291
+ mb -287
292
+ mo -288
293
+ um -289
294
+ ut -290
295
+ vy -291
296
+ ▁" -292
297
+ ▁' -293
298
+ ▁3 -294
299
+ ▁4 -295
300
+ ▁N -296
301
+ ▁i -297
302
+ ▁r -298
303
+ 000 -299
304
+ age -300
305
+ ank -301
306
+ ant -302
307
+ arn -303
308
+ ata -304
309
+ cus -305
310
+ day -306
311
+ eme -307
312
+ erm -308
313
+ eyw -309
314
+ gin -310
315
+ ici -311
316
+ jec -312
317
+ oin -313
318
+ per -314
319
+ ual -315
320
+ ust -316
321
+ ven -317
322
+ ▁18 -318
323
+ ▁GB -319
324
+ ▁If -320
325
+ ▁In -321
326
+ ▁US -322
327
+ ▁Wh -323
328
+ ▁ag -324
329
+ ▁br -325
330
+ ▁by -326
331
+ ▁ca -327
332
+ ▁de -328
333
+ ▁en -329
334
+ ▁ex -330
335
+ ▁go -331
336
+ ▁qu -332
337
+ ▁sk -333
338
+ ally -334
339
+ ened -335
340
+ ginn -336
341
+ imes -337
342
+ irst -338
343
+ last -339
344
+ mber -340
345
+ onst -341
346
+ onth -342
347
+ ords -343
348
+ ound -344
349
+ ours -345
350
+ pple -346
351
+ reat -347
352
+ tter -348
353
+ ying -349
354
+ ▁DPS -350
355
+ ▁Dis -351
356
+ ▁How -352
357
+ ▁Sch -353
358
+ ▁The -354
359
+ ▁are -355
360
+ ▁but -356
361
+ ▁get -357
362
+ ▁had -358
363
+ ▁let -359
364
+ ▁met -360
365
+ ▁now -361
366
+ ▁one -362
367
+ ▁rec -363
368
+ ▁res -364
369
+ allic -365
370
+ jects -366
371
+ ouldn -367
372
+ stall -368
373
+ ually -369
374
+ veral -370
375
+ ▁$572 -371
376
+ ▁Disk -372
377
+ ▁Lion -373
378
+ ▁ -374
379
+ e -375
380
+ o -376
381
+ t -377
382
+ n -378
383
+ a -379
384
+ i -380
385
+ r -381
386
+ s -382
387
+ l -383
388
+ d -384
389
+ h -385
390
+ u -386
391
+ c -387
392
+ m -388
393
+ y -389
394
+ p -390
395
+ b -391
396
+ f -392
397
+ g -393
398
+ v -394
399
+ w -395
400
+ k -396
401
+ . -397
402
+ , -398
403
+ D -399
404
+ S -400
405
+ B -401
406
+ C -402
407
+ I -403
408
+ 0 -404
409
+ ' -405
410
+ 2 -406
411
+ 1 -407
412
+ T -408
413
+ ? -409
414
+ H -410
415
+ ) -411
416
+ O -412
417
+ U -413
418
+ x -414
419
+ ( -415
420
+ - -416
421
+ 4 -417
422
+ 5 -418
423
+ 7 -419
424
+ 8 -420
425
+ A -421
426
+ G -422
427
+ P -423
428
+ W -424
429
+ j -425
430
+ ! -426
431
+ " -427
432
+ $ -428
433
+ L -429
434
+ M -430
435
+ Q -431
436
+ _ -432
437
+ z -433
438
+ 3 -434
439
+ 6 -435
440
+ E -436
441
+ N -437
442
+ q -438
443
+ + -439
444
+ F -440
445
+ # -441
446
+ / -442
447
+ J -443
448
+ K -444
449
+ R -445
450
+ X -446
NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/.ipynb_checkpoints/C4W3_HF_Lab1_QA_BERT-checkpoint.ipynb ADDED
@@ -0,0 +1,2110 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "u2UXutvEvpUj"
7
+ },
8
+ "source": [
9
+ "# Question Answering with BERT and HuggingFace\n",
10
+ "\n",
11
+ "You've seen how to use BERT and other transformer models for a wide range of natural language tasks, including machine translation, summarization, and question answering. Transformers have become the standard model for NLP, similar to convolutional models in computer vision. And all started with Attention!\n",
12
+ "\n",
13
+ "In practice, you'll rarely train a transformer model from scratch. Transformers tend to be very large, so they take time, money, and lots of data to train fully. Instead, you'll want to start with a pre-trained model and fine-tune it with your dataset if you need to.\n",
14
+ "\n",
15
+ "[Hugging Face](https://huggingface.co/) (🤗) is the best resource for pre-trained transformers. Their open-source libraries simplify downloading and using transformer models like BERT, T5, and GPT-2. And the best part, you can use them alongside either TensorFlow, PyTorch or Flax.\n",
16
+ "\n",
17
+ "In this notebook, you'll use 🤗 transformers to use the DistilBERT model for question answering."
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "markdown",
22
+ "metadata": {
23
+ "id": "tm675LmQvpUm"
24
+ },
25
+ "source": [
26
+ "## Pipelines\n",
27
+ "\n",
28
+ "Before fine-tuning a model, you will look at the pipelines from Hugging Face to use pre-trained transformer models for specific tasks. The `transformers` library provides pipelines for popular tasks like sentiment analysis, summarization, and text generation. A pipeline consists of a tokenizer, a model, and the model configuration. All these are packaged together into an easy-to-use object. Hugging Face makes life easier.\n",
29
+ "\n",
30
+ "Pipelines are intended to be used without fine-tuning and will often be immediately helpful in your projects. For example, `transformers` provides a pipeline for [question answering](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.QuestionAnsweringPipeline) that you can directly use to answer your questions if you give some context. Let's see how to do just that.\n",
31
+ "\n",
32
+ "You will import `pipeline` from `transformers` for creating pipelines."
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "code",
37
+ "execution_count": null,
38
+ "metadata": {
39
+ "id": "uNJGGbRWvpUm"
40
+ },
41
+ "outputs": [],
42
+ "source": [
43
+ "import os\n",
44
+ "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n",
45
+ "\n",
46
+ "from transformers import pipeline"
47
+ ]
48
+ },
49
+ {
50
+ "cell_type": "markdown",
51
+ "metadata": {
52
+ "id": "_CeFTIr7P3QR"
53
+ },
54
+ "source": [
55
+ "Now, you will create the pipeline for question-answering, which uses the [DistilBert](https://hf.co/distilbert-base-cased-distilled-squad) model for extractive question answering (i.e., answering questions with the exact wording provided in the context)."
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": null,
61
+ "metadata": {
62
+ "colab": {
63
+ "base_uri": "https://localhost:8080/",
64
+ "height": 177,
65
+ "referenced_widgets": [
66
+ "d7e158e614f44983b229d6dd0d8960f9",
67
+ "69aad11dbc914410b95f6c3cb17a2457",
68
+ "0302e718c6084fb0a96d92fd976738dc",
69
+ "72b47b116b0b4125a35d47e060f46807",
70
+ "6f5fbb8f0f5a4374a7bde870c64f1fa4",
71
+ "331c23df507e4d679e3aaf81af39cd22",
72
+ "e33617a01b03437986c143c9a69ba14f",
73
+ "b1271eb1b7e74250bd9273f229b49cd8",
74
+ "6b20c9e39d36404fb761b4d83954a278",
75
+ "322cc6ef697945ccbbc2b3029dfdf0e3",
76
+ "f9456ff5134242bc9541d9d60c753384",
77
+ "5b917388b6624637ad8d8f60516d4001",
78
+ "13e05f2f64a54245a2478393b1f6b409",
79
+ "3c63301478f54f95ba0f0f8c853a7266",
80
+ "bb455638d3ac451096fd7cce4cb0d82c",
81
+ "c4a6b418089147f6b50eab097ace0342",
82
+ "9f4a770ae6b84593ac7de85e15c305a9",
83
+ "65d019ca643045a2b0933411d059c920",
84
+ "794c088a92ba4f6798fa94cded51d0bd",
85
+ "deb9a0d8d3e1430bacab12c8b4ce7573",
86
+ "c53ce49d9b7c4a3c87ad7f7c75dce1f5",
87
+ "b099d3bb966a4e9b8d02710329030ff3",
88
+ "6d42d468b1d04c4a94fb2eed75e3c238",
89
+ "a371ed9c75184b78a80facda31086426",
90
+ "c363fb4238e0464cb3a4ab16250e554a",
91
+ "86dce8a2c404469dab0dc1a466788c0a",
92
+ "f6a334e9b5da4c82a1c4f1fbc1fe3c7e",
93
+ "e9f1a9476de147c3bc98c1c36960dad6",
94
+ "2f8a7ddfcee64b978ff23f8f43911e01",
95
+ "099a1ac5af9b4e38abb6afb9c333d37a",
96
+ "340a7a5171ba4b8aa9b70e945df1618c",
97
+ "b0f31194b8f24b5ab601f8edd5332e04",
98
+ "cbeb9e9dfdf6420d91ec1126fedd8e48",
99
+ "da1553ec3e044a4fb7bb9b0c2a84bfe0",
100
+ "b440f3b3937549d69a4f188bb8415531",
101
+ "1b937b91e8ee46c4a764a8091f365291",
102
+ "1369029047864cc68100a44ffaad35ca",
103
+ "80cb9869ae694ecda95aab298598c7a2",
104
+ "2aaa61bf4df248be970940e103af0276",
105
+ "3a7f8b6302034a0e889976ba0fbb4531",
106
+ "8cc02da6124448598923899b144afd6d",
107
+ "7361467d667d49ee85c432eec884d882",
108
+ "39c0c457e3bd46c1971ae2913fd66429",
109
+ "2985df1d86904019a8fdf69a356ade6d",
110
+ "6c291edce30b4cefb329113d5ecbe640",
111
+ "1249849d670f4824a0a21aa61d187b56",
112
+ "0e7b7a12422d49d4a33538e638bfc1c9",
113
+ "ec90909f38e0448bab37c735ea9b9ebe",
114
+ "f4274a98f06945a6a1e4a56b680c1790",
115
+ "c04562a21d36405890c11e74915839c7",
116
+ "e6b91c2208e44524a9883309ae431277",
117
+ "9adf994d114141f98aeea509a73e9c59",
118
+ "520e9bf4c0164e60a2c3288bd97ef93e",
119
+ "dd420d29751341faa84c025afa743bb5",
120
+ "4d56ed704097453a930baa9ecdfb1156"
121
+ ]
122
+ },
123
+ "id": "nKy4AAhLvpUo",
124
+ "outputId": "0419ab21-4237-4ad2-b076-9ed83377ed34"
125
+ },
126
+ "outputs": [],
127
+ "source": [
128
+ "# The task \"question-answering\" will return a QuestionAnsweringPipeline object\n",
129
+ "question_answerer = pipeline(task=\"question-answering\", model=\"distilbert-base-cased-distilled-squad\")"
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "markdown",
134
+ "metadata": {
135
+ "id": "4ltQLVWgvpUo"
136
+ },
137
+ "source": [
138
+ "Notice that this environment already has the model stored in the directory `distilbert-base-cased-distilled-squad`. However if you were to run that exact code on your local computer, Huggingface will download the model for you, which is a great feature!\n",
139
+ "\n",
140
+ "\n",
141
+ "After running the last cell, you have a pipeline for performing question answering given a context string. The pipeline `question_answerer` you just created needs you to pass the question and context as strings. It returns an answer to the question from the context you provided. For example, here are the first few paragraphs from the [Wikipedia entry for tea](https://en.wikipedia.org/wiki/Tea) that you will use as the context.\n",
142
+ "\n",
143
+ "\n"
144
+ ]
145
+ },
146
+ {
147
+ "cell_type": "code",
148
+ "execution_count": null,
149
+ "metadata": {
150
+ "id": "D_-MzZNJvpUp"
151
+ },
152
+ "outputs": [],
153
+ "source": [
154
+ "context = \"\"\"\n",
155
+ "Tea is an aromatic beverage prepared by pouring hot or boiling water over cured or fresh leaves of Camellia sinensis,\n",
156
+ "an evergreen shrub native to China and East Asia. After water, it is the most widely consumed drink in the world.\n",
157
+ "There are many different types of tea; some, like Chinese greens and Darjeeling, have a cooling, slightly bitter,\n",
158
+ "and astringent flavour, while others have vastly different profiles that include sweet, nutty, floral, or grassy\n",
159
+ "notes. Tea has a stimulating effect in humans primarily due to its caffeine content.\n",
160
+ "\n",
161
+ "The tea plant originated in the region encompassing today's Southwest China, Tibet, north Myanmar and Northeast India,\n",
162
+ "where it was used as a medicinal drink by various ethnic groups. An early credible record of tea drinking dates to\n",
163
+ "the 3rd century AD, in a medical text written by Hua Tuo. It was popularised as a recreational drink during the\n",
164
+ "Chinese Tang dynasty, and tea drinking spread to other East Asian countries. Portuguese priests and merchants\n",
165
+ "introduced it to Europe during the 16th century. During the 17th century, drinking tea became fashionable among the\n",
166
+ "English, who started to plant tea on a large scale in India.\n",
167
+ "\n",
168
+ "The term herbal tea refers to drinks not made from Camellia sinensis: infusions of fruit, leaves, or other plant\n",
169
+ "parts, such as steeps of rosehip, chamomile, or rooibos. These may be called tisanes or herbal infusions to prevent\n",
170
+ "confusion with 'tea' made from the tea plant.\n",
171
+ "\"\"\""
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "markdown",
176
+ "metadata": {
177
+ "id": "HyR3o2mrvpUq"
178
+ },
179
+ "source": [
180
+ "Now, you can ask your model anything related to that passage. For instance, \"Where is tea native to?\"."
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "code",
185
+ "execution_count": null,
186
+ "metadata": {
187
+ "colab": {
188
+ "base_uri": "https://localhost:8080/"
189
+ },
190
+ "id": "eiRohAWWvpUq",
191
+ "outputId": "a1ddfca3-3723-4d43-cbda-0509337b60d6",
192
+ "scrolled": true
193
+ },
194
+ "outputs": [],
195
+ "source": [
196
+ "result = question_answerer(question=\"Where is tea native to?\", context=context)\n",
197
+ "\n",
198
+ "print(result['answer'])"
199
+ ]
200
+ },
201
+ {
202
+ "cell_type": "markdown",
203
+ "metadata": {
204
+ "id": "cRXzFlZ5vpUr"
205
+ },
206
+ "source": [
207
+ "You can also pass multiple questions to your pipeline within a list so that you can ask:\n",
208
+ "\n",
209
+ "* \"Where is tea native to?\"\n",
210
+ "* \"When was tea discovered?\"\n",
211
+ "* \"What is the species name for tea?\"\n",
212
+ "\n",
213
+ "at the same time, and your `question-answerer` will return all the answers."
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": null,
219
+ "metadata": {
220
+ "colab": {
221
+ "base_uri": "https://localhost:8080/"
222
+ },
223
+ "id": "IMLyXeMZvpUr",
224
+ "outputId": "ac9badb1-083d-4234-9474-f112c1f2f20f"
225
+ },
226
+ "outputs": [],
227
+ "source": [
228
+ "questions = [\"Where is tea native to?\",\n",
229
+ " \"When was tea discovered?\",\n",
230
+ " \"What is the species name for tea?\"]\n",
231
+ "\n",
232
+ "results = question_answerer(question=questions, context=context)\n",
233
+ "\n",
234
+ "for q, r in zip(questions, results):\n",
235
+ " print(f\"{q} \\n>> {r['answer']}\")"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "markdown",
240
+ "metadata": {
241
+ "id": "XXf18tVu8p70"
242
+ },
243
+ "source": [
244
+ "Although the models used in the Hugging Face pipelines generally give outstanding results, sometimes you will have particular examples where they don't perform so well. Let's use the following example with a context string about the Golden Age of Comic Books:"
245
+ ]
246
+ },
247
+ {
248
+ "cell_type": "code",
249
+ "execution_count": null,
250
+ "metadata": {
251
+ "id": "0v9C0TAqwinw"
252
+ },
253
+ "outputs": [],
254
+ "source": [
255
+ "context = \"\"\"\n",
256
+ "The Golden Age of Comic Books describes an era of American comic books from the\n",
257
+ "late 1930s to circa 1950. During this time, modern comic books were first published\n",
258
+ "and rapidly increased in popularity. The superhero archetype was created and many\n",
259
+ "well-known characters were introduced, including Superman, Batman, Captain Marvel\n",
260
+ "(later known as SHAZAM!), Captain America, and Wonder Woman.\n",
261
+ "Between 1939 and 1941 Detective Comics and its sister company, All-American Publications,\n",
262
+ "introduced popular superheroes such as Batman and Robin, Wonder Woman, the Flash,\n",
263
+ "Green Lantern, Doctor Fate, the Atom, Hawkman, Green Arrow and Aquaman.[7] Timely Comics,\n",
264
+ "the 1940s predecessor of Marvel Comics, had million-selling titles featuring the Human Torch,\n",
265
+ "the Sub-Mariner, and Captain America.[8]\n",
266
+ "As comic books grew in popularity, publishers began launching titles that expanded\n",
267
+ "into a variety of genres. Dell Comics' non-superhero characters (particularly the\n",
268
+ "licensed Walt Disney animated-character comics) outsold the superhero comics of the day.[12]\n",
269
+ "The publisher featured licensed movie and literary characters such as Mickey Mouse, Donald Duck,\n",
270
+ "Roy Rogers and Tarzan.[13] It was during this era that noted Donald Duck writer-artist\n",
271
+ "Carl Barks rose to prominence.[14] Additionally, MLJ's introduction of Archie Andrews\n",
272
+ "in Pep Comics #22 (December 1941) gave rise to teen humor comics,[15] with the Archie\n",
273
+ "Andrews character remaining in print well into the 21st century.[16]\n",
274
+ "At the same time in Canada, American comic books were prohibited importation under\n",
275
+ "the War Exchange Conservation Act[17] which restricted the importation of non-essential\n",
276
+ "goods. As a result, a domestic publishing industry flourished during the duration\n",
277
+ "of the war which were collectively informally called the Canadian Whites.\n",
278
+ "The educational comic book Dagwood Splits the Atom used characters from the comic\n",
279
+ "strip Blondie.[18] According to historian Michael A. Amundson, appealing comic-book\n",
280
+ "characters helped ease young readers' fear of nuclear war and neutralize anxiety\n",
281
+ "about the questions posed by atomic power.[19] It was during this period that long-running\n",
282
+ "humor comics debuted, including EC's Mad and Carl Barks' Uncle Scrooge in Dell's Four\n",
283
+ "Color Comics (both in 1952).[20][21]\n",
284
+ "\"\"\""
285
+ ]
286
+ },
287
+ {
288
+ "cell_type": "markdown",
289
+ "metadata": {
290
+ "id": "fYbERLKQbhyH"
291
+ },
292
+ "source": [
293
+ "Let's ask the following question: \"What popular superheroes were introduced between 1939 and 1941?\" The answer is in the fourth paragraph of the context string."
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": null,
299
+ "metadata": {
300
+ "colab": {
301
+ "base_uri": "https://localhost:8080/"
302
+ },
303
+ "id": "SEmAbSSGbg0J",
304
+ "outputId": "35b5e3c4-2fd2-4f37-b674-014681ece042"
305
+ },
306
+ "outputs": [],
307
+ "source": [
308
+ "question = \"What popular superheroes were introduced between 1939 and 1941?\"\n",
309
+ "\n",
310
+ "result = question_answerer(question=question, context=context)\n",
311
+ "print(result['answer'])"
312
+ ]
313
+ },
314
+ {
315
+ "cell_type": "markdown",
316
+ "metadata": {
317
+ "id": "LGx_BHkN-ejY"
318
+ },
319
+ "source": [
320
+ "Here, the answer should be:\n",
321
+ "\"Batman and Robin, Wonder Woman, the Flash,\n",
322
+ "Green Lantern, Doctor Fate, the Atom, Hawkman, Green Arrow, and Aquaman\". Instead, the pipeline returned a different answer. You can even try different question wordings:\n",
323
+ "\n",
324
+ "* \"What superheroes were introduced between 1939 and 1941?\"\n",
325
+ "* \"What comic book characters were created between 1939 and 1941?\"\n",
326
+ "* \"What well-known characters were created between 1939 and 1941?\"\n",
327
+ "* \"What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\"\n",
328
+ "\n",
329
+ "and you will only get incorrect answers."
330
+ ]
331
+ },
332
+ {
333
+ "cell_type": "code",
334
+ "execution_count": null,
335
+ "metadata": {
336
+ "colab": {
337
+ "base_uri": "https://localhost:8080/"
338
+ },
339
+ "id": "f91kLn9VcRzK",
340
+ "outputId": "bb3942b6-321a-4466-ac18-9f173b115600"
341
+ },
342
+ "outputs": [],
343
+ "source": [
344
+ "questions = [\"What popular superheroes were introduced between 1939 and 1941?\",\n",
345
+ " \"What superheroes were introduced between 1939 and 1941 by Detective Comics and its sister company?\",\n",
346
+ " \"What comic book characters were created between 1939 and 1941?\",\n",
347
+ " \"What well-known characters were created between 1939 and 1941?\",\n",
348
+ " \"What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\"]\n",
349
+ "\n",
350
+ "results = question_answerer(question=questions, context=context)\n",
351
+ "\n",
352
+ "for q, r in zip(questions, results):\n",
353
+ " print(f\"{q} \\n>> {r['answer']}\")"
354
+ ]
355
+ },
356
+ {
357
+ "cell_type": "markdown",
358
+ "metadata": {
359
+ "id": "QCkLhf27cEsH"
360
+ },
361
+ "source": [
362
+ "It seems like this model is a **huge fan** of Archie Andrews. It even considers him a superhero!\n",
363
+ "\n",
364
+ "The example that fooled your `question_answerer` belongs to the [TyDi QA dataset](https://ai.google.com/research/tydiqa), a dataset from Google for question/answering in diverse languages. To achieve better results when you know that the pipeline isn't working as it should, you need to consider fine-tuning your model.\n",
365
+ "\n",
366
+ "In the next ungraded lab, you will get the chance to fine-tune the DistilBert model using the TyDi QA dataset.\n",
367
+ "\n"
368
+ ]
369
+ }
370
+ ],
371
+ "metadata": {
372
+ "accelerator": "GPU",
373
+ "colab": {
374
+ "provenance": []
375
+ },
376
+ "kernelspec": {
377
+ "display_name": "Python 3 (ipykernel)",
378
+ "language": "python",
379
+ "name": "python3"
380
+ },
381
+ "language_info": {
382
+ "codemirror_mode": {
383
+ "name": "ipython",
384
+ "version": 3
385
+ },
386
+ "file_extension": ".py",
387
+ "mimetype": "text/x-python",
388
+ "name": "python",
389
+ "nbconvert_exporter": "python",
390
+ "pygments_lexer": "ipython3",
391
+ "version": "3.8.10"
392
+ },
393
+ "widgets": {
394
+ "application/vnd.jupyter.widget-state+json": {
395
+ "0302e718c6084fb0a96d92fd976738dc": {
396
+ "model_module": "@jupyter-widgets/controls",
397
+ "model_module_version": "1.5.0",
398
+ "model_name": "FloatProgressModel",
399
+ "state": {
400
+ "_dom_classes": [],
401
+ "_model_module": "@jupyter-widgets/controls",
402
+ "_model_module_version": "1.5.0",
403
+ "_model_name": "FloatProgressModel",
404
+ "_view_count": null,
405
+ "_view_module": "@jupyter-widgets/controls",
406
+ "_view_module_version": "1.5.0",
407
+ "_view_name": "ProgressView",
408
+ "bar_style": "success",
409
+ "description": "",
410
+ "description_tooltip": null,
411
+ "layout": "IPY_MODEL_b1271eb1b7e74250bd9273f229b49cd8",
412
+ "max": 473,
413
+ "min": 0,
414
+ "orientation": "horizontal",
415
+ "style": "IPY_MODEL_6b20c9e39d36404fb761b4d83954a278",
416
+ "value": 473
417
+ }
418
+ },
419
+ "099a1ac5af9b4e38abb6afb9c333d37a": {
420
+ "model_module": "@jupyter-widgets/base",
421
+ "model_module_version": "1.2.0",
422
+ "model_name": "LayoutModel",
423
+ "state": {
424
+ "_model_module": "@jupyter-widgets/base",
425
+ "_model_module_version": "1.2.0",
426
+ "_model_name": "LayoutModel",
427
+ "_view_count": null,
428
+ "_view_module": "@jupyter-widgets/base",
429
+ "_view_module_version": "1.2.0",
430
+ "_view_name": "LayoutView",
431
+ "align_content": null,
432
+ "align_items": null,
433
+ "align_self": null,
434
+ "border": null,
435
+ "bottom": null,
436
+ "display": null,
437
+ "flex": null,
438
+ "flex_flow": null,
439
+ "grid_area": null,
440
+ "grid_auto_columns": null,
441
+ "grid_auto_flow": null,
442
+ "grid_auto_rows": null,
443
+ "grid_column": null,
444
+ "grid_gap": null,
445
+ "grid_row": null,
446
+ "grid_template_areas": null,
447
+ "grid_template_columns": null,
448
+ "grid_template_rows": null,
449
+ "height": null,
450
+ "justify_content": null,
451
+ "justify_items": null,
452
+ "left": null,
453
+ "margin": null,
454
+ "max_height": null,
455
+ "max_width": null,
456
+ "min_height": null,
457
+ "min_width": null,
458
+ "object_fit": null,
459
+ "object_position": null,
460
+ "order": null,
461
+ "overflow": null,
462
+ "overflow_x": null,
463
+ "overflow_y": null,
464
+ "padding": null,
465
+ "right": null,
466
+ "top": null,
467
+ "visibility": null,
468
+ "width": null
469
+ }
470
+ },
471
+ "0e7b7a12422d49d4a33538e638bfc1c9": {
472
+ "model_module": "@jupyter-widgets/controls",
473
+ "model_module_version": "1.5.0",
474
+ "model_name": "FloatProgressModel",
475
+ "state": {
476
+ "_dom_classes": [],
477
+ "_model_module": "@jupyter-widgets/controls",
478
+ "_model_module_version": "1.5.0",
479
+ "_model_name": "FloatProgressModel",
480
+ "_view_count": null,
481
+ "_view_module": "@jupyter-widgets/controls",
482
+ "_view_module_version": "1.5.0",
483
+ "_view_name": "ProgressView",
484
+ "bar_style": "success",
485
+ "description": "",
486
+ "description_tooltip": null,
487
+ "layout": "IPY_MODEL_9adf994d114141f98aeea509a73e9c59",
488
+ "max": 435797,
489
+ "min": 0,
490
+ "orientation": "horizontal",
491
+ "style": "IPY_MODEL_520e9bf4c0164e60a2c3288bd97ef93e",
492
+ "value": 435797
493
+ }
494
+ },
495
+ "1249849d670f4824a0a21aa61d187b56": {
496
+ "model_module": "@jupyter-widgets/controls",
497
+ "model_module_version": "1.5.0",
498
+ "model_name": "HTMLModel",
499
+ "state": {
500
+ "_dom_classes": [],
501
+ "_model_module": "@jupyter-widgets/controls",
502
+ "_model_module_version": "1.5.0",
503
+ "_model_name": "HTMLModel",
504
+ "_view_count": null,
505
+ "_view_module": "@jupyter-widgets/controls",
506
+ "_view_module_version": "1.5.0",
507
+ "_view_name": "HTMLView",
508
+ "description": "",
509
+ "description_tooltip": null,
510
+ "layout": "IPY_MODEL_c04562a21d36405890c11e74915839c7",
511
+ "placeholder": "​",
512
+ "style": "IPY_MODEL_e6b91c2208e44524a9883309ae431277",
513
+ "value": "Downloading: 100%"
514
+ }
515
+ },
516
+ "1369029047864cc68100a44ffaad35ca": {
517
+ "model_module": "@jupyter-widgets/controls",
518
+ "model_module_version": "1.5.0",
519
+ "model_name": "HTMLModel",
520
+ "state": {
521
+ "_dom_classes": [],
522
+ "_model_module": "@jupyter-widgets/controls",
523
+ "_model_module_version": "1.5.0",
524
+ "_model_name": "HTMLModel",
525
+ "_view_count": null,
526
+ "_view_module": "@jupyter-widgets/controls",
527
+ "_view_module_version": "1.5.0",
528
+ "_view_name": "HTMLView",
529
+ "description": "",
530
+ "description_tooltip": null,
531
+ "layout": "IPY_MODEL_39c0c457e3bd46c1971ae2913fd66429",
532
+ "placeholder": "​",
533
+ "style": "IPY_MODEL_2985df1d86904019a8fdf69a356ade6d",
534
+ "value": " 213k/213k [00:00&lt;00:00, 170kB/s]"
535
+ }
536
+ },
537
+ "13e05f2f64a54245a2478393b1f6b409": {
538
+ "model_module": "@jupyter-widgets/controls",
539
+ "model_module_version": "1.5.0",
540
+ "model_name": "HTMLModel",
541
+ "state": {
542
+ "_dom_classes": [],
543
+ "_model_module": "@jupyter-widgets/controls",
544
+ "_model_module_version": "1.5.0",
545
+ "_model_name": "HTMLModel",
546
+ "_view_count": null,
547
+ "_view_module": "@jupyter-widgets/controls",
548
+ "_view_module_version": "1.5.0",
549
+ "_view_name": "HTMLView",
550
+ "description": "",
551
+ "description_tooltip": null,
552
+ "layout": "IPY_MODEL_9f4a770ae6b84593ac7de85e15c305a9",
553
+ "placeholder": "​",
554
+ "style": "IPY_MODEL_65d019ca643045a2b0933411d059c920",
555
+ "value": "Downloading: 100%"
556
+ }
557
+ },
558
+ "1b937b91e8ee46c4a764a8091f365291": {
559
+ "model_module": "@jupyter-widgets/controls",
560
+ "model_module_version": "1.5.0",
561
+ "model_name": "FloatProgressModel",
562
+ "state": {
563
+ "_dom_classes": [],
564
+ "_model_module": "@jupyter-widgets/controls",
565
+ "_model_module_version": "1.5.0",
566
+ "_model_name": "FloatProgressModel",
567
+ "_view_count": null,
568
+ "_view_module": "@jupyter-widgets/controls",
569
+ "_view_module_version": "1.5.0",
570
+ "_view_name": "ProgressView",
571
+ "bar_style": "success",
572
+ "description": "",
573
+ "description_tooltip": null,
574
+ "layout": "IPY_MODEL_8cc02da6124448598923899b144afd6d",
575
+ "max": 213450,
576
+ "min": 0,
577
+ "orientation": "horizontal",
578
+ "style": "IPY_MODEL_7361467d667d49ee85c432eec884d882",
579
+ "value": 213450
580
+ }
581
+ },
582
+ "2985df1d86904019a8fdf69a356ade6d": {
583
+ "model_module": "@jupyter-widgets/controls",
584
+ "model_module_version": "1.5.0",
585
+ "model_name": "DescriptionStyleModel",
586
+ "state": {
587
+ "_model_module": "@jupyter-widgets/controls",
588
+ "_model_module_version": "1.5.0",
589
+ "_model_name": "DescriptionStyleModel",
590
+ "_view_count": null,
591
+ "_view_module": "@jupyter-widgets/base",
592
+ "_view_module_version": "1.2.0",
593
+ "_view_name": "StyleView",
594
+ "description_width": ""
595
+ }
596
+ },
597
+ "2aaa61bf4df248be970940e103af0276": {
598
+ "model_module": "@jupyter-widgets/base",
599
+ "model_module_version": "1.2.0",
600
+ "model_name": "LayoutModel",
601
+ "state": {
602
+ "_model_module": "@jupyter-widgets/base",
603
+ "_model_module_version": "1.2.0",
604
+ "_model_name": "LayoutModel",
605
+ "_view_count": null,
606
+ "_view_module": "@jupyter-widgets/base",
607
+ "_view_module_version": "1.2.0",
608
+ "_view_name": "LayoutView",
609
+ "align_content": null,
610
+ "align_items": null,
611
+ "align_self": null,
612
+ "border": null,
613
+ "bottom": null,
614
+ "display": null,
615
+ "flex": null,
616
+ "flex_flow": null,
617
+ "grid_area": null,
618
+ "grid_auto_columns": null,
619
+ "grid_auto_flow": null,
620
+ "grid_auto_rows": null,
621
+ "grid_column": null,
622
+ "grid_gap": null,
623
+ "grid_row": null,
624
+ "grid_template_areas": null,
625
+ "grid_template_columns": null,
626
+ "grid_template_rows": null,
627
+ "height": null,
628
+ "justify_content": null,
629
+ "justify_items": null,
630
+ "left": null,
631
+ "margin": null,
632
+ "max_height": null,
633
+ "max_width": null,
634
+ "min_height": null,
635
+ "min_width": null,
636
+ "object_fit": null,
637
+ "object_position": null,
638
+ "order": null,
639
+ "overflow": null,
640
+ "overflow_x": null,
641
+ "overflow_y": null,
642
+ "padding": null,
643
+ "right": null,
644
+ "top": null,
645
+ "visibility": null,
646
+ "width": null
647
+ }
648
+ },
649
+ "2f8a7ddfcee64b978ff23f8f43911e01": {
650
+ "model_module": "@jupyter-widgets/controls",
651
+ "model_module_version": "1.5.0",
652
+ "model_name": "DescriptionStyleModel",
653
+ "state": {
654
+ "_model_module": "@jupyter-widgets/controls",
655
+ "_model_module_version": "1.5.0",
656
+ "_model_name": "DescriptionStyleModel",
657
+ "_view_count": null,
658
+ "_view_module": "@jupyter-widgets/base",
659
+ "_view_module_version": "1.2.0",
660
+ "_view_name": "StyleView",
661
+ "description_width": ""
662
+ }
663
+ },
664
+ "322cc6ef697945ccbbc2b3029dfdf0e3": {
665
+ "model_module": "@jupyter-widgets/base",
666
+ "model_module_version": "1.2.0",
667
+ "model_name": "LayoutModel",
668
+ "state": {
669
+ "_model_module": "@jupyter-widgets/base",
670
+ "_model_module_version": "1.2.0",
671
+ "_model_name": "LayoutModel",
672
+ "_view_count": null,
673
+ "_view_module": "@jupyter-widgets/base",
674
+ "_view_module_version": "1.2.0",
675
+ "_view_name": "LayoutView",
676
+ "align_content": null,
677
+ "align_items": null,
678
+ "align_self": null,
679
+ "border": null,
680
+ "bottom": null,
681
+ "display": null,
682
+ "flex": null,
683
+ "flex_flow": null,
684
+ "grid_area": null,
685
+ "grid_auto_columns": null,
686
+ "grid_auto_flow": null,
687
+ "grid_auto_rows": null,
688
+ "grid_column": null,
689
+ "grid_gap": null,
690
+ "grid_row": null,
691
+ "grid_template_areas": null,
692
+ "grid_template_columns": null,
693
+ "grid_template_rows": null,
694
+ "height": null,
695
+ "justify_content": null,
696
+ "justify_items": null,
697
+ "left": null,
698
+ "margin": null,
699
+ "max_height": null,
700
+ "max_width": null,
701
+ "min_height": null,
702
+ "min_width": null,
703
+ "object_fit": null,
704
+ "object_position": null,
705
+ "order": null,
706
+ "overflow": null,
707
+ "overflow_x": null,
708
+ "overflow_y": null,
709
+ "padding": null,
710
+ "right": null,
711
+ "top": null,
712
+ "visibility": null,
713
+ "width": null
714
+ }
715
+ },
716
+ "331c23df507e4d679e3aaf81af39cd22": {
717
+ "model_module": "@jupyter-widgets/base",
718
+ "model_module_version": "1.2.0",
719
+ "model_name": "LayoutModel",
720
+ "state": {
721
+ "_model_module": "@jupyter-widgets/base",
722
+ "_model_module_version": "1.2.0",
723
+ "_model_name": "LayoutModel",
724
+ "_view_count": null,
725
+ "_view_module": "@jupyter-widgets/base",
726
+ "_view_module_version": "1.2.0",
727
+ "_view_name": "LayoutView",
728
+ "align_content": null,
729
+ "align_items": null,
730
+ "align_self": null,
731
+ "border": null,
732
+ "bottom": null,
733
+ "display": null,
734
+ "flex": null,
735
+ "flex_flow": null,
736
+ "grid_area": null,
737
+ "grid_auto_columns": null,
738
+ "grid_auto_flow": null,
739
+ "grid_auto_rows": null,
740
+ "grid_column": null,
741
+ "grid_gap": null,
742
+ "grid_row": null,
743
+ "grid_template_areas": null,
744
+ "grid_template_columns": null,
745
+ "grid_template_rows": null,
746
+ "height": null,
747
+ "justify_content": null,
748
+ "justify_items": null,
749
+ "left": null,
750
+ "margin": null,
751
+ "max_height": null,
752
+ "max_width": null,
753
+ "min_height": null,
754
+ "min_width": null,
755
+ "object_fit": null,
756
+ "object_position": null,
757
+ "order": null,
758
+ "overflow": null,
759
+ "overflow_x": null,
760
+ "overflow_y": null,
761
+ "padding": null,
762
+ "right": null,
763
+ "top": null,
764
+ "visibility": null,
765
+ "width": null
766
+ }
767
+ },
768
+ "340a7a5171ba4b8aa9b70e945df1618c": {
769
+ "model_module": "@jupyter-widgets/controls",
770
+ "model_module_version": "1.5.0",
771
+ "model_name": "ProgressStyleModel",
772
+ "state": {
773
+ "_model_module": "@jupyter-widgets/controls",
774
+ "_model_module_version": "1.5.0",
775
+ "_model_name": "ProgressStyleModel",
776
+ "_view_count": null,
777
+ "_view_module": "@jupyter-widgets/base",
778
+ "_view_module_version": "1.2.0",
779
+ "_view_name": "StyleView",
780
+ "bar_color": null,
781
+ "description_width": ""
782
+ }
783
+ },
784
+ "39c0c457e3bd46c1971ae2913fd66429": {
785
+ "model_module": "@jupyter-widgets/base",
786
+ "model_module_version": "1.2.0",
787
+ "model_name": "LayoutModel",
788
+ "state": {
789
+ "_model_module": "@jupyter-widgets/base",
790
+ "_model_module_version": "1.2.0",
791
+ "_model_name": "LayoutModel",
792
+ "_view_count": null,
793
+ "_view_module": "@jupyter-widgets/base",
794
+ "_view_module_version": "1.2.0",
795
+ "_view_name": "LayoutView",
796
+ "align_content": null,
797
+ "align_items": null,
798
+ "align_self": null,
799
+ "border": null,
800
+ "bottom": null,
801
+ "display": null,
802
+ "flex": null,
803
+ "flex_flow": null,
804
+ "grid_area": null,
805
+ "grid_auto_columns": null,
806
+ "grid_auto_flow": null,
807
+ "grid_auto_rows": null,
808
+ "grid_column": null,
809
+ "grid_gap": null,
810
+ "grid_row": null,
811
+ "grid_template_areas": null,
812
+ "grid_template_columns": null,
813
+ "grid_template_rows": null,
814
+ "height": null,
815
+ "justify_content": null,
816
+ "justify_items": null,
817
+ "left": null,
818
+ "margin": null,
819
+ "max_height": null,
820
+ "max_width": null,
821
+ "min_height": null,
822
+ "min_width": null,
823
+ "object_fit": null,
824
+ "object_position": null,
825
+ "order": null,
826
+ "overflow": null,
827
+ "overflow_x": null,
828
+ "overflow_y": null,
829
+ "padding": null,
830
+ "right": null,
831
+ "top": null,
832
+ "visibility": null,
833
+ "width": null
834
+ }
835
+ },
836
+ "3a7f8b6302034a0e889976ba0fbb4531": {
837
+ "model_module": "@jupyter-widgets/controls",
838
+ "model_module_version": "1.5.0",
839
+ "model_name": "DescriptionStyleModel",
840
+ "state": {
841
+ "_model_module": "@jupyter-widgets/controls",
842
+ "_model_module_version": "1.5.0",
843
+ "_model_name": "DescriptionStyleModel",
844
+ "_view_count": null,
845
+ "_view_module": "@jupyter-widgets/base",
846
+ "_view_module_version": "1.2.0",
847
+ "_view_name": "StyleView",
848
+ "description_width": ""
849
+ }
850
+ },
851
+ "3c63301478f54f95ba0f0f8c853a7266": {
852
+ "model_module": "@jupyter-widgets/controls",
853
+ "model_module_version": "1.5.0",
854
+ "model_name": "FloatProgressModel",
855
+ "state": {
856
+ "_dom_classes": [],
857
+ "_model_module": "@jupyter-widgets/controls",
858
+ "_model_module_version": "1.5.0",
859
+ "_model_name": "FloatProgressModel",
860
+ "_view_count": null,
861
+ "_view_module": "@jupyter-widgets/controls",
862
+ "_view_module_version": "1.5.0",
863
+ "_view_name": "ProgressView",
864
+ "bar_style": "success",
865
+ "description": "",
866
+ "description_tooltip": null,
867
+ "layout": "IPY_MODEL_794c088a92ba4f6798fa94cded51d0bd",
868
+ "max": 260793700,
869
+ "min": 0,
870
+ "orientation": "horizontal",
871
+ "style": "IPY_MODEL_deb9a0d8d3e1430bacab12c8b4ce7573",
872
+ "value": 260793700
873
+ }
874
+ },
875
+ "4d56ed704097453a930baa9ecdfb1156": {
876
+ "model_module": "@jupyter-widgets/controls",
877
+ "model_module_version": "1.5.0",
878
+ "model_name": "DescriptionStyleModel",
879
+ "state": {
880
+ "_model_module": "@jupyter-widgets/controls",
881
+ "_model_module_version": "1.5.0",
882
+ "_model_name": "DescriptionStyleModel",
883
+ "_view_count": null,
884
+ "_view_module": "@jupyter-widgets/base",
885
+ "_view_module_version": "1.2.0",
886
+ "_view_name": "StyleView",
887
+ "description_width": ""
888
+ }
889
+ },
890
+ "520e9bf4c0164e60a2c3288bd97ef93e": {
891
+ "model_module": "@jupyter-widgets/controls",
892
+ "model_module_version": "1.5.0",
893
+ "model_name": "ProgressStyleModel",
894
+ "state": {
895
+ "_model_module": "@jupyter-widgets/controls",
896
+ "_model_module_version": "1.5.0",
897
+ "_model_name": "ProgressStyleModel",
898
+ "_view_count": null,
899
+ "_view_module": "@jupyter-widgets/base",
900
+ "_view_module_version": "1.2.0",
901
+ "_view_name": "StyleView",
902
+ "bar_color": null,
903
+ "description_width": ""
904
+ }
905
+ },
906
+ "5b917388b6624637ad8d8f60516d4001": {
907
+ "model_module": "@jupyter-widgets/controls",
908
+ "model_module_version": "1.5.0",
909
+ "model_name": "HBoxModel",
910
+ "state": {
911
+ "_dom_classes": [],
912
+ "_model_module": "@jupyter-widgets/controls",
913
+ "_model_module_version": "1.5.0",
914
+ "_model_name": "HBoxModel",
915
+ "_view_count": null,
916
+ "_view_module": "@jupyter-widgets/controls",
917
+ "_view_module_version": "1.5.0",
918
+ "_view_name": "HBoxView",
919
+ "box_style": "",
920
+ "children": [
921
+ "IPY_MODEL_13e05f2f64a54245a2478393b1f6b409",
922
+ "IPY_MODEL_3c63301478f54f95ba0f0f8c853a7266",
923
+ "IPY_MODEL_bb455638d3ac451096fd7cce4cb0d82c"
924
+ ],
925
+ "layout": "IPY_MODEL_c4a6b418089147f6b50eab097ace0342"
926
+ }
927
+ },
928
+ "65d019ca643045a2b0933411d059c920": {
929
+ "model_module": "@jupyter-widgets/controls",
930
+ "model_module_version": "1.5.0",
931
+ "model_name": "DescriptionStyleModel",
932
+ "state": {
933
+ "_model_module": "@jupyter-widgets/controls",
934
+ "_model_module_version": "1.5.0",
935
+ "_model_name": "DescriptionStyleModel",
936
+ "_view_count": null,
937
+ "_view_module": "@jupyter-widgets/base",
938
+ "_view_module_version": "1.2.0",
939
+ "_view_name": "StyleView",
940
+ "description_width": ""
941
+ }
942
+ },
943
+ "69aad11dbc914410b95f6c3cb17a2457": {
944
+ "model_module": "@jupyter-widgets/controls",
945
+ "model_module_version": "1.5.0",
946
+ "model_name": "HTMLModel",
947
+ "state": {
948
+ "_dom_classes": [],
949
+ "_model_module": "@jupyter-widgets/controls",
950
+ "_model_module_version": "1.5.0",
951
+ "_model_name": "HTMLModel",
952
+ "_view_count": null,
953
+ "_view_module": "@jupyter-widgets/controls",
954
+ "_view_module_version": "1.5.0",
955
+ "_view_name": "HTMLView",
956
+ "description": "",
957
+ "description_tooltip": null,
958
+ "layout": "IPY_MODEL_331c23df507e4d679e3aaf81af39cd22",
959
+ "placeholder": "​",
960
+ "style": "IPY_MODEL_e33617a01b03437986c143c9a69ba14f",
961
+ "value": "Downloading: 100%"
962
+ }
963
+ },
964
+ "6b20c9e39d36404fb761b4d83954a278": {
965
+ "model_module": "@jupyter-widgets/controls",
966
+ "model_module_version": "1.5.0",
967
+ "model_name": "ProgressStyleModel",
968
+ "state": {
969
+ "_model_module": "@jupyter-widgets/controls",
970
+ "_model_module_version": "1.5.0",
971
+ "_model_name": "ProgressStyleModel",
972
+ "_view_count": null,
973
+ "_view_module": "@jupyter-widgets/base",
974
+ "_view_module_version": "1.2.0",
975
+ "_view_name": "StyleView",
976
+ "bar_color": null,
977
+ "description_width": ""
978
+ }
979
+ },
980
+ "6c291edce30b4cefb329113d5ecbe640": {
981
+ "model_module": "@jupyter-widgets/controls",
982
+ "model_module_version": "1.5.0",
983
+ "model_name": "HBoxModel",
984
+ "state": {
985
+ "_dom_classes": [],
986
+ "_model_module": "@jupyter-widgets/controls",
987
+ "_model_module_version": "1.5.0",
988
+ "_model_name": "HBoxModel",
989
+ "_view_count": null,
990
+ "_view_module": "@jupyter-widgets/controls",
991
+ "_view_module_version": "1.5.0",
992
+ "_view_name": "HBoxView",
993
+ "box_style": "",
994
+ "children": [
995
+ "IPY_MODEL_1249849d670f4824a0a21aa61d187b56",
996
+ "IPY_MODEL_0e7b7a12422d49d4a33538e638bfc1c9",
997
+ "IPY_MODEL_ec90909f38e0448bab37c735ea9b9ebe"
998
+ ],
999
+ "layout": "IPY_MODEL_f4274a98f06945a6a1e4a56b680c1790"
1000
+ }
1001
+ },
1002
+ "6d42d468b1d04c4a94fb2eed75e3c238": {
1003
+ "model_module": "@jupyter-widgets/controls",
1004
+ "model_module_version": "1.5.0",
1005
+ "model_name": "HBoxModel",
1006
+ "state": {
1007
+ "_dom_classes": [],
1008
+ "_model_module": "@jupyter-widgets/controls",
1009
+ "_model_module_version": "1.5.0",
1010
+ "_model_name": "HBoxModel",
1011
+ "_view_count": null,
1012
+ "_view_module": "@jupyter-widgets/controls",
1013
+ "_view_module_version": "1.5.0",
1014
+ "_view_name": "HBoxView",
1015
+ "box_style": "",
1016
+ "children": [
1017
+ "IPY_MODEL_a371ed9c75184b78a80facda31086426",
1018
+ "IPY_MODEL_c363fb4238e0464cb3a4ab16250e554a",
1019
+ "IPY_MODEL_86dce8a2c404469dab0dc1a466788c0a"
1020
+ ],
1021
+ "layout": "IPY_MODEL_f6a334e9b5da4c82a1c4f1fbc1fe3c7e"
1022
+ }
1023
+ },
1024
+ "6f5fbb8f0f5a4374a7bde870c64f1fa4": {
1025
+ "model_module": "@jupyter-widgets/base",
1026
+ "model_module_version": "1.2.0",
1027
+ "model_name": "LayoutModel",
1028
+ "state": {
1029
+ "_model_module": "@jupyter-widgets/base",
1030
+ "_model_module_version": "1.2.0",
1031
+ "_model_name": "LayoutModel",
1032
+ "_view_count": null,
1033
+ "_view_module": "@jupyter-widgets/base",
1034
+ "_view_module_version": "1.2.0",
1035
+ "_view_name": "LayoutView",
1036
+ "align_content": null,
1037
+ "align_items": null,
1038
+ "align_self": null,
1039
+ "border": null,
1040
+ "bottom": null,
1041
+ "display": null,
1042
+ "flex": null,
1043
+ "flex_flow": null,
1044
+ "grid_area": null,
1045
+ "grid_auto_columns": null,
1046
+ "grid_auto_flow": null,
1047
+ "grid_auto_rows": null,
1048
+ "grid_column": null,
1049
+ "grid_gap": null,
1050
+ "grid_row": null,
1051
+ "grid_template_areas": null,
1052
+ "grid_template_columns": null,
1053
+ "grid_template_rows": null,
1054
+ "height": null,
1055
+ "justify_content": null,
1056
+ "justify_items": null,
1057
+ "left": null,
1058
+ "margin": null,
1059
+ "max_height": null,
1060
+ "max_width": null,
1061
+ "min_height": null,
1062
+ "min_width": null,
1063
+ "object_fit": null,
1064
+ "object_position": null,
1065
+ "order": null,
1066
+ "overflow": null,
1067
+ "overflow_x": null,
1068
+ "overflow_y": null,
1069
+ "padding": null,
1070
+ "right": null,
1071
+ "top": null,
1072
+ "visibility": null,
1073
+ "width": null
1074
+ }
1075
+ },
1076
+ "72b47b116b0b4125a35d47e060f46807": {
1077
+ "model_module": "@jupyter-widgets/controls",
1078
+ "model_module_version": "1.5.0",
1079
+ "model_name": "HTMLModel",
1080
+ "state": {
1081
+ "_dom_classes": [],
1082
+ "_model_module": "@jupyter-widgets/controls",
1083
+ "_model_module_version": "1.5.0",
1084
+ "_model_name": "HTMLModel",
1085
+ "_view_count": null,
1086
+ "_view_module": "@jupyter-widgets/controls",
1087
+ "_view_module_version": "1.5.0",
1088
+ "_view_name": "HTMLView",
1089
+ "description": "",
1090
+ "description_tooltip": null,
1091
+ "layout": "IPY_MODEL_322cc6ef697945ccbbc2b3029dfdf0e3",
1092
+ "placeholder": "​",
1093
+ "style": "IPY_MODEL_f9456ff5134242bc9541d9d60c753384",
1094
+ "value": " 473/473 [00:00&lt;00:00, 13.5kB/s]"
1095
+ }
1096
+ },
1097
+ "7361467d667d49ee85c432eec884d882": {
1098
+ "model_module": "@jupyter-widgets/controls",
1099
+ "model_module_version": "1.5.0",
1100
+ "model_name": "ProgressStyleModel",
1101
+ "state": {
1102
+ "_model_module": "@jupyter-widgets/controls",
1103
+ "_model_module_version": "1.5.0",
1104
+ "_model_name": "ProgressStyleModel",
1105
+ "_view_count": null,
1106
+ "_view_module": "@jupyter-widgets/base",
1107
+ "_view_module_version": "1.2.0",
1108
+ "_view_name": "StyleView",
1109
+ "bar_color": null,
1110
+ "description_width": ""
1111
+ }
1112
+ },
1113
+ "794c088a92ba4f6798fa94cded51d0bd": {
1114
+ "model_module": "@jupyter-widgets/base",
1115
+ "model_module_version": "1.2.0",
1116
+ "model_name": "LayoutModel",
1117
+ "state": {
1118
+ "_model_module": "@jupyter-widgets/base",
1119
+ "_model_module_version": "1.2.0",
1120
+ "_model_name": "LayoutModel",
1121
+ "_view_count": null,
1122
+ "_view_module": "@jupyter-widgets/base",
1123
+ "_view_module_version": "1.2.0",
1124
+ "_view_name": "LayoutView",
1125
+ "align_content": null,
1126
+ "align_items": null,
1127
+ "align_self": null,
1128
+ "border": null,
1129
+ "bottom": null,
1130
+ "display": null,
1131
+ "flex": null,
1132
+ "flex_flow": null,
1133
+ "grid_area": null,
1134
+ "grid_auto_columns": null,
1135
+ "grid_auto_flow": null,
1136
+ "grid_auto_rows": null,
1137
+ "grid_column": null,
1138
+ "grid_gap": null,
1139
+ "grid_row": null,
1140
+ "grid_template_areas": null,
1141
+ "grid_template_columns": null,
1142
+ "grid_template_rows": null,
1143
+ "height": null,
1144
+ "justify_content": null,
1145
+ "justify_items": null,
1146
+ "left": null,
1147
+ "margin": null,
1148
+ "max_height": null,
1149
+ "max_width": null,
1150
+ "min_height": null,
1151
+ "min_width": null,
1152
+ "object_fit": null,
1153
+ "object_position": null,
1154
+ "order": null,
1155
+ "overflow": null,
1156
+ "overflow_x": null,
1157
+ "overflow_y": null,
1158
+ "padding": null,
1159
+ "right": null,
1160
+ "top": null,
1161
+ "visibility": null,
1162
+ "width": null
1163
+ }
1164
+ },
1165
+ "80cb9869ae694ecda95aab298598c7a2": {
1166
+ "model_module": "@jupyter-widgets/base",
1167
+ "model_module_version": "1.2.0",
1168
+ "model_name": "LayoutModel",
1169
+ "state": {
1170
+ "_model_module": "@jupyter-widgets/base",
1171
+ "_model_module_version": "1.2.0",
1172
+ "_model_name": "LayoutModel",
1173
+ "_view_count": null,
1174
+ "_view_module": "@jupyter-widgets/base",
1175
+ "_view_module_version": "1.2.0",
1176
+ "_view_name": "LayoutView",
1177
+ "align_content": null,
1178
+ "align_items": null,
1179
+ "align_self": null,
1180
+ "border": null,
1181
+ "bottom": null,
1182
+ "display": null,
1183
+ "flex": null,
1184
+ "flex_flow": null,
1185
+ "grid_area": null,
1186
+ "grid_auto_columns": null,
1187
+ "grid_auto_flow": null,
1188
+ "grid_auto_rows": null,
1189
+ "grid_column": null,
1190
+ "grid_gap": null,
1191
+ "grid_row": null,
1192
+ "grid_template_areas": null,
1193
+ "grid_template_columns": null,
1194
+ "grid_template_rows": null,
1195
+ "height": null,
1196
+ "justify_content": null,
1197
+ "justify_items": null,
1198
+ "left": null,
1199
+ "margin": null,
1200
+ "max_height": null,
1201
+ "max_width": null,
1202
+ "min_height": null,
1203
+ "min_width": null,
1204
+ "object_fit": null,
1205
+ "object_position": null,
1206
+ "order": null,
1207
+ "overflow": null,
1208
+ "overflow_x": null,
1209
+ "overflow_y": null,
1210
+ "padding": null,
1211
+ "right": null,
1212
+ "top": null,
1213
+ "visibility": null,
1214
+ "width": null
1215
+ }
1216
+ },
1217
+ "86dce8a2c404469dab0dc1a466788c0a": {
1218
+ "model_module": "@jupyter-widgets/controls",
1219
+ "model_module_version": "1.5.0",
1220
+ "model_name": "HTMLModel",
1221
+ "state": {
1222
+ "_dom_classes": [],
1223
+ "_model_module": "@jupyter-widgets/controls",
1224
+ "_model_module_version": "1.5.0",
1225
+ "_model_name": "HTMLModel",
1226
+ "_view_count": null,
1227
+ "_view_module": "@jupyter-widgets/controls",
1228
+ "_view_module_version": "1.5.0",
1229
+ "_view_name": "HTMLView",
1230
+ "description": "",
1231
+ "description_tooltip": null,
1232
+ "layout": "IPY_MODEL_b0f31194b8f24b5ab601f8edd5332e04",
1233
+ "placeholder": "​",
1234
+ "style": "IPY_MODEL_cbeb9e9dfdf6420d91ec1126fedd8e48",
1235
+ "value": " 29.0/29.0 [00:00&lt;00:00, 321B/s]"
1236
+ }
1237
+ },
1238
+ "8cc02da6124448598923899b144afd6d": {
1239
+ "model_module": "@jupyter-widgets/base",
1240
+ "model_module_version": "1.2.0",
1241
+ "model_name": "LayoutModel",
1242
+ "state": {
1243
+ "_model_module": "@jupyter-widgets/base",
1244
+ "_model_module_version": "1.2.0",
1245
+ "_model_name": "LayoutModel",
1246
+ "_view_count": null,
1247
+ "_view_module": "@jupyter-widgets/base",
1248
+ "_view_module_version": "1.2.0",
1249
+ "_view_name": "LayoutView",
1250
+ "align_content": null,
1251
+ "align_items": null,
1252
+ "align_self": null,
1253
+ "border": null,
1254
+ "bottom": null,
1255
+ "display": null,
1256
+ "flex": null,
1257
+ "flex_flow": null,
1258
+ "grid_area": null,
1259
+ "grid_auto_columns": null,
1260
+ "grid_auto_flow": null,
1261
+ "grid_auto_rows": null,
1262
+ "grid_column": null,
1263
+ "grid_gap": null,
1264
+ "grid_row": null,
1265
+ "grid_template_areas": null,
1266
+ "grid_template_columns": null,
1267
+ "grid_template_rows": null,
1268
+ "height": null,
1269
+ "justify_content": null,
1270
+ "justify_items": null,
1271
+ "left": null,
1272
+ "margin": null,
1273
+ "max_height": null,
1274
+ "max_width": null,
1275
+ "min_height": null,
1276
+ "min_width": null,
1277
+ "object_fit": null,
1278
+ "object_position": null,
1279
+ "order": null,
1280
+ "overflow": null,
1281
+ "overflow_x": null,
1282
+ "overflow_y": null,
1283
+ "padding": null,
1284
+ "right": null,
1285
+ "top": null,
1286
+ "visibility": null,
1287
+ "width": null
1288
+ }
1289
+ },
1290
+ "9adf994d114141f98aeea509a73e9c59": {
1291
+ "model_module": "@jupyter-widgets/base",
1292
+ "model_module_version": "1.2.0",
1293
+ "model_name": "LayoutModel",
1294
+ "state": {
1295
+ "_model_module": "@jupyter-widgets/base",
1296
+ "_model_module_version": "1.2.0",
1297
+ "_model_name": "LayoutModel",
1298
+ "_view_count": null,
1299
+ "_view_module": "@jupyter-widgets/base",
1300
+ "_view_module_version": "1.2.0",
1301
+ "_view_name": "LayoutView",
1302
+ "align_content": null,
1303
+ "align_items": null,
1304
+ "align_self": null,
1305
+ "border": null,
1306
+ "bottom": null,
1307
+ "display": null,
1308
+ "flex": null,
1309
+ "flex_flow": null,
1310
+ "grid_area": null,
1311
+ "grid_auto_columns": null,
1312
+ "grid_auto_flow": null,
1313
+ "grid_auto_rows": null,
1314
+ "grid_column": null,
1315
+ "grid_gap": null,
1316
+ "grid_row": null,
1317
+ "grid_template_areas": null,
1318
+ "grid_template_columns": null,
1319
+ "grid_template_rows": null,
1320
+ "height": null,
1321
+ "justify_content": null,
1322
+ "justify_items": null,
1323
+ "left": null,
1324
+ "margin": null,
1325
+ "max_height": null,
1326
+ "max_width": null,
1327
+ "min_height": null,
1328
+ "min_width": null,
1329
+ "object_fit": null,
1330
+ "object_position": null,
1331
+ "order": null,
1332
+ "overflow": null,
1333
+ "overflow_x": null,
1334
+ "overflow_y": null,
1335
+ "padding": null,
1336
+ "right": null,
1337
+ "top": null,
1338
+ "visibility": null,
1339
+ "width": null
1340
+ }
1341
+ },
1342
+ "9f4a770ae6b84593ac7de85e15c305a9": {
1343
+ "model_module": "@jupyter-widgets/base",
1344
+ "model_module_version": "1.2.0",
1345
+ "model_name": "LayoutModel",
1346
+ "state": {
1347
+ "_model_module": "@jupyter-widgets/base",
1348
+ "_model_module_version": "1.2.0",
1349
+ "_model_name": "LayoutModel",
1350
+ "_view_count": null,
1351
+ "_view_module": "@jupyter-widgets/base",
1352
+ "_view_module_version": "1.2.0",
1353
+ "_view_name": "LayoutView",
1354
+ "align_content": null,
1355
+ "align_items": null,
1356
+ "align_self": null,
1357
+ "border": null,
1358
+ "bottom": null,
1359
+ "display": null,
1360
+ "flex": null,
1361
+ "flex_flow": null,
1362
+ "grid_area": null,
1363
+ "grid_auto_columns": null,
1364
+ "grid_auto_flow": null,
1365
+ "grid_auto_rows": null,
1366
+ "grid_column": null,
1367
+ "grid_gap": null,
1368
+ "grid_row": null,
1369
+ "grid_template_areas": null,
1370
+ "grid_template_columns": null,
1371
+ "grid_template_rows": null,
1372
+ "height": null,
1373
+ "justify_content": null,
1374
+ "justify_items": null,
1375
+ "left": null,
1376
+ "margin": null,
1377
+ "max_height": null,
1378
+ "max_width": null,
1379
+ "min_height": null,
1380
+ "min_width": null,
1381
+ "object_fit": null,
1382
+ "object_position": null,
1383
+ "order": null,
1384
+ "overflow": null,
1385
+ "overflow_x": null,
1386
+ "overflow_y": null,
1387
+ "padding": null,
1388
+ "right": null,
1389
+ "top": null,
1390
+ "visibility": null,
1391
+ "width": null
1392
+ }
1393
+ },
1394
+ "a371ed9c75184b78a80facda31086426": {
1395
+ "model_module": "@jupyter-widgets/controls",
1396
+ "model_module_version": "1.5.0",
1397
+ "model_name": "HTMLModel",
1398
+ "state": {
1399
+ "_dom_classes": [],
1400
+ "_model_module": "@jupyter-widgets/controls",
1401
+ "_model_module_version": "1.5.0",
1402
+ "_model_name": "HTMLModel",
1403
+ "_view_count": null,
1404
+ "_view_module": "@jupyter-widgets/controls",
1405
+ "_view_module_version": "1.5.0",
1406
+ "_view_name": "HTMLView",
1407
+ "description": "",
1408
+ "description_tooltip": null,
1409
+ "layout": "IPY_MODEL_e9f1a9476de147c3bc98c1c36960dad6",
1410
+ "placeholder": "​",
1411
+ "style": "IPY_MODEL_2f8a7ddfcee64b978ff23f8f43911e01",
1412
+ "value": "Downloading: 100%"
1413
+ }
1414
+ },
1415
+ "b099d3bb966a4e9b8d02710329030ff3": {
1416
+ "model_module": "@jupyter-widgets/controls",
1417
+ "model_module_version": "1.5.0",
1418
+ "model_name": "DescriptionStyleModel",
1419
+ "state": {
1420
+ "_model_module": "@jupyter-widgets/controls",
1421
+ "_model_module_version": "1.5.0",
1422
+ "_model_name": "DescriptionStyleModel",
1423
+ "_view_count": null,
1424
+ "_view_module": "@jupyter-widgets/base",
1425
+ "_view_module_version": "1.2.0",
1426
+ "_view_name": "StyleView",
1427
+ "description_width": ""
1428
+ }
1429
+ },
1430
+ "b0f31194b8f24b5ab601f8edd5332e04": {
1431
+ "model_module": "@jupyter-widgets/base",
1432
+ "model_module_version": "1.2.0",
1433
+ "model_name": "LayoutModel",
1434
+ "state": {
1435
+ "_model_module": "@jupyter-widgets/base",
1436
+ "_model_module_version": "1.2.0",
1437
+ "_model_name": "LayoutModel",
1438
+ "_view_count": null,
1439
+ "_view_module": "@jupyter-widgets/base",
1440
+ "_view_module_version": "1.2.0",
1441
+ "_view_name": "LayoutView",
1442
+ "align_content": null,
1443
+ "align_items": null,
1444
+ "align_self": null,
1445
+ "border": null,
1446
+ "bottom": null,
1447
+ "display": null,
1448
+ "flex": null,
1449
+ "flex_flow": null,
1450
+ "grid_area": null,
1451
+ "grid_auto_columns": null,
1452
+ "grid_auto_flow": null,
1453
+ "grid_auto_rows": null,
1454
+ "grid_column": null,
1455
+ "grid_gap": null,
1456
+ "grid_row": null,
1457
+ "grid_template_areas": null,
1458
+ "grid_template_columns": null,
1459
+ "grid_template_rows": null,
1460
+ "height": null,
1461
+ "justify_content": null,
1462
+ "justify_items": null,
1463
+ "left": null,
1464
+ "margin": null,
1465
+ "max_height": null,
1466
+ "max_width": null,
1467
+ "min_height": null,
1468
+ "min_width": null,
1469
+ "object_fit": null,
1470
+ "object_position": null,
1471
+ "order": null,
1472
+ "overflow": null,
1473
+ "overflow_x": null,
1474
+ "overflow_y": null,
1475
+ "padding": null,
1476
+ "right": null,
1477
+ "top": null,
1478
+ "visibility": null,
1479
+ "width": null
1480
+ }
1481
+ },
1482
+ "b1271eb1b7e74250bd9273f229b49cd8": {
1483
+ "model_module": "@jupyter-widgets/base",
1484
+ "model_module_version": "1.2.0",
1485
+ "model_name": "LayoutModel",
1486
+ "state": {
1487
+ "_model_module": "@jupyter-widgets/base",
1488
+ "_model_module_version": "1.2.0",
1489
+ "_model_name": "LayoutModel",
1490
+ "_view_count": null,
1491
+ "_view_module": "@jupyter-widgets/base",
1492
+ "_view_module_version": "1.2.0",
1493
+ "_view_name": "LayoutView",
1494
+ "align_content": null,
1495
+ "align_items": null,
1496
+ "align_self": null,
1497
+ "border": null,
1498
+ "bottom": null,
1499
+ "display": null,
1500
+ "flex": null,
1501
+ "flex_flow": null,
1502
+ "grid_area": null,
1503
+ "grid_auto_columns": null,
1504
+ "grid_auto_flow": null,
1505
+ "grid_auto_rows": null,
1506
+ "grid_column": null,
1507
+ "grid_gap": null,
1508
+ "grid_row": null,
1509
+ "grid_template_areas": null,
1510
+ "grid_template_columns": null,
1511
+ "grid_template_rows": null,
1512
+ "height": null,
1513
+ "justify_content": null,
1514
+ "justify_items": null,
1515
+ "left": null,
1516
+ "margin": null,
1517
+ "max_height": null,
1518
+ "max_width": null,
1519
+ "min_height": null,
1520
+ "min_width": null,
1521
+ "object_fit": null,
1522
+ "object_position": null,
1523
+ "order": null,
1524
+ "overflow": null,
1525
+ "overflow_x": null,
1526
+ "overflow_y": null,
1527
+ "padding": null,
1528
+ "right": null,
1529
+ "top": null,
1530
+ "visibility": null,
1531
+ "width": null
1532
+ }
1533
+ },
1534
+ "b440f3b3937549d69a4f188bb8415531": {
1535
+ "model_module": "@jupyter-widgets/controls",
1536
+ "model_module_version": "1.5.0",
1537
+ "model_name": "HTMLModel",
1538
+ "state": {
1539
+ "_dom_classes": [],
1540
+ "_model_module": "@jupyter-widgets/controls",
1541
+ "_model_module_version": "1.5.0",
1542
+ "_model_name": "HTMLModel",
1543
+ "_view_count": null,
1544
+ "_view_module": "@jupyter-widgets/controls",
1545
+ "_view_module_version": "1.5.0",
1546
+ "_view_name": "HTMLView",
1547
+ "description": "",
1548
+ "description_tooltip": null,
1549
+ "layout": "IPY_MODEL_2aaa61bf4df248be970940e103af0276",
1550
+ "placeholder": "​",
1551
+ "style": "IPY_MODEL_3a7f8b6302034a0e889976ba0fbb4531",
1552
+ "value": "Downloading: 100%"
1553
+ }
1554
+ },
1555
+ "bb455638d3ac451096fd7cce4cb0d82c": {
1556
+ "model_module": "@jupyter-widgets/controls",
1557
+ "model_module_version": "1.5.0",
1558
+ "model_name": "HTMLModel",
1559
+ "state": {
1560
+ "_dom_classes": [],
1561
+ "_model_module": "@jupyter-widgets/controls",
1562
+ "_model_module_version": "1.5.0",
1563
+ "_model_name": "HTMLModel",
1564
+ "_view_count": null,
1565
+ "_view_module": "@jupyter-widgets/controls",
1566
+ "_view_module_version": "1.5.0",
1567
+ "_view_name": "HTMLView",
1568
+ "description": "",
1569
+ "description_tooltip": null,
1570
+ "layout": "IPY_MODEL_c53ce49d9b7c4a3c87ad7f7c75dce1f5",
1571
+ "placeholder": "​",
1572
+ "style": "IPY_MODEL_b099d3bb966a4e9b8d02710329030ff3",
1573
+ "value": " 261M/261M [00:04&lt;00:00, 53.4MB/s]"
1574
+ }
1575
+ },
1576
+ "c04562a21d36405890c11e74915839c7": {
1577
+ "model_module": "@jupyter-widgets/base",
1578
+ "model_module_version": "1.2.0",
1579
+ "model_name": "LayoutModel",
1580
+ "state": {
1581
+ "_model_module": "@jupyter-widgets/base",
1582
+ "_model_module_version": "1.2.0",
1583
+ "_model_name": "LayoutModel",
1584
+ "_view_count": null,
1585
+ "_view_module": "@jupyter-widgets/base",
1586
+ "_view_module_version": "1.2.0",
1587
+ "_view_name": "LayoutView",
1588
+ "align_content": null,
1589
+ "align_items": null,
1590
+ "align_self": null,
1591
+ "border": null,
1592
+ "bottom": null,
1593
+ "display": null,
1594
+ "flex": null,
1595
+ "flex_flow": null,
1596
+ "grid_area": null,
1597
+ "grid_auto_columns": null,
1598
+ "grid_auto_flow": null,
1599
+ "grid_auto_rows": null,
1600
+ "grid_column": null,
1601
+ "grid_gap": null,
1602
+ "grid_row": null,
1603
+ "grid_template_areas": null,
1604
+ "grid_template_columns": null,
1605
+ "grid_template_rows": null,
1606
+ "height": null,
1607
+ "justify_content": null,
1608
+ "justify_items": null,
1609
+ "left": null,
1610
+ "margin": null,
1611
+ "max_height": null,
1612
+ "max_width": null,
1613
+ "min_height": null,
1614
+ "min_width": null,
1615
+ "object_fit": null,
1616
+ "object_position": null,
1617
+ "order": null,
1618
+ "overflow": null,
1619
+ "overflow_x": null,
1620
+ "overflow_y": null,
1621
+ "padding": null,
1622
+ "right": null,
1623
+ "top": null,
1624
+ "visibility": null,
1625
+ "width": null
1626
+ }
1627
+ },
1628
+ "c363fb4238e0464cb3a4ab16250e554a": {
1629
+ "model_module": "@jupyter-widgets/controls",
1630
+ "model_module_version": "1.5.0",
1631
+ "model_name": "FloatProgressModel",
1632
+ "state": {
1633
+ "_dom_classes": [],
1634
+ "_model_module": "@jupyter-widgets/controls",
1635
+ "_model_module_version": "1.5.0",
1636
+ "_model_name": "FloatProgressModel",
1637
+ "_view_count": null,
1638
+ "_view_module": "@jupyter-widgets/controls",
1639
+ "_view_module_version": "1.5.0",
1640
+ "_view_name": "ProgressView",
1641
+ "bar_style": "success",
1642
+ "description": "",
1643
+ "description_tooltip": null,
1644
+ "layout": "IPY_MODEL_099a1ac5af9b4e38abb6afb9c333d37a",
1645
+ "max": 29,
1646
+ "min": 0,
1647
+ "orientation": "horizontal",
1648
+ "style": "IPY_MODEL_340a7a5171ba4b8aa9b70e945df1618c",
1649
+ "value": 29
1650
+ }
1651
+ },
1652
+ "c4a6b418089147f6b50eab097ace0342": {
1653
+ "model_module": "@jupyter-widgets/base",
1654
+ "model_module_version": "1.2.0",
1655
+ "model_name": "LayoutModel",
1656
+ "state": {
1657
+ "_model_module": "@jupyter-widgets/base",
1658
+ "_model_module_version": "1.2.0",
1659
+ "_model_name": "LayoutModel",
1660
+ "_view_count": null,
1661
+ "_view_module": "@jupyter-widgets/base",
1662
+ "_view_module_version": "1.2.0",
1663
+ "_view_name": "LayoutView",
1664
+ "align_content": null,
1665
+ "align_items": null,
1666
+ "align_self": null,
1667
+ "border": null,
1668
+ "bottom": null,
1669
+ "display": null,
1670
+ "flex": null,
1671
+ "flex_flow": null,
1672
+ "grid_area": null,
1673
+ "grid_auto_columns": null,
1674
+ "grid_auto_flow": null,
1675
+ "grid_auto_rows": null,
1676
+ "grid_column": null,
1677
+ "grid_gap": null,
1678
+ "grid_row": null,
1679
+ "grid_template_areas": null,
1680
+ "grid_template_columns": null,
1681
+ "grid_template_rows": null,
1682
+ "height": null,
1683
+ "justify_content": null,
1684
+ "justify_items": null,
1685
+ "left": null,
1686
+ "margin": null,
1687
+ "max_height": null,
1688
+ "max_width": null,
1689
+ "min_height": null,
1690
+ "min_width": null,
1691
+ "object_fit": null,
1692
+ "object_position": null,
1693
+ "order": null,
1694
+ "overflow": null,
1695
+ "overflow_x": null,
1696
+ "overflow_y": null,
1697
+ "padding": null,
1698
+ "right": null,
1699
+ "top": null,
1700
+ "visibility": null,
1701
+ "width": null
1702
+ }
1703
+ },
1704
+ "c53ce49d9b7c4a3c87ad7f7c75dce1f5": {
1705
+ "model_module": "@jupyter-widgets/base",
1706
+ "model_module_version": "1.2.0",
1707
+ "model_name": "LayoutModel",
1708
+ "state": {
1709
+ "_model_module": "@jupyter-widgets/base",
1710
+ "_model_module_version": "1.2.0",
1711
+ "_model_name": "LayoutModel",
1712
+ "_view_count": null,
1713
+ "_view_module": "@jupyter-widgets/base",
1714
+ "_view_module_version": "1.2.0",
1715
+ "_view_name": "LayoutView",
1716
+ "align_content": null,
1717
+ "align_items": null,
1718
+ "align_self": null,
1719
+ "border": null,
1720
+ "bottom": null,
1721
+ "display": null,
1722
+ "flex": null,
1723
+ "flex_flow": null,
1724
+ "grid_area": null,
1725
+ "grid_auto_columns": null,
1726
+ "grid_auto_flow": null,
1727
+ "grid_auto_rows": null,
1728
+ "grid_column": null,
1729
+ "grid_gap": null,
1730
+ "grid_row": null,
1731
+ "grid_template_areas": null,
1732
+ "grid_template_columns": null,
1733
+ "grid_template_rows": null,
1734
+ "height": null,
1735
+ "justify_content": null,
1736
+ "justify_items": null,
1737
+ "left": null,
1738
+ "margin": null,
1739
+ "max_height": null,
1740
+ "max_width": null,
1741
+ "min_height": null,
1742
+ "min_width": null,
1743
+ "object_fit": null,
1744
+ "object_position": null,
1745
+ "order": null,
1746
+ "overflow": null,
1747
+ "overflow_x": null,
1748
+ "overflow_y": null,
1749
+ "padding": null,
1750
+ "right": null,
1751
+ "top": null,
1752
+ "visibility": null,
1753
+ "width": null
1754
+ }
1755
+ },
1756
+ "cbeb9e9dfdf6420d91ec1126fedd8e48": {
1757
+ "model_module": "@jupyter-widgets/controls",
1758
+ "model_module_version": "1.5.0",
1759
+ "model_name": "DescriptionStyleModel",
1760
+ "state": {
1761
+ "_model_module": "@jupyter-widgets/controls",
1762
+ "_model_module_version": "1.5.0",
1763
+ "_model_name": "DescriptionStyleModel",
1764
+ "_view_count": null,
1765
+ "_view_module": "@jupyter-widgets/base",
1766
+ "_view_module_version": "1.2.0",
1767
+ "_view_name": "StyleView",
1768
+ "description_width": ""
1769
+ }
1770
+ },
1771
+ "d7e158e614f44983b229d6dd0d8960f9": {
1772
+ "model_module": "@jupyter-widgets/controls",
1773
+ "model_module_version": "1.5.0",
1774
+ "model_name": "HBoxModel",
1775
+ "state": {
1776
+ "_dom_classes": [],
1777
+ "_model_module": "@jupyter-widgets/controls",
1778
+ "_model_module_version": "1.5.0",
1779
+ "_model_name": "HBoxModel",
1780
+ "_view_count": null,
1781
+ "_view_module": "@jupyter-widgets/controls",
1782
+ "_view_module_version": "1.5.0",
1783
+ "_view_name": "HBoxView",
1784
+ "box_style": "",
1785
+ "children": [
1786
+ "IPY_MODEL_69aad11dbc914410b95f6c3cb17a2457",
1787
+ "IPY_MODEL_0302e718c6084fb0a96d92fd976738dc",
1788
+ "IPY_MODEL_72b47b116b0b4125a35d47e060f46807"
1789
+ ],
1790
+ "layout": "IPY_MODEL_6f5fbb8f0f5a4374a7bde870c64f1fa4"
1791
+ }
1792
+ },
1793
+ "da1553ec3e044a4fb7bb9b0c2a84bfe0": {
1794
+ "model_module": "@jupyter-widgets/controls",
1795
+ "model_module_version": "1.5.0",
1796
+ "model_name": "HBoxModel",
1797
+ "state": {
1798
+ "_dom_classes": [],
1799
+ "_model_module": "@jupyter-widgets/controls",
1800
+ "_model_module_version": "1.5.0",
1801
+ "_model_name": "HBoxModel",
1802
+ "_view_count": null,
1803
+ "_view_module": "@jupyter-widgets/controls",
1804
+ "_view_module_version": "1.5.0",
1805
+ "_view_name": "HBoxView",
1806
+ "box_style": "",
1807
+ "children": [
1808
+ "IPY_MODEL_b440f3b3937549d69a4f188bb8415531",
1809
+ "IPY_MODEL_1b937b91e8ee46c4a764a8091f365291",
1810
+ "IPY_MODEL_1369029047864cc68100a44ffaad35ca"
1811
+ ],
1812
+ "layout": "IPY_MODEL_80cb9869ae694ecda95aab298598c7a2"
1813
+ }
1814
+ },
1815
+ "dd420d29751341faa84c025afa743bb5": {
1816
+ "model_module": "@jupyter-widgets/base",
1817
+ "model_module_version": "1.2.0",
1818
+ "model_name": "LayoutModel",
1819
+ "state": {
1820
+ "_model_module": "@jupyter-widgets/base",
1821
+ "_model_module_version": "1.2.0",
1822
+ "_model_name": "LayoutModel",
1823
+ "_view_count": null,
1824
+ "_view_module": "@jupyter-widgets/base",
1825
+ "_view_module_version": "1.2.0",
1826
+ "_view_name": "LayoutView",
1827
+ "align_content": null,
1828
+ "align_items": null,
1829
+ "align_self": null,
1830
+ "border": null,
1831
+ "bottom": null,
1832
+ "display": null,
1833
+ "flex": null,
1834
+ "flex_flow": null,
1835
+ "grid_area": null,
1836
+ "grid_auto_columns": null,
1837
+ "grid_auto_flow": null,
1838
+ "grid_auto_rows": null,
1839
+ "grid_column": null,
1840
+ "grid_gap": null,
1841
+ "grid_row": null,
1842
+ "grid_template_areas": null,
1843
+ "grid_template_columns": null,
1844
+ "grid_template_rows": null,
1845
+ "height": null,
1846
+ "justify_content": null,
1847
+ "justify_items": null,
1848
+ "left": null,
1849
+ "margin": null,
1850
+ "max_height": null,
1851
+ "max_width": null,
1852
+ "min_height": null,
1853
+ "min_width": null,
1854
+ "object_fit": null,
1855
+ "object_position": null,
1856
+ "order": null,
1857
+ "overflow": null,
1858
+ "overflow_x": null,
1859
+ "overflow_y": null,
1860
+ "padding": null,
1861
+ "right": null,
1862
+ "top": null,
1863
+ "visibility": null,
1864
+ "width": null
1865
+ }
1866
+ },
1867
+ "deb9a0d8d3e1430bacab12c8b4ce7573": {
1868
+ "model_module": "@jupyter-widgets/controls",
1869
+ "model_module_version": "1.5.0",
1870
+ "model_name": "ProgressStyleModel",
1871
+ "state": {
1872
+ "_model_module": "@jupyter-widgets/controls",
1873
+ "_model_module_version": "1.5.0",
1874
+ "_model_name": "ProgressStyleModel",
1875
+ "_view_count": null,
1876
+ "_view_module": "@jupyter-widgets/base",
1877
+ "_view_module_version": "1.2.0",
1878
+ "_view_name": "StyleView",
1879
+ "bar_color": null,
1880
+ "description_width": ""
1881
+ }
1882
+ },
1883
+ "e33617a01b03437986c143c9a69ba14f": {
1884
+ "model_module": "@jupyter-widgets/controls",
1885
+ "model_module_version": "1.5.0",
1886
+ "model_name": "DescriptionStyleModel",
1887
+ "state": {
1888
+ "_model_module": "@jupyter-widgets/controls",
1889
+ "_model_module_version": "1.5.0",
1890
+ "_model_name": "DescriptionStyleModel",
1891
+ "_view_count": null,
1892
+ "_view_module": "@jupyter-widgets/base",
1893
+ "_view_module_version": "1.2.0",
1894
+ "_view_name": "StyleView",
1895
+ "description_width": ""
1896
+ }
1897
+ },
1898
+ "e6b91c2208e44524a9883309ae431277": {
1899
+ "model_module": "@jupyter-widgets/controls",
1900
+ "model_module_version": "1.5.0",
1901
+ "model_name": "DescriptionStyleModel",
1902
+ "state": {
1903
+ "_model_module": "@jupyter-widgets/controls",
1904
+ "_model_module_version": "1.5.0",
1905
+ "_model_name": "DescriptionStyleModel",
1906
+ "_view_count": null,
1907
+ "_view_module": "@jupyter-widgets/base",
1908
+ "_view_module_version": "1.2.0",
1909
+ "_view_name": "StyleView",
1910
+ "description_width": ""
1911
+ }
1912
+ },
1913
+ "e9f1a9476de147c3bc98c1c36960dad6": {
1914
+ "model_module": "@jupyter-widgets/base",
1915
+ "model_module_version": "1.2.0",
1916
+ "model_name": "LayoutModel",
1917
+ "state": {
1918
+ "_model_module": "@jupyter-widgets/base",
1919
+ "_model_module_version": "1.2.0",
1920
+ "_model_name": "LayoutModel",
1921
+ "_view_count": null,
1922
+ "_view_module": "@jupyter-widgets/base",
1923
+ "_view_module_version": "1.2.0",
1924
+ "_view_name": "LayoutView",
1925
+ "align_content": null,
1926
+ "align_items": null,
1927
+ "align_self": null,
1928
+ "border": null,
1929
+ "bottom": null,
1930
+ "display": null,
1931
+ "flex": null,
1932
+ "flex_flow": null,
1933
+ "grid_area": null,
1934
+ "grid_auto_columns": null,
1935
+ "grid_auto_flow": null,
1936
+ "grid_auto_rows": null,
1937
+ "grid_column": null,
1938
+ "grid_gap": null,
1939
+ "grid_row": null,
1940
+ "grid_template_areas": null,
1941
+ "grid_template_columns": null,
1942
+ "grid_template_rows": null,
1943
+ "height": null,
1944
+ "justify_content": null,
1945
+ "justify_items": null,
1946
+ "left": null,
1947
+ "margin": null,
1948
+ "max_height": null,
1949
+ "max_width": null,
1950
+ "min_height": null,
1951
+ "min_width": null,
1952
+ "object_fit": null,
1953
+ "object_position": null,
1954
+ "order": null,
1955
+ "overflow": null,
1956
+ "overflow_x": null,
1957
+ "overflow_y": null,
1958
+ "padding": null,
1959
+ "right": null,
1960
+ "top": null,
1961
+ "visibility": null,
1962
+ "width": null
1963
+ }
1964
+ },
1965
+ "ec90909f38e0448bab37c735ea9b9ebe": {
1966
+ "model_module": "@jupyter-widgets/controls",
1967
+ "model_module_version": "1.5.0",
1968
+ "model_name": "HTMLModel",
1969
+ "state": {
1970
+ "_dom_classes": [],
1971
+ "_model_module": "@jupyter-widgets/controls",
1972
+ "_model_module_version": "1.5.0",
1973
+ "_model_name": "HTMLModel",
1974
+ "_view_count": null,
1975
+ "_view_module": "@jupyter-widgets/controls",
1976
+ "_view_module_version": "1.5.0",
1977
+ "_view_name": "HTMLView",
1978
+ "description": "",
1979
+ "description_tooltip": null,
1980
+ "layout": "IPY_MODEL_dd420d29751341faa84c025afa743bb5",
1981
+ "placeholder": "​",
1982
+ "style": "IPY_MODEL_4d56ed704097453a930baa9ecdfb1156",
1983
+ "value": " 436k/436k [00:01&lt;00:00, 470kB/s]"
1984
+ }
1985
+ },
1986
+ "f4274a98f06945a6a1e4a56b680c1790": {
1987
+ "model_module": "@jupyter-widgets/base",
1988
+ "model_module_version": "1.2.0",
1989
+ "model_name": "LayoutModel",
1990
+ "state": {
1991
+ "_model_module": "@jupyter-widgets/base",
1992
+ "_model_module_version": "1.2.0",
1993
+ "_model_name": "LayoutModel",
1994
+ "_view_count": null,
1995
+ "_view_module": "@jupyter-widgets/base",
1996
+ "_view_module_version": "1.2.0",
1997
+ "_view_name": "LayoutView",
1998
+ "align_content": null,
1999
+ "align_items": null,
2000
+ "align_self": null,
2001
+ "border": null,
2002
+ "bottom": null,
2003
+ "display": null,
2004
+ "flex": null,
2005
+ "flex_flow": null,
2006
+ "grid_area": null,
2007
+ "grid_auto_columns": null,
2008
+ "grid_auto_flow": null,
2009
+ "grid_auto_rows": null,
2010
+ "grid_column": null,
2011
+ "grid_gap": null,
2012
+ "grid_row": null,
2013
+ "grid_template_areas": null,
2014
+ "grid_template_columns": null,
2015
+ "grid_template_rows": null,
2016
+ "height": null,
2017
+ "justify_content": null,
2018
+ "justify_items": null,
2019
+ "left": null,
2020
+ "margin": null,
2021
+ "max_height": null,
2022
+ "max_width": null,
2023
+ "min_height": null,
2024
+ "min_width": null,
2025
+ "object_fit": null,
2026
+ "object_position": null,
2027
+ "order": null,
2028
+ "overflow": null,
2029
+ "overflow_x": null,
2030
+ "overflow_y": null,
2031
+ "padding": null,
2032
+ "right": null,
2033
+ "top": null,
2034
+ "visibility": null,
2035
+ "width": null
2036
+ }
2037
+ },
2038
+ "f6a334e9b5da4c82a1c4f1fbc1fe3c7e": {
2039
+ "model_module": "@jupyter-widgets/base",
2040
+ "model_module_version": "1.2.0",
2041
+ "model_name": "LayoutModel",
2042
+ "state": {
2043
+ "_model_module": "@jupyter-widgets/base",
2044
+ "_model_module_version": "1.2.0",
2045
+ "_model_name": "LayoutModel",
2046
+ "_view_count": null,
2047
+ "_view_module": "@jupyter-widgets/base",
2048
+ "_view_module_version": "1.2.0",
2049
+ "_view_name": "LayoutView",
2050
+ "align_content": null,
2051
+ "align_items": null,
2052
+ "align_self": null,
2053
+ "border": null,
2054
+ "bottom": null,
2055
+ "display": null,
2056
+ "flex": null,
2057
+ "flex_flow": null,
2058
+ "grid_area": null,
2059
+ "grid_auto_columns": null,
2060
+ "grid_auto_flow": null,
2061
+ "grid_auto_rows": null,
2062
+ "grid_column": null,
2063
+ "grid_gap": null,
2064
+ "grid_row": null,
2065
+ "grid_template_areas": null,
2066
+ "grid_template_columns": null,
2067
+ "grid_template_rows": null,
2068
+ "height": null,
2069
+ "justify_content": null,
2070
+ "justify_items": null,
2071
+ "left": null,
2072
+ "margin": null,
2073
+ "max_height": null,
2074
+ "max_width": null,
2075
+ "min_height": null,
2076
+ "min_width": null,
2077
+ "object_fit": null,
2078
+ "object_position": null,
2079
+ "order": null,
2080
+ "overflow": null,
2081
+ "overflow_x": null,
2082
+ "overflow_y": null,
2083
+ "padding": null,
2084
+ "right": null,
2085
+ "top": null,
2086
+ "visibility": null,
2087
+ "width": null
2088
+ }
2089
+ },
2090
+ "f9456ff5134242bc9541d9d60c753384": {
2091
+ "model_module": "@jupyter-widgets/controls",
2092
+ "model_module_version": "1.5.0",
2093
+ "model_name": "DescriptionStyleModel",
2094
+ "state": {
2095
+ "_model_module": "@jupyter-widgets/controls",
2096
+ "_model_module_version": "1.5.0",
2097
+ "_model_name": "DescriptionStyleModel",
2098
+ "_view_count": null,
2099
+ "_view_module": "@jupyter-widgets/base",
2100
+ "_view_module_version": "1.2.0",
2101
+ "_view_name": "StyleView",
2102
+ "description_width": ""
2103
+ }
2104
+ }
2105
+ }
2106
+ }
2107
+ },
2108
+ "nbformat": 4,
2109
+ "nbformat_minor": 1
2110
+ }
NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/.ipynb_checkpoints/C4W3_HF_Lab2_QA_BERT-checkpoint.ipynb ADDED
@@ -0,0 +1,644 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "u2UXutvEvpUj"
7
+ },
8
+ "source": [
9
+ "# Question Answering with BERT and HuggingFace 🤗 (Fine-tuning)\n",
10
+ "\n",
11
+ "In the previous Hugging Face ungraded lab, you saw how to use the pipeline objects to use transformer models for NLP tasks. In that lab, the model didn't output the desired answers to a series of precise questions for a context related to the history of comic books.\n",
12
+ "\n",
13
+ "In this lab, you will fine-tune the model from that lab to give better answers for that type of context. To do that, you'll be using the [TyDi QA dataset](https://ai.google.com/research/tydiqa) but on a filtered version with only English examples. Additionally, you will use a lot of the tools that Hugging Face has to offer.\n",
14
+ "\n",
15
+ "You have to note that, in general, you will fine-tune general-purpose transformer models to work for specific tasks. However, fine-tuning a general-purpose model can take a lot of time. That's why you will be using the model from the question answering pipeline in this lab.\n",
16
+ "\n",
17
+ "Begin by importing some libraries and/or objects you will use throughout the lab:"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "code",
22
+ "execution_count": null,
23
+ "metadata": {},
24
+ "outputs": [],
25
+ "source": [
26
+ "import os\n",
27
+ "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n",
28
+ "\n",
29
+ "import numpy as np\n",
30
+ "\n",
31
+ "from datasets import load_from_disk\n",
32
+ "from transformers import AutoTokenizer, AutoModelForQuestionAnswering, Trainer, TrainingArguments\n",
33
+ "\n",
34
+ "from sklearn.metrics import f1_score"
35
+ ]
36
+ },
37
+ {
38
+ "cell_type": "markdown",
39
+ "metadata": {
40
+ "id": "FrEglXPmvpUr"
41
+ },
42
+ "source": [
43
+ "## Fine-tuning a BERT model\n",
44
+ "\n",
45
+ "As you saw in the previous lab, you can use these pipelines as they are. But sometimes, you'll need something more specific to your problem, or maybe you need it to perform better on your production data. In these cases, you'll need to fine-tune a model.\n",
46
+ "\n",
47
+ "Here, you'll fine-tune a pre-trained DistilBERT model on the TyDi QA dataset.\n",
48
+ "\n",
49
+ "To fine-tune your model, you will leverage three components provided by Hugging Face:\n",
50
+ "\n",
51
+ "* Datasets: Library that contains some datasets and different metrics to evaluate the performance of your models.\n",
52
+ "* Tokenizer: Object in charge of preprocessing your text to be given as input for the transformer models.\n",
53
+ "* Transformers: Library with the pre-trained model checkpoints and the trainer object.\n",
54
+ "\n"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "markdown",
59
+ "metadata": {
60
+ "id": "g0Rg-e4jBFFs"
61
+ },
62
+ "source": [
63
+ "### Datasets\n",
64
+ "\n",
65
+ "To get the dataset to fine-tune your model, you will use [🤗 Datasets](https://huggingface.co/docs/datasets/), a lightweight and extensible library to share and access datasets and evaluation metrics for NLP easily. You can download Hugging Face datasets directly using the `load_dataset` function from the `datasets` library. \n",
66
+ "\n",
67
+ "Hugging Face `datasets` allows to load data in several formats, such as CSV, JSON, text files and even parquet. You can see more about the supported formats in the [documentation](https://huggingface.co/docs/datasets/loading)\n",
68
+ "\n",
69
+ "A common approach is to use `load_dataset` and get the full dataset but **for this lab you will use a filtered version containing only the English examples**, which is already saved in this environment. Since this filtered dataset is saved using the Apache Arrow format, you can read it by using the `load_from_disk` function.\n"
70
+ ]
71
+ },
72
+ {
73
+ "cell_type": "code",
74
+ "execution_count": null,
75
+ "metadata": {
76
+ "id": "x68dqaoXg5Ra"
77
+ },
78
+ "outputs": [],
79
+ "source": [
80
+ "#The path where the dataset is stored\n",
81
+ "path = './tydiqa_data/'\n",
82
+ "\n",
83
+ "#Load Dataset\n",
84
+ "tydiqa_data = load_from_disk(path)\n",
85
+ "\n",
86
+ "tydiqa_data"
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "markdown",
91
+ "metadata": {
92
+ "id": "1hfzBZU3T47O"
93
+ },
94
+ "source": [
95
+ "<a id='datasets_type'></a>\n",
96
+ "You can check below that the type of the loaded dataset is a `datasets.arrow_dataset.Dataset`. This object type corresponds to an Apache Arrow Table that allows creating a hash table that contains the position in memory where data is stored instead of loading the complete dataset into memory. But you don't have to worry too much about that. It is just an efficient way to work with lots of data."
97
+ ]
98
+ },
99
+ {
100
+ "cell_type": "code",
101
+ "execution_count": null,
102
+ "metadata": {
103
+ "id": "gkeppC3GQiW6"
104
+ },
105
+ "outputs": [],
106
+ "source": [
107
+ "# Checking the object type for one of the elements in the dataset\n",
108
+ "type(tydiqa_data['train'])"
109
+ ]
110
+ },
111
+ {
112
+ "cell_type": "markdown",
113
+ "metadata": {
114
+ "id": "q_HLaNtQaFlR"
115
+ },
116
+ "source": [
117
+ "You can also check the structure of the dataset:"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "code",
122
+ "execution_count": null,
123
+ "metadata": {
124
+ "id": "2l9ANJTrbP-U"
125
+ },
126
+ "outputs": [],
127
+ "source": [
128
+ "tydiqa_data['train']"
129
+ ]
130
+ },
131
+ {
132
+ "cell_type": "markdown",
133
+ "metadata": {
134
+ "id": "2xRO1yIkvpUt"
135
+ },
136
+ "source": [
137
+ "You can see that each example is like a dictionary object. This dataset consists of questions, contexts, and indices that point to the start and end position of the answer inside the context. You can access the index using the `annotations` key, which is a kind of dictionary."
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "code",
142
+ "execution_count": null,
143
+ "metadata": {
144
+ "id": "KNVpW6lADk92"
145
+ },
146
+ "outputs": [],
147
+ "source": [
148
+ "idx = 600\n",
149
+ "\n",
150
+ "# start index\n",
151
+ "start_index = tydiqa_data['train'][idx]['annotations']['minimal_answers_start_byte'][0]\n",
152
+ "\n",
153
+ "# end index\n",
154
+ "end_index = tydiqa_data['train'][idx]['annotations']['minimal_answers_end_byte'][0]\n",
155
+ "\n",
156
+ "print(f\"Question: {tydiqa_data['train'][idx]['question_text']}\")\n",
157
+ "print(f\"\\nContext (truncated): {tydiqa_data['train'][idx]['document_plaintext'][0:512]} ...\")\n",
158
+ "print(f\"\\nAnswer: {tydiqa_data['train'][idx]['document_plaintext'][start_index:end_index]}\")"
159
+ ]
160
+ },
161
+ {
162
+ "cell_type": "markdown",
163
+ "metadata": {
164
+ "id": "Z-lZgDTEYm74"
165
+ },
166
+ "source": [
167
+ "The question answering model predicts a start and endpoint in the context to extract as the answer. That's why this NLP task is known as extractive question answering.\n",
168
+ "\n",
169
+ "To train your model, you need to pass start and endpoints as labels. So, you need to implement a function that extracts the start and end positions from the dataset.\n",
170
+ "\n",
171
+ "The dataset contains unanswerable questions. For these, the start and end indices for the answer are equal to `-1`."
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": null,
177
+ "metadata": {
178
+ "id": "Ty_QDcdKYw9a"
179
+ },
180
+ "outputs": [],
181
+ "source": [
182
+ "tydiqa_data['train'][0]['annotations']"
183
+ ]
184
+ },
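+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check (a minimal sketch, not part of the original lab), you can count how many of the first few hundred training examples are unanswerable by looking for that `-1` start byte:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Minimal sketch (not in the original lab): count unanswerable questions,\n",
+ "# i.e. examples whose first minimal answer has a start byte of -1.\n",
+ "unanswerable = sum(\n",
+ "    1\n",
+ "    for sample in tydiqa_data['train'].select(range(500))\n",
+ "    if sample['annotations']['minimal_answers_start_byte'][0] == -1\n",
+ ")\n",
+ "print(f'Unanswerable questions in the first 500 examples: {unanswerable}')"
+ ]
+ },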
185
+ {
186
+ "cell_type": "markdown",
187
+ "metadata": {
188
+ "id": "lHWcNMudcAuO"
189
+ },
190
+ "source": [
191
+ "Now, you have to flatten the dataset to work with an object with a table structure instead of a dictionary structure. This step facilitates the pre-processing steps."
192
+ ]
193
+ },
194
+ {
195
+ "cell_type": "code",
196
+ "execution_count": null,
197
+ "metadata": {
198
+ "id": "xDCAQQtoCs_r"
199
+ },
200
+ "outputs": [],
201
+ "source": [
202
+ "# Flattening the datasets\n",
203
+ "flattened_train_data = tydiqa_data['train'].flatten()\n",
204
+ "flattened_test_data = tydiqa_data['validation'].flatten()"
205
+ ]
206
+ },
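+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "After flattening, the nested `annotations` fields become dotted column names such as `annotations.minimal_answers_start_byte`, which is exactly how the pre-processing function defined later refers to them. A quick way to confirm this (a minimal sketch, not part of the original lab):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Minimal sketch: flattening replaces the nested 'annotations' dictionary\n",
+ "# with dotted column names like 'annotations.minimal_answers_start_byte'.\n",
+ "print(flattened_train_data.column_names)"
+ ]
+ },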
207
+ {
208
+ "cell_type": "markdown",
209
+ "metadata": {
210
+ "id": "q5wUa5xED0fK"
211
+ },
212
+ "source": [
213
+ "Also, to make the training more straightforward and faster, we will extract a subset of the train and test datasets. For that purpose, we will use the Hugging Face Dataset object's method called `select()`. This method allows you to take some data points by their index. Here, you will select the first 3000 rows but you can play with the number of data points, however, consider that this will increase the training time."
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": null,
219
+ "metadata": {
220
+ "id": "BkcIhpEnDHSJ"
221
+ },
222
+ "outputs": [],
223
+ "source": [
224
+ "# Selecting a subset of the train dataset\n",
225
+ "flattened_train_data = flattened_train_data.select(range(3000))\n",
226
+ "\n",
227
+ "# Selecting a subset of the test dataset\n",
228
+ "flattened_test_data = flattened_test_data.select(range(1000))"
229
+ ]
230
+ },
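+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can verify the sizes of the resulting subsets with `len()` (a minimal sketch, not part of the original lab):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Quick check: the subsets should have 3000 and 1000 rows, respectively.\n",
+ "print(len(flattened_train_data), len(flattened_test_data))"
+ ]
+ },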
231
+ {
232
+ "cell_type": "markdown",
233
+ "metadata": {
234
+ "id": "fBXrmwXhc13M"
235
+ },
236
+ "source": [
237
+ "### Tokenizers\n",
238
+ "\n",
239
+ "Now, you will use the [tokenizer](https://huggingface.co/transformers/main_classes/tokenizer.html) object from Hugging Face. You can load a tokenizer using different methods. Here, you will retrieve it from the pipeline object you created in the previous Hugging Face lab. With this tokenizer, you can ensure that the tokens you get for the dataset will match the tokens used in the original DistilBERT implementation.\n",
240
+ "\n",
241
+ "When loading a tokenizer with any method, you must pass the model checkpoint that you want to fine-tune. Here, you are using the`'distilbert-base-cased-distilled-squad'` checkpoint.\n"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "code",
246
+ "execution_count": null,
247
+ "metadata": {
248
+ "id": "LInV3b_HyAIF"
249
+ },
250
+ "outputs": [],
251
+ "source": [
252
+ "# Import the AutoTokenizer from the transformers library\n",
253
+ "tokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-cased-distilled-squad\")\n",
254
+ "\n",
255
+ "# Define max length of sequences in the tokenizer\n",
256
+ "tokenizer.model_max_length = 512"
257
+ ]
258
+ },
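+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before processing the whole dataset, it can help to see what the tokenizer does to a single example. The sketch below (not part of the original lab) tokenizes one context/question pair the same way the pre-processing function defined later will, and shows how `char_to_token` maps a character offset in the context to a token index:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Minimal sketch (not in the original lab): tokenize one context/question pair\n",
+ "# with the same settings process_samples will use, and inspect the result.\n",
+ "sample = flattened_train_data[0]\n",
+ "encoded = tokenizer(sample['document_plaintext'], sample['question_text'],\n",
+ "                    truncation='only_first', padding='max_length')\n",
+ "\n",
+ "print(len(encoded['input_ids']))                    # 512, the model_max_length\n",
+ "print(tokenizer.decode(encoded['input_ids'][:20]))  # the first few tokens\n",
+ "\n",
+ "# char_to_token maps a character offset in the context (sequence 0) to a token index\n",
+ "print(encoded.char_to_token(0))"
+ ]
+ },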
259
+ {
260
+ "cell_type": "markdown",
261
+ "metadata": {
262
+ "id": "qz6YtVcOh3qP"
263
+ },
264
+ "source": [
265
+ "Given the characteristics of the dataset and the question-answering task, you will need to add some steps to pre-process the data after the tokenization:\n",
266
+ "\n",
267
+ "1. When there is no answer to a question given a context, you will use the `CLS` token, a unique token used to represent the start of the sequence.\n",
268
+ "\n",
269
+ "2. Tokenizers can split a given string into substrings, resulting in a subtoken for each substring, creating misalignment between the list of dataset tags and the labels generated by the tokenizer. Therefore, you will need to align the start and end indices with the tokens associated with the target answer word.\n",
270
+ "\n",
271
+ "3. Finally, a tokenizer can truncate a very long sequence. So, if the start/end position of an answer is `None`, you will assume that it was truncated and assign the maximum length of the tokenizer to those positions.\n",
272
+ "\n",
273
+ "Those three steps are done within the `process_samples` function defined below."
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "code",
278
+ "execution_count": null,
279
+ "metadata": {
280
+ "id": "3l-r4wI06LU7"
281
+ },
282
+ "outputs": [],
283
+ "source": [
284
+ "# Processing samples using the 3 steps described above\n",
285
+ "def process_samples(sample):\n",
286
+ " tokenized_data = tokenizer(sample['document_plaintext'], sample['question_text'], truncation=\"only_first\", padding=\"max_length\")\n",
287
+ "\n",
288
+ " input_ids = tokenized_data[\"input_ids\"]\n",
289
+ "\n",
290
+ " # We will label impossible answers with the index of the CLS token.\n",
291
+ " cls_index = input_ids.index(tokenizer.cls_token_id)\n",
292
+ "\n",
293
+ " # If no answers are given, set the cls_index as answer.\n",
294
+ " if sample[\"annotations.minimal_answers_start_byte\"][0] == -1:\n",
295
+ " start_position = cls_index\n",
296
+ " end_position = cls_index\n",
297
+ " else:\n",
298
+ " # Start/end character index of the answer in the text.\n",
299
+ " gold_text = sample[\"document_plaintext\"][sample['annotations.minimal_answers_start_byte'][0]:sample['annotations.minimal_answers_end_byte'][0]]\n",
300
+ " start_char = sample[\"annotations.minimal_answers_start_byte\"][0]\n",
301
+ " end_char = sample['annotations.minimal_answers_end_byte'][0] #start_char + len(gold_text)\n",
302
+ "\n",
303
+ " # sometimes answers are off by a character or two – fix this\n",
304
+ " if sample['document_plaintext'][start_char-1:end_char-1] == gold_text:\n",
305
+ " start_char = start_char - 1\n",
306
+ " end_char = end_char - 1 # When the gold label is off by one character\n",
307
+ " elif sample['document_plaintext'][start_char-2:end_char-2] == gold_text:\n",
308
+ " start_char = start_char - 2\n",
309
+ " end_char = end_char - 2 # When the gold label is off by two characters\n",
310
+ "\n",
311
+ " start_token = tokenized_data.char_to_token(start_char)\n",
312
+ " end_token = tokenized_data.char_to_token(end_char - 1)\n",
313
+ "\n",
314
+ " # if start position is None, the answer passage has been truncated\n",
315
+ " if start_token is None:\n",
316
+ " start_token = tokenizer.model_max_length\n",
317
+ " if end_token is None:\n",
318
+ " end_token = tokenizer.model_max_length\n",
319
+ "\n",
320
+ " start_position = start_token\n",
321
+ " end_position = end_token\n",
322
+ "\n",
323
+ " return {'input_ids': tokenized_data['input_ids'],\n",
324
+ " 'attention_mask': tokenized_data['attention_mask'],\n",
325
+ " 'start_positions': start_position,\n",
326
+ " 'end_positions': end_position}\n"
327
+ ]
328
+ },
329
+ {
330
+ "cell_type": "markdown",
331
+ "metadata": {
332
+ "id": "Q3LAsWSyk_Rm"
333
+ },
334
+ "source": [
335
+ "To apply the `process_samples` function defined above to the whole dataset, you can use the `map` method as follows:"
336
+ ]
337
+ },
338
+ {
339
+ "cell_type": "code",
340
+ "execution_count": null,
341
+ "metadata": {
342
+ "id": "rGbYd7QnFetG"
343
+ },
344
+ "outputs": [],
345
+ "source": [
346
+ "# Tokenizing and processing the flattened dataset\n",
347
+ "processed_train_data = flattened_train_data.map(process_samples)\n",
348
+ "processed_test_data = flattened_test_data.map(process_samples)"
349
+ ]
350
+ },
351
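Before moving on, it can help to spot-check one processed example and decode the labeled span to confirm the alignment worked. A small sketch (not in the original lab):

```python
# A sanity-check sketch. If the answer was truncated or missing, start/end point at
# tokenizer.model_max_length or the CLS index, and the decoded slice is empty or '[CLS]'.
example = processed_train_data[0]
start, end = example["start_positions"], example["end_positions"]
print(start, end)
print(tokenizer.decode(example["input_ids"][start:end + 1]))
```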
+ {
352
+ "cell_type": "markdown",
353
+ "metadata": {
354
+ "id": "wCpPhYKJluMA"
355
+ },
356
+ "source": [
357
+ "# Transformers\n",
358
+ "\n",
359
+ "The last component of Hugging Face that is useful for fine-tuning a transformer corresponds to the pre-trained models you can access in multiple ways.\n",
360
+ "\n",
361
+ "For this lab, you will use the same model from the question-answering pipeline that you loaded in the previous lab."
362
+ ]
363
+ },
364
+ {
365
+ "cell_type": "code",
366
+ "execution_count": null,
367
+ "metadata": {
368
+ "id": "jR3VqjNc1Vb3"
369
+ },
370
+ "outputs": [],
371
+ "source": [
372
+ "# Import the AutoModelForQuestionAnswering for the pre-trained model. You will only fine tune the head of the model\n",
373
+ "model = AutoModelForQuestionAnswering.from_pretrained(\"distilbert-base-cased-distilled-squad\")"
374
+ ]
375
+ },
376
+ {
377
+ "cell_type": "markdown",
378
+ "metadata": {
379
+ "id": "K29BYtnsm1yH"
380
+ },
381
+ "source": [
382
+ "Now, you can take the necessary columns from the datasets to train/test and return them as Pytorch Tensors."
383
+ ]
384
+ },
385
+ {
386
+ "cell_type": "code",
387
+ "execution_count": null,
388
+ "metadata": {
389
+ "id": "0X14G89noLfW"
390
+ },
391
+ "outputs": [],
392
+ "source": [
393
+ "columns_to_return = ['input_ids','attention_mask', 'start_positions', 'end_positions']\n",
394
+ "\n",
395
+ "processed_train_data.set_format(type='pt', columns=columns_to_return)\n",
396
+ "processed_test_data.set_format(type='pt', columns=columns_to_return)"
397
+ ]
398
+ },
399
+ {
400
+ "cell_type": "markdown",
401
+ "metadata": {
402
+ "id": "yjoUFWu_nLRq"
403
+ },
404
+ "source": [
405
+ "Here, we give you the F1 score as a metric to evaluate your model's performance. We will use this metric for simplicity, although it is based on the start and end values predicted by the model. If you want to dig deeper on other metrics that can be used for a question and answering task, you can also check [this colab notebook resource](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering.ipynb) from the Hugging Face team."
406
+ ]
407
+ },
408
+ {
409
+ "cell_type": "code",
410
+ "execution_count": null,
411
+ "metadata": {
412
+ "id": "xcW2wPnirsJk"
413
+ },
414
+ "outputs": [],
415
+ "source": [
416
+ "def compute_f1_metrics(pred):\n",
417
+ " start_labels = pred.label_ids[0]\n",
418
+ " start_preds = pred.predictions[0].argmax(-1)\n",
419
+ " end_labels = pred.label_ids[1]\n",
420
+ " end_preds = pred.predictions[1].argmax(-1)\n",
421
+ "\n",
422
+ " f1_start = f1_score(start_labels, start_preds, average='macro')\n",
423
+ " f1_end = f1_score(end_labels, end_preds, average='macro')\n",
424
+ "\n",
425
+ " return {\n",
426
+ " 'f1_start': f1_start,\n",
427
+ " 'f1_end': f1_end,\n",
428
+ " }"
429
+ ]
430
+ },
431
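To see the shape of what the `Trainer` will pass to this function, here is a minimal sketch with fabricated toy arrays (not real model output) that calls `compute_f1_metrics` directly:

```python
import numpy as np
from types import SimpleNamespace

# Fabricated toy batch of 2 examples: gold start/end indices, plus random
# (batch, seq_len) logits standing in for the model's start/end predictions.
toy_pred = SimpleNamespace(
    label_ids=(np.array([3, 7]), np.array([5, 9])),
    predictions=(np.random.rand(2, 512), np.random.rand(2, 512)),
)
print(compute_f1_metrics(toy_pred))  # {'f1_start': ..., 'f1_end': ...}
```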
+ {
432
+ "cell_type": "markdown",
433
+ "metadata": {
434
+ "id": "KuhASU4evpUu"
435
+ },
436
+ "source": [
437
+ "Now, you will use the Hugging Face [Trainer](https://huggingface.co/transformers/main_classes/trainer.html) to fine-tune your model."
438
+ ]
439
+ },
440
+ {
441
+ "cell_type": "code",
442
+ "execution_count": null,
443
+ "metadata": {
444
+ "colab": {
445
+ "background_save": true
446
+ },
447
+ "id": "nxyOwf5utXAt"
448
+ },
449
+ "outputs": [],
450
+ "source": [
451
+ "# Training hyperparameters\n",
452
+ "training_args = TrainingArguments(\n",
453
+ " output_dir='model_results', # output directory\n",
454
+ " overwrite_output_dir=True,\n",
455
+ " num_train_epochs=3, # total number of training epochs\n",
456
+ " per_device_train_batch_size=8, # batch size per device during training\n",
457
+ " per_device_eval_batch_size=8, # batch size for evaluation\n",
458
+ " warmup_steps=20, # number of warmup steps for learning rate scheduler\n",
459
+ " weight_decay=0.01, # strength of weight decay\n",
460
+ " logging_steps=50\n",
461
+ ")\n",
462
+ "\n",
463
+ "# Trainer object\n",
464
+ "trainer = Trainer(\n",
465
+ " model=model, # the instantiated 🤗 Transformers model to be trained\n",
466
+ " args=training_args, # training arguments, defined above\n",
467
+ " train_dataset=processed_train_data, # training dataset\n",
468
+ " eval_dataset=processed_test_data, # evaluation dataset\n",
469
+ " compute_metrics=compute_f1_metrics\n",
470
+ ")\n",
471
+ "\n",
472
+ "# Training loop\n",
473
+ "trainer.train()"
474
+ ]
475
+ },
476
+ {
477
+ "cell_type": "markdown",
478
+ "metadata": {
479
+ "id": "Ic_wNlBHCRMn"
480
+ },
481
+ "source": [
482
+ "And, in the next cell, you can evaluate the fine-tuned model's performance on the test set."
483
+ ]
484
+ },
485
+ {
486
+ "cell_type": "code",
487
+ "execution_count": null,
488
+ "metadata": {
489
+ "id": "92N11A076wRA"
490
+ },
491
+ "outputs": [],
492
+ "source": [
493
+ "trainer.evaluate(processed_test_data)"
494
+ ]
495
+ },
496
+ {
497
+ "cell_type": "markdown",
498
+ "metadata": {
499
+ "id": "_HubPkRbnzh_"
500
+ },
501
+ "source": [
502
+ "### Using your Fine-Tuned Model\n",
503
+ "\n",
504
+ "After training and evaluating your fine-tuned model, you can check its results for the same questions from the previous lab.\n",
505
+ "\n",
506
+ "For that, you will tell Pytorch to use your GPU or your CPU to run the model. Additionally, you will need to tokenize your input context and questions. Finally, you need to post-process the output results to transform them from tokens to human-readable strings using the `tokenizer`."
507
+ ]
508
+ },
509
+ {
510
+ "cell_type": "code",
511
+ "execution_count": null,
512
+ "metadata": {},
513
+ "outputs": [],
514
+ "source": [
515
+ "text = r\"\"\"\n",
516
+ "The Golden Age of Comic Books describes an era of American comic books from the\n",
517
+ "late 1930s to circa 1950. During this time, modern comic books were first published\n",
518
+ "and rapidly increased in popularity. The superhero archetype was created and many\n",
519
+ "well-known characters were introduced, including Superman, Batman, Captain Marvel\n",
520
+ "(later known as SHAZAM!), Captain America, and Wonder Woman.\n",
521
+ "Between 1939 and 1941 Detective Comics and its sister company, All-American Publications,\n",
522
+ "introduced popular superheroes such as Batman and Robin, Wonder Woman, the Flash,\n",
523
+ "Green Lantern, Doctor Fate, the Atom, Hawkman, Green Arrow and Aquaman.[7] Timely Comics,\n",
524
+ "the 1940s predecessor of Marvel Comics, had million-selling titles featuring the Human Torch,\n",
525
+ "the Sub-Mariner, and Captain America.[8]\n",
526
+ "As comic books grew in popularity, publishers began launching titles that expanded\n",
527
+ "into a variety of genres. Dell Comics' non-superhero characters (particularly the\n",
528
+ "licensed Walt Disney animated-character comics) outsold the superhero comics of the day.[12]\n",
529
+ "The publisher featured licensed movie and literary characters such as Mickey Mouse, Donald Duck,\n",
530
+ "Roy Rogers and Tarzan.[13] It was during this era that noted Donald Duck writer-artist\n",
531
+ "Carl Barks rose to prominence.[14] Additionally, MLJ's introduction of Archie Andrews\n",
532
+ "in Pep Comics #22 (December 1941) gave rise to teen humor comics,[15] with the Archie\n",
533
+ "Andrews character remaining in print well into the 21st century.[16]\n",
534
+ "At the same time in Canada, American comic books were prohibited importation under\n",
535
+ "the War Exchange Conservation Act[17] which restricted the importation of non-essential\n",
536
+ "goods. As a result, a domestic publishing industry flourished during the duration\n",
537
+ "of the war which were collectively informally called the Canadian Whites.\n",
538
+ "The educational comic book Dagwood Splits the Atom used characters from the comic\n",
539
+ "strip Blondie.[18] According to historian Michael A. Amundson, appealing comic-book\n",
540
+ "characters helped ease young readers' fear of nuclear war and neutralize anxiety\n",
541
+ "about the questions posed by atomic power.[19] It was during this period that long-running\n",
542
+ "humor comics debuted, including EC's Mad and Carl Barks' Uncle Scrooge in Dell's Four\n",
543
+ "Color Comics (both in 1952).[20][21]\n",
544
+ "\"\"\"\n",
545
+ "\n",
546
+ "questions = [\"What superheroes were introduced between 1939 and 1941 by Detective Comics and its sister company?\",\n",
547
+ " \"What comic book characters were created between 1939 and 1941?\",\n",
548
+ " \"What well-known characters were created between 1939 and 1941?\",\n",
549
+ " \"What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\"]\n",
550
+ "\n",
551
+ "for question in questions:\n",
552
+ " inputs = tokenizer.encode_plus(question, text, return_tensors=\"pt\")\n",
553
+ "\n",
554
+ " input_ids = inputs[\"input_ids\"].tolist()[0]\n",
555
+ " inputs.to(\"cuda\")\n",
556
+ "\n",
557
+ " text_tokens = tokenizer.convert_ids_to_tokens(input_ids)\n",
558
+ " answer_model = model(**inputs)\n",
559
+ " \n",
560
+ " start_logits = answer_model['start_logits'].cpu().detach().numpy()\n",
561
+ "\n",
562
+ " answer_start = np.argmax(start_logits) \n",
563
+ " \n",
564
+ " end_logits = answer_model['end_logits'].cpu().detach().numpy()\n",
565
+ " \n",
566
+ " # Get the most likely beginning of answer with the argmax of the score\n",
567
+ " answer_end = np.argmax(end_logits) + 1 # Get the most likely end of answer with the argmax of the score\n",
568
+ "\n",
569
+ " answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))\n",
570
+ "\n",
571
+ " print(f\"Question: {question}\")\n",
572
+ " print(f\"Answer: {answer}\\n\")\n"
573
+ ]
574
+ },
575
+ {
576
+ "cell_type": "markdown",
577
+ "metadata": {
578
+ "id": "_yTDQ6kn6pWS"
579
+ },
580
+ "source": [
581
+ "By fine-tuning the model for only 3 epochs you can already see an improvement!\n",
582
+ "\n",
583
+ "You can compare those results with those obtained using the base model (without fine-tuning), as you did in the previous lab. As a reminder, here are those results:\n",
584
+ "\n",
585
+ "```\n",
586
+ "What popular superheroes were introduced between 1939 and 1941?\n",
587
+ ">> teen humor comics\n",
588
+ "What superheroes were introduced between 1939 and 1941 by Detective Comics and its sister company?\n",
589
+ ">> Archie Andrews\n",
590
+ "What comic book characters were created between 1939 and 1941?\n",
591
+ ">> Archie\n",
592
+ "Andrews\n",
593
+ "What well-known characters were created between 1939 and 1941?\n",
594
+ ">> Archie\n",
595
+ "Andrews\n",
596
+ "What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\n",
597
+ ">> Archie Andrews\n",
598
+ "```"
599
+ ]
600
+ },
601
+ {
602
+ "cell_type": "markdown",
603
+ "metadata": {
604
+ "id": "uf-v8mUSLqXN"
605
+ },
606
+ "source": [
607
+ "**Congratulations!**\n",
608
+ "\n",
609
+ "You have finished this series of ungraded labs. You were able to:\n",
610
+ "\n",
611
+ "* Explore the Hugging Face Pipelines, which can be used right out of the bat.\n",
612
+ "\n",
613
+ "* Fine-tune a model for the Extractive Question & Answering task.\n",
614
+ "\n",
615
+ "We also recommend you go through the free [Hugging Face course](https://huggingface.co/course/chapter1) to explore their ecosystem in more detail and find different ways to use the `transformers` library."
616
+ ]
617
+ }
618
+ ],
619
+ "metadata": {
620
+ "accelerator": "GPU",
621
+ "colab": {
622
+ "provenance": []
623
+ },
624
+ "kernelspec": {
625
+ "display_name": "Python 3 (ipykernel)",
626
+ "language": "python",
627
+ "name": "python3"
628
+ },
629
+ "language_info": {
630
+ "codemirror_mode": {
631
+ "name": "ipython",
632
+ "version": 3
633
+ },
634
+ "file_extension": ".py",
635
+ "mimetype": "text/x-python",
636
+ "name": "python",
637
+ "nbconvert_exporter": "python",
638
+ "pygments_lexer": "ipython3",
639
+ "version": "3.8.10"
640
+ }
641
+ },
642
+ "nbformat": 4,
643
+ "nbformat_minor": 1
644
+ }
NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/C4W3_HF_Lab1_QA_BERT.ipynb ADDED
@@ -0,0 +1,2110 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "u2UXutvEvpUj"
7
+ },
8
+ "source": [
9
+ "# Question Answering with BERT and HuggingFace\n",
10
+ "\n",
11
+ "You've seen how to use BERT and other transformer models for a wide range of natural language tasks, including machine translation, summarization, and question answering. Transformers have become the standard model for NLP, similar to convolutional models in computer vision. And all started with Attention!\n",
12
+ "\n",
13
+ "In practice, you'll rarely train a transformer model from scratch. Transformers tend to be very large, so they take time, money, and lots of data to train fully. Instead, you'll want to start with a pre-trained model and fine-tune it with your dataset if you need to.\n",
14
+ "\n",
15
+ "[Hugging Face](https://huggingface.co/) (🤗) is the best resource for pre-trained transformers. Their open-source libraries simplify downloading and using transformer models like BERT, T5, and GPT-2. And the best part, you can use them alongside either TensorFlow, PyTorch or Flax.\n",
16
+ "\n",
17
+ "In this notebook, you'll use 🤗 transformers to use the DistilBERT model for question answering."
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "markdown",
22
+ "metadata": {
23
+ "id": "tm675LmQvpUm"
24
+ },
25
+ "source": [
26
+ "## Pipelines\n",
27
+ "\n",
28
+ "Before fine-tuning a model, you will look at the pipelines from Hugging Face to use pre-trained transformer models for specific tasks. The `transformers` library provides pipelines for popular tasks like sentiment analysis, summarization, and text generation. A pipeline consists of a tokenizer, a model, and the model configuration. All these are packaged together into an easy-to-use object. Hugging Face makes life easier.\n",
29
+ "\n",
30
+ "Pipelines are intended to be used without fine-tuning and will often be immediately helpful in your projects. For example, `transformers` provides a pipeline for [question answering](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.QuestionAnsweringPipeline) that you can directly use to answer your questions if you give some context. Let's see how to do just that.\n",
31
+ "\n",
32
+ "You will import `pipeline` from `transformers` for creating pipelines."
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "code",
37
+ "execution_count": 13,
38
+ "metadata": {
39
+ "id": "uNJGGbRWvpUm"
40
+ },
41
+ "outputs": [],
42
+ "source": [
43
+ "import os\n",
44
+ "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n",
45
+ "\n",
46
+ "from transformers import pipeline"
47
+ ]
48
+ },
49
+ {
50
+ "cell_type": "markdown",
51
+ "metadata": {
52
+ "id": "_CeFTIr7P3QR"
53
+ },
54
+ "source": [
55
+ "Now, you will create the pipeline for question-answering, which uses the [DistilBert](https://hf.co/distilbert-base-cased-distilled-squad) model for extractive question answering (i.e., answering questions with the exact wording provided in the context)."
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": 14,
61
+ "metadata": {
62
+ "colab": {
63
+ "base_uri": "https://localhost:8080/",
64
+ "height": 177,
65
+ "referenced_widgets": [
66
+ "d7e158e614f44983b229d6dd0d8960f9",
67
+ "69aad11dbc914410b95f6c3cb17a2457",
68
+ "0302e718c6084fb0a96d92fd976738dc",
69
+ "72b47b116b0b4125a35d47e060f46807",
70
+ "6f5fbb8f0f5a4374a7bde870c64f1fa4",
71
+ "331c23df507e4d679e3aaf81af39cd22",
72
+ "e33617a01b03437986c143c9a69ba14f",
73
+ "b1271eb1b7e74250bd9273f229b49cd8",
74
+ "6b20c9e39d36404fb761b4d83954a278",
75
+ "322cc6ef697945ccbbc2b3029dfdf0e3",
76
+ "f9456ff5134242bc9541d9d60c753384",
77
+ "5b917388b6624637ad8d8f60516d4001",
78
+ "13e05f2f64a54245a2478393b1f6b409",
79
+ "3c63301478f54f95ba0f0f8c853a7266",
80
+ "bb455638d3ac451096fd7cce4cb0d82c",
81
+ "c4a6b418089147f6b50eab097ace0342",
82
+ "9f4a770ae6b84593ac7de85e15c305a9",
83
+ "65d019ca643045a2b0933411d059c920",
84
+ "794c088a92ba4f6798fa94cded51d0bd",
85
+ "deb9a0d8d3e1430bacab12c8b4ce7573",
86
+ "c53ce49d9b7c4a3c87ad7f7c75dce1f5",
87
+ "b099d3bb966a4e9b8d02710329030ff3",
88
+ "6d42d468b1d04c4a94fb2eed75e3c238",
89
+ "a371ed9c75184b78a80facda31086426",
90
+ "c363fb4238e0464cb3a4ab16250e554a",
91
+ "86dce8a2c404469dab0dc1a466788c0a",
92
+ "f6a334e9b5da4c82a1c4f1fbc1fe3c7e",
93
+ "e9f1a9476de147c3bc98c1c36960dad6",
94
+ "2f8a7ddfcee64b978ff23f8f43911e01",
95
+ "099a1ac5af9b4e38abb6afb9c333d37a",
96
+ "340a7a5171ba4b8aa9b70e945df1618c",
97
+ "b0f31194b8f24b5ab601f8edd5332e04",
98
+ "cbeb9e9dfdf6420d91ec1126fedd8e48",
99
+ "da1553ec3e044a4fb7bb9b0c2a84bfe0",
100
+ "b440f3b3937549d69a4f188bb8415531",
101
+ "1b937b91e8ee46c4a764a8091f365291",
102
+ "1369029047864cc68100a44ffaad35ca",
103
+ "80cb9869ae694ecda95aab298598c7a2",
104
+ "2aaa61bf4df248be970940e103af0276",
105
+ "3a7f8b6302034a0e889976ba0fbb4531",
106
+ "8cc02da6124448598923899b144afd6d",
107
+ "7361467d667d49ee85c432eec884d882",
108
+ "39c0c457e3bd46c1971ae2913fd66429",
109
+ "2985df1d86904019a8fdf69a356ade6d",
110
+ "6c291edce30b4cefb329113d5ecbe640",
111
+ "1249849d670f4824a0a21aa61d187b56",
112
+ "0e7b7a12422d49d4a33538e638bfc1c9",
113
+ "ec90909f38e0448bab37c735ea9b9ebe",
114
+ "f4274a98f06945a6a1e4a56b680c1790",
115
+ "c04562a21d36405890c11e74915839c7",
116
+ "e6b91c2208e44524a9883309ae431277",
117
+ "9adf994d114141f98aeea509a73e9c59",
118
+ "520e9bf4c0164e60a2c3288bd97ef93e",
119
+ "dd420d29751341faa84c025afa743bb5",
120
+ "4d56ed704097453a930baa9ecdfb1156"
121
+ ]
122
+ },
123
+ "id": "nKy4AAhLvpUo",
124
+ "outputId": "0419ab21-4237-4ad2-b076-9ed83377ed34"
125
+ },
126
+ "outputs": [],
127
+ "source": [
128
+ "# The task \"question-answering\" will return a QuestionAnsweringPipeline object\n",
129
+ "question_answerer = pipeline(task=\"question-answering\", model=\"distilbert-base-cased-distilled-squad\")"
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "markdown",
134
+ "metadata": {
135
+ "id": "4ltQLVWgvpUo"
136
+ },
137
+ "source": [
138
+ "Notice that this environment already has the model stored in the directory `distilbert-base-cased-distilled-squad`. However if you were to run that exact code on your local computer, Huggingface will download the model for you, which is a great feature!\n",
139
+ "\n",
140
+ "\n",
141
+ "After running the last cell, you have a pipeline for performing question answering given a context string. The pipeline `question_answerer` you just created needs you to pass the question and context as strings. It returns an answer to the question from the context you provided. For example, here are the first few paragraphs from the [Wikipedia entry for tea](https://en.wikipedia.org/wiki/Tea) that you will use as the context.\n",
142
+ "\n",
143
+ "\n"
144
+ ]
145
+ },
146
+ {
147
+ "cell_type": "code",
148
+ "execution_count": 15,
149
+ "metadata": {
150
+ "id": "D_-MzZNJvpUp"
151
+ },
152
+ "outputs": [],
153
+ "source": [
154
+ "context = \"\"\"\n",
155
+ "Tea is an aromatic beverage prepared by pouring hot or boiling water over cured or fresh leaves of Camellia sinensis,\n",
156
+ "an evergreen shrub native to China and East Asia. After water, it is the most widely consumed drink in the world.\n",
157
+ "There are many different types of tea; some, like Chinese greens and Darjeeling, have a cooling, slightly bitter,\n",
158
+ "and astringent flavour, while others have vastly different profiles that include sweet, nutty, floral, or grassy\n",
159
+ "notes. Tea has a stimulating effect in humans primarily due to its caffeine content.\n",
160
+ "\n",
161
+ "The tea plant originated in the region encompassing today's Southwest China, Tibet, north Myanmar and Northeast India,\n",
162
+ "where it was used as a medicinal drink by various ethnic groups. An early credible record of tea drinking dates to\n",
163
+ "the 3rd century AD, in a medical text written by Hua Tuo. It was popularised as a recreational drink during the\n",
164
+ "Chinese Tang dynasty, and tea drinking spread to other East Asian countries. Portuguese priests and merchants\n",
165
+ "introduced it to Europe during the 16th century. During the 17th century, drinking tea became fashionable among the\n",
166
+ "English, who started to plant tea on a large scale in India.\n",
167
+ "\n",
168
+ "The term herbal tea refers to drinks not made from Camellia sinensis: infusions of fruit, leaves, or other plant\n",
169
+ "parts, such as steeps of rosehip, chamomile, or rooibos. These may be called tisanes or herbal infusions to prevent\n",
170
+ "confusion with 'tea' made from the tea plant.\n",
171
+ "\"\"\""
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "markdown",
176
+ "metadata": {
177
+ "id": "HyR3o2mrvpUq"
178
+ },
179
+ "source": [
180
+ "Now, you can ask your model anything related to that passage. For instance, \"Where is tea native to?\"."
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "code",
185
+ "execution_count": 16,
186
+ "metadata": {
187
+ "colab": {
188
+ "base_uri": "https://localhost:8080/"
189
+ },
190
+ "id": "eiRohAWWvpUq",
191
+ "outputId": "a1ddfca3-3723-4d43-cbda-0509337b60d6",
192
+ "scrolled": true
193
+ },
194
+ "outputs": [],
195
+ "source": [
196
+ "result = question_answerer(question=\"Where is tea native to?\", context=context)\n",
197
+ "\n",
198
+ "print(result['answer'])"
199
+ ]
200
+ },
201
+ {
202
+ "cell_type": "markdown",
203
+ "metadata": {
204
+ "id": "cRXzFlZ5vpUr"
205
+ },
206
+ "source": [
207
+ "You can also pass multiple questions to your pipeline within a list so that you can ask:\n",
208
+ "\n",
209
+ "* \"Where is tea native to?\"\n",
210
+ "* \"When was tea discovered?\"\n",
211
+ "* \"What is the species name for tea?\"\n",
212
+ "\n",
213
+ "at the same time, and your `question-answerer` will return all the answers."
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": 17,
219
+ "metadata": {
220
+ "colab": {
221
+ "base_uri": "https://localhost:8080/"
222
+ },
223
+ "id": "IMLyXeMZvpUr",
224
+ "outputId": "ac9badb1-083d-4234-9474-f112c1f2f20f"
225
+ },
226
+ "outputs": [],
227
+ "source": [
228
+ "questions = [\"Where is tea native to?\",\n",
229
+ " \"When was tea discovered?\",\n",
230
+ " \"What is the species name for tea?\"]\n",
231
+ "\n",
232
+ "results = question_answerer(question=questions, context=context)\n",
233
+ "\n",
234
+ "for q, r in zip(questions, results):\n",
235
+ " print(f\"{q} \\n>> {r['answer']}\")"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "markdown",
240
+ "metadata": {
241
+ "id": "XXf18tVu8p70"
242
+ },
243
+ "source": [
244
+ "Although the models used in the Hugging Face pipelines generally give outstanding results, sometimes you will have particular examples where they don't perform so well. Let's use the following example with a context string about the Golden Age of Comic Books:"
245
+ ]
246
+ },
247
+ {
248
+ "cell_type": "code",
249
+ "execution_count": 19,
250
+ "metadata": {
251
+ "id": "0v9C0TAqwinw"
252
+ },
253
+ "outputs": [],
254
+ "source": [
255
+ "context = \"\"\"\n",
256
+ "The Golden Age of Comic Books describes an era of American comic books from the\n",
257
+ "late 1930s to circa 1950. During this time, modern comic books were first published\n",
258
+ "and rapidly increased in popularity. The superhero archetype was created and many\n",
259
+ "well-known characters were introduced, including Superman, Batman, Captain Marvel\n",
260
+ "(later known as SHAZAM!), Captain America, and Wonder Woman.\n",
261
+ "Between 1939 and 1941 Detective Comics and its sister company, All-American Publications,\n",
262
+ "introduced popular superheroes such as Batman and Robin, Wonder Woman, the Flash,\n",
263
+ "Green Lantern, Doctor Fate, the Atom, Hawkman, Green Arrow and Aquaman.[7] Timely Comics,\n",
264
+ "the 1940s predecessor of Marvel Comics, had million-selling titles featuring the Human Torch,\n",
265
+ "the Sub-Mariner, and Captain America.[8]\n",
266
+ "As comic books grew in popularity, publishers began launching titles that expanded\n",
267
+ "into a variety of genres. Dell Comics' non-superhero characters (particularly the\n",
268
+ "licensed Walt Disney animated-character comics) outsold the superhero comics of the day.[12]\n",
269
+ "The publisher featured licensed movie and literary characters such as Mickey Mouse, Donald Duck,\n",
270
+ "Roy Rogers and Tarzan.[13] It was during this era that noted Donald Duck writer-artist\n",
271
+ "Carl Barks rose to prominence.[14] Additionally, MLJ's introduction of Archie Andrews\n",
272
+ "in Pep Comics #22 (December 1941) gave rise to teen humor comics,[15] with the Archie\n",
273
+ "Andrews character remaining in print well into the 21st century.[16]\n",
274
+ "At the same time in Canada, American comic books were prohibited importation under\n",
275
+ "the War Exchange Conservation Act[17] which restricted the importation of non-essential\n",
276
+ "goods. As a result, a domestic publishing industry flourished during the duration\n",
277
+ "of the war which were collectively informally called the Canadian Whites.\n",
278
+ "The educational comic book Dagwood Splits the Atom used characters from the comic\n",
279
+ "strip Blondie.[18] According to historian Michael A. Amundson, appealing comic-book\n",
280
+ "characters helped ease young readers' fear of nuclear war and neutralize anxiety\n",
281
+ "about the questions posed by atomic power.[19] It was during this period that long-running\n",
282
+ "humor comics debuted, including EC's Mad and Carl Barks' Uncle Scrooge in Dell's Four\n",
283
+ "Color Comics (both in 1952).[20][21]\n",
284
+ "\"\"\""
285
+ ]
286
+ },
287
+ {
288
+ "cell_type": "markdown",
289
+ "metadata": {
290
+ "id": "fYbERLKQbhyH"
291
+ },
292
+ "source": [
293
+ "Let's ask the following question: \"What popular superheroes were introduced between 1939 and 1941?\" The answer is in the fourth paragraph of the context string."
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": 20,
299
+ "metadata": {
300
+ "colab": {
301
+ "base_uri": "https://localhost:8080/"
302
+ },
303
+ "id": "SEmAbSSGbg0J",
304
+ "outputId": "35b5e3c4-2fd2-4f37-b674-014681ece042"
305
+ },
306
+ "outputs": [],
307
+ "source": [
308
+ "question = \"What popular superheroes were introduced between 1939 and 1941?\"\n",
309
+ "\n",
310
+ "result = question_answerer(question=question, context=context)\n",
311
+ "print(result['answer'])"
312
+ ]
313
+ },
314
+ {
315
+ "cell_type": "markdown",
316
+ "metadata": {
317
+ "id": "LGx_BHkN-ejY"
318
+ },
319
+ "source": [
320
+ "Here, the answer should be:\n",
321
+ "\"Batman and Robin, Wonder Woman, the Flash,\n",
322
+ "Green Lantern, Doctor Fate, the Atom, Hawkman, Green Arrow, and Aquaman\". Instead, the pipeline returned a different answer. You can even try different question wordings:\n",
323
+ "\n",
324
+ "* \"What superheroes were introduced between 1939 and 1941?\"\n",
325
+ "* \"What comic book characters were created between 1939 and 1941?\"\n",
326
+ "* \"What well-known characters were created between 1939 and 1941?\"\n",
327
+ "* \"What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\"\n",
328
+ "\n",
329
+ "and you will only get incorrect answers."
330
+ ]
331
+ },
332
+ {
333
+ "cell_type": "code",
334
+ "execution_count": 21,
335
+ "metadata": {
336
+ "colab": {
337
+ "base_uri": "https://localhost:8080/"
338
+ },
339
+ "id": "f91kLn9VcRzK",
340
+ "outputId": "bb3942b6-321a-4466-ac18-9f173b115600"
341
+ },
342
+ "outputs": [],
343
+ "source": [
344
+ "questions = [\"What popular superheroes were introduced between 1939 and 1941?\",\n",
345
+ " \"What superheroes were introduced between 1939 and 1941 by Detective Comics and its sister company?\",\n",
346
+ " \"What comic book characters were created between 1939 and 1941?\",\n",
347
+ " \"What well-known characters were created between 1939 and 1941?\",\n",
348
+ " \"What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\"]\n",
349
+ "\n",
350
+ "results = question_answerer(question=questions, context=context)\n",
351
+ "\n",
352
+ "for q, r in zip(questions, results):\n",
353
+ " print(f\"{q} \\n>> {r['answer']}\")"
354
+ ]
355
+ },
356
+ {
357
+ "cell_type": "markdown",
358
+ "metadata": {
359
+ "id": "QCkLhf27cEsH"
360
+ },
361
+ "source": [
362
+ "It seems like this model is a **huge fan** of Archie Andrews. It even considers him a superhero!\n",
363
+ "\n",
364
+ "The example that fooled your `question_answerer` belongs to the [TyDi QA dataset](https://ai.google.com/research/tydiqa), a dataset from Google for question/answering in diverse languages. To achieve better results when you know that the pipeline isn't working as it should, you need to consider fine-tuning your model.\n",
365
+ "\n",
366
+ "In the next ungraded lab, you will get the chance to fine-tune the DistilBert model using the TyDi QA dataset.\n",
367
+ "\n"
368
+ ]
369
+ }
370
+ ],
371
+ "metadata": {
372
+ "accelerator": "GPU",
373
+ "colab": {
374
+ "provenance": []
375
+ },
376
+ "kernelspec": {
377
+ "display_name": "Python 3 (ipykernel)",
378
+ "language": "python",
379
+ "name": "python3"
380
+ },
381
+ "language_info": {
382
+ "codemirror_mode": {
383
+ "name": "ipython",
384
+ "version": 3
385
+ },
386
+ "file_extension": ".py",
387
+ "mimetype": "text/x-python",
388
+ "name": "python",
389
+ "nbconvert_exporter": "python",
390
+ "pygments_lexer": "ipython3",
391
+ "version": "3.8.10"
392
+ },
393
+ "widgets": {
394
+ "application/vnd.jupyter.widget-state+json": {
395
+ "0302e718c6084fb0a96d92fd976738dc": {
396
+ "model_module": "@jupyter-widgets/controls",
397
+ "model_module_version": "1.5.0",
398
+ "model_name": "FloatProgressModel",
399
+ "state": {
400
+ "_dom_classes": [],
401
+ "_model_module": "@jupyter-widgets/controls",
402
+ "_model_module_version": "1.5.0",
403
+ "_model_name": "FloatProgressModel",
404
+ "_view_count": null,
405
+ "_view_module": "@jupyter-widgets/controls",
406
+ "_view_module_version": "1.5.0",
407
+ "_view_name": "ProgressView",
408
+ "bar_style": "success",
409
+ "description": "",
410
+ "description_tooltip": null,
411
+ "layout": "IPY_MODEL_b1271eb1b7e74250bd9273f229b49cd8",
412
+ "max": 473,
413
+ "min": 0,
414
+ "orientation": "horizontal",
415
+ "style": "IPY_MODEL_6b20c9e39d36404fb761b4d83954a278",
416
+ "value": 473
417
+ }
418
+ },
419
+ "099a1ac5af9b4e38abb6afb9c333d37a": {
420
+ "model_module": "@jupyter-widgets/base",
421
+ "model_module_version": "1.2.0",
422
+ "model_name": "LayoutModel",
423
+ "state": {
424
+ "_model_module": "@jupyter-widgets/base",
425
+ "_model_module_version": "1.2.0",
426
+ "_model_name": "LayoutModel",
427
+ "_view_count": null,
428
+ "_view_module": "@jupyter-widgets/base",
429
+ "_view_module_version": "1.2.0",
430
+ "_view_name": "LayoutView",
431
+ "align_content": null,
432
+ "align_items": null,
433
+ "align_self": null,
434
+ "border": null,
435
+ "bottom": null,
436
+ "display": null,
437
+ "flex": null,
438
+ "flex_flow": null,
439
+ "grid_area": null,
440
+ "grid_auto_columns": null,
441
+ "grid_auto_flow": null,
442
+ "grid_auto_rows": null,
443
+ "grid_column": null,
444
+ "grid_gap": null,
445
+ "grid_row": null,
446
+ "grid_template_areas": null,
447
+ "grid_template_columns": null,
448
+ "grid_template_rows": null,
449
+ "height": null,
450
+ "justify_content": null,
451
+ "justify_items": null,
452
+ "left": null,
453
+ "margin": null,
454
+ "max_height": null,
455
+ "max_width": null,
456
+ "min_height": null,
457
+ "min_width": null,
458
+ "object_fit": null,
459
+ "object_position": null,
460
+ "order": null,
461
+ "overflow": null,
462
+ "overflow_x": null,
463
+ "overflow_y": null,
464
+ "padding": null,
465
+ "right": null,
466
+ "top": null,
467
+ "visibility": null,
468
+ "width": null
469
+ }
470
+ },
471
+ "0e7b7a12422d49d4a33538e638bfc1c9": {
472
+ "model_module": "@jupyter-widgets/controls",
473
+ "model_module_version": "1.5.0",
474
+ "model_name": "FloatProgressModel",
475
+ "state": {
476
+ "_dom_classes": [],
477
+ "_model_module": "@jupyter-widgets/controls",
478
+ "_model_module_version": "1.5.0",
479
+ "_model_name": "FloatProgressModel",
480
+ "_view_count": null,
481
+ "_view_module": "@jupyter-widgets/controls",
482
+ "_view_module_version": "1.5.0",
483
+ "_view_name": "ProgressView",
484
+ "bar_style": "success",
485
+ "description": "",
486
+ "description_tooltip": null,
487
+ "layout": "IPY_MODEL_9adf994d114141f98aeea509a73e9c59",
488
+ "max": 435797,
489
+ "min": 0,
490
+ "orientation": "horizontal",
491
+ "style": "IPY_MODEL_520e9bf4c0164e60a2c3288bd97ef93e",
492
+ "value": 435797
493
+ }
494
+ },
495
+ "1249849d670f4824a0a21aa61d187b56": {
496
+ "model_module": "@jupyter-widgets/controls",
497
+ "model_module_version": "1.5.0",
498
+ "model_name": "HTMLModel",
499
+ "state": {
500
+ "_dom_classes": [],
501
+ "_model_module": "@jupyter-widgets/controls",
502
+ "_model_module_version": "1.5.0",
503
+ "_model_name": "HTMLModel",
504
+ "_view_count": null,
505
+ "_view_module": "@jupyter-widgets/controls",
506
+ "_view_module_version": "1.5.0",
507
+ "_view_name": "HTMLView",
508
+ "description": "",
509
+ "description_tooltip": null,
510
+ "layout": "IPY_MODEL_c04562a21d36405890c11e74915839c7",
511
+ "placeholder": "​",
512
+ "style": "IPY_MODEL_e6b91c2208e44524a9883309ae431277",
513
+ "value": "Downloading: 100%"
514
+ }
515
+ },
516
+ "1369029047864cc68100a44ffaad35ca": {
517
+ "model_module": "@jupyter-widgets/controls",
518
+ "model_module_version": "1.5.0",
519
+ "model_name": "HTMLModel",
520
+ "state": {
521
+ "_dom_classes": [],
522
+ "_model_module": "@jupyter-widgets/controls",
523
+ "_model_module_version": "1.5.0",
524
+ "_model_name": "HTMLModel",
525
+ "_view_count": null,
526
+ "_view_module": "@jupyter-widgets/controls",
527
+ "_view_module_version": "1.5.0",
528
+ "_view_name": "HTMLView",
529
+ "description": "",
530
+ "description_tooltip": null,
531
+ "layout": "IPY_MODEL_39c0c457e3bd46c1971ae2913fd66429",
532
+ "placeholder": "​",
533
+ "style": "IPY_MODEL_2985df1d86904019a8fdf69a356ade6d",
534
+ "value": " 213k/213k [00:00&lt;00:00, 170kB/s]"
535
+ }
536
+ },
537
+ "13e05f2f64a54245a2478393b1f6b409": {
538
+ "model_module": "@jupyter-widgets/controls",
539
+ "model_module_version": "1.5.0",
540
+ "model_name": "HTMLModel",
541
+ "state": {
542
+ "_dom_classes": [],
543
+ "_model_module": "@jupyter-widgets/controls",
544
+ "_model_module_version": "1.5.0",
545
+ "_model_name": "HTMLModel",
546
+ "_view_count": null,
547
+ "_view_module": "@jupyter-widgets/controls",
548
+ "_view_module_version": "1.5.0",
549
+ "_view_name": "HTMLView",
550
+ "description": "",
551
+ "description_tooltip": null,
552
+ "layout": "IPY_MODEL_9f4a770ae6b84593ac7de85e15c305a9",
553
+ "placeholder": "​",
554
+ "style": "IPY_MODEL_65d019ca643045a2b0933411d059c920",
555
+ "value": "Downloading: 100%"
556
+ }
557
+ },
558
+ "1b937b91e8ee46c4a764a8091f365291": {
559
+ "model_module": "@jupyter-widgets/controls",
560
+ "model_module_version": "1.5.0",
561
+ "model_name": "FloatProgressModel",
562
+ "state": {
563
+ "_dom_classes": [],
564
+ "_model_module": "@jupyter-widgets/controls",
565
+ "_model_module_version": "1.5.0",
566
+ "_model_name": "FloatProgressModel",
567
+ "_view_count": null,
568
+ "_view_module": "@jupyter-widgets/controls",
569
+ "_view_module_version": "1.5.0",
570
+ "_view_name": "ProgressView",
571
+ "bar_style": "success",
572
+ "description": "",
573
+ "description_tooltip": null,
574
+ "layout": "IPY_MODEL_8cc02da6124448598923899b144afd6d",
575
+ "max": 213450,
576
+ "min": 0,
577
+ "orientation": "horizontal",
578
+ "style": "IPY_MODEL_7361467d667d49ee85c432eec884d882",
579
+ "value": 213450
580
+ }
581
+ },
582
+ "2985df1d86904019a8fdf69a356ade6d": {
583
+ "model_module": "@jupyter-widgets/controls",
584
+ "model_module_version": "1.5.0",
585
+ "model_name": "DescriptionStyleModel",
586
+ "state": {
587
+ "_model_module": "@jupyter-widgets/controls",
588
+ "_model_module_version": "1.5.0",
589
+ "_model_name": "DescriptionStyleModel",
590
+ "_view_count": null,
591
+ "_view_module": "@jupyter-widgets/base",
592
+ "_view_module_version": "1.2.0",
593
+ "_view_name": "StyleView",
594
+ "description_width": ""
595
+ }
596
+ },
597
+ "2aaa61bf4df248be970940e103af0276": {
598
+ "model_module": "@jupyter-widgets/base",
599
+ "model_module_version": "1.2.0",
600
+ "model_name": "LayoutModel",
601
+ "state": {
602
+ "_model_module": "@jupyter-widgets/base",
603
+ "_model_module_version": "1.2.0",
604
+ "_model_name": "LayoutModel",
605
+ "_view_count": null,
606
+ "_view_module": "@jupyter-widgets/base",
607
+ "_view_module_version": "1.2.0",
608
+ "_view_name": "LayoutView",
609
+ "align_content": null,
610
+ "align_items": null,
611
+ "align_self": null,
612
+ "border": null,
613
+ "bottom": null,
614
+ "display": null,
615
+ "flex": null,
616
+ "flex_flow": null,
617
+ "grid_area": null,
618
+ "grid_auto_columns": null,
619
+ "grid_auto_flow": null,
620
+ "grid_auto_rows": null,
621
+ "grid_column": null,
622
+ "grid_gap": null,
623
+ "grid_row": null,
624
+ "grid_template_areas": null,
625
+ "grid_template_columns": null,
626
+ "grid_template_rows": null,
627
+ "height": null,
628
+ "justify_content": null,
629
+ "justify_items": null,
630
+ "left": null,
631
+ "margin": null,
632
+ "max_height": null,
633
+ "max_width": null,
634
+ "min_height": null,
635
+ "min_width": null,
636
+ "object_fit": null,
637
+ "object_position": null,
638
+ "order": null,
639
+ "overflow": null,
640
+ "overflow_x": null,
641
+ "overflow_y": null,
642
+ "padding": null,
643
+ "right": null,
644
+ "top": null,
645
+ "visibility": null,
646
+ "width": null
647
+ }
648
+ },
649
+ "2f8a7ddfcee64b978ff23f8f43911e01": {
650
+ "model_module": "@jupyter-widgets/controls",
651
+ "model_module_version": "1.5.0",
652
+ "model_name": "DescriptionStyleModel",
653
+ "state": {
654
+ "_model_module": "@jupyter-widgets/controls",
655
+ "_model_module_version": "1.5.0",
656
+ "_model_name": "DescriptionStyleModel",
657
+ "_view_count": null,
658
+ "_view_module": "@jupyter-widgets/base",
659
+ "_view_module_version": "1.2.0",
660
+ "_view_name": "StyleView",
661
+ "description_width": ""
662
+ }
663
+ },
664
+ "322cc6ef697945ccbbc2b3029dfdf0e3": {
665
+ "model_module": "@jupyter-widgets/base",
666
+ "model_module_version": "1.2.0",
667
+ "model_name": "LayoutModel",
668
+ "state": {
669
+ "_model_module": "@jupyter-widgets/base",
670
+ "_model_module_version": "1.2.0",
671
+ "_model_name": "LayoutModel",
672
+ "_view_count": null,
673
+ "_view_module": "@jupyter-widgets/base",
674
+ "_view_module_version": "1.2.0",
675
+ "_view_name": "LayoutView",
676
+ "align_content": null,
677
+ "align_items": null,
678
+ "align_self": null,
679
+ "border": null,
680
+ "bottom": null,
681
+ "display": null,
682
+ "flex": null,
683
+ "flex_flow": null,
684
+ "grid_area": null,
685
+ "grid_auto_columns": null,
686
+ "grid_auto_flow": null,
687
+ "grid_auto_rows": null,
688
+ "grid_column": null,
689
+ "grid_gap": null,
690
+ "grid_row": null,
691
+ "grid_template_areas": null,
692
+ "grid_template_columns": null,
693
+ "grid_template_rows": null,
694
+ "height": null,
695
+ "justify_content": null,
696
+ "justify_items": null,
697
+ "left": null,
698
+ "margin": null,
699
+ "max_height": null,
700
+ "max_width": null,
701
+ "min_height": null,
702
+ "min_width": null,
703
+ "object_fit": null,
704
+ "object_position": null,
705
+ "order": null,
706
+ "overflow": null,
707
+ "overflow_x": null,
708
+ "overflow_y": null,
709
+ "padding": null,
710
+ "right": null,
711
+ "top": null,
712
+ "visibility": null,
713
+ "width": null
714
+ }
715
+ },
716
+ "331c23df507e4d679e3aaf81af39cd22": {
717
+ "model_module": "@jupyter-widgets/base",
718
+ "model_module_version": "1.2.0",
719
+ "model_name": "LayoutModel",
720
+ "state": {
721
+ "_model_module": "@jupyter-widgets/base",
722
+ "_model_module_version": "1.2.0",
723
+ "_model_name": "LayoutModel",
724
+ "_view_count": null,
725
+ "_view_module": "@jupyter-widgets/base",
726
+ "_view_module_version": "1.2.0",
727
+ "_view_name": "LayoutView",
728
+ "align_content": null,
729
+ "align_items": null,
730
+ "align_self": null,
731
+ "border": null,
732
+ "bottom": null,
733
+ "display": null,
734
+ "flex": null,
735
+ "flex_flow": null,
736
+ "grid_area": null,
737
+ "grid_auto_columns": null,
738
+ "grid_auto_flow": null,
739
+ "grid_auto_rows": null,
740
+ "grid_column": null,
741
+ "grid_gap": null,
742
+ "grid_row": null,
743
+ "grid_template_areas": null,
744
+ "grid_template_columns": null,
745
+ "grid_template_rows": null,
746
+ "height": null,
747
+ "justify_content": null,
748
+ "justify_items": null,
749
+ "left": null,
750
+ "margin": null,
751
+ "max_height": null,
752
+ "max_width": null,
753
+ "min_height": null,
754
+ "min_width": null,
755
+ "object_fit": null,
756
+ "object_position": null,
757
+ "order": null,
758
+ "overflow": null,
759
+ "overflow_x": null,
760
+ "overflow_y": null,
761
+ "padding": null,
762
+ "right": null,
763
+ "top": null,
764
+ "visibility": null,
765
+ "width": null
766
+ }
767
+ },
768
+ "340a7a5171ba4b8aa9b70e945df1618c": {
769
+ "model_module": "@jupyter-widgets/controls",
770
+ "model_module_version": "1.5.0",
771
+ "model_name": "ProgressStyleModel",
772
+ "state": {
773
+ "_model_module": "@jupyter-widgets/controls",
774
+ "_model_module_version": "1.5.0",
775
+ "_model_name": "ProgressStyleModel",
776
+ "_view_count": null,
777
+ "_view_module": "@jupyter-widgets/base",
778
+ "_view_module_version": "1.2.0",
779
+ "_view_name": "StyleView",
780
+ "bar_color": null,
781
+ "description_width": ""
782
+ }
783
+ },
784
+ "39c0c457e3bd46c1971ae2913fd66429": {
785
+ "model_module": "@jupyter-widgets/base",
786
+ "model_module_version": "1.2.0",
787
+ "model_name": "LayoutModel",
788
+ "state": {
789
+ "_model_module": "@jupyter-widgets/base",
790
+ "_model_module_version": "1.2.0",
791
+ "_model_name": "LayoutModel",
792
+ "_view_count": null,
793
+ "_view_module": "@jupyter-widgets/base",
794
+ "_view_module_version": "1.2.0",
795
+ "_view_name": "LayoutView",
796
+ "align_content": null,
797
+ "align_items": null,
798
+ "align_self": null,
799
+ "border": null,
800
+ "bottom": null,
801
+ "display": null,
802
+ "flex": null,
803
+ "flex_flow": null,
804
+ "grid_area": null,
805
+ "grid_auto_columns": null,
806
+ "grid_auto_flow": null,
807
+ "grid_auto_rows": null,
808
+ "grid_column": null,
809
+ "grid_gap": null,
810
+ "grid_row": null,
811
+ "grid_template_areas": null,
812
+ "grid_template_columns": null,
813
+ "grid_template_rows": null,
814
+ "height": null,
815
+ "justify_content": null,
816
+ "justify_items": null,
817
+ "left": null,
818
+ "margin": null,
819
+ "max_height": null,
820
+ "max_width": null,
821
+ "min_height": null,
822
+ "min_width": null,
823
+ "object_fit": null,
824
+ "object_position": null,
825
+ "order": null,
826
+ "overflow": null,
827
+ "overflow_x": null,
828
+ "overflow_y": null,
829
+ "padding": null,
830
+ "right": null,
831
+ "top": null,
832
+ "visibility": null,
833
+ "width": null
834
+ }
835
+ },
836
+ "3a7f8b6302034a0e889976ba0fbb4531": {
837
+ "model_module": "@jupyter-widgets/controls",
838
+ "model_module_version": "1.5.0",
839
+ "model_name": "DescriptionStyleModel",
840
+ "state": {
841
+ "_model_module": "@jupyter-widgets/controls",
842
+ "_model_module_version": "1.5.0",
843
+ "_model_name": "DescriptionStyleModel",
844
+ "_view_count": null,
845
+ "_view_module": "@jupyter-widgets/base",
846
+ "_view_module_version": "1.2.0",
847
+ "_view_name": "StyleView",
848
+ "description_width": ""
849
+ }
850
+ },
851
+ "3c63301478f54f95ba0f0f8c853a7266": {
852
+ "model_module": "@jupyter-widgets/controls",
853
+ "model_module_version": "1.5.0",
854
+ "model_name": "FloatProgressModel",
855
+ "state": {
856
+ "_dom_classes": [],
857
+ "_model_module": "@jupyter-widgets/controls",
858
+ "_model_module_version": "1.5.0",
859
+ "_model_name": "FloatProgressModel",
860
+ "_view_count": null,
861
+ "_view_module": "@jupyter-widgets/controls",
862
+ "_view_module_version": "1.5.0",
863
+ "_view_name": "ProgressView",
864
+ "bar_style": "success",
865
+ "description": "",
866
+ "description_tooltip": null,
867
+ "layout": "IPY_MODEL_794c088a92ba4f6798fa94cded51d0bd",
868
+ "max": 260793700,
869
+ "min": 0,
870
+ "orientation": "horizontal",
871
+ "style": "IPY_MODEL_deb9a0d8d3e1430bacab12c8b4ce7573",
872
+ "value": 260793700
873
+ }
874
+ },
875
+ "4d56ed704097453a930baa9ecdfb1156": {
876
+ "model_module": "@jupyter-widgets/controls",
877
+ "model_module_version": "1.5.0",
878
+ "model_name": "DescriptionStyleModel",
879
+ "state": {
880
+ "_model_module": "@jupyter-widgets/controls",
881
+ "_model_module_version": "1.5.0",
882
+ "_model_name": "DescriptionStyleModel",
883
+ "_view_count": null,
884
+ "_view_module": "@jupyter-widgets/base",
885
+ "_view_module_version": "1.2.0",
886
+ "_view_name": "StyleView",
887
+ "description_width": ""
888
+ }
889
+ },
890
+ "520e9bf4c0164e60a2c3288bd97ef93e": {
891
+ "model_module": "@jupyter-widgets/controls",
892
+ "model_module_version": "1.5.0",
893
+ "model_name": "ProgressStyleModel",
894
+ "state": {
895
+ "_model_module": "@jupyter-widgets/controls",
896
+ "_model_module_version": "1.5.0",
897
+ "_model_name": "ProgressStyleModel",
898
+ "_view_count": null,
899
+ "_view_module": "@jupyter-widgets/base",
900
+ "_view_module_version": "1.2.0",
901
+ "_view_name": "StyleView",
902
+ "bar_color": null,
903
+ "description_width": ""
904
+ }
905
+ },
906
+ "5b917388b6624637ad8d8f60516d4001": {
907
+ "model_module": "@jupyter-widgets/controls",
908
+ "model_module_version": "1.5.0",
909
+ "model_name": "HBoxModel",
910
+ "state": {
911
+ "_dom_classes": [],
912
+ "_model_module": "@jupyter-widgets/controls",
913
+ "_model_module_version": "1.5.0",
914
+ "_model_name": "HBoxModel",
915
+ "_view_count": null,
916
+ "_view_module": "@jupyter-widgets/controls",
917
+ "_view_module_version": "1.5.0",
918
+ "_view_name": "HBoxView",
919
+ "box_style": "",
920
+ "children": [
921
+ "IPY_MODEL_13e05f2f64a54245a2478393b1f6b409",
922
+ "IPY_MODEL_3c63301478f54f95ba0f0f8c853a7266",
923
+ "IPY_MODEL_bb455638d3ac451096fd7cce4cb0d82c"
924
+ ],
925
+ "layout": "IPY_MODEL_c4a6b418089147f6b50eab097ace0342"
926
+ }
927
+ },
928
+ "65d019ca643045a2b0933411d059c920": {
929
+ "model_module": "@jupyter-widgets/controls",
930
+ "model_module_version": "1.5.0",
931
+ "model_name": "DescriptionStyleModel",
932
+ "state": {
933
+ "_model_module": "@jupyter-widgets/controls",
934
+ "_model_module_version": "1.5.0",
935
+ "_model_name": "DescriptionStyleModel",
936
+ "_view_count": null,
937
+ "_view_module": "@jupyter-widgets/base",
938
+ "_view_module_version": "1.2.0",
939
+ "_view_name": "StyleView",
940
+ "description_width": ""
941
+ }
942
+ },
943
+ "69aad11dbc914410b95f6c3cb17a2457": {
944
+ "model_module": "@jupyter-widgets/controls",
945
+ "model_module_version": "1.5.0",
946
+ "model_name": "HTMLModel",
947
+ "state": {
948
+ "_dom_classes": [],
949
+ "_model_module": "@jupyter-widgets/controls",
950
+ "_model_module_version": "1.5.0",
951
+ "_model_name": "HTMLModel",
952
+ "_view_count": null,
953
+ "_view_module": "@jupyter-widgets/controls",
954
+ "_view_module_version": "1.5.0",
955
+ "_view_name": "HTMLView",
956
+ "description": "",
957
+ "description_tooltip": null,
958
+ "layout": "IPY_MODEL_331c23df507e4d679e3aaf81af39cd22",
959
+ "placeholder": "​",
960
+ "style": "IPY_MODEL_e33617a01b03437986c143c9a69ba14f",
961
+ "value": "Downloading: 100%"
962
+ }
963
+ },
964
+ "6b20c9e39d36404fb761b4d83954a278": {
965
+ "model_module": "@jupyter-widgets/controls",
966
+ "model_module_version": "1.5.0",
967
+ "model_name": "ProgressStyleModel",
968
+ "state": {
969
+ "_model_module": "@jupyter-widgets/controls",
970
+ "_model_module_version": "1.5.0",
971
+ "_model_name": "ProgressStyleModel",
972
+ "_view_count": null,
973
+ "_view_module": "@jupyter-widgets/base",
974
+ "_view_module_version": "1.2.0",
975
+ "_view_name": "StyleView",
976
+ "bar_color": null,
977
+ "description_width": ""
978
+ }
979
+ },
980
+ "6c291edce30b4cefb329113d5ecbe640": {
981
+ "model_module": "@jupyter-widgets/controls",
982
+ "model_module_version": "1.5.0",
983
+ "model_name": "HBoxModel",
984
+ "state": {
985
+ "_dom_classes": [],
986
+ "_model_module": "@jupyter-widgets/controls",
987
+ "_model_module_version": "1.5.0",
988
+ "_model_name": "HBoxModel",
989
+ "_view_count": null,
990
+ "_view_module": "@jupyter-widgets/controls",
991
+ "_view_module_version": "1.5.0",
992
+ "_view_name": "HBoxView",
993
+ "box_style": "",
994
+ "children": [
995
+ "IPY_MODEL_1249849d670f4824a0a21aa61d187b56",
996
+ "IPY_MODEL_0e7b7a12422d49d4a33538e638bfc1c9",
997
+ "IPY_MODEL_ec90909f38e0448bab37c735ea9b9ebe"
998
+ ],
999
+ "layout": "IPY_MODEL_f4274a98f06945a6a1e4a56b680c1790"
1000
+ }
1001
+ },
1002
+ "6d42d468b1d04c4a94fb2eed75e3c238": {
1003
+ "model_module": "@jupyter-widgets/controls",
1004
+ "model_module_version": "1.5.0",
1005
+ "model_name": "HBoxModel",
1006
+ "state": {
1007
+ "_dom_classes": [],
1008
+ "_model_module": "@jupyter-widgets/controls",
1009
+ "_model_module_version": "1.5.0",
1010
+ "_model_name": "HBoxModel",
1011
+ "_view_count": null,
1012
+ "_view_module": "@jupyter-widgets/controls",
1013
+ "_view_module_version": "1.5.0",
1014
+ "_view_name": "HBoxView",
1015
+ "box_style": "",
1016
+ "children": [
1017
+ "IPY_MODEL_a371ed9c75184b78a80facda31086426",
1018
+ "IPY_MODEL_c363fb4238e0464cb3a4ab16250e554a",
1019
+ "IPY_MODEL_86dce8a2c404469dab0dc1a466788c0a"
1020
+ ],
1021
+ "layout": "IPY_MODEL_f6a334e9b5da4c82a1c4f1fbc1fe3c7e"
1022
+ }
1023
+ },
1024
+ "6f5fbb8f0f5a4374a7bde870c64f1fa4": {
1025
+ "model_module": "@jupyter-widgets/base",
1026
+ "model_module_version": "1.2.0",
1027
+ "model_name": "LayoutModel",
1028
+ "state": {
1029
+ "_model_module": "@jupyter-widgets/base",
1030
+ "_model_module_version": "1.2.0",
1031
+ "_model_name": "LayoutModel",
1032
+ "_view_count": null,
1033
+ "_view_module": "@jupyter-widgets/base",
1034
+ "_view_module_version": "1.2.0",
1035
+ "_view_name": "LayoutView",
1036
+ "align_content": null,
1037
+ "align_items": null,
1038
+ "align_self": null,
1039
+ "border": null,
1040
+ "bottom": null,
1041
+ "display": null,
1042
+ "flex": null,
1043
+ "flex_flow": null,
1044
+ "grid_area": null,
1045
+ "grid_auto_columns": null,
1046
+ "grid_auto_flow": null,
1047
+ "grid_auto_rows": null,
1048
+ "grid_column": null,
1049
+ "grid_gap": null,
1050
+ "grid_row": null,
1051
+ "grid_template_areas": null,
1052
+ "grid_template_columns": null,
1053
+ "grid_template_rows": null,
1054
+ "height": null,
1055
+ "justify_content": null,
1056
+ "justify_items": null,
1057
+ "left": null,
1058
+ "margin": null,
1059
+ "max_height": null,
1060
+ "max_width": null,
1061
+ "min_height": null,
1062
+ "min_width": null,
1063
+ "object_fit": null,
1064
+ "object_position": null,
1065
+ "order": null,
1066
+ "overflow": null,
1067
+ "overflow_x": null,
1068
+ "overflow_y": null,
1069
+ "padding": null,
1070
+ "right": null,
1071
+ "top": null,
1072
+ "visibility": null,
1073
+ "width": null
1074
+ }
1075
+ },
1076
+ "72b47b116b0b4125a35d47e060f46807": {
1077
+ "model_module": "@jupyter-widgets/controls",
1078
+ "model_module_version": "1.5.0",
1079
+ "model_name": "HTMLModel",
1080
+ "state": {
1081
+ "_dom_classes": [],
1082
+ "_model_module": "@jupyter-widgets/controls",
1083
+ "_model_module_version": "1.5.0",
1084
+ "_model_name": "HTMLModel",
1085
+ "_view_count": null,
1086
+ "_view_module": "@jupyter-widgets/controls",
1087
+ "_view_module_version": "1.5.0",
1088
+ "_view_name": "HTMLView",
1089
+ "description": "",
1090
+ "description_tooltip": null,
1091
+ "layout": "IPY_MODEL_322cc6ef697945ccbbc2b3029dfdf0e3",
1092
+ "placeholder": "​",
1093
+ "style": "IPY_MODEL_f9456ff5134242bc9541d9d60c753384",
1094
+ "value": " 473/473 [00:00&lt;00:00, 13.5kB/s]"
1095
+ }
1096
+ },
1097
+ "7361467d667d49ee85c432eec884d882": {
1098
+ "model_module": "@jupyter-widgets/controls",
1099
+ "model_module_version": "1.5.0",
1100
+ "model_name": "ProgressStyleModel",
1101
+ "state": {
1102
+ "_model_module": "@jupyter-widgets/controls",
1103
+ "_model_module_version": "1.5.0",
1104
+ "_model_name": "ProgressStyleModel",
1105
+ "_view_count": null,
1106
+ "_view_module": "@jupyter-widgets/base",
1107
+ "_view_module_version": "1.2.0",
1108
+ "_view_name": "StyleView",
1109
+ "bar_color": null,
1110
+ "description_width": ""
1111
+ }
1112
+ },
1113
+ "794c088a92ba4f6798fa94cded51d0bd": {
1114
+ "model_module": "@jupyter-widgets/base",
1115
+ "model_module_version": "1.2.0",
1116
+ "model_name": "LayoutModel",
1117
+ "state": {
1118
+ "_model_module": "@jupyter-widgets/base",
1119
+ "_model_module_version": "1.2.0",
1120
+ "_model_name": "LayoutModel",
1121
+ "_view_count": null,
1122
+ "_view_module": "@jupyter-widgets/base",
1123
+ "_view_module_version": "1.2.0",
1124
+ "_view_name": "LayoutView",
1125
+ "align_content": null,
1126
+ "align_items": null,
1127
+ "align_self": null,
1128
+ "border": null,
1129
+ "bottom": null,
1130
+ "display": null,
1131
+ "flex": null,
1132
+ "flex_flow": null,
1133
+ "grid_area": null,
1134
+ "grid_auto_columns": null,
1135
+ "grid_auto_flow": null,
1136
+ "grid_auto_rows": null,
1137
+ "grid_column": null,
1138
+ "grid_gap": null,
1139
+ "grid_row": null,
1140
+ "grid_template_areas": null,
1141
+ "grid_template_columns": null,
1142
+ "grid_template_rows": null,
1143
+ "height": null,
1144
+ "justify_content": null,
1145
+ "justify_items": null,
1146
+ "left": null,
1147
+ "margin": null,
1148
+ "max_height": null,
1149
+ "max_width": null,
1150
+ "min_height": null,
1151
+ "min_width": null,
1152
+ "object_fit": null,
1153
+ "object_position": null,
1154
+ "order": null,
1155
+ "overflow": null,
1156
+ "overflow_x": null,
1157
+ "overflow_y": null,
1158
+ "padding": null,
1159
+ "right": null,
1160
+ "top": null,
1161
+ "visibility": null,
1162
+ "width": null
1163
+ }
1164
+ },
1165
+ "80cb9869ae694ecda95aab298598c7a2": {
1166
+ "model_module": "@jupyter-widgets/base",
1167
+ "model_module_version": "1.2.0",
1168
+ "model_name": "LayoutModel",
1169
+ "state": {
1170
+ "_model_module": "@jupyter-widgets/base",
1171
+ "_model_module_version": "1.2.0",
1172
+ "_model_name": "LayoutModel",
1173
+ "_view_count": null,
1174
+ "_view_module": "@jupyter-widgets/base",
1175
+ "_view_module_version": "1.2.0",
1176
+ "_view_name": "LayoutView",
1177
+ "align_content": null,
1178
+ "align_items": null,
1179
+ "align_self": null,
1180
+ "border": null,
1181
+ "bottom": null,
1182
+ "display": null,
1183
+ "flex": null,
1184
+ "flex_flow": null,
1185
+ "grid_area": null,
1186
+ "grid_auto_columns": null,
1187
+ "grid_auto_flow": null,
1188
+ "grid_auto_rows": null,
1189
+ "grid_column": null,
1190
+ "grid_gap": null,
1191
+ "grid_row": null,
1192
+ "grid_template_areas": null,
1193
+ "grid_template_columns": null,
1194
+ "grid_template_rows": null,
1195
+ "height": null,
1196
+ "justify_content": null,
1197
+ "justify_items": null,
1198
+ "left": null,
1199
+ "margin": null,
1200
+ "max_height": null,
1201
+ "max_width": null,
1202
+ "min_height": null,
1203
+ "min_width": null,
1204
+ "object_fit": null,
1205
+ "object_position": null,
1206
+ "order": null,
1207
+ "overflow": null,
1208
+ "overflow_x": null,
1209
+ "overflow_y": null,
1210
+ "padding": null,
1211
+ "right": null,
1212
+ "top": null,
1213
+ "visibility": null,
1214
+ "width": null
1215
+ }
1216
+ },
1217
+ "86dce8a2c404469dab0dc1a466788c0a": {
1218
+ "model_module": "@jupyter-widgets/controls",
1219
+ "model_module_version": "1.5.0",
1220
+ "model_name": "HTMLModel",
1221
+ "state": {
1222
+ "_dom_classes": [],
1223
+ "_model_module": "@jupyter-widgets/controls",
1224
+ "_model_module_version": "1.5.0",
1225
+ "_model_name": "HTMLModel",
1226
+ "_view_count": null,
1227
+ "_view_module": "@jupyter-widgets/controls",
1228
+ "_view_module_version": "1.5.0",
1229
+ "_view_name": "HTMLView",
1230
+ "description": "",
1231
+ "description_tooltip": null,
1232
+ "layout": "IPY_MODEL_b0f31194b8f24b5ab601f8edd5332e04",
1233
+ "placeholder": "​",
1234
+ "style": "IPY_MODEL_cbeb9e9dfdf6420d91ec1126fedd8e48",
1235
+ "value": " 29.0/29.0 [00:00&lt;00:00, 321B/s]"
1236
+ }
1237
+ },
1238
+ "8cc02da6124448598923899b144afd6d": {
1239
+ "model_module": "@jupyter-widgets/base",
1240
+ "model_module_version": "1.2.0",
1241
+ "model_name": "LayoutModel",
1242
+ "state": {
1243
+ "_model_module": "@jupyter-widgets/base",
1244
+ "_model_module_version": "1.2.0",
1245
+ "_model_name": "LayoutModel",
1246
+ "_view_count": null,
1247
+ "_view_module": "@jupyter-widgets/base",
1248
+ "_view_module_version": "1.2.0",
1249
+ "_view_name": "LayoutView",
1250
+ "align_content": null,
1251
+ "align_items": null,
1252
+ "align_self": null,
1253
+ "border": null,
1254
+ "bottom": null,
1255
+ "display": null,
1256
+ "flex": null,
1257
+ "flex_flow": null,
1258
+ "grid_area": null,
1259
+ "grid_auto_columns": null,
1260
+ "grid_auto_flow": null,
1261
+ "grid_auto_rows": null,
1262
+ "grid_column": null,
1263
+ "grid_gap": null,
1264
+ "grid_row": null,
1265
+ "grid_template_areas": null,
1266
+ "grid_template_columns": null,
1267
+ "grid_template_rows": null,
1268
+ "height": null,
1269
+ "justify_content": null,
1270
+ "justify_items": null,
1271
+ "left": null,
1272
+ "margin": null,
1273
+ "max_height": null,
1274
+ "max_width": null,
1275
+ "min_height": null,
1276
+ "min_width": null,
1277
+ "object_fit": null,
1278
+ "object_position": null,
1279
+ "order": null,
1280
+ "overflow": null,
1281
+ "overflow_x": null,
1282
+ "overflow_y": null,
1283
+ "padding": null,
1284
+ "right": null,
1285
+ "top": null,
1286
+ "visibility": null,
1287
+ "width": null
1288
+ }
1289
+ },
1290
+ "9adf994d114141f98aeea509a73e9c59": {
1291
+ "model_module": "@jupyter-widgets/base",
1292
+ "model_module_version": "1.2.0",
1293
+ "model_name": "LayoutModel",
1294
+ "state": {
1295
+ "_model_module": "@jupyter-widgets/base",
1296
+ "_model_module_version": "1.2.0",
1297
+ "_model_name": "LayoutModel",
1298
+ "_view_count": null,
1299
+ "_view_module": "@jupyter-widgets/base",
1300
+ "_view_module_version": "1.2.0",
1301
+ "_view_name": "LayoutView",
1302
+ "align_content": null,
1303
+ "align_items": null,
1304
+ "align_self": null,
1305
+ "border": null,
1306
+ "bottom": null,
1307
+ "display": null,
1308
+ "flex": null,
1309
+ "flex_flow": null,
1310
+ "grid_area": null,
1311
+ "grid_auto_columns": null,
1312
+ "grid_auto_flow": null,
1313
+ "grid_auto_rows": null,
1314
+ "grid_column": null,
1315
+ "grid_gap": null,
1316
+ "grid_row": null,
1317
+ "grid_template_areas": null,
1318
+ "grid_template_columns": null,
1319
+ "grid_template_rows": null,
1320
+ "height": null,
1321
+ "justify_content": null,
1322
+ "justify_items": null,
1323
+ "left": null,
1324
+ "margin": null,
1325
+ "max_height": null,
1326
+ "max_width": null,
1327
+ "min_height": null,
1328
+ "min_width": null,
1329
+ "object_fit": null,
1330
+ "object_position": null,
1331
+ "order": null,
1332
+ "overflow": null,
1333
+ "overflow_x": null,
1334
+ "overflow_y": null,
1335
+ "padding": null,
1336
+ "right": null,
1337
+ "top": null,
1338
+ "visibility": null,
1339
+ "width": null
1340
+ }
1341
+ },
1342
+ "9f4a770ae6b84593ac7de85e15c305a9": {
1343
+ "model_module": "@jupyter-widgets/base",
1344
+ "model_module_version": "1.2.0",
1345
+ "model_name": "LayoutModel",
1346
+ "state": {
1347
+ "_model_module": "@jupyter-widgets/base",
1348
+ "_model_module_version": "1.2.0",
1349
+ "_model_name": "LayoutModel",
1350
+ "_view_count": null,
1351
+ "_view_module": "@jupyter-widgets/base",
1352
+ "_view_module_version": "1.2.0",
1353
+ "_view_name": "LayoutView",
1354
+ "align_content": null,
1355
+ "align_items": null,
1356
+ "align_self": null,
1357
+ "border": null,
1358
+ "bottom": null,
1359
+ "display": null,
1360
+ "flex": null,
1361
+ "flex_flow": null,
1362
+ "grid_area": null,
1363
+ "grid_auto_columns": null,
1364
+ "grid_auto_flow": null,
1365
+ "grid_auto_rows": null,
1366
+ "grid_column": null,
1367
+ "grid_gap": null,
1368
+ "grid_row": null,
1369
+ "grid_template_areas": null,
1370
+ "grid_template_columns": null,
1371
+ "grid_template_rows": null,
1372
+ "height": null,
1373
+ "justify_content": null,
1374
+ "justify_items": null,
1375
+ "left": null,
1376
+ "margin": null,
1377
+ "max_height": null,
1378
+ "max_width": null,
1379
+ "min_height": null,
1380
+ "min_width": null,
1381
+ "object_fit": null,
1382
+ "object_position": null,
1383
+ "order": null,
1384
+ "overflow": null,
1385
+ "overflow_x": null,
1386
+ "overflow_y": null,
1387
+ "padding": null,
1388
+ "right": null,
1389
+ "top": null,
1390
+ "visibility": null,
1391
+ "width": null
1392
+ }
1393
+ },
1394
+ "a371ed9c75184b78a80facda31086426": {
1395
+ "model_module": "@jupyter-widgets/controls",
1396
+ "model_module_version": "1.5.0",
1397
+ "model_name": "HTMLModel",
1398
+ "state": {
1399
+ "_dom_classes": [],
1400
+ "_model_module": "@jupyter-widgets/controls",
1401
+ "_model_module_version": "1.5.0",
1402
+ "_model_name": "HTMLModel",
1403
+ "_view_count": null,
1404
+ "_view_module": "@jupyter-widgets/controls",
1405
+ "_view_module_version": "1.5.0",
1406
+ "_view_name": "HTMLView",
1407
+ "description": "",
1408
+ "description_tooltip": null,
1409
+ "layout": "IPY_MODEL_e9f1a9476de147c3bc98c1c36960dad6",
1410
+ "placeholder": "​",
1411
+ "style": "IPY_MODEL_2f8a7ddfcee64b978ff23f8f43911e01",
1412
+ "value": "Downloading: 100%"
1413
+ }
1414
+ },
1415
+ "b099d3bb966a4e9b8d02710329030ff3": {
1416
+ "model_module": "@jupyter-widgets/controls",
1417
+ "model_module_version": "1.5.0",
1418
+ "model_name": "DescriptionStyleModel",
1419
+ "state": {
1420
+ "_model_module": "@jupyter-widgets/controls",
1421
+ "_model_module_version": "1.5.0",
1422
+ "_model_name": "DescriptionStyleModel",
1423
+ "_view_count": null,
1424
+ "_view_module": "@jupyter-widgets/base",
1425
+ "_view_module_version": "1.2.0",
1426
+ "_view_name": "StyleView",
1427
+ "description_width": ""
1428
+ }
1429
+ },
1430
+ "b0f31194b8f24b5ab601f8edd5332e04": {
1431
+ "model_module": "@jupyter-widgets/base",
1432
+ "model_module_version": "1.2.0",
1433
+ "model_name": "LayoutModel",
1434
+ "state": {
1435
+ "_model_module": "@jupyter-widgets/base",
1436
+ "_model_module_version": "1.2.0",
1437
+ "_model_name": "LayoutModel",
1438
+ "_view_count": null,
1439
+ "_view_module": "@jupyter-widgets/base",
1440
+ "_view_module_version": "1.2.0",
1441
+ "_view_name": "LayoutView",
1442
+ "align_content": null,
1443
+ "align_items": null,
1444
+ "align_self": null,
1445
+ "border": null,
1446
+ "bottom": null,
1447
+ "display": null,
1448
+ "flex": null,
1449
+ "flex_flow": null,
1450
+ "grid_area": null,
1451
+ "grid_auto_columns": null,
1452
+ "grid_auto_flow": null,
1453
+ "grid_auto_rows": null,
1454
+ "grid_column": null,
1455
+ "grid_gap": null,
1456
+ "grid_row": null,
1457
+ "grid_template_areas": null,
1458
+ "grid_template_columns": null,
1459
+ "grid_template_rows": null,
1460
+ "height": null,
1461
+ "justify_content": null,
1462
+ "justify_items": null,
1463
+ "left": null,
1464
+ "margin": null,
1465
+ "max_height": null,
1466
+ "max_width": null,
1467
+ "min_height": null,
1468
+ "min_width": null,
1469
+ "object_fit": null,
1470
+ "object_position": null,
1471
+ "order": null,
1472
+ "overflow": null,
1473
+ "overflow_x": null,
1474
+ "overflow_y": null,
1475
+ "padding": null,
1476
+ "right": null,
1477
+ "top": null,
1478
+ "visibility": null,
1479
+ "width": null
1480
+ }
1481
+ },
1482
+ "b1271eb1b7e74250bd9273f229b49cd8": {
1483
+ "model_module": "@jupyter-widgets/base",
1484
+ "model_module_version": "1.2.0",
1485
+ "model_name": "LayoutModel",
1486
+ "state": {
1487
+ "_model_module": "@jupyter-widgets/base",
1488
+ "_model_module_version": "1.2.0",
1489
+ "_model_name": "LayoutModel",
1490
+ "_view_count": null,
1491
+ "_view_module": "@jupyter-widgets/base",
1492
+ "_view_module_version": "1.2.0",
1493
+ "_view_name": "LayoutView",
1494
+ "align_content": null,
1495
+ "align_items": null,
1496
+ "align_self": null,
1497
+ "border": null,
1498
+ "bottom": null,
1499
+ "display": null,
1500
+ "flex": null,
1501
+ "flex_flow": null,
1502
+ "grid_area": null,
1503
+ "grid_auto_columns": null,
1504
+ "grid_auto_flow": null,
1505
+ "grid_auto_rows": null,
1506
+ "grid_column": null,
1507
+ "grid_gap": null,
1508
+ "grid_row": null,
1509
+ "grid_template_areas": null,
1510
+ "grid_template_columns": null,
1511
+ "grid_template_rows": null,
1512
+ "height": null,
1513
+ "justify_content": null,
1514
+ "justify_items": null,
1515
+ "left": null,
1516
+ "margin": null,
1517
+ "max_height": null,
1518
+ "max_width": null,
1519
+ "min_height": null,
1520
+ "min_width": null,
1521
+ "object_fit": null,
1522
+ "object_position": null,
1523
+ "order": null,
1524
+ "overflow": null,
1525
+ "overflow_x": null,
1526
+ "overflow_y": null,
1527
+ "padding": null,
1528
+ "right": null,
1529
+ "top": null,
1530
+ "visibility": null,
1531
+ "width": null
1532
+ }
1533
+ },
1534
+ "b440f3b3937549d69a4f188bb8415531": {
1535
+ "model_module": "@jupyter-widgets/controls",
1536
+ "model_module_version": "1.5.0",
1537
+ "model_name": "HTMLModel",
1538
+ "state": {
1539
+ "_dom_classes": [],
1540
+ "_model_module": "@jupyter-widgets/controls",
1541
+ "_model_module_version": "1.5.0",
1542
+ "_model_name": "HTMLModel",
1543
+ "_view_count": null,
1544
+ "_view_module": "@jupyter-widgets/controls",
1545
+ "_view_module_version": "1.5.0",
1546
+ "_view_name": "HTMLView",
1547
+ "description": "",
1548
+ "description_tooltip": null,
1549
+ "layout": "IPY_MODEL_2aaa61bf4df248be970940e103af0276",
1550
+ "placeholder": "​",
1551
+ "style": "IPY_MODEL_3a7f8b6302034a0e889976ba0fbb4531",
1552
+ "value": "Downloading: 100%"
1553
+ }
1554
+ },
1555
+ "bb455638d3ac451096fd7cce4cb0d82c": {
1556
+ "model_module": "@jupyter-widgets/controls",
1557
+ "model_module_version": "1.5.0",
1558
+ "model_name": "HTMLModel",
1559
+ "state": {
1560
+ "_dom_classes": [],
1561
+ "_model_module": "@jupyter-widgets/controls",
1562
+ "_model_module_version": "1.5.0",
1563
+ "_model_name": "HTMLModel",
1564
+ "_view_count": null,
1565
+ "_view_module": "@jupyter-widgets/controls",
1566
+ "_view_module_version": "1.5.0",
1567
+ "_view_name": "HTMLView",
1568
+ "description": "",
1569
+ "description_tooltip": null,
1570
+ "layout": "IPY_MODEL_c53ce49d9b7c4a3c87ad7f7c75dce1f5",
1571
+ "placeholder": "​",
1572
+ "style": "IPY_MODEL_b099d3bb966a4e9b8d02710329030ff3",
1573
+ "value": " 261M/261M [00:04&lt;00:00, 53.4MB/s]"
1574
+ }
1575
+ },
1576
+ "c04562a21d36405890c11e74915839c7": {
1577
+ "model_module": "@jupyter-widgets/base",
1578
+ "model_module_version": "1.2.0",
1579
+ "model_name": "LayoutModel",
1580
+ "state": {
1581
+ "_model_module": "@jupyter-widgets/base",
1582
+ "_model_module_version": "1.2.0",
1583
+ "_model_name": "LayoutModel",
1584
+ "_view_count": null,
1585
+ "_view_module": "@jupyter-widgets/base",
1586
+ "_view_module_version": "1.2.0",
1587
+ "_view_name": "LayoutView",
1588
+ "align_content": null,
1589
+ "align_items": null,
1590
+ "align_self": null,
1591
+ "border": null,
1592
+ "bottom": null,
1593
+ "display": null,
1594
+ "flex": null,
1595
+ "flex_flow": null,
1596
+ "grid_area": null,
1597
+ "grid_auto_columns": null,
1598
+ "grid_auto_flow": null,
1599
+ "grid_auto_rows": null,
1600
+ "grid_column": null,
1601
+ "grid_gap": null,
1602
+ "grid_row": null,
1603
+ "grid_template_areas": null,
1604
+ "grid_template_columns": null,
1605
+ "grid_template_rows": null,
1606
+ "height": null,
1607
+ "justify_content": null,
1608
+ "justify_items": null,
1609
+ "left": null,
1610
+ "margin": null,
1611
+ "max_height": null,
1612
+ "max_width": null,
1613
+ "min_height": null,
1614
+ "min_width": null,
1615
+ "object_fit": null,
1616
+ "object_position": null,
1617
+ "order": null,
1618
+ "overflow": null,
1619
+ "overflow_x": null,
1620
+ "overflow_y": null,
1621
+ "padding": null,
1622
+ "right": null,
1623
+ "top": null,
1624
+ "visibility": null,
1625
+ "width": null
1626
+ }
1627
+ },
1628
+ "c363fb4238e0464cb3a4ab16250e554a": {
1629
+ "model_module": "@jupyter-widgets/controls",
1630
+ "model_module_version": "1.5.0",
1631
+ "model_name": "FloatProgressModel",
1632
+ "state": {
1633
+ "_dom_classes": [],
1634
+ "_model_module": "@jupyter-widgets/controls",
1635
+ "_model_module_version": "1.5.0",
1636
+ "_model_name": "FloatProgressModel",
1637
+ "_view_count": null,
1638
+ "_view_module": "@jupyter-widgets/controls",
1639
+ "_view_module_version": "1.5.0",
1640
+ "_view_name": "ProgressView",
1641
+ "bar_style": "success",
1642
+ "description": "",
1643
+ "description_tooltip": null,
1644
+ "layout": "IPY_MODEL_099a1ac5af9b4e38abb6afb9c333d37a",
1645
+ "max": 29,
1646
+ "min": 0,
1647
+ "orientation": "horizontal",
1648
+ "style": "IPY_MODEL_340a7a5171ba4b8aa9b70e945df1618c",
1649
+ "value": 29
1650
+ }
1651
+ },
1652
+ "c4a6b418089147f6b50eab097ace0342": {
1653
+ "model_module": "@jupyter-widgets/base",
1654
+ "model_module_version": "1.2.0",
1655
+ "model_name": "LayoutModel",
1656
+ "state": {
1657
+ "_model_module": "@jupyter-widgets/base",
1658
+ "_model_module_version": "1.2.0",
1659
+ "_model_name": "LayoutModel",
1660
+ "_view_count": null,
1661
+ "_view_module": "@jupyter-widgets/base",
1662
+ "_view_module_version": "1.2.0",
1663
+ "_view_name": "LayoutView",
1664
+ "align_content": null,
1665
+ "align_items": null,
1666
+ "align_self": null,
1667
+ "border": null,
1668
+ "bottom": null,
1669
+ "display": null,
1670
+ "flex": null,
1671
+ "flex_flow": null,
1672
+ "grid_area": null,
1673
+ "grid_auto_columns": null,
1674
+ "grid_auto_flow": null,
1675
+ "grid_auto_rows": null,
1676
+ "grid_column": null,
1677
+ "grid_gap": null,
1678
+ "grid_row": null,
1679
+ "grid_template_areas": null,
1680
+ "grid_template_columns": null,
1681
+ "grid_template_rows": null,
1682
+ "height": null,
1683
+ "justify_content": null,
1684
+ "justify_items": null,
1685
+ "left": null,
1686
+ "margin": null,
1687
+ "max_height": null,
1688
+ "max_width": null,
1689
+ "min_height": null,
1690
+ "min_width": null,
1691
+ "object_fit": null,
1692
+ "object_position": null,
1693
+ "order": null,
1694
+ "overflow": null,
1695
+ "overflow_x": null,
1696
+ "overflow_y": null,
1697
+ "padding": null,
1698
+ "right": null,
1699
+ "top": null,
1700
+ "visibility": null,
1701
+ "width": null
1702
+ }
1703
+ },
1704
+ "c53ce49d9b7c4a3c87ad7f7c75dce1f5": {
1705
+ "model_module": "@jupyter-widgets/base",
1706
+ "model_module_version": "1.2.0",
1707
+ "model_name": "LayoutModel",
1708
+ "state": {
1709
+ "_model_module": "@jupyter-widgets/base",
1710
+ "_model_module_version": "1.2.0",
1711
+ "_model_name": "LayoutModel",
1712
+ "_view_count": null,
1713
+ "_view_module": "@jupyter-widgets/base",
1714
+ "_view_module_version": "1.2.0",
1715
+ "_view_name": "LayoutView",
1716
+ "align_content": null,
1717
+ "align_items": null,
1718
+ "align_self": null,
1719
+ "border": null,
1720
+ "bottom": null,
1721
+ "display": null,
1722
+ "flex": null,
1723
+ "flex_flow": null,
1724
+ "grid_area": null,
1725
+ "grid_auto_columns": null,
1726
+ "grid_auto_flow": null,
1727
+ "grid_auto_rows": null,
1728
+ "grid_column": null,
1729
+ "grid_gap": null,
1730
+ "grid_row": null,
1731
+ "grid_template_areas": null,
1732
+ "grid_template_columns": null,
1733
+ "grid_template_rows": null,
1734
+ "height": null,
1735
+ "justify_content": null,
1736
+ "justify_items": null,
1737
+ "left": null,
1738
+ "margin": null,
1739
+ "max_height": null,
1740
+ "max_width": null,
1741
+ "min_height": null,
1742
+ "min_width": null,
1743
+ "object_fit": null,
1744
+ "object_position": null,
1745
+ "order": null,
1746
+ "overflow": null,
1747
+ "overflow_x": null,
1748
+ "overflow_y": null,
1749
+ "padding": null,
1750
+ "right": null,
1751
+ "top": null,
1752
+ "visibility": null,
1753
+ "width": null
1754
+ }
1755
+ },
1756
+ "cbeb9e9dfdf6420d91ec1126fedd8e48": {
1757
+ "model_module": "@jupyter-widgets/controls",
1758
+ "model_module_version": "1.5.0",
1759
+ "model_name": "DescriptionStyleModel",
1760
+ "state": {
1761
+ "_model_module": "@jupyter-widgets/controls",
1762
+ "_model_module_version": "1.5.0",
1763
+ "_model_name": "DescriptionStyleModel",
1764
+ "_view_count": null,
1765
+ "_view_module": "@jupyter-widgets/base",
1766
+ "_view_module_version": "1.2.0",
1767
+ "_view_name": "StyleView",
1768
+ "description_width": ""
1769
+ }
1770
+ },
1771
+ "d7e158e614f44983b229d6dd0d8960f9": {
1772
+ "model_module": "@jupyter-widgets/controls",
1773
+ "model_module_version": "1.5.0",
1774
+ "model_name": "HBoxModel",
1775
+ "state": {
1776
+ "_dom_classes": [],
1777
+ "_model_module": "@jupyter-widgets/controls",
1778
+ "_model_module_version": "1.5.0",
1779
+ "_model_name": "HBoxModel",
1780
+ "_view_count": null,
1781
+ "_view_module": "@jupyter-widgets/controls",
1782
+ "_view_module_version": "1.5.0",
1783
+ "_view_name": "HBoxView",
1784
+ "box_style": "",
1785
+ "children": [
1786
+ "IPY_MODEL_69aad11dbc914410b95f6c3cb17a2457",
1787
+ "IPY_MODEL_0302e718c6084fb0a96d92fd976738dc",
1788
+ "IPY_MODEL_72b47b116b0b4125a35d47e060f46807"
1789
+ ],
1790
+ "layout": "IPY_MODEL_6f5fbb8f0f5a4374a7bde870c64f1fa4"
1791
+ }
1792
+ },
1793
+ "da1553ec3e044a4fb7bb9b0c2a84bfe0": {
1794
+ "model_module": "@jupyter-widgets/controls",
1795
+ "model_module_version": "1.5.0",
1796
+ "model_name": "HBoxModel",
1797
+ "state": {
1798
+ "_dom_classes": [],
1799
+ "_model_module": "@jupyter-widgets/controls",
1800
+ "_model_module_version": "1.5.0",
1801
+ "_model_name": "HBoxModel",
1802
+ "_view_count": null,
1803
+ "_view_module": "@jupyter-widgets/controls",
1804
+ "_view_module_version": "1.5.0",
1805
+ "_view_name": "HBoxView",
1806
+ "box_style": "",
1807
+ "children": [
1808
+ "IPY_MODEL_b440f3b3937549d69a4f188bb8415531",
1809
+ "IPY_MODEL_1b937b91e8ee46c4a764a8091f365291",
1810
+ "IPY_MODEL_1369029047864cc68100a44ffaad35ca"
1811
+ ],
1812
+ "layout": "IPY_MODEL_80cb9869ae694ecda95aab298598c7a2"
1813
+ }
1814
+ },
1815
+ "dd420d29751341faa84c025afa743bb5": {
1816
+ "model_module": "@jupyter-widgets/base",
1817
+ "model_module_version": "1.2.0",
1818
+ "model_name": "LayoutModel",
1819
+ "state": {
1820
+ "_model_module": "@jupyter-widgets/base",
1821
+ "_model_module_version": "1.2.0",
1822
+ "_model_name": "LayoutModel",
1823
+ "_view_count": null,
1824
+ "_view_module": "@jupyter-widgets/base",
1825
+ "_view_module_version": "1.2.0",
1826
+ "_view_name": "LayoutView",
1827
+ "align_content": null,
1828
+ "align_items": null,
1829
+ "align_self": null,
1830
+ "border": null,
1831
+ "bottom": null,
1832
+ "display": null,
1833
+ "flex": null,
1834
+ "flex_flow": null,
1835
+ "grid_area": null,
1836
+ "grid_auto_columns": null,
1837
+ "grid_auto_flow": null,
1838
+ "grid_auto_rows": null,
1839
+ "grid_column": null,
1840
+ "grid_gap": null,
1841
+ "grid_row": null,
1842
+ "grid_template_areas": null,
1843
+ "grid_template_columns": null,
1844
+ "grid_template_rows": null,
1845
+ "height": null,
1846
+ "justify_content": null,
1847
+ "justify_items": null,
1848
+ "left": null,
1849
+ "margin": null,
1850
+ "max_height": null,
1851
+ "max_width": null,
1852
+ "min_height": null,
1853
+ "min_width": null,
1854
+ "object_fit": null,
1855
+ "object_position": null,
1856
+ "order": null,
1857
+ "overflow": null,
1858
+ "overflow_x": null,
1859
+ "overflow_y": null,
1860
+ "padding": null,
1861
+ "right": null,
1862
+ "top": null,
1863
+ "visibility": null,
1864
+ "width": null
1865
+ }
1866
+ },
1867
+ "deb9a0d8d3e1430bacab12c8b4ce7573": {
1868
+ "model_module": "@jupyter-widgets/controls",
1869
+ "model_module_version": "1.5.0",
1870
+ "model_name": "ProgressStyleModel",
1871
+ "state": {
1872
+ "_model_module": "@jupyter-widgets/controls",
1873
+ "_model_module_version": "1.5.0",
1874
+ "_model_name": "ProgressStyleModel",
1875
+ "_view_count": null,
1876
+ "_view_module": "@jupyter-widgets/base",
1877
+ "_view_module_version": "1.2.0",
1878
+ "_view_name": "StyleView",
1879
+ "bar_color": null,
1880
+ "description_width": ""
1881
+ }
1882
+ },
1883
+ "e33617a01b03437986c143c9a69ba14f": {
1884
+ "model_module": "@jupyter-widgets/controls",
1885
+ "model_module_version": "1.5.0",
1886
+ "model_name": "DescriptionStyleModel",
1887
+ "state": {
1888
+ "_model_module": "@jupyter-widgets/controls",
1889
+ "_model_module_version": "1.5.0",
1890
+ "_model_name": "DescriptionStyleModel",
1891
+ "_view_count": null,
1892
+ "_view_module": "@jupyter-widgets/base",
1893
+ "_view_module_version": "1.2.0",
1894
+ "_view_name": "StyleView",
1895
+ "description_width": ""
1896
+ }
1897
+ },
1898
+ "e6b91c2208e44524a9883309ae431277": {
1899
+ "model_module": "@jupyter-widgets/controls",
1900
+ "model_module_version": "1.5.0",
1901
+ "model_name": "DescriptionStyleModel",
1902
+ "state": {
1903
+ "_model_module": "@jupyter-widgets/controls",
1904
+ "_model_module_version": "1.5.0",
1905
+ "_model_name": "DescriptionStyleModel",
1906
+ "_view_count": null,
1907
+ "_view_module": "@jupyter-widgets/base",
1908
+ "_view_module_version": "1.2.0",
1909
+ "_view_name": "StyleView",
1910
+ "description_width": ""
1911
+ }
1912
+ },
1913
+ "e9f1a9476de147c3bc98c1c36960dad6": {
1914
+ "model_module": "@jupyter-widgets/base",
1915
+ "model_module_version": "1.2.0",
1916
+ "model_name": "LayoutModel",
1917
+ "state": {
1918
+ "_model_module": "@jupyter-widgets/base",
1919
+ "_model_module_version": "1.2.0",
1920
+ "_model_name": "LayoutModel",
1921
+ "_view_count": null,
1922
+ "_view_module": "@jupyter-widgets/base",
1923
+ "_view_module_version": "1.2.0",
1924
+ "_view_name": "LayoutView",
1925
+ "align_content": null,
1926
+ "align_items": null,
1927
+ "align_self": null,
1928
+ "border": null,
1929
+ "bottom": null,
1930
+ "display": null,
1931
+ "flex": null,
1932
+ "flex_flow": null,
1933
+ "grid_area": null,
1934
+ "grid_auto_columns": null,
1935
+ "grid_auto_flow": null,
1936
+ "grid_auto_rows": null,
1937
+ "grid_column": null,
1938
+ "grid_gap": null,
1939
+ "grid_row": null,
1940
+ "grid_template_areas": null,
1941
+ "grid_template_columns": null,
1942
+ "grid_template_rows": null,
1943
+ "height": null,
1944
+ "justify_content": null,
1945
+ "justify_items": null,
1946
+ "left": null,
1947
+ "margin": null,
1948
+ "max_height": null,
1949
+ "max_width": null,
1950
+ "min_height": null,
1951
+ "min_width": null,
1952
+ "object_fit": null,
1953
+ "object_position": null,
1954
+ "order": null,
1955
+ "overflow": null,
1956
+ "overflow_x": null,
1957
+ "overflow_y": null,
1958
+ "padding": null,
1959
+ "right": null,
1960
+ "top": null,
1961
+ "visibility": null,
1962
+ "width": null
1963
+ }
1964
+ },
1965
+ "ec90909f38e0448bab37c735ea9b9ebe": {
1966
+ "model_module": "@jupyter-widgets/controls",
1967
+ "model_module_version": "1.5.0",
1968
+ "model_name": "HTMLModel",
1969
+ "state": {
1970
+ "_dom_classes": [],
1971
+ "_model_module": "@jupyter-widgets/controls",
1972
+ "_model_module_version": "1.5.0",
1973
+ "_model_name": "HTMLModel",
1974
+ "_view_count": null,
1975
+ "_view_module": "@jupyter-widgets/controls",
1976
+ "_view_module_version": "1.5.0",
1977
+ "_view_name": "HTMLView",
1978
+ "description": "",
1979
+ "description_tooltip": null,
1980
+ "layout": "IPY_MODEL_dd420d29751341faa84c025afa743bb5",
1981
+ "placeholder": "​",
1982
+ "style": "IPY_MODEL_4d56ed704097453a930baa9ecdfb1156",
1983
+ "value": " 436k/436k [00:01&lt;00:00, 470kB/s]"
1984
+ }
1985
+ },
1986
+ "f4274a98f06945a6a1e4a56b680c1790": {
1987
+ "model_module": "@jupyter-widgets/base",
1988
+ "model_module_version": "1.2.0",
1989
+ "model_name": "LayoutModel",
1990
+ "state": {
1991
+ "_model_module": "@jupyter-widgets/base",
1992
+ "_model_module_version": "1.2.0",
1993
+ "_model_name": "LayoutModel",
1994
+ "_view_count": null,
1995
+ "_view_module": "@jupyter-widgets/base",
1996
+ "_view_module_version": "1.2.0",
1997
+ "_view_name": "LayoutView",
1998
+ "align_content": null,
1999
+ "align_items": null,
2000
+ "align_self": null,
2001
+ "border": null,
2002
+ "bottom": null,
2003
+ "display": null,
2004
+ "flex": null,
2005
+ "flex_flow": null,
2006
+ "grid_area": null,
2007
+ "grid_auto_columns": null,
2008
+ "grid_auto_flow": null,
2009
+ "grid_auto_rows": null,
2010
+ "grid_column": null,
2011
+ "grid_gap": null,
2012
+ "grid_row": null,
2013
+ "grid_template_areas": null,
2014
+ "grid_template_columns": null,
2015
+ "grid_template_rows": null,
2016
+ "height": null,
2017
+ "justify_content": null,
2018
+ "justify_items": null,
2019
+ "left": null,
2020
+ "margin": null,
2021
+ "max_height": null,
2022
+ "max_width": null,
2023
+ "min_height": null,
2024
+ "min_width": null,
2025
+ "object_fit": null,
2026
+ "object_position": null,
2027
+ "order": null,
2028
+ "overflow": null,
2029
+ "overflow_x": null,
2030
+ "overflow_y": null,
2031
+ "padding": null,
2032
+ "right": null,
2033
+ "top": null,
2034
+ "visibility": null,
2035
+ "width": null
2036
+ }
2037
+ },
2038
+ "f6a334e9b5da4c82a1c4f1fbc1fe3c7e": {
2039
+ "model_module": "@jupyter-widgets/base",
2040
+ "model_module_version": "1.2.0",
2041
+ "model_name": "LayoutModel",
2042
+ "state": {
2043
+ "_model_module": "@jupyter-widgets/base",
2044
+ "_model_module_version": "1.2.0",
2045
+ "_model_name": "LayoutModel",
2046
+ "_view_count": null,
2047
+ "_view_module": "@jupyter-widgets/base",
2048
+ "_view_module_version": "1.2.0",
2049
+ "_view_name": "LayoutView",
2050
+ "align_content": null,
2051
+ "align_items": null,
2052
+ "align_self": null,
2053
+ "border": null,
2054
+ "bottom": null,
2055
+ "display": null,
2056
+ "flex": null,
2057
+ "flex_flow": null,
2058
+ "grid_area": null,
2059
+ "grid_auto_columns": null,
2060
+ "grid_auto_flow": null,
2061
+ "grid_auto_rows": null,
2062
+ "grid_column": null,
2063
+ "grid_gap": null,
2064
+ "grid_row": null,
2065
+ "grid_template_areas": null,
2066
+ "grid_template_columns": null,
2067
+ "grid_template_rows": null,
2068
+ "height": null,
2069
+ "justify_content": null,
2070
+ "justify_items": null,
2071
+ "left": null,
2072
+ "margin": null,
2073
+ "max_height": null,
2074
+ "max_width": null,
2075
+ "min_height": null,
2076
+ "min_width": null,
2077
+ "object_fit": null,
2078
+ "object_position": null,
2079
+ "order": null,
2080
+ "overflow": null,
2081
+ "overflow_x": null,
2082
+ "overflow_y": null,
2083
+ "padding": null,
2084
+ "right": null,
2085
+ "top": null,
2086
+ "visibility": null,
2087
+ "width": null
2088
+ }
2089
+ },
2090
+ "f9456ff5134242bc9541d9d60c753384": {
2091
+ "model_module": "@jupyter-widgets/controls",
2092
+ "model_module_version": "1.5.0",
2093
+ "model_name": "DescriptionStyleModel",
2094
+ "state": {
2095
+ "_model_module": "@jupyter-widgets/controls",
2096
+ "_model_module_version": "1.5.0",
2097
+ "_model_name": "DescriptionStyleModel",
2098
+ "_view_count": null,
2099
+ "_view_module": "@jupyter-widgets/base",
2100
+ "_view_module_version": "1.2.0",
2101
+ "_view_name": "StyleView",
2102
+ "description_width": ""
2103
+ }
2104
+ }
2105
+ }
2106
+ }
2107
+ },
2108
+ "nbformat": 4,
2109
+ "nbformat_minor": 1
2110
+ }
NLP with Attention Models/QA/QA_DistilBERT_pipline_FT/Files/tf/C4W3_HF_Lab2_QA_BERT.ipynb ADDED
@@ -0,0 +1,644 @@
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "u2UXutvEvpUj"
7
+ },
8
+ "source": [
9
+ "# Question Answering with BERT and HuggingFace 🤗 (Fine-tuning)\n",
10
+ "\n",
11
+ "In the previous Hugging Face ungraded lab, you saw how to use the pipeline objects to use transformer models for NLP tasks. In that lab, the model didn't output the desired answers to a series of precise questions for a context related to the history of comic books.\n",
12
+ "\n",
13
+ "In this lab, you will fine-tune the model from that lab to give better answers for that type of context. To do that, you'll be using the [TyDi QA dataset](https://ai.google.com/research/tydiqa) but on a filtered version with only English examples. Additionally, you will use a lot of the tools that Hugging Face has to offer.\n",
14
+ "\n",
15
+ "You have to note that, in general, you will fine-tune general-purpose transformer models to work for specific tasks. However, fine-tuning a general-purpose model can take a lot of time. That's why you will be using the model from the question answering pipeline in this lab.\n",
16
+ "\n",
17
+ "Begin by importing some libraries and/or objects you will use throughout the lab:"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "code",
22
+ "execution_count": null,
23
+ "metadata": {},
24
+ "outputs": [],
25
+ "source": [
26
+ "import os\n",
27
+ "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'\n",
28
+ "\n",
29
+ "import numpy as np\n",
30
+ "\n",
31
+ "from datasets import load_from_disk\n",
32
+ "from transformers import AutoTokenizer, AutoModelForQuestionAnswering, Trainer, TrainingArguments\n",
33
+ "\n",
34
+ "from sklearn.metrics import f1_score"
35
+ ]
36
+ },
37
+ {
38
+ "cell_type": "markdown",
39
+ "metadata": {
40
+ "id": "FrEglXPmvpUr"
41
+ },
42
+ "source": [
43
+ "## Fine-tuning a BERT model\n",
44
+ "\n",
45
+ "As you saw in the previous lab, you can use these pipelines as they are. But sometimes, you'll need something more specific to your problem, or maybe you need it to perform better on your production data. In these cases, you'll need to fine-tune a model.\n",
46
+ "\n",
47
+ "Here, you'll fine-tune a pre-trained DistilBERT model on the TyDi QA dataset.\n",
48
+ "\n",
49
+ "To fine-tune your model, you will leverage three components provided by Hugging Face:\n",
50
+ "\n",
51
+ "* Datasets: Library that contains some datasets and different metrics to evaluate the performance of your models.\n",
52
+ "* Tokenizer: Object in charge of preprocessing your text to be given as input for the transformer models.\n",
53
+ "* Transformers: Library with the pre-trained model checkpoints and the trainer object.\n",
54
+ "\n"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "markdown",
59
+ "metadata": {
60
+ "id": "g0Rg-e4jBFFs"
61
+ },
62
+ "source": [
63
+ "### Datasets\n",
64
+ "\n",
65
+ "To get the dataset to fine-tune your model, you will use [🤗 Datasets](https://huggingface.co/docs/datasets/), a lightweight and extensible library to share and access datasets and evaluation metrics for NLP easily. You can download Hugging Face datasets directly using the `load_dataset` function from the `datasets` library. \n",
66
+ "\n",
67
+ "Hugging Face `datasets` allows to load data in several formats, such as CSV, JSON, text files and even parquet. You can see more about the supported formats in the [documentation](https://huggingface.co/docs/datasets/loading)\n",
68
+ "\n",
69
+ "A common approach is to use `load_dataset` and get the full dataset but **for this lab you will use a filtered version containing only the English examples**, which is already saved in this environment. Since this filtered dataset is saved using the Apache Arrow format, you can read it by using the `load_from_disk` function.\n"
70
+ ]
71
+ },
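+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an optional illustration of the `load_dataset` route mentioned above (you do not need it in this environment, where the filtered copy is read from disk), the sketch below shows how the full dataset could be pulled from the Hub. The `'tydiqa'` identifier and `'primary_task'` configuration are assumptions based on the public dataset card, and the download is large, so the lines are left commented out."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only -- this environment already has the filtered English subset on disk.\n",
+ "# from datasets import load_dataset\n",
+ "# tydiqa_full = load_dataset('tydiqa', 'primary_task')  # assumed Hub id/config; large download\n"
+ ]
+ },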
72
+ {
73
+ "cell_type": "code",
74
+ "execution_count": null,
75
+ "metadata": {
76
+ "id": "x68dqaoXg5Ra"
77
+ },
78
+ "outputs": [],
79
+ "source": [
80
+ "#The path where the dataset is stored\n",
81
+ "path = './tydiqa_data/'\n",
82
+ "\n",
83
+ "#Load Dataset\n",
84
+ "tydiqa_data = load_from_disk(path)\n",
85
+ "\n",
86
+ "tydiqa_data"
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "markdown",
91
+ "metadata": {
92
+ "id": "1hfzBZU3T47O"
93
+ },
94
+ "source": [
95
+ "<a id='datasets_type'></a>\n",
96
+ "You can check below that the type of the loaded dataset is a `datasets.arrow_dataset.Dataset`. This object type corresponds to an Apache Arrow Table that allows creating a hash table that contains the position in memory where data is stored instead of loading the complete dataset into memory. But you don't have to worry too much about that. It is just an efficient way to work with lots of data."
97
+ ]
98
+ },
99
+ {
100
+ "cell_type": "code",
101
+ "execution_count": null,
102
+ "metadata": {
103
+ "id": "gkeppC3GQiW6"
104
+ },
105
+ "outputs": [],
106
+ "source": [
107
+ "# Checking the object type for one of the elements in the dataset\n",
108
+ "type(tydiqa_data['train'])"
109
+ ]
110
+ },
111
+ {
112
+ "cell_type": "markdown",
113
+ "metadata": {
114
+ "id": "q_HLaNtQaFlR"
115
+ },
116
+ "source": [
117
+ "You can also check the structure of the dataset:"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "code",
122
+ "execution_count": null,
123
+ "metadata": {
124
+ "id": "2l9ANJTrbP-U"
125
+ },
126
+ "outputs": [],
127
+ "source": [
128
+ "tydiqa_data['train']"
129
+ ]
130
+ },
131
+ {
132
+ "cell_type": "markdown",
133
+ "metadata": {
134
+ "id": "2xRO1yIkvpUt"
135
+ },
136
+ "source": [
137
+ "You can see that each example is like a dictionary object. This dataset consists of questions, contexts, and indices that point to the start and end position of the answer inside the context. You can access the index using the `annotations` key, which is a kind of dictionary."
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "code",
142
+ "execution_count": null,
143
+ "metadata": {
144
+ "id": "KNVpW6lADk92"
145
+ },
146
+ "outputs": [],
147
+ "source": [
148
+ "idx = 600\n",
149
+ "\n",
150
+ "# start index\n",
151
+ "start_index = tydiqa_data['train'][idx]['annotations']['minimal_answers_start_byte'][0]\n",
152
+ "\n",
153
+ "# end index\n",
154
+ "end_index = tydiqa_data['train'][idx]['annotations']['minimal_answers_end_byte'][0]\n",
155
+ "\n",
156
+ "print(f\"Question: {tydiqa_data['train'][idx]['question_text']}\")\n",
157
+ "print(f\"\\nContext (truncated): {tydiqa_data['train'][idx]['document_plaintext'][0:512]} ...\")\n",
158
+ "print(f\"\\nAnswer: {tydiqa_data['train'][idx]['document_plaintext'][start_index:end_index]}\")"
159
+ ]
160
+ },
161
+ {
162
+ "cell_type": "markdown",
163
+ "metadata": {
164
+ "id": "Z-lZgDTEYm74"
165
+ },
166
+ "source": [
167
+ "The question answering model predicts a start and endpoint in the context to extract as the answer. That's why this NLP task is known as extractive question answering.\n",
168
+ "\n",
169
+ "To train your model, you need to pass start and endpoints as labels. So, you need to implement a function that extracts the start and end positions from the dataset.\n",
170
+ "\n",
171
+ "The dataset contains unanswerable questions. For these, the start and end indices for the answer are equal to `-1`."
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": null,
177
+ "metadata": {
178
+ "id": "Ty_QDcdKYw9a"
179
+ },
180
+ "outputs": [],
181
+ "source": [
182
+ "tydiqa_data['train'][0]['annotations']"
183
+ ]
184
+ },
185
+ {
186
+ "cell_type": "markdown",
187
+ "metadata": {
188
+ "id": "lHWcNMudcAuO"
189
+ },
190
+ "source": [
191
+ "Now, you have to flatten the dataset to work with an object with a table structure instead of a dictionary structure. This step facilitates the pre-processing steps."
192
+ ]
193
+ },
194
+ {
195
+ "cell_type": "code",
196
+ "execution_count": null,
197
+ "metadata": {
198
+ "id": "xDCAQQtoCs_r"
199
+ },
200
+ "outputs": [],
201
+ "source": [
202
+ "# Flattening the datasets\n",
203
+ "flattened_train_data = tydiqa_data['train'].flatten()\n",
204
+ "flattened_test_data = tydiqa_data['validation'].flatten()"
205
+ ]
206
+ },
207
+ {
208
+ "cell_type": "markdown",
209
+ "metadata": {
210
+ "id": "q5wUa5xED0fK"
211
+ },
212
+ "source": [
213
+ "Also, to make the training more straightforward and faster, we will extract a subset of the train and test datasets. For that purpose, we will use the Hugging Face Dataset object's method called `select()`. This method allows you to take some data points by their index. Here, you will select the first 3000 rows but you can play with the number of data points, however, consider that this will increase the training time."
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": null,
219
+ "metadata": {
220
+ "id": "BkcIhpEnDHSJ"
221
+ },
222
+ "outputs": [],
223
+ "source": [
224
+ "# Selecting a subset of the train dataset\n",
225
+ "flattened_train_data = flattened_train_data.select(range(3000))\n",
226
+ "\n",
227
+ "# Selecting a subset of the test dataset\n",
228
+ "flattened_test_data = flattened_test_data.select(range(1000))"
229
+ ]
230
+ },
231
+ {
232
+ "cell_type": "markdown",
233
+ "metadata": {
234
+ "id": "fBXrmwXhc13M"
235
+ },
236
+ "source": [
237
+ "### Tokenizers\n",
238
+ "\n",
239
+ "Now, you will use the [tokenizer](https://huggingface.co/transformers/main_classes/tokenizer.html) object from Hugging Face. You can load a tokenizer using different methods. Here, you will retrieve it from the pipeline object you created in the previous Hugging Face lab. With this tokenizer, you can ensure that the tokens you get for the dataset will match the tokens used in the original DistilBERT implementation.\n",
240
+ "\n",
241
+ "When loading a tokenizer with any method, you must pass the model checkpoint that you want to fine-tune. Here, you are using the`'distilbert-base-cased-distilled-squad'` checkpoint.\n"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "code",
246
+ "execution_count": null,
247
+ "metadata": {
248
+ "id": "LInV3b_HyAIF"
249
+ },
250
+ "outputs": [],
251
+ "source": [
252
+ "# Import the AutoTokenizer from the transformers library\n",
253
+ "tokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-cased-distilled-squad\")\n",
254
+ "\n",
255
+ "# Define max length of sequences in the tokenizer\n",
256
+ "tokenizer.model_max_length = 512"
257
+ ]
258
+ },
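+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check (purely illustrative; the toy context and question below are made up), you can tokenize a single (context, question) pair and inspect the fields the model will consume, `input_ids` and `attention_mask`, along with the recovered tokens:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Tokenize a toy (context, question) pair the same way the pre-processing below will\n",
+ "demo = tokenizer('Comic books grew in popularity.', 'What grew in popularity?', truncation='only_first')\n",
+ "print(list(demo.keys()))\n",
+ "print(tokenizer.convert_ids_to_tokens(demo['input_ids']))\n"
+ ]
+ },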
259
+ {
260
+ "cell_type": "markdown",
261
+ "metadata": {
262
+ "id": "qz6YtVcOh3qP"
263
+ },
264
+ "source": [
265
+ "Given the characteristics of the dataset and the question-answering task, you will need to add some steps to pre-process the data after the tokenization:\n",
266
+ "\n",
267
+ "1. When there is no answer to a question given a context, you will use the `CLS` token, a unique token used to represent the start of the sequence.\n",
268
+ "\n",
269
+ "2. Tokenizers can split a given string into substrings, resulting in a subtoken for each substring, creating misalignment between the list of dataset tags and the labels generated by the tokenizer. Therefore, you will need to align the start and end indices with the tokens associated with the target answer word.\n",
270
+ "\n",
271
+ "3. Finally, a tokenizer can truncate a very long sequence. So, if the start/end position of an answer is `None`, you will assume that it was truncated and assign the maximum length of the tokenizer to those positions.\n",
272
+ "\n",
273
+ "Those three steps are done within the `process_samples` function defined below."
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "code",
278
+ "execution_count": null,
279
+ "metadata": {
280
+ "id": "3l-r4wI06LU7"
281
+ },
282
+ "outputs": [],
283
+ "source": [
284
+ "# Processing samples using the 3 steps described above\n",
285
+ "def process_samples(sample):\n",
286
+ " tokenized_data = tokenizer(sample['document_plaintext'], sample['question_text'], truncation=\"only_first\", padding=\"max_length\")\n",
287
+ "\n",
288
+ " input_ids = tokenized_data[\"input_ids\"]\n",
289
+ "\n",
290
+ " # We will label impossible answers with the index of the CLS token.\n",
291
+ " cls_index = input_ids.index(tokenizer.cls_token_id)\n",
292
+ "\n",
293
+ " # If no answers are given, set the cls_index as answer.\n",
294
+ " if sample[\"annotations.minimal_answers_start_byte\"][0] == -1:\n",
295
+ " start_position = cls_index\n",
296
+ " end_position = cls_index\n",
297
+ " else:\n",
298
+ " # Start/end character index of the answer in the text.\n",
299
+ " gold_text = sample[\"document_plaintext\"][sample['annotations.minimal_answers_start_byte'][0]:sample['annotations.minimal_answers_end_byte'][0]]\n",
300
+ " start_char = sample[\"annotations.minimal_answers_start_byte\"][0]\n",
301
+ " end_char = sample['annotations.minimal_answers_end_byte'][0] #start_char + len(gold_text)\n",
302
+ "\n",
303
+ " # sometimes answers are off by a character or two – fix this\n",
304
+ " if sample['document_plaintext'][start_char-1:end_char-1] == gold_text:\n",
305
+ " start_char = start_char - 1\n",
306
+ " end_char = end_char - 1 # When the gold label is off by one character\n",
307
+ " elif sample['document_plaintext'][start_char-2:end_char-2] == gold_text:\n",
308
+ " start_char = start_char - 2\n",
309
+ " end_char = end_char - 2 # When the gold label is off by two characters\n",
310
+ "\n",
311
+ " start_token = tokenized_data.char_to_token(start_char)\n",
312
+ " end_token = tokenized_data.char_to_token(end_char - 1)\n",
313
+ "\n",
314
+ " # if start position is None, the answer passage has been truncated\n",
315
+ " if start_token is None:\n",
316
+ " start_token = tokenizer.model_max_length\n",
317
+ " if end_token is None:\n",
318
+ " end_token = tokenizer.model_max_length\n",
319
+ "\n",
320
+ " start_position = start_token\n",
321
+ " end_position = end_token\n",
322
+ "\n",
323
+ " return {'input_ids': tokenized_data['input_ids'],\n",
324
+ " 'attention_mask': tokenized_data['attention_mask'],\n",
325
+ " 'start_positions': start_position,\n",
326
+ " 'end_positions': end_position}\n"
327
+ ]
328
+ },
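+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make step 2 above concrete, here is a minimal, made-up example (the toy context, question, and character span are purely illustrative): `char_to_token` maps a character index in the context to the index of the token covering it, which is exactly how `process_samples` converts character-level labels into token-level labels."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Toy alignment check: '1938' spans characters 20-23 of the context below\n",
+ "toy = tokenizer('Superman debuted in 1938.', 'When did Superman debut?', truncation='only_first')\n",
+ "ans_start_char, ans_end_char = 20, 24\n",
+ "print(toy.char_to_token(ans_start_char), toy.char_to_token(ans_end_char - 1))\n"
+ ]
+ },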
329
+ {
330
+ "cell_type": "markdown",
331
+ "metadata": {
332
+ "id": "Q3LAsWSyk_Rm"
333
+ },
334
+ "source": [
335
+ "To apply the `process_samples` function defined above to the whole dataset, you can use the `map` method as follows:"
336
+ ]
337
+ },
338
+ {
339
+ "cell_type": "code",
340
+ "execution_count": null,
341
+ "metadata": {
342
+ "id": "rGbYd7QnFetG"
343
+ },
344
+ "outputs": [],
345
+ "source": [
346
+ "# Tokenizing and processing the flattened dataset\n",
347
+ "processed_train_data = flattened_train_data.map(process_samples)\n",
348
+ "processed_test_data = flattened_test_data.map(process_samples)"
349
+ ]
350
+ },
351
+ {
352
+ "cell_type": "markdown",
353
+ "metadata": {
354
+ "id": "wCpPhYKJluMA"
355
+ },
356
+ "source": [
357
+ "# Transformers\n",
358
+ "\n",
359
+ "The last component of Hugging Face that is useful for fine-tuning a transformer corresponds to the pre-trained models you can access in multiple ways.\n",
360
+ "\n",
361
+ "For this lab, you will use the same model from the question-answering pipeline that you loaded in the previous lab."
362
+ ]
363
+ },
364
+ {
365
+ "cell_type": "code",
366
+ "execution_count": null,
367
+ "metadata": {
368
+ "id": "jR3VqjNc1Vb3"
369
+ },
370
+ "outputs": [],
371
+ "source": [
372
+ "# Import the AutoModelForQuestionAnswering for the pre-trained model. You will only fine tune the head of the model\n",
373
+ "model = AutoModelForQuestionAnswering.from_pretrained(\"distilbert-base-cased-distilled-squad\")"
374
+ ]
375
+ },
376
+ {
377
+ "cell_type": "markdown",
378
+ "metadata": {
379
+ "id": "K29BYtnsm1yH"
380
+ },
381
+ "source": [
382
+ "Now, you can take the necessary columns from the datasets to train/test and return them as Pytorch Tensors."
383
+ ]
384
+ },
385
+ {
386
+ "cell_type": "code",
387
+ "execution_count": null,
388
+ "metadata": {
389
+ "id": "0X14G89noLfW"
390
+ },
391
+ "outputs": [],
392
+ "source": [
393
+ "columns_to_return = ['input_ids','attention_mask', 'start_positions', 'end_positions']\n",
394
+ "\n",
395
+ "processed_train_data.set_format(type='pt', columns=columns_to_return)\n",
396
+ "processed_test_data.set_format(type='pt', columns=columns_to_return)"
397
+ ]
398
+ },
399
+ {
400
+ "cell_type": "markdown",
401
+ "metadata": {
402
+ "id": "yjoUFWu_nLRq"
403
+ },
404
+ "source": [
405
+ "Here, we give you the F1 score as a metric to evaluate your model's performance. We will use this metric for simplicity, although it is based on the start and end values predicted by the model. If you want to dig deeper on other metrics that can be used for a question and answering task, you can also check [this colab notebook resource](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering.ipynb) from the Hugging Face team."
406
+ ]
407
+ },
408
+ {
409
+ "cell_type": "code",
410
+ "execution_count": null,
411
+ "metadata": {
412
+ "id": "xcW2wPnirsJk"
413
+ },
414
+ "outputs": [],
415
+ "source": [
416
+ "def compute_f1_metrics(pred):\n",
417
+ " start_labels = pred.label_ids[0]\n",
418
+ " start_preds = pred.predictions[0].argmax(-1)\n",
419
+ " end_labels = pred.label_ids[1]\n",
420
+ " end_preds = pred.predictions[1].argmax(-1)\n",
421
+ "\n",
422
+ " f1_start = f1_score(start_labels, start_preds, average='macro')\n",
423
+ " f1_end = f1_score(end_labels, end_preds, average='macro')\n",
424
+ "\n",
425
+ " return {\n",
426
+ " 'f1_start': f1_start,\n",
427
+ " 'f1_end': f1_end,\n",
428
+ " }"
429
+ ]
430
+ },
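+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If it helps to see the input contract this function expects, here is a minimal sketch with dummy values (the labels and random logits below are made up): `pred.label_ids` holds a `(start_labels, end_labels)` pair and `pred.predictions` holds `(start_logits, end_logits)` arrays of shape `(num_examples, seq_len)`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Minimal sketch: call compute_f1_metrics on dummy predictions to see its input contract\n",
+ "from collections import namedtuple\n",
+ "\n",
+ "# Stand-in for the Trainer's EvalPrediction object (illustrative only)\n",
+ "DummyPred = namedtuple('DummyPred', ['label_ids', 'predictions'])\n",
+ "dummy = DummyPred(label_ids=(np.array([5, 0]), np.array([7, 0])),\n",
+ "                  predictions=(np.random.rand(2, 512), np.random.rand(2, 512)))\n",
+ "compute_f1_metrics(dummy)\n"
+ ]
+ },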
431
+ {
432
+ "cell_type": "markdown",
433
+ "metadata": {
434
+ "id": "KuhASU4evpUu"
435
+ },
436
+ "source": [
437
+ "Now, you will use the Hugging Face [Trainer](https://huggingface.co/transformers/main_classes/trainer.html) to fine-tune your model."
438
+ ]
439
+ },
440
+ {
441
+ "cell_type": "code",
442
+ "execution_count": null,
443
+ "metadata": {
444
+ "colab": {
445
+ "background_save": true
446
+ },
447
+ "id": "nxyOwf5utXAt"
448
+ },
449
+ "outputs": [],
450
+ "source": [
451
+ "# Training hyperparameters\n",
452
+ "training_args = TrainingArguments(\n",
453
+ " output_dir='model_results', # output directory\n",
454
+ " overwrite_output_dir=True,\n",
455
+ " num_train_epochs=3, # total number of training epochs\n",
456
+ " per_device_train_batch_size=8, # batch size per device during training\n",
457
+ " per_device_eval_batch_size=8, # batch size for evaluation\n",
458
+ " warmup_steps=20, # number of warmup steps for learning rate scheduler\n",
459
+ " weight_decay=0.01, # strength of weight decay\n",
460
+ " logging_steps=50\n",
461
+ ")\n",
462
+ "\n",
463
+ "# Trainer object\n",
464
+ "trainer = Trainer(\n",
465
+ " model=model, # the instantiated 🤗 Transformers model to be trained\n",
466
+ " args=training_args, # training arguments, defined above\n",
467
+ " train_dataset=processed_train_data, # training dataset\n",
468
+ " eval_dataset=processed_test_data, # evaluation dataset\n",
469
+ " compute_metrics=compute_f1_metrics\n",
470
+ ")\n",
471
+ "\n",
472
+ "# Training loop\n",
473
+ "trainer.train()"
474
+ ]
475
+ },
476
+ {
477
+ "cell_type": "markdown",
478
+ "metadata": {
479
+ "id": "Ic_wNlBHCRMn"
480
+ },
481
+ "source": [
482
+ "And, in the next cell, you can evaluate the fine-tuned model's performance on the test set."
483
+ ]
484
+ },
485
+ {
486
+ "cell_type": "code",
487
+ "execution_count": null,
488
+ "metadata": {
489
+ "id": "92N11A076wRA"
490
+ },
491
+ "outputs": [],
492
+ "source": [
493
+ "trainer.evaluate(processed_test_data)"
494
+ ]
495
+ },
496
+ {
497
+ "cell_type": "markdown",
498
+ "metadata": {
499
+ "id": "_HubPkRbnzh_"
500
+ },
501
+ "source": [
502
+ "### Using your Fine-Tuned Model\n",
503
+ "\n",
504
+ "After training and evaluating your fine-tuned model, you can check its results for the same questions from the previous lab.\n",
505
+ "\n",
506
+ "For that, you will tell Pytorch to use your GPU or your CPU to run the model. Additionally, you will need to tokenize your input context and questions. Finally, you need to post-process the output results to transform them from tokens to human-readable strings using the `tokenizer`."
507
+ ]
508
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "text = r\"\"\"\n",
+ "The Golden Age of Comic Books describes an era of American comic books from the\n",
+ "late 1930s to circa 1950. During this time, modern comic books were first published\n",
+ "and rapidly increased in popularity. The superhero archetype was created and many\n",
+ "well-known characters were introduced, including Superman, Batman, Captain Marvel\n",
+ "(later known as SHAZAM!), Captain America, and Wonder Woman.\n",
+ "Between 1939 and 1941 Detective Comics and its sister company, All-American Publications,\n",
+ "introduced popular superheroes such as Batman and Robin, Wonder Woman, the Flash,\n",
+ "Green Lantern, Doctor Fate, the Atom, Hawkman, Green Arrow and Aquaman.[7] Timely Comics,\n",
+ "the 1940s predecessor of Marvel Comics, had million-selling titles featuring the Human Torch,\n",
+ "the Sub-Mariner, and Captain America.[8]\n",
+ "As comic books grew in popularity, publishers began launching titles that expanded\n",
+ "into a variety of genres. Dell Comics' non-superhero characters (particularly the\n",
+ "licensed Walt Disney animated-character comics) outsold the superhero comics of the day.[12]\n",
+ "The publisher featured licensed movie and literary characters such as Mickey Mouse, Donald Duck,\n",
+ "Roy Rogers and Tarzan.[13] It was during this era that noted Donald Duck writer-artist\n",
+ "Carl Barks rose to prominence.[14] Additionally, MLJ's introduction of Archie Andrews\n",
+ "in Pep Comics #22 (December 1941) gave rise to teen humor comics,[15] with the Archie\n",
+ "Andrews character remaining in print well into the 21st century.[16]\n",
+ "At the same time in Canada, American comic books were prohibited importation under\n",
+ "the War Exchange Conservation Act[17] which restricted the importation of non-essential\n",
+ "goods. As a result, a domestic publishing industry flourished during the duration\n",
+ "of the war which were collectively informally called the Canadian Whites.\n",
+ "The educational comic book Dagwood Splits the Atom used characters from the comic\n",
+ "strip Blondie.[18] According to historian Michael A. Amundson, appealing comic-book\n",
+ "characters helped ease young readers' fear of nuclear war and neutralize anxiety\n",
+ "about the questions posed by atomic power.[19] It was during this period that long-running\n",
+ "humor comics debuted, including EC's Mad and Carl Barks' Uncle Scrooge in Dell's Four\n",
+ "Color Comics (both in 1952).[20][21]\n",
+ "\"\"\"\n",
+ "\n",
+ "questions = [\"What superheroes were introduced between 1939 and 1941 by Detective Comics and its sister company?\",\n",
+ "             \"What comic book characters were created between 1939 and 1941?\",\n",
+ "             \"What well-known characters were created between 1939 and 1941?\",\n",
+ "             \"What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\"]\n",
+ "\n",
+ "# Run the model on the GPU if one is available, otherwise fall back to the CPU\n",
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+ "model.to(device)\n",
+ "\n",
+ "for question in questions:\n",
+ "    # Tokenize the question together with the context and move the tensors to the device\n",
+ "    inputs = tokenizer.encode_plus(question, text, return_tensors=\"pt\").to(device)\n",
+ "    input_ids = inputs[\"input_ids\"].tolist()[0]\n",
+ "\n",
+ "    answer_model = model(**inputs)\n",
+ "\n",
+ "    # Get the most likely beginning of the answer with the argmax of the start scores\n",
+ "    start_logits = answer_model['start_logits'].cpu().detach().numpy()\n",
+ "    answer_start = np.argmax(start_logits)\n",
+ "\n",
+ "    # Get the most likely end of the answer with the argmax of the end scores\n",
+ "    end_logits = answer_model['end_logits'].cpu().detach().numpy()\n",
+ "    answer_end = np.argmax(end_logits) + 1\n",
+ "\n",
+ "    # Convert the predicted token span back into a human-readable string\n",
+ "    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))\n",
+ "\n",
+ "    print(f\"Question: {question}\")\n",
+ "    print(f\"Answer: {answer}\\n\")\n"
+ ]
+ },
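+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an aside, the same inference loop can be expressed more compactly with the Hugging Face `question-answering` pipeline you explored in the first lab. This is only a sketch (it assumes the fine-tuned `model` and `tokenizer` objects defined above), not part of the original lab flow:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import pipeline\n",
+ "\n",
+ "# Sketch: wrap the fine-tuned model and tokenizer in a QA pipeline\n",
+ "qa = pipeline(\"question-answering\", model=model, tokenizer=tokenizer,\n",
+ "              device=0 if torch.cuda.is_available() else -1)\n",
+ "\n",
+ "for question in questions:\n",
+ "    result = qa(question=question, context=text)\n",
+ "    print(f\"Question: {question}\")\n",
+ "    print(f\"Answer: {result['answer']} (score: {result['score']:.4f})\\n\")"
+ ]
+ },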
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_yTDQ6kn6pWS"
+ },
+ "source": [
+ "By fine-tuning the model for only 3 epochs, you can already see an improvement!\n",
+ "\n",
+ "You can compare these results with the ones obtained from the base model (without fine-tuning) in the previous lab. As a reminder, here are those results:\n",
+ "\n",
+ "```\n",
+ "What popular superheroes were introduced between 1939 and 1941?\n",
+ ">> teen humor comics\n",
+ "What superheroes were introduced between 1939 and 1941 by Detective Comics and its sister company?\n",
+ ">> Archie Andrews\n",
+ "What comic book characters were created between 1939 and 1941?\n",
+ ">> Archie\n",
+ "Andrews\n",
+ "What well-known characters were created between 1939 and 1941?\n",
+ ">> Archie\n",
+ "Andrews\n",
+ "What well-known superheroes were introduced between 1939 and 1941 by Detective Comics?\n",
+ ">> Archie Andrews\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uf-v8mUSLqXN"
+ },
+ "source": [
+ "**Congratulations!**\n",
+ "\n",
+ "You have finished this series of ungraded labs. You were able to:\n",
+ "\n",
+ "* Explore the Hugging Face Pipelines, which can be used right out of the box.\n",
+ "\n",
+ "* Fine-tune a model for the Extractive Question Answering task.\n",
+ "\n",
+ "We also recommend you go through the free [Hugging Face course](https://huggingface.co/course/chapter1) to explore their ecosystem in more detail and find different ways to use the `transformers` library."
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+ }