MuzzammilShah committed on
Commit fc3fc2e · verified
1 Parent(s): 15409b3

Initial commit for files

Files changed (5)
  1. A-Main-Notebook.ipynb +0 -0
  2. B-Main-Notebook.ipynb +549 -0
  3. C-Main-Notebook.ipynb +530 -0
  4. README.md +67 -0
  5. names.txt +0 -0
A-Main-Notebook.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
B-Main-Notebook.ipynb ADDED
@@ -0,0 +1,549 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 23,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "words = open('names.txt', 'r').read().splitlines()"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": 24,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "import torch\n",
19
+ "\n",
20
+ "N = torch.zeros((27, 27), dtype = torch.int32)\n",
21
+ "\n",
22
+ "chars = sorted(list(set(''.join(words))))\n",
23
+ "\n",
24
+ "stoi = {s:i+1 for i,s in enumerate(chars)}\n",
25
+ "stoi['.'] = 0\n",
26
+ "\n",
27
+ "itos = {i:s for s,i in stoi.items()}"
28
+ ]
29
+ },
30
+ {
31
+ "cell_type": "code",
32
+ "execution_count": 25,
33
+ "metadata": {},
34
+ "outputs": [],
35
+ "source": [
36
+ "P = N.float()\n",
37
+ "P /= P.sum(1, keepdim=True)"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "execution_count": 26,
43
+ "metadata": {},
44
+ "outputs": [
45
+ {
46
+ "name": "stdout",
47
+ "output_type": "stream",
48
+ "text": [
49
+ ". e\n",
50
+ "e m\n",
51
+ "m m\n",
52
+ "m a\n",
53
+ "a .\n"
54
+ ]
55
+ }
56
+ ],
57
+ "source": [
58
+ "#Creating the training set of bigrams (x,y)\n",
59
+ "xs, ys = [], []\n",
60
+ "\n",
61
+ "for word in words[:1]:\n",
62
+ "    chs = ['.'] + list(word) + ['.']\n",
63
+ "    for ch1, ch2 in zip(chs, chs[1:]):\n",
64
+ "        ix1 = stoi[ch1]\n",
65
+ "        ix2 = stoi[ch2]\n",
66
+ "        print(ch1, ch2)\n",
67
+ "        xs.append(ix1)\n",
68
+ "        ys.append(ix2)\n",
69
+ "\n",
70
+ "xs = torch.tensor(xs)\n",
71
+ "ys = torch.tensor(ys)"
72
+ ]
73
+ },
74
+ {
75
+ "cell_type": "code",
76
+ "execution_count": 5,
77
+ "metadata": {},
78
+ "outputs": [
79
+ {
80
+ "data": {
81
+ "text/plain": [
82
+ "tensor([ 0, 5, 13, 13, 1])"
83
+ ]
84
+ },
85
+ "execution_count": 5,
86
+ "metadata": {},
87
+ "output_type": "execute_result"
88
+ }
89
+ ],
90
+ "source": [
91
+ "xs"
92
+ ]
93
+ },
94
+ {
95
+ "cell_type": "code",
96
+ "execution_count": 6,
97
+ "metadata": {},
98
+ "outputs": [
99
+ {
100
+ "data": {
101
+ "text/plain": [
102
+ "tensor([ 5, 13, 13, 1, 0])"
103
+ ]
104
+ },
105
+ "execution_count": 6,
106
+ "metadata": {},
107
+ "output_type": "execute_result"
108
+ }
109
+ ],
110
+ "source": [
111
+ "ys"
112
+ ]
113
+ },
114
+ {
115
+ "cell_type": "code",
116
+ "execution_count": 18,
117
+ "metadata": {},
118
+ "outputs": [
119
+ {
120
+ "data": {
121
+ "text/plain": [
122
+ "tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
123
+ " 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
124
+ " [0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
125
+ " 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
126
+ " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n",
127
+ " 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
128
+ " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n",
129
+ " 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
130
+ " [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
131
+ " 0., 0., 0., 0., 0., 0., 0., 0., 0.]])"
132
+ ]
133
+ },
134
+ "execution_count": 18,
135
+ "metadata": {},
136
+ "output_type": "execute_result"
137
+ }
138
+ ],
139
+ "source": [
140
+ "#Feeding these examples into a neural network\n",
141
+ "import torch.nn.functional as F\n",
142
+ "xenc = F.one_hot(xs, num_classes=27).float() #IMP: one_hot returns int64, so cast to float before multiplying with the float weights\n",
143
+ "xenc"
144
+ ]
145
+ },
146
+ {
147
+ "cell_type": "code",
148
+ "execution_count": 20,
149
+ "metadata": {},
150
+ "outputs": [
151
+ {
152
+ "data": {
153
+ "text/plain": [
154
+ "torch.Size([5, 27])"
155
+ ]
156
+ },
157
+ "execution_count": 20,
158
+ "metadata": {},
159
+ "output_type": "execute_result"
160
+ }
161
+ ],
162
+ "source": [
163
+ "xenc.shape"
164
+ ]
165
+ },
166
+ {
167
+ "cell_type": "code",
168
+ "execution_count": 16,
169
+ "metadata": {},
170
+ "outputs": [],
171
+ "source": [
172
+ "import matplotlib.pyplot as plt"
173
+ ]
174
+ },
175
+ {
176
+ "cell_type": "code",
177
+ "execution_count": 21,
178
+ "metadata": {},
179
+ "outputs": [
180
+ {
181
+ "data": {
182
+ "text/plain": [
183
+ "<matplotlib.image.AxesImage at 0x24c6d3e5ae0>"
184
+ ]
185
+ },
186
+ "execution_count": 21,
187
+ "metadata": {},
188
+ "output_type": "execute_result"
189
+ },
190
+ {
191
+ "data": {
192
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhYAAACHCAYAAABK4hAcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAN2klEQVR4nO3df2hV9ePH8dfd2q4/urs6137cNufUUmpukrolkgkbTgvJ9A8r/1hDjOoqzlHJAl1CsDAIqSQjKP/xV0ImyQdDlpsE8wcTMaH21SFfr8xtKR/vdOZcu+/PH3263+9Nnd7tvXt2r88HHLj33Df3vHjzlr0899x7XMYYIwAAAAuSnA4AAAASB8UCAABYQ7EAAADWUCwAAIA1FAsAAGANxQIAAFhDsQAAANY8EsuDhUIhtbe3y+PxyOVyxfLQAABgkIwxun79unw+n5KSBj4nEdNi0d7erry8vFgeEgAAWBIIBJSbmzvgmJgWC4/HI0n631OTlPbo0D6FefnJGTYiAQCA+/hTffpZ/wr/HR9ITIvF3x9/pD2apDTP0IrFI64UG5EAAMD9/PfmHw9yGQMXbwIAAGsoFgAAwBqKBQAAsGZQxWLbtm2aNGmSRo0apdLSUp04ccJ2LgAAEIeiLhZ79+5VTU2N6urqdOrUKRUXF6uiokJdXV3DkQ8AAMSRqIvFJ598otWrV6uqqkpPPfWUtm/frjFjxujrr78ejnwAACCORFUsbt++rZaWFpWXl//fGyQlqby8XM3NzXeM7+3tVXd3d8QGAAASV1TF4sqVK+rv71dWVlbE/qysLHV0dNwxvr6+Xl6vN7zxq5sAACS2Yf1WSG1trYLBYHgLBALDeTgAAOCwqH55MyMjQ8nJyers7IzY39nZqezs7DvGu91uud3uoSUEAABxI6ozFqmpqZo1a5YaGhrC+0KhkBoaGjR37lzr4QAAQHyJ+l4hNTU1qqys1OzZs1VSUqKtW7eqp6dHVVVVw5EPAADEkaiLxYoVK/T7779r06ZN6ujo0MyZM3Xo0KE7LugEAAAPH5cxxsTqYN3d3fJ6vfr3/0we8t1NK3wz7YQCAAAD+tP0qVEHFAwGlZaWNuBY7hUCAACsifqjEBtefnKGHnGlOHHoh86P7aetvA9niAAAD4IzFgAAwBqKBQAAsIZiAQAArKFYAAAAaygWAADAGooFAACwhmIBAACsoVgAAABrKBYAAMAaigUAALCGYgEAAKyhWAAAAGsoFgAAwBqKBQAAsIZiAQAArKFYAAAAaygWAADAGooFAACw5hGnA2B4VfhmOh0BCeLH9tNW3oc1CSQ2zlgAAABrKBYAAMAaigUAALCGYgEAAKyJqljU19drzpw58ng8yszM1NKlS9Xa2jpc2QAAQJyJqlg0NTXJ7/fr2LFjOnz4sPr6+rRw4UL19PQMVz4AABBHovq66aFDhyKe79ixQ5mZmWppadH8+fOtBgMAAPFnSL9jEQwGJUnp6el3fb23t1e9vb3h593d3UM5HAAAGOEGffFmKBRSdXW15s2bp8LCwruOqa+vl9frDW95eXmDDgoAAEa+QRcLv9+vs2fPas+ePfccU1tbq2AwGN4CgcBgDwcAAOLAoD4KWbNmjQ4ePKijR48qNzf3nuPcbrfcbvegwwEAgPgSVbEwxmjt2rXav3+/GhsbVVBQMFy5AABAHIqqWPj9fu3atUsHDhyQx+NRR0eHJMnr9Wr06NHDEhAAAMSPqK6x+OKLLxQMBrVgwQLl5OSEt7179w5XPgAAEEei/igEAADgXrhXCAAAsIZiAQAArKFYAAAAaygWAADAGooFAACwhmIBAACsoVgAAABrKBYAAMAaigUAALCGYgEAAKyhWAAAAGsoFgAAwBqKBQAAsIZiAQAArKFYAAAAaygWAADAGooFAACwhmIBAACsoVgAAABrKBYAAMAaigUAALDmEacDDNaP7aetvVeFb6a19wISFf9OADwIzlgA
AABrKBYAAMAaigUAALCGYgEAAKwZUrH46KOP5HK5VF1dbSkOAACIZ4MuFidPntSXX36poqIim3kAAEAcG1SxuHHjhlauXKmvvvpK48ePt50JAADEqUEVC7/frxdffFHl5eUDjuvt7VV3d3fEBgAAElfUP5C1Z88enTp1SidPnrzv2Pr6em3evHlQwQAAQPyJ6oxFIBDQunXrtHPnTo0aNeq+42traxUMBsNbIBAYdFAAADDyRXXGoqWlRV1dXXrmmWfC+/r7+3X06FF9/vnn6u3tVXJycvg1t9stt9ttLy0AABjRoioWZWVl+uWXXyL2VVVVafr06dqwYUNEqQAAAA+fqIqFx+NRYWFhxL6xY8dqwoQJd+wHAAAPH355EwAAWDPk26Y3NjZaiAEAABIBZywAAIA1Qz5jEQ1jjCTpT/VJZmjv1X09ZCHRX/40fdbeCwCARPOn/vo7+fff8YG4zIOMsuTSpUvKy8uL1eEAAIBFgUBAubm5A46JabEIhUJqb2+Xx+ORy+W657ju7m7l5eUpEAgoLS0tVvEeWsx37DDXscV8xxbzHVuxnG9jjK5fvy6fz6ekpIGvoojpRyFJSUn3bTr/X1paGoszhpjv2GGuY4v5ji3mO7ZiNd9er/eBxnHxJgAAsIZiAQAArBmRxcLtdquuro77jMQI8x07zHVsMd+xxXzH1kid75hevAkAABLbiDxjAQAA4hPFAgAAWEOxAAAA1lAsAACANRQLAABgzYgrFtu2bdOkSZM0atQolZaW6sSJE05HSkgffPCBXC5XxDZ9+nSnYyWMo0ePasmSJfL5fHK5XPr+++8jXjfGaNOmTcrJydHo0aNVXl6uc+fOORM2Adxvvl9//fU71vuiRYucCRvn6uvrNWfOHHk8HmVmZmrp0qVqbW2NGHPr1i35/X5NmDBBjz76qJYvX67Ozk6HEse3B5nvBQsW3LG+33zzTYcSj7BisXfvXtXU1Kiurk6nTp1ScXGxKioq1NXV5XS0hPT000/r8uXL4e3nn392OlLC6OnpUXFxsbZt23bX17ds2aJPP/1U27dv1/HjxzV27FhVVFTo1q1bMU6aGO4335K0aNGiiPW+e/fuGCZMHE1NTfL7/Tp27JgOHz6svr4+LVy4UD09PeEx69ev1w8//KB9+/apqalJ7e3tWrZsmYOp49eDzLckrV69OmJ9b9myxaHEkswIUlJSYvx+f/h5f3+/8fl8pr6+3sFUiamurs4UFxc7HeOhIMns378//DwUCpns7Gzz8ccfh/ddu3bNuN1us3v3bgcSJpZ/zrcxxlRWVpqXXnrJkTyJrqury0gyTU1Nxpi/1nJKSorZt29feMyvv/5qJJnm5manYiaMf863McY8//zzZt26dc6F+ocRc8bi9u3bamlpUXl5eXhfUlKSysvL1dzc7GCyxHXu3Dn5fD5NnjxZK1eu1MWLF52O9FC4cOGCOjo6Ita61+tVaWkpa30YNTY2KjMzU9OmTdNbb72lq1evOh0pIQSDQUlSenq6JKmlpUV9fX0R63v69OmaOHEi69uCf87333bu3KmMjAwVFhaqtrZWN2/edCKepBjf3XQgV65cUX9/v7KysiL2Z2Vl6bfffnMoVeIqLS3Vjh07NG3aNF2+fFmbN2/Wc889p7Nnz8rj8TgdL6F1dHRI0l3X+t+vwa5FixZp2bJlKigoUFtbm95//30tXrxYzc3NSk5Odjpe3AqFQqqurta8efNUWFgo6a/1nZqaqnHjxkWMZX0P3d3mW5Jee+015efny+fz6cyZM9qwYYNaW1v13XffOZJzxBQLxNbixYvDj4uKilRaWqr8/Hx9++23WrVqlYPJAPteeeWV8OMZM2aoqKhIU6ZMUWNjo8rKyhxMFt/8fr/Onj3L9Vkxcq/5fuONN8KPZ8yYoZycHJWVlamtrU1TpkyJdcyRc/FmRkaGkpOT77hyuLOzU9nZ2Q6leniMGzdOTz75pM6fP+90lIT393pmrTtn
8uTJysjIYL0PwZo1a3Tw4EEdOXJEubm54f3Z2dm6ffu2rl27FjGe9T0095rvuyktLZUkx9b3iCkWqampmjVrlhoaGsL7QqGQGhoaNHfuXAeTPRxu3LihtrY25eTkOB0l4RUUFCg7OztirXd3d+v48eOs9Ri5dOmSrl69ynofBGOM1qxZo/379+unn35SQUFBxOuzZs1SSkpKxPpubW3VxYsXWd+DcL/5vpvTp09LkmPre0R9FFJTU6PKykrNnj1bJSUl2rp1q3p6elRVVeV0tITzzjvvaMmSJcrPz1d7e7vq6uqUnJysV1991eloCeHGjRsR/1u4cOGCTp8+rfT0dE2cOFHV1dX68MMP9cQTT6igoEAbN26Uz+fT0qVLnQsdxwaa7/T0dG3evFnLly9Xdna22tra9N5772nq1KmqqKhwMHV88vv92rVrlw4cOCCPxxO+bsLr9Wr06NHyer1atWqVampqlJ6errS0NK1du1Zz587Vs88+63D6+HO/+W5ra9OuXbv0wgsvaMKECTpz5ozWr1+v+fPnq6ioyJnQTn8t5Z8+++wzM3HiRJOammpKSkrMsWPHnI6UkFasWGFycnJMamqqefzxx82KFSvM+fPnnY6VMI4cOWIk3bFVVlYaY/76yunGjRtNVlaWcbvdpqyszLS2tjobOo4NNN83b940CxcuNI899phJSUkx+fn5ZvXq1aajo8Pp2HHpbvMsyXzzzTfhMX/88Yd5++23zfjx482YMWPMyy+/bC5fvuxc6Dh2v/m+ePGimT9/vklPTzdut9tMnTrVvPvuuyYYDDqW2fXf4AAAAEM2Yq6xAAAA8Y9iAQAArKFYAAAAaygWAADAGooFAACwhmIBAACsoVgAAABrKBYAAMAaigUAALCGYgEAAKyhWAAAAGv+A6sEjbDe9GoiAAAAAElFTkSuQmCC",
193
+ "text/plain": [
194
+ "<Figure size 640x480 with 1 Axes>"
195
+ ]
196
+ },
197
+ "metadata": {},
198
+ "output_type": "display_data"
199
+ }
200
+ ],
201
+ "source": [
202
+ "plt.imshow(xenc)"
203
+ ]
204
+ },
205
+ {
206
+ "cell_type": "code",
207
+ "execution_count": null,
208
+ "metadata": {},
209
+ "outputs": [
210
+ {
211
+ "data": {
212
+ "text/plain": [
213
+ "tensor([[ 0.5838, -0.8614, 0.1874, -0.5662, 0.2449, 1.4738, 1.8403, 0.3233,\n",
214
+ " 1.0014, 0.0263, -0.5269, -0.8413, 0.0329, -0.0670, -0.7272, -0.2977,\n",
215
+ " -0.5083, 0.1050, -0.5482, 1.0237, 1.2359, 1.6366, -1.6188, 0.3283,\n",
216
+ " 0.7180, -0.9729, -1.5425],\n",
217
+ " [ 1.4868, -0.0457, 0.2224, 1.5423, -0.0151, -0.2254, 0.7613, -0.4738,\n",
218
+ " -0.2175, -0.9024, 0.0148, 0.6673, -0.1291, -1.4357, 0.2100, -0.5559,\n",
219
+ " -0.0711, -0.1631, 0.1704, 0.5689, -1.2534, -0.0207, 0.2485, 0.9525,\n",
220
+ " 0.1465, 0.1339, 0.1875],\n",
221
+ " [-0.3253, 0.6007, 1.3449, 0.0990, -0.6273, 0.4972, -0.2262, 0.4910,\n",
222
+ " -1.6546, 0.5298, -0.3165, -0.7659, 0.9075, -0.4458, 0.9129, -2.7461,\n",
223
+ " 0.0098, 0.9013, 0.7363, -0.7745, -0.8155, 1.5463, 0.0723, -0.5926,\n",
224
+ " -0.2548, 0.4572, -0.9398],\n",
225
+ " [-0.3253, 0.6007, 1.3449, 0.0990, -0.6273, 0.4972, -0.2262, 0.4910,\n",
226
+ " -1.6546, 0.5298, -0.3165, -0.7659, 0.9075, -0.4458, 0.9129, -2.7461,\n",
227
+ " 0.0098, 0.9013, 0.7363, -0.7745, -0.8155, 1.5463, 0.0723, -0.5926,\n",
228
+ " -0.2548, 0.4572, -0.9398],\n",
229
+ " [-0.6620, 0.3081, 0.4002, 1.4361, -0.9089, -0.3304, 0.1364, -1.0887,\n",
230
+ " 0.6219, 0.6222, -0.6723, 0.9616, -0.4970, 0.2513, -0.2499, 1.1944,\n",
231
+ " 0.7755, 1.2483, 0.8315, -0.1463, 0.2847, -0.4837, -0.7275, -2.0723,\n",
232
+ " -2.0994, -0.3072, -1.8622]])"
233
+ ]
234
+ },
235
+ "execution_count": 19,
236
+ "metadata": {},
237
+ "output_type": "execute_result"
238
+ }
239
+ ],
240
+ "source": [
241
+ "W = torch.randn((27, 27)) #Generating the weights\n",
242
+ "xenc @ W #Doing matrix multiplication"
243
+ ]
244
+ },
245
+ {
246
+ "cell_type": "code",
247
+ "execution_count": null,
248
+ "metadata": {},
249
+ "outputs": [
250
+ {
251
+ "data": {
252
+ "text/plain": [
253
+ "tensor(-0.4458)"
254
+ ]
255
+ },
256
+ "execution_count": 20,
257
+ "metadata": {},
258
+ "output_type": "execute_result"
259
+ }
260
+ ],
261
+ "source": [
262
+ "#Checking for one element\n",
263
+ "(xenc @ W)[3, 13]"
264
+ ]
265
+ },
266
+ {
267
+ "cell_type": "code",
268
+ "execution_count": null,
269
+ "metadata": {},
270
+ "outputs": [
271
+ {
272
+ "data": {
273
+ "text/plain": [
274
+ "tensor(-0.4458)"
275
+ ]
276
+ },
277
+ "execution_count": 21,
278
+ "metadata": {},
279
+ "output_type": "execute_result"
280
+ }
281
+ ],
282
+ "source": [
283
+ "#Doing manual multiplication for verifying\n",
284
+ "(xenc[3] * W[:,13]).sum()"
285
+ ]
286
+ },
287
+ {
288
+ "cell_type": "code",
289
+ "execution_count": null,
290
+ "metadata": {},
291
+ "outputs": [
292
+ {
293
+ "data": {
294
+ "text/plain": [
295
+ "tensor([[0.0415, 0.0098, 0.0279, 0.0132, 0.0296, 0.1012, 0.1459, 0.0320, 0.0631,\n",
296
+ " 0.0238, 0.0137, 0.0100, 0.0239, 0.0217, 0.0112, 0.0172, 0.0139, 0.0257,\n",
297
+ " 0.0134, 0.0645, 0.0797, 0.1190, 0.0046, 0.0322, 0.0475, 0.0088, 0.0050],\n",
298
+ " [0.1218, 0.0263, 0.0344, 0.1287, 0.0271, 0.0220, 0.0589, 0.0171, 0.0221,\n",
299
+ " 0.0112, 0.0279, 0.0537, 0.0242, 0.0066, 0.0340, 0.0158, 0.0256, 0.0234,\n",
300
+ " 0.0326, 0.0486, 0.0079, 0.0270, 0.0353, 0.0714, 0.0319, 0.0315, 0.0332],\n",
301
+ " [0.0199, 0.0501, 0.1055, 0.0303, 0.0147, 0.0452, 0.0219, 0.0449, 0.0053,\n",
302
+ " 0.0467, 0.0200, 0.0128, 0.0681, 0.0176, 0.0685, 0.0018, 0.0278, 0.0677,\n",
303
+ " 0.0574, 0.0127, 0.0122, 0.1290, 0.0295, 0.0152, 0.0213, 0.0434, 0.0107],\n",
304
+ " [0.0199, 0.0501, 0.1055, 0.0303, 0.0147, 0.0452, 0.0219, 0.0449, 0.0053,\n",
305
+ " 0.0467, 0.0200, 0.0128, 0.0681, 0.0176, 0.0685, 0.0018, 0.0278, 0.0677,\n",
306
+ " 0.0574, 0.0127, 0.0122, 0.1290, 0.0295, 0.0152, 0.0213, 0.0434, 0.0107],\n",
307
+ " [0.0146, 0.0385, 0.0422, 0.1188, 0.0114, 0.0203, 0.0324, 0.0095, 0.0526,\n",
308
+ " 0.0526, 0.0144, 0.0739, 0.0172, 0.0363, 0.0220, 0.0933, 0.0614, 0.0985,\n",
309
+ " 0.0649, 0.0244, 0.0376, 0.0174, 0.0137, 0.0036, 0.0035, 0.0208, 0.0044]])"
310
+ ]
311
+ },
312
+ "execution_count": 22,
313
+ "metadata": {},
314
+ "output_type": "execute_result"
315
+ }
316
+ ],
317
+ "source": [
318
+ "logits = xenc @ W #log-counts\n",
319
+ "counts = logits.exp() #equivalent to N, as done in A-Main-Notebook\n",
320
+ "probs = counts / counts.sum(1, keepdims=True) #Normalising each row to sum to 1 (as done in A-Main) to get the next-character probabilities\n",
321
+ "probs"
322
+ ]
323
+ },
324
+ {
325
+ "cell_type": "markdown",
326
+ "metadata": {},
327
+ "source": [
328
+ "-------------"
329
+ ]
330
+ },
331
+ {
332
+ "cell_type": "markdown",
333
+ "metadata": {},
334
+ "source": [
335
+ "-----------"
336
+ ]
337
+ },
338
+ {
339
+ "cell_type": "code",
340
+ "execution_count": null,
341
+ "metadata": {},
342
+ "outputs": [],
343
+ "source": [
344
+ "# SUMMARY ------------------------------>>>>\n",
345
+ "#Run the first 4 cells of this notebook, plus the cell that imports torch.nn.functional as F, and then continue"
346
+ ]
347
+ },
348
+ {
349
+ "cell_type": "code",
350
+ "execution_count": 27,
351
+ "metadata": {},
352
+ "outputs": [
353
+ {
354
+ "data": {
355
+ "text/plain": [
356
+ "tensor([ 0, 5, 13, 13, 1])"
357
+ ]
358
+ },
359
+ "execution_count": 27,
360
+ "metadata": {},
361
+ "output_type": "execute_result"
362
+ }
363
+ ],
364
+ "source": [
365
+ "xs"
366
+ ]
367
+ },
368
+ {
369
+ "cell_type": "code",
370
+ "execution_count": 28,
371
+ "metadata": {},
372
+ "outputs": [
373
+ {
374
+ "data": {
375
+ "text/plain": [
376
+ "tensor([ 5, 13, 13, 1, 0])"
377
+ ]
378
+ },
379
+ "execution_count": 28,
380
+ "metadata": {},
381
+ "output_type": "execute_result"
382
+ }
383
+ ],
384
+ "source": [
385
+ "ys"
386
+ ]
387
+ },
388
+ {
389
+ "cell_type": "code",
390
+ "execution_count": 29,
391
+ "metadata": {},
392
+ "outputs": [],
393
+ "source": [
394
+ "# randomly initialize 27 neurons' weights. each neuron receives 27 inputs\n",
395
+ "g = torch.Generator().manual_seed(2147483647)\n",
396
+ "W = torch.randn((27, 27), generator=g)"
397
+ ]
398
+ },
399
+ {
400
+ "cell_type": "code",
401
+ "execution_count": 30,
402
+ "metadata": {},
403
+ "outputs": [],
404
+ "source": [
405
+ "\n",
406
+ "xenc = F.one_hot(xs, num_classes=27).float() # input to the network: one-hot encoding\n",
407
+ "logits = xenc @ W # predict log-counts\n",
408
+ "counts = logits.exp() # counts, equivalent to N\n",
409
+ "probs = counts / counts.sum(1, keepdims=True) # probabilities for next character\n",
410
+ "# btw: the last 2 lines here are together called a 'softmax'"
411
+ ]
412
+ },
413
+ {
414
+ "cell_type": "code",
415
+ "execution_count": 31,
416
+ "metadata": {},
417
+ "outputs": [
418
+ {
419
+ "data": {
420
+ "text/plain": [
421
+ "torch.Size([5, 27])"
422
+ ]
423
+ },
424
+ "execution_count": 31,
425
+ "metadata": {},
426
+ "output_type": "execute_result"
427
+ }
428
+ ],
429
+ "source": [
430
+ "probs.shape"
431
+ ]
432
+ },
433
+ {
434
+ "cell_type": "code",
435
+ "execution_count": 32,
436
+ "metadata": {},
437
+ "outputs": [
438
+ {
439
+ "name": "stdout",
440
+ "output_type": "stream",
441
+ "text": [
442
+ "--------\n",
443
+ "bigram example 1: .e (indexes 0,5)\n",
444
+ "input to the neural net: 0\n",
445
+ "output probabilities from the neural net: tensor([0.0607, 0.0100, 0.0123, 0.0042, 0.0168, 0.0123, 0.0027, 0.0232, 0.0137,\n",
446
+ " 0.0313, 0.0079, 0.0278, 0.0091, 0.0082, 0.0500, 0.2378, 0.0603, 0.0025,\n",
447
+ " 0.0249, 0.0055, 0.0339, 0.0109, 0.0029, 0.0198, 0.0118, 0.1537, 0.1459])\n",
448
+ "label (actual next character): 5\n",
449
+ "probability assigned by the net to the correct character: 0.01228625513613224\n",
450
+ "log likelihood: -4.399273872375488\n",
451
+ "negative log likelihood: 4.399273872375488\n",
452
+ "--------\n",
453
+ "bigram example 2: em (indexes 5,13)\n",
454
+ "input to the neural net: 5\n",
455
+ "output probabilities from the neural net: tensor([0.0290, 0.0796, 0.0248, 0.0521, 0.1989, 0.0289, 0.0094, 0.0335, 0.0097,\n",
456
+ " 0.0301, 0.0702, 0.0228, 0.0115, 0.0181, 0.0108, 0.0315, 0.0291, 0.0045,\n",
457
+ " 0.0916, 0.0215, 0.0486, 0.0300, 0.0501, 0.0027, 0.0118, 0.0022, 0.0472])\n",
458
+ "label (actual next character): 13\n",
459
+ "probability assigned by the net to the correct character: 0.018050700426101685\n",
460
+ "log likelihood: -4.014570713043213\n",
461
+ "negative log likelihood: 4.014570713043213\n",
462
+ "--------\n",
463
+ "bigram example 3: mm (indexes 13,13)\n",
464
+ "input to the neural net: 13\n",
465
+ "output probabilities from the neural net: tensor([0.0312, 0.0737, 0.0484, 0.0333, 0.0674, 0.0200, 0.0263, 0.0249, 0.1226,\n",
466
+ " 0.0164, 0.0075, 0.0789, 0.0131, 0.0267, 0.0147, 0.0112, 0.0585, 0.0121,\n",
467
+ " 0.0650, 0.0058, 0.0208, 0.0078, 0.0133, 0.0203, 0.1204, 0.0469, 0.0126])\n",
468
+ "label (actual next character): 13\n",
469
+ "probability assigned by the net to the correct character: 0.026691533625125885\n",
470
+ "log likelihood: -3.623408794403076\n",
471
+ "negative log likelihood: 3.623408794403076\n",
472
+ "--------\n",
473
+ "bigram example 4: ma (indexes 13,1)\n",
474
+ "input to the neural net: 13\n",
475
+ "output probabilities from the neural net: tensor([0.0312, 0.0737, 0.0484, 0.0333, 0.0674, 0.0200, 0.0263, 0.0249, 0.1226,\n",
476
+ " 0.0164, 0.0075, 0.0789, 0.0131, 0.0267, 0.0147, 0.0112, 0.0585, 0.0121,\n",
477
+ " 0.0650, 0.0058, 0.0208, 0.0078, 0.0133, 0.0203, 0.1204, 0.0469, 0.0126])\n",
478
+ "label (actual next character): 1\n",
479
+ "probability assigned by the net to the correct character: 0.07367686182260513\n",
480
+ "log likelihood: -2.6080665588378906\n",
481
+ "negative log likelihood: 2.6080665588378906\n",
482
+ "--------\n",
483
+ "bigram example 5: a. (indexes 1,0)\n",
484
+ "input to the neural net: 1\n",
485
+ "output probabilities from the neural net: tensor([0.0150, 0.0086, 0.0396, 0.0100, 0.0606, 0.0308, 0.1084, 0.0131, 0.0125,\n",
486
+ " 0.0048, 0.1024, 0.0086, 0.0988, 0.0112, 0.0232, 0.0207, 0.0408, 0.0078,\n",
487
+ " 0.0899, 0.0531, 0.0463, 0.0309, 0.0051, 0.0329, 0.0654, 0.0503, 0.0091])\n",
488
+ "label (actual next character): 0\n",
489
+ "probability assigned by the net to the correct character: 0.014977526850998402\n",
490
+ "log likelihood: -4.201204299926758\n",
491
+ "negative log likelihood: 4.201204299926758\n",
492
+ "=========\n",
493
+ "average negative log likelihood, i.e. loss = 3.7693049907684326\n"
494
+ ]
495
+ }
496
+ ],
497
+ "source": [
498
+ "nlls = torch.zeros(5)\n",
499
+ "for i in range(5):\n",
500
+ "    # i-th bigram:\n",
501
+ "    x = xs[i].item() # input character index\n",
502
+ "    y = ys[i].item() # label character index\n",
503
+ "    print('--------')\n",
504
+ "    print(f'bigram example {i+1}: {itos[x]}{itos[y]} (indexes {x},{y})')\n",
505
+ "    print('input to the neural net:', x)\n",
506
+ "    print('output probabilities from the neural net:', probs[i])\n",
507
+ "    print('label (actual next character):', y)\n",
508
+ "    p = probs[i, y]\n",
509
+ "    print('probability assigned by the net to the correct character:', p.item())\n",
510
+ "    logp = torch.log(p)\n",
511
+ "    print('log likelihood:', logp.item())\n",
512
+ "    nll = -logp\n",
513
+ "    print('negative log likelihood:', nll.item())\n",
514
+ "    nlls[i] = nll\n",
515
+ "\n",
516
+ "print('=========')\n",
517
+ "print('average negative log likelihood, i.e. loss =', nlls.mean().item())"
518
+ ]
519
+ },
520
+ {
521
+ "cell_type": "markdown",
522
+ "metadata": {},
523
+ "source": [
524
+ "--------------------"
525
+ ]
526
+ }
527
+ ],
528
+ "metadata": {
529
+ "kernelspec": {
530
+ "display_name": "venv",
531
+ "language": "python",
532
+ "name": "python3"
533
+ },
534
+ "language_info": {
535
+ "codemirror_mode": {
536
+ "name": "ipython",
537
+ "version": 3
538
+ },
539
+ "file_extension": ".py",
540
+ "mimetype": "text/x-python",
541
+ "name": "python",
542
+ "nbconvert_exporter": "python",
543
+ "pygments_lexer": "ipython3",
544
+ "version": "3.10.0"
545
+ }
546
+ },
547
+ "nbformat": 4,
548
+ "nbformat_minor": 2
549
+ }
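Note on B-Main-Notebook.ipynb: the cells above multiply a one-hot input by `W`, exponentiate, normalise, and average the negative log likelihood by hand. The sketch below is not part of the commit. It reuses the notebook's `xs`, `ys`, and seed, and checks each manual step against a PyTorch equivalent (`W[xs]`, `F.softmax`, `F.cross_entropy`). The variable names `rows_match`, `softmax_match`, and `loss_match` are introduced here for illustration only.

```python
import torch
import torch.nn.functional as F

# Same toy data as the notebook: the bigrams of ".emma." as index pairs.
xs = torch.tensor([0, 5, 13, 13, 1])
ys = torch.tensor([5, 13, 13, 1, 0])

# Same seed and shape as the notebook's weight matrix.
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g)

# Forward pass exactly as written in the notebook.
xenc = F.one_hot(xs, num_classes=27).float()
logits = xenc @ W

# 1) Multiplying a one-hot row by W simply selects the matching row of W.
rows_match = torch.allclose(logits, W[xs])

# 2) Exponentiating and normalising each row is a softmax over that row.
counts = logits.exp()
probs = counts / counts.sum(1, keepdim=True)
softmax_match = torch.allclose(probs, F.softmax(logits, dim=1))

# 3) The mean negative log likelihood of the correct labels is what
#    F.cross_entropy computes directly from the logits.
loss_manual = -probs[torch.arange(len(xs)), ys].log().mean()
loss_builtin = F.cross_entropy(logits, ys)
loss_match = torch.allclose(loss_manual, loss_builtin, atol=1e-5)

print(rows_match, softmax_match, loss_match)
```

If all three checks print `True`, the manual pipeline could be swapped for `F.cross_entropy(W[xs], ys)`, which avoids building the one-hot matrix and is numerically safer for large logits. The notebook's explicit version is still the clearer one for learning.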
C-Main-Notebook.ipynb ADDED
@@ -0,0 +1,530 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "words = open('names.txt', 'r').read().splitlines()"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": 2,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "import torch\n",
19
+ "\n",
20
+ "N = torch.zeros((27, 27), dtype = torch.int32)\n",
21
+ "\n",
22
+ "chars = sorted(list(set(''.join(words))))\n",
23
+ "\n",
24
+ "stoi = {s:i+1 for i,s in enumerate(chars)}\n",
25
+ "stoi['.'] = 0\n",
26
+ "\n",
27
+ "itos = {i:s for s,i in stoi.items()}"
28
+ ]
29
+ },
30
+ {
31
+ "cell_type": "code",
32
+ "execution_count": 3,
33
+ "metadata": {},
34
+ "outputs": [],
35
+ "source": [
36
+ "P = N.float()\n",
37
+ "P /= P.sum(1, keepdim=True)"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "execution_count": 4,
43
+ "metadata": {},
44
+ "outputs": [
45
+ {
46
+ "name": "stdout",
47
+ "output_type": "stream",
48
+ "text": [
49
+ ". e\n",
50
+ "e m\n",
51
+ "m m\n",
52
+ "m a\n",
53
+ "a .\n"
54
+ ]
55
+ }
56
+ ],
57
+ "source": [
58
+ "#Creating the training set of bigrams (x,y)\n",
59
+ "xs, ys = [], []\n",
60
+ "\n",
61
+ "for word in words[:1]:\n",
62
+ "    chs = ['.'] + list(word) + ['.']\n",
63
+ "    for ch1, ch2 in zip(chs, chs[1:]):\n",
64
+ "        ix1 = stoi[ch1]\n",
65
+ "        ix2 = stoi[ch2]\n",
66
+ "        print(ch1, ch2)\n",
67
+ "        xs.append(ix1)\n",
68
+ "        ys.append(ix2)\n",
69
+ "\n",
70
+ "xs = torch.tensor(xs)\n",
71
+ "ys = torch.tensor(ys)"
72
+ ]
73
+ },
74
+ {
75
+ "cell_type": "code",
76
+ "execution_count": 5,
77
+ "metadata": {},
78
+ "outputs": [],
79
+ "source": [
80
+ "#Feeding these examples into a neural network\n",
81
+ "import torch.nn.functional as F"
82
+ ]
83
+ },
84
+ {
85
+ "cell_type": "code",
86
+ "execution_count": null,
87
+ "metadata": {},
88
+ "outputs": [],
89
+ "source": [
90
+ "#<=========OPTIMIZATION============>"
91
+ ]
92
+ },
93
+ {
94
+ "cell_type": "code",
95
+ "execution_count": 6,
96
+ "metadata": {},
97
+ "outputs": [
98
+ {
99
+ "data": {
100
+ "text/plain": [
101
+ "tensor([ 0, 5, 13, 13, 1])"
102
+ ]
103
+ },
104
+ "execution_count": 6,
105
+ "metadata": {},
106
+ "output_type": "execute_result"
107
+ }
108
+ ],
109
+ "source": [
110
+ "xs"
111
+ ]
112
+ },
113
+ {
114
+ "cell_type": "code",
115
+ "execution_count": 7,
116
+ "metadata": {},
117
+ "outputs": [
118
+ {
119
+ "data": {
120
+ "text/plain": [
121
+ "tensor([ 5, 13, 13, 1, 0])"
122
+ ]
123
+ },
124
+ "execution_count": 7,
125
+ "metadata": {},
126
+ "output_type": "execute_result"
127
+ }
128
+ ],
129
+ "source": [
130
+ "ys"
131
+ ]
132
+ },
133
+ {
134
+ "cell_type": "code",
135
+ "execution_count": 12,
136
+ "metadata": {},
137
+ "outputs": [],
138
+ "source": [
139
+ "# randomly initialize 27 neurons' weights. each neuron receives 27 inputs\n",
140
+ "g = torch.Generator().manual_seed(2147483647)\n",
141
+ "W = torch.randn((27, 27), generator=g, requires_grad=True) #requires_grad=True makes PyTorch track operations on W so that loss.backward() can fill in W.grad (the same idea as in micrograd)"
142
+ ]
143
+ },
144
+ {
145
+ "cell_type": "code",
146
+ "execution_count": 13,
147
+ "metadata": {},
148
+ "outputs": [],
149
+ "source": [
150
+ "#FORWARD PASS\n",
151
+ "xenc = F.one_hot(xs, num_classes=27).float() # input to the network: one-hot encoding\n",
152
+ "logits = xenc @ W # predict log-counts\n",
153
+ "counts = logits.exp() # counts, equivalent to N\n",
154
+ "probs = counts / counts.sum(1, keepdims=True) # probabilities for next character\n",
155
+ "loss = -probs[torch.arange(5), ys].log().mean() #torch.arange(5) gives the row indices 0 to 4 and ys gives the label index for each row, so this picks the probability of each correct next character | Then we take their log values | Then we take their mean | Finally we negate it (since it is the NLL)"
156
+ ]
157
+ },
158
+ {
159
+ "cell_type": "code",
160
+ "execution_count": null,
161
+ "metadata": {},
162
+ "outputs": [
163
+ {
164
+ "data": {
165
+ "text/plain": [
166
+ "tensor(3.7693)"
167
+ ]
168
+ },
169
+ "execution_count": 10,
170
+ "metadata": {},
171
+ "output_type": "execute_result"
172
+ }
173
+ ],
174
+ "source": [
175
+ "loss #This matches the average negative log likelihood (3.7693) calculated in the SUMMARY part of B-Main"
176
+ ]
177
+ },
178
+ {
179
+ "cell_type": "code",
180
+ "execution_count": 14,
181
+ "metadata": {},
182
+ "outputs": [],
183
+ "source": [
184
+ "#BACKWARD PASS\n",
185
+ "W.grad = None #reset the gradient before the backward pass (None has the same effect as zeroing it)\n",
186
+ "loss.backward()"
187
+ ]
188
+ },
189
+ {
190
+ "cell_type": "code",
191
+ "execution_count": 15,
192
+ "metadata": {},
193
+ "outputs": [
194
+ {
195
+ "data": {
196
+ "text/plain": [
197
+ "torch.Size([27, 27])"
198
+ ]
199
+ },
200
+ "execution_count": 15,
201
+ "metadata": {},
202
+ "output_type": "execute_result"
203
+ }
204
+ ],
205
+ "source": [
206
+ "W.grad.shape"
207
+ ]
208
+ },
209
+ {
210
+ "cell_type": "code",
211
+ "execution_count": null,
212
+ "metadata": {},
213
+ "outputs": [],
214
+ "source": [
215
+ "W.grad"
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": null,
221
+ "metadata": {},
222
+ "outputs": [],
223
+ "source": [
224
+ "#UPDATE\n",
225
+ "W.data += -0.1 * W.grad"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "markdown",
230
+ "metadata": {},
231
+ "source": [
232
+ "--------------"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": null,
238
+ "metadata": {},
239
+ "outputs": [],
240
+ "source": [
241
+ "#JUST PUTTING THEM TOGETHER TO PERFORM GRADIENT DESCENT"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "code",
246
+ "execution_count": null,
247
+ "metadata": {},
248
+ "outputs": [],
249
+ "source": [
250
+ "#ONLY RUN THIS THE FIRST TIME\n",
251
+ "# randomly initialize 27 neurons' weights. each neuron receives 27 inputs\n",
252
+ "g = torch.Generator().manual_seed(2147483647)\n",
253
+ "W = torch.randn((27, 27), generator=g, requires_grad=True) #requires_grad=True makes PyTorch track operations on W so that loss.backward() can fill in W.grad (the same idea as in micrograd)"
254
+ ]
255
+ },
256
+ {
257
+ "cell_type": "code",
258
+ "execution_count": 34,
259
+ "metadata": {},
260
+ "outputs": [],
261
+ "source": [
262
+ "#FORWARD PASS\n",
263
+ "xenc = F.one_hot(xs, num_classes=27).float() # input to the network: one-hot encoding\n",
264
+ "logits = xenc @ W # predict log-counts\n",
265
+ "counts = logits.exp() # counts, equivalent to N\n",
266
+ "probs = counts / counts.sum(1, keepdims=True) # probabilities for next character\n",
267
+ "loss = -probs[torch.arange(5), ys].log().mean() #torch.arange(5) gives the row indices 0 to 4 and ys gives the label index for each row, so this picks the probability of each correct next character | Then we take their log values | Then we take their mean | Finally we negate it (since it is the NLL)"
268
+ ]
269
+ },
270
+ {
271
+ "cell_type": "code",
272
+ "execution_count": 35,
273
+ "metadata": {},
274
+ "outputs": [
275
+ {
276
+ "name": "stdout",
277
+ "output_type": "stream",
278
+ "text": [
279
+ "3.6891887187957764\n"
280
+ ]
281
+ }
282
+ ],
283
+ "source": [
284
+ "print(loss.item()) #CHECKING THE LOSS VALUE"
285
+ ]
286
+ },
287
+ {
288
+ "cell_type": "code",
289
+ "execution_count": 32,
290
+ "metadata": {},
291
+ "outputs": [],
292
+ "source": [
293
+ "#BACKWARD PASS\n",
294
+ "W.grad = None #reset the gradient before the backward pass (None has the same effect as zeroing it)\n",
295
+ "loss.backward()"
296
+ ]
297
+ },
298
+ {
299
+ "cell_type": "code",
300
+ "execution_count": 33,
301
+ "metadata": {},
302
+ "outputs": [],
303
+ "source": [
304
+ "#UPDATE\n",
305
+ "W.data += -0.1 * W.grad"
306
+ ]
307
+ },
308
+ {
309
+ "cell_type": "markdown",
310
+ "metadata": {},
311
+ "source": [
312
+ "Yay, that worked. Noice"
313
+ ]
314
+ },
315
+ {
316
+ "cell_type": "markdown",
317
+ "metadata": {},
318
+ "source": [
319
+ "----------------"
320
+ ]
321
+ },
322
+ {
323
+ "cell_type": "markdown",
324
+ "metadata": {},
325
+ "source": [
326
+ "---------------"
327
+ ]
328
+ },
329
+ {
330
+ "cell_type": "markdown",
331
+ "metadata": {},
332
+ "source": [
333
+ "### **PUTTING THEM ALL TOGETHER**"
334
+ ]
335
+ },
336
+ {
337
+ "cell_type": "code",
338
+ "execution_count": 36,
339
+ "metadata": {},
340
+ "outputs": [
341
+ {
342
+ "name": "stdout",
343
+ "output_type": "stream",
344
+ "text": [
345
+ "number of examples: 228146\n"
346
+ ]
347
+ }
348
+ ],
349
+ "source": [
350
+ "# create the dataset\n",
351
+ "xs, ys = [], []\n",
352
+ "for w in words:\n",
353
+ " chs = ['.'] + list(w) + ['.']\n",
354
+ " for ch1, ch2 in zip(chs, chs[1:]):\n",
355
+ " ix1 = stoi[ch1]\n",
356
+ " ix2 = stoi[ch2]\n",
357
+ " xs.append(ix1)\n",
358
+ " ys.append(ix2)\n",
359
+ "xs = torch.tensor(xs)\n",
360
+ "ys = torch.tensor(ys)\n",
361
+ "num = xs.nelement()\n",
362
+ "print('number of examples: ', num)\n",
363
+ "\n",
364
+ "# initialize the 'network'\n",
365
+ "g = torch.Generator().manual_seed(2147483647)\n",
366
+ "W = torch.randn((27, 27), generator=g, requires_grad=True)"
367
+ ]
368
+ },
369
+ {
370
+ "cell_type": "code",
371
+ "execution_count": 37,
372
+ "metadata": {},
373
+ "outputs": [
374
+ {
375
+ "name": "stdout",
376
+ "output_type": "stream",
377
+ "text": [
378
+ "3.7686190605163574\n",
379
+ "3.378804922103882\n",
380
+ "3.1610896587371826\n",
381
+ "3.0271859169006348\n",
382
+ "2.9344847202301025\n",
383
+ "2.867231607437134\n",
384
+ "2.816654920578003\n",
385
+ "2.777147054672241\n",
386
+ "2.7452545166015625\n",
387
+ "2.7188305854797363\n",
388
+ "2.6965057849884033\n",
389
+ "2.6773722171783447\n",
390
+ "2.6608052253723145\n",
391
+ "2.6463513374328613\n",
392
+ "2.633665084838867\n",
393
+ "2.622471332550049\n",
394
+ "2.6125471591949463\n",
395
+ "2.6037065982818604\n",
396
+ "2.595794439315796\n",
397
+ "2.5886802673339844\n"
398
+ ]
399
+ }
400
+ ],
401
+ "source": [
402
+ "# gradient descent\n",
403
+ "for k in range(20):\n",
404
+ " \n",
405
+ " # forward pass\n",
406
+ " xenc = F.one_hot(xs, num_classes=27).float() # input to the network: one-hot encoding\n",
407
+ " logits = xenc @ W # predict log-counts\n",
408
+ " counts = logits.exp() # counts, equivalent to N\n",
409
+ " probs = counts / counts.sum(1, keepdims=True) # probabilities for next character\n",
410
+ " loss = -probs[torch.arange(num), ys].log().mean() + 0.01*(W**2).mean()\n",
411
+ " print(loss.item())\n",
412
+ " \n",
413
+ " # backward pass\n",
414
+ " W.grad = None # set to zero the gradient\n",
415
+ " loss.backward()\n",
416
+ " \n",
417
+ " # update\n",
418
+ " W.data += -50 * W.grad"
419
+ ]
420
+ },
421
+ {
422
+ "cell_type": "markdown",
423
+ "metadata": {},
424
+ "source": [
425
+ "After 20 steps the loss drops from about 3.77 to about 2.59, approaching the loss value we calculated in A-Main when we typed our own name and saw how the model performs."
426
+ ]
427
+ },
428
+ {
429
+ "cell_type": "markdown",
430
+ "metadata": {},
431
+ "source": [
432
+ "--------"
433
+ ]
434
+ },
435
+ {
436
+ "cell_type": "markdown",
437
+ "metadata": {},
438
+ "source": [
439
+ "--------------"
440
+ ]
441
+ },
442
+ {
443
+ "cell_type": "markdown",
444
+ "metadata": {},
445
+ "source": [
446
+ "Finally *drumrolls*, we are going to see what sampling from this model produces (Spoiler alert: the outputs will match the model we built manually, because it is the same model, just trained as a neural net)"
447
+ ]
448
+ },
449
+ {
450
+ "cell_type": "code",
451
+ "execution_count": 38,
452
+ "metadata": {},
453
+ "outputs": [
454
+ {
455
+ "name": "stdout",
456
+ "output_type": "stream",
457
+ "text": [
458
+ "juwjde.\n",
459
+ "janaqah.\n",
460
+ "pxzfby.\n",
461
+ "a.\n",
462
+ "nn.\n"
463
+ ]
464
+ }
465
+ ],
466
+ "source": [
467
+ "# finally, sample from the 'neural net' model\n",
468
+ "g = torch.Generator().manual_seed(2147483647)\n",
469
+ "\n",
470
+ "for i in range(5):\n",
471
+ " \n",
472
+ " out = []\n",
473
+ " ix = 0\n",
474
+ " while True:\n",
475
+ " \n",
476
+ " # ----------\n",
477
+ " # BEFORE:\n",
478
+ " #p = P[ix]\n",
479
+ " # ----------\n",
480
+ " # NOW:\n",
481
+ " xenc = F.one_hot(torch.tensor([ix]), num_classes=27).float()\n",
482
+ " logits = xenc @ W # predict log-counts\n",
483
+ " counts = logits.exp() # counts, equivalent to N\n",
484
+ " p = counts / counts.sum(1, keepdims=True) # probabilities for next character\n",
485
+ " # ----------\n",
486
+ " \n",
487
+ " ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()\n",
488
+ " out.append(itos[ix])\n",
489
+ " if ix == 0:\n",
490
+ " break\n",
491
+ " print(''.join(out))"
492
+ ]
493
+ },
494
+ {
495
+ "cell_type": "markdown",
496
+ "metadata": {},
497
+ "source": [
498
+ "--------"
499
+ ]
500
+ },
501
+ {
502
+ "cell_type": "markdown",
503
+ "metadata": {},
504
+ "source": [
505
+ "---------"
506
+ ]
507
+ }
508
+ ],
509
+ "metadata": {
510
+ "kernelspec": {
511
+ "display_name": "venv",
512
+ "language": "python",
513
+ "name": "python3"
514
+ },
515
+ "language_info": {
516
+ "codemirror_mode": {
517
+ "name": "ipython",
518
+ "version": 3
519
+ },
520
+ "file_extension": ".py",
521
+ "mimetype": "text/x-python",
522
+ "name": "python",
523
+ "nbconvert_exporter": "python",
524
+ "pygments_lexer": "ipython3",
525
+ "version": "3.10.0"
526
+ }
527
+ },
528
+ "nbformat": 4,
529
+ "nbformat_minor": 2
530
+ }
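The sampling loop in the notebook's last code cell can be sketched without PyTorch. This is only an illustrative sketch: the three-symbol vocabulary and the probability table `P` below are made-up values (not taken from `names.txt`), and `random.choices` stands in for `torch.multinomial`.

```python
import random

# Hypothetical toy vocabulary: index 0 is the start/end marker '.'
itos = {0: ".", 1: "a", 2: "b"}

# Hypothetical next-character probabilities; row i is the distribution
# over the next character given that the current character is i.
P = [
    [0.0, 0.7, 0.3],
    [0.4, 0.1, 0.5],
    [0.6, 0.3, 0.1],
]

def sample_name(P, itos, rng):
    """Start at '.', draw next characters until '.' is drawn again."""
    out = []
    ix = 0
    while True:
        ix = rng.choices(range(len(P[ix])), weights=P[ix], k=1)[0]
        out.append(itos[ix])
        if ix == 0:
            break
    return "".join(out)

rng = random.Random(2147483647)  # fixed seed so runs are repeatable
names = [sample_name(P, itos, rng) for _ in range(5)]
print(names)
```

In the notebook the row `P[ix]` is not looked up from a table; it is recomputed from `W` by one-hot encoding `ix`, multiplying by `W`, exponentiating, and normalizing.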
README.md ADDED
@@ -0,0 +1,67 @@
1
+ ## SET 1 - MAKEMORE (PART 1) 🔗
2
+
3
+ [![Documentation](https://img.shields.io/badge/Documentation-Available-blue)](https://muzzammilshah.github.io/Road-to-GPT/Makemore-part1/)
4
+ ![Number of Commits](https://img.shields.io/github/commit-activity/m/MuzzammilShah/NeuralNetworks-LanguageModels-1?label=Commits)
5
+ [![Last Commit](https://img.shields.io/github/last-commit/MuzzammilShah/NeuralNetworks-LanguageModels-1.svg?style=flat)](https://github.com/MuzzammilShah/NeuralNetworks-LanguageModels-1/commits/main)
6
+ ![Project Status](https://img.shields.io/badge/Status-Done-success)
7
+
8
+ &nbsp;
9
+
10
+ ### **Overview**
11
+ This repository introduces the bigram character-level language model and explores its **training**, **sampling**, and **evaluation**. Model quality is evaluated using the **Negative Log Likelihood (NLL)** loss.
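
As a rough illustration of the NLL loss used for evaluation, the sketch below averages the negative log of the probability the model assigned to each correct next character. The helper name `average_nll` and the probability lists are assumptions made for this example; they are not taken from the notebooks.

```python
import math

def average_nll(probs):
    """Mean negative log likelihood of the probabilities
    assigned to the correct next characters."""
    return -sum(math.log(p) for p in probs) / len(probs)

# A model that spreads probability evenly over 27 characters
# scores log(27) on every example.
uniform = [1 / 27] * 5
print(average_nll(uniform))

# Hypothetical probabilities from a more confident model: lower loss.
confident = [0.9, 0.8, 0.95]
print(average_nll(confident))
```

A lower value means the model put more probability on the characters that actually came next; `log(27)` is the baseline for a model that guesses uniformly over the 27 symbols.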
12
+
13
+ The model was trained in two distinct ways, both yielding identical results:
14
+
15
+ 1. **Frequency-Based Approach**: Directly counting and normalizing bigram frequencies.
16
+ 2. **Gradient-Based Optimization**: Learning a weight matrix of log-counts by gradient descent, minimizing the NLL loss.
17
+
18
+ This demonstrates that **both methods converge to the same result**: gradient descent on the NLL loss recovers the same probabilities as the normalized bigram counts.
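
The equivalence of the two training methods can be checked on a toy problem. The sketch below uses only the standard library and assumes a made-up three-word corpus, a zero-initialized weight matrix, a learning rate of 1.0, and no regularization; none of these values come from the notebooks, which use PyTorch and `names.txt`.

```python
import math

words = ["ab", "abb", "ba"]  # hypothetical toy corpus
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # start/end marker
V = len(stoi)

# Build (current char, next char) index pairs, as in the notebooks.
pairs = []
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        pairs.append((stoi[c1], stoi[c2]))
n = len(pairs)

# Method 1: count bigrams and normalize each row.
N = [[0] * V for _ in range(V)]
for x, y in pairs:
    N[x][y] += 1
P = [[c / sum(row) for c in row] for row in N]
nll_counts = -sum(math.log(P[x][y]) for x, y in pairs) / n

# Method 2: gradient descent on a matrix of logits (log-counts).
def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

W = [[0.0] * V for _ in range(V)]
for step in range(2000):
    grad = [[0.0] * V for _ in range(V)]
    for x, y in pairs:
        p = softmax(W[x])
        for j in range(V):
            # d(mean NLL)/d(logit) = (p_j - 1[j == y]) / n
            grad[x][j] += (p[j] - (1.0 if j == y else 0.0)) / n
    for i in range(V):
        for j in range(V):
            W[i][j] -= 1.0 * grad[i][j]
nll_gd = -sum(math.log(softmax(W[x])[y]) for x, y in pairs) / n

print(nll_counts, nll_gd)
```

The gradient-descent loss approaches the count-based loss from above, since normalized counts are the minimizer of the unregularized NLL. Bigrams that never occur keep a small leftover probability under gradient descent, so the two numbers get close without matching exactly.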
19
+
20
+ &nbsp;
21
+
22
+ ### **🗂️Repository Structure**
23
+
24
+ ```plaintext
25
+ ├── .gitignore
26
+ ├── A-Main-Notebook.ipynb
27
+ ├── B-Main-Notebook.ipynb
28
+ ├── C-Main-Notebook.ipynb
29
+ ├── README.md
30
+ ├── notes/
31
+ │ ├── A-main-makemore-part1.md
32
+ │ ├── B-main-makemore-part1.md
33
+ │ ├── C-main-makemore-part1.md
34
+ │ └── README.md
35
+ └── names.txt
36
+ ```
37
+
38
+ - **Notes Directory**: Contains detailed notes corresponding to each notebook section.
39
+ - **Jupyter Notebooks**: Step-by-step implementation and exploration of the bigram model.
40
+ - **README.md**: Overview and guide for this repository.
41
+ - **names.txt**: Dataset of names used to train the model.
42
+
43
+ &nbsp;
44
+
45
+ ### **📄Instructions**
46
+
47
+ To get the best understanding:
48
+
49
+ 1. Start by reading the notes in the `notes/` directory. Each section corresponds to a notebook for step-by-step explanations.
50
+ 2. Open the corresponding Jupyter Notebook (e.g., `A-Main-Notebook.ipynb` for `A-main-makemore-part1.md`).
51
+ 3. Follow the code and comments for a deeper dive into the implementation details.
52
+
53
+ &nbsp;
54
+
55
+ ### **⭐Documentation**
56
+
57
+ For a better reading experience and detailed notes, visit my **[Road to GPT Documentation Site](https://muzzammilshah.github.io/Road-to-GPT/)**.
58
+
59
+ > **💡Pro Tip**: This site provides an interactive and visually rich explanation of the notes and code. It is highly recommended you view this project from there.
60
+
61
+ &nbsp;
62
+
63
+
64
+ ### **✍🏻Acknowledgments**
65
+ Notes and implementations inspired by the **Makemore - Part 1** video by [Andrej Karpathy](https://karpathy.ai/).
66
+
67
+ For more of my projects, visit my [Portfolio Site](https://muhammedshah.com).
names.txt ADDED
The diff for this file is too large to render. See raw diff