{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 88
    },
    "id": "d4KCUoxSpdoZ",
    "outputId": "51c8a307-54e4-4056-a677-2baee15081b8"
   },
   "outputs": [],
   "source": [
    "BRANCH = 'main'\n",
    "\n",
    "\"\"\"\n",
    "You can run either this notebook locally (if you have all the dependencies and a GPU) or on Google Colab.\n",
    "\n",
    "Instructions for setting up Colab are as follows:\n",
    "1. Open a new Python 3 notebook.\n",
    "2. Import this notebook from GitHub (File -> Upload Notebook -> \"GITHUB\" tab -> copy/paste GitHub URL)\n",
    "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
    "4. Run this cell to set up dependencies.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "JDk9zxC6pdod",
    "outputId": "3ba8e4ef-65b6-4731-ad27-d34d15adc3d1"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "# either provide a path to local NeMo repository with NeMo already installed or git clone\n",
    "\n",
    "# option #1: local path to NeMo repo with NeMo already installed\n",
    "NEMO_DIR_PATH = \"NeMo\"\n",
    "\n",
    "# option #2: download NeMo repo\n",
    "if 'google.colab' in str(get_ipython()) or not os.path.exists(NEMO_DIR_PATH):\n",
    "  !git clone -b $BRANCH https://github.com/NVIDIA/NeMo\n",
    "  %cd NeMo\n",
    "  !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]\n",
    "  %cd -"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7ZO_421L7bOH"
   },
   "source": [
    "This tutorial contains external links. Each user is responsible for checking the content and the applicable licenses and determining if suitable for the intended use."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "A4NE9GhNn8f-"
   },
   "source": [
    "In this tutorial, we will use [NeMo Forced Aligner](https://github.com/NVIDIA/NeMo/tree/main/tools/nemo_forced_aligner) to generate token and word alignments for a video of Neil Armstrong's first steps on the moon. We will use the ASS-format subtitle files generated by NFA to add subtitles with token-by-token and word-by-word highlighting to the video.\n",
    "\n",
    "\n",
    "We will use the video at this [link](https://www.nasa.gov/wp-content/uploads/static/history/alsj/a11/a11.v1092338.mov), which is in the public domain and was obtained from the NASA website [here](https://history.nasa.gov/alsj/a11/video11.html#Step). The transcript for the video is obtained from the transcript of the mission [here](https://history.nasa.gov/alsj/a11/a11transcript_tec.pdf). As referenced on this [page](https://history.nasa.gov/alsj/a11/a11trans.html), this is a raw transcript with no copyright asserted.\n",
    "\n",
    "The alignment process is shown below. To better understand the 'Viterbi decoding' process, you can refer to our tutorial [here](https://nvidia.github.io/NeMo/blogs/2023/2023-08-forced-alignment/).\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "PGr_hMTCcm3J"
   },
   "source": [
    "![NFA forced alignment pipeline](https://github.com/NVIDIA/NeMo/releases/download/v1.20.0/nfa_forced_alignment_pipeline.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gQOQ-EK1aIBN"
   },
   "source": [
    "\n",
    "\n",
    "We will do the following:\n",
    "\n",
    "1. Download the source video.\n",
    "2. Prepare a manifest to input to NFA.\n",
    "3. Generate alignments using NFA.\n",
    "4. Generate a video with subtitles with token-by-token or word-by-word highlighting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CH7yR7cSwPKr"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "from IPython.display import HTML\n",
    "from base64 import b64encode"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "40XnfPNop_7V"
   },
   "source": [
    "# 1. Video download\n",
    "We will download the video from the provided link and upscale it to a higher resolution - the footage itself will not get any clearer, but the subtitles will look sharper when we burn them in at the end of the tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "sl34bUsUp_EV",
    "outputId": "a1170a38-7fa0-439f-8ca8-2b79d9d0406d"
   },
   "outputs": [],
   "source": [
    "# first make a directory WORK_DIR that we will save all our new files in\n",
    "WORK_DIR=\"WORK_DIR\"\n",
    "!mkdir -p $WORK_DIR\n",
    "!wget https://www.nasa.gov/wp-content/uploads/static/history/alsj/a11/a11.v1092338.mov --directory-prefix=$WORK_DIR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "GuFjl8-mrJub",
    "outputId": "a216ef90-3d6e-49e5-8c23-88f60b2fc11f"
   },
   "outputs": [],
   "source": [
    "# scale up the number of pixels in video so that the subtitles will look sharp when we burn them in later\n",
    "!/usr/bin/ffmpeg -loglevel warning -y -i $WORK_DIR/a11.v1092338.mov -vf \"scale=-1:480\" $WORK_DIR/one_small_step.mp4\n",
    "# also save the audio as a separate file, so that we can use this as input to NFA\n",
    "!/usr/bin/ffmpeg -loglevel warning -y -i $WORK_DIR/a11.v1092338.mov $WORK_DIR/one_small_step.wav"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "laCH__s6KYSF"
   },
   "source": [
    "A note on FFmpeg commands:\n",
    "\n",
    "*In this tutorial, instead of just calling the `ffmpeg` command, we will specifically call `/usr/bin/ffmpeg`. If we just call `ffmpeg`, there is a chance the conda version of `ffmpeg` will be called (depending on how you installed the dependencies for this tutorial). We have observed that the conda version of `ffmpeg` may generate incorrectly formatted MP4 files with the commands used, and may not be able to generate the subtitle videos which we will create at the end of this tutorial.*\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 322
    },
    "id": "5In-mlX7pPqq",
    "outputId": "47651480-9e64-48e8-d3bb-08bae51e6f50"
   },
   "outputs": [],
   "source": [
    "# display video so we know what we will be working with\n",
    "mp4 = open(f'{WORK_DIR}/one_small_step.mp4','rb').read()\n",
    "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
    "HTML(\"\"\"\n",
    "<video width=400 controls>\n",
    "      <source src=\"%s\" type=\"video/mp4\">\n",
    "</video>\n",
    "\"\"\" % data_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dNSOrcDyscvV"
   },
   "source": [
    "The audio and video quality isn't great - the footage was recorded on the Moon in 1969, after all. Using NFA, we can still obtain good token & word alignments despite the noisy audio."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VGsMdjOsdBn6"
   },
   "source": [
    "# 2. Prepare a manifest to input to NFA\n",
    "NFA requires a manifest as input, in the form shown in the diagram below. As we are only aligning a single audio file, our manifest will only contain one line.\n",
    "\n",
    "**Note**: the text field is optional, but if you omit it, you need to specify `align_using_pred_text=true` in the config you feed into NFA.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "i84J0AyiW6X7"
   },
   "source": [
    "![NFA usage pipeline](https://github.com/NVIDIA/NeMo/releases/download/v1.20.0/nfa_run.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lmxvl-tcsyz6"
   },
   "source": [
    "We prepare the text obtained from [here](https://history.nasa.gov/alsj/a11/a11transcript_tec.pdf). We insert `|` characters into the text and will specify `additional_segment_grouping_separator=\"|\"` in the NFA config. This makes NFA produce timestamps for the groups of text between the `|` characters, and it also means that, by default, the ASS subtitle files will display each of those groups together. By choosing where to place the `|` characters, you control where the subtitles break up the text.\n",
    "\n",
    "**Extra info**: Alternatively, you can specify in the NFA config that `ass_file_config.resegment_text_to_fill_space=true`, which will cause automatic grouping of words in the ASS files. You can also specify `ass_file_config.max_lines_per_segment` to set the maximum number of lines of text that will appear at a time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "IS-ef4o5s_tv"
   },
   "outputs": [],
   "source": [
    "text = \"\"\"\n",
    "I'm at the foot of the ladder. The LM footpads are only depressed in the |\n",
    "surface about 1 or 2 inches, although the surface appears to be very, very |\n",
    "fine grained, as you get close to it. It's almost like a powder. |\n",
    "Down there, it's very fine. I'm going to step off the LM now. That's one |\n",
    "small step for man, one giant leap for mankind.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OHq0trf2h24g"
   },
   "outputs": [],
   "source": [
    "manifest_filepath = f\"{WORK_DIR}/manifest.json\"\n",
    "manifest_data = {\n",
    "    \"audio_filepath\": f\"{WORK_DIR}/one_small_step.wav\",\n",
    "    \"text\": text\n",
    "}\n",
    "with open(manifest_filepath, 'w') as f:\n",
    "  line = json.dumps(manifest_data)\n",
    "  f.write(line + \"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "RvTrg8C3wHYw",
    "outputId": "b257a766-4dfc-4f69-f75e-4b3bbf84ef60"
   },
   "outputs": [],
   "source": [
    "!cat $manifest_filepath"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QKnTJG5Ig4lP"
   },
   "source": [
    "# 3. Run NFA"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "izC7x71-T0Il"
   },
   "source": [
    "We run NFA by calling the `align.py` script and passing in config parameters. In our case we want to set:\n",
    "\n",
    "**Compulsory parameters**\n",
    "\n",
    "* `pretrained_name` (either this or a `model_path` is required) - this is the name of the ASR model that we will use to do alignment. You can use any of NeMo's CTC models for alignment.\n",
    "* `manifest_filepath` - the path to the manifest specifying the audio that you want to align and its reference text.\n",
    "* `output_dir` - the path to the directory where your output files will be saved.\n",
    "\n",
    "**Optional parameters**\n",
    "* `additional_segment_grouping_separator` - a string (of any length) that you will use to define splits between different segments. In our case we used \"|\".\n",
    "* `ass_file_config.vertical_alignment` - by default this is set to `\"center\"`, meaning the subtitles will appear in the center of the screen. We want them to appear at the bottom of our video, so we set this parameter to `\"bottom\"`.\n",
    "* `ass_file_config.text_already_spoken_rgb`, `ass_file_config.text_being_spoken_rgb`, `ass_file_config.text_not_yet_spoken_rgb` - these parameters define the RGB values of the colors of the text. The default colors do not show up very clearly against our video, so we set them to some different values, which will generate the colors teal, yellow, and a very pale blue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "QDcyfW_RWrqB",
    "outputId": "9b76c513-30bd-44d6-ef70-7daba8f604f0"
   },
   "outputs": [],
   "source": [
    "!python $NEMO_DIR_PATH/tools/nemo_forced_aligner/align.py \\\n",
    "  pretrained_name=\"stt_en_fastconformer_hybrid_large_pc\" \\\n",
    "  manifest_filepath=$manifest_filepath \\\n",
    "  output_dir=$WORK_DIR/nfa_output/ \\\n",
    "  additional_segment_grouping_separator=\"|\" \\\n",
    "  ass_file_config.vertical_alignment=\"bottom\" \\\n",
    "  ass_file_config.text_already_spoken_rgb=[66,245,212] \\\n",
    "  ass_file_config.text_being_spoken_rgb=[242,222,44] \\\n",
    "  ass_file_config.text_not_yet_spoken_rgb=[223,242,239]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dHU-YmALUvVf"
   },
   "source": [
    "The alignment process should have finished successfully. Let's look at some of the output files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "VSyM9zp4wQSJ",
    "outputId": "3d77e04d-cc56-425a-92bc-d70a34545f00"
   },
   "outputs": [],
   "source": [
    "!head $WORK_DIR/nfa_output/ctm/*/*.ctm\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-LESm1cudqZ0"
   },
   "source": [
    "The token timestamps are produced directly from the Viterbi alignment process, and the word and segment timestamps are obtained from the token timestamps using a simple grouping process shown below:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OXUL-KyUdVpL"
   },
   "source": [
    "![How NFA generates word and segment alignments from token alignments](https://github.com/NVIDIA/NeMo/releases/download/v1.20.0/nfa_word_segment_alignments.png)"
   ]
  },
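  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional sanity check, we can parse the word-level CTM files ourselves. Each CTM line follows the standard layout `<utterance_id> <channel> <start_time> <duration> <token>`, with times in seconds. The cell below is a minimal sketch that assumes this layout and the `ctm/words/` output directory created by the NFA run above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "\n",
    "# parse each word-level CTM line: <utt_id> <channel> <start_time> <duration> <word>\n",
    "for ctm_path in glob.glob(f\"{WORK_DIR}/nfa_output/ctm/words/*.ctm\"):\n",
    "    with open(ctm_path) as f:\n",
    "        for line in f:\n",
    "            utt_id, channel, start, dur, word = line.split()[:5]\n",
    "            print(f\"{word:>12s}  start={float(start):7.2f}s  duration={float(dur):.2f}s\")"
   ]
  },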
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3ZZuKU5YwXBh"
   },
   "source": [
    "# 4. Make subtitled video"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tlYr1ymnU4Du"
   },
   "source": [
    "To make the subtitled video, we will use the ASS-format files that NFA produces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "RauJ91RgzCvp",
    "outputId": "b0b34085-847a-4af0-d937-b67de12304e5"
   },
   "outputs": [],
   "source": [
    "!head -n20 $WORK_DIR/nfa_output/ass/words/*.ass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OaYjR7mDdVKF",
    "outputId": "2fedc7ae-b931-409f-a338-5a36e6c457c4"
   },
   "outputs": [],
   "source": [
    "!head -n20 $WORK_DIR/nfa_output/ass/tokens/*.ass"
   ]
  },
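  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the ASS files above encode colors in hexadecimal with blue-green-red byte order (e.g. `&HBBGGRR&` in override tags, sometimes with a leading alpha byte in style definitions) - the reverse of RGB. If you customize the `ass_file_config.*_rgb` values passed to NFA, the small helper below (an illustrative snippet, not part of NFA) shows what a given RGB triple looks like in ASS notation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rgb_to_ass_color(r, g, b):\n",
    "    # ASS stores colors as &H<blue><green><red>& in hex, i.e. the reverse of RGB byte order\n",
    "    return f\"&H{b:02X}{g:02X}{r:02X}&\"\n",
    "\n",
    "# the 'text_being_spoken' yellow we passed to NFA above:\n",
    "print(rgb_to_ass_color(242, 222, 44))  # &H2CDEF2&"
   ]
  },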
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ipIlViegVDgQ"
   },
   "source": [
    "We burn the subtitles into the video using the commands below, generating two videos: one with token-by-token highlighting and one with word-by-word highlighting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Av7L4Qzojykv",
    "outputId": "dd1dc2af-3631-4776-e79d-157b61afa98e"
   },
   "outputs": [],
   "source": [
    "!/usr/bin/ffmpeg -loglevel warning -y -i \"$WORK_DIR/one_small_step.mp4\" \\\n",
    "  -vf \"ass=$WORK_DIR/nfa_output/ass/tokens/one_small_step.ass\" \\\n",
    "  $WORK_DIR/one_small_step_tokens_aligned.mp4\n",
    "\n",
    "!/usr/bin/ffmpeg -loglevel warning -y -i \"$WORK_DIR/one_small_step.mp4\" \\\n",
    "  -vf \"ass=$WORK_DIR/nfa_output/ass/words/one_small_step.ass\" \\\n",
    "  $WORK_DIR/one_small_step_words_aligned.mp4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gfhQxkbPVPSj"
   },
   "source": [
    "Let's look at the resulting videos below.\n",
    "\n",
    "You can see that the token timestamps (in the first video) are very accurate despite the poor audio quality.\n",
    "\n",
    "The word timestamps (in the second video) are also very good. The only noticeable mistakes occur when a word has punctuation at its end (or beginning): punctuation that is not separated from a word by a space is treated as part of that word, so if the punctuation gets aligned to a region of non-speech, the word's alignment will include that non-speech region too."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 322
    },
    "id": "ik6FcCgIMYsa",
    "outputId": "4781a087-a8a2-44a7-9e0f-ea57e05cde59"
   },
   "outputs": [],
   "source": [
    "mp4 = open(f'{WORK_DIR}/one_small_step_tokens_aligned.mp4','rb').read()\n",
    "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
    "HTML(\"\"\"\n",
    "<video width=400 controls>\n",
    "      <source src=\"%s\" type=\"video/mp4\">\n",
    "</video>\n",
    "\"\"\" % data_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 322
    },
    "id": "mgFUhdlsdXLs",
    "outputId": "94f8ae2f-8752-4671-aeca-a19f03db7fd6"
   },
   "outputs": [],
   "source": [
    "mp4 = open(f'{WORK_DIR}/one_small_step_words_aligned.mp4','rb').read()\n",
    "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
    "HTML(\"\"\"\n",
    "<video width=400 controls>\n",
    "      <source src=\"%s\" type=\"video/mp4\">\n",
    "</video>\n",
    "\"\"\" % data_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Gr70zxMoeLqR"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "provenance": []
  },
  "gpuClass": "standard",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}