EddyGiusepe committed on
Commit
730856b
·
1 Parent(s): 0a10a4d

Data Pipeline
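
The notebook added in this commit says that `dlt` infers data types and handles nested data automatically. The sketch below illustrates that idea in plain Python and is not `dlt`'s implementation. The helper names `flatten` and `infer_type`, the sample record, and the `__` separator are assumptions made for the example. The type names are borrowed from the DuckDB column types visible in the notebook output.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = "__") -> dict[str, Any]:
    """Flatten nested dicts into one level of column-like keys (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys with the parent name
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def infer_type(value: Any) -> str:
    """Map a Python value to a SQL-style type name (illustrative mapping only)."""
    # bool is a subclass of int in Python, so it has to be checked first
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


# Hypothetical record, loosely shaped like a player profile
record = {
    "username": "example",
    "followers": 10,
    "is_streamer": False,
    "stats": {"rating": 1500.5},
}

flat = flatten(record)
schema = {name: infer_type(value) for name, value in flat.items()}
print(schema)
```

A real loader also has to deal with lists, missing values, and schema changes between runs, which this sketch leaves out.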

.gitignore ADDED
@@ -0,0 +1,3 @@
1
+ # EddyGiusepe
2
+ venv_pipeData/
3
+
1_Data_Pipelines/exemplo_Data_Pipeline.ipynb ADDED
@@ -0,0 +1,565 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "<h1 align=\"center\"><font color=\"green\">Data Pipelines</font></h1>"
8
+ ]
9
+ },
10
+ {
11
+ "cell_type": "markdown",
12
+ "metadata": {},
13
+ "source": [
14
+ "<font color=\"yellow\">Data Scientist: Dr. Eddy Giusepe Chirinos Isidro</font>"
15
+ ]
16
+ },
17
+ {
18
+ "cell_type": "markdown",
19
+ "metadata": {},
20
+ "source": [
21
+ "Here we run a `Pipeline` that loads data from the [PokéAPI](https://pokeapi.co/) into a [DuckDB](https://duckdb.org/) database using the [dlt](https://dlthub.com/docs/intro) library."
22
+ ]
23
+ },
24
+ {
25
+ "cell_type": "markdown",
26
+ "metadata": {},
27
+ "source": [
28
+ "# <font color=\"green\">Background</font>"
29
+ ]
30
+ },
31
+ {
32
+ "cell_type": "markdown",
33
+ "metadata": {},
34
+ "source": [
35
+ "## <font color=\"red\">What is `dlt`?</font>"
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "markdown",
40
+ "metadata": {},
41
+ "source": [
42
+ "`dlt` is an open-source library that you can add to your Python scripts to load data from various, often messy, data sources into well-structured, live datasets. To get started, install it with:\n",
43
+ "\n",
44
+ "```\n",
45
+ "pip install dlt\n",
46
+ "```\n",
47
+ "\n",
48
+ "\n",
49
+ "Unlike other solutions, `dlt` does not require any `backends` or `containers`. Simply import `dlt` in a Python file or a Jupyter Notebook cell and create a pipeline to load data into any of the supported destinations. You can load data from any source that produces `Python` data structures, including `APIs`, `files`, `databases`, and more. `dlt` also supports building a [custom destination](https://dlthub.com/docs/dlt-ecosystem/destinations/destination), which you can use as reverse `ETL`.\n",
50
+ "\n",
51
+ "The library creates or updates tables, infers data types, and handles nested data automatically."
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "markdown",
56
+ "metadata": {},
57
+ "source": [
58
+ "# <font color=\"pink\">Example 1:</font>"
59
+ ]
60
+ },
61
+ {
62
+ "cell_type": "markdown",
63
+ "metadata": {},
64
+ "source": [
65
+ "## <font color=\"red\">Installation</font>"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": null,
71
+ "metadata": {},
72
+ "outputs": [],
73
+ "source": [
74
+ "%%capture\n",
75
+ "%pip install \"dlt[duckdb]\" # Install dlt together with the required DuckDB dependencies"
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "code",
80
+ "execution_count": null,
81
+ "metadata": {},
82
+ "outputs": [],
83
+ "source": [
84
+ "!dlt --version"
85
+ ]
86
+ },
87
+ {
88
+ "cell_type": "markdown",
89
+ "metadata": {},
90
+ "source": [
91
+ "## <font color=\"red\">Import `dlt` and initialize the pipeline</font>"
92
+ ]
93
+ },
94
+ {
95
+ "cell_type": "code",
96
+ "execution_count": null,
97
+ "metadata": {},
98
+ "outputs": [],
99
+ "source": [
100
+ "import dlt\n",
101
+ "\n",
102
+ "pipeline = dlt.pipeline(pipeline_name=\"Eddy_pipeline_pokemon\",\n",
103
+ " destination=\"duckdb\",\n",
104
+ " dataset_name=\"Eddy_dataset_pokemon\")"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": null,
110
+ "metadata": {},
111
+ "outputs": [],
112
+ "source": [
113
+ "pipeline"
114
+ ]
115
+ },
116
+ {
117
+ "cell_type": "markdown",
118
+ "metadata": {},
119
+ "source": [
120
+ "## <font color=\"red\">Fetch data from the source</font>"
121
+ ]
122
+ },
123
+ {
124
+ "cell_type": "markdown",
125
+ "metadata": {},
126
+ "source": [
127
+ "<font color=\"orange\">We load the list of Pokémon from the API URL.</font>"
128
+ ]
129
+ },
130
+ {
131
+ "cell_type": "code",
132
+ "execution_count": null,
133
+ "metadata": {},
134
+ "outputs": [],
135
+ "source": [
136
+ "from dlt.sources.helpers import requests\n",
137
+ "\n",
138
+ "POKEMON_URL = \"https://pokeapi.co/api/v2/pokemon/\"\n",
139
+ "\n",
140
+ "data = requests.get(POKEMON_URL).json()[\"results\"]"
141
+ ]
142
+ },
143
+ {
144
+ "cell_type": "code",
145
+ "execution_count": null,
146
+ "metadata": {},
147
+ "outputs": [],
148
+ "source": [
149
+ "data"
150
+ ]
151
+ },
152
+ {
153
+ "cell_type": "markdown",
154
+ "metadata": {},
155
+ "source": [
156
+ "## <font color=\"red\">Run the pipeline</font>"
157
+ ]
158
+ },
159
+ {
160
+ "cell_type": "code",
161
+ "execution_count": null,
162
+ "metadata": {},
163
+ "outputs": [],
164
+ "source": [
165
+ "%%capture\n",
166
+ "\n",
167
+ "# Normalize and load the data into the locally created DuckDB database 'Eddy_pipeline_pokemon.duckdb'\n",
168
+ "pipeline.run(data, table_name='Eddy_Table_pokemon')"
169
+ ]
170
+ },
171
+ {
172
+ "cell_type": "markdown",
173
+ "metadata": {},
174
+ "source": [
175
+ "## <font color=\"red\">Query the loaded data 🦆</font>"
176
+ ]
177
+ },
178
+ {
179
+ "cell_type": "markdown",
180
+ "metadata": {},
181
+ "source": [
182
+ "<font color=\"orange\">To access the loaded data, connect to the `DuckDB` database using the `DuckDB` Python connector.</font>"
183
+ ]
184
+ },
185
+ {
186
+ "cell_type": "code",
187
+ "execution_count": null,
188
+ "metadata": {},
189
+ "outputs": [],
190
+ "source": [
191
+ "import duckdb\n",
192
+ "#from google.colab import data_table\n",
193
+ "#data_table.enable_dataframe_formatter()\n",
194
+ "\n",
195
+ "# A database file named after the pipeline ('Eddy_pipeline_pokemon.duckdb') was created in the working directory, so just connect to it\n",
196
+ "conn = duckdb.connect(f\"{pipeline.pipeline_name}.duckdb\")\n",
197
+ "\n",
198
+ "# List all tables\n",
199
+ "display(conn.sql(\"DESCRIBE\"))\n"
200
+ ]
201
+ },
202
+ {
203
+ "cell_type": "code",
204
+ "execution_count": null,
205
+ "metadata": {},
206
+ "outputs": [],
207
+ "source": [
208
+ "# This lets us query the data without prefixing table names with the schema:\n",
209
+ "conn.sql(f\"SET search_path = '{pipeline.dataset_name}'\")\n"
210
+ ]
211
+ },
212
+ {
213
+ "cell_type": "code",
214
+ "execution_count": null,
215
+ "metadata": {},
216
+ "outputs": [],
217
+ "source": [
218
+ "# List all tables\n",
219
+ "display(conn.sql(\"DESCRIBE\"))\n"
220
+ ]
221
+ },
222
+ {
223
+ "cell_type": "code",
224
+ "execution_count": null,
225
+ "metadata": {},
226
+ "outputs": [],
227
+ "source": [
228
+ "stats_table = conn.sql(\"SELECT * FROM Eddy_Table_pokemon\").df()\n",
229
+ "\n",
230
+ "display(stats_table)\n"
231
+ ]
232
+ },
233
+ {
234
+ "cell_type": "markdown",
235
+ "metadata": {},
236
+ "source": [
237
+ "# <font color=\"pink\">Example 2:</font>"
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "code",
242
+ "execution_count": 1,
243
+ "metadata": {},
244
+ "outputs": [],
245
+ "source": [
246
+ "import dlt\n",
247
+ "from dlt.sources.helpers import requests\n",
248
+ "\n",
249
+ "# Create a dlt pipeline that will load\n",
249
+ "# chess player data to the DuckDB destination\n",
251
+ "pipeline = dlt.pipeline(pipeline_name=\"chess_pipeline\",\n",
252
+ " destination=\"duckdb\",\n",
253
+ " dataset_name=\"player_data\"\n",
254
+ " )\n",
255
+ "\n",
256
+ "\n",
257
+ "# Get some player data from the Chess.com API:\n",
258
+ "data = []\n",
259
+ "for player in [\"magnuscarlsen\", \"rpragchess\"]:\n",
260
+ "    response = requests.get(f\"https://api.chess.com/pub/player/{player}\")\n",
261
+ "    response.raise_for_status()\n",
262
+ "    data.append(response.json())\n",
263
+ "\n",
264
+ "# Extract, normalize, and load the data:\n",
265
+ "load_info = pipeline.run(data, table_name=\"player\")\n"
266
+ ]
267
+ },
268
+ {
269
+ "cell_type": "code",
270
+ "execution_count": 2,
271
+ "metadata": {},
272
+ "outputs": [
273
+ {
274
+ "name": "stdout",
275
+ "output_type": "stream",
276
+ "text": [
277
+ "Pipeline chess_pipeline load step completed in 0.19 seconds\n",
278
+ "1 load package(s) were loaded to destination duckdb and into dataset player_data\n",
279
+ "The duckdb destination used duckdb:////home/eddygiusepe/1_Eddy_Giusepe/6_REPO_HuggingFace/Data_Pipelines/1_Data_Pipelines/chess_pipeline.duckdb location to store data\n",
280
+ "Load package 1712029347.285123 is LOADED and contains no failed jobs\n"
281
+ ]
282
+ }
283
+ ],
284
+ "source": [
285
+ "print(load_info)"
286
+ ]
287
+ },
288
+ {
289
+ "cell_type": "code",
290
+ "execution_count": 3,
291
+ "metadata": {},
292
+ "outputs": [
293
+ {
294
+ "data": {
295
+ "text/plain": [
296
+ "┌────────────────┬─────────────┬─────────────────────┬──────────────────────┬──────────────────────────────┬───────────┐\n",
297
+ "│ database │ schema │ name │ column_names │ column_types │ temporary │\n",
298
+ "│ varchar │ varchar │ varchar │ varchar[] │ varchar[] │ boolean │\n",
299
+ "├────────────────┼─────────────┼─────────────────────┼──────────────────────┼──────────────────────────────┼───────────┤\n",
300
+ "│ chess_pipeline │ player_data │ _dlt_loads │ [load_id, schema_n… │ [VARCHAR, VARCHAR, BIGINT,… │ false │\n",
301
+ "│ chess_pipeline │ player_data │ _dlt_pipeline_state │ [version, engine_v… │ [BIGINT, BIGINT, VARCHAR, … │ false │\n",
302
+ "│ chess_pipeline │ player_data │ _dlt_version │ [version, engine_v… │ [BIGINT, BIGINT, TIMESTAMP… │ false │\n",
303
+ "│ chess_pipeline │ player_data │ player │ [avatar, player_id… │ [VARCHAR, BIGINT, VARCHAR,… │ false │\n",
304
+ "└────────────────┴─────────────┴─────────────────────┴──────────────────────┴──────────────────────────────┴───────────┘"
305
+ ]
306
+ },
307
+ "metadata": {},
308
+ "output_type": "display_data"
309
+ }
310
+ ],
311
+ "source": [
312
+ "import duckdb\n",
313
+ "\n",
314
+ "# a database 'chess_pipeline.duckdb' was created in working directory so just connect to it\n",
315
+ "conn1 = duckdb.connect(f\"{pipeline.pipeline_name}.duckdb\")\n",
316
+ "\n",
317
+ "# List all tables:\n",
318
+ "display(conn1.sql(\"DESCRIBE\"))"
319
+ ]
320
+ },
321
+ {
322
+ "cell_type": "code",
323
+ "execution_count": 4,
324
+ "metadata": {},
325
+ "outputs": [],
326
+ "source": [
327
+ "# This lets us query the data without prefixing table names with the schema:\n",
328
+ "conn1.sql(f\"SET search_path = '{pipeline.dataset_name}'\")"
329
+ ]
330
+ },
331
+ {
332
+ "cell_type": "code",
333
+ "execution_count": 5,
334
+ "metadata": {},
335
+ "outputs": [
336
+ {
337
+ "data": {
338
+ "text/html": [
339
+ "<div>\n",
340
+ "<style scoped>\n",
341
+ " .dataframe tbody tr th:only-of-type {\n",
342
+ " vertical-align: middle;\n",
343
+ " }\n",
344
+ "\n",
345
+ " .dataframe tbody tr th {\n",
346
+ " vertical-align: top;\n",
347
+ " }\n",
348
+ "\n",
349
+ " .dataframe thead th {\n",
350
+ " text-align: right;\n",
351
+ " }\n",
352
+ "</style>\n",
353
+ "<table border=\"1\" class=\"dataframe\">\n",
354
+ " <thead>\n",
355
+ " <tr style=\"text-align: right;\">\n",
356
+ " <th></th>\n",
357
+ " <th>avatar</th>\n",
358
+ " <th>player_id</th>\n",
359
+ " <th>aid</th>\n",
360
+ " <th>url</th>\n",
361
+ " <th>name</th>\n",
362
+ " <th>username</th>\n",
363
+ " <th>title</th>\n",
364
+ " <th>followers</th>\n",
365
+ " <th>country</th>\n",
366
+ " <th>location</th>\n",
367
+ " <th>last_online</th>\n",
368
+ " <th>joined</th>\n",
369
+ " <th>status</th>\n",
370
+ " <th>is_streamer</th>\n",
371
+ " <th>verified</th>\n",
372
+ " <th>league</th>\n",
373
+ " <th>_dlt_load_id</th>\n",
374
+ " <th>_dlt_id</th>\n",
375
+ " </tr>\n",
376
+ " </thead>\n",
377
+ " <tbody>\n",
378
+ " <tr>\n",
379
+ " <th>0</th>\n",
380
+ " <td>https://images.chesscomfiles.com/uploads/v1/us...</td>\n",
381
+ " <td>3889224</td>\n",
382
+ " <td>https://api.chess.com/pub/player/magnuscarlsen</td>\n",
383
+ " <td>https://www.chess.com/member/MagnusCarlsen</td>\n",
384
+ " <td>Magnus Carlsen</td>\n",
385
+ " <td>magnuscarlsen</td>\n",
386
+ " <td>GM</td>\n",
387
+ " <td>189570</td>\n",
388
+ " <td>https://api.chess.com/pub/country/NO</td>\n",
389
+ " <td>Norway</td>\n",
390
+ " <td>1712007974</td>\n",
391
+ " <td>1282856720</td>\n",
392
+ " <td>premium</td>\n",
393
+ " <td>False</td>\n",
394
+ " <td>False</td>\n",
395
+ " <td>Champion</td>\n",
396
+ " <td>1712028961.2599978</td>\n",
397
+ " <td>lcRtKfLWXvYMIA</td>\n",
398
+ " </tr>\n",
399
+ " <tr>\n",
400
+ " <th>1</th>\n",
401
+ " <td>https://images.chesscomfiles.com/uploads/v1/us...</td>\n",
402
+ " <td>28692936</td>\n",
403
+ " <td>https://api.chess.com/pub/player/rpragchess</td>\n",
404
+ " <td>https://www.chess.com/member/rpragchess</td>\n",
405
+ " <td>Praggnanandhaa Rameshbabu</td>\n",
406
+ " <td>rpragchess</td>\n",
407
+ " <td>GM</td>\n",
408
+ " <td>6906</td>\n",
409
+ " <td>https://api.chess.com/pub/country/IN</td>\n",
410
+ " <td>CHENNAI</td>\n",
411
+ " <td>1711993442</td>\n",
412
+ " <td>1466301035</td>\n",
413
+ " <td>premium</td>\n",
414
+ " <td>False</td>\n",
415
+ " <td>False</td>\n",
416
+ " <td>Crystal</td>\n",
417
+ " <td>1712028961.2599978</td>\n",
418
+ " <td>+erswyz4f2/xMQ</td>\n",
419
+ " </tr>\n",
420
+ " <tr>\n",
421
+ " <th>2</th>\n",
422
+ " <td>https://images.chesscomfiles.com/uploads/v1/us...</td>\n",
423
+ " <td>3889224</td>\n",
424
+ " <td>https://api.chess.com/pub/player/magnuscarlsen</td>\n",
425
+ " <td>https://www.chess.com/member/MagnusCarlsen</td>\n",
426
+ " <td>Magnus Carlsen</td>\n",
427
+ " <td>magnuscarlsen</td>\n",
428
+ " <td>GM</td>\n",
429
+ " <td>189571</td>\n",
430
+ " <td>https://api.chess.com/pub/country/NO</td>\n",
431
+ " <td>Norway</td>\n",
432
+ " <td>1712007974</td>\n",
433
+ " <td>1282856720</td>\n",
434
+ " <td>premium</td>\n",
435
+ " <td>False</td>\n",
436
+ " <td>False</td>\n",
437
+ " <td>Champion</td>\n",
438
+ " <td>1712029347.285123</td>\n",
439
+ " <td>opkc5HKMEFbeuw</td>\n",
440
+ " </tr>\n",
441
+ " <tr>\n",
442
+ " <th>3</th>\n",
443
+ " <td>https://images.chesscomfiles.com/uploads/v1/us...</td>\n",
444
+ " <td>28692936</td>\n",
445
+ " <td>https://api.chess.com/pub/player/rpragchess</td>\n",
446
+ " <td>https://www.chess.com/member/rpragchess</td>\n",
447
+ " <td>Praggnanandhaa Rameshbabu</td>\n",
448
+ " <td>rpragchess</td>\n",
449
+ " <td>GM</td>\n",
450
+ " <td>6906</td>\n",
451
+ " <td>https://api.chess.com/pub/country/IN</td>\n",
452
+ " <td>CHENNAI</td>\n",
453
+ " <td>1711993442</td>\n",
454
+ " <td>1466301035</td>\n",
455
+ " <td>premium</td>\n",
456
+ " <td>False</td>\n",
457
+ " <td>False</td>\n",
458
+ " <td>Crystal</td>\n",
459
+ " <td>1712029347.285123</td>\n",
460
+ " <td>XvJq9hHcwg80rg</td>\n",
461
+ " </tr>\n",
462
+ " </tbody>\n",
463
+ "</table>\n",
464
+ "</div>"
465
+ ],
466
+ "text/plain": [
467
+ " avatar player_id \\\n",
468
+ "0 https://images.chesscomfiles.com/uploads/v1/us... 3889224 \n",
469
+ "1 https://images.chesscomfiles.com/uploads/v1/us... 28692936 \n",
470
+ "2 https://images.chesscomfiles.com/uploads/v1/us... 3889224 \n",
471
+ "3 https://images.chesscomfiles.com/uploads/v1/us... 28692936 \n",
472
+ "\n",
473
+ " aid \\\n",
474
+ "0 https://api.chess.com/pub/player/magnuscarlsen \n",
475
+ "1 https://api.chess.com/pub/player/rpragchess \n",
476
+ "2 https://api.chess.com/pub/player/magnuscarlsen \n",
477
+ "3 https://api.chess.com/pub/player/rpragchess \n",
478
+ "\n",
479
+ " url name \\\n",
480
+ "0 https://www.chess.com/member/MagnusCarlsen Magnus Carlsen \n",
481
+ "1 https://www.chess.com/member/rpragchess Praggnanandhaa Rameshbabu \n",
482
+ "2 https://www.chess.com/member/MagnusCarlsen Magnus Carlsen \n",
483
+ "3 https://www.chess.com/member/rpragchess Praggnanandhaa Rameshbabu \n",
484
+ "\n",
485
+ " username title followers country \\\n",
486
+ "0 magnuscarlsen GM 189570 https://api.chess.com/pub/country/NO \n",
487
+ "1 rpragchess GM 6906 https://api.chess.com/pub/country/IN \n",
488
+ "2 magnuscarlsen GM 189571 https://api.chess.com/pub/country/NO \n",
489
+ "3 rpragchess GM 6906 https://api.chess.com/pub/country/IN \n",
490
+ "\n",
491
+ " location last_online joined status is_streamer verified league \\\n",
492
+ "0 Norway 1712007974 1282856720 premium False False Champion \n",
493
+ "1 CHENNAI 1711993442 1466301035 premium False False Crystal \n",
494
+ "2 Norway 1712007974 1282856720 premium False False Champion \n",
495
+ "3 CHENNAI 1711993442 1466301035 premium False False Crystal \n",
496
+ "\n",
497
+ " _dlt_load_id _dlt_id \n",
498
+ "0 1712028961.2599978 lcRtKfLWXvYMIA \n",
499
+ "1 1712028961.2599978 +erswyz4f2/xMQ \n",
500
+ "2 1712029347.285123 opkc5HKMEFbeuw \n",
501
+ "3 1712029347.285123 XvJq9hHcwg80rg "
502
+ ]
503
+ },
504
+ "execution_count": 5,
505
+ "metadata": {},
506
+ "output_type": "execute_result"
507
+ }
508
+ ],
509
+ "source": [
510
+ "table = conn1.sql(\"SELECT * FROM player\").df()\n",
511
+ "\n",
512
+ "\n",
513
+ "table.head()"
514
+ ]
515
+ },
516
+ {
517
+ "cell_type": "code",
518
+ "execution_count": 6,
519
+ "metadata": {},
520
+ "outputs": [
521
+ {
522
+ "data": {
523
+ "text/plain": [
524
+ "(4, 18)"
525
+ ]
526
+ },
527
+ "execution_count": 6,
528
+ "metadata": {},
529
+ "output_type": "execute_result"
530
+ }
531
+ ],
532
+ "source": [
533
+ "table.shape"
534
+ ]
535
+ },
536
+ {
537
+ "cell_type": "code",
538
+ "execution_count": null,
539
+ "metadata": {},
540
+ "outputs": [],
541
+ "source": []
542
+ }
543
+ ],
544
+ "metadata": {
545
+ "kernelspec": {
546
+ "display_name": "venv_pipeData",
547
+ "language": "python",
548
+ "name": "python3"
549
+ },
550
+ "language_info": {
551
+ "codemirror_mode": {
552
+ "name": "ipython",
553
+ "version": 3
554
+ },
555
+ "file_extension": ".py",
556
+ "mimetype": "text/x-python",
557
+ "name": "python",
558
+ "nbconvert_exporter": "python",
559
+ "pygments_lexer": "ipython3",
560
+ "version": "3.10.12"
561
+ }
562
+ },
563
+ "nbformat": 4,
564
+ "nbformat_minor": 2
565
+ }
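
The `player` output in Example 2 shows four rows for two players, with two distinct `_dlt_load_id` values. This suggests the load cell was run twice and the second run was appended to the first. The sketch below, in plain Python, keeps only the rows from the most recent load. The field names and values are copied from the notebook output, and the helper name `latest_load_rows` is hypothetical.

```python
from typing import Any


def latest_load_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the rows that belong to the most recent load (hypothetical helper)."""
    if not rows:
        return []
    # The load ids in the output look like Unix timestamps stored as text,
    # so they are compared numerically rather than as strings.
    newest = max(rows, key=lambda row: float(row["_dlt_load_id"]))["_dlt_load_id"]
    return [row for row in rows if row["_dlt_load_id"] == newest]


# A subset of the columns shown in the notebook output
rows = [
    {"username": "magnuscarlsen", "followers": 189570, "_dlt_load_id": "1712028961.2599978"},
    {"username": "rpragchess", "followers": 6906, "_dlt_load_id": "1712028961.2599978"},
    {"username": "magnuscarlsen", "followers": 189571, "_dlt_load_id": "1712029347.285123"},
    {"username": "rpragchess", "followers": 6906, "_dlt_load_id": "1712029347.285123"},
]

latest = latest_load_rows(rows)
print([(row["username"], row["followers"]) for row in latest])
```

If the goal is to avoid duplicates at load time instead, passing `write_disposition="replace"` to `pipeline.run` is likely the simpler route. Check the behaviour against the pinned `dlt==0.4.7` before relying on it.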
requirements.txt ADDED
@@ -0,0 +1,37 @@
1
+ astunparse==1.6.3
2
+ certifi==2024.2.2
3
+ charset-normalizer==3.3.2
4
+ click==8.1.7
5
+ dlt==0.4.7
6
+ duckdb==0.9.2
7
+ fsspec==2024.3.1
8
+ gitdb==4.0.11
9
+ GitPython==3.1.43
10
+ giturlparse==0.12.0
11
+ hexbytes==1.2.0
12
+ humanize==4.9.0
13
+ idna==3.6
14
+ install==1.3.5
15
+ jsonpath-ng==1.6.1
16
+ makefun==1.15.2
17
+ orjson==3.9.10
18
+ packaging==24.0
19
+ pathvalidate==3.2.0
20
+ pendulum==3.0.0
21
+ ply==3.11
22
+ python-dateutil==2.9.0.post0
23
+ pytz==2024.1
24
+ PyYAML==6.0.1
25
+ requests==2.31.0
26
+ requirements-parser==0.7.0
27
+ semver==3.0.2
28
+ simplejson==3.19.2
29
+ six==1.16.0
30
+ smmap==5.0.1
31
+ tenacity==8.2.3
32
+ time-machine==2.14.1
33
+ tomlkit==0.12.4
34
+ types-setuptools==69.2.0.20240317
35
+ typing_extensions==4.10.0
36
+ tzdata==2024.1
37
+ urllib3==2.2.1
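
Example 1 in the notebook reads only the `results` of a single request to the PokéAPI list endpoint. That endpoint is paginated, returning 20 items per page by default along with a `next` URL, so one request returns only the first page. The sketch below follows `next` links until they run out. The helper name `iter_results` and the in-memory `FAKE_PAGES` data are assumptions that let it run offline.

```python
from typing import Any, Callable, Iterator, Optional


def iter_results(
    first_url: str, fetch: Callable[[str], dict[str, Any]]
) -> Iterator[dict[str, Any]]:
    """Yield every item of a paginated listing by following its `next` links."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        # A missing or null `next` marks the last page and ends the loop
        url = page.get("next")


# Stand-in for the network call: maps a URL-like key to a parsed JSON page
FAKE_PAGES = {
    "page-1": {"results": [{"name": "bulbasaur"}, {"name": "ivysaur"}], "next": "page-2"},
    "page-2": {"results": [{"name": "venusaur"}], "next": None},
}

names = [item["name"] for item in iter_results("page-1", FAKE_PAGES.__getitem__)]
print(names)
```

In the notebook, `fetch` could be `lambda url: requests.get(url).json()` using the `requests` helper already imported from `dlt.sources.helpers`. The resulting generator can then be passed to `pipeline.run` in place of the `data` list.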