IsmatS committed on
Commit
d52b844
·
2 Parent(s): f69ca06 9beafec

Merge pull request #3 from Ismat-Samadov/claude/analyze-code-ApUhl

presentation/pitch_deck.pdf ADDED
Binary file (79.4 kB).
 
presentation/pitch_deck_print.html ADDED
@@ -0,0 +1,1084 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>SOCAR Historical Documents AI - Hackathon Pitch</title>
+   <style>
+     @page {
+       size: A4 landscape;
+       margin: 0;
+     }
+
+     * {
+       margin: 0;
+       padding: 0;
+       box-sizing: border-box;
+     }
+
+     body {
+       font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+       background: #0d1117;
+       color: #ffffff;
+     }
+
+     .slide {
+       width: 297mm;
+       height: 210mm;
+       padding: 15mm 20mm;
+       background: linear-gradient(135deg, #0d1117 0%, #161b22 100%);
+       position: relative;
+       overflow: hidden;
+       page-break-after: always;
+       page-break-inside: avoid;
+     }
+
+     .slide:last-child {
+       page-break-after: auto;
+     }
+
+     .slide::before {
+       content: '';
+       position: absolute;
+       top: 0;
+       left: 0;
+       right: 0;
+       height: 3px;
+       background: linear-gradient(90deg, #00d4aa, #0099ff, #00d4aa);
+     }
+
+     /* Title Slide */
+     .title-slide {
+       text-align: center;
+       display: flex;
+       flex-direction: column;
+       justify-content: center;
+       align-items: center;
+     }
+
+     .title-slide h1 {
+       font-size: 42pt;
+       font-weight: 700;
+       color: #00d4aa;
+       margin-bottom: 10px;
+     }
+
+     .title-slide .subtitle {
+       font-size: 18pt;
+       color: #8b949e;
+       margin-bottom: 20px;
+     }
+
+     .title-slide .tagline {
+       font-size: 14pt;
+       color: #58a6ff;
+       padding: 10px 20px;
+       border: 2px solid #30363d;
+       border-radius: 8px;
+       background: rgba(88, 166, 255, 0.1);
+     }
+
+     .title-slide .team-info {
+       margin-top: 30px;
+     }
+
+     .title-slide .team-name {
+       font-size: 16pt;
+       color: #00d4aa;
+       font-weight: 700;
+       margin-bottom: 8px;
+     }
+
+     .title-slide .team-members {
+       font-size: 12pt;
+       color: #8b949e;
+     }
+
+     /* Regular Slides */
+     h2 {
+       font-size: 28pt;
+       color: #00d4aa;
+       margin-bottom: 20px;
+     }
+
+     h2 .icon {
+       font-size: 24pt;
+       margin-right: 10px;
+     }
+
+     h3 {
+       font-size: 14pt;
+       color: #58a6ff;
+       margin: 15px 0 10px 0;
+     }
+
+     p {
+       font-size: 11pt;
+       line-height: 1.6;
+       color: #c9d1d9;
+     }
+
+     ul {
+       list-style: none;
+       padding-left: 0;
+     }
+
+     li {
+       font-size: 11pt;
+       line-height: 1.8;
+       color: #c9d1d9;
+       padding-left: 20px;
+       position: relative;
+     }
+
+     li::before {
+       content: '>';
+       position: absolute;
+       left: 0;
+       color: #00d4aa;
+       font-size: 10pt;
+     }
+
+     /* Stats Grid */
+     .stats-grid {
+       display: flex;
+       justify-content: space-around;
+       gap: 15px;
+       margin-top: 20px;
+     }
+
+     .stat-card {
+       background: linear-gradient(135deg, #21262d 0%, #161b22 100%);
+       border: 1px solid #30363d;
+       border-radius: 10px;
+       padding: 15px 25px;
+       text-align: center;
+       flex: 1;
+     }
+
+     .stat-card .number {
+       font-size: 24pt;
+       font-weight: 700;
+       color: #00d4aa;
+     }
+
+     .stat-card .label {
+       font-size: 9pt;
+       color: #8b949e;
+       margin-top: 5px;
+       text-transform: uppercase;
+       letter-spacing: 1px;
+     }
+
+     /* Architecture Diagram */
+     .architecture {
+       display: flex;
+       justify-content: space-between;
+       align-items: center;
+       margin-top: 15px;
+       padding: 10px;
+     }
+
+     .arch-box {
+       background: linear-gradient(135deg, #21262d 0%, #161b22 100%);
+       border: 2px solid #30363d;
+       border-radius: 8px;
+       padding: 12px 18px;
+       text-align: center;
+       min-width: 90px;
+     }
+
+     .arch-box.highlight {
+       border-color: #00d4aa;
+     }
+
+     .arch-box .title {
+       font-size: 8pt;
+       color: #8b949e;
+       text-transform: uppercase;
+       letter-spacing: 1px;
+       margin-bottom: 4px;
+     }
+
+     .arch-box .tech {
+       font-size: 10pt;
+       color: #58a6ff;
+       font-weight: 600;
+     }
+
+     .arrow {
+       font-size: 16pt;
+       color: #00d4aa;
+     }
+
+     /* Two Column Layout */
+     .two-col {
+       display: flex;
+       gap: 30px;
+       margin-top: 10px;
+     }
+
+     .col {
+       background: rgba(33, 38, 45, 0.5);
+       border-radius: 10px;
+       padding: 15px;
+       border: 1px solid #30363d;
+       flex: 1;
+     }
+
+     /* Tech Stack */
+     .tech-stack {
+       display: flex;
+       flex-wrap: wrap;
+       gap: 10px;
+       margin-top: 15px;
+     }
+
+     .tech-item {
+       background: linear-gradient(135deg, #21262d 0%, #161b22 100%);
+       border: 1px solid #30363d;
+       border-radius: 8px;
+       padding: 10px 15px;
+       display: flex;
+       align-items: center;
+       gap: 10px;
+       width: calc(33% - 10px);
+     }
+
+     .tech-item .icon {
+       font-size: 16pt;
+     }
+
+     .tech-item .name {
+       font-size: 10pt;
+       color: #c9d1d9;
+     }
+
+     .tech-item .desc {
+       font-size: 8pt;
+       color: #8b949e;
+     }
+
+     /* Comparison Table */
+     .comparison-table {
+       width: 100%;
+       margin-top: 15px;
+       border-collapse: collapse;
+       font-size: 10pt;
+     }
+
+     .comparison-table th,
+     .comparison-table td {
+       padding: 10px 15px;
+       text-align: left;
+       border-bottom: 1px solid #30363d;
+     }
+
+     .comparison-table th {
+       background: #21262d;
+       color: #58a6ff;
+       font-size: 10pt;
+       font-weight: 600;
+     }
+
+     .comparison-table td {
+       color: #c9d1d9;
+     }
+
+     .comparison-table .winner {
+       color: #00d4aa;
+       font-weight: 600;
+     }
+
+     .comparison-table .badge {
+       display: inline-block;
+       padding: 2px 8px;
+       border-radius: 10px;
+       font-size: 8pt;
+       font-weight: 600;
+     }
+
+     .badge.open {
+       background: rgba(0, 212, 170, 0.2);
+       color: #00d4aa;
+     }
+
+     .badge.closed {
+       background: rgba(255, 107, 107, 0.2);
+       color: #ff6b6b;
+     }
+
+     /* Flow Diagram */
+     .flow {
+       display: flex;
+       flex-direction: column;
+       gap: 8px;
+       margin-top: 10px;
+     }
+
+     .flow-row {
+       display: flex;
+       align-items: center;
+       gap: 8px;
+     }
+
+     .flow-box {
+       background: #21262d;
+       border: 1px solid #30363d;
+       border-radius: 6px;
+       padding: 6px 12px;
+       font-size: 9pt;
+       color: #c9d1d9;
+     }
+
+     .flow-box.primary {
+       border-color: #00d4aa;
+       color: #00d4aa;
+     }
+
+     .flow-arrow {
+       color: #58a6ff;
+       font-size: 10pt;
+     }
+
+     /* Problem icons */
+     .problem-grid {
+       display: flex;
+       gap: 20px;
+       margin-top: 20px;
+     }
+
+     .problem-card {
+       background: linear-gradient(135deg, #21262d 0%, #161b22 100%);
+       border: 1px solid #30363d;
+       border-radius: 10px;
+       padding: 20px;
+       text-align: center;
+       flex: 1;
+     }
+
+     .problem-card .icon {
+       font-size: 24pt;
+       margin-bottom: 10px;
+     }
+
+     .problem-card h4 {
+       font-size: 12pt;
+       color: #ff6b6b;
+       margin-bottom: 8px;
+     }
+
+     .problem-card p {
+       font-size: 9pt;
+       color: #8b949e;
+     }
+
+     /* Solution cards */
+     .solution-grid {
+       display: flex;
+       flex-wrap: wrap;
+       gap: 15px;
+       margin-top: 15px;
+     }
+
+     .solution-card {
+       background: linear-gradient(135deg, rgba(0, 212, 170, 0.1) 0%, rgba(0, 153, 255, 0.1) 100%);
+       border: 1px solid #00d4aa;
+       border-radius: 10px;
+       padding: 15px;
+       width: calc(50% - 10px);
+     }
+
+     .solution-card h4 {
+       font-size: 12pt;
+       color: #00d4aa;
+       margin-bottom: 8px;
+     }
+
+     .solution-card p {
+       font-size: 9pt;
+       color: #c9d1d9;
+     }
+
+     /* Score breakdown */
+     .score-breakdown {
+       margin-top: 15px;
+     }
+
+     .score-item {
+       display: flex;
+       align-items: center;
+       margin-bottom: 12px;
+     }
+
+     .score-label {
+       width: 140px;
+       font-size: 10pt;
+       color: #c9d1d9;
+     }
+
+     .score-bar-container {
+       flex: 1;
+       height: 20px;
+       background: #21262d;
+       border-radius: 10px;
+       overflow: hidden;
+       margin: 0 15px;
+     }
+
+     .score-bar {
+       height: 100%;
+       background: linear-gradient(90deg, #00d4aa, #0099ff);
+       border-radius: 10px;
+     }
+
+     .score-value {
+       width: 80px;
+       font-size: 11pt;
+       font-weight: 700;
+       color: #00d4aa;
+       text-align: right;
+     }
+
+     /* Final slide */
+     .final-slide {
+       text-align: center;
+       display: flex;
+       flex-direction: column;
+       justify-content: center;
+       align-items: center;
+     }
+
+     .final-slide h2 {
+       font-size: 32pt;
+       margin-bottom: 15px;
+     }
+
+     /* Highlight text */
+     .highlight-text {
+       color: #00d4aa;
+       font-weight: 600;
+     }
+
+     /* Demo section */
+     .demo-features {
+       display: flex;
+       flex-wrap: wrap;
+       gap: 15px;
+       margin-top: 15px;
+     }
+
+     .demo-feature {
+       background: #21262d;
+       border: 1px solid #30363d;
+       border-radius: 10px;
+       padding: 15px;
+       display: flex;
+       gap: 12px;
+       align-items: flex-start;
+       width: calc(50% - 10px);
+     }
+
+     .demo-feature .icon {
+       font-size: 20pt;
+       color: #00d4aa;
+     }
+
+     .demo-feature h4 {
+       font-size: 11pt;
+       color: #c9d1d9;
+       margin-bottom: 5px;
+     }
+
+     .demo-feature p {
+       font-size: 9pt;
+       color: #8b949e;
+     }
+
+     /* API endpoints */
+     .endpoint {
+       background: #161b22;
+       border: 1px solid #30363d;
+       border-radius: 6px;
+       padding: 10px 15px;
+       margin: 8px 0;
+       font-family: 'Courier New', monospace;
+     }
+
+     .endpoint .method {
+       display: inline-block;
+       padding: 2px 8px;
+       border-radius: 4px;
+       font-size: 8pt;
+       font-weight: 700;
+       margin-right: 10px;
+     }
+
+     .endpoint .method.post {
+       background: rgba(0, 212, 170, 0.2);
+       color: #00d4aa;
+     }
+
+     .endpoint .method.get {
+       background: rgba(88, 166, 255, 0.2);
+       color: #58a6ff;
+     }
+
+     .endpoint .path {
+       color: #c9d1d9;
+       font-size: 10pt;
+     }
+
+     .endpoint .desc {
+       color: #8b949e;
+       font-size: 8pt;
+       margin-left: 50px;
+       margin-top: 3px;
+     }
+
+     /* Key decisions */
+     .decision-list {
+       margin-top: 12px;
+     }
+
+     .decision-item {
+       background: #21262d;
+       border-left: 3px solid #00d4aa;
+       padding: 12px 15px;
+       margin: 10px 0;
+       border-radius: 0 6px 6px 0;
+     }
+
+     .decision-item h4 {
+       color: #58a6ff;
+       font-size: 11pt;
+       margin-bottom: 5px;
+     }
+
+     .decision-item p {
+       font-size: 9pt;
+       color: #8b949e;
+     }
+
+     .decision-item .result {
+       color: #00d4aa;
+       font-weight: 600;
+     }
+
+     .slide-number {
+       position: absolute;
+       bottom: 10mm;
+       right: 15mm;
+       font-size: 10pt;
+       color: #8b949e;
+     }
+   </style>
+ </head>
+ <body>
+   <!-- Slide 1: Title -->
+   <div class="slide title-slide">
+     <h1>SOCAR Historical Documents AI</h1>
+     <p class="subtitle">Intelligent OCR & RAG System for Oil & Gas Archives</p>
+     <p class="tagline">Transforming 28 Historical Documents into Searchable Knowledge</p>
+     <div class="team-info">
+       <p class="team-name">Team BeatByte</p>
+       <p class="team-members">Ulvi Bashirov | Samir Mehdiyev | Ismat Samadov</p>
+     </div>
+     <div class="slide-number">1 / 12</div>
+   </div>
+
+   <!-- Slide 2: Problem Statement -->
+   <div class="slide">
+     <h2><span class="icon">!</span> The Problem</h2>
+     <div class="problem-grid">
+       <div class="problem-card">
+         <div class="icon">PDF</div>
+         <h4>Inaccessible Archives</h4>
+         <p>Decades of valuable historical documents are locked in scanned PDFs that cannot be searched</p>
+       </div>
+       <div class="problem-card">
+         <div class="icon">ABC</div>
+         <h4>Multi-Language Barrier</h4>
+         <p>Documents in Azerbaijani, Russian, and English with complex Cyrillic text</p>
+       </div>
+       <div class="problem-card">
+         <div class="icon">TIME</div>
+         <h4>Time-Consuming Research</h4>
+         <p>Finding specific information by manual document review takes hours</p>
+       </div>
+     </div>
+     <p style="margin-top: 25px; text-align: center; font-size: 14pt; color: #ff6b6b;">
+       How can we unlock institutional knowledge trapped in historical documents?
+     </p>
+     <div class="slide-number">2 / 12</div>
+   </div>
+
+   <!-- Slide 3: Our Solution -->
+   <div class="slide">
+     <h2><span class="icon">*</span> Our Solution</h2>
+     <div class="solution-grid">
+       <div class="solution-card">
+         <h4>Vision-Language OCR</h4>
+         <p>The Llama-4-Maverick vision model extracts text from scanned documents with <span class="highlight-text">87.75% character accuracy</span>, including Cyrillic text</p>
+       </div>
+       <div class="solution-card">
+         <h4>Semantic Search</h4>
+         <p>BAAI/bge-large embeddings + Pinecone vector database enable instant retrieval across <span class="highlight-text">1,128 document chunks</span></p>
+       </div>
+       <div class="solution-card">
+         <h4>RAG-Powered Q&A</h4>
+         <p>Natural language questions answered with relevant context and <span class="highlight-text">source citations</span> for verification</p>
+       </div>
+       <div class="solution-card">
+         <h4>Production-Ready API</h4>
+         <p>FastAPI backend with Docker deployment, health monitoring, and interactive web interface</p>
+       </div>
+     </div>
+     <div class="slide-number">3 / 12</div>
+   </div>
+
+   <!-- Slide 4: Architecture -->
+   <div class="slide">
+     <h2><span class="icon">#</span> System Architecture</h2>
+     <div class="architecture">
+       <div class="arch-box">
+         <div class="title">Input</div>
+         <div class="tech">PDF Documents</div>
+       </div>
+       <span class="arrow">-></span>
+       <div class="arch-box highlight">
+         <div class="title">OCR Engine</div>
+         <div class="tech">Llama-4 Vision</div>
+       </div>
+       <span class="arrow">-></span>
+       <div class="arch-box">
+         <div class="title">Embeddings</div>
+         <div class="tech">BAAI/bge-large</div>
+       </div>
+       <span class="arrow">-></span>
+       <div class="arch-box highlight">
+         <div class="title">Vector DB</div>
+         <div class="tech">Pinecone Cloud</div>
+       </div>
+       <span class="arrow">-></span>
+       <div class="arch-box">
+         <div class="title">LLM</div>
+         <div class="tech">Llama-4 17B</div>
+       </div>
+     </div>
+     <div class="two-col" style="margin-top: 20px;">
+       <div class="col">
+         <h3>OCR Pipeline</h3>
+         <div class="flow">
+           <div class="flow-row">
+             <div class="flow-box">PDF Upload</div>
+             <span class="flow-arrow">-></span>
+             <div class="flow-box">PyMuPDF (100 DPI)</div>
+             <span class="flow-arrow">-></span>
+             <div class="flow-box primary">Vision LLM</div>
+           </div>
+           <div class="flow-row">
+             <div class="flow-box">Image Detection</div>
+             <span class="flow-arrow">-></span>
+             <div class="flow-box">Markdown Output</div>
+           </div>
+         </div>
+       </div>
+       <div class="col">
+         <h3>RAG Pipeline</h3>
+         <div class="flow">
+           <div class="flow-row">
+             <div class="flow-box">User Question</div>
+             <span class="flow-arrow">-></span>
+             <div class="flow-box">Embed Query</div>
+             <span class="flow-arrow">-></span>
+             <div class="flow-box primary">Top-3 Retrieval</div>
+           </div>
+           <div class="flow-row">
+             <div class="flow-box">Context Building</div>
+             <span class="flow-arrow">-></span>
+             <div class="flow-box">LLM + Citations</div>
+           </div>
+         </div>
+       </div>
+     </div>
+     <div class="slide-number">4 / 12</div>
+   </div>
+
+   <!-- Slide 5: Technology Stack -->
+   <div class="slide">
+     <h2><span class="icon">+</span> Technology Stack</h2>
+     <div class="tech-stack">
+       <div class="tech-item">
+         <span class="icon">L</span>
+         <div>
+           <div class="name">Llama-4-Maverick 17B</div>
+           <div class="desc">Vision & Language Model</div>
+         </div>
+       </div>
+       <div class="tech-item">
+         <span class="icon">B</span>
+         <div>
+           <div class="name">BAAI/bge-large-en</div>
+           <div class="desc">1024-dim Embeddings</div>
+         </div>
+       </div>
+       <div class="tech-item">
+         <span class="icon">P</span>
+         <div>
+           <div class="name">Pinecone Cloud</div>
+           <div class="desc">Vector Database</div>
+         </div>
+       </div>
+       <div class="tech-item">
+         <span class="icon">F</span>
+         <div>
+           <div class="name">FastAPI</div>
+           <div class="desc">Async REST API</div>
+         </div>
+       </div>
+       <div class="tech-item">
+         <span class="icon">M</span>
+         <div>
+           <div class="name">PyMuPDF</div>
+           <div class="desc">PDF Processing</div>
+         </div>
+       </div>
+       <div class="tech-item">
+         <span class="icon">D</span>
+         <div>
+           <div class="name">Docker</div>
+           <div class="desc">Containerization</div>
+         </div>
+       </div>
+     </div>
+     <div style="margin-top: 25px;">
+       <h3>API Endpoints</h3>
+       <div class="endpoint">
+         <span class="method post">POST</span>
+         <span class="path">/ocr</span>
+         <div class="desc">Extract text from uploaded PDF with image detection</div>
+       </div>
+       <div class="endpoint">
+         <span class="method post">POST</span>
+         <span class="path">/llm</span>
+         <div class="desc">RAG-based Q&A with source citations</div>
+       </div>
+       <div class="endpoint">
+         <span class="method get">GET</span>
+         <span class="path">/health</span>
+         <div class="desc">Service health check and vector count</div>
+       </div>
+     </div>
+     <div class="slide-number">5 / 12</div>
+   </div>
+
+   <!-- Slide 6: Benchmark Results -->
+   <div class="slide">
+     <h2><span class="icon">%</span> Benchmark Results</h2>
+     <p style="margin-bottom: 15px;">We benchmarked <span class="highlight-text">3 OCR models</span>, <span class="highlight-text">7 RAG configurations</span>, and <span class="highlight-text">3 LLMs</span> to optimize performance</p>
+
+     <h3>OCR Model Comparison</h3>
+     <table class="comparison-table">
+       <tr>
+         <th>Model</th>
+         <th>Character Success Rate</th>
+         <th>Word Success Rate</th>
+         <th>Time (12 pages)</th>
+         <th>Type</th>
+       </tr>
+       <tr>
+         <td>GPT-4.1</td>
+         <td>88.12%</td>
+         <td>67.44%</td>
+         <td>199s</td>
+         <td><span class="badge closed">Closed</span></td>
+       </tr>
+       <tr>
+         <td class="winner">Llama-4-Maverick 17B [Selected]</td>
+         <td class="winner">87.75%</td>
+         <td class="winner">61.91%</td>
+         <td class="winner">75s</td>
+         <td><span class="badge open">Open</span></td>
+       </tr>
+       <tr>
+         <td>Phi-4-multimodal</td>
+         <td colspan="3" style="color: #ff6b6b;">Failed</td>
+         <td><span class="badge open">Open</span></td>
+       </tr>
+     </table>
+     <p style="margin-top: 15px; color: #00d4aa;">
+       Selected Llama-4: character success rate only 0.37 percentage points below GPT-4.1, but <strong>2.7x faster</strong> and <strong>open-source</strong>
+     </p>
+     <div class="slide-number">6 / 12</div>
+   </div>
+
+   <!-- Slide 7: RAG Optimization -->
+   <div class="slide">
+     <h2><span class="icon">@</span> RAG Optimization Results</h2>
+     <table class="comparison-table">
+       <tr>
+         <th>Configuration</th>
+         <th>Answer Quality</th>
+         <th>Citation Rate</th>
+         <th>Response Time</th>
+       </tr>
+       <tr>
+         <td class="winner">Citation-focused + Vanilla k3 [Selected]</td>
+         <td class="winner">55.67%</td>
+         <td class="winner">73.33%</td>
+         <td class="winner">3.61s</td>
+       </tr>
+       <tr>
+         <td>Few-shot + Vanilla k3</td>
+         <td>45.70%</td>
+         <td>40.00%</td>
+         <td>2.17s</td>
+       </tr>
+       <tr>
+         <td>Baseline + Vanilla k3</td>
+         <td>39.65%</td>
+         <td>20.00%</td>
+         <td>2.28s</td>
+       </tr>
+       <tr>
+         <td>MMR Retrieval</td>
+         <td>34.60%</td>
+         <td>6.67%</td>
+         <td>2.53s</td>
+       </tr>
+     </table>
+
+     <div class="decision-list">
+       <div class="decision-item">
+         <h4>Key Insight: Simple Beats Complex</h4>
+         <p>Vanilla retrieval beats MMR retrieval by <span class="result">+21 points</span> in answer quality. Top-3 beats Top-5 by <span class="result">+20%</span></p>
+       </div>
+       <div class="decision-item">
+         <h4>Citation-Focused Prompting</h4>
+         <p>Custom Azerbaijani prompt improves answer quality by <span class="result">+16 points</span> and citation rate by <span class="result">+53 points</span> over the baseline prompt</p>
+       </div>
+     </div>
+     <div class="slide-number">7 / 12</div>
+   </div>
+
+   <!-- Slide 8: Performance Metrics -->
+   <div class="slide">
+     <h2><span class="icon">^</span> Performance Metrics</h2>
+     <div class="stats-grid">
+       <div class="stat-card">
+         <div class="number">87.75%</div>
+         <div class="label">OCR Accuracy</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">55.67%</div>
+         <div class="label">Answer Quality</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">73.33%</div>
+         <div class="label">Citation Rate</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">3.6s</div>
+         <div class="label">Response Time</div>
+       </div>
+     </div>
+
+     <h3 style="margin-top: 25px;">Estimated Hackathon Score</h3>
+     <div class="score-breakdown">
+       <div class="score-item">
+         <span class="score-label">OCR Quality (50%)</span>
+         <div class="score-bar-container">
+           <div class="score-bar" style="width: 87.75%;"></div>
+         </div>
+         <span class="score-value">43.9 / 50</span>
+       </div>
+       <div class="score-item">
+         <span class="score-label">LLM Quality (30%)</span>
+         <div class="score-bar-container">
+           <div class="score-bar" style="width: 55.67%;"></div>
+         </div>
+         <span class="score-value">16.7 / 30</span>
+       </div>
+       <div class="score-item">
+         <span class="score-label">Architecture (20%)</span>
+         <div class="score-bar-container">
+           <div class="score-bar" style="width: 100%;"></div>
+         </div>
+         <span class="score-value">20 / 20</span>
+       </div>
+       <div class="score-item" style="border-top: 2px solid #00d4aa; padding-top: 12px; margin-top: 8px;">
+         <span class="score-label" style="color: #00d4aa; font-weight: 700;">TOTAL SCORE</span>
+         <div class="score-bar-container">
+           <div class="score-bar" style="width: 80.6%;"></div>
+         </div>
+         <span class="score-value" style="font-size: 14pt;">80.6 / 100</span>
+       </div>
+     </div>
+     <div class="slide-number">8 / 12</div>
+   </div>
+
+   <!-- Slide 9: Key Technical Decisions -->
+   <div class="slide">
+     <h2><span class="icon">&</span> Key Technical Decisions</h2>
+     <div class="two-col">
+       <div class="col">
+         <h3 style="color: #00d4aa;">What We Did</h3>
+         <ul>
+           <li><strong>Open-source Llama</strong> over proprietary GPT-4.1</li>
+           <li><strong>Top-3 retrieval</strong> - more context confused the LLM</li>
+           <li><strong>Vanilla retrieval</strong> - simple beats complex reranking</li>
+           <li><strong>Citation-focused prompt</strong> in Azerbaijani</li>
+           <li><strong>BAAI embeddings</strong> - 25% better than multilingual</li>
+           <li><strong>600-char chunks</strong> with 100-char overlap</li>
+         </ul>
+       </div>
+       <div class="col">
+         <h3 style="color: #ff6b6b;">What We Avoided</h3>
+         <ul>
+           <li><strong>MMR/Reranking</strong> - 21 points lower answer quality</li>
+           <li><strong>Top-5+ retrieval</strong> - information overload</li>
+           <li><strong>Few-shot prompting</strong> - inconsistent results</li>
+           <li><strong>Multilingual embeddings</strong> - underperformed</li>
+           <li><strong>Complex architectures</strong> - kept it simple</li>
+           <li><strong>Closed-source models</strong> - for transparency</li>
+         </ul>
+       </div>
+     </div>
+     <div style="margin-top: 20px; text-align: center; padding: 15px; background: rgba(0, 212, 170, 0.1); border-radius: 8px; border: 1px solid #00d4aa;">
+       <p style="font-size: 12pt; color: #00d4aa;">
+         "Every decision was validated through rigorous benchmarking across 3 Jupyter notebooks"
+       </p>
+     </div>
+     <div class="slide-number">9 / 12</div>
+   </div>
+
+   <!-- Slide 10: Demo Features -->
+   <div class="slide">
+     <h2><span class="icon">></span> Live Demo Features</h2>
+     <div class="demo-features">
+       <div class="demo-feature">
+         <span class="icon">[^]</span>
+         <div>
+           <h4>PDF Upload & OCR</h4>
+           <p>Drag & drop any PDF to extract text with image detection. Results in markdown format.</p>
+         </div>
+       </div>
+       <div class="demo-feature">
+         <span class="icon">[?]</span>
+         <div>
+           <h4>Interactive Q&A Chat</h4>
+           <p>Ask questions in Azerbaijani, Russian, or English. Get answers with source citations.</p>
+         </div>
+       </div>
+       <div class="demo-feature">
+         <span class="icon">[i]</span>
+         <div>
+           <h4>Source Citations</h4>
+           <p>Every answer includes document name, page number, and relevant excerpt for verification.</p>
+         </div>
+       </div>
+       <div class="demo-feature">
+         <span class="icon">[=]</span>
+         <div>
+           <h4>Swagger Documentation</h4>
+           <p>Full API documentation at /docs with interactive testing capabilities.</p>
+         </div>
+       </div>
+     </div>
+     <div style="margin-top: 30px; text-align: center;">
+       <p style="font-size: 14pt; color: #58a6ff;">
+         Web UI: <strong>localhost:8000</strong> | API Docs: <strong>/docs</strong>
+       </p>
+     </div>
+     <div class="slide-number">10 / 12</div>
+   </div>
+
+   <!-- Slide 11: What We Built -->
+   <div class="slide">
+     <h2><span class="icon">=</span> Deliverables</h2>
+     <div class="stats-grid">
+       <div class="stat-card">
+         <div class="number">28</div>
+         <div class="label">PDFs Processed</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">1,128</div>
+         <div class="label">Vector Chunks</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">3</div>
+         <div class="label">Benchmark Notebooks</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">100%</div>
+         <div class="label">Open Source</div>
+       </div>
+     </div>
+     <div class="two-col" style="margin-top: 20px;">
+       <div class="col">
+         <h3>Code & Infrastructure</h3>
+         <ul>
+           <li>FastAPI application (505 lines)</li>
+           <li>Data ingestion pipeline</li>
+           <li>Parallel processing (4x speedup)</li>
+           <li>Docker + Docker Compose</li>
+           <li>Health monitoring</li>
+           <li>Interactive web UI</li>
+         </ul>
+       </div>
+       <div class="col">
+         <h3>Documentation & Analysis</h3>
+         <ul>
+           <li>8 comprehensive markdown docs</li>
+           <li>VLM OCR benchmark notebook</li>
+           <li>RAG optimization notebook</li>
+           <li>LLM comparison notebook</li>
+           <li>Sample questions & answers</li>
+           <li>Deployment guide</li>
+         </ul>
+       </div>
+     </div>
+     <div class="slide-number">11 / 12</div>
+   </div>
+
+   <!-- Slide 12: Thank You -->
+   <div class="slide final-slide">
+     <h2>Thank You!</h2>
+     <p style="font-size: 16pt; color: #c9d1d9; margin-bottom: 8px;">
+       SOCAR Historical Documents AI System
+     </p>
+     <p style="font-size: 11pt; color: #8b949e; margin-bottom: 15px;">
+       Transforming archives into accessible, searchable knowledge
+     </p>
+     <div style="margin-bottom: 20px;">
+       <p style="font-size: 14pt; color: #00d4aa; font-weight: 700; margin-bottom: 8px;">Team BeatByte</p>
+       <p style="font-size: 11pt; color: #58a6ff;">Ulvi Bashirov | Samir Mehdiyev | Ismat Samadov</p>
+     </div>
+     <div class="stats-grid" style="max-width: 600px;">
+       <div class="stat-card">
+         <div class="number">87.75%</div>
+         <div class="label">OCR Accuracy</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">80.6</div>
+         <div class="label">Est. Score / 100</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">100%</div>
+         <div class="label">Open Source</div>
+       </div>
+       <div class="stat-card">
+         <div class="number">3.6s</div>
+         <div class="label">Response Time</div>
+       </div>
+     </div>
+     <div style="margin-top: 25px;">
+       <p style="font-size: 18pt; color: #00d4aa; font-weight: 700;">
+         Questions? Let's Demo!
+       </p>
+     </div>
+     <div class="slide-number">12 / 12</div>
+   </div>
+ </body>
+ </html>
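
Slide 9 of the deck lists "600-char chunks with 100-char overlap" as a design decision. A minimal Python sketch of fixed-size character chunking with overlap is given here for illustration. The function name `chunk_text` and its parameters are hypothetical and are not taken from the repository's ingestion pipeline, which may split text differently (for example on sentence or page boundaries).

```python
def chunk_text(text: str, chunk_size: int = 600, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: the defaults mirror the figures quoted on slide 9,
    but the project's actual ingestion code is an unknown here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # With the defaults, each window starts 500 characters after the previous one.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "x" * 1300
    print([len(piece) for piece in chunk_text(sample)])
```

With the defaults, consecutive chunks share their last and first 100 characters, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.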