## Credits

Created by Lyon28

<!--
HEADER SECTION
-->

<div align="center">
  <picture>
    <source
      media="(prefers-color-scheme: dark)"
      srcset="https://huggingface.co/Lyon28/caca-10m/resolve/main/logo-dark.png"
      type="image/png"
    />
    <source
      media="(prefers-color-scheme: light)"
      srcset="https://huggingface.co/Lyon28/caca-10m/resolve/main/logo-light.png"
      type="image/png"
    />
    <img
      src="https://huggingface.co/Lyon28/caca-10m/resolve/main/logo.png"
      alt="Caca Transformers Logo"
      title="Caca - Modern Transformer Architecture"
      width="60%"
      height="auto"
      loading="lazy"
    />
  </picture>
</div>

<!--
BADGES SECTION
-->

<div align="center">

  <!-- Social Links -->
  <p>
    <a href="https://huggingface.co/Lyon28" target="_blank" rel="noopener noreferrer">
      <img
        src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Lyon28-ffc107?color=ffc107&logoColor=white"
        alt="Hugging Face Profile"
        title="Visit Hugging Face Profile"
      />
    </a>
  </p>

  <!-- License Badge -->
  <p>
    <a
      href="https://github.com/Lyon-28/caca-transformers?tab=Apache-2.0-1-ov-file"
      target="_blank"
      rel="noopener noreferrer"
      title="Apache 2.0 License"
    >
      <img
        src="https://img.shields.io/badge/License-Apache%202.0-blue.svg"
        alt="License: Apache 2.0"
        height="20"
      />
    </a>
  </p>

  <!-- PyPI Badge -->
  <p>
    <a href="https://pypi.org/project/caca-transformers/" target="_blank" rel="noopener noreferrer">
      <img
        src="https://img.shields.io/pypi/v/caca-transformers?color=blue&label=PyPI&logo=pypi&logoColor=white"
        alt="PyPI Version"
        title="View on PyPI"
      />
    </a>
  </p>

  <!-- GitHub Stars -->
  <p>
    <a href="https://github.com/Lyon-28/caca-transformers" target="_blank" rel="noopener noreferrer">
      <img
        src="https://img.shields.io/github/stars/Lyon-28/caca-transformers?style=social&label=Star&maxAge=2592000"
        alt="GitHub Stars"
        title="Star on GitHub"
      />
    </a>
  </p>

  <!-- Description -->
  <p>
    <strong>A Modern Transformer Architecture with GQA, RoPE, SwiGLU &amp; Flash Attention</strong>
  </p>

</div>

<!-- Horizontal Rule -->
<hr/>

<!--
WARNING/ALERT SECTION
-->

<blockquote>
  <p>
    <strong>🔬 RESEARCH PROJECT</strong>
  </p>
  <p>
    <strong>⚠️ WARNING: UNTRAINED MODEL</strong>
  </p>
  <p>
    This model has random weights and requires pretraining before use.
    It cannot be used for inference out of the box!<br/>
    It is an architecture experiment and has not been validated for production use.
  </p>
</blockquote>

<!--
MAIN TITLE
-->

<h1 align="center">
  🐣 CACA-10M - TINY
</h1>

<p align="center">
  <strong>🔒 10,485,760 Parameters (0.01B)</strong>
</p>

<p align="center">
  <strong>💾 ~0.02GB (FP16) / ~0.04GB (FP32)</strong>
</p>

<p align="center">
  <strong>📏 8,192 Context Length</strong>
</p>

<p align="center">
  <strong>🎯 Use Case:</strong> Rapid experimentation, edge devices, learning
</p>

<p align="center">
  <strong>🖥️ Recommended GPU:</strong> GTX 1060 6GB or better
</p>

<!--
FEATURES SECTION
-->

<h2>🎯 Key Features</h2>

<p>
  The Caca architecture combines the best modern techniques from a range of state-of-the-art models:
</p>

<ul>
  <li>
    <strong>🔄 Grouped Query Attention (GQA)</strong> -
    An optimal balance between inference speed and output quality
  </li>
  <li>
    <strong>🌀 RoPE (Rotary Positional Embeddings)</strong> -
    Positional encoding proven effective for long sequences
  </li>
  <li>
    <strong>⚡ SwiGLU Activation</strong> -
    Superior performance over ReLU/GELU in language modeling
  </li>
  <li>
    <strong>📊 RMSNorm</strong> -
    More efficient and stable normalization than LayerNorm
  </li>
  <li>
    <strong>🪟 Sliding Window Attention</strong> -
    Memory efficiency for long context windows (4,096 tokens)
  </li>
  <li>
    <strong>💫 Flash Attention Compatible</strong> -
    Optional Flash Attention support for a 2-4x speedup
  </li>
  <li>
    <strong>🔄 KV Cache Support</strong> -
    Efficient autoregressive generation with caching
  </li>
</ul>
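The GQA idea above can be sketched in a few lines. This is an illustrative NumPy stand-in using this model's head counts (8 query heads sharing 2 KV heads), not the actual caca_transformers implementation; all names and shapes here are for demonstration only.

```python
import numpy as np

# Illustrative GQA shapes: 8 query heads, 2 KV heads -> each KV head is
# shared by a group of 4 query heads (the 4:1 ratio from the spec table).
n_heads, n_kv_heads, head_dim, seq_len = 8, 2, 32, 16
group = n_heads // n_kv_heads

rng = np.random.default_rng(0)
q = rng.standard_normal((n_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

# Repeat each KV head so it lines up with its group of query heads
k_exp = np.repeat(k, group, axis=0)  # (8, seq_len, head_dim)
v_exp = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention, per head
scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_exp  # (8, seq_len, head_dim)
```

Because K and V are stored for only 2 heads instead of 8, the KV cache shrinks by the same 4:1 factor, which is where the inference speedup comes from.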

<!--
TABLE SECTION - with all attributes
-->

<h2 align="center">🏗️ Technical Specifications</h2>

<div align="center">

<table>
  <caption>
    <strong>Model Configuration Parameters</strong>
  </caption>
  <colgroup>
    <col style="width: 50%"/>
    <col style="width: 50%"/>
  </colgroup>
  <thead>
    <tr>
      <th align="left">Parameter</th>
      <th align="right">Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left"><strong>Total Parameters</strong></td>
      <td align="right"><code>10,485,760</code> (~0.01B)</td>
    </tr>
    <tr>
      <td align="left"><strong>Vocab Size</strong></td>
      <td align="right"><code>50,000</code></td>
    </tr>
    <tr>
      <td align="left"><strong>Hidden Size</strong></td>
      <td align="right"><code>256</code></td>
    </tr>
    <tr>
      <td align="left"><strong>Num Layers</strong></td>
      <td align="right"><code>8</code></td>
    </tr>
    <tr>
      <td align="left"><strong>Attention Heads</strong></td>
      <td align="right"><code>8</code></td>
    </tr>
    <tr>
      <td align="left"><strong>KV Heads (GQA)</strong></td>
      <td align="right"><code>2</code></td>
    </tr>
    <tr>
      <td align="left"><strong>GQA Ratio</strong></td>
      <td align="right"><code>4:1</code></td>
    </tr>
    <tr>
      <td align="left"><strong>Intermediate Size</strong></td>
      <td align="right"><code>682</code></td>
    </tr>
    <tr>
      <td align="left"><strong>Context Length</strong></td>
      <td align="right"><code>8,192</code> tokens</td>
    </tr>
    <tr>
      <td align="left"><strong>Sliding Window</strong></td>
      <td align="right"><code>4,096</code> tokens</td>
    </tr>
    <tr>
      <td align="left"><strong>RoPE Theta</strong></td>
      <td align="right"><code>10,000</code></td>
    </tr>
    <tr>
      <td align="left"><strong>Memory (FP16)</strong></td>
      <td align="right">~<code>0.02</code> GB</td>
    </tr>
    <tr>
      <td align="left"><strong>Memory (FP32)</strong></td>
      <td align="right">~<code>0.04</code> GB</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <td colspan="2" align="center">
        <small><em>All values are approximate and may vary based on implementation</em></small>
      </td>
    </tr>
  </tfoot>
</table>

</div>
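The memory rows follow directly from the parameter count; a quick back-of-the-envelope check (weights only, 2 bytes per parameter in FP16 and 4 in FP32; activations and the KV cache come on top):

```python
# Weights-only memory estimate for the 10M configuration above
params = 10_485_760

fp16_gb = params * 2 / 1024**3  # 2 bytes per parameter
fp32_gb = params * 4 / 1024**3  # 4 bytes per parameter
print(f"FP16: ~{fp16_gb:.2f} GB, FP32: ~{fp32_gb:.2f} GB")
```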

<!--
DETAILS/SUMMARY - Collapsible sections
-->

<h2>📚 Model Family</h2>

<p>We provide a range of model sizes for different use cases:</p>

<details open>
  <summary>
    <strong>🐣 Tiny &amp; Small Models (10M - 500M)</strong>
  </summary>

  <p>Suitable for: rapid experimentation, edge devices, learning</p>

  <table>
    <thead>
      <tr>
        <th>Model</th>
        <th>Params</th>
        <th>Hidden</th>
        <th>Layers</th>
        <th>Heads</th>
        <th>KV Heads</th>
        <th>Context</th>
        <th>Memory (FP16)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>
          <a href="https://huggingface.co/Lyon28/caca-10m" target="_blank">caca-10M</a>
        </td>
        <td>10M</td>
        <td>256</td>
        <td>8</td>
        <td>8</td>
        <td>2</td>
        <td>8K</td>
        <td>~0.02 GB</td>
      </tr>
      <tr>
        <td>
          <a href="https://huggingface.co/Lyon28/caca-50m" target="_blank">caca-50M</a>
        </td>
        <td>50M</td>
        <td>512</td>
        <td>12</td>
        <td>8</td>
        <td>2</td>
        <td>8K</td>
        <td>~0.1 GB</td>
      </tr>
      <tr>
        <td>
          <a href="https://huggingface.co/Lyon28/caca-100m" target="_blank">caca-100M</a>
        </td>
        <td>100M</td>
        <td>768</td>
        <td>12</td>
        <td>12</td>
        <td>3</td>
        <td>8K</td>
        <td>~0.2 GB</td>
      </tr>
    </tbody>
  </table>

</details>

<details>
  <summary>
    <strong>🦅 Medium Models (1B - 10B)</strong>
  </summary>

  <p>Suitable for: production applications, fine-tuning, domain-specific tasks</p>

  <p><em>Click to expand for model list...</em></p>

</details>

<!--
CODE BLOCKS with syntax highlighting
-->

<h2>🚀 Quick Start</h2>

<h3>💻 Installation</h3>

<pre><code class="language-bash"># Install with xFormers for a ~3x speedup
pip install caca-transformers[xformers]

# Or manually
pip install caca-transformers
pip install xformers

# For Flash Attention (~4x speedup) - optional
pip install flash-attn --no-build-isolation
</code></pre>

<h3>Basic Usage</h3>

<pre><code class="language-python">from caca_transformers import CacaForCausalLM, CacaConfig
import torch

# Load the model
model = CacaForCausalLM.from_pretrained("Lyon28/caca-10m")

# Or build it from scratch
config = CacaConfig()
model = CacaForCausalLM(config)

# Model info
print(f"Parameters: {model.num_parameters():,}")
</code></pre>

<!--
INLINE ELEMENTS
-->

<h2>💡 Tips &amp; Best Practices</h2>

<p>
  Use <kbd>Ctrl</kbd> + <kbd>C</kbd> to copy code.
  A <code>learning_rate</code> of <mark>3e-4</mark> is a good starting point for pretraining.
  RMSNorm formula: <code>x / RMS(x) * γ</code> where
  RMS(x) = <code>sqrt(mean(x<sup>2</sup>) + ε)</code>
</p>
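The formula above fits in a minimal NumPy sketch (the gamma and epsilon defaults here are illustrative, not values taken from the caca_transformers code):

```python
import numpy as np

def rmsnorm(x, gamma=1.0, eps=1e-6):
    # x / RMS(x) * gamma, where RMS(x) = sqrt(mean(x^2) + eps)
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

x = np.array([1.0, 2.0, 3.0, 4.0])
y = rmsnorm(x)  # after normalization, the RMS of y is ~1
```

Unlike LayerNorm, there is no mean subtraction and no bias term, which is what makes RMSNorm cheaper per token.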

<p>
  <small>
    <em>Note: All values are approximate and may vary</em>
  </small>
</p>

<p>
  Reference: <cite>Attention is All You Need</cite> (Vaswani et al., 2017)
</p>

<!--
MIXED CONTENT TABLE
-->

<h2>📊 Comparison with Other Architectures</h2>

<table>
  <thead>
    <tr>
      <th rowspan="2">Feature</th>
      <th colspan="2">Decoder-Only</th>
      <th colspan="2">Others</th>
    </tr>
    <tr>
      <th>Caca</th>
      <th>LLaMA 2</th>
      <th>GPT-3</th>
      <th>BERT</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>GQA</td>
      <td align="center">✅</td>
      <td align="center">✅</td>
      <td align="center">❌</td>
      <td align="center">❌</td>
    </tr>
    <tr>
      <td>RoPE</td>
      <td align="center">✅</td>
      <td align="center">✅</td>
      <td align="center">❌</td>
      <td align="center">❌</td>
    </tr>
    <tr>
      <td>Open Source</td>
      <td align="center">✅</td>
      <td align="center">✅</td>
      <td align="center">❌</td>
      <td align="center">✅</td>
    </tr>
  </tbody>
</table>

<!--
FOOTER SECTION
-->

<hr/>

<div align="center">

<h2>🌟 Star History</h2>

<a href="https://star-history.com/#Lyon-28/caca-transformers&Date" target="_blank" rel="noopener noreferrer">
  <img
    src="https://api.star-history.com/svg?repos=Lyon-28/caca-transformers&type=Date"
    alt="Star History Chart"
    title="View Star History"
    width="100%"
    loading="lazy"
  />
</a>

</div>

<hr/>

<div align="center">

<p>
  <strong>🚀 Built with ❤️ for the Indonesian AI Community</strong>
</p>

<p>
  <a href="https://github.com/Lyon-28/caca-transformers" target="_blank" rel="noopener noreferrer">GitHub</a>
  •
  <a href="https://huggingface.co/Lyon28" target="_blank" rel="noopener noreferrer">Hugging Face</a>
</p>

<p>
  <small>
    <strong>Created by
      <a href="https://huggingface.co/Lyon28" target="_blank" rel="noopener noreferrer">Lyon</a>
    </strong>
    <br/>
    Apache 2.0 License | 2025
  </small>
</p>

</div>

<!--
TODO:
- Add more model variants
- Include benchmark results
- Add training scripts
-->