## 📋 Description
Caca is a latest-generation Large Language Model (LLM) architecture that combines several state-of-the-art deep-learning techniques. It is designed with a focus on efficiency, scalability, and high performance.

Caca is an open-source Indonesian LLM experiment, built from scratch by one person, step by step. It is not trying to compete with anyone; the goal is simply to explore what can be done with a limited budget, unlimited passion, and a collaborative mindset. If it ends up being useful to others, alhamdulillah. If not, it is still fun.

This is an exploratory project, so if it fails, that is part of the learning process. If it succeeds, that is a bonus.
## 📊 Comparison with Other Architectures
| Feature | Caca | LLaMA 2 | Mistral | IndoGPT | GPT-2 |
|---|---|---|---|---|---|
| **🏗️ Core Architecture** | | | | | |
| Status | ⚠️ Untrained | ✅ Trained | ✅ Trained | ✅ Trained | ✅ Trained |
| Model Sizes | 60+ variants, 1M to 1T (hopefully) | 7B / 13B / 70B | 7B | 117M | 117M - 1.5B |
| Architecture Type | Decoder-only | Decoder-only | Decoder-only | Decoder-only | Decoder-only |
| Activation Function | SwiGLU | SwiGLU | SwiGLU | GELU | GELU |
| Normalization | RMSNorm | RMSNorm | RMSNorm | LayerNorm | LayerNorm |
| Release Year | 2025 | 2023 | 2023 | 2020 | 2019 |
| **👁️ Attention Mechanism** | | | | | |
| Attention Type | GQA (configurable) | MHA (7B/13B), GQA (70B) | GQA | MHA | MHA |
| Position Encoding | RoPE + variants | RoPE | RoPE | Learned | Learned |
| Max Context | 8K - 16K | 4K | 32K | 1K | 1K |
| Sliding Window | ✅ Optional | ❌ | ✅ 4K window | ❌ | ❌ |
| Flash Attention | ✅ Flash Attn 2 | ✅ Supported | ✅ Supported | ❌ | ❌ |
| KV Cache Efficiency | 75% reduction (GQA 4:1) | None on 7B/13B; 87.5% on 70B (GQA 8:1) | 75% reduction | No optimization | No optimization |
| **🚀 Advanced Features** | | | | | |
| Mixture of Experts | ✅ Optional (TopK + ExpertChoice) | ❌ | ❌ (Mixtral is a separate model) | ❌ | ❌ |
| Multimodal | ✅ Native vision + audio | ❌ (LLaVA is separate) | ❌ | ❌ | ❌ |
| Config Flexibility | ✅ 50+ parameters, every feature toggleable | ⚠️ Limited | ⚠️ Limited | ❌ Fixed | ❌ Fixed |
| Layer Scale | ✅ Optional | ❌ | ❌ | ❌ | ❌ |
| Stochastic Depth | ✅ Optional | ❌ | ❌ | ❌ | ❌ |
| **⚡ Performance & Optimization** | | | | | |
| Inference Speed (7B, A100) | ⚠️ TBD (untrained) | ~75 tok/s | ~78 tok/s | ~150 tok/s (much smaller model) | ~120 tok/s (much smaller model) |
| Memory Footprint (7B, BF16) | ~14 GB (with GQA) | ~14 GB | ~14 GB | ~500 MB | ~500 MB |
| Gradient Checkpointing | ✅ Full support | ✅ Supported | ✅ Supported | ⚠️ Manual | ⚠️ Manual |
| Quantization | ✅ 8-bit / 4-bit built in | ⚠️ Via external tools | ⚠️ Via external tools | ❌ Limited support | ❌ Limited support |
| Multi-Backend Support | ✅ 4 backends (Flash / xFormers / SDPA / Standard) | ⚠️ 2 backends | ⚠️ 2 backends | ❌ Standard only | ❌ Standard only |
| **🌏 Language Support** | | | | | |
| Indonesian | ⚠️ Not yet trained (designed for ID) | ❌ Poor (English-heavy) | ❌ Poor (English-heavy) | ✅ Native | ❌ Minimal |
| English | ⚠️ TBD (bilingual design) | ✅ Excellent | ✅ Excellent | ⚠️ Limited | ✅ Good |
| Training Data | ⚠️ To be trained (user's choice) | 2T tokens (English-heavy) | Unknown (English-heavy) | 23 GB Indonesian | 40 GB WebText |
| Vocab Size | 32K (configurable) | 32K | 32K | 50K | 50K |
| **👨‍💻 Developer Experience** | | | | | |
| Error Messages | ✅ Helpful, with solutions and debugging detail | ⚠️ Standard PyTorch | ⚠️ Standard PyTorch | ❌ Basic errors | ❌ Basic errors |
| Config Validation | ✅ Comprehensive, auto-checks conflicts | ⚠️ Basic | ⚠️ Basic | ❌ Minimal | ❌ Minimal |
| Documentation | ✅ Extensive (ID + EN, with examples) | ✅ Good (official docs) | ⚠️ Medium (community-driven) | ❌ Limited | ✅ Extensive (OpenAI docs) |
| Code Examples | ✅ 50+ examples, training to deployment | ✅ Many | ⚠️ Some | ❌ Few | ✅ Many |
| HuggingFace Integration | ✅ Full native, auto-registered | ✅ Official | ✅ Official | ✅ Available | ✅ Standard |
| **🌍 Availability & License** | | | | | |
| License | ✅ Apache 2.0 (fully permissive) | ⚠️ LLaMA 2 License (commercial OK) | ✅ Apache 2.0 | ✅ MIT | ✅ MIT |
| Commercial Use | ✅ Allowed, no restrictions | ✅ Allowed | ✅ Allowed | ✅ Allowed | ✅ Allowed |
| Weights Available | ❌ Not trained (architecture only) | ✅ All sizes (7B/13B/70B) | ✅ 7B | ✅ 117M | ✅ All sizes |
| Self-Hosting | ✅ Designed for it, full control | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Training Required | ❌ Yes, from scratch | ✅ No, ready to use | ✅ No, ready to use | ✅ No, ready to use | ✅ No, ready to use |
| **🎯 Use Cases** | | | | | |
| Production Ready | ❌ Not yet (after training) | ✅ Yes | ✅ Yes | ⚠️ Limited (too small) | ⚠️ Limited (outdated) |
| Research | ✅ Excellent (modular design) | ✅ Good | ✅ Good | ⚠️ Limited | ✅ Classic baseline |
| Indonesian NLP | ⚠️ After training (high potential) | ❌ Poor (needs fine-tuning) | ❌ Poor (needs fine-tuning) | ✅ Native (but limited) | ❌ Poor |
| Education | ✅ Excellent (learn modern LLMs) | ✅ Good | ⚠️ Medium | ✅ Good (simple architecture) | ✅ Classic (well-documented) |
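The KV-cache row above can be checked with simple arithmetic. The sketch below assumes a hypothetical 7B-class shape (32 layers, 128-dim heads, 8K context, BF16 cache); these numbers are illustrative, not Caca's actual config:

```python
# Back-of-envelope KV-cache size for MHA vs. GQA, illustrating the
# "75% reduction (GQA 4:1)" figure in the table. The model shape used
# here (32 layers, 32 query heads, head_dim 128) is an assumption.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # K and V each store one tensor of shape
    # (n_layers, n_kv_heads, seq_len, head_dim), hence the factor 2.
    return 2 * n_layers * n_kv_heads * seq_len * head_dim * bytes_per_elem

mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8,  head_dim=128, seq_len=8192)  # 4:1 grouping

print(f"MHA: {mha / 2**30:.1f} GiB")      # 4.0 GiB
print(f"GQA: {gqa / 2**30:.1f} GiB")      # 1.0 GiB
print(f"reduction: {1 - gqa / mha:.0%}")  # 75%
```

The cache grows linearly with context length, which is why GQA matters most at the 8K-16K contexts targeted above.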
**📝 Important notes:**
- Caca is a modern architecture that has not been trained yet; it needs training from scratch on an Indonesian dataset
- LLaMA 2 and Mistral are very strong in English but poor in Indonesian without fine-tuning
- IndoGPT is the only dedicated Indonesian LLM here, but its architecture is outdated (GPT-2 era)
- GPT-2 is included as a classic baseline: a proven but no longer modern architecture
**✨ Caca's unique strengths:**
- 🎯 Modular design: toggle 50+ features without rewriting code
- 🔧 Developer-friendly: helpful error messages plus config validation
- 🚀 Modern architecture: GQA + Flash Attention + SwiGLU + RMSNorm
- 🎨 Native multimodal: vision and audio built in (not an add-on)
- 📚 Extensive docs: Indonesian + English, with many examples
- ⚡ Optimization focus: 4 attention backends, auto-fallback, quantization-ready
- 🔬 Research-oriented: MoE, Mixture of Depths, Layer Scale, and more
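Of the modern components listed above, RMSNorm is the easiest to show in a few lines: unlike LayerNorm, it only rescales by the root-mean-square of the activations, with no mean subtraction and no bias. A minimal, dependency-free sketch (plain Python lists stand in for tensors; the `gain` argument plays the role of the learnable scale):

```python
import math

# Minimal RMSNorm sketch: normalize a vector to unit RMS, then apply
# an optional per-element learnable gain. Real implementations work on
# tensors along the hidden dimension; this is illustration only.

def rms_norm(x, gain=None, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    g = gain if gain is not None else [1.0] * len(x)
    return [gi * v / rms for gi, v in zip(g, x)]

out = rms_norm([1.0, -2.0, 3.0, -4.0])
print([round(v, 3) for v in out])  # [0.365, -0.73, 1.095, -1.461]
```

Dropping the mean-subtraction and bias saves a small amount of compute per layer, which is one reason RMSNorm replaced LayerNorm in the LLaMA-family architectures.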
**⚠️ Realistic limitations:**
- ❌ Not yet trained: output will be random until the model is trained
- ❌ No tokenizer yet: an Indonesian tokenizer must be trained separately
- ❌ Needs substantial resources: training a 7B model requires A100-class GPUs
- ❌ Untested: extensive evaluation is needed after training
- ❌ Small community: nowhere near the LLaMA/Mistral ecosystem
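The A100-class requirement in the list above follows from a common rule of thumb for mixed-precision Adam training: roughly 16 bytes per parameter (BF16 weights and gradients, FP32 master weights, and two FP32 Adam moment buffers), before even counting activations. A quick sketch of that arithmetic:

```python
# Rough training-memory estimate behind "training a 7B model requires
# A100-class GPUs". The 16 bytes/param breakdown is the standard
# mixed-precision Adam accounting: 2 (bf16 weights) + 2 (bf16 grads)
# + 4 (fp32 master weights) + 4 + 4 (fp32 Adam moments). Activations
# and optimizer sharding are ignored; this is an estimate, not a benchmark.

def adam_training_gb(n_params, bytes_per_param=16):
    return n_params * bytes_per_param / 1e9

print(f"{adam_training_gb(7e9):.0f} GB")  # 112 GB -> more than one 80 GB A100
```

Since 112 GB exceeds even an 80 GB A100, a from-scratch 7B run would need multiple GPUs with optimizer-state sharding (e.g. ZeRO/FSDP-style) on top of the gradient checkpointing and quantization support mentioned in the table.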