Upload README.md with huggingface_hub

README.md +195 -0

	@@ -0,0 +1,195 @@

+---
+language:
+  - id
+  - en
+tags:
+  - base-model
+  - pre-trained
+  - indonesian
+  - english
+  - tiny
+  - efficient
+  - moe
+  - foundation-model
+license: mit
+datasets: []
+metrics:
+  - loss
+pipeline_tag: text-generation
+---
+# TinyV4 — 11M Bilingual Base Model
+**TinyV4** is a compact **11 million parameter** bilingual (Indonesian & English) base model. Think of it as a solid foundation — pre-trained, ready to be fine-tuned for your specific downstream task.
+At just **58 MB**, it's small enough to run anywhere. Smart enough to be worth your time.
+## What is this?
+Most base models start at 100M+ parameters. Want to experiment with fine-tuning? You need a GPU. Want to iterate fast? Good luck.
+TinyV4 is different. **11M parameters** with a Mixture-of-Experts architecture — pre-trained on bilingual data so it already understands both Indonesian and English. You bring the task, it brings the foundation.
+## Why use TinyV4 as your base?
+| Reason | Why it matters |
+|---|---|
+| **11M params** | Fine-tune in minutes, not days |
+| **58 MB** | Fits anywhere — mobile, edge, browser |
+| **CPU-friendly** | No GPU? No problem |
+| **Bilingual** | Already understands ID + EN |
+| **MoE architecture** | Efficient capacity without the bloat |
+| **MIT license** | Permissive: commercial use and modification allowed |
+## Architecture
+| Component | Spec |
+|---|---|
+| Parameters | **11,034,955** |
+| Dimension | 128 |
+| Layers | 6 |
+| Attention Heads | 4 (Query), 4 (Index) |
+| MoE Experts | 4 routed + 1 shared |
+| Active Experts | 2 per token |
+| Vocab Size | 32,000 |
+| Max Sequence | 512 tokens |
+| File Size | 58 MB |
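As a rough sanity check on the table above, the sketch below works out how much of the parameter budget a token embedding would take. It assumes a single embedding matrix of shape vocab size by dimension; the README does not break the count down by component, so treat the split as an illustration. The last figure is a guess at why the file is larger than the float32 weights alone: it would fit an output head stored as a separate copy of the embedding, but nothing here confirms that.

```python
# Back-of-the-envelope parameter budget, using only numbers from the table.
# Assumption: one embedding matrix of shape (vocab_size, dim); the real
# per-component breakdown is not given in this README.
TOTAL_PARAMS = 11_034_955
VOCAB_SIZE = 32_000
DIM = 128

embedding_params = VOCAB_SIZE * DIM
embedding_share = embedding_params / TOTAL_PARAMS
fp32_megabytes = TOTAL_PARAMS * 4 / 1e6  # 4 bytes per float32 weight

# Guess, not a documented fact: if the output head were stored as a second,
# untied copy of the embedding, the float32 payload would grow to this size.
untied_mib = (TOTAL_PARAMS + embedding_params) * 4 / 2**20

print(f"embedding parameters:       {embedding_params:,}")
print(f"share of total:             {embedding_share:.1%}")
print(f"fp32 weights alone:         {fp32_megabytes:.1f} MB")
print(f"with a separate head (MiB): {untied_mib:.1f}")
```

Under these assumptions the embedding accounts for a bit over a third of all parameters, which is typical for very small models with a 32,000-token vocabulary.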
+Built with **Mixture-of-Experts (MoE)**, **Sinkhorn-Knopp load balancing**, **Multi-Token Prediction (MTP)**, and **Hierarchical Compressed Attention** — techniques typically reserved for models 100x larger. We just refused to believe you need billions of parameters to be useful.
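To make the expert counts in the table concrete, here is a minimal pure-Python sketch of top-2 routing over 4 routed experts plus one always-on shared expert. It is a generic illustration, not TinyV4's actual routing code: the names `route_token` and `moe_forward`, the softmax-then-renormalize gating, and the toy scalar "experts" are all assumptions. It also assumes the shared expert runs in addition to the 2 active routed experts, and it leaves out Sinkhorn-Knopp balancing entirely.

```python
import math

def route_token(router_logits, k=2):
    """Return [(expert_index, gate_weight)] for the top-k routed experts."""
    # Softmax over router logits (shifted by the max for numerical stability).
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and renormalize their gates to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    gate_sum = sum(probs[i] for i in top)
    return [(i, probs[i] / gate_sum) for i in top]

def moe_forward(x, router_logits, routed_experts, shared_expert, k=2):
    """Shared expert output plus gate-weighted outputs of the top-k routed experts."""
    out = shared_expert(x)
    for i, gate in route_token(router_logits, k):
        out += gate * routed_experts[i](x)
    return out

# Toy scalar experts standing in for feed-forward blocks (hypothetical).
routed = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
shared = lambda x: 0.5 * x

logits = [0.1, 2.0, -1.0, 1.5]
selection = route_token(logits)
result = moe_forward(1.0, logits, routed, shared)
print(selection)  # experts 1 and 3 have the largest logits, so they are selected
print(result)
```

Only the selected experts are evaluated for a given token, which is why an MoE layer can hold more parameters than it spends compute on per token.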
+## What can you fine-tune it for?
+TinyV4 is a blank canvas. Some ideas:
+- **Translation** (ID ↔ EN) — it already has bilingual foundations
+- **Text classification** — sentiment, topic, intent
+- **Story generation** — fine-tune on your own narrative dataset
+- **Chat / instruction following** — add conversation data
+- **Code generation** — yes, even at 11M, it can learn patterns
+- **Domain-specific tasks** — medical, legal, technical — your data, your model
+The point is: **you control the final model**. TinyV4 just gives you a running start.
+## Quick Start
+```bash
+pip install transformers safetensors torch
+```
+### Load the base model
+```python
+from transformers import AutoTokenizer, AutoModel
+# Load model & tokenizer (trust_remote_code=True because the architecture is custom)
+model = AutoModel.from_pretrained("ukung/tinyv4", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("ukung/tinyv4")
+# Tie embeddings (custom step for TinyV4)
+model.head.weight = model.embed.weight
+model.eval()
+print(f"Loaded: {sum(p.numel() for p in model.parameters()):,} params")
+```
+### Generate text (zero-shot)
+```python
+import torch
+@torch.no_grad()
+def generate(prompt, max_new_tokens=60, temperature=0.8, top_k=40):
+    input_ids = tokenizer.encode(prompt, return_tensors="pt")
+    for _ in range(max_new_tokens):
+        idx = input_ids[:, -512:]
+        logits, _, _ = model(idx)
+        logits = logits[:, -1, :] / temperature
+        v, _ = torch.topk(logits, top_k)
+        logits[logits < v[:, [-1]]] = float('-inf')
+        probs = torch.softmax(logits, dim=-1)
+        next_token = torch.multinomial(probs, 1)
+        input_ids = torch.cat([input_ids, next_token], dim=1)
+        if next_token.item() == tokenizer.eos_token_id:
+            break
+    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
+# Try it out
+print(generate("Once upon a time,"))
+print(generate("Pada suatu hari,"))
+```
+### Fine-tune for your task
+```python
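+import torch.nn.functional as F
+from torch.optim import AdamW
+from safetensors.torch import save_model
+def compute_your_loss(logits, batch):
+    # Example only: next-token cross-entropy for causal LM fine-tuning.
+    # Assumes logits of shape (batch, seq_len, vocab) and batch of token ids; swap in your task's loss.
+    return F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1))
+model.train()
+optimizer = AdamW(model.parameters(), lr=3e-4)
+# Your dataset, your task (`your_dataloader` is a placeholder yielding tensors of token ids)
+for batch in your_dataloader:
+    # mtp_logits and bal_loss are returned by the model but unused in this minimal loop
+    logits, mtp_logits, bal_loss = model(batch)
+    loss = compute_your_loss(logits, batch)
+    loss.backward()
+    optimizer.step()
+    optimizer.zero_grad()
+# Save your fine-tuned model
+# save_model handles the tied head/embed weights; save_file raises an error on tensors that share memory
+save_model(model, "my-finetuned-model.safetensors")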
+```
+## Comparison: Sub-100M Base Models
+Let's be honest — most base models under 100M parameters are either:
+- **Distilled** from larger models (they still depend on a large teacher model)
+- **Overly specialized** (can't adapt to new tasks)
+- **Poorly architected** (waste parameters on the wrong things)
+TinyV4 is different. At **11M parameters**, it delivers:
+- **Real bilingual understanding** — not just token overlap
+- **MoE efficiency** — 4 experts, 2 active, more capacity per parameter
+- **Proven adaptability** — fine-tunes well across diverse tasks
+- **Zero-shot generation** — coherent output without any task-specific training
+We're not saying 11M beats 1B. We're saying that at this size, **nothing else gives you this much to work with**.
+## Pre-training Details
+| Metric | Value |
+|---|---|
+| Steps | 5,000 |
+| Final Loss | 3.97 |
+| Optimizer | AdamW |
+| Schedule | Cosine decay with warmup |
+| Weight Decay | 0.01 |
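For readers unfamiliar with the schedule named in the table, the sketch below shows one common form of cosine decay with linear warmup over the 5,000 steps listed. The warmup length, peak learning rate, and minimum learning rate are not stated in this README, so the values used here (`warmup_steps=500`, `peak_lr=3e-4`, `min_lr=0.0`) are placeholders, and the actual pre-training run may have used a different variant. The final line converts the reported loss to perplexity, assuming the loss is mean cross-entropy in nats.

```python
import math

def lr_at(step, total_steps=5000, warmup_steps=500, peak_lr=3e-4, min_lr=0.0):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 250, 500, 2750, 5000):
    print(step, f"{lr_at(step):.2e}")

# Separate from the schedule: if the final loss of 3.97 is mean cross-entropy
# in nats (an assumption), the corresponding perplexity is exp(loss).
perplexity = math.exp(3.97)
print(f"perplexity: {perplexity:.1f}")
```

With these placeholder values the rate peaks at the end of warmup, falls to half the peak midway through the decay phase, and reaches the minimum at the final step.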
+## Limitations
+Be realistic about what 11M parameters can do:
+- **Zero-shot output** will be basic — this is a base model, not a finished product
+- **Long-form coherence** requires fine-tuning with appropriate data
+- **Domain expertise** needs your data — it won't magically know medical terms or legal jargon
+- **Reasoning** is limited — complex logical chains need more parameters
+Think of TinyV4 as **the best possible starting point at 11M**. Not the finish line.
+## License
+MIT: use it, modify it, ship it. Keep the copyright and license notice with any copies; beyond that, no strings attached.
+## Citation
+```bibtex
+@misc{tinyv4-11m,
+  title  = {TinyV4: An 11M Bilingual Base Model with Mixture-of-Experts},
+  year   = {2025},
+  url    = {https://huggingface.co/ukung/tinyv4}
+}
+```