---
license: mit
tags:
- generative
- language-model
- sanskrit
- devanagari
- flashattention
- micro-llm
language:
- hi
datasets:
- custom
library_name: transformers
pipeline_tag: text-generation
---

# 🧠 MicroGPT-Deva: Lightweight Hindi Generative LLM

**MicroGPT-Deva** is a compact decoder-only language model trained on Sanskrit text in **Devanagari script**, optimized for text generation tasks. It uses a custom transformer architecture with **FlashAttention** for efficient GPU utilization and fast decoding.
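This card does not publish the attention implementation itself; in plain PyTorch, FlashAttention-style fused kernels are typically reached through `torch.nn.functional.scaled_dot_product_attention` (PyTorch ≥ 2.0). A minimal sketch, not the model's actual code:

```python
import torch
import torch.nn.functional as F

# Fused attention; on supported GPUs PyTorch dispatches this call to a
# FlashAttention kernel. Shapes: (batch, heads, seq_len, head_dim).
q = k = v = torch.randn(1, 8, 512, 64, device="cuda", dtype=torch.float16)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal LM mask
```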

This model is ideal for:
- Generating Sanskrit sentences or paragraphs
- Educational chatbots or creative writing tools
- Deployment in resource-constrained environments (single GPU)

---

## 🛠️ Model Details

| Property        | Value                      |
|-----------------|----------------------------|
| Architecture    | Decoder-only Transformer   |
| Vocabulary Size | 12,000 (SentencePiece BPE) |
| Hidden Size     | 512                        |
| Layers          | 8                          |
| Attention Heads | 8                          |
| Sequence Length | 512 tokens                 |
| Parameters      | ~33M                       |
| FlashAttention  | ✅ Yes                      |
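The ~33M figure is roughly consistent with these dimensions. A back-of-envelope check, assuming a standard GPT block with 4× MLP expansion and tied input/output embeddings (neither confirmed by this card):

```python
# Rough parameter count from the table above (illustrative assumptions).
vocab, d, n_layers, seq = 12_000, 512, 8, 512

embeddings = vocab * d   # token embeddings (assumed tied with the LM head)
positions = seq * d      # learned positional embeddings (assumed)
per_layer = 12 * d * d   # 4*d^2 attention (QKV + output) + 8*d^2 MLP
total = embeddings + positions + n_layers * per_layer

print(f"~{total / 1e6:.1f}M parameters")  # ≈ 31.6M here, close to the stated ~33M
```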

---

## 📖 Training

- **Data**: Custom Sanskrit dataset of 100,000+ Devanagari `.txt` files.
- **Tokenizer**: [SentencePiece](https://github.com/google/sentencepiece) BPE model trained with `character_coverage=1.0` (see the sketch after this list).
- **Training Platform**: AWS SageMaker (`ml.p3.2xlarge`)
- **Framework**: PyTorch with custom FlashAttention blocks
- **Training Schedule**: ~3 epochs with dynamic batching on sharded data
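A minimal sketch of how a tokenizer with these settings could be trained; the input file is a placeholder, not the card's actual training setup:

```python
import sentencepiece as spm

# Train a 12k-vocab BPE tokenizer over the Devanagari corpus.
# character_coverage=1.0 keeps every Devanagari codepoint representable.
spm.SentencePieceTrainer.train(
    input="corpus.txt",        # placeholder: newline-delimited training text
    model_prefix="tokenizer",  # writes tokenizer.model / tokenizer.vocab
    vocab_size=12000,
    model_type="bpe",
    character_coverage=1.0,
)
```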

---

## 💬 Usage

### 🧪 In Python

```python
import json

import torch
import sentencepiece as spm
from microgpt_deva import MicroGPT, Config

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")

# Load config and model
with open("config.json") as f:
    config = Config(json.load(f))

model = MicroGPT(config)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Generate text
prompt = "मुझे"
input_ids = torch.tensor([sp.encode(prompt, out_type=int)], dtype=torch.long)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=30)
print(sp.decode(output[0].tolist()))
```
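If the checkpoint is hosted on the Hugging Face Hub, the three files used above can be fetched with `huggingface_hub` (the repo id below is a placeholder for the actual repository):

```python
from huggingface_hub import hf_hub_download

repo_id = "your-username/microgpt-deva"  # placeholder repo id

# Download weights, tokenizer, and config into the local Hub cache.
weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
tokenizer_path = hf_hub_download(repo_id=repo_id, filename="tokenizer.model")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
```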