---
license: mit
tags:
- generative
- language-model
- sanskrit
- devanagari
- flashattention
- micro-llm
language:
- sa
datasets:
- custom
library_name: transformers
pipeline_tag: text-generation
---

# 🧠 MicroGPT-Deva: Lightweight Sanskrit Generative LLM

**MicroGPT-Deva** is a compact decoder-only language model trained on Sanskrit text in **Devanagari script**, optimized for text generation. It uses a custom transformer architecture with **FlashAttention** for efficient GPU utilization and fast decoding.

This model is well suited for:

- Generating Sanskrit sentences or paragraphs
- Educational chatbots and creative writing tools
- Deployment in resource-constrained environments (single GPU)

---

## 🛠️ Model Details

| Property        | Value                      |
|-----------------|----------------------------|
| Architecture    | Decoder-only Transformer   |
| Vocabulary Size | 12,000 (SentencePiece BPE) |
| Hidden Size     | 512                        |
| Layers          | 8                          |
| Attention Heads | 8                          |
| Sequence Length | 512 tokens                 |
| Parameters      | ~33M                       |
| FlashAttention  | ✅ Yes                      |

---

## 📖 Training

- **Data**: Custom Sanskrit dataset of more than 100,000 Devanagari `.txt` files.
- **Tokenizer**: [SentencePiece](https://github.com/google/sentencepiece) BPE model trained with `character_coverage=1.0`.
- **Training Platform**: AWS SageMaker (Tesla V100 GPU)
- **Framework**: PyTorch with custom FlashAttention blocks
- **Training Time**: ~3 epochs with dynamic batching on sharded data

---

## 💬 Usage

### 🧪 In Python

```python
import json

import torch
import sentencepiece as spm

from microgpt_deva import MicroGPT, Config

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("devanagari.model")

# Load config and model
with open("config.json") as f:
    config = Config(json.load(f))

model = MicroGPT(config)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Generate text
prompt = "कस्मिंश्चिन् नगराभ्याशे "
input_ids = torch.tensor([sp.encode(prompt, out_type=int)], dtype=torch.long)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=30)

print(sp.decode(output[0].tolist()))
```
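For intuition, `model.generate` performs a standard autoregressive decoding loop: feed the sequence through the model, pick the next token, append it, and repeat. A minimal dependency-free sketch assuming greedy decoding, where the hypothetical `next_token_logits` function and `EOS_ID` constant stand in for the real model's forward pass and end-of-sequence token:

```python
EOS_ID = 2  # assumed end-of-sequence id (illustrative)

def next_token_logits(token_ids):
    # Toy stand-in for the model's forward pass; a real forward pass
    # would return logits over the full 12,000-token vocabulary.
    vocab_size = 16
    logits = [0.0] * vocab_size
    if len(token_ids) >= 6:
        logits[EOS_ID] = 1.0            # "finish" long sequences
    else:
        logits[(token_ids[-1] + 1) % vocab_size] = 1.0
    return logits

def generate(token_ids, max_new_tokens):
    ids = list(token_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == EOS_ID:           # stop early at end-of-sequence
            break
    return ids

print(generate([5], max_new_tokens=30))  # [5, 6, 7, 8, 9, 10, 2]
```

Real generation usually replaces the argmax with temperature, top-k, or top-p sampling; the loop structure is the same.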
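The BPE tokenizer mentioned under Training is built by repeatedly merging the most frequent adjacent symbol pair in the corpus. A minimal pure-Python sketch of one merge step (illustrative only; SentencePiece's actual trainer is considerably more involved):

```python
from collections import Counter

def most_frequent_pair(words):
    """words: list of symbol lists, e.g. [['l', 'o', 'w'], ...]."""
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("lower"), list("lowest"), list("low")]
pair = most_frequent_pair(corpus)
corpus = merge_pair(corpus, pair)
print(pair, corpus)
```

Repeating this until the vocabulary reaches 12,000 symbols yields the merge table; `character_coverage=1.0` ensures every Devanagari character seen in training gets its own base symbol, so no text is mapped to unknown tokens.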