---
license: mit
tags:
  - generative
  - language-model
  - sanskrit
  - devanagari
  - flashattention
  - micro-llm
language:
  - sa
datasets:
  - custom
library_name: transformers
pipeline_tag: text-generation
---

# 🧠 MicroGPT-Deva: Lightweight Sanskrit Generative LLM

**MicroGPT-Deva** is a compact decoder-only language model trained on Sanskrit text in **Devanagari script**, optimized for text generation tasks. It uses a custom transformer architecture with **FlashAttention** for efficient GPU utilization and fast decoding.

This model is ideal for:
- Generating Sanskrit sentences or paragraphs
- Educational chatbots or creative writing tools
- Deployment in resource-constrained environments (single GPU)

---

## 🛠️ Model Details

| Property           | Value                        |
|--------------------|------------------------------|
| Architecture       | Decoder-only Transformer     |
| Vocabulary Size    | 12,000 (SentencePiece BPE)   |
| Hidden Size        | 512                          |
| Layers             | 8                            |
| Attention Heads    | 8                            |
| Sequence Length    | 512 tokens                   |
| Parameters         | ~33M                         |
| FlashAttention     | ✅ Yes                        |

---

## 📖 Training

- **Data**: Custom Sanskrit dataset of 100,000+ Devanagari `.txt` files.
- **Tokenizer**: [SentencePiece](https://github.com/google/sentencepiece) BPE model trained with `character_coverage=1.0` (see the sketch below).
- **Training Platform**: AWS SageMaker (NVIDIA Tesla V100 GPU)
- **Framework**: PyTorch with custom FlashAttention blocks
- **Training Schedule**: ~3 epochs with dynamic batching over sharded data
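
As a rough sketch, a tokenizer like the one described above can be trained with SentencePiece's Python API. The input path below is a placeholder, and the exact flags used for this model are not published:

```python
import sentencepiece as spm

# Train a 12,000-token BPE model on the raw Devanagari corpus.
# character_coverage=1.0 keeps every Devanagari character in the vocabulary.
spm.SentencePieceTrainer.train(
    input="sanskrit_corpus.txt",   # placeholder: raw text, one sentence per line
    model_prefix="devanagari",     # writes devanagari.model and devanagari.vocab
    vocab_size=12000,
    model_type="bpe",
    character_coverage=1.0,
)
```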

---

## 💬 Usage

### 🧪 In Python

```python
import json
import torch
import sentencepiece as spm
from microgpt_deva import MicroGPT, Config

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load("devanagari.model")

# Load config and model
with open("config.json") as f:
    config = Config(json.load(f))

model = MicroGPT(config)
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Generate text
prompt = "कस्मिंश्चिन् नगराभ्याशे "
input_ids = torch.tensor([sp.encode(prompt, out_type=int)], dtype=torch.long)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=30)
print(sp.decode(output[0].tolist()))
```
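
The `config.json` loaded above is expected to mirror the hyperparameters in the Model Details table. The field names below are illustrative assumptions rather than the repository's actual schema; check the `Config` class for the exact keys:

```python
# Hypothetical shape of config.json (key names are assumptions):
example_config = {
    "vocab_size": 12000,
    "hidden_size": 512,
    "num_layers": 8,
    "num_heads": 8,
    "max_seq_len": 512,
}
```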