ss-76 committed · Commit 528fcbc · verified · 1 Parent(s): c84895e

Update README.md

Files changed (1)
  1. README.md +78 -4
README.md CHANGED
@@ -1,6 +1,80 @@
  ---
- language:
- - sa
+ license: mit
  tags:
- - Sanskrit
- ---
+ - generative
+ - language-model
+ - sanskrit
+ - devanagari
+ - flashattention
+ - micro-llm
+ language:
+ - hi
+ datasets:
+ - custom
+ library_name: transformers
+ pipeline_tag: text-generation
+ ---
+
+ # 🧠 MicroGPT-Deva: Lightweight Hindi Generative LLM
+
+ **MicroGPT-Deva** is a compact decoder-only language model trained on Sanskrit text in **Devanagari script**, optimized for text generation tasks. It uses a custom transformer architecture with **FlashAttention** for efficient GPU utilization and fast decoding.
+
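+ PyTorch 2.x exposes FlashAttention-style kernels through `torch.nn.functional.scaled_dot_product_attention`, so a causal attention block of the kind used here can be sketched as follows (a minimal illustration, not the repository's exact module; the class and argument names are assumptions):
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class CausalSelfAttention(nn.Module):
+     """Multi-head causal self-attention. F.scaled_dot_product_attention
+     dispatches to a FlashAttention kernel on supported GPUs."""
+     def __init__(self, hidden_size: int = 512, n_heads: int = 8):
+         super().__init__()
+         self.n_heads = n_heads
+         self.head_dim = hidden_size // n_heads
+         self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
+         self.proj = nn.Linear(hidden_size, hidden_size)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         b, t, c = x.shape
+         q, k, v = self.qkv(x).chunk(3, dim=-1)
+         # (b, t, c) -> (b, n_heads, t, head_dim)
+         q, k, v = [z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
+                    for z in (q, k, v)]
+         y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
+         y = y.transpose(1, 2).contiguous().view(b, t, c)
+         return self.proj(y)
+ ```
+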
+ This model is ideal for:
+ - Generating Sanskrit sentences or paragraphs
+ - Educational chatbots and creative writing tools
+ - Deployment in resource-constrained environments (single GPU)
+
+ ---
+
+ ## 🛠️ Model Details
+
+ | Property        | Value                      |
+ |-----------------|----------------------------|
+ | Architecture    | Decoder-only Transformer   |
+ | Vocabulary Size | 12,000 (SentencePiece BPE) |
+ | Hidden Size     | 512                        |
+ | Layers          | 8                          |
+ | Attention Heads | 8                          |
+ | Sequence Length | 512 tokens                 |
+ | Parameters      | ~33M                       |
+ | FlashAttention  | ✅ Yes                     |
+
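+ As a sanity check on the ~33M figure, here is a back-of-the-envelope parameter count under standard GPT-style assumptions (4x MLP expansion, learned positional embeddings, output head tied to the token embeddings; all three are assumptions, since the card does not specify them):
+
+ ```python
+ vocab, d, layers, seq = 12_000, 512, 8, 512
+
+ embed = vocab * d + seq * d        # token + learned position embeddings
+ attn = 4 * d * d                   # q, k, v, and output projections
+ mlp = 2 * d * (4 * d)              # up- and down-projections, 4x expansion
+ total = embed + layers * (attn + mlp)
+
+ print(f"{total / 1e6:.1f}M")       # ~31.6M, near the quoted ~33M
+ ```
+
+ The small gap from ~33M is plausibly the biases and LayerNorm parameters that this estimate ignores.
+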
+ ---
+
+ ## 📖 Training
+
+ - **Data**: custom Sanskrit dataset of 100,000+ Devanagari `.txt` files
+ - **Tokenizer**: [SentencePiece](https://github.com/google/sentencepiece) BPE model trained with `character_coverage=1.0` (see the sketch after this list)
+ - **Training Platform**: AWS SageMaker (`ml.p3.2xlarge`)
+ - **Framework**: PyTorch with custom FlashAttention blocks
+ - **Training Schedule**: ~3 epochs with dynamic batching on sharded data
+
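+ A tokenizer with these settings can be reproduced roughly as follows (a sketch: `SentencePieceTrainer.train` and its options are the real SentencePiece API, but the input path and model prefix are placeholders):
+
+ ```python
+ import sentencepiece as spm
+
+ # Train a 12k BPE vocabulary over the raw Devanagari corpus.
+ # character_coverage=1.0 keeps every Devanagari codepoint in the vocabulary.
+ spm.SentencePieceTrainer.train(
+     input="corpus/all_texts.txt",  # placeholder: one sentence per line
+     model_prefix="tokenizer",      # writes tokenizer.model / tokenizer.vocab
+     vocab_size=12000,
+     model_type="bpe",
+     character_coverage=1.0,
+ )
+ ```
+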
+ ---
+
+ ## 💬 Usage
+
+ ### 🧪 In Python
+
+ ```python
+ import json
+
+ import torch
+ import sentencepiece as spm
+ from microgpt_deva import MicroGPT, Config
+
+ # Load tokenizer
+ sp = spm.SentencePieceProcessor()
+ sp.load("tokenizer.model")
+
+ # Load config and model
+ with open("config.json") as f:
+     config = Config(json.load(f))
+
+ model = MicroGPT(config)
+ model.load_state_dict(torch.load("pytorch_model.bin"))
+ model.eval()
+
+ # Generate text
+ prompt = "मुझे"
+ input_ids = torch.tensor([sp.encode(prompt, out_type=int)], dtype=torch.long)
+ with torch.no_grad():
+     output = model.generate(input_ids, max_new_tokens=30)
+ print(sp.decode(output[0].tolist()))
+ ```
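+
+ `MicroGPT.generate` comes from the custom `microgpt_deva` code rather than the `transformers` API. If you need to adapt or reimplement it, a greedy decoding loop of the kind it likely wraps looks like this (a sketch, assuming the model's forward pass returns logits of shape `[batch, seq, vocab]`):
+
+ ```python
+ @torch.no_grad()
+ def greedy_generate(model, input_ids, max_new_tokens=30, max_seq_len=512):
+     """Minimal greedy decoding loop (illustrative, not the repository's code)."""
+     for _ in range(max_new_tokens):
+         context = input_ids[:, -max_seq_len:]   # crop to the context window
+         logits = model(context)                 # [batch, seq, vocab]
+         next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
+         input_ids = torch.cat([input_ids, next_id], dim=1)
+     return input_ids
+ ```
+
+ Swapping the `argmax` for sampling via `torch.multinomial` over `torch.softmax(logits[:, -1, :] / temperature, dim=-1)` gives more varied output.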