---
license: apache-2.0
tags:
- multilingual
- text-generation
- indic-languages
- hindi
- punjabi
- small-model
pipeline_tag: text-generation
widget:
- text: "[EN] The weather today is"
example_title: "English Generation"
- text: "[HI] आज का मौसम"
example_title: "Hindi Generation"
- text: "[PA] ਅੱਜ ਦਾ ਮੌਸਮ"
example_title: "Punjabi Generation"
language:
- en
- hi
- pa
datasets:
- ai4bharat/samanantar
- PredictiveManish/multilingual-corpus
library_name: transformers
---
# Trimurti-LM: A 4.2M Parameter Multilingual Language Model
## Model Description
**Trimurti-LM** is a small, efficient multilingual language model trained from scratch on English, Hindi, and Punjabi text. Named after the Hindu trinity (Brahma-Vishnu-Shiva), it represents the three-fold capability of creating text, preserving meaning, and transforming across scripts.
**Key Features:**
- 🏗️ **Built from scratch** - No pre-trained weights used
- 🌐 **Multilingual** - Handles 3 languages with 3 different scripts
- 💾 **Tiny footprint** - Only 4.2 million parameters
- ⚡ **Fast training** - 2.38 hours on a consumer GPU (GTX 1650, 4 GB)
- 🔤 **Smart tokenization** - Custom SentencePiece with byte fallback for Indic scripts
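The tokenizer described above could be reproduced with a SentencePiece training invocation along these lines. Only the vocabulary size (8000) and byte fallback are stated in this card; the input filename, model type, and character coverage below are assumptions:

```python
import sentencepiece as spm

# Hypothetical reconstruction of the tokenizer training step.
# `multilingual_corpus.txt` is an assumed filename for the tagged training text.
spm.SentencePieceTrainer.train(
    input="multilingual_corpus.txt",
    model_prefix="multilingual_spm",  # produces multilingual_spm.model / .vocab
    vocab_size=8000,
    model_type="unigram",             # assumption; BPE is equally plausible
    byte_fallback=True,               # unseen Indic characters fall back to bytes
    character_coverage=0.9995,        # assumption; high coverage for 3 scripts
)
```

Byte fallback matters here because Devanagari and Gurmukhi characters outside the learned vocabulary would otherwise map to `<unk>`; with fallback they decompose into UTF-8 bytes instead.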
## Model Specifications
| Aspect | Details |
|--------|---------|
| **Architecture** | GPT-2 style decoder-only Transformer |
| **Parameters** | 4,672,000 (4.2M) |
| **Hidden Size** | 256 |
| **Layers** | 4 |
| **Attention Heads** | 8 |
| **Context Length** | 128 tokens |
| **Vocabulary** | 8000 tokens (SentencePiece) |
| **Training Steps** | 5000 |
| **Training Time** | 2.38 hours |
| **Hardware** | NVIDIA GTX 1650 (4GB VRAM) |
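The table above maps directly onto a `transformers` configuration. A minimal sketch, assuming the standard `GPT2Config` defaults for everything not listed (dropout, initializer range, etc. are not published in this card):

```python
from transformers import GPT2Config

# Architecture reconstructed from the specification table; all other
# hyperparameters are library defaults, which may differ from the original run.
config = GPT2Config(
    vocab_size=8000,   # SentencePiece vocabulary
    n_positions=128,   # context length in tokens
    n_embd=256,        # hidden size
    n_layer=4,         # decoder blocks
    n_head=8,          # attention heads per block
)
```

Passing this config to `GPT2LMHeadModel(config)` would instantiate a randomly initialized model of the same shape.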
## Training Data
The model was trained on a balanced multilingual corpus:
- **English**: 150,000 sentences
- **Hindi**: 150,000 sentences
- **Punjabi**: 150,000 sentences
**Sources:**
- Primary: AI4Bharat Samanantar dataset (filtered and processed)
- Secondary: Custom curated multilingual corpus
**Data Processing:**
- Language tagging: `[EN]`, `[HI]`, `[PA]` prefixes
- Length filtering: 5-50 words per sentence
- Script validation for each language
- Deduplication and cleaning
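The processing steps above can be sketched as a small filter pipeline. The Unicode-range script checks and the exact normalization are assumptions; the card only states the tag set, the 5-50 word window, script validation, and deduplication:

```python
import re

LENGTH_RANGE = (5, 50)  # words per sentence, per the card

# Simple per-script validators (assumption: basic Unicode block tests).
SCRIPT_PATTERNS = {
    "[EN]": re.compile(r"^[\x00-\x7F]+$"),   # ASCII-only for English
    "[HI]": re.compile(r"[\u0900-\u097F]"),  # contains Devanagari
    "[PA]": re.compile(r"[\u0A00-\u0A7F]"),  # contains Gurmukhi
}

def preprocess(sentences, tag):
    """Tag, length-filter, script-validate, and deduplicate sentences."""
    seen, out = set(), []
    lo, hi = LENGTH_RANGE
    for s in sentences:
        s = " ".join(s.split())                  # normalize whitespace
        if not (lo <= len(s.split()) <= hi):     # length filtering
            continue
        if not SCRIPT_PATTERNS[tag].search(s):   # script validation
            continue
        if s in seen:                            # deduplication
            continue
        seen.add(s)
        out.append(f"{tag} {s}")                 # language tagging
    return out
```

For example, `preprocess(["आज का मौसम बहुत अच्छा है"], "[HI]")` yields the tagged sentence, while too-short or duplicate lines are dropped.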
## Performance
| Metric | Value | Notes |
|--------|-------|-------|
| **Final Loss** | 1.206 | Cross-entropy loss |
| **Perplexity** | 3.34 | e^1.206 ≈ 3.34 |
| **Top-1 Accuracy** | ~25% | Next token prediction |
| **Top-5 Accuracy** | ~60% | Next token prediction |
| **Language ID Accuracy** | 95% | With explicit tags |
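The perplexity row is not an independent measurement: it follows directly from the final loss, since perplexity is the exponential of the mean per-token cross-entropy (in nats). A one-line check:

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy, measured in nats)."""
    return math.exp(mean_cross_entropy)

print(round(perplexity(1.206), 2))  # ≈ 3.34
```

Intuitively, a perplexity of about 3.3 means the model is, on average, as uncertain as if it were choosing uniformly among roughly 3.3 next tokens.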
## Usage
### Quick Start
```python
from transformers import GPT2LMHeadModel
import sentencepiece as spm
import torch

# Load model and tokenizer
tokenizer = spm.SentencePieceProcessor()
tokenizer.load("multilingual_spm.model")
model = GPT2LMHeadModel.from_pretrained("PredictiveManish/Trimurti-LM")

# Generate text
prompt = "[EN] The weather is"
input_ids = tokenizer.encode(prompt)
input_tensor = torch.tensor([input_ids])

with torch.no_grad():
    output = model.generate(
        input_ids=input_tensor,
        max_length=50,
        temperature=0.7,
        do_sample=True,
        pad_token_id=0,
    )

generated = tokenizer.decode(output[0].tolist())
print(generated)
```
## Citations
If you use Trimurti-LM in your work, please cite:
```bibtex
@software{trimurti_lm_2026,
title = {Trimurti-LM: A 4.2M Parameter Multilingual Language Model},
author = {Manish Tiwari},
year = {2026},
url = {https://huggingface.co/PredictiveManish/Trimurti-LM},
note = {Trained from scratch on English, Hindi, and Punjabi with consumer hardware}
}
```
### Primary Dataset
```bibtex
@article{samanantar_2022,
  title   = {Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages},
  author  = {Gowtham Ramesh and Sumanth Doddapaneni and Aravinth Bheemaraj and Mayank Jobanputra and Raghavan AK and Ajitesh Sharma and Sujit Sahoo and Harshita Diddee and Mahalakshmi J and Divyanshu Kakwani and Navneet Kumar and Aswin Pradeep and Srihari Nagaraj and Kumar Deepak and Vivek Raghavan and Anoop Kunchukuttan and Pratyush Kumar and Mitesh Shantadevi Khapra},
  journal = {Transactions of the Association for Computational Linguistics},
  volume  = {10},
  year    = {2022},
  url     = {https://arxiv.org/abs/2104.05596}
}
```
---