---
license: apache-2.0
tags:
- multilingual
- text-generation
- indic-languages
- hindi
- punjabi
- small-model
pipeline_tag: text-generation
widget:
- text: "[EN] The weather today is"
  example_title: "English Generation"
- text: "[HI] आज का मौसम"
  example_title: "Hindi Generation"
- text: "[PA] ਅੱਜ ਦਾ ਮੌਸਮ"
  example_title: "Punjabi Generation"
language:
- en
- hi
- pa
datasets:
- ai4bharat/samanantar
- PredictiveManish/multilingual-corpus
library_name: transformers
---

# Trimurti-LM: A 4.2M Parameter Multilingual Language Model

## Model Description

**Trimurti-LM** is a small, efficient multilingual language model trained from scratch on English, Hindi, and Punjabi text. Named after the Hindu trinity (Brahma, Vishnu, and Shiva), it reflects a three-fold capability: creating text, preserving meaning, and transforming across scripts.

**Key Features:**

- 🏗️ **Built from scratch** - no pre-trained weights used
- 🌐 **Multilingual** - handles 3 languages with 3 different scripts
- 💾 **Tiny footprint** - only 4.2 million parameters
- ⚡ **Fast training** - 2.38 hours on a consumer GPU (GTX 1650, 4GB)
- 🔤 **Smart tokenization** - custom SentencePiece model with byte fallback for Indic scripts

## Model Specifications

| Aspect | Details |
|--------|---------|
| **Architecture** | GPT-2 style decoder-only Transformer |
| **Parameters** | 4,672,000 |
| **Hidden Size** | 256 |
| **Layers** | 4 |
| **Attention Heads** | 8 |
| **Context Length** | 128 tokens |
| **Vocabulary** | 8,000 tokens (SentencePiece) |
| **Training Steps** | 5,000 |
| **Training Time** | 2.38 hours |
| **Hardware** | NVIDIA GTX 1650 (4GB VRAM) |

## Training Data

The model was trained on a balanced multilingual corpus:

- **English**: 150,000 sentences
- **Hindi**: 150,000 sentences
- **Punjabi**: 150,000 sentences

**Sources:**

- Primary: AI4Bharat Samanantar dataset (filtered and processed)
- Secondary: Custom curated multilingual corpus

**Data Processing:**

- Language tagging: `[EN]`, `[HI]`, `[PA]` prefixes
- Length filtering: 5-50 words per sentence
- Script validation for each language
- Deduplication and cleaning

A minimal sketch of the tagging and filtering step is shown under Usage below.

## Performance

| Metric | Value | Notes |
|--------|-------|-------|
| **Final Loss** | 1.206 | Cross-entropy loss |
| **Perplexity** | 3.32 | exp(cross-entropy loss) |
| **Top-1 Accuracy** | ~25% | Next-token prediction |
| **Top-5 Accuracy** | ~60% | Next-token prediction |
| **Language ID Accuracy** | 95% | With explicit tags |

## Usage

### Quick Start

```python
from transformers import GPT2LMHeadModel
import sentencepiece as spm
import torch

# Load model and tokenizer
tokenizer = spm.SentencePieceProcessor()
tokenizer.load("multilingual_spm.model")
model = GPT2LMHeadModel.from_pretrained("PredictiveManish/Trimurti-LM")

# Generate text
prompt = "[EN] The weather is"
input_ids = tokenizer.encode(prompt)
input_tensor = torch.tensor([input_ids])

with torch.no_grad():
    output = model.generate(
        input_ids=input_tensor,
        max_length=50,
        temperature=0.7,
        do_sample=True,
        pad_token_id=0,
    )

generated = tokenizer.decode(output[0].tolist())
print(generated)
```
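### Data Tagging and Filtering

The Data Processing steps above (language tagging, 5-50 word length filtering, deduplication) can be sketched as follows. This is a minimal illustration, not the exact pipeline used for this model; `tag_and_filter` is a hypothetical helper.

```python
def tag_and_filter(sentences, tag):
    """Prefix each sentence with its language tag; keep only 5-50 word,
    previously unseen lines. Illustrative only."""
    seen = set()
    for sentence in sentences:
        sentence = " ".join(sentence.split())  # normalize whitespace
        n_words = len(sentence.split())
        if 5 <= n_words <= 50 and sentence not in seen:
            seen.add(sentence)
            yield f"{tag} {sentence}"

# Example: tag English sentences for the training corpus
english = ["The weather today is pleasant and mild across the region."]
print(list(tag_and_filter(english, "[EN]")))
# ['[EN] The weather today is pleasant and mild across the region.']
```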
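### Training a Compatible Tokenizer

The tokenizer described above (8,000-token vocabulary, byte fallback, language-tag symbols) can be trained with SentencePiece roughly as follows. The input file name and the exact trainer options are assumptions, not the model's actual recipe.

```python
import sentencepiece as spm

# Sketch: train an 8,000-token tokenizer with byte fallback so that rare
# Indic characters decompose into byte tokens instead of becoming <unk>.
# "tagged_corpus.txt" (one tagged sentence per line) is a hypothetical file.
spm.SentencePieceTrainer.train(
    input="tagged_corpus.txt",
    model_prefix="multilingual_spm",
    vocab_size=8000,
    byte_fallback=True,
    character_coverage=0.9995,  # a common setting for multi-script corpora
    user_defined_symbols=["[EN]", "[HI]", "[PA]"],  # keep tags as single tokens
)
```

This produces `multilingual_spm.model`, the file loaded in the Quick Start example.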
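### Computing Perplexity

The perplexity reported above is the exponential of the cross-entropy loss. Here is a minimal sketch of how to reproduce it on held-out text; the evaluation sentence is illustrative, and a real measurement would average the loss over a corpus.

```python
import torch
import sentencepiece as spm
from transformers import GPT2LMHeadModel

tokenizer = spm.SentencePieceProcessor()
tokenizer.load("multilingual_spm.model")
model = GPT2LMHeadModel.from_pretrained("PredictiveManish/Trimurti-LM")
model.eval()

text = "[HI] आज का मौसम बहुत अच्छा है"  # illustrative held-out sentence
input_ids = torch.tensor([tokenizer.encode(text)])

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(input_ids=input_ids, labels=input_ids).loss

print(f"loss = {loss.item():.3f}, perplexity = {torch.exp(loss).item():.2f}")
```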
## Citation

If you use Trimurti-LM in your work, please cite:

```bibtex
@software{trimurti_lm_2026,
  title  = {Trimurti-LM: A 4.2M Parameter Multilingual Language Model},
  author = {Manish Tiwari},
  year   = {2026},
  url    = {https://huggingface.co/PredictiveManish/Trimurti-LM},
  note   = {Trained from scratch on English, Hindi, and Punjabi on consumer hardware}
}
```

### Primary Dataset

```bibtex
@inproceedings{samanantar_2021,
  title     = {Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages},
  author    = {Gowtham Ramesh and Sumanth Doddapaneni and Aravinth Bheemaraj and Mayank Jobanputra and Raghavan AK and Ajitesh Sharma and Sujit Sahoo and Harshita Diddee and Mahalakshmi J and Divyanshu Kakwani and Navneet Kumar and Aswin Pradeep and Srihari Nagaraj and Kumar Deepak and Vivek Raghavan and Anoop Kunchukuttan and Pratyush Kumar and Mitesh Shantadevi Khapra},
  booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.05596}
}
```

---