Gurmukh — 370M Punjabi Language Model

Gurmukh is a 370-million-parameter causal language model trained from scratch on Punjabi text. It is the first openly released GPT-2-scale base model dedicated to the Punjabi language, supporting both Gurmukhi script and Romanized Punjabi.

Model Details

| Property | Value |
| --- | --- |
| Model name | Gurmukh |
| Architecture | GPT-2 (`GPT2LMHeadModel`) |
| Parameters | ~370M |
| Layers | 24 |
| Hidden size | 1024 |
| Attention heads | 16 |
| Context length | 2048 tokens |
| Vocabulary | 64,000 (SentencePiece) |
| Language | Punjabi (Gurmukhi + Romanized) |
| License | Apache 2.0 |
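
The ~370M figure can be sanity-checked from the table alone. The sketch below counts parameters for a standard GPT-2 layout; the 4× MLP expansion and the tied input/output embeddings are assumptions (they are GPT-2 defaults, but the card does not state them), and `gpt2_param_count` is a hypothetical helper, not part of any library.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_positions,
                     mlp_ratio=4, tied_embeddings=True):
    """Count weights and biases in a standard GPT-2 style decoder."""
    token_emb = vocab_size * d_model
    pos_emb = n_positions * d_model

    # Attention: fused QKV projection plus output projection, both with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # MLP: up-projection and down-projection, both with biases.
    d_ff = mlp_ratio * d_model  # assumed 4x expansion
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)

    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d_model
    lm_head = 0 if tied_embeddings else vocab_size * d_model  # assumed tied

    return token_emb + pos_emb + n_layer * per_layer + final_ln + lm_head


total = gpt2_param_count(n_layer=24, d_model=1024,
                         vocab_size=64_000, n_positions=2048)
print(f"{total:,}")  # 369,944,576
```

Under these assumptions the count lands just under 370M, with about 65.5M of it in the 64,000-entry token embedding.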

Tokenizer

Gurmukh uses a custom SentencePiece BPE tokenizer (`punjabi_spm_64k.model`) with a 64,000-token vocabulary trained on the same Punjabi corpus. It segments Gurmukhi script efficiently, as the mean fertility (tokens per word) shows:

| Script | Mean fertility (tokens/word) |
| --- | --- |
| Gurmukhi | 1.105 |
| Mixed (Gurmukhi + English) | 1.030 |
| Romanized Punjabi | 1.333 |

Fertility near 1.0 means almost every Punjabi word maps to a single token, so the vocabulary is well suited to the language.
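
As a rough illustration of the metric, the sketch below computes fertility as total tokens divided by total whitespace-separated words. That definition is an assumption (the card does not say how the mean was taken), and `toy_tokenize` is a hypothetical stand-in so the example runs without the tokenizer file.

```python
from typing import Callable, Iterable, List


def mean_fertility(texts: Iterable[str],
                   tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: splits any word longer than 6 characters in two."""
    pieces = []
    for word in text.split():
        if len(word) > 6:
            pieces.extend([word[:6], word[6:]])
        else:
            pieces.append(word)
    return pieces


sample = ["ਪੰਜਾਬ ਦੀ ਧਰਤੀ", "machine learning ਦੀ ਵਰਤੋਂ"]
print(round(mean_fertility(sample, toy_tokenize), 3))
```

To measure the real tokenizer, pass `sp.EncodeAsPieces` from a loaded `sentencepiece.SentencePieceProcessor` as the `tokenize` argument.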

Training

Data

Gurmukh was trained on two splits from the Sangraha dataset:

| Split | Size | Script |
| --- | --- | --- |
| `sangraha_gurmukhi` | ~12 GB | Gurmukhi |
| `sangraha_romanized` | ~1.8 GB | Romanized Punjabi |
| Total | ~13.8 GB | |

Data was deduplicated and cleaned before training. The combined corpus contains approximately 2.5 billion tokens.

Training Configuration

| Setting | Value |
| --- | --- |
| Hardware | 4× NVIDIA Tesla T4 (16 GB VRAM each) |
| Precision | FP16 |
| Optimizer | AdamW (cosine decay, warmup 500 steps) |
| Batch size (effective) | 8 sequences × 2048 tokens |
| Training steps | 200,000 |
| Epochs | ~2.25 |
| Peak learning rate | 3×10⁻⁴ |
| DeepSpeed | ZeRO Stage 1 |
| Gradient checkpointing | Yes |
| Framework | PyTorch 2.5.1 + HuggingFace Transformers 4.46.0 |
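
The learning-rate schedule in the table (500 warmup steps, cosine decay, peak 3×10⁻⁴ over 200,000 steps) can be sketched as follows. The linear warmup shape and the decay floor of zero are assumptions; the card gives neither, and `learning_rate` is a hypothetical helper rather than the training script's own code.

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 200_000


def learning_rate(step: int, min_lr: float = 0.0) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to min_lr (assumed 0)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 100_250, 200_000):
    print(step, f"{learning_rate(step):.2e}")
```

With these assumptions the rate reaches its peak at step 500 and is at half the peak midway through the decay phase (step 100,250).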

Training ran for approximately 25 days. Final checkpoint evaluation loss: 2.8120.
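
A back-of-the-envelope check of the token budget, reading the configuration literally (8 sequences of 2048 tokens per optimizer step, 200,000 steps, a corpus of about 2.5 billion tokens). This is a sketch under that reading only; whether the effective batch is counted per device or globally, and whether sequences are packed, are not stated in the card.

```python
SEQUENCES_PER_STEP = 8        # "Batch size (effective)" from the table
CONTEXT_LENGTH = 2048         # tokens per sequence
TRAINING_STEPS = 200_000
CORPUS_TOKENS = 2.5e9         # approximate corpus size from the Data section
TRAINING_DAYS = 25            # approximate wall-clock time

tokens_per_step = SEQUENCES_PER_STEP * CONTEXT_LENGTH
tokens_seen = tokens_per_step * TRAINING_STEPS
passes_over_corpus = tokens_seen / CORPUS_TOKENS
seconds_per_step = TRAINING_DAYS * 24 * 3600 / TRAINING_STEPS

print(f"tokens per step:    {tokens_per_step:,}")
print(f"tokens seen:        {tokens_seen:,}")
print(f"passes over corpus: {passes_over_corpus:.2f}")
print(f"seconds per step:   {seconds_per_step:.1f}")
```

Under this simple reading the run covers about 3.3 billion tokens, or roughly 1.3 passes over the corpus, rather than the ~2.25 epochs listed. The batch size or the token count is presumably measured differently from what is assumed here; the card does not say which.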

Evaluation

Perplexity was measured on held-out Punjabi text across three domains:

| Domain | Perplexity |
| --- | --- |
| News | 12.65 |
| Technical | 22.82 |
| Conversational | 53.18 |

The news perplexity of 12.65 is the best of the three domains and strong for a 370M Punjabi base model. The higher conversational perplexity is expected: the training corpus is predominantly formal and news text, and the model has not seen conversational or instruction-style data.
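
Perplexity and cross-entropy loss are two views of the same quantity. The sketch below converts between them, assuming the reported loss is a mean per-token cross-entropy in nats (the convention used by the Transformers causal-LM loss) and that the domain perplexities are also per token; the card states neither explicitly.

```python
import math

FINAL_EVAL_LOSS = 2.8120  # final checkpoint evaluation loss from the Training section


def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(mean_nll)


def loss_from_perplexity(ppl: float) -> float:
    """Inverse mapping: the per-token loss implied by a perplexity."""
    return math.log(ppl)


eval_ppl = perplexity(FINAL_EVAL_LOSS)
print(f"eval loss {FINAL_EVAL_LOSS} -> perplexity {eval_ppl:.2f}")

domain_ppl = {"News": 12.65, "Technical": 22.82, "Conversational": 53.18}
domain_loss = {name: loss_from_perplexity(p) for name, p in domain_ppl.items()}
for name, loss in domain_loss.items():
    print(f"{name}: perplexity {domain_ppl[name]} -> loss {loss:.3f}")
```

On this reading the final evaluation loss corresponds to a perplexity of about 16.6, which sits between the news and technical figures.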

Generation Examples

All examples below use temperature=0.8, top_p=0.9, repetition_penalty=1.1.

Prompt: ਪੰਜਾਬ ਸਰਕਾਰ ਨੇ ਅੱਜ ਐਲਾਨ ਕੀਤਾ ਕਿ (The Punjab government today announced that)

ਪੰਜਾਬ ਸਰਕਾਰ ਨੇ ਅੱਜ ਐਲਾਨ ਕੀਤਾ ਕਿ ਉਨ੍ਹਾਂ ਦੀ ਸਰਕਾਰ ਨੇ ਸੂਬੇ 'ਚ 100 ਮੁਹੱਲਾ ਕਲੀਨਿਕ ਸ਼ੁਰੂ ਕਰਨ ਦੀ ਮਨਜ਼ੂਰੀ ਦੇ ਦਿੱਤੀ ਹੈ। ਇਸ ਦੇ ਨਾਲ ਹੀ ਮੁੱਖ ਮੰਤਰੀ ਭਗਵੰਤ ਮਾਨ ਨੇ ਅੱਜ ਵਿਧਾਨ ਸਭਾ ਸੈਸ਼ਨ ਦੀ ਕਾਰਵਾਈ ਵੀ ਮੁਲਤਵੀ ਕਰ ਦਿੱਤੀ ਹੈ...

Prompt: machine learning ਦੀ ਵਰਤੋਂ ਕਰਕੇ ਅਸੀਂ (Using machine learning we can)

machine learning ਦੀ ਵਰਤੋਂ ਕਰਕੇ ਅਸੀਂ ਉਨ੍ਹਾਂ ਦੇ ਹੁਨਰ ਨੂੰ ਨਿਖਾਰ ਸਕਦੇ ਹਾਂ। ਹਰ ਸਾਲ ਭਾਰਤ ਦੇ ਨੌਜਵਾਨਾਂ ਨੂੰ ਸਕਿੱਲ ਸਕਿੱਲਜ਼ ਜ਼ਰੀਏ ਆਪਣੇ ਹੁਨਰ ਦਾ ਵਿਕਾਸ ਕਰਨ ਦਾ ਮੌਕਾ ਮਿਲਦਾ ਹੈ...

The model handles code-mixed Punjabi (Gurmukhi + English terms) naturally.
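
To make the sampling settings used for these examples concrete, here is a simplified pure-Python sketch of temperature scaling followed by top-p (nucleus) truncation. It illustrates the idea only: `nucleus_probs` and the toy logits are hypothetical, the repetition penalty is left out, and the library's own implementation differs in detail.

```python
import math


def nucleus_probs(logits, temperature=0.8, top_p=0.9):
    """Return the renormalised distribution after temperature and top-p filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        if cumulative >= top_p:
            break
        kept.append(i)
        cumulative += probs[i]

    # Zero out everything outside the nucleus and renormalise the rest.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / cumulative
    return filtered


toy_logits = [2.0, 1.0, 0.0, -1.0]
filtered = nucleus_probs(toy_logits)
print([round(p, 3) for p in filtered])
```

With these toy logits only the two most likely tokens survive the filter; the next token would then be drawn at random from the renormalised distribution.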

Intended Use

Gurmukh is a base language model, meant as a foundation for further fine-tuning. Intended uses include:

- Punjabi NLP research: text generation, language understanding, probing studies
- Foundation for supervised fine-tuning (SFT): instruction following, chat, question answering
- Downstream tasks: sentiment analysis, summarisation, NER (with task-specific fine-tuning)
- Voice pipeline: combined with an ASR front-end (e.g. Whisper fine-tuned on Punjabi) and a TTS back-end for spoken Punjabi interfaces

Limitations and Risks

- Base model only. Gurmukh has not been instruction-tuned or safety-aligned. It will not follow instructions reliably and may produce harmful, biased, or factually incorrect text. Do not deploy it as a chat assistant without SFT and RLHF/DPO alignment.
- Little conversational data. The training corpus is predominantly news and web text, so the model performs poorly zero-shot on conversational or QA-style prompts.
- Romanized Punjabi is weaker. The corpus is ~87% Gurmukhi by volume. Romanized generation quality is noticeably lower, and the model may fall back to Gurmukhi mid-generation.
- Knowledge cutoff. Training data is a static snapshot from the Sangraha dataset; the model has no awareness of events after that cutoff.
- Hallucination. Like all autoregressive LMs, Gurmukh can fabricate facts. Named entities, dates, and statistics in generated text must be verified independently.

How to Use

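import sentencepiece as spm
import torch
from transformers import GPT2LMHeadModel

# Load the SentencePiece tokenizer (punjabi_spm_64k.model must be available locally).
# Alternatively, use transformers AutoTokenizer if the repository was uploaded
# with a tokenizer_config.
sp = spm.SentencePieceProcessor()
sp.Load("punjabi_spm_64k.model")

# Load the model weights; replace the path with your local checkpoint directory
model = GPT2LMHeadModel.from_pretrained("path/to/gurmukh-370m")
model.eval()

# Encode the prompt as a batch containing one sequence
prompt = "ਪੰਜਾਬ ਦੀ ਧਰਤੀ"
ids = sp.EncodeAsIds(prompt)
input_ids = torch.tensor([ids])

# Sample a continuation with the same settings as the generation examples
with torch.no_grad():
    output = model.generate(
        input_ids,
        attention_mask=torch.ones_like(input_ids),  # single unpadded sequence
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
        repetition_penalty=1.1,
    )

# Decode the full sequence (prompt plus continuation) back to text
print(sp.Decode(output[0].tolist()))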

Citation

If you use Gurmukh in your research, please cite:

@misc{gurmukh2026,
  title        = {Gurmukh: A 370M Parameter Punjabi Language Model},
  author       = {Singh, Balgeet},
  year         = {2026},
  note         = {Trained on Sangraha Gurmukhi and Romanized Punjabi datasets.
                  Model available at https://huggingface.co/balgeet/Gurmukh-370M-base},
}

Acknowledgements

Training data: Sangraha by AI4Bharat
Compute: Azure NC64as_T4_v3 VM (4× Tesla T4), Cloudeesy infrastructure
Framework: HuggingFace Transformers, DeepSpeed, SentencePiece
