Karaka Attention

Semantically Typed Attention via Sanskrit Grammatical Role Relations

Rahul Baxi | VyasaLabs

Overview

Karaka Attention replaces standard multi-head attention with six semantically typed heads, one for each role in Pāṇini's kāraka system (Aṣṭādhyāyī, c. 400 BCE):

| Head | Sanskrit | Role |
|------|----------|------|
| Kartā | कर्ता | Agent |
| Karma | कर्म | Patient |
| Karaṇa | करण | Instrument |
| Sampradāna | सम्प्रदान | Recipient |
| Apādāna | अपादान | Source |
| Adhikaraṇa | अधिकरण | Locus |

Each head is conditioned on the resonant component (v_r) produced by the Dhvani encoder, so that role assignments are grounded in compression-invariant semantic representations.
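As a rough illustration of what "typed heads conditioned on v_r" could look like, the sketch below computes one attention matrix per kāraka role. It is not the code in karaka_attention.py: the `typed_attention` function, the `role_vectors` parameter, and the elementwise gating of each role vector by v_r are all assumptions made for this example, and the inputs are random toy data.

```python
import math
import random

ROLES = ["karta", "karma", "karana", "sampradana", "apadana", "adhikarana"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def typed_attention(tokens, v_r, role_vectors):
    """Return {role: attention matrix}, one row-stochastic matrix per role.

    Assumption for this sketch: each role owns a learned vector that is
    gated elementwise by v_r and added to every query before the usual
    scaled dot-product. The real module may condition on v_r differently.
    """
    scale = math.sqrt(len(v_r))
    weights = {}
    for role in ROLES:
        gate = [r * v for r, v in zip(role_vectors[role], v_r)]
        rows = []
        for query in tokens:
            typed_query = [q + g for q, g in zip(query, gate)]
            logits = [dot(typed_query, key) / scale for key in tokens]
            rows.append(softmax(logits))
        weights[role] = rows
    return weights


# Toy inputs: 5 tokens of dimension 8, random v_r and role vectors.
random.seed(0)
dim, n_tokens = 8, 5
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
v_r = [random.gauss(0, 1) for _ in range(dim)]
role_vectors = {role: [random.gauss(0, 1) for _ in range(dim)] for role in ROLES}

weights = typed_attention(tokens, v_r, role_vectors)
for role in ROLES:
    print(role, [round(w, 3) for w in weights[role][0]])
```

Because the only difference between heads here is the role vector gated by v_r, any divergence between the six attention patterns comes from the role typing itself, which is the property the design relies on.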

Key Results

| Metric | Karaka Attention | Standard MHA |
|--------|------------------|--------------|
| Attention entropy | 2.72 (informative) | 0.18 (collapsed) |
| Paraphrase JSD | 0.090 (content-following) | 0.002 (position-locked) |
| Role diversification | 0.66 cosine sim (from 0.87 initial) | - |
| CDCT mean | 0.179 | - |
| Forward pass | 57.3 ms | - |

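In the standard MHA baseline, attention entropy collapses to 0.18 and paraphrase JSD is 0.002: the heads attend to nearly the same positions regardless of wording. Karaka heads keep a higher entropy (2.72) and shift their attention when a sentence is paraphrased (JSD 0.090), which is consistent with tracking semantic roles rather than surface positions.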
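The three quantities reported above (attention entropy, paraphrase JSD, and cosine similarity between heads) have standard definitions, sketched below with the standard library only. This is not the evaluation code behind eval_results.json. The use of natural logarithms, the choice of which vectors are compared for role diversification, and the toy distributions are assumptions of this example.

```python
import math


def entropy(p):
    """Shannon entropy of a probability distribution, in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    """KL divergence KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def jsd(p, q):
    """Jensen-Shannon divergence, bounded by ln 2 when using nats."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Toy attention rows over 15 positions: one spread out, one sharply peaked.
uniform = [1 / 15] * 15
peaked = [0.97] + [0.03 / 14] * 14

print("entropy(uniform):", round(entropy(uniform), 3))
print("entropy(peaked): ", round(entropy(peaked), 3))
print("jsd(uniform, peaked):", round(jsd(uniform, peaked), 3))

# Toy head vectors: identical heads score 1, orthogonal heads score 0.
print(mean_pairwise_cosine([[1.0, 0.0], [0.0, 1.0]]))
```

Under these definitions, a low entropy means a head puts almost all of its weight on a few positions, and a JSD near zero between the attention rows of two paraphrases means the head did not move when the wording changed.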

Architecture

  • Base: Qwen3-1.7B + LoRA (pretrained with Dhvani compression-invariance objective)
  • Karaka layers: 2 × KarakaBlock (KarakaAttention + FFN)
  • Bias init: 27,412 sentences from UD Sanskrit-Vedic + UFAL treebanks
  • Training: Sanskrit Wikipedia + treebank text, 10K steps, TPU v6e-1
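
The list above says the attention biases are initialized from treebank sentences but does not describe the procedure. One plausible approach, shown here purely as an assumption and not necessarily what karaka_bias_init.py does, is to count how far each role-labelled dependent sits from its verb and turn the smoothed counts into log-probability biases. The `offset_bias` function, the `(verb_pos, dep_pos, role)` arc format, and the toy arcs are hypothetical.

```python
import math
from collections import Counter

ROLES = ["karta", "karma", "karana", "sampradana", "apadana", "adhikarana"]


def offset_bias(arcs, max_offset=4, alpha=1.0):
    """Estimate a per-role bias over relative positions from labelled arcs.

    arcs: iterable of (verb_pos, dep_pos, role) triples (hypothetical format).
    Offsets are clipped to [-max_offset, max_offset] and smoothed with
    add-alpha so that unseen offsets get a finite log-probability.
    """
    counts = {role: Counter() for role in ROLES}
    for verb_pos, dep_pos, role in arcs:
        offset = max(-max_offset, min(max_offset, dep_pos - verb_pos))
        counts[role][offset] += 1

    offsets = range(-max_offset, max_offset + 1)
    bias = {}
    for role in ROLES:
        total = sum(counts[role].values()) + alpha * len(offsets)
        bias[role] = {
            o: math.log((counts[role][o] + alpha) / total) for o in offsets
        }
    return bias


# Made-up arcs for illustration only; these are not treebank data.
toy_arcs = [
    (2, 0, "karta"), (2, 1, "karma"),
    (3, 1, "karta"), (3, 2, "karma"),
    (4, 0, "karta"),
]

bias = offset_bias(toy_arcs)
print({o: round(b, 2) for o, b in bias["karta"].items()})
```

A table like `bias[role][offset]` could be added to the corresponding head's attention logits at initialization, giving each head a role-specific prior that training is then free to revise.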

Files

  • karaka_attention.py — Core KarakaAttention module (6 typed heads + role consistency loss)
  • karaka_model.py — Full model (encoder + Dhvani projection + Karaka layers)
  • karaka_bias_init.py — Sanskrit treebank bias initialization
  • train_karaka.py — Training script (TPU/XLA)
  • results.json — Training results
  • eval_results.json — Paraphrase stability + head entropy + speed
  • baseline_jsd.json — Standard MHA baseline JSD
  • baseline_entropy.json — Standard MHA baseline entropy

Citation

@article{baxi2026karaka,
  title={Karaka Attention: Semantically Typed Attention via Sanskrit Grammatical Role Relations},
  author={Baxi, Rahul},
  year={2026},
  note={VyasaLabs Technical Report}
}
