Separating Constraint Compliance from Semantic Accuracy: A Novel Benchmark for Evaluating Instruction-Following Under Compression
Paper • 2512.17920 • Published
Semantically Typed Attention via Sanskrit Grammatical Role Relations
Rahul Baxi | VyasaLabs
Karaka Attention replaces standard multi-head attention with 6 semantically typed heads grounded in Pāṇini's kāraka role system (Aṣṭādhyāyī, c. 400 BCE):
| Head | Sanskrit | Role |
|---|---|---|
| Kartā | कर्ता | Agent |
| Karma | कर्म | Patient |
| Karaṇa | करण | Instrument |
| Sampradāna | सम्प्रदान | Recipient |
| Apādāna | अपादान | Source |
| Adhikaraṇa | अधिकरण | Locus |
Each head is conditioned on the resonant component (v_r) from the Dhvani encoder, ensuring role assignments are grounded in compression-invariant semantic representations.
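As a rough illustration of the idea above, the following minimal sketch builds six role-typed heads and conditions each one on a resonant vector. It is not the code in `karaka_attention.py`: the conditioning mechanism (adding a role-specific projection of `v_r` to every query), the parameter names `Wq`, `Wk`, `Wr`, the random initialization, and the assumption that `v_r` has the same width as the token embeddings are all hypothetical choices made for this example.

```python
import math
import random

# Role names follow the table above; everything else here is illustrative.
ROLES = ["karta", "karma", "karana", "sampradana", "apadana", "adhikarana"]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_heads(d_model, d_head, seed=0):
    """One parameter set per role. Wr is a hypothetical v_r projection."""
    rng = random.Random(seed)
    return {
        role: {
            "Wq": rand_matrix(d_head, d_model, rng),
            "Wk": rand_matrix(d_head, d_model, rng),
            "Wr": rand_matrix(d_head, d_model, rng),
        }
        for role in ROLES
    }

def typed_attention(x, v_r, heads):
    """Return {role: attention matrix} for token vectors x and resonant vector v_r."""
    out = {}
    for role, p in heads.items():
        # Assumed conditioning: shift every query by a role-specific view of v_r.
        shift = matvec(p["Wr"], v_r)
        queries = [[q + s for q, s in zip(matvec(p["Wq"], tok), shift)] for tok in x]
        keys = [matvec(p["Wk"], tok) for tok in x]
        scale = math.sqrt(len(shift))
        out[role] = [
            softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
            for q in queries
        ]
    return out

# Toy usage: 4 tokens with 8-dimensional embeddings and 4-dimensional heads.
rng = random.Random(1)
x = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
v_r = [rng.gauss(0.0, 1.0) for _ in range(8)]
attn = typed_attention(x, v_r, init_heads(d_model=8, d_head=4))
```

Each entry of `attn` is a 4×4 row-stochastic matrix, one per kāraka role. Because the shift is added to the query before the dot product, it changes which keys each head prefers rather than acting as a constant offset.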
| Metric | Karaka Attention | Standard MHA |
|---|---|---|
| Attention entropy | 2.72 (informative) | 0.18 (collapsed) |
| Paraphrase JSD | 0.090 (content-following) | 0.002 (position-locked) |
| Role diversification | 0.66 cosine sim (from 0.87 initial) | — |
| CDCT mean | 0.179 | — |
| Forward pass | 57.3ms | — |
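Standard MHA heads collapse to near-zero entropy (0.18) and barely change their attention across paraphrases (JSD 0.002), which indicates positionally rigid behavior. Karaka heads operate in the informative range (entropy 2.72) and shift their attention with content (JSD 0.090), tracking semantic roles across surface variations.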
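For readers who want to see how metrics of this kind are typically computed, here is a minimal sketch of attention entropy, Jensen-Shannon divergence between two attention distributions, and cosine similarity between two head vectors. The exact definitions behind the reported numbers are not stated here, so the logarithm base (natural log), the assumption that paraphrase pairs are compared over equal-length distributions, and the function names are assumptions of this example rather than the evaluation code that produced `eval_results.json`.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of one attention distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL divergence KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two equal-length distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy distributions: a spread-out head versus a head locked onto one position.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 1.0, 0.0, 0.0]

h_uniform = entropy(uniform)       # highest possible entropy for 4 keys
h_peaked = entropy(peaked)         # fully collapsed attention
d_same = jsd(peaked, peaked)       # identical attention on both paraphrases
d_moved = jsd(peaked, shifted)     # attention moved to a different token
```

Under these definitions, a collapsed head has entropy near zero, and a head whose attention does not move at all between a sentence and its paraphrase has JSD near zero, which matches the way the table labels the standard MHA baseline.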
- `karaka_attention.py`: Core KarakaAttention module (6 typed heads + role consistency loss)
- `karaka_model.py`: Full model (encoder + Dhvani projection + Karaka layers)
- `karaka_bias_init.py`: Sanskrit treebank bias initialization
- `train_karaka.py`: Training script (TPU/XLA)
- `results.json`: Training results
- `eval_results.json`: Paraphrase stability + head entropy + speed
- `baseline_jsd.json`: Standard MHA baseline JSD
- `baseline_entropy.json`: Standard MHA baseline entropy

@article{baxi2026karaka,
title={Karaka Attention: Semantically Typed Attention via Sanskrit Grammatical Role Relations},
author={Baxi, Rahul},
year={2026},
note={VyasaLabs Technical Report}
}