[](https://huggingface.co/spaces/ArthaLabs/panini-tokenizer-demo)

> **Why it matters:** *Fewer tokens = more usable context per input = better learning & longer text coverage.*

## 🚨 The Problem

Statistical tokenizers (BPE/WordPiece) systematically underperform on Sanskrit because they do not model **Sandhi** (phonetic fusion).
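For example, under vowel sandhi `rāma` + `iti` fuses into `rāmeti` (a + i → e), so the boundary between the two words vanishes from the surface string. A minimal sketch of this fusion — a toy rule table for illustration only, not the tokenizer's actual rule engine:

```python
# Toy illustration of external vowel sandhi: when two words meet,
# their boundary vowels fuse, erasing the word boundary from the text.
# Didactic sketch only — not the Panini Tokenizer's implementation.

SANDHI_RULES = {
    ("a", "i"): "e",   # guna sandhi:           a + i -> e
    ("a", "u"): "o",   # guna sandhi:           a + u -> o
    ("a", "a"): "ā",   # savarna-dirgha sandhi: a + a -> ā
}

def fuse(left: str, right: str) -> str:
    """Join two words, applying a vowel sandhi rule at the boundary if one matches."""
    key = (left[-1], right[0])
    if key in SANDHI_RULES:
        return left[:-1] + SANDHI_RULES[key] + right[1:]
    return left + right

print(fuse("rāma", "iti"))   # rāmeti — the rāma/iti seam is gone
print(fuse("na", "asti"))    # nāsti
```

Because BPE/WordPiece merges are learned from surface frequencies, the fused form `rāmeti` no longer contains the boundary between `rāma` and `iti`, so statistical merges fragment such words instead of splitting them at the morpheme seam.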

By strictly adhering to grammar, Panini Tokenizer drastically reduces sequence length:

* **Panini:** `▁nirapekza` | `jYAna` | `sAkzAtkAra` | `sAman` | `arthy` | `am` (6 meaningful roots)
* **Sanskrit-BERT:** `nirape` | `##k` | `##z` | `##a` | `##jya` | `##nas`... (14 noise fragments)

## 📋 Use Cases

- 🔍 **Sanskrit semantic search**
- 📖 **QA over philosophical texts** (Vedanta, Nyaya, etc.)
- 📜 **Long-form verse processing** (epics, puranas)
- 🤖 **Training Sanskrit LLMs** with cleaner token streams
- 🔬 **Linguistics research** & morphological analysis

## 🛠️ Technical Details

* **Architecture:** Recursive Descent Splitter + Kosha (Dictionary) Lookup.
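The splitter-plus-lookup idea can be sketched as follows — a toy kosha and a greedy-longest-prefix recursion that keeps the shortest segmentation. This is purely illustrative of the approach, not the shipped implementation:

```python
from functools import lru_cache

# Toy kosha: in a real system this would be a large Sanskrit dictionary
# of roots and stems. These entries are taken from the example above.
KOSHA = {"nirapekza", "jYAna", "sAkzAtkAra", "sAman", "arthy", "am"}

@lru_cache(maxsize=None)
def split(word: str):
    """Return the shortest segmentation of `word` into kosha entries, or None."""
    if not word:
        return ()
    best = None
    # Recursive descent: try every prefix found in the kosha (longest first),
    # recurse on the remainder, and keep the split with the fewest tokens.
    for i in range(len(word), 0, -1):
        prefix = word[:i]
        if prefix in KOSHA:
            tail = split(word[i:])
            if tail is not None and (best is None or 1 + len(tail) < len(best)):
                best = (prefix,) + tail
    return best

tokens = split("nirapekzajYAnasAkzAtkArasAmanarthyam")
print(tokens)
# ('nirapekza', 'jYAna', 'sAkzAtkAra', 'sAman', 'arthy', 'am')
```

Memoization (`lru_cache`) keeps the recursion linear in practice; a production splitter would additionally undo sandhi at each candidate boundary before the dictionary lookup.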