ArthaLabs committed on
Commit 9166de1 · verified · 1 parent: 5b16a53

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -22,6 +22,8 @@ tags:
 
  [![Demo](https://img.shields.io/badge/🚀_Try_Demo-HuggingFace_Spaces-blueviolet?style=for-the-badge)](https://huggingface.co/spaces/ArthaLabs/panini-tokenizer-demo)
 
+ > **Why it matters:** *Fewer tokens = more usable context per input = better learning & longer text coverage.*
+
  ## 🚨 The Problem
 
  Statistical tokenizers (BPE/WordPiece) systematically underperform on Sanskrit because they do not model **Sandhi**(phonetic fusion).
@@ -74,6 +76,14 @@ By strictly adhering to grammar, Panini Tokenizer drastically reduces sequence l
  * **Panini:** `▁nirapekza` | `jYAna` | `sAkzAtkAra` | `sAman` | `arthy` | `am` (6 meaningful roots)
  * **Sanskrit-BERT:** `nirape` | `##k` | `##z` | `##a` | `##jya` | `##nas`... (14 noise fragments)
 
+ ## 📋 Use Cases
+
+ - 🔍 **Sanskrit semantic search**
+ - 📖 **QA over philosophical texts** (Vedanta, Nyaya, etc.)
+ - 📜 **Long-form verse processing** (epics, puranas)
+ - 🤖 **Training Sanskrit LLMs** with cleaner token streams
+ - 🔬 **Linguistics research** & morphological analysis
+
  ## 🛠️ Technical Details
 
  * **Architecture:** Recursive Descent Splitter + Kosha (Dictionary) Lookup.
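The README summarizes the architecture as a recursive descent splitter plus Kosha (dictionary) lookup. The Python below is a minimal sketch of that general idea under stated assumptions. It is not the Panini Tokenizer's actual code. It assumes a hypothetical in-memory `kosha` set of known stems and tries longer dictionary matches first. When a remainder cannot be segmented, it backtracks, and it keeps the split with the fewest pieces. It does not undo Sandhi, which a real Sanskrit splitter has to handle. The toy input is just the concatenation of the segments shown in the README example. The names `split_word` and `tokenize`, the `max_len` parameter, the fallback for unknown words and the `▁` word-start marking are all illustrative assumptions.

```python
from __future__ import annotations

from functools import lru_cache


def split_word(word: str, kosha: frozenset[str], max_len: int = 20) -> list[str] | None:
    """Segment `word` into kosha (dictionary) entries by recursive descent.

    Hypothetical sketch: tries longer matches first, backtracks on dead ends,
    and returns the segmentation with the fewest pieces (None if none exists).
    No Sandhi rules are applied.
    """

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[str, ...] | None:
        if start == len(word):
            return ()  # empty remainder: segmentation succeeded
        choice: tuple[str, ...] | None = None
        # Try the longest candidate prefix first, shrinking toward one character.
        for end in range(min(len(word), start + max_len), start, -1):
            piece = word[start:end]
            if piece not in kosha:
                continue
            rest = best(end)  # recurse on the remainder (memoized per start index)
            if rest is None:
                continue  # dead end: backtrack and try a shorter piece
            candidate = (piece,) + rest
            if choice is None or len(candidate) < len(choice):
                choice = candidate
        return choice

    result = best(0)
    return list(result) if result is not None else None


def tokenize(text: str, kosha: frozenset[str]) -> list[str]:
    """Split each whitespace-separated word; mark word starts with '▁' (assumed convention)."""
    tokens: list[str] = []
    for word in text.split():
        # Hypothetical fallback: keep a word whole if the kosha cannot cover it.
        pieces = split_word(word, kosha) or [word]
        tokens.append("▁" + pieces[0])
        tokens.extend(pieces[1:])
    return tokens


if __name__ == "__main__":
    # Toy kosha built from the segments in the README example (hypothetical data).
    kosha = frozenset({"nirapekza", "jYAna", "sAkzAtkAra", "sAman", "arthy", "am"})
    print(tokenize("nirapekzajYAnasAkzAtkArasAmanarthyam", kosha))
```

Run as a script, this prints `['▁nirapekza', 'jYAna', 'sAkzAtkAra', 'sAman', 'arthy', 'am']`. Those are the six pieces listed for Panini in the README comparison. Preferring the fewest pieces is one plausible way to reflect the README's goal of shorter sequences. The real tokenizer's tie-breaking and fallback rules are not stated in this commit.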