Update README.md
README.md
@@ -11,4 +11,41 @@ short_description: Context-Aware Transliteration
sdk_version: 1.53.1
---

# SinCode: Neuro-Symbolic Transliteration Prototype 🇱🇰

> **Context-Aware Singlish-to-Sinhala Transliteration with Code-Switching Support.**

Welcome to the interim prototype of **SinCode**, a final-year research project that tackles the ambiguity of transliterating "Singlish" (Sinhala typed phonetically in Latin characters) into native Sinhala script.

## 🚀 Key Features

* **🧠 Hybrid Neuro-Symbolic Engine:** Combines the speed of rule-based logic with the contextual understanding of a deep learning model (XLM-RoBERTa).
* **🔀 Adaptive Code-Switching:** Detects English words (e.g., *"Assignment"*, *"Presentation"*) mixed into Sinhala sentences and keeps them in English instead of transliterating them.
* **📚 Massive Vocabulary:** Backed by an optimized dictionary of **5.9 million** Sinhala words, which is used to find exact Sinhala matches before falling back to phonetic rules.
* **⚡ Contextual Disambiguation:** Resolves ambiguous words (e.g., deciding whether *"nisa"* means *because* or *near*) using the context of the full sentence.
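
Contextual disambiguation can be pictured as choosing one candidate per word so that the whole sentence scores highest. The sketch below is a minimal illustration under stated assumptions, not SinCode's actual implementation. The candidate lists and the `TOY_BIGRAM_SCORES` table are hypothetical stand-ins for the dictionary and for XLM-RoBERTa. The search is a plain beam search, which may differ from how the prototype combines scores. The example uses the ල/ළ ambiguity in *"kala"* (කළා, "did", versus කලා, "art") rather than *"nisa"*.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for the neural scorer: a tiny bigram table that rewards
# word pairs which fit together. SinCode uses XLM-RoBERTa instead.
TOY_BIGRAM_SCORES: Dict[Tuple[str, str], float] = {
    ("එක", "කළා"): 2.0,  # verb reading "did" after "eka": plausible
    ("එක", "කලා"): 0.5,  # noun reading "art" after "eka": less plausible here
}


def score_sequence(words: Sequence[str]) -> float:
    """Sum toy bigram scores over adjacent word pairs (unknown pairs score 0)."""
    return sum(TOY_BIGRAM_SCORES.get(pair, 0.0) for pair in zip(words, words[1:]))


def beam_decode(
    candidates: List[List[str]],
    scorer: Callable[[Sequence[str]], float],
    beam_width: int = 3,
) -> List[str]:
    """Pick one candidate per word, keeping only the `beam_width` best partial sentences."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for options in candidates:
        expanded: List[Tuple[float, List[str]]] = []
        for _, prefix in beams:
            for option in options:
                sequence = prefix + [option]
                expanded.append((scorer(sequence), sequence))
        # Highest score first; Python's sort is stable, so ties keep candidate order.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][1]


# "mama assignment eka kala": one hypothetical candidate list per input word.
sentence_candidates = [["මම"], ["assignment"], ["එක"], ["කළා", "කලා"]]
print(" ".join(beam_decode(sentence_candidates, score_sequence)))  # prints: මම assignment එක කළා
```

Beam search keeps only the best few partial sentences at each step instead of scoring every combination. This matters once several words in a sentence each have multiple candidates.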

## 🛠️ How to Use

1. **Type** your Singlish sentence in the input box.
2. Click the **Transliterate** button.
3. View the **Result**.
4. (Optional) Expand the **"See How It Works"** section to view the real-time scoring behind each word choice.

## 🏗️ System Architecture

This prototype uses a **Tiered Decoding Strategy**:

1. **Tier 1 (English Filter):** Checks each word against the Google-20k English corpus, so that English words such as technical terms are kept as-is rather than transliterated.
2. **Tier 2 (Dictionary Lookup):** Looks up the word in the 5.9M-word dictionary for exact Sinhala matches.
3. **Tier 3 (Phonetic Rules):** Generates Sinhala text for words not found in the dictionary, using a rule-based engine.
4. **Tier 4 (Neural Ranking):** The **XLM-R** model scores the candidate words in the context of the sentence and picks the most grammatically plausible sequence.
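
The first three tiers can be sketched as a candidate generator that returns, for each input word, the options the ranking tier would choose from. Tier 4 (XLM-R ranking) is left out. This is a minimal sketch under loud assumptions, not SinCode's rule engine:

* `ENGLISH_WORDS` and `SINHALA_DICT` are tiny hypothetical stand-ins for the Google-20k list and the 5.9M-word dictionary.
* `phonetic_rules` is a deliberately simplified consonant-plus-vowel mapping. It handles single-letter consonants only, with one hypothetical romanization choice per letter.

```python
from typing import Dict, List, Optional, Set

# Tier 1 stand-in (hypothetical): a few entries in place of the Google-20k list.
ENGLISH_WORDS: Set[str] = {"assignment", "presentation"}

# Tier 2 stand-in (hypothetical): ambiguous words map to several spellings.
SINHALA_DICT: Dict[str, List[str]] = {
    "mama": ["මම"],
    "eka": ["එක"],
    "kala": ["කළා", "කලා"],
}

# Tier 3 tables (hypothetical, heavily simplified).
CONSONANTS: Dict[str, str] = {
    "k": "ක", "g": "ග", "d": "ද", "n": "න", "p": "ප", "b": "බ", "m": "ම",
    "y": "ය", "r": "ර", "l": "ල", "w": "ව", "v": "ව", "s": "ස", "h": "හ",
}
VOWEL_SIGNS: Dict[str, str] = {"aa": "ා", "a": "", "i": "ි", "u": "ු", "e": "ෙ", "o": "ො"}
INDEPENDENT_VOWELS: Dict[str, str] = {"aa": "ආ", "a": "අ", "i": "ඉ", "u": "උ", "e": "එ", "o": "ඔ"}
HAL = "්"  # al-lakuna: marks a consonant with no following vowel
VOWEL_ORDER = ("aa", "a", "i", "u", "e", "o")  # try longer vowels first


def _match_vowel(word: str, i: int) -> Optional[str]:
    """Return the longest romanized vowel starting at position i, if any."""
    for vowel in VOWEL_ORDER:
        if word.startswith(vowel, i):
            return vowel
    return None


def phonetic_rules(word: str) -> str:
    """Tier 3 sketch: map romanized syllables to Sinhala letters, left to right."""
    out: List[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
            vowel = _match_vowel(word, i)
            if vowel is None:
                out.append(HAL)  # cluster or word-final consonant
            else:
                out.append(VOWEL_SIGNS[vowel])  # inherent "a" adds no sign
                i += len(vowel)
        else:
            vowel = _match_vowel(word, i)
            if vowel is None:
                out.append(ch)  # unknown character: pass it through unchanged
                i += 1
            else:
                out.append(INDEPENDENT_VOWELS[vowel])
                i += len(vowel)
    return "".join(out)


def generate_candidates(word: str) -> List[str]:
    """Tiers 1-3: return the spellings a ranking step would choose from."""
    key = word.lower()
    if key in ENGLISH_WORDS:  # Tier 1: keep code-switched English words as typed
        return [word]
    if key in SINHALA_DICT:  # Tier 2: exact dictionary match
        return SINHALA_DICT[key]
    return [phonetic_rules(key)]  # Tier 3: rule-based fallback for unknown words


for token in ["mama", "Assignment", "kala", "gedara"]:
    print(token, "->", generate_candidates(token))
```

Because Tier 1 runs first, this sketch always keeps a word that is both valid English and valid Singlish as English. It makes no attempt to resolve such overlaps.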

## ⚠️ Disclaimer

This is an **Interim Prototype** for demonstration purposes.

* It is accurate for common phrases, but edge cases may still produce incorrect output.
* The system is currently optimized for demonstration and will be fine-tuned further.

---

**Developer:** Kalana Chandrasekara

**Supervisor:** Hiruni Samarage

*Final Year Research Project (2026)*