---
title: SinCode
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: false
license: mit
short_description: Context-Aware Transliteration
sdk_version: 1.53.1
---
# SinCode: Neuro-Symbolic Transliteration Prototype
> **Context-Aware Singlish-to-Sinhala Transliteration with Code-Switching Support.**

Welcome to the interim prototype of **SinCode**, a final-year research project designed to solve the ambiguity of transliterating "Singlish" (phonetic Sinhala) into native Sinhala script.
## 🚀 Key Features
* **🧠 Hybrid Neuro-Symbolic Engine:** Combines the speed of rule-based logic with the contextual understanding of a deep learning model (XLM-RoBERTa).
* **🔀 Adaptive Code-Switching:** Intelligently detects English words (e.g., *"Assignment"*, *"Presentation"*) mixed within Sinhala sentences and preserves them automatically.
* **📚 Massive Vocabulary:** Powered by an optimized dictionary of **5.9 Million** Sinhala words to ensure high-accuracy suggestions.
* **⚡ Contextual Disambiguation:** Resolves ambiguous terms (e.g., detecting if *"nisa"* means *because* or *near*) based on the full sentence context.
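
As a rough illustration of contextual disambiguation, the sketch below picks one candidate per word by scoring whole sequences. It is not the project's code: `pick_best_sequence`, `toy_score`, and the `PLAUSIBLE_PAIRS` table are hypothetical names, the tagged strings such as `nisa<because>` are placeholders for Sinhala-script candidates, and the toy scorer only stands in for the neural model.

```python
from itertools import product
from typing import Callable, Sequence


def pick_best_sequence(
    candidates: Sequence[Sequence[str]],
    score: Callable[[Sequence[str]], float],
) -> list[str]:
    """Return the highest-scoring combination, one candidate per input word."""
    # Exhaustive search over all combinations; fine for short sentences only.
    return list(max(product(*candidates), key=score))


# Toy stand-in for a neural scorer (assumption): adjacent pairs that
# "sound right" together. A real system would ask a language model instead.
PLAUSIBLE_PAIRS = {
    ("rain", "nisa<because>"),
    ("house", "nisa<near>"),
}


def toy_score(seq: Sequence[str]) -> float:
    """Count how many adjacent pairs in the sequence are marked plausible."""
    return sum((a, b) in PLAUSIBLE_PAIRS for a, b in zip(seq, seq[1:]))


# The same ambiguous word resolves differently depending on its neighbour.
after_rain = pick_best_sequence([["rain"], ["nisa<because>", "nisa<near>"]], toy_score)
after_house = pick_best_sequence([["house"], ["nisa<because>", "nisa<near>"]], toy_score)
```

Exhaustive enumeration grows exponentially with sentence length, so a longer input would likely need beam search or a similar pruning step.
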
## 🛠️ How to Use
1. **Type** your Singlish sentence in the input box.
2. Click the **Transliterate** button.
3. View the **Result**.
4. (Optional) Expand the **"See How It Works"** section to view the real-time scoring logic used by the system.
## 🏗️ System Architecture
This prototype utilizes a **Tiered Decoding Strategy**:
1. **Tier 1 (English Filter):** Checks each word against the Google-20k English Corpus so that English terms (such as technical vocabulary) are passed through unchanged.
2. **Tier 2 (Dictionary Lookup):** Scans the 5.9M word database for exact Sinhala matches.
3. **Tier 3 (Phonetic Rules):** Generates Sinhala text for unknown words using a rule-based engine.
4. **Tier 4 (Neural Ranking):** The **XLM-R** model scores all possible candidates to pick the most grammatically correct sequence.
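
The tiers above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes the first three tiers short-circuit in order, and `candidates_for`, `transliterate`, the tiny word sets, the bracketed rule output, and the first-choice ranker are all hypothetical stand-ins for the real corpus, dictionary, rule engine, and XLM-R ranker.

```python
from typing import Callable, Mapping, Sequence


def candidates_for(
    word: str,
    english: set[str],
    dictionary: Mapping[str, Sequence[str]],
    rules: Callable[[str], str],
) -> list[str]:
    """Produce transliteration candidates for a single input word."""
    key = word.lower()
    if key in english:          # Tier 1: keep English words untouched
        return [word]
    if key in dictionary:       # Tier 2: exact dictionary matches
        return list(dictionary[key])
    return [rules(key)]         # Tier 3: rule-based fallback for unknown words


def transliterate(
    sentence: str,
    english: set[str],
    dictionary: Mapping[str, Sequence[str]],
    rules: Callable[[str], str],
    rank: Callable[[list[list[str]]], list[str]],
) -> str:
    """Run Tiers 1-3 per word, then let the ranker (Tier 4) choose a sequence."""
    per_word = [candidates_for(w, english, dictionary, rules) for w in sentence.split()]
    return " ".join(rank(per_word))


# Placeholder resources (assumptions, far smaller than the real ones).
ENGLISH = {"assignment", "presentation"}
DICTIONARY = {"mama": ["මම"]}


def placeholder_rules(word: str) -> str:
    """Stand-in for the phonetic rule engine: just marks the word."""
    return f"[{word}]"


def first_choice(per_word: list[list[str]]) -> list[str]:
    """Stand-in ranker: takes the first candidate at every position."""
    return [options[0] for options in per_word]


result = transliterate(
    "mama Assignment karanawa", ENGLISH, DICTIONARY, placeholder_rules, first_choice
)
```

Passing the ranker in as a function keeps candidate generation independent of the model, so a neural scorer can replace `first_choice` without touching the earlier tiers.
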
## ⚠️ Disclaimer
This is an **Interim Prototype** for demonstration purposes.
* The system is accurate for common phrases, but edge cases may still produce incorrect output.
* The system is currently optimized for demonstration performance and will be fine-tuned further.
---
**Developer:** Kalana Chandrasekara

**Supervisor:** Hiruni Samarage

*Final Year Research Project (2026)*