---
title: SinCode
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: false
license: mit
short_description: Context-Aware Transliteration
sdk_version: 1.53.1
---

# SinCode: Neuro-Symbolic Transliteration Prototype

> **Context-Aware Singlish-to-Sinhala Transliteration with Code-Switching Support.**

Welcome to the interim prototype of **SinCode**, a final-year research project that tackles the ambiguity of transliterating "Singlish" (Sinhala written phonetically in Latin script) into native Sinhala script.

## 🚀 Key Features

* **🧠 Hybrid Neuro-Symbolic Engine:** Combines the speed of rule-based logic with the contextual understanding of a deep learning model (XLM-RoBERTa).
* **🔀 Adaptive Code-Switching:** Detects English words (e.g., *"Assignment"*, *"Presentation"*) mixed into Sinhala sentences and keeps them as-is instead of transliterating them.
* **📚 Massive Vocabulary:** Backed by an optimized dictionary of **5.9 million** Sinhala words to support high-accuracy suggestions.
* **⚡ Contextual Disambiguation:** Resolves ambiguous words (e.g., whether *"nisa"* means *because* or *near*) using the context of the whole sentence.

## 🛠️ How to Use

1.  **Type** your Singlish sentence in the input box.
2.  Click the **Transliterate** button.
3.  View the **Result**.
4.  (Optional) Expand the **"See How It Works"** section to view the real-time scoring logic used by the system.

## 🏗️ System Architecture

This prototype uses a **Tiered Decoding Strategy**:
1.  **Tier 1 (English Filter):** Checks each word against the Google-20k English Corpus so that English words, such as technical terms, are preserved rather than transliterated.
2.  **Tier 2 (Dictionary Lookup):** Searches the 5.9M-word dictionary for exact Sinhala matches.
3.  **Tier 3 (Phonetic Rules):** Generates Sinhala text for words not found in the dictionary using a rule-based engine.
4.  **Tier 4 (Neural Ranking):** The **XLM-R** model scores the candidate sequences and picks the most grammatically and contextually plausible one.
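The four tiers can be pictured as a small pipeline: each word gets a list of candidates from Tiers 1-3, and Tier 4 picks the best combination for the whole sentence. The Python below is a minimal, hypothetical sketch of that flow, not SinCode's actual code. The word lists, the tiny phonetic rule table, and the function names (`phonetic_transliterate`, `candidates`, `toy_score`, `decode`) are all invented for illustration, and `toy_score` is a toy stand-in for the XLM-R scorer that simply prefers dictionary-attested words.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical toy resources standing in for SinCode's real assets.
ENGLISH_WORDS: Set[str] = {"assignment", "presentation"}  # stand-in for the Google-20k list
DICTIONARY: Dict[str, List[str]] = {                     # stand-in for the 5.9M-word dictionary
    "mama": ["මම"],
    "yanawa": ["යනවා"],
}
PHONETIC_RULES: Dict[str, str] = {                        # tiny illustrative rule table
    "ka": "ක", "ma": "ම", "la": "ල", "na": "න", "ya": "ය", "wa": "ව",
}
KNOWN_SINHALA: Set[str] = {form for forms in DICTIONARY.values() for form in forms}


def phonetic_transliterate(word: str) -> str:
    """Tier 3: greedy longest-match rule application; unmatched characters are kept as-is."""
    out: List[str] = []
    i = 0
    while i < len(word):
        for size in (3, 2, 1):  # try longer rule keys first
            chunk = word[i:i + size]
            if chunk in PHONETIC_RULES:
                out.append(PHONETIC_RULES[chunk])
                i += len(chunk)
                break
        else:  # no rule matched: copy the character unchanged
            out.append(word[i])
            i += 1
    return "".join(out)


def candidates(word: str) -> List[str]:
    """Tiers 1-3: collect the candidate outputs for one Singlish word."""
    lower = word.lower()
    if lower in ENGLISH_WORDS:                  # Tier 1: keep English words unchanged
        return [word]
    options = list(DICTIONARY.get(lower, []))   # Tier 2: dictionary matches
    rule_based = phonetic_transliterate(lower)  # Tier 3: rule-based fallback
    if rule_based not in options:
        options.append(rule_based)
    return options


def toy_score(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for the XLM-R scorer: counts dictionary-attested words."""
    return float(sum(token in KNOWN_SINHALA for token in tokens))


def decode(sentence: str, score: Callable[[Sequence[str]], float]) -> str:
    """Tier 4: rank every combination of per-word candidates and keep the best one."""
    options = [candidates(word) for word in sentence.split()]
    best = max(product(*options), key=score)  # exhaustive search; ties keep the first
    return " ".join(best)


print(decode("mama assignment yanawa", toy_score))  # මම assignment යනවා
```

Here "yanawa" yields two candidates (the dictionary form and the rule-based form), and the scorer breaks the tie in favour of the dictionary word, while "assignment" passes through Tier 1 untouched. Because `itertools.product` grows exponentially with sentence length, a practical system would likely prune candidates (for example with beam search) rather than enumerate every combination.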

## ⚠️ Disclaimer

This is an **Interim Prototype** for demonstration purposes.
* It handles common phrases accurately, but edge cases may still produce incorrect output.
* The system is currently configured for demonstration use and will be fine-tuned further.

---
**Developer:** Kalana Chandrasekara

**Supervisor:** Hiruni Samarage


*Final Year Research Project (2026)*