Kalana001 committed on
Commit 32dda51 · verified
1 Parent(s): 3055757

Update README.md

Files changed (1)
  1. README.md +38 -1
README.md CHANGED
@@ -11,4 +11,41 @@ short_description: Context-Aware Transliteration
11
  sdk_version: 1.53.1
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
14
+ # SinCode: Neuro-Symbolic Transliteration Prototype 🇱🇰
15
+
16
+ > **Context-Aware Singlish-to-Sinhala Transliteration with Code-Switching Support.**
17
+
18
+ Welcome to the interim prototype of **SinCode**, a final-year research project that aims to resolve the ambiguity that arises when transliterating "Singlish" (phonetic Sinhala typed in Latin letters) into native Sinhala script.
19
+
20
+ ## 🚀 Key Features
21
+
22
+ * **🧠 Hybrid Neuro-Symbolic Engine:** Combines the speed of rule-based logic with the contextual understanding of a deep learning model (XLM-RoBERTa).
23
+ * **🔀 Adaptive Code-Switching:** Intelligently detects English words (e.g., *"Assignment"*, *"Presentation"*) mixed within Sinhala sentences and preserves them automatically.
24
+ * **📚 Massive Vocabulary:** Powered by an optimized dictionary of **5.9 million** Sinhala words to support accurate suggestions.
25
+ * **⚡ Contextual Disambiguation:** Resolves ambiguous terms (e.g., deciding whether *"nisa"* means *because* or *near*) based on the full sentence context.
26
+
27
+ ## 🛠️ How to Use
28
+
29
+ 1. **Type** your Singlish sentence in the input box.
30
+ 2. Click the **Transliterate** button.
31
+ 3. View the **Result**.
32
+ 4. (Optional) Expand the **"See How It Works"** section to view the real-time scoring logic used by the system.
33
+
34
+ ## 🏗️ System Architecture
35
+
36
+ This prototype uses a **Tiered Decoding Strategy**:
37
+ 1. **Tier 1 (English Filter):** Checks each word against the Google-20k English Corpus so that English words, such as technical terms, are kept as they are.
38
+ 2. **Tier 2 (Dictionary Lookup):** Scans the 5.9M word database for exact Sinhala matches.
39
+ 3. **Tier 3 (Phonetic Rules):** Generates Sinhala text for unknown words using a rule-based engine.
40
+ 4. **Tier 4 (Neural Ranking):** The **XLM-R** model scores all possible candidates to pick the most grammatically correct sequence.
41
+
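The four tiers above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the three lookup tables are tiny hypothetical stand-ins for the Google-20k list, the 5.9M-word dictionary, and the phonetic rule set, and the function names (`apply_rules`, `candidates`, `transliterate`) are invented for this example. It also assumes that the first matching tier wins for each word, and it replaces the XLM-R scorer with a pluggable `score` callable that defaults to a constant.

```python
from itertools import product

# Hypothetical stand-ins for the real resources (assumptions, not project data).
ENGLISH_WORDS = {"assignment", "presentation"}   # stands in for Google-20k
SINHALA_DICT = {"mama": ["මම"]}                  # stands in for the 5.9M-word dictionary
PHONETIC_RULES = {"ka": "ක", "ma": "ම", "na": "න", "la": "ල"}  # toy syllable rules


def apply_rules(word):
    """Tier 3: greedy longest-match rewrite; unmatched characters pass through."""
    out, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in PHONETIC_RULES:
                out.append(PHONETIC_RULES[chunk])
                i += size
                break
        else:
            # No rule matched at this position: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def candidates(word):
    """Return the candidate outputs for one word, trying the tiers in order."""
    key = word.lower()
    if key in ENGLISH_WORDS:           # Tier 1: keep English words as typed
        return [word]
    if key in SINHALA_DICT:            # Tier 2: exact dictionary matches
        return list(SINHALA_DICT[key])
    return [apply_rules(key)]          # Tier 3: rule-based fallback


def transliterate(sentence, score=lambda tokens: 0.0):
    """Tier 4: pick the candidate sequence with the highest score.

    `score` takes a tuple of tokens and returns a number; a neural ranker
    would be plugged in here. The default gives every sequence the same
    score, so the first candidate of each word is chosen.
    """
    lattice = [candidates(w) for w in sentence.split()]
    best = max(product(*lattice), key=score)
    return " ".join(best)


print(transliterate("mama Assignment kala"))
```

Enumerating every combination with `product` grows exponentially with sentence length, so this only works for short toy inputs; a practical ranker would likely need something like beam search instead.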
42
+ ## ⚠️ Disclaimer
43
+
44
+ This is an **Interim Prototype** for demonstration purposes.
45
+ * While the system is accurate for common phrases, it may still mishandle edge cases.
46
+ * The system is currently optimized for demonstration performance and will be fine-tuned further.
47
+
48
+ ---
49
+ **Developer:** Kalana Chandrasekara
50
+ **Supervisor:** Hiruni Samarage
51
+ *Final Year Research Project (2026)*