metadata
title: SinCode
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: false
license: mit
short_description: Context-Aware Transliteration
sdk_version: 1.53.1

SinCode: Neuro-Symbolic Transliteration Prototype

Context-Aware Singlish-to-Sinhala Transliteration with Code-Switching Support.

Welcome to the interim prototype of SinCode, a final-year research project designed to resolve the ambiguity that arises when transliterating "Singlish" (phonetic Sinhala) into native Sinhala script.

🚀 Key Features

  • 🧠 Hybrid Neuro-Symbolic Engine: Combines the speed of rule-based logic with the contextual understanding of a deep learning model (XLM-RoBERTa).
  • 🔀 Adaptive Code-Switching: Detects English words (e.g., "Assignment", "Presentation") mixed into Sinhala sentences and preserves them automatically.
  • 📚 Massive Vocabulary: Powered by an optimized dictionary of 5.9 million Sinhala words to support high-accuracy suggestions.
  • ⚡ Contextual Disambiguation: Resolves ambiguous terms (e.g., deciding whether "nisa" means "because" or "near") based on the full sentence context.

🛠️ How to Use

  1. Type your Singlish sentence in the input box.
  2. Click the Transliterate button.
  3. View the Result.
  4. (Optional) Expand the "See How It Works" section to view the real-time scoring logic used by the system.

πŸ—οΈ System Architecture

This prototype uses a Tiered Decoding Strategy:

  1. Tier 1 (English Filter): Checks each word against the Google-20k English Corpus so that English technical terms are filtered out and left untransliterated.
  2. Tier 2 (Dictionary Lookup): Scans the 5.9M-word database for exact Sinhala matches.
  3. Tier 3 (Phonetic Rules): Generates Sinhala text for unknown words using a rule-based engine.
  4. Tier 4 (Neural Ranking): The XLM-R model scores all possible candidates to pick the most grammatically correct sequence.
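The four tiers above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the tiny `ENGLISH_WORDS`, `SINHALA_DICT`, and `SYLLABLE_RULES` tables are hypothetical stand-ins for the Google-20k list, the 5.9M-word dictionary, and the real rule engine, and the `score` argument is a placeholder for wherever the XLM-R ranking would plug in.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins for the real resources described above.
ENGLISH_WORDS = {"assignment", "presentation"}   # Tier 1: English word list
SINHALA_DICT: Dict[str, List[str]] = {           # Tier 2: Singlish -> Sinhala candidates
    "mama": ["මම"],
    "gedara": ["ගෙදර"],
    "yanawa": ["යනවා"],
}
SYLLABLE_RULES = {"ka": "ක", "ma": "ම", "na": "න", "la": "ල"}  # Tier 3: toy rules


def phonetic_fallback(word: str) -> str:
    """Greedy two-letter syllable mapping; unmapped characters pass through."""
    out, i = [], 0
    while i < len(word):
        chunk = word[i:i + 2]
        if chunk in SYLLABLE_RULES:
            out.append(SYLLABLE_RULES[chunk])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def candidates_for(word: str) -> List[str]:
    """Return candidate outputs for one word, trying the tiers in order."""
    key = word.lower()
    if key in ENGLISH_WORDS:          # Tier 1: keep English words unchanged
        return [word]
    if key in SINHALA_DICT:           # Tier 2: exact dictionary match
        return list(SINHALA_DICT[key])
    return [phonetic_fallback(key)]   # Tier 3: rule-based fallback


def transliterate(sentence: str, score: Callable[[Sequence[str]], float]) -> str:
    """Tier 4: pick the candidate sequence that the scorer rates highest."""
    per_word = [candidates_for(w) for w in sentence.split()]
    best = max(product(*per_word), key=score)
    return " ".join(best)


# A constant scorer simply keeps the first candidate for every word.
print(transliterate("mama Assignment kala", lambda seq: 0.0))
```

Enumerating every combination with `product` grows exponentially with sentence length, so it only suits a toy example; a practical ranker would more likely prune candidates, for instance with beam search.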

⚠️ Disclaimer

This is an Interim Prototype for demonstration purposes.

  • The system is accurate for common phrases, but edge cases may still produce incorrect output.
  • The system is currently optimized for demonstration performance and will be fine-tuned further.

Developer: Kalana Chandrasekara

Supervisor: Hiruni Samarage

Final Year Research Project (2026)