KalanaPabasara committed on
Commit 4a1077b
Parent(s): 9fe0b67
Make README ASCII-safe to avoid mojibake on web renderers
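
Review note: a change like this can be checked mechanically by scanning the file for characters outside the ASCII range. The following is a minimal sketch, not part of the commit. The helper name `find_non_ascii` and the sample string are hypothetical, and the box-drawing characters in the sample are an assumption about what an arrow diagram might contain, not the README's recovered content.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything beyond 7-bit ASCII
                hits.append((line_no, col, ch))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: a diagram drawn with box-drawing characters.
    sample = "Input sentence\n\u2502\n\u25bc\nWord Tokenizer"
    for line_no, col, ch in find_non_ascii(sample):
        print(f"line {line_no}, col {col}: U+{ord(ch):04X}")
```

Running the same function over the README text after the change should return an empty list if the file is fully ASCII.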
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ pinned: false
 license: mit
 ---
 
-#
+# SinCode - Singlish to Sinhala Transliterator
 
 A model-driven, context-aware back-transliteration system that converts Romanised Sinhala (Singlish) to native Sinhala script.
 
@@ -18,25 +18,25 @@ A model-driven, context-aware back-transliteration system that converts Romanise
 
 ```
 Input sentence
-
-
+|
+v
 Word Tokenizer
-
-
-
-
-
-
-
-
+|
++-- Sinhala script? -------------------------> Pass through unchanged
+|
++-- English vocab (len >= 3)? --------------> Pass through unchanged
+|
+`-- Singlish word?
+|
+v
 ByT5-small seq2seq
 (top-5 candidates)
-
-
+|
+v
 XLM-RoBERTa MLM reranker
 (contextual scoring)
-
-
+|
+v
 Best candidate
 ```
 
@@ -44,18 +44,18 @@ Word Tokenizer
 
 | Model | Role | Hub ID |
 |-------|------|--------|
-| ByT5-small | Singlish
+| ByT5-small | Singlish -> Sinhala candidate generation | `Kalana001/byt5-small-singlish-sinhala` |
 | XLM-RoBERTa | Contextual MLM reranking | `Kalana001/xlm-roberta-base-finetuned-sinhala` |
 | mBart50 | Full-sentence Sinhala output mode | `Kalana001/mbart50-large-singlish-sinhala` |
 
 ## Modes
 
-- **Code-Mixed Output**
-- **Full Sinhala Output**
+- **Code-Mixed Output** - Retains English words where contextually appropriate; Singlish words are transliterated using ByT5 + XLM-RoBERTa reranking.
+- **Full Sinhala Output** - Transliterates the entire sentence to Sinhala script using mBart50.
 
 ## Environment Variables (optional)
 
-Set these in HF Spaces
+Set these in HF Spaces -> Settings -> Repository secrets to enable Supabase feedback storage:
 
 | Variable | Description |
 |----------|-------------|
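
Review note: the per-word routing that the README diagram describes (Sinhala script passes through, English vocabulary words of length >= 3 pass through, everything else goes to the ByT5 + XLM-RoBERTa path) could be sketched roughly as below. This is an illustration under stated assumptions, not the project's code. The names `route_word` and `route_sentence`, the whitespace tokenizer, the tiny `ENGLISH_VOCAB` set, and the use of the Unicode Sinhala block (U+0D80 to U+0DFF) for script detection are all hypothetical choices made for the example. No models are loaded here.

```python
import re

# Assumption: a word counts as Sinhala script if it contains any character
# from the Unicode Sinhala block.
SINHALA_CHAR = re.compile(r"[\u0D80-\u0DFF]")

# Hypothetical stand-in for the real English vocabulary.
ENGLISH_VOCAB = {"meeting", "phone", "call", "the"}


def route_word(word, english_vocab=ENGLISH_VOCAB):
    """Decide whether a token is passed through or sent for transliteration."""
    if SINHALA_CHAR.search(word):
        return "pass_through"   # already in Sinhala script
    if len(word) >= 3 and word.lower() in english_vocab:
        return "pass_through"   # English vocabulary word (len >= 3)
    return "transliterate"      # treated as a Singlish word


def route_sentence(sentence):
    """Split on whitespace (an assumption) and route each token."""
    return [(word, route_word(word)) for word in sentence.split()]


if __name__ == "__main__":
    for word, decision in route_sentence("mama meeting ekata yanawa"):
        print(f"{word}: {decision}")
```

In a full pipeline, tokens marked `transliterate` would be the ones handed to candidate generation and contextual reranking; the length check keeps very short English-looking tokens on the transliteration path.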