KalanaPabasara committed on
Commit 9fe0b67 · 1 Parent(s): f28d091

Fix README encoding (garbled UTF-8 Sinhala/Unicode characters)
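The commit does not say how the README became garbled. Sequences such as `â–¼` in the removed lines are consistent with UTF-8 bytes having been decoded as Windows-1252, so the sketch below reproduces and reverses that one failure mode. The helper names `to_mojibake` and `repair_mojibake` are hypothetical, and this is not necessarily how the author repaired the file.

```python
def to_mojibake(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(text: str) -> str:
    """Undo the misread; return the input unchanged if it does not round-trip."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Either the text was never garbled this way, or bytes were lost.
        return text


original = "│ ▼ → ≥"
garbled = to_mojibake(original)      # contains sequences such as "â–¼"
repaired = repair_mojibake(garbled)  # equal to `original` again
```

The round trip only works while every byte survives; once an editor or tool has dropped or replaced characters, the original text has to be retyped.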

Files changed (1)
  1. README.md +20 -21
README.md CHANGED
@@ -10,7 +10,7 @@ pinned: false
  license: mit
  ---
 
- # සිංCode — Singlish to Sinhala Transliterator
 
  A model-driven, context-aware back-transliteration system that converts Romanised Sinhala (Singlish) to native Sinhala script.
 
@@ -18,25 +18,25 @@ A model-driven, context-aware back-transliteration system that converts Romanise
 
  ```
  Input sentence
- │
- â–¼
  Word Tokenizer
- │
- ├─ Sinhala script? ──────────────────────► Pass through unchanged
- │
- ├─ English vocab (len ≥ 3)? ─────────────► Pass through unchanged
- │
- └─ Singlish word?
- │
- â–¼
  ByT5-small seq2seq
  (top-5 candidates)
- │
- â–¼
  XLM-RoBERTa MLM reranker
  (contextual scoring)
- │
- â–¼
  Best candidate
  ```
 
@@ -44,18 +44,18 @@ Word Tokenizer
 
  | Model | Role | Hub ID |
  |-------|------|--------|
- | ByT5-small | Singlish → Sinhala candidate generation | `Kalana001/byt5-small-singlish-sinhala` |
  | XLM-RoBERTa | Contextual MLM reranking | `Kalana001/xlm-roberta-base-finetuned-sinhala` |
  | mBart50 | Full-sentence Sinhala output mode | `Kalana001/mbart50-large-singlish-sinhala` |
 
  ## Modes
 
- - **Code-Mixed Output** — Retains English words where contextually appropriate; Singlish words are transliterated using ByT5 + XLM-RoBERTa reranking.
- - **Full Sinhala Output** — Transliterates the entire sentence to Sinhala script using mBart50.
 
  ## Environment Variables (optional)
 
- Set these in HF Spaces → Settings → Repository secrets to enable Supabase feedback storage:
 
  | Variable | Description |
  |----------|-------------|
@@ -64,5 +64,4 @@ Set these in HF Spaces → Settings → Repository secrets to enable S
  | `SUPABASE_SERVICE_ROLE_KEY` | Supabase service role key |
  | `SUPABASE_FEEDBACK_TABLE` | Table name (default: `feedback_submissions`) |
 
- If not set, feedback is saved locally to `misc/feedback_submissions.jsonl`.
-
 
  license: mit
  ---
 
+ # සිංCode — Singlish to Sinhala Transliterator
 
  A model-driven, context-aware back-transliteration system that converts Romanised Sinhala (Singlish) to native Sinhala script.
 
 
 
  ```
  Input sentence
+ │
+ ▼
  Word Tokenizer
+ │
+ ├─ Sinhala script? ──────────────────────────► Pass through unchanged
+ │
+ ├─ English vocab (len ≥ 3)? ─────────────────► Pass through unchanged
+ │
+ └─ Singlish word?
+ │
+ ▼
  ByT5-small seq2seq
  (top-5 candidates)
+ │
+ ▼
  XLM-RoBERTa MLM reranker
  (contextual scoring)
+ │
+ ▼
  Best candidate
  ```
 
 
 
  | Model | Role | Hub ID |
  |-------|------|--------|
+ | ByT5-small | Singlish → Sinhala candidate generation | `Kalana001/byt5-small-singlish-sinhala` |
  | XLM-RoBERTa | Contextual MLM reranking | `Kalana001/xlm-roberta-base-finetuned-sinhala` |
  | mBart50 | Full-sentence Sinhala output mode | `Kalana001/mbart50-large-singlish-sinhala` |
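
The per-word routing in the diagram, together with the ByT5-small and XLM-RoBERTa roles listed in the table, can be sketched roughly as follows. This is a minimal illustration, not the project's code: `is_sinhala`, `transliterate`, the `generate`/`rerank` callables, and the toy candidate data are all hypothetical stand-ins, a plain whitespace split replaces the word tokenizer, and treating "Sinhala script" as "contains any character in U+0D80 to U+0DFF" is an assumption.

```python
from typing import Callable, List, Set

# Hypothetical interfaces; the real model wrappers are not shown in the README.
Generator = Callable[[str, int], List[str]]                  # (word, k) -> candidates
Reranker = Callable[[List[str], List[str], List[str]], str]  # (left, candidates, right) -> best


def is_sinhala(word: str) -> bool:
    """True if the word contains a character from the Sinhala Unicode block."""
    return any("\u0d80" <= ch <= "\u0dff" for ch in word)


def transliterate(sentence: str, english_vocab: Set[str],
                  generate: Generator, rerank: Reranker, k: int = 5) -> str:
    words = sentence.split()  # stand-in for the word tokenizer
    out: List[str] = []
    for i, word in enumerate(words):
        if is_sinhala(word):
            out.append(word)  # already Sinhala: pass through unchanged
        elif len(word) >= 3 and word.lower() in english_vocab:
            out.append(word)  # English vocabulary word (len >= 3): pass through
        else:
            candidates = generate(word, k)  # seq2seq model proposes top-k spellings
            # The reranker picks one candidate given left and right context.
            out.append(rerank(list(out), candidates, words[i + 1:]))
    return " ".join(out)


# Toy stand-ins for the two models, for illustration only.
TOY_CANDIDATES = {"mama": ["මම", "මාමා"], "yanawa": ["යනවා"]}


def toy_generate(word: str, k: int) -> List[str]:
    return TOY_CANDIDATES.get(word.lower(), [word])[:k]


def toy_rerank(left: List[str], candidates: List[str], right: List[str]) -> str:
    return candidates[0]  # a real reranker would score each candidate in context


result = transliterate("mama office yanawa", {"office"}, toy_generate, toy_rerank)
```

Passing the models in as callables keeps the routing rules testable without downloading any checkpoints.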
 
  ## Modes
 
+ - **Code-Mixed Output** — Retains English words where contextually appropriate; Singlish words are transliterated using ByT5 + XLM-RoBERTa reranking.
+ - **Full Sinhala Output** — Transliterates the entire sentence to Sinhala script using mBart50.
 
  ## Environment Variables (optional)
 
+ Set these in HF Spaces → Settings → Repository secrets to enable Supabase feedback storage:
 
  | Variable | Description |
  |----------|-------------|
 
  | `SUPABASE_SERVICE_ROLE_KEY` | Supabase service role key |
  | `SUPABASE_FEEDBACK_TABLE` | Table name (default: `feedback_submissions`) |
 
+ If not set, feedback is saved locally to `misc/feedback_submissions.jsonl`.
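
The fallback described in that last README line might look something like the sketch below. It reads only the two variables visible in this diff; the table rows between the hunks are not shown, so any other settings the app requires are not covered here. The function names `feedback_target` and `save_local` are hypothetical, and the Supabase call itself is deliberately left out.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Tuple

LOCAL_FEEDBACK_PATH = "misc/feedback_submissions.jsonl"


def feedback_target(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Decide where feedback goes: ("supabase", table) or ("local", path)."""
    key = env.get("SUPABASE_SERVICE_ROLE_KEY")
    table = env.get("SUPABASE_FEEDBACK_TABLE", "feedback_submissions")
    if key:
        return ("supabase", table)
    return ("local", LOCAL_FEEDBACK_PATH)


def save_local(record: dict, path: str = LOCAL_FEEDBACK_PATH) -> None:
    """Append one feedback record as a JSON line, creating the folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps Sinhala text readable in the file.
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Writing the file explicitly as UTF-8 also avoids the kind of encoding damage this commit cleans up.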