Commit 0e3eeeb (verified) · committed by thekusaldarshana · 1 Parent(s): 62b7250

Update README.md

Files changed (1): README.md (+1 -0)
@@ -21,6 +21,7 @@ Our mission is to create AI systems that empower people, advance research, and a
  **Current:**
 
  - **UgannA_Siyabasa**: A FastText Model built for Sinhala Language with 90+ similar words retrieval rate.
+ - **CleanSinhalaTextCorpus**: 20GB of Sinhala-only text (20 .gz files, each 1GB when uncompressed), carefully cleaned and tokenized. It is designed for language modeling, embeddings, and NLP research.
  - We are working on a Language Model called GCE2.
 
  ---
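
Since the added README line describes the corpus as a set of gzip-compressed text shards, a reader might stream it roughly as sketched below. This is only an illustration under assumptions the README does not confirm: that the shards are UTF-8 text with one sentence or document per line, and that they sit together in one local directory. The function name `iter_corpus_lines`, the directory argument, and the `*.gz` pattern are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_corpus_lines(shard_dir: str, pattern: str = "*.gz") -> Iterator[str]:
    """Yield non-empty lines from every gzip shard in shard_dir.

    Assumptions (not stated in the README): shards are UTF-8 text,
    one unit of text per line, and all match `pattern`.
    """
    # Sort by filename so the iteration order is deterministic.
    for shard in sorted(Path(shard_dir).glob(pattern)):
        # Text mode ("rt") decompresses and decodes incrementally,
        # so a whole shard is never held in memory at once.
        with gzip.open(shard, mode="rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield line
```

Streaming line by line keeps memory use flat regardless of shard size, which matters when each shard expands to about 1GB.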