UgannA Siyabasa V2 — FastText Sinhala Embedding Model 🇱🇰
UgannA_SiyabasaV2 (උගන්නැ සියබස) is the first public FastText embedding model released by Remeinium Corp. The name comes from Kumaratunga Munidasa's timeless line “උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්” (“Learn Sinhala, be intoxicated with its beauty”).
Just as Munidasa sought to nurture the Sinhala language, this model is an effort to teach it to machines.
📌 Key Features
- Type: FastText
- Vector size: 300 dimensions
- File size: ~3.94 GB
- Training data: 17 GB of processed Sinhala text
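FastText builds each word's vector from the word's character n-grams. This is why the model can return a vector even for words it never saw in training, such as misspellings or rare inflected forms. The plain-Python sketch below shows the idea. It wraps the word in `<` and `>` boundary markers and collects every substring of length `minn` to `maxn`. It then averages the vectors of those n-grams. fastText's defaults are `minn=3` and `maxn=6`, but the values used to train UgannA Siyabasa are not stated here, so treat them as assumptions. `ngram_table` is a hypothetical stand-in: the real model hashes every n-gram into a bucket of a trained matrix instead of looking it up by name.

```python
from typing import Dict, List, Sequence


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """fastText-style character n-grams of '<word>' with lengths minn..maxn.

    Python slices by code point, so a Sinhala letter and its vowel sign or
    al-lakuna (e.g. 'ම' + '්') count as separate characters, as in fastText.
    """
    bracketed = f"<{word}>"
    grams: List[str] = []
    for start in range(len(bracketed)):
        for n in range(minn, maxn + 1):
            if start + n > len(bracketed):
                break  # longer n-grams from this start would also overrun
            grams.append(bracketed[start:start + n])
    return grams


def compose_vector(
    word: str,
    ngram_table: Dict[str, Sequence[float]],  # hypothetical n-gram -> vector map
    dim: int,
    minn: int = 3,
    maxn: int = 6,
) -> List[float]:
    """Average the vectors of the word's n-grams that appear in the table.

    Unknown n-grams are skipped here (the real model never skips, because every
    n-gram hashes to some bucket); returns a zero vector if nothing matched.
    """
    total = [0.0] * dim
    found = 0
    for gram in char_ngrams(word, minn, maxn):
        vec = ngram_table.get(gram)
        if vec is None:
            continue
        for i in range(dim):
            total[i] += vec[i]
        found += 1
    if found == 0:
        return total
    return [x / found for x in total]
```

For example, `char_ngrams("අම්මා")` begins with `<අම` and `<අම්`, because the al-lakuna (්) counts as a character of its own.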
🔧 Usage
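```python
import fasttext

# Load the model. fasttext.load_model() expects a path to a local .bin file,
# so download the model files first and point this path at your copy.
model = fasttext.load_model("Remeinium/UgannA_Siyabasa/UgannA_Siyabasa.bin")

# Get the 300-dimensional vector for a word (returned as a NumPy array)
vector = model.get_word_vector("අම්මා")

# Get the 10 nearest neighbors as (cosine similarity, word) pairs
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
for score, word in neighbors:
    print(f"{word}\t{score:.4f}")
```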
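`get_nearest_neighbors` ranks vocabulary words by cosine similarity to the query word's vector and leaves the query word itself out of the results. The pure-Python sketch below reproduces that ranking over a small in-memory dictionary, which makes the returned scores easier to interpret. `vectors` is a hypothetical toy vocabulary, not the model's own. fastText itself works from a precomputed matrix of normalized word vectors. The sketch also includes a sentence-vector helper that follows the approach fastText's `get_sentence_vector` uses for unsupervised models: L2-normalize each word vector, then average them.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(
    query: str, vectors: Dict[str, Sequence[float]], k: int = 10
) -> List[Tuple[float, str]]:
    """Brute-force top-k words by cosine similarity to `query`, returned as
    (score, word) pairs like fastText's output; the query word is excluded."""
    q = vectors[query]
    scored = [(cosine(q, vec), word) for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def sentence_vector(
    words: List[str], vectors: Dict[str, Sequence[float]], dim: int
) -> List[float]:
    """Average of L2-normalized word vectors. Words missing from this toy table
    are skipped (the real model would build them from subwords instead), and
    zero-length vectors are ignored."""
    total = [0.0] * dim
    count = 0
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            continue
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue
        for i in range(dim):
            total[i] += vec[i] / norm
        count += 1
    return [x / count for x in total] if count else total
```

With the real model, you can pass two `model.get_word_vector(...)` results to `cosine` to check a single pair of words without ranking the whole vocabulary.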
Use the API
- Test live: Embedding Playground
- API docs: API Console
📂 Training Data
- Processed and cleaned training corpus: ~17 GB
- Preprocessing: tokenization, normalization, deduplication
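The exact preprocessing pipeline is not published here. The following is therefore only a minimal sketch of the three listed steps, under stated assumptions, with hypothetical function names. Normalization applies Unicode NFC and collapses whitespace. NFC matters for Sinhala because some vowel signs, such as ේ (U+0DDA), can also be encoded as two code points (U+0DD9 followed by U+0DCA). Tokenization splits on whitespace and punctuation and keeps combining vowel signs and the al-lakuna (්) attached to their letters. Deduplication drops exact duplicates by hashing each normalized line.

```python
import hashlib
import unicodedata
from typing import Iterable, Iterator, List, Set


def normalize(text: str) -> str:
    """NFC-normalize, then collapse runs of whitespace into single spaces."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def tokenize(text: str) -> List[str]:
    """Split on whitespace and emit each punctuation character as its own token."""
    # Characters are classified by Unicode category instead of a regex \w,
    # because Python's \w does not match combining marks such as ් or ා and
    # would split Sinhala words apart. Zero-width joiners (U+200D, used in
    # Sinhala conjuncts) are neither space nor punctuation, so they stay in tokens.
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        is_space = ch.isspace()
        if is_space or unicodedata.category(ch).startswith("P"):
            if current:
                tokens.append("".join(current))
                current = []
            if not is_space:
                tokens.append(ch)  # punctuation becomes a separate token
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield normalized, non-empty lines, skipping exact duplicates.

    Only 20-byte SHA-1 digests are kept in memory, not the lines themselves,
    which keeps memory bounded on a multi-gigabyte corpus.
    """
    seen: Set[bytes] = set()
    for line in lines:
        norm = normalize(line)
        if not norm:
            continue
        digest = hashlib.sha1(norm.encode("utf-8")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield norm
```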
🗜️ License
This model is released under the Remeinium Open Model License (ROML).
It permits research and commercial use with attribution.
See the LICENSE file for full terms.
⚠️ Limitations
- May reflect cultural and linguistic biases present in the source texts.
- Optimized for Sinhala; not multilingual.
🤝 Collaboration
You are welcome to:
- Use this model in research and in your own projects
- Share improvements, benchmarks, or downstream applications
- Contact us at 📧 support@remeinium.com