UgannA Siyabasa V2 — FastText Sinhala Embedding Model 🇱🇰

UgannA_SiyabasaV2 (උගන්නැ සියබස) is the first public FastText embedding model released by Remeinium Corp. The name comes from Kumaratunga Munidasa’s timeless quote: “උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්” (“Learn Sinhala – be intoxicated with its beauty”).

Just as Munidasa envisioned nurturing the Sinhala language, this model is an effort to teach that language to machines.

📌 Key Features

Type: FastText
Vector size: 300 dimensions
File size: ~3.94 GB
Training data: ~17 GB of processed Sinhala text

🔧 Usage

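import fasttext

# Load the model. fasttext.load_model expects the path to a local .bin file,
# so download the model file first and adjust this path to where it is stored.
model = fasttext.load_model("Remeinium/UgannA_Siyabasa/UgannA_Siyabasa.bin")

# Get the 300-dimensional vector for a word
vector = model.get_word_vector("අම්මා")

# Get the 10 nearest neighbors as a list of (similarity, word) tuples
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
print(neighbors)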
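
Beyond nearest-neighbor lookups, two word vectors can be compared directly. The following is a minimal sketch, assuming cosine similarity as the metric. The helper names `cosine_similarity` and `word_similarity` are illustrative and not part of the fasttext API. They work with any object that exposes `get_word_vector`, such as the model loaded above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def word_similarity(model, word_a, word_b):
    """Compare two words using vectors from model.get_word_vector."""
    return cosine_similarity(
        model.get_word_vector(word_a),
        model.get_word_vector(word_b),
    )
```

With the model loaded as above, `word_similarity(model, "අම්මා", "තාත්තා")` returns a value between -1 and 1, where higher values suggest the two words occur in more similar contexts.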

Use API

Test Live: Embedding Playground
API Docs: Go to API Console

📂 Training Data

Processed and cleaned training corpus: ~17GB
Preprocessing: tokenization, normalization, deduplication
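
The exact preprocessing pipeline is not described in detail here, so the following is only an illustrative sketch of what the three named steps could look like on a line-oriented corpus. The use of NFC Unicode normalization, whitespace tokenization, and exact-match line deduplication are all assumptions, as is the function name `preprocess_lines`.

```python
import unicodedata


def preprocess_lines(lines):
    """Yield cleaned, deduplicated lines (assumed steps, not the actual pipeline)."""
    seen = set()
    for line in lines:
        # Normalization (assumption): canonical Unicode composition.
        text = unicodedata.normalize("NFC", line)
        # Tokenization (assumption): split on whitespace, rejoin with single spaces.
        tokens = text.split()
        if not tokens:
            continue  # skip empty or whitespace-only lines
        cleaned = " ".join(tokens)
        # Deduplication (assumption): drop exact repeats of a cleaned line.
        if cleaned in seen:
            continue
        seen.add(cleaned)
        yield cleaned
```

Keeping every cleaned line in an in-memory set is fine for a small sample; a corpus of this size would more likely need hashing or an external sort to deduplicate.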

🗜️ License

This model is released under the Remeinium Open Model License (ROML).
It permits research and commercial use with attribution.
See the LICENSE file for full terms.

⚠️ Limitations

May reflect cultural and linguistic biases present in the training sources.
Optimized for Sinhala; not multilingual.
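
Because the model targets Sinhala only, it can help to check that a query actually contains Sinhala script before looking it up. The following is a minimal sketch, assuming a simple test against the Unicode Sinhala block (U+0D80 to U+0DFF). The helper name `contains_sinhala` is hypothetical.

```python
SINHALA_START = 0x0D80  # first code point of the Unicode Sinhala block
SINHALA_END = 0x0DFF    # last code point of the Unicode Sinhala block


def contains_sinhala(text):
    """Return True if any character falls in the Sinhala Unicode block."""
    return any(SINHALA_START <= ord(ch) <= SINHALA_END for ch in text)
```

A caller could skip or flag inputs for which `contains_sinhala` returns False, since vectors for text in other scripts are unlikely to be meaningful.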

🤝 Collaboration

You are welcome to:

Use this model for research and your own projects
Share improvements, benchmarks, or downstream applications
Contact: 📧 support@remeinium.com
