Access to this repository's files is gated. The repository is public, but you must log in to Hugging Face and accept its conditions (including agreeing to share your contact information) before you can download the model.

UgannA Siyabasa V2 — FastText Sinhala Embedding Model 🇱🇰

UgannA_SiyabasaV2 (උගන්නැ සියබස) is the first public FastText embedding model released by Remeinium Corp. Its name comes from a well-known line by Kumaratunga Munidasa: “උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්” (“Learn Sinhala, and be intoxicated with its beauty”).

Just as Munidasa set out to nurture the Sinhala language, this model is an attempt to teach it to machines.

📌 Key Features

  • Type: FastText
  • Vector size: 300 dimensions
  • File size: ~3.94GB
  • Training data: 17GB processed Sinhala text
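Because this is a fastText model, get_word_vector can return a vector even for a word that never appeared in training: fastText represents a word by its character n-grams (plus the word itself, when it is in the vocabulary), hashes each n-gram into a fixed set of buckets, and averages the resulting vectors. The sketch below covers only the n-gram extraction step. It assumes fastText's default n-gram range for unsupervised training (minn=3, maxn=6); the card does not state which values were used for this model, and char_ngrams is a hypothetical helper, not part of the fasttext package.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of `word` in the style of fastText (hypothetical helper).

    The word is wrapped in the boundary markers "<" and ">", and every
    substring of length minn..maxn is returned, grouped by start position.
    Python strings index by code point, which matches how fastText counts
    characters in UTF-8 text, so each Sinhala vowel sign and the virama
    count as one character.
    """
    token = f"<{word}>"
    grams = []
    for start in range(len(token)):
        for n in range(minn, maxn + 1):
            if start + n > len(token):
                break  # no longer n-grams fit from this start position
            grams.append(token[start:start + n])
    return grams


if __name__ == "__main__":
    # Pieces whose vectors would be averaged for an out-of-vocabulary word
    print(char_ngrams("අම්මා"))
```

The boundary markers let the model tell a prefix or suffix apart from the same characters in the middle of a word, which matters for a morphologically rich language like Sinhala.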

🔧 Usage

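import fasttext
from huggingface_hub import hf_hub_download

# fasttext.load_model() expects a local file path, so download the .bin from the Hub first.
# The repository is gated: accept its conditions and log in (e.g. `huggingface-cli login`) beforehand.
# Repo id and filename are taken from this card; check the repository's file list if the download fails.
model_path = hf_hub_download(repo_id="Remeinium/UgannA_SiyabasaV2", filename="UgannA_Siyabasa.bin")
model = fasttext.load_model(model_path)

# Get the 300-dimensional vector for a word ("අම්මා" = "mother")
vector = model.get_word_vector("අම්මා")

# Get the 10 nearest neighbours as (similarity, word) pairs
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
print(neighbors)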
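fastText's get_nearest_neighbors returns a list of (similarity, word) pairs ranked by cosine similarity, leaving out the query word itself. To make that ranking concrete, here is a brute-force sketch over a tiny, made-up vocabulary. The function names and the 3-dimensional toy vectors are hypothetical; the real model compares 300-dimensional vectors across its whole vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query, vectors, k=10):
    """Brute-force k-NN: (similarity, word) pairs, best first, query word excluded."""
    q = vectors[query]
    scored = [(cosine(q, vec), word) for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional toy vectors, made up for illustration only.
toy_vectors = {
    "අම්මා": [0.9, 0.1, 0.0],  # "mother"
    "තාත්තා": [0.8, 0.2, 0.1],  # "father"
    "පොත": [0.0, 0.1, 0.9],  # "book"
}

if __name__ == "__main__":
    print(nearest_neighbors("අම්මා", toy_vectors, k=2))
```

With these toy values, "තාත්තා" scores far higher than "පොත" against "අම්මා", which is the kind of ordering the real model's neighbours are meant to reflect.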

📂 Training Data

  • Processed and cleaned training corpus: ~17 GB
  • Preprocessing: tokenization, normalization, deduplication
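The card lists tokenization, normalization, and deduplication but does not say how each step was done. The following is a minimal, hypothetical sketch of such a pipeline using only the Python standard library: Unicode NFC normalization, whitespace tokenization with a small, assumed set of punctuation split off, and exact-duplicate removal by hashing each cleaned line. It is not Remeinium's actual preprocessing, and the names PUNCT, normalize, tokenize, and preprocess are illustrative.

```python
import hashlib
import re
import unicodedata

# Hypothetical punctuation set to split off as separate tokens (an assumption,
# not the card's rule). Sinhala letters, vowel signs, the virama and the
# zero-width joiner are left untouched.
PUNCT = re.compile(r"([.,!?;:\"'()\[\]])")


def normalize(line: str) -> str:
    # NFC gives visually identical strings one canonical code-point sequence.
    return unicodedata.normalize("NFC", line)


def tokenize(line: str) -> list[str]:
    # Pad punctuation with spaces, then split on whitespace
    # (fastText itself also splits its input on whitespace).
    return PUNCT.sub(r" \1 ", line).split()


def preprocess(lines):
    """Normalize, tokenize and exact-deduplicate lines, keeping first occurrences."""
    seen = set()
    cleaned = []
    for raw in lines:
        tokens = tokenize(normalize(raw))
        if not tokens:
            continue  # drop empty or whitespace-only lines
        text = " ".join(tokens)
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after cleaning
        seen.add(digest)
        cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    corpus = ["hello, world", "hello ,  world", "", "cafe\u0301", "caf\u00e9"]
    for line in preprocess(corpus):
        print(line)
```

Splitting on whitespace rather than with a pattern such as \w+ matters for Sinhala: Python's \w does not match combining vowel signs or the virama, so \w+ would break a word like අම්මා into fragments.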

🗜️ License

This model is released under the Remeinium Open Model License (ROML).
It permits research and commercial use with attribution.
See the LICENSE file for full terms.

⚠️ Limitations

  • May reflect cultural and linguistic biases present in the source texts.
  • Optimized for Sinhala; not multilingual.

🤝 Collaboration

You are welcome to:

  • Use this model in research and in your own projects
  • Share improvements, benchmarks, or downstream applications
  • Contact: 📧 support@remeinium.com