General information
This is a ScAndinavian GenerAl embedding model (SAGA). As of writing (2026-05-04), it is ranked 9th on MTEB for Scandinavian tasks and is currently the highest-ranked model under 1.5 billion parameters.
SAGA-embed was initialized from a ModernBERT architecture, trained on approximately 250 million semantically related pairs, and then fine-tuned.
The model has not been optimized for any particular task; the main goal was to create a small, easy-to-use model for the Scandinavian languages.
Usage
The model can be used without prompts, but it was also trained with custom prompts for different tasks, and using them is recommended for optimal performance. For standard inference, format your inputs as follows:
- Retrieval (queries): `task: retrieval | query: {text}`
- Retrieval (passages): `title: none | text: {text}`
- Clustering: `task: clustering | query: {text}`
- Classification: `task: classification | query: {text}`
- Semantic similarity: `task: semantic similarity | query: {text}`
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nicher92/saga_embed_v1")

# Example: encoding a search query
query = "task: retrieval | query: Hur mycket skatt betalar jag i Sverige?"
embedding = model.encode(query)
```
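To avoid hand-writing the prompt prefixes, the formats above can be collected into a small helper. This is a minimal sketch, not part of the model's API; the `format_prompt` name and the task keys are our own, only the template strings come from the list above.

```python
# Hypothetical convenience helper (names are ours, not part of SAGA-embed);
# the template strings match the documented prompt formats.
PROMPTS = {
    "query": "task: retrieval | query: {text}",
    "passage": "title: none | text: {text}",
    "clustering": "task: clustering | query: {text}",
    "classification": "task: classification | query: {text}",
    "sts": "task: semantic similarity | query: {text}",
}

def format_prompt(task: str, text: str) -> str:
    """Wrap raw text in the prompt format SAGA-embed was trained with."""
    return PROMPTS[task].format(text=text)

print(format_prompt("query", "Hur mycket skatt betalar jag i Sverige?"))
# task: retrieval | query: Hur mycket skatt betalar jag i Sverige?
```

The formatted strings can then be passed straight to `model.encode(...)` as in the snippet above.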
Coming soon: link to the technical report.