General information

This is a ScAndinavian GenerAl embedding model (SAGA). As of writing (2026-05-04), it is ranked 9th on the MTEB leaderboard for Scandinavian tasks and is currently the highest-ranked model under 1.5 billion parameters.
SAGA-embed was initialized from the ModernBERT architecture, trained on approximately 250 million semantically related pairs, and then fine-tuned.
At roughly 0.4B parameters, the model has not been optimized for any particular task; the main goal was to create a small, easy-to-use model for the Scandinavian languages.

Usage

The model can be used without prompts, but it has also been trained with custom prompts for different tasks, and using them is recommended for optimal performance. For standard inference, format your inputs as follows:

  • Retrieval (Queries): task: retrieval | query: {text}
  • Retrieval (Passages): title: none | text: {text}
  • Clustering: task: clustering | query: {text}
  • Classification: task: classification | query: {text}
  • Semantic Similarity: task: semantic similarity | query: {text}
A minimal example of encoding with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nicher92/saga_embed_v1")

# Example: Encoding a search query
query = "task: retrieval | query: Hur mycket skatt betalar jag i Sverige?"
embedding = model.encode(query)
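
For retrieval, queries and passages take different prompts, as listed above. The sketch below (the passage texts are illustrative, not from the model card) encodes one query and two passages and ranks the passages by cosine similarity via sentence_transformers.util.cos_sim:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nicher92/saga_embed_v1")

# Query and passages use their respective prompt formats.
query = "task: retrieval | query: Hur mycket skatt betalar jag i Sverige?"
passages = [
    # "Municipal tax in Sweden varies between municipalities."
    "title: none | text: Kommunalskatten i Sverige varierar mellan kommuner.",
    # "Oslo is the capital of Norway."
    "title: none | text: Oslo er hovedstaden i Norge.",
]

query_emb = model.encode(query)
passage_embs = model.encode(passages)

# Cosine similarity between the query and each passage; higher means more relevant.
scores = util.cos_sim(query_emb, passage_embs)
print(scores)  # shape (1, 2); the first passage should score higher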

Added soon

A link to the technical report will be added here.
