General information
This is a ScAndinavian GenerAl embedding model (SAGA). As of writing (2026-05-04), it is ranked 9th on MTEB for Scandinavian tasks and is currently the highest-ranked model under 1.5 billion parameters.
SAGA-embed was initialized from a ModernBERT architecture, trained on approximately 250 million semantically related pairs, and then fine-tuned.
The model has not been optimized for any particular task; the main goal was to create a small, easy-to-use model for the Scandinavian languages.
Usage
The model can be used without prompts, but it was also trained with custom prompts for different tasks, and using them is recommended for optimal performance. For standard inference, format your inputs as follows:
- Retrieval (queries): `task: retrieval | query: {text}`
- Retrieval (passages): `title: none | text: {text}`
- Clustering: `task: clustering | query: {text}`
- Classification: `task: classification | query: {text}`
- Semantic similarity: `task: semantic similarity | query: {text}`
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nicher92/saga_embed_v1")

# Example: encoding a search query
query = "task: retrieval | query: Hur mycket skatt betalar jag i Sverige?"
embedding = model.encode(query)
```
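To avoid hand-writing the prompt prefixes, the formats above can be collected into a small helper. This is a minimal sketch, not part of the model's API; the `format_prompt` name and the task keys are our own, only the template strings come from the list above.

```python
# Hypothetical convenience helper (names are ours, not part of SAGA-embed);
# the template strings match the documented prompt formats.
PROMPTS = {
    "query": "task: retrieval | query: {text}",
    "passage": "title: none | text: {text}",
    "clustering": "task: clustering | query: {text}",
    "classification": "task: classification | query: {text}",
    "sts": "task: semantic similarity | query: {text}",
}

def format_prompt(task: str, text: str) -> str:
    """Wrap raw text in the prompt format SAGA-embed was trained with."""
    return PROMPTS[task].format(text=text)

print(format_prompt("query", "Hur mycket skatt betalar jag i Sverige?"))
# task: retrieval | query: Hur mycket skatt betalar jag i Sverige?
```

The formatted strings can then be passed straight to `model.encode(...)` as in the snippet above.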
Coming soon: link to the technical report.