MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning
arXiv: 2510.16797
A biomedical sentence embedding model trained using the MOSAIC framework (Masked Objective with Selective Adaptation for In-domain Contrastive Learning).
This model is optimized for biomedical and clinical text, including PubMed abstracts, clinical notes, and scientific literature.
📄 Paper: MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning (EACL 2026, Findings)
💻 Training Code: github.com/rttl-ai/mosaic
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rttl-ai/MOSAIC-embed-biomed", trust_remote_code=True)

sentences = [
    "search_document: Metformin is a first-line treatment for type 2 diabetes.",
    "search_query: What medications treat diabetes?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
```
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    # The first element of the model output holds the per-token embeddings.
    token_embeddings = model_output[0]
    # Expand the attention mask so padding tokens contribute zero to the sum.
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Masked sum divided by the number of real tokens (clamped to avoid division by zero).
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


tokenizer = AutoTokenizer.from_pretrained("rttl-ai/MOSAIC-embed-biomed")
model = AutoModel.from_pretrained("rttl-ai/MOSAIC-embed-biomed", trust_remote_code=True)

sentences = ["search_query: What causes Alzheimer's disease?"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

embeddings = mean_pooling(outputs, inputs["attention_mask"])
# L2-normalize so that dot products between embeddings equal cosine similarities.
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)
```
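The snippets above stop at printing the embedding shape. A common next step is to score documents against a query by cosine similarity. The following is a minimal sketch of that step, not part of the MOSAIC code base: the helper names `cosine_similarity` and `rank_documents` are hypothetical, and the three-dimensional vectors are toy values standing in for real model output so the example runs without downloading the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: not model output).
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 0.0, 0.0]]

ranking = rank_documents(query_vec, doc_vecs)
print([i for i, _ in ranking])
```

With real embeddings, each row of `embeddings.tolist()` can be passed in place of the toy vectors. Because the helper divides by both norms, it does not require the inputs to be L2-normalized beforehand.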
This model uses task-specific prefixes for optimal performance:
| Task | Prefix | Example |
|---|---|---|
| Document embedding | `search_document:` | `search_document: Aspirin inhibits platelet aggregation.` |
| Query embedding | `search_query:` | `search_query: How does aspirin work?` |
| Clustering | `clustering:` | `clustering: cardiac arrest treatment protocols` |
| Classification | `classification:` | `classification: The patient presents with fever and cough.` |
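The prefixes in the table above can be applied with a small helper before encoding. This is only an illustrative sketch: the `TASK_PREFIXES` mapping, its short task keys, and the `add_prefix` function are hypothetical names introduced here, and only the prefix strings themselves come from the table.

```python
# Prefix strings copied from the table; the short task keys are hypothetical.
TASK_PREFIXES = {
    "document": "search_document: ",
    "query": "search_query: ",
    "clustering": "clustering: ",
    "classification": "classification: ",
}


def add_prefix(texts, task):
    """Return a new list with the task prefix prepended to each text."""
    if task not in TASK_PREFIXES:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}"
        )
    prefix = TASK_PREFIXES[task]
    # Strip surrounding whitespace so the prefix is followed by exactly one space.
    return [prefix + text.strip() for text in texts]


queries = add_prefix(["How does aspirin work?"], "query")
docs = add_prefix(["Aspirin inhibits platelet aggregation."], "document")
print(queries[0])
print(docs[0])
```

The returned lists can be passed directly to `model.encode(...)` or to the tokenizer in the usage snippets.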
```bibtex
@inproceedings{mosaic2026,
  title={MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning},
  author={Pavlova, Vera and ...},
  booktitle={Findings of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026}
}
```
License: Apache 2.0