ALIA MrBERT Spanish Biomedical and Healthcare Reranker Model
This repository contains ALIA MrBERT Spanish Biomedical and Healthcare Reranker, a Spanish biomedical domain cross-encoder (reranker) model for information retrieval and document ranking tasks. It is built upon MrBERT-es, a bilingual (Spanish–English) foundational language model based on the ModernBERT architecture, and fine-tuned on domain-specific biomedical data using a Curriculum Learning strategy.
DISCLAIMER: This model is a domain-specific proof-of-concept designed to demonstrate retrieval capabilities in the Spanish biomedical domain. Although it is optimized for this domain, its results should be verified against official clinical sources and expert judgment. The model may fail on out-of-domain or adversarial inputs.
Model Details
Model Lineage
ModernBERT (architecture)
↓
MrBERT-es (BSC-LT)
Bilingual ES/EN encoder
150M parameters
↓
ALIA-MrBERT-es-biomedical-reranker (SINAI)
Biomedical domain fine-tuning
Curriculum Learning + Hard Negatives
Key Features
- 🔍 Domain: Spanish biomedical texts
- 📐 Architecture: ModernBERT Cross-Encoder (reranker)
- 📏 Context length: up to 8,192 tokens
- 🎓 Training strategy: Curriculum Learning (easy → medium → hard)
- ⚙️ Negative mining: Positive-Aware Hard Negative Mining
Architecture
This model uses the same base architecture as MrBERT-es, formatted as a Cross-Encoder for sentence pair classification:
| Property | Value |
|---|---|
| Base Architecture | ModernBERT |
| Total Parameters | ~150M |
| Hidden size | 768 |
| Intermediate size | 1,152 |
| Attention heads | 12 |
| Hidden layers | 22 |
| Context length | 8,192 tokens |
| Vocabulary size | 51,200 |
| Precision | bfloat16 |
| Model Type | Cross-Encoder |
Training
Training Strategy: Curriculum Learning
The model was fine-tuned using a Curriculum Learning strategy, progressively increasing the difficulty of training examples. For this Cross-Encoder, the training focused on the most challenging examples: pairs that a Bi-Encoder might struggle to distinguish.
The dataset consists of text pairs incorporating hard negatives mined from the corpus SINAI/ALIA-biomedical-hard-negatives/train. For the Cross-Encoder, the data is flattened into independent {query, document, label} pairs, where label is either 1.0 (relevant) or 0.0 (irrelevant).
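
The flattening step described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the input field names (query, positive, negatives) and the helper flatten_to_pairs are assumptions, since the card only specifies the output shape {query, document, label}.

```python
from typing import Dict, List


def flatten_to_pairs(records: List[dict]) -> List[Dict[str, object]]:
    """Turn (query, positive, negatives) records into independent labelled pairs.

    The input schema is hypothetical; only the output shape
    {query, document, label} comes from the model card.
    """
    pairs = []
    for record in records:
        # The relevant passage becomes a positive pair (label 1.0).
        pairs.append(
            {"query": record["query"], "document": record["positive"], "label": 1.0}
        )
        # Each mined hard negative becomes its own negative pair (label 0.0).
        for negative in record["negatives"]:
            pairs.append(
                {"query": record["query"], "document": negative, "label": 0.0}
            )
    return pairs


# Made-up record for illustration only.
example_records = [
    {
        "query": "¿Qué es la disnea?",
        "positive": "La disnea es la sensación de falta de aire.",
        "negatives": [
            "La disfagia es la dificultad para tragar.",
            "La disuria es el dolor al orinar.",
        ],
    }
]

flat_pairs = flatten_to_pairs(example_records)
for pair in flat_pairs:
    print(pair["label"], pair["document"])
```

One record with two hard negatives yields three independent pairs, which is the form a binary cross-entropy loss expects.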
Refinement with Hard Negatives: Training uses mined hard negatives to force the model to distinguish fine-grained nuances. Examples are processed in order of increasing difficulty (easy → medium → hard) to improve convergence and generalization.
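
The easy → medium → hard ordering can be illustrated with a short sketch. The difficulty field and the way examples are assigned to a stage are assumptions made for the example; the card names the three stages but does not describe how difficulty was measured.

```python
from typing import Dict, List

# Hypothetical difficulty tags; the model card names the three stages
# but does not say how examples are assigned to them.
STAGE_ORDER = {"easy": 0, "medium": 1, "hard": 2}


def curriculum_order(examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return examples sorted easy -> medium -> hard.

    sorted() is stable, so examples keep their original relative order
    within each stage.
    """
    return sorted(examples, key=lambda ex: STAGE_ORDER[ex["difficulty"]])


# Made-up examples for illustration only.
examples = [
    {"id": "a", "difficulty": "hard"},
    {"id": "b", "difficulty": "easy"},
    {"id": "c", "difficulty": "medium"},
    {"id": "d", "difficulty": "easy"},
]

ordered = curriculum_order(examples)
print([ex["id"] for ex in ordered])
```

An ordering like this only takes effect if the training data loader does not reshuffle the examples.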
Hyperparameter Optimization
Before training, hyperparameter search was conducted using Optuna (20 trials) to maximize NDCG@10 (with fallback to MRR@10) on a control subset:
- Sampler: TPESampler (Tree-structured Parzen Estimator)
- Pruner: MedianPruner with an OptunaPruningCallback reporting NDCG@10
- Evaluator: CrossEncoderRerankingEvaluator
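
To make the pruning idea concrete, here is a simplified, standard-library sketch of a median-pruning rule for a maximized metric. It is not Optuna's implementation and not the authors' search code: the function should_prune, the n_startup_trials value used in the calls, and the NDCG@10 numbers are all illustrative assumptions.

```python
from statistics import median
from typing import Dict, List


def should_prune(
    current_value: float,
    step: int,
    finished_trials: List[Dict[int, float]],
    n_startup_trials: int = 5,
) -> bool:
    """Simplified median-pruning rule for a metric that is maximized.

    finished_trials holds, for each earlier trial, a mapping step -> metric.
    A trial is pruned when its value at `step` is below the median of the
    values that earlier trials reported at the same step.
    """
    if len(finished_trials) < n_startup_trials:
        return False  # not enough history to estimate a median yet
    values_at_step = [trial[step] for trial in finished_trials if step in trial]
    if not values_at_step:
        return False
    return current_value < median(values_at_step)


# Illustrative NDCG@10 values, not results from the actual search.
history = [
    {0: 0.80, 1: 0.85},
    {0: 0.78, 1: 0.88},
    {0: 0.82, 1: 0.90},
]

print(should_prune(0.70, step=1, finished_trials=history, n_startup_trials=3))
print(should_prune(0.89, step=1, finished_trials=history, n_startup_trials=3))
```

The first call falls below the median of earlier trials at that step (0.88) and would be stopped early; the second is above it and would continue.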
Final Training Hyperparameters
| Hyperparameter | Value | Description |
|---|---|---|
| Learning Rate | 4.8498×10⁻⁵ | Nominal learning rate |
| Batch Size | 32 | Global batch size per device |
| Gradient Accumulation | 4 | Simulates larger effective batches |
| Warmup Ratio | 0.1345 | Linear LR warmup during the first ~13.5% of steps |
| Weight Decay | 0.0416 | Decoupled weight decay (AdamW) |
| Optimizer | AdamW | Standard HuggingFace Trainer optimizer |
| Precision | bf16 | Bfloat16 for supported architectures |
| Max Sequence Length | 8,192 | Maximum tokens processed for the concatenated (query, doc) pair |
| Loss Function | BinaryCrossEntropyLoss | Treats each pair as an independent binary classification problem |
| Gradient Checkpointing | Enabled | Memory optimization for long contexts (use_reentrant=False) |
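
The arithmetic behind the batch size, gradient accumulation, and warmup ratio can be sketched as below. The sketch reads the batch size of 32 as a per-device value, which is an assumption, and the device count and total step count are made-up inputs because the card does not report them. Learning-rate decay after warmup is not modeled for the same reason.

```python
import math

# Values taken from the hyperparameter table.
LEARNING_RATE = 4.8498e-5
PER_DEVICE_BATCH_SIZE = 32  # assumption: 32 is interpreted as per device
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.1345


def effective_batch_size(num_devices: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def warmup_steps(total_steps: int) -> int:
    """Number of optimizer steps spent in linear warmup."""
    return math.ceil(total_steps * WARMUP_RATIO)


def learning_rate_at(step: int, total_steps: int) -> float:
    """Learning rate during linear warmup; the nominal rate afterwards.

    Post-warmup decay is not modeled because the card does not state it.
    """
    n_warmup = warmup_steps(total_steps)
    if step < n_warmup:
        return LEARNING_RATE * step / n_warmup
    return LEARNING_RATE


# num_devices=1 and total_steps=1000 are assumed values for illustration.
print(effective_batch_size(num_devices=1))
print(warmup_steps(total_steps=1000))
print(learning_rate_at(step=50, total_steps=1000))
```

With one device the effective batch is 128 examples per optimizer step, and a 1,000-step run would spend 135 steps in warmup.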
Training Framework
| Component | Details |
|---|---|
| Library | PyTorch, sentence-transformers, HuggingFace datasets |
| Distributed | DDP (Distributed Data Parallel) via torchrun |
| Memory optimization | Gradient checkpointing, plus the PyTorch CUDA allocator setting expandable_segments:True |
| Logging | WandB (via report_to=wandb) |
Intended Use
Direct Use
This model is designed for document reranking and semantic matching tasks in the Spanish biomedical domain. Primary use cases include:
- RAG pipelines: Reranking retrieved context chunks for language models
- Search pipelines: Refining first-stage retrieval (e.g. BM25 or a bi-encoder) by cross-encoding only the top-k results
- Biomedical text matching: Assessing fine-grained relevance between queries and biomedical passages
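
The retrieve-then-rerank pattern behind the first two use cases can be sketched as follows. Everything here is a stand-in chosen so the example runs without downloading a model: lexical_overlap is a toy first-stage retriever, dummy_scorer replaces the cross-encoder, and score_pairs is a hypothetical callback. In a real pipeline one would presumably pass the predict method of the loaded CrossEncoder as score_pairs.

```python
from typing import Callable, List, Sequence, Tuple


def lexical_overlap(query: str, document: str) -> int:
    """Toy first-stage score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: int = 20,
) -> List[Tuple[str, float]]:
    """Cheap retrieval over the whole corpus, then precise scoring of top_k."""
    # Stage 1: keep only the top_k candidates according to the cheap score.
    candidates = sorted(
        corpus, key=lambda doc: lexical_overlap(query, doc), reverse=True
    )[:top_k]
    # Stage 2: score each (query, candidate) pair with the expensive model.
    scores = score_pairs([(query, doc) for doc in candidates])
    return sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)


def dummy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in for a cross-encoder: overlap divided by document length."""
    return [lexical_overlap(q, d) / max(len(d.split()), 1) for q, d in pairs]


# Made-up corpus for illustration only.
corpus = [
    "la insuficiencia cardíaca causa disnea y fatiga",
    "certificado médico de incapacidad temporal",
    "síntomas de insuficiencia cardíaca como disnea y edema",
    "horario de la farmacia",
]
query = "síntomas de insuficiencia cardíaca"

results = retrieve_then_rerank(query, corpus, dummy_scorer, top_k=2)
for document, score in results:
    print(round(score, 3), document)
```

Only top_k pairs reach the expensive scorer, which keeps cross-encoder cost independent of corpus size.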
Out-of-Scope Use
- General-domain retrieval (the model is specialized for biomedical Spanish)
- Fast, large-scale search across millions of documents (use a Bi-encoder first, then rerank the top-k results with this Cross-encoder)
- Cross-lingual retrieval beyond Spanish
How to Use
With sentence-transformers
from sentence_transformers import CrossEncoder
model = CrossEncoder("SINAI/ALIA-MrBERT-es-biomedical-reranker")
query = "¿Cuáles son los síntomas principales de la insuficiencia cardíaca?"
documents = [
    "La insuficiencia cardíaca puede causar disnea, fatiga y edema periférico...",
    "El tratamiento inicial incluye control de la presión arterial y ajuste farmacológico...",
    "El subsidio de incapacidad temporal requiere un certificado médico en vigor.",
]
# We want to score the query with each document
pairs = [[query, doc] for doc in documents]
scores = model.predict(pairs)
print(scores)
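
The snippet above prints raw scores in input order. A minimal sketch of turning such scores into a ranking is shown below; the helper rank_by_score, the placeholder documents, and the score values are assumptions for illustration and do not come from the model.

```python
from typing import List, Sequence, Tuple


def rank_by_score(
    documents: Sequence[str], scores: Sequence[float]
) -> List[Tuple[float, str]]:
    """Pair each document with its score, sorted from most to least relevant."""
    return sorted(zip(scores, documents), key=lambda item: item[0], reverse=True)


# Placeholder documents and made-up scores; real scores would come from
# model.predict(pairs) in the snippet above.
example_documents = [
    "documento sobre tratamiento",
    "documento sobre síntomas",
    "documento administrativo",
]
example_scores = [0.41, 0.97, 0.02]

ranked = rank_by_score(example_documents, example_scores)
for score, document in ranked:
    print(f"{score:.2f}  {document}")
```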
Evaluation
The model was evaluated using the MTEB (Massive Text Embedding Benchmark) framework, adapted for the biomedical domain. The main reported metric is NDCG@10 (Normalized Discounted Cumulative Gain at k=10), which is the standard metric used in retrieval leaderboards and aligns with the metric reported in the MrBERT family.
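
For readers unfamiliar with the metric, a common formulation of NDCG@k can be sketched as below. This is an illustrative implementation with linear gain, not the MTEB evaluation code, and it assumes that every relevant item appears somewhere in the ranked list being scored.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    # enumerate() starts at 0, so position p (1-based) is discounted by log2(p + 1).
    return sum(rel / math.log2(index + 2) for index, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels listed in ranked order (1 = relevant, 0 = not relevant).
print(ndcg_at_k([1, 0, 0]))            # relevant document ranked first
print(round(ndcg_at_k([0, 1, 0]), 4))  # relevant document ranked second
```

A ranking that places the only relevant document first scores 1.0, and moving it to second place lowers the score to 1 / log2(3), roughly 0.63.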
Evaluation Datasets
| Dataset | Category | Description |
|---|---|---|
| miracl | Reranking | Spanish subset of MIRACL (mteb/MIRACLReranking) |
| esci | Reranking | Spanish subset of the ESCI dataset (mteb/ESCIReranking) |
| CoWeSe | Retrieval | Generated open-ended questions from the CoWeSe corpus (chrisnb1/cowese-qa-dataset) |
| AbSanitas | Retrieval | Spanish biomedical information retrieval dataset built from biomedical texts collected from official academic repositories and open-access sources (BSC-LT/AbSanitas) |
| pairs800 | Retrieval | Subset of 800 biomedical evaluation pairs (query + passage) derived from SINAI/ALIA-biomedical-hard-negatives/test. |
| pairs1.6k | Retrieval | Subset of 1.6k biomedical evaluation pairs (query + passage) derived from SINAI/ALIA-biomedical-hard-negatives/test. |
Results
The following table reports NDCG@10 for this model and three open-weight generalist rerankers:
| Model | miracl | esci | CoWeSe | AbSanitas | pairs800 | pairs1.6k |
|---|---|---|---|---|---|---|
| BAAI/bge-reranker-v2-m3 | 0.6777 | 0.8321 | 0.9598 | 0.9966 | 0.9989 | 0.9987 |
| nvidia/llama-nemotron-rerank-1b-v2 | 0.6258 | 0.8004 | 0.8957 | 0.9938 | 0.9878 | 0.9875 |
| tomaarsen/Qwen3-Reranker-0.6B-seq-cls | 0.6724 | 0.8133 | 0.9223 | 0.9983 | 0.9969 | 0.9942 |
| ALIA-MrBERT-es-biomedical-reranker (ours) | 0.5509 | 0.8164 | 0.9625 | 0.9987 | 1.0000 | 0.9992 |
Note: The baseline rerankers are substantially larger (roughly 0.6B to 1B parameters), yet our domain-specific 150M-parameter cross-encoder performs comparably or better on the specialized biomedical Spanish datasets (CoWeSe, AbSanitas, pairs800, pairs1.6k). On the general-domain benchmarks it is second to BGE on esci and trails all three baselines on miracl.
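
The comparison can be made explicit by computing, for each dataset, the gap between this model and the best of the three baselines. The scores below are copied from the results table; the helper gap_to_best_baseline is only an illustrative name.

```python
from typing import Dict

# Scores copied from the results table above.
RESULTS: Dict[str, Dict[str, float]] = {
    "BAAI/bge-reranker-v2-m3": {
        "miracl": 0.6777, "esci": 0.8321, "CoWeSe": 0.9598,
        "AbSanitas": 0.9966, "pairs800": 0.9989, "pairs1.6k": 0.9987,
    },
    "nvidia/llama-nemotron-rerank-1b-v2": {
        "miracl": 0.6258, "esci": 0.8004, "CoWeSe": 0.8957,
        "AbSanitas": 0.9938, "pairs800": 0.9878, "pairs1.6k": 0.9875,
    },
    "tomaarsen/Qwen3-Reranker-0.6B-seq-cls": {
        "miracl": 0.6724, "esci": 0.8133, "CoWeSe": 0.9223,
        "AbSanitas": 0.9983, "pairs800": 0.9969, "pairs1.6k": 0.9942,
    },
    "ALIA-MrBERT-es-biomedical-reranker": {
        "miracl": 0.5509, "esci": 0.8164, "CoWeSe": 0.9625,
        "AbSanitas": 0.9987, "pairs800": 1.0, "pairs1.6k": 0.9992,
    },
}
OURS = "ALIA-MrBERT-es-biomedical-reranker"


def gap_to_best_baseline(table: Dict[str, Dict[str, float]], ours: str) -> Dict[str, float]:
    """Our score minus the best baseline score, per dataset (positive = we lead)."""
    gaps = {}
    for dataset, our_score in table[ours].items():
        best = max(scores[dataset] for name, scores in table.items() if name != ours)
        gaps[dataset] = round(our_score - best, 4)
    return gaps


gaps = gap_to_best_baseline(RESULTS, OURS)
for dataset, gap in gaps.items():
    print(f"{dataset:10s} {gap:+.4f}")
```

The gaps are positive on the four biomedical datasets and negative on miracl and esci, which matches the note above.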
Limitations and Biases
Known Limitations
- Domain specificity: The model is highly optimized for Spanish biomedical texts. Its zero-shot performance on general domains is weaker than that of large generalist rerankers.
- Latency: As a cross-encoder, it is computationally expensive when scoring many query-document pairs. Use it to rerank only the top results (typically 20 to 100 documents) returned by a fast first-stage retriever such as a bi-encoder.
- Clinical accuracy: Semantic similarity does not guarantee clinical or medical correctness.
Biases
- The model reflects biases present in Spanish biomedical literature, clinical records, and health-related corpora.
Additional Information
License
Citation
If you use this model in your research, please cite:
@misc{ALIA-MrBERT-es-biomedical-reranker,
title = {ALIA MrBERT Spanish Biomedical and Healthcare Reranker Model},
author = {SINAI Research Group, Universidad de Jaén},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/SINAI/ALIA-MrBERT-es-biomedical-reranker}}
}
Please also cite the base model:
@misc{tamayo2026mrbertmodernmultilingualencoders,
title={MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation},
author={Daniel Tamayo and Iñaki Lacunza and Paula Rivera-Hidalgo and Severino Da Dalt and Javier Aula-Blasco and Aitor Gonzalez-Agirre and Marta Villegas},
year={2026},
eprint={2602.21379},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.21379},
}
Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU (NextGenerationEU) within the framework of the ALIA project.
Acknowledgments
This model was trained thanks to CEATIC (Centro de Estudios Avanzados en Tecnologías de la Información y de la Comunicación) at UJA (Universidad de Jaén), which provided the necessary computational resources on its clusters.
Contact: ALIA Project - SINAI Research Group - Universidad de Jaén
More Information: SINAI Research Group | ALIA-UJA Project