ALIA MrBERT Spanish Biomedical and Healthcare Reranker Model

This repository contains ALIA MrBERT Spanish Biomedical and Healthcare Reranker, a Spanish biomedical domain cross-encoder (reranker) model for information retrieval and document ranking tasks. It is built upon MrBERT-es, a bilingual (Spanish–English) foundational language model based on the ModernBERT architecture, and fine-tuned on domain-specific biomedical data using a Curriculum Learning strategy.

DISCLAIMER: This model is a domain-specific proof-of-concept designed to demonstrate retrieval capabilities in the Spanish biomedical domain. Although it is optimized for this domain, its results should be verified against official clinical sources and expert judgment. The model may fail on out-of-domain or adversarial inputs.


Model Details

Model Lineage

  ModernBERT (architecture)
       ↓
  MrBERT-es (BSC-LT)
  Bilingual ES/EN encoder
  150M parameters
       ↓
  ALIA-MrBERT-es-biomedical-reranker (SINAI)
  Biomedical domain fine-tuning
  Curriculum Learning + Hard Negatives

Key Features

  • 🔍 Domain: Spanish biomedical texts
  • 📐 Architecture: ModernBERT Cross-Encoder (reranker)
  • 📏 Context length: up to 8,192 tokens
  • 🎓 Training strategy: Curriculum Learning (easy → medium → hard)
  • ⚙️ Negative mining: Positive-Aware Hard Negative Mining

Architecture

This model uses the same base architecture as MrBERT-es, formatted as a Cross-Encoder for sentence pair classification:

| Parameter | Value |
|---|---|
| Base Architecture | ModernBERT |
| Total Parameters | ~150M |
| Hidden size | 768 |
| Intermediate size | 1,152 |
| Attention heads | 12 |
| Hidden layers | 22 |
| Context length | 8,192 tokens |
| Vocabulary size | 51,200 |
| Precision | bfloat16 |
| Model Type | Cross-Encoder |

Training

Training Strategy: Curriculum Learning

The model was fine-tuned using a Curriculum Learning strategy, progressively increasing the difficulty of training examples. For this Cross-Encoder, the training focused on the most challenging examples: pairs that a Bi-Encoder might struggle to distinguish.

The dataset consists of text pairs incorporating hard negatives mined from the corpus SINAI/ALIA-biomedical-hard-negatives/train. For the Cross-Encoder, the data is flattened into independent {query, document, label} pairs, where label is either 1.0 (relevant) or 0.0 (irrelevant).
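The flattening step described above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code: the input column names `query`, `positive` and `negatives` are assumptions, since the card does not document the schema of the mined dataset.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, List[str]]]
Pair = Dict[str, Union[str, float]]


def flatten_to_pairs(rows: List[Row]) -> List[Pair]:
    """Turn {query, positive, negatives} rows into independent labelled pairs."""
    pairs: List[Pair] = []
    for row in rows:
        # One relevant pair per row (label 1.0).
        pairs.append({"query": row["query"], "document": row["positive"], "label": 1.0})
        # One irrelevant pair per mined hard negative (label 0.0).
        for negative in row["negatives"]:
            pairs.append({"query": row["query"], "document": negative, "label": 0.0})
    return pairs


# Hypothetical input row, for illustration only.
rows = [
    {
        "query": "síntomas de la insuficiencia cardíaca",
        "positive": "La insuficiencia cardíaca puede causar disnea, fatiga y edema periférico.",
        "negatives": [
            "El tratamiento inicial incluye control de la presión arterial.",
            "El subsidio de incapacidad temporal requiere un certificado médico.",
        ],
    }
]
pairs = flatten_to_pairs(rows)
```

Each row with n negatives yields n + 1 pairs, so the positive-to-negative ratio of the flattened set follows directly from how many negatives were mined per query.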

Refinement with Hard Negatives: Training uses mined hard negatives to force the model to distinguish fine-grained nuances. Examples are processed in order of increasing difficulty (easy → medium → hard) to improve convergence and generalization.
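As a rough sketch of how positive-aware filtering and curriculum ordering can fit together: candidates whose retriever score comes too close to the positive's score are discarded as likely false negatives, and the remaining ones are bucketed by how far they sit below the positive. The card does not state the thresholds or the difficulty criterion that were actually used, so `max_ratio`, the margin cut-offs and all function names here are hypothetical.

```python
from typing import List, Tuple

STAGE_ORDER = {"easy": 0, "medium": 1, "hard": 2}


def filter_false_negatives(
    positive_score: float,
    candidates: List[Tuple[str, float]],
    max_ratio: float = 0.95,  # assumed value, not taken from the card
) -> List[Tuple[str, float]]:
    """Keep only candidates scoring below max_ratio * positive_score."""
    ceiling = positive_score * max_ratio
    return [(doc, score) for doc, score in candidates if score < ceiling]


def difficulty(positive_score: float, negative_score: float) -> str:
    """Bucket a negative by its margin to the positive (assumed cut-offs)."""
    margin = positive_score - negative_score
    if margin >= 0.40:
        return "easy"
    if margin >= 0.15:
        return "medium"
    return "hard"


def curriculum_order(
    positive_score: float, candidates: List[Tuple[str, float]]
) -> List[Tuple[str, str]]:
    """Filter likely false negatives, then order the rest easy -> medium -> hard."""
    kept = filter_false_negatives(positive_score, candidates)
    labelled = [(doc, difficulty(positive_score, score)) for doc, score in kept]
    return sorted(labelled, key=lambda item: STAGE_ORDER[item[1]])


# Hypothetical retriever scores for one query whose positive scored 0.80.
ordered = curriculum_order(0.80, [("a", 0.78), ("b", 0.70), ("c", 0.30), ("d", 0.55)])
```

In this toy input, candidate "a" is dropped because it scores above 95% of the positive's score, and the remaining candidates come out in the order "c" (easy), "d" (medium), "b" (hard).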

Hyperparameter Optimization

Before training, hyperparameter search was conducted using Optuna (20 trials) to maximize NDCG@10 (with fallback to MRR@10) on a control subset:

  • Sampler: TPESampler (Tree-structured Parzen Estimator)
  • Pruner: MedianPruner with an OptunaPruningCallback reporting NDCG@10
  • Evaluator: CrossEncoderRerankingEvaluator

Final Training Hyperparameters

| Hyperparameter | Value | Description |
|---|---|---|
| Learning Rate | 4.8498×10⁻⁵ | Nominal learning rate |
| Batch Size | 32 | Global batch size per device |
| Gradient Accumulation | 4 | Simulates larger effective batches |
| Warmup Ratio | 0.1345 | Linear LR warmup during the first ~13.45% of steps |
| Weight Decay | 0.0416 | Decoupled weight decay applied by AdamW |
| Optimizer | AdamW | Standard HuggingFace Trainer optimizer |
| Precision | bf16 | Bfloat16 for supported architectures |
| Max Sequence Length | 8,192 | Maximum tokens processed for the concatenated (query, doc) pair |
| Loss Function | BinaryCrossEntropyLoss | Treats pairs as an independent binary classification problem |
| Gradient Checkpointing | Enabled | Memory optimization for long contexts (use_reentrant=False) |
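To make the batch and warmup figures concrete, the arithmetic can be sketched as below. Several things are assumptions here: that the batch size of 32 applies per device, the number of devices, the total step count, and the linear decay after warmup (the card only states that the warmup itself is linear).

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    """Examples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then (assumed) linear decay to zero."""
    warm = warmup_steps(total_steps, warmup_ratio)
    if step < warm:
        return peak_lr * step / max(1, warm)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warm))


# Hypothetical run: a single device and 1,000 optimizer steps.
batch = effective_batch_size(32, 4, num_devices=1)
warm = warmup_steps(1000, 0.1345)
peak = lr_at_step(warm, 1000, 4.8498e-5, 0.1345)
```

Under these assumptions one optimizer step sees 128 examples, warmup lasts 135 of the 1,000 steps, and the learning rate reaches its nominal value exactly when warmup ends.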

Training Framework

| Component | Details |
|---|---|
| Library | PyTorch, sentence-transformers, HuggingFace datasets |
| Distributed | DDP (Distributed Data Parallel) via torchrun |
| Memory optimization | Gradient checkpointing, plus the PyTorch CUDA allocator option expandable_segments:True |
| Logging | WandB (via report_to=wandb) |

Intended Use

Direct Use

This model is designed for document reranking and semantic matching tasks in the Spanish biomedical domain. Primary use cases include:

  • RAG pipelines: Reranking retrieved context chunks for language models
  • Search pipelines: Improving initial retrieval (e.g. BM25 or Bi-encoder) by doing precise cross-encoding over the top-k results
  • Biomedical text matching: Determining high-resolution entailment or relevance between queries and biomedical passages
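The retrieve-then-rerank pattern behind the first two use cases can be sketched as follows. The first-stage scorer is a toy token-overlap function standing in for BM25 or a bi-encoder, and the reranker is passed in as a callable so the example runs without downloading any model; all function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def overlap_score(query: str, document: str) -> float:
    """Toy first-stage scorer: fraction of query tokens present in the document."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & set(document.lower().split())) / len(query_tokens)


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    first_stage: Callable[[str, str], float],
    rerank_batch: Callable[[str, List[str]], Sequence[float]],
    top_k: int = 100,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1: cheap scoring over the whole corpus, keep only the top_k candidates.
    shortlist = sorted(corpus, key=lambda doc: first_stage(query, doc), reverse=True)[:top_k]
    # Stage 2: expensive pairwise scoring, restricted to the shortlist.
    scores = rerank_batch(query, shortlist)
    ranked = sorted(zip(shortlist, scores), key=lambda item: float(item[1]), reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_n]]


corpus = [
    "dolor de cabeza intenso y náuseas",
    "fractura de tibia tras una caída",
    "certificado médico en vigor",
]
# Stand-in reranker so the sketch runs offline; it is not the cross-encoder.
stub_reranker = lambda q, docs: [overlap_score(q, d) for d in docs]
results = retrieve_then_rerank(
    "dolor de cabeza", corpus, overlap_score, stub_reranker, top_k=2, top_n=1
)
```

With the actual model loaded as a `CrossEncoder`, `rerank_batch` could be something like `lambda q, docs: model.predict([[q, d] for d in docs])`, which scores the whole shortlist in one batched call.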

Out-of-Scope Use

  • General-domain retrieval (the model is specialized for biomedical Spanish)
  • Fast, large-scale search across millions of documents (use a Bi-encoder first, then rerank the top-k results with this Cross-encoder)
  • Cross-lingual retrieval beyond Spanish

How to Use

With sentence-transformers

from sentence_transformers import CrossEncoder

model = CrossEncoder("SINAI/ALIA-MrBERT-es-biomedical-reranker")

query = "¿Cuáles son los síntomas principales de la insuficiencia cardíaca?"
documents = [
    "La insuficiencia cardíaca puede causar disnea, fatiga y edema periférico...",
    "El tratamiento inicial incluye control de la presión arterial y ajuste farmacológico...",
    "El subsidio de incapacidad temporal requiere un certificado médico en vigor.",
]

# We want to score the query with each document
pairs = [[query, doc] for doc in documents]

scores = model.predict(pairs)
print(scores)
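`predict` returns one score per pair, in input order, so the scores still need to be sorted to obtain a ranking. A small helper for that might look like the sketch below; the score values are made up for illustration and are not actual model output.

```python
from typing import List, Sequence, Tuple


def rank_documents(documents: Sequence[str], scores: Sequence[float]) -> List[Tuple[int, str, float]]:
    """Return (rank, document, score) tuples, best-scoring document first."""
    order = sorted(range(len(documents)), key=lambda i: float(scores[i]), reverse=True)
    return [(rank + 1, documents[i], float(scores[i])) for rank, i in enumerate(order)]


# Placeholder scores, NOT produced by the model.
example_docs = ["doc A", "doc B", "doc C"]
example_scores = [0.1, 0.9, 0.5]
ranking = rank_documents(example_docs, example_scores)
for rank, doc, score in ranking:
    print(f"{rank}. {score:.4f}  {doc}")
```

Recent sentence-transformers releases also provide a `rank` method on `CrossEncoder` that performs this sorting directly; whether it is available depends on the installed version.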

Evaluation

The model was evaluated using the MTEB (Massive Text Embedding Benchmark) framework, adapted for the biomedical domain. The main reported metric is NDCG@10 (Normalized Discounted Cumulative Gain at k=10), which is the standard metric used in retrieval leaderboards and aligns with the metric reported in the MrBERT family.
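For readers unfamiliar with the metric, NDCG@10 (and the MRR@10 fallback used during hyperparameter search) can be computed from a ranked list of relevance labels as sketched below. This is a generic reference implementation using linear gain, not the MTEB code; some implementations use an exponential gain instead, which gives the same result for binary labels.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mrr_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k."""
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


# Relevance labels in the order the reranker returned the documents.
ranked_labels = [0, 1, 0]
ndcg = ndcg_at_k(ranked_labels)
mrr = mrr_at_k(ranked_labels)
```

Here the single relevant document sits in second place, so NDCG@10 equals 1 / log2(3), about 0.63, and MRR@10 equals 0.5. A perfect ranking gives 1.0 for both.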

Evaluation Datasets

| Dataset | Category | Description |
|---|---|---|
| miracl | Reranking | Spanish subset of MIRACL (mteb/MIRACLReranking) |
| esci | Reranking | Spanish subset of the ESCI dataset (mteb/ESCIReranking) |
| CoWeSe | Retrieval | Generated open-ended questions from the CoWeSe corpus (chrisnb1/cowese-qa-dataset) |
| AbSanitas | Retrieval | Spanish biomedical information retrieval dataset built from biomedical texts collected from official academic repositories and open-access sources (BSC-LT/AbSanitas) |
| pairs800 | Retrieval | Subset of 800 biomedical evaluation pairs (query + passage) derived from SINAI/ALIA-biomedical-hard-negatives/test |
| pairs1.6k | Retrieval | Subset of 1.6k biomedical evaluation pairs (query + passage) derived from SINAI/ALIA-biomedical-hard-negatives/test |

Results

The following table reports the performance of the model compared to leading commercial/open-weight generalist models:

| Model | miracl | esci | CoWeSe | AbSanitas | pairs800 | pairs1.6k |
|---|---|---|---|---|---|---|
| BAAI/bge-reranker-v2-m3 | 0.6777 | 0.8321 | 0.9598 | 0.9966 | 0.9989 | 0.9987 |
| nvidia/llama-nemotron-rerank-1b-v2 | 0.6258 | 0.8004 | 0.8957 | 0.9938 | 0.9878 | 0.9875 |
| tomaarsen/Qwen3-Reranker-0.6B-seq-cls | 0.6724 | 0.8133 | 0.9223 | 0.9983 | 0.9969 | 0.9942 |
| ALIA-MrBERT-es-biomedical-reranker (ours) | 0.5509 | 0.8164 | 0.9625 | 0.9987 | 1.0000 | 0.9992 |

Note: The baseline rerankers are larger general-purpose models (BAAI/bge-reranker-v2-m3, for instance, has roughly 0.6B parameters), yet our domain-specific 150M-parameter cross-encoder performs comparably or better on the specialized Spanish biomedical datasets (CoWeSe, AbSanitas, pairs800, pairs1.6k). On the general-domain benchmarks it ranks second on esci and last on miracl.


Limitations and Biases

Known Limitations

  • Domain specificity: The model is highly optimized for Spanish biomedical texts. Its zero-shot performance on general-domain text is weaker than that of large generalist rerankers.
  • Latency: As a cross-encoder, it runs one forward pass per query-document pair, so scoring many pairs is computationally heavy. Use it to rerank a limited candidate set, typically the top 20 to 100 documents returned by a fast first-stage retriever such as a bi-encoder.
  • Clinical accuracy: Semantic similarity does not guarantee clinical or medical correctness.

Biases

  • The model reflects biases present in Spanish biomedical literature, clinical records, and health-related corpora.

Additional Information

License

Apache License, Version 2.0

Citation

If you use this model in your research, please cite:

@misc{ALIA-MrBERT-es-biomedical-reranker,
  title        = {ALIA MrBERT Spanish Biomedical and Healthcare Reranker Model},
  author       = {SINAI Research Group, Universidad de Jaén},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/SINAI/ALIA-MrBERT-es-biomedical-reranker}}
}

Please also cite the base model:

@misc{tamayo2026mrbertmodernmultilingualencoders,
      title={MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation}, 
      author={Daniel Tamayo and Iñaki Lacunza and Paula Rivera-Hidalgo and Severino Da Dalt and Javier Aula-Blasco and Aitor Gonzalez-Agirre and Marta Villegas},
      year={2026},
      eprint={2602.21379},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.21379}, 
}

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU (NextGenerationEU), within the framework of the ALIA project.

Acknowledgments

This model was trained thanks to CEATIC (Centro de Estudios Avanzados en Tecnologías de la Información y de la Comunicación) – UJA (Universidad de Jaén), which provided the necessary computational resources on its clusters.


Contact: ALIA Project - SINAI Research Group - Universidad de Jaén

More Information: SINAI Research Group | ALIA-UJA Project
