# BioHiCL-base: Hierarchical Multi-Label Contrastive Biomedical Retriever
## Model Card
## 🔍 Overview
BioHiCL-base is a biomedical dense retriever trained with hierarchical MeSH supervision to capture fine-grained semantic relationships between biomedical texts.
Unlike traditional dense retrievers trained with binary relevance signals, BioHiCL models semantic similarity using structured multi-label supervision derived from the MeSH ontology, enabling it to capture partial semantic overlap between documents.
> ⚠️ Important: Please ensure that the `transformers` version matches exactly (4.57.3), as other versions may lead to compatibility issues or unexpected behavior.
---
## 💡 Key Features
- **Hierarchical supervision**: Leverages MeSH ontology to encode structured biomedical semantics
- **Multi-label similarity learning**: Captures graded semantic overlap beyond binary relevance
- **Contrastive + regression training**: Aligns embedding similarity with label similarity
- **Efficient**: ~0.1B parameters, suitable for deployment on a single GPU
- **Domain-adapted retriever**: Fine-tuned from a strong general-purpose bi-encoder
---
## 🧠 Model Details
- **Model type**: Bi-encoder (dense retriever)
- **Backbone**: BAAI/bge-base-en-v1.5
- **Parameters**: ~0.1B
- **Fine-tuning**: LoRA (merged into base model)
- **Max input length**: 512 tokens
- **Training data**: Biomedical abstracts annotated with MeSH labels (e.g., BioASQ-derived corpora)
---
## ⚙️ Intended Use
This model is intended for biomedical information retrieval tasks such as:
- Scientific literature search (e.g., PubMed-style retrieval)
- Biomedical document ranking
- Query–abstract semantic matching
- Benchmark evaluation on BEIR biomedical subsets
---
## ⚙️ How It Works
BioHiCL trains the encoder so that two pairwise similarity measures agree for each pair of documents:
- Embedding similarity (SimE): cosine similarity between document embeddings
- Label similarity (SimL): cosine similarity over weighted MeSH multi-label vectors
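
The two quantities above can be illustrated with a minimal NumPy sketch. The toy embeddings, the toy label weights, and the mean-squared-error alignment term are all assumptions made for illustration; the exact loss and MeSH weighting scheme used to train BioHiCL are not specified in this card.

```python
import numpy as np


def cosine_matrix(x: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


# Hypothetical toy batch: 3 documents with 4-dimensional embeddings.
embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Hypothetical weighted MeSH multi-label vectors over 3 labels.
labels = np.array([
    [1.0, 0.5, 0.0],
    [1.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])

sim_e = cosine_matrix(embeddings)  # SimE: embedding similarity
sim_l = cosine_matrix(labels)      # SimL: label similarity

# Assumed alignment term: mean squared error over off-diagonal pairs.
mask = ~np.eye(len(embeddings), dtype=bool)
alignment_loss = float(np.mean((sim_e[mask] - sim_l[mask]) ** 2))

print("SimE:\n", sim_e.round(3))
print("SimL:\n", sim_l.round(3))
print("alignment loss:", round(alignment_loss, 4))
```

In this toy batch the first two documents share identical labels but have different embeddings, so the alignment term is non-zero and would push their embeddings closer together.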
---
## ⚙️ Requirements
- python >= 3.8
- transformers == 4.57.3
> ⚠️ Important: Please ensure that the `transformers` version matches exactly (4.57.3), as other versions may lead to compatibility issues or unexpected behavior.
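
One way to catch a version mismatch before loading the model is a small check like the sketch below. The helper names `transformers_version_ok` and `installed_transformers_version` are hypothetical; only the standard-library `importlib.metadata` API is used.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_TRANSFORMERS = "4.57.3"  # version pinned in the requirements above


def transformers_version_ok(installed: str, required: str = REQUIRED_TRANSFORMERS) -> bool:
    """Return True only when the installed version string matches exactly."""
    return installed.strip() == required


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif not transformers_version_ok(found):
        print(f"found transformers {found}, expected {REQUIRED_TRANSFORMERS}")
    else:
        print(f"transformers {found} matches the required version")
```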
---
## 🚀 Usage (BEIR Evaluation)
```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.models import SentenceBERT
from beir.retrieval.search.dense import DenseRetrievalExactSearch
from beir.retrieval.evaluation import EvaluateRetrieval
# 1. Download and load the SciFact dataset
dataset = "scifact"
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/" + dataset + ".zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
# 2. Load BioHiCL-base (requires transformers==4.57.3) and wrap it in an exact-search retriever
model_name = "LunaLan07/BioHiCL-base"
model = SentenceBERT(model_name)
retriever = DenseRetrievalExactSearch(model, batch_size=16)
top_k = 10 # top 10 documents per query
results = retriever.search(corpus, queries, top_k=top_k, score_function="cos_sim")
k_values = [1, 3, 5, 10]
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, results, k_values=k_values)
```
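
To inspect what was retrieved, the `results` mapping can be sorted per query. The sketch below assumes the usual BEIR layout of `{query_id: {doc_id: score}}`; the helper `top_hits` and the toy data are hypothetical and stand in for the real `results` object.

```python
def top_hits(results, query_id, k=3):
    """Return the k highest-scoring (doc_id, score) pairs for one query."""
    scores = results.get(query_id, {})
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy output standing in for the `results` dict from retriever.search.
example_results = {"q1": {"d1": 0.42, "d2": 0.91, "d3": 0.77}}

for doc_id, score in top_hits(example_results, "q1", k=2):
    print(f"{doc_id}\t{score:.2f}")
```

With real data, the returned document IDs can be looked up in `corpus` to read the matching titles and abstracts.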
## 📖 Citation
If you use this model, please cite:
```bibtex
@inproceedings{lan2026biohicl,
title={BioHiCL: Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH Labels},
author={Lan, Mengfei and Zheng, Lecheng and Kilicoglu, Halil},
booktitle={ACL 2026},
year={2026}
}
```