# BioHiCL-Large: Hierarchical Multi-Label Contrastive Biomedical Retriever
## Model Card
## 🔍 Overview
BioHiCL-large is a biomedical dense retriever trained with hierarchical MeSH supervision to capture fine-grained semantic relationships between biomedical texts.
Unlike traditional dense retrievers trained with binary relevance signals, BioHiCL models semantic similarity using structured multi-label supervision derived from the MeSH ontology, enabling it to capture partial semantic overlap between documents.
> ⚠️ Important: Please ensure that the `transformers` version matches exactly (4.57.3), as other versions may lead to compatibility issues or unexpected behavior.
---
## 💡 Key Features
- **Hierarchical supervision**: Leverages MeSH ontology to encode structured biomedical semantics
- **Multi-label similarity learning**: Captures graded semantic overlap beyond binary relevance
- **Contrastive + regression training**: Aligns embedding similarity with label similarity
- **Efficient**: ~0.3B parameters, suitable for deployment on a single GPU
- **Domain-adapted retriever**: Fine-tuned from a strong general-purpose bi-encoder
---
## 🧠 Model Details
- **Model type**: Bi-encoder (dense retriever)
- **Backbone**: BAAI/bge-large-en-v1.5
- **Parameters**: ~0.3B
- **Fine-tuning**: LoRA (merged into base model)
- **Max input length**: 512 tokens
- **Training data**: Biomedical abstracts annotated with MeSH labels (e.g., BioASQ-derived corpora)
---
## ⚙️ Intended Use
This model is intended for biomedical information retrieval tasks such as:
- Scientific literature search (e.g., PubMed-style retrieval)
- Biomedical document ranking
- Query–abstract semantic matching
- Benchmark evaluation on BEIR biomedical subsets
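For query–abstract matching outside BEIR, a minimal sketch is shown below. It assumes the checkpoint loads through `sentence-transformers` (the BEIR `SentenceBERT` wrapper used in the Usage section relies on that library); the helper names `embed` and `rank_by_cosine` are hypothetical, and toy 2-D vectors stand in for real embeddings so the ranking step can be tried offline.

```python
import numpy as np

MODEL_NAME = "LunaLan07/BioHiCL-large"  # model ID taken from the Usage section


def embed(texts, model_name=MODEL_NAME):
    """Encode texts into L2-normalized vectors (downloads the model on first call)."""
    # Imported lazily so the ranking helper below works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts, normalize_embeddings=True)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (document indices sorted by descending cosine similarity, all scores)."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores), scores


# Toy vectors in place of real embeddings (hypothetical data).
order, scores = rank_by_cosine([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.6, 0.8]])
print(order.tolist())
```

With the model available, `embed(abstracts)` and `embed([query])[0]` would replace the toy vectors.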
---
## ⚙️ How It Works
BioHiCL trains the encoder so that two similarity measures agree for each pair of documents:
- Embedding similarity (SimE): cosine similarity between document embeddings
- Label similarity (SimL): cosine similarity over weighted MeSH multi-label vectors
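The alignment between SimE and SimL can be sketched as follows. This is an illustration under stated assumptions, not the training code: the regression term is written as a mean squared error over document pairs, the contrastive term is omitted, and the label weights are made-up values. The function names `cosine_matrix` and `alignment_loss` are hypothetical.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T


def alignment_loss(embeddings, label_vectors):
    """Mean squared gap between SimE and SimL over distinct document pairs.

    Assumption: MSE stands in for the regression objective.
    """
    sim_e = cosine_matrix(embeddings)
    sim_l = cosine_matrix(label_vectors)
    off_diag = ~np.eye(len(sim_e), dtype=bool)  # skip trivial self-pairs
    return float(np.mean((sim_e[off_diag] - sim_l[off_diag]) ** 2))


# Hypothetical weighted MeSH vectors over 4 labels for 3 abstracts.
labels = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
sim_l = cosine_matrix(labels)

# Random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 8))
loss = alignment_loss(embeddings, labels)
```

In this toy setup the first two abstracts share one of their two labels, so their SimL is 0.8 rather than 0 or 1, which is the kind of partial overlap a binary relevance signal cannot express.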
---
## ⚙️ Requirements
- python >= 3.8
- transformers == 4.57.3
> ⚠️ Important: Please ensure that the `transformers` version matches exactly (4.57.3), as other versions may lead to compatibility issues or unexpected behavior.
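
Because the pin is exact, it can help to check the installed version before loading the model. The snippet below is a small sketch using only the standard library; `check_pinned` is a hypothetical helper, and its `installed` argument exists only so the comparison can be exercised without the package present.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_TRANSFORMERS = "4.57.3"  # pinned version from the requirements above


def check_pinned(package, required, installed=None):
    """Return (ok, message) describing whether `package` matches `required` exactly."""
    if installed is None:
        try:
            installed = version(package)
        except PackageNotFoundError:
            return False, f"{package} is not installed; expected {required}"
    if installed != required:
        return False, f"{package} {installed} found; expected {required}"
    return True, f"{package} {installed} matches the pinned version"


ok, message = check_pinned("transformers", REQUIRED_TRANSFORMERS)
print(message)
```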
---
## 🚀 Usage (BEIR Evaluation)
```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.models import SentenceBERT
from beir.retrieval.search.dense import DenseRetrievalExactSearch
from beir.retrieval.evaluation import EvaluateRetrieval
# 1. Download and load the SciFact dataset
dataset = "scifact"
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/" + dataset + ".zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
# 2. Load the model and wrap it in an exact (brute-force) dense retriever
model_name = "LunaLan07/BioHiCL-large"
model = SentenceBERT(model_name)
retriever = DenseRetrievalExactSearch(model, batch_size=16)
# 3. Retrieve the top 10 documents per query by cosine similarity
top_k = 10
results = retriever.search(corpus, queries, top_k=top_k, score_function="cos_sim")
# 4. Score the rankings against the relevance judgments at several cutoffs
k_values = [1, 3, 5, 10]
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, results, k_values=k_values)
```
```
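The evaluation call returns four dictionaries keyed by metric and cutoff (for example `NDCG@10`), while `results` maps each query ID to a `{doc_id: score}` dictionary. The sketch below shows one way to inspect the top-ranked documents for a single query; `top_hits` is a hypothetical helper, and the toy dictionaries only mimic the shape of BEIR's `results` and `corpus` objects.

```python
def top_hits(results, corpus, query_id, n=3):
    """Return the n best (doc_id, score, title) tuples for one query."""
    ranked = sorted(results[query_id].items(), key=lambda kv: kv[1], reverse=True)
    return [
        (doc_id, score, corpus[doc_id].get("title", ""))
        for doc_id, score in ranked[:n]
    ]


# Toy stand-ins with the same shape as BEIR's `results` and `corpus` (made-up data).
toy_results = {"q1": {"d1": 0.42, "d2": 0.91, "d3": 0.10}}
toy_corpus = {
    "d1": {"title": "Doc one", "text": "..."},
    "d2": {"title": "Doc two", "text": "..."},
    "d3": {"title": "Doc three", "text": "..."},
}

hits = top_hits(toy_results, toy_corpus, "q1", n=2)
for doc_id, score, title in hits:
    print(f"{doc_id}\t{score:.2f}\t{title}")
```

After running the evaluation script, the same helper can be called as `top_hits(results, corpus, query_id)` for any `query_id` in `queries`.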
## 📖 Citation
If you use this model, please cite:
```bibtex
@inproceedings{lan2026biohicl,
  title={BioHiCL: Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH Labels},
  author={Lan, Mengfei and Zheng, Lecheng and Kilicoglu, Halil},
  booktitle={ACL 2026},
  year={2026}
}
```