LunaLan07 committed on
Commit 817b4e5 · verified · 1 Parent(s): 32834bf

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -1,9 +1,9 @@
- # BioHiCL-base: Hierarchical Multi-Label Contrastive Biomedical Retriever
 
  ## Model Card
 
  ## 🔍 Overview
- BioHiCL-base is a biomedical dense retriever trained with hierarchical MeSH supervision to capture fine-grained semantic relationships between biomedical texts.
 
  Unlike traditional dense retrievers trained with binary relevance signals, BioHiCL models semantic similarity using structured multi-label supervision derived from the MeSH ontology, enabling it to capture partial semantic overlap between documents.
 
@@ -13,17 +13,17 @@ Unlike traditional dense retrievers trained with binary relevance signals, BioHi
  - **Hierarchical supervision**: Leverages MeSH ontology to encode structured biomedical semantics
  - **Multi-label similarity learning**: Captures graded semantic overlap beyond binary relevance
  - **Contrastive + regression training**: Aligns embedding similarity with label similarity
- - **Efficient**: ~0.1B parameters, suitable for deployment on a single GPU
  - **Domain-adapted retriever**: Fine-tuned from a strong general-purpose bi-encoder
 
  ---
 
  ## 🧠 Model Details
  - **Model type**: Bi-encoder (dense retriever)
- - **Backbone**: BAAI/bge-base-en-v1.5
- - **Parameters**: ~0.1B
  - **Fine-tuning**: LoRA (merged into base model)
- - **Max input length**: 8192 tokens
  - **Training data**: Biomedical abstracts annotated with MeSH labels (e.g., BioASQ-derived corpora)
 
  ---
@@ -63,7 +63,7 @@ data_path = util.download_and_unzip(url, "datasets")
  corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
 
  # Model
- model_name = "LunaLan07/BioHiCL-base"
  model = SentenceTransformer(model_name)
 
  # Retrieval
 
+ # BioHiCL-large: Hierarchical Multi-Label Contrastive Biomedical Retriever
 
  ## Model Card
 
  ## 🔍 Overview
+ BioHiCL-large is a biomedical dense retriever trained with hierarchical MeSH supervision to capture fine-grained semantic relationships between biomedical texts.
 
  Unlike traditional dense retrievers trained with binary relevance signals, BioHiCL models semantic similarity using structured multi-label supervision derived from the MeSH ontology, enabling it to capture partial semantic overlap between documents.
 
 
  - **Hierarchical supervision**: Leverages MeSH ontology to encode structured biomedical semantics
  - **Multi-label similarity learning**: Captures graded semantic overlap beyond binary relevance
  - **Contrastive + regression training**: Aligns embedding similarity with label similarity
+ - **Efficient**: ~0.3B parameters, suitable for deployment on a single GPU
  - **Domain-adapted retriever**: Fine-tuned from a strong general-purpose bi-encoder
 
  ---
 
  ## 🧠 Model Details
  - **Model type**: Bi-encoder (dense retriever)
+ - **Backbone**: BAAI/bge-large-en-v1.5
+ - **Parameters**: ~0.3B
  - **Fine-tuning**: LoRA (merged into base model)
+ - **Max input length**: 512 tokens
  - **Training data**: Biomedical abstracts annotated with MeSH labels (e.g., BioASQ-derived corpora)
 
  ---
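
Reviewer note: the feature list above says the model aligns embedding similarity with a graded, hierarchy-aware label similarity, but this diff does not show how that label similarity is computed. The sketch below is one plausible reading, not the authors' method. It assumes each document carries a list of MeSH tree numbers, expands every tree number to include its ancestors, and takes the Jaccard overlap of the expanded sets. The function names and the example tree numbers are illustrative assumptions.

```python
from typing import Iterable, Set


def expand_with_ancestors(tree_numbers: Iterable[str]) -> Set[str]:
    """Return each tree number together with all of its ancestor prefixes.

    Assumption: labels are dot-separated MeSH tree numbers, so every
    dot-delimited prefix ("C04", "C04.557", ...) names an ancestor node.
    """
    expanded: Set[str] = set()
    for tree_number in tree_numbers:
        parts = tree_number.split(".")
        for depth in range(1, len(parts) + 1):
            expanded.add(".".join(parts[:depth]))
    return expanded


def hierarchical_label_similarity(
    labels_a: Iterable[str], labels_b: Iterable[str]
) -> float:
    """Jaccard overlap of ancestor-expanded label sets, in [0, 1].

    Hypothetical stand-in for the card's "label similarity"; the real
    training target may be defined differently.
    """
    set_a = expand_with_ancestors(labels_a)
    set_b = expand_with_ancestors(labels_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Two sibling labels share two ancestors, so they get partial credit
# instead of the zero that a flat exact-match comparison would give.
sibling_score = hierarchical_label_similarity(["C04.557.470"], ["C04.557.465"])
```

Under this assumption, a score like `sibling_score` could serve as the regression target for the cosine similarity of the two documents' embeddings, which is the kind of partial overlap a binary relevance signal cannot express.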
 
  corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
 
  # Model
+ model_name = "LunaLan07/BioHiCL-large"
  model = SentenceTransformer(model_name)
 
  # Retrieval