How to use NeuML/pubmedbert-base-embeddings with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("NeuML/pubmedbert-base-embeddings")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```

How to use NeuML/pubmedbert-base-embeddings with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("NeuML/pubmedbert-base-embeddings")
model = AutoModel.from_pretrained("NeuML/pubmedbert-base-embeddings")
```
Commit 64beaa0 · Parent(s): e6e0356
Update README

README.md CHANGED
@@ -85,22 +85,22 @@ Performance of this model compared to the top base models on the [MTEB leaderboa

 The following datasets were used to evaluate model performance.

-- [PubMed QA](https://huggingface.co/datasets/
+- [PubMed QA](https://huggingface.co/datasets/qiaojin/PubMedQA)
   - Subset: pqa_labeled, Split: train, Pair: (question, long_answer)
-- [PubMed Subset](https://huggingface.co/datasets/
+- [PubMed Subset](https://huggingface.co/datasets/awinml/pubmed_abstract_3_1k)
   - Split: test, Pair: (title, text)
-- [PubMed Summary](https://huggingface.co/datasets/scientific_papers)
+- [PubMed Summary](https://huggingface.co/datasets/armanc/scientific_papers)
   - Subset: pubmed, Split: validation, Pair: (article, abstract)

 Evaluation results are shown below. The [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) is used as the evaluation metric.

 | Model | PubMed QA | PubMed Subset | PubMed Summary | Average |
 | ----------------------------------------------------------------------------- | --------- | ------------- | -------------- | --------- |
-| [all-MiniLM-L6-v2](https://hf.co/sentence-transformers/all-MiniLM-L6-v2)
-| [bge-base-en-v1.5](https://hf.co/BAAI/bge-
-| [gte-base](https://hf.co/thenlper/gte-base)
-| [**pubmedbert-base-embeddings**](https://hf.co/neuml/pubmedbert-base-embeddings) | **93.27** | **97.
-| [S-PubMedBert-MS-MARCO](https://hf.co/pritamdeka/S-PubMedBert-MS-MARCO)
+| [all-MiniLM-L6-v2](https://hf.co/sentence-transformers/all-MiniLM-L6-v2) | 90.40 | 95.92 | 94.07 | 93.46 |
+| [bge-base-en-v1.5](https://hf.co/BAAI/bge-base-en-v1.5) | 91.02 | 95.82 | 94.49 | 93.78 |
+| [gte-base](https://hf.co/thenlper/gte-base) | 92.97 | 96.90 | 96.24 | 95.37 |
+| [**pubmedbert-base-embeddings**](https://hf.co/neuml/pubmedbert-base-embeddings) | **93.27** | **97.00** | **96.58** | **95.62** |
+| [S-PubMedBert-MS-MARCO](https://hf.co/pritamdeka/S-PubMedBert-MS-MARCO) | 90.86 | 93.68 | 93.54 | 92.69 |

 ## Training

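As a quick sanity check on the added table rows, the Average column looks like the plain arithmetic mean of the three per-dataset scores. The README does not state this, so treat it as an assumption. The sketch below copies the scores from the added rows and compares each mean against the stated Average; the names `scores`, `stated_average`, and `means` are illustrative only.

```python
# Scores copied from the added table rows:
# (PubMed QA, PubMed Subset, PubMed Summary)
scores = {
    "all-MiniLM-L6-v2": (90.40, 95.92, 94.07),
    "bge-base-en-v1.5": (91.02, 95.82, 94.49),
    "gte-base": (92.97, 96.90, 96.24),
    "pubmedbert-base-embeddings": (93.27, 97.00, 96.58),
    "S-PubMedBert-MS-MARCO": (90.86, 93.68, 93.54),
}

# Average column as stated in the table
stated_average = {
    "all-MiniLM-L6-v2": 93.46,
    "bge-base-en-v1.5": 93.78,
    "gte-base": 95.37,
    "pubmedbert-base-embeddings": 95.62,
    "S-PubMedBert-MS-MARCO": 92.69,
}

# Assumption: Average = unweighted mean of the three dataset scores
means = {name: sum(vals) / len(vals) for name, vals in scores.items()}

for name, mean in means.items():
    # Allow for rounding to two decimals in the published table
    matches = abs(mean - stated_average[name]) <= 0.005 + 1e-9
    print(f"{name}: mean={mean:.2f} stated={stated_average[name]:.2f} match={matches}")

best = max(means, key=means.get)
print("Highest average:", best)
```

Under that assumption every row agrees with the table to two decimals, and the highest average belongs to the row the README shows in bold.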