Update README.md

# SentenceTransformer

This is a trained [Chem-MRL](https://github.com/emapco/chem-mrl) [sentence-transformers](https://www.SBERT.net) model. It maps SMILES to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, database indexing, molecular classification, clustering, and more.

## Model Details

### Model Sources

- **Repository:** [Chem-MRL on GitHub](https://github.com/emapco/chem-mrl)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel (ChemBERTa)
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
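
The `(1) Pooling` and `(2) Normalize` stages above can be sketched in plain NumPy (illustrative only; `mean_pool_and_normalize` is a hypothetical helper, not part of the library): mean pooling averages the token embeddings under the attention mask, and `Normalize()` rescales the result to unit length.

```python
import numpy as np

# Illustrative sketch of the Pooling (mean over unmasked tokens) and
# Normalize stages of the pipeline; real inference uses the library modules.
def mean_pool_and_normalize(token_embeddings, attention_mask):
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    pooled = (token_embeddings * mask).sum(axis=0) / np.clip(mask.sum(), 1e-9, None)
    return pooled / np.linalg.norm(pooled)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 1024))          # 5 tokens, 1024-dim as in the card
mask = np.array([1, 1, 1, 0, 0])             # last two positions are padding
vec = mean_pool_and_normalize(tokens, mask)
print(vec.shape)                             # (1024,)
print(round(float(np.linalg.norm(vec)), 6))  # 1.0
```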

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Derify/ChemMRL-alpha")
# Run inference
sentences = [
    "CCO",
    "CC(C)O",
    "CC(=O)O",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
```
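
Because the model ends with a `Normalize()` module, cosine similarity between embeddings reduces to a dot product. A minimal sketch with stand-in 2-D unit vectors (not real model outputs; in practice you would pass the `embeddings` computed above):

```python
import numpy as np

# Cosine similarity on unit-length embeddings is just a dot product.
# Stand-in 2-D vectors are used here instead of real model outputs.
def cosine_sim_matrix(embeddings):
    e = np.asarray(embeddings, dtype=float)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # no-op if already normalized
    return e @ e.T

emb = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
sims = cosine_sim_matrix(emb)
print(sims.shape)                   # (3, 3)
print(round(float(sims[0, 1]), 2))  # 0.6
```

Recent releases of sentence-transformers also expose `model.similarity(embeddings, embeddings)` for the same computation.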

- Tokenizers: 0.21.0

## Citation

- Chithrananda, Seyone, et al. "ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction." _arXiv [Cs.LG]_, 2020. [Link](http://arxiv.org/abs/2010.09885).
- Ahmad, Walid, et al. "ChemBERTa-2: Towards Chemical Foundation Models." _arXiv [Cs.LG]_, 2022. [Link](http://arxiv.org/abs/2209.01712).
- Kusupati, Aditya, et al. "Matryoshka Representation Learning." _arXiv [Cs.LG]_, 2022. [Link](https://arxiv.org/abs/2205.13147).
- Li, Xianming, et al. "2D Matryoshka Sentence Embeddings." _arXiv [Cs.CL]_, 2024. [Link](http://arxiv.org/abs/2402.14776).
- Bajusz, Dávid, et al. "Why is the Tanimoto Index an Appropriate Choice for Fingerprint-Based Similarity Calculations?" _J Cheminform_, 7, 20 (2015). [Link](https://doi.org/10.1186/s13321-015-0069-3).
- Li, Xiaoya, et al. "Dice Loss for Data-imbalanced NLP Tasks." _arXiv [Cs.CL]_, 2020. [Link](https://arxiv.org/abs/1911.02855).
- Reimers, Nils, and Gurevych, Iryna. "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." _Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing_, 2019. [Link](https://arxiv.org/abs/1908.10084).
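
The Matryoshka Representation Learning work cited above motivates a practical property of MRL-trained embeddings: they remain useful when truncated to a prefix of their dimensions. A hedged sketch (the dimension choices are illustrative and `truncate_and_renormalize` is a hypothetical helper):

```python
import numpy as np

# Matryoshka-style truncation: keep the first `dim` components and
# re-normalize. Stand-in random vectors are used, not real embeddings.
def truncate_and_renormalize(embeddings, dim):
    e = np.asarray(embeddings, dtype=float)[:, :dim]
    return e / np.linalg.norm(e, axis=1, keepdims=True)

full = np.random.default_rng(42).normal(size=(4, 1024))
small = truncate_and_renormalize(full, 256)
print(small.shape)  # (4, 256)
```

Truncated-and-renormalized vectors can then be compared with the same dot-product similarity, trading accuracy for storage and speed.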

## Model Card Authors

[@eacortes](https://huggingface.co/eacortes)

## Model Card Contact

Manny Cortes (manny@derifyai.com)