Derify/ChemMRL-beta

@@ -1,4 +1,5 @@
 ---
 tags:
 - sentence-transformers
 - molecular-similarity
@@ -43,7 +44,7 @@ library_name: sentence-transformers
 metrics:
 - spearman
 model-index:
-- name: SentenceTransformer based on Derify/ChemBERTa-druglike
   results:
   - task:
       type: semantic-similarity
@@ -55,10 +56,10 @@ model-index:
     - type: spearman
       value: 0.9932120589500998
       name: Spearman
-license: apache-2.0
 ---
-# SentenceTransformer based on Derify/ChemBERTa-druglike
 This is a [Chem-MRL](https://github.com/emapco/chem-mrl) ([sentence-transformers](https://www.SBERT.net)) model finetuned from [Derify/ChemBERTa-druglike](https://huggingface.co/Derify/ChemBERTa-druglike) on the [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) dataset. It maps SMILES to a 1024-dimensional dense vector space and can be used for molecular similarity, semantic search, database indexing, molecular classification, clustering, and more.
@@ -72,7 +73,7 @@ This is a [Chem-MRL](https://github.com/emapco/chem-mrl) ([sentence-transformers
 - **Similarity Function:** Tanimoto
 - **Training Dataset:**
     - [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity)
-- **License:** [Apache-2.0](https://huggingface.co/Derify/ChemBERTa-druglike/blob/main/LICENSE)
 ### Model Sources
@@ -104,32 +105,32 @@ Then you can load this model and run inference.
 from chem_mrl import ChemMRL
 # Download from the 🤗 Hub
-chem_mrl = ChemMRL("Derify/ChemMRL-beta")
 # Run inference
 sentences = [
     "Clc1nccc(C#CCCc2nc3ccccc3o2)n1",
     "O=Cc1nc2ccccc2o1",
     "O[C@H]1CN(C(Cc2ccccc2)c2ccccc2)C[C@@H]1Cc1cnc[nH]1",
 ]
-embeddings = chem_mrl.backbone.encode(sentences)
 print(embeddings.shape)
 # [3, 1024]
 # Get the similarity scores for the embeddings
-similarities = chem_mrl.backbone.similarity(embeddings, embeddings)
 print(similarities)
 # tensor([[1.0000, 0.3200, 0.1209],
 #         [0.3200, 1.0000, 0.0950],
 #         [0.1209, 0.0950, 1.0000]])
 # Load the model with half precision
-chem_mrl = ChemMRL("Derify/ChemMRL-beta", use_half_precision=True)
 sentences = [
     "Clc1nccc(C#CCCc2nc3ccccc3o2)n1",
     "O=Cc1nc2ccccc2o1",
     "O[C@H]1CN(C(Cc2ccccc2)c2ccccc2)C[C@@H]1Cc1cnc[nH]1",
 ]
-embeddings = chem_mrl.embed(sentences)  # Use the embed method for half precision
 print(embeddings.shape)
 # [3, 1024]
 ```
@@ -148,10 +149,10 @@ print(embeddings.shape)
   }
   ```
-| Split | Metric       | Value      |
-|  :--------- | :----------- | :--------- |
 | **validation** | **spearman** | **0.993212** |
-| **test** | **spearman** | **0.993243** |
 ## Training Details
@@ -166,7 +167,7 @@ print(embeddings.shape)
   |         | smiles_a                                                                            | smiles_b                                                                            | label                                                           |
   | :------ | :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------- | :-------------------------------------------------------------- |
   | type    | string                                                                              | string                                                                              | float                                                           |
-  | details | <ul><li>min: 17 tokens</li><li>mean: 39.66 tokens</li><li>max: 119 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 38.29 tokens</li><li>max: 115 tokens</li></ul> | <ul><li>min: 0.02</li><li>mean: 0.57</li><li>max: 1.0</li></ul> |                                         | <code>0.7123287916183472</code> |
 * Loss: [<code>Matryoshka2dLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshka2dloss) with these parameters:
   <details><summary>Click to expand</summary>
@@ -492,12 +493,12 @@ print(embeddings.shape)
 #### TanimotoSentLoss
 ```bibtex
-@online{emapco-chem-mrl-tanimotosentloss,
     title={TanimotoSentLoss: Tanimoto Loss for SMILES Embeddings},
     author={Emmanuel Cortes},
     year={2025},
     month={Jan},
-    url={https://github.com/emapco/chem-mrl/blob/main/chem_mrl/losses/TanimotoLoss.py},
 }
 ```
@@ -507,4 +508,4 @@ print(embeddings.shape)
 ## Model Card Contact
-Manny Cortes (manny@derifyai.com)

 ---
+license: apache-2.0
 tags:
 - sentence-transformers
 - molecular-similarity
 metrics:
 - spearman
 model-index:
+- name: 'ChemMRL: SMILES Matryoshka Representation Learning Embedding Transformer'
   results:
   - task:
       type: semantic-similarity
     - type: spearman
       value: 0.9932120589500998
       name: Spearman
+new_version: Derify/ChemMRL
 ---
+# ChemMRL: SMILES Matryoshka Representation Learning Embedding Transformer
 This is a [Chem-MRL](https://github.com/emapco/chem-mrl) ([sentence-transformers](https://www.SBERT.net)) model finetuned from [Derify/ChemBERTa-druglike](https://huggingface.co/Derify/ChemBERTa-druglike) on the [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity) dataset. It maps SMILES to a 1024-dimensional dense vector space and can be used for molecular similarity, semantic search, database indexing, molecular classification, clustering, and more.
 - **Similarity Function:** Tanimoto
 - **Training Dataset:**
     - [pubchem_10m_genmol_similarity](https://huggingface.co/datasets/Derify/pubchem_10m_genmol_similarity)
+- **License:** apache-2.0
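
The similarity function listed above is Tanimoto. As a minimal sketch, assuming the usual continuous (real-valued) form `<a, b> / (|a|^2 + |b|^2 - <a, b>)`, it can be computed on plain Python lists as shown below. The function name `tanimoto_similarity` and the toy vectors are illustrative only; whether the library evaluates exactly this expression internally is an assumption, not something this card states.

```python
def tanimoto_similarity(a, b):
    """Continuous Tanimoto similarity: <a, b> / (|a|^2 + |b|^2 - <a, b>)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denom = norm_a + norm_b - dot
    # Convention chosen for this sketch: two all-zero vectors count as identical.
    return 1.0 if denom == 0 else dot / denom


# Toy 4-dimensional vectors standing in for embeddings (hypothetical values).
u = [1.0, 0.0, 1.0, 1.0]
v = [1.0, 1.0, 0.0, 1.0]

print(tanimoto_similarity(u, u))  # 1.0
print(tanimoto_similarity(u, v))  # 0.5
```

For binary vectors this reduces to the familiar fingerprint Tanimoto (shared bits divided by the union of set bits), which is why the second call returns 2 / 4.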
 ### Model Sources
 from chem_mrl import ChemMRL
 # Download from the 🤗 Hub
+model = ChemMRL("Derify/ChemMRL-beta")
 # Run inference
 sentences = [
     "Clc1nccc(C#CCCc2nc3ccccc3o2)n1",
     "O=Cc1nc2ccccc2o1",
     "O[C@H]1CN(C(Cc2ccccc2)c2ccccc2)C[C@@H]1Cc1cnc[nH]1",
 ]
+embeddings = model.backbone.encode(sentences)
 print(embeddings.shape)
 # [3, 1024]
 # Get the similarity scores for the embeddings
+similarities = model.backbone.similarity(embeddings, embeddings)
 print(similarities)
 # tensor([[1.0000, 0.3200, 0.1209],
 #         [0.3200, 1.0000, 0.0950],
 #         [0.1209, 0.0950, 1.0000]])
 # Load the model with half precision
+model = ChemMRL("Derify/ChemMRL-beta", use_half_precision=True)
 sentences = [
     "Clc1nccc(C#CCCc2nc3ccccc3o2)n1",
     "O=Cc1nc2ccccc2o1",
     "O[C@H]1CN(C(Cc2ccccc2)c2ccccc2)C[C@@H]1Cc1cnc[nH]1",
 ]
+embeddings = model.embed(sentences)  # Use the embed method for half precision
 print(embeddings.shape)
 # [3, 1024]
 ```
   }
   ```
+| Split          | Metric       | Value        |
+| :------------- | :----------- | :----------- |
 | **validation** | **spearman** | **0.993212** |
+| **test**       | **spearman** | **0.993243** |
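
The values in the table are Spearman rank correlations, presumably between the model's predicted similarities and the dataset's reference labels. The sketch below shows how such a score can be computed in plain Python: rank both lists (averaging ranks for ties), then take the Pearson correlation of the ranks. The helper names and the four example scores are hypothetical and are not taken from the evaluation run.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted vs. reference similarities for four molecule pairs.
predicted = [0.91, 0.32, 0.12, 0.57]
reference = [0.88, 0.40, 0.10, 0.35]
print(round(spearman(predicted, reference), 6))  # 0.8
```

Because only the ordering matters, a score near 1.0 means the model ranks pairs almost exactly as the reference labels do, even if the raw similarity values differ.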
 ## Training Details
   |         | smiles_a                                                                            | smiles_b                                                                            | label                                                           |
   | :------ | :---------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------- | :-------------------------------------------------------------- |
   | type    | string                                                                              | string                                                                              | float                                                           |
+  | details | <ul><li>min: 17 tokens</li><li>mean: 39.66 tokens</li><li>max: 119 tokens</li></ul> | <ul><li>min: 11 tokens</li><li>mean: 38.29 tokens</li><li>max: 115 tokens</li></ul> | <ul><li>min: 0.02</li><li>mean: 0.57</li><li>max: 1.0</li></ul> |  | <code>0.7123287916183472</code> |
 * Loss: [<code>Matryoshka2dLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshka2dloss) with these parameters:
   <details><summary>Click to expand</summary>
 #### TanimotoSentLoss
 ```bibtex
+@online{cortes-2025-tanimotosentloss,
     title={TanimotoSentLoss: Tanimoto Loss for SMILES Embeddings},
     author={Emmanuel Cortes},
     year={2025},
     month={Jan},
+    url={https://github.com/emapco/chem-mrl},
 }
 ```
 ## Model Card Contact
+Manny Cortes (manny@derifyai.com)