ChatterjeeLab/PeptiVerse

yinuozhang committed on Jan 4

Commit 413366c (verified) · 1 parent: c627ff9

Update README.md

Files changed (1)

README.md +16 -16

README.md CHANGED

@@ -235,32 +235,32 @@ print("Downloaded to:", local_dir)
     - Pooled (fixed-length vector per sequence)
         - Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
         - Each item:
-            sequence: `str`
-            label: `int` (classification) or `float` (regression)
-            embedding: `float32[H]` (H=1280 for ESM-2 650M)
+            sequence: `str`;
+            label: `int` (classification) or `float` (regression);
+            embedding: `float32[H]` (H=1280 for ESM-2 650M);
     - Unpooled (variable-length token matrix)
         - Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
         - Each item:
-            sequence: `str`
-            label: `int` (classification) or `float` (regression)
-            embedding: `float16[L, H]` (nested lists)
-            attention_mask: `int8[L]`
-            length: `int` (=L)
+            sequence: `str`;
+            label: `int` (classification) or `float` (regression);
+            embedding: `float16[L, H]` (nested lists);
+            attention_mask: `int8[L]`;
+            length: `int` (=L);
 - B) SMILES-based ([PeptideCLM](https://github.com/AaronFeller/PeptideCLM) embeddings)
     - Pooled (fixed-length vector per sequence)
         - Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
         - Each item:
-            sequence: `str` (SMILES)
-            label: `int` (classification) or `float` (regression)
-            embedding: `float32[H]`
+            sequence: `str` (SMILES);
+            label: `int` (classification) or `float` (regression);
+            embedding: `float32[H]`;
     - Unpooled (variable-length token matrix)
         - Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
         - Each item:
-            sequence: `str` (SMILES)
-            label: `int` (classification) or `float` (regression)
-            embedding: `float16[L, H]` (nested lists)
-            attention_mask: `int8[L]`
-            length: `int` (=L)
+            sequence: `str` (SMILES);
+            label: `int` (classification) or `float` (regression);
+            embedding: `float16[L, H]` (nested lists);
+            attention_mask: `int8[L]`;
+            length: `int` (=L);
 ### Quick inference by property per model
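The pooled and unpooled item layouts described in this README section can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PeptiVerse's own preprocessing code. The helper names (`valid_rows`, `make_pooled_item`, `make_unpooled_item`, `to_float32`, `to_float16`) are hypothetical. Token embeddings are assumed to arrive as a nested list of shape `[T, H]`, together with a padding mask and a special-token mask (for example, the `special_tokens_mask` a Hugging Face tokenizer returns when called with `return_special_tokens_mask=True`). Because padding and CLS/EOS rows are dropped before storage, the unpooled `attention_mask` is assumed to be all ones. Plain lists cannot carry a dtype, so the `float32` and `float16` precisions are emulated by round-tripping values through the standard-library `struct` module (`'f'` and `'e'` formats).

```python
import struct
from typing import Any, Dict, List, Sequence, Union

Label = Union[int, float]  # int for classification, float for regression


def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 half-precision value.

    struct raises OverflowError for magnitudes beyond the float16 range (~65504).
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def valid_rows(
    token_embs: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    special_mask: Sequence[int],
) -> List[Sequence[float]]:
    """Keep only real tokens: attended (not padding) and not special (CLS/EOS)."""
    return [
        row
        for row, att, special in zip(token_embs, attention_mask, special_mask)
        if att == 1 and special == 0
    ]


def make_pooled_item(
    sequence: str,
    label: Label,
    token_embs: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    special_mask: Sequence[int],
) -> Dict[str, Any]:
    """Pooled item: mean over valid tokens, giving a fixed-length float32[H] vector."""
    rows = valid_rows(token_embs, attention_mask, special_mask)
    if not rows:
        raise ValueError("no valid tokens left after masking")
    hidden = len(rows[0])
    mean = [sum(row[j] for row in rows) / len(rows) for j in range(hidden)]
    return {"sequence": sequence, "label": label, "embedding": [to_float32(v) for v in mean]}


def make_unpooled_item(
    sequence: str,
    label: Label,
    token_embs: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    special_mask: Sequence[int],
) -> Dict[str, Any]:
    """Unpooled item: per-token float16[L, H] matrix stored as nested lists."""
    rows = valid_rows(token_embs, attention_mask, special_mask)
    length = len(rows)
    return {
        "sequence": sequence,
        "label": label,
        "embedding": [[to_float16(v) for v in row] for row in rows],
        # Assumption: padding/special rows were already dropped, so every kept position is 1.
        "attention_mask": [1] * length,
        "length": length,
    }


if __name__ == "__main__":
    # Toy example with H=2; rows are [CLS, tok1, tok2, EOS, PAD].
    embs = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [0.0, 0.0]]
    att = [1, 1, 1, 1, 0]
    special = [1, 0, 0, 1, 0]
    print(make_pooled_item("AC", 1, embs, att, special)["embedding"])  # [2.0, 3.0]
    print(make_unpooled_item("AC", 1, embs, att, special)["length"])  # 2
```

Filtering rows before averaging keeps the pooled vector independent of how much padding a batch happened to add, and because the same filtered rows feed the unpooled matrix, `length` always equals the number of rows in `embedding`.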