Update README.md
README.md
CHANGED
|
@@ -235,32 +235,32 @@ print("Downloaded to:", local_dir)
|
|
| 235 |
- Pooled (fixed-length vector per sequence)
|
| 236 |
- Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
|
| 237 |
- Each item:
|
| 238 |
-
sequence: `str
|
| 239 |
-
label: `int` (classification) or `float` (regression)
|
| 240 |
-
embedding: `float32[H]` (H=1280 for ESM-2 650M)
|
| 241 |
- Unpooled (variable-length token matrix)
|
| 242 |
- Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
|
| 243 |
- Each item:
|
| 244 |
-
sequence: `str
|
| 245 |
-
label: `int` (classification) or `float` (regression)
|
| 246 |
-
embedding: `float16[L, H]` (nested lists)
|
| 247 |
-
attention_mask: `int8[L]
|
| 248 |
-
length: `int` (=L)
|
| 249 |
- B) SMILES-based ([PeptideCLM](https://github.com/AaronFeller/PeptideCLM) embeddings)
|
| 250 |
- Pooled (fixed-length vector per sequence)
|
| 251 |
- Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
|
| 252 |
- Each item:
|
| 253 |
-
sequence: `str` (SMILES)
|
| 254 |
-
label: `int` (classification) or `float` (regression)
|
| 255 |
-
embedding: `float32[H]
|
| 256 |
- Unpooled (variable-length token matrix)
|
| 257 |
- Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
|
| 258 |
- Each item:
|
| 259 |
-
sequence: `str` (SMILES)
|
| 260 |
-
label: `int` (classification) or `float` (regression)
|
| 261 |
-
embedding: `float16[L, H]` (nested lists)
|
| 262 |
-
attention_mask: `int8[L]
|
| 263 |
-
length: `int` (=L)
|
| 264 |
|
| 265 |
|
| 266 |
### Quick inference by property per model
|
|
|
|
| 235 |
- Pooled (fixed-length vector per sequence)
|
| 236 |
- Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
|
| 237 |
- Each item:
|
| 238 |
+
sequence: `str`;
|
| 239 |
+
label: `int` (classification) or `float` (regression);
|
| 240 |
+
embedding: `float32[H]` (H=1280 for ESM-2 650M);
|
| 241 |
- Unpooled (variable-length token matrix)
|
| 242 |
- Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
|
| 243 |
- Each item:
|
| 244 |
+
sequence: `str`;
|
| 245 |
+
label: `int` (classification) or `float` (regression);
|
| 246 |
+
embedding: `float16[L, H]` (nested lists);
|
| 247 |
+
attention_mask: `int8[L]`;
|
| 248 |
+
length: `int` (=L);
|
| 249 |
- B) SMILES-based ([PeptideCLM](https://github.com/AaronFeller/PeptideCLM) embeddings)
|
| 250 |
- Pooled (fixed-length vector per sequence)
|
| 251 |
- Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
|
| 252 |
- Each item:
|
| 253 |
+
sequence: `str` (SMILES);
|
| 254 |
+
label: `int` (classification) or `float` (regression);
|
| 255 |
+
embedding: `float32[H]`;
|
| 256 |
- Unpooled (variable-length token matrix)
|
| 257 |
- Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
|
| 258 |
- Each item:
|
| 259 |
+
sequence: `str` (SMILES);
|
| 260 |
+
label: `int` (classification) or `float` (regression);
|
| 261 |
+
embedding: `float16[L, H]` (nested lists);
|
| 262 |
+
attention_mask: `int8[L]`;
|
| 263 |
+
length: `int` (=L);
|
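The pooled and unpooled item layouts described in this hunk can be illustrated with a minimal NumPy sketch. It is not the repository's actual preprocessing code. The function names, the `special_tokens_mask` input, and the all-ones `attention_mask` on the unpooled item are assumptions made for illustration; only the field names, dtypes, and the rule "exclude special tokens (CLS/EOS) and padding" come from the README text.

```python
import numpy as np


def valid_token_mask(attention_mask, special_tokens_mask):
    """True where a position is a real token: not padding and not CLS/EOS."""
    attention = np.asarray(attention_mask, dtype=bool)
    special = np.asarray(special_tokens_mask, dtype=bool)
    return attention & ~special


def make_pooled_item(sequence, label, hidden, attention_mask, special_tokens_mask):
    """Build a pooled item: one float32[H] vector, the mean over valid tokens."""
    keep = valid_token_mask(attention_mask, special_tokens_mask)
    if not keep.any():
        raise ValueError("sequence has no valid tokens to pool")
    embedding = np.asarray(hidden, dtype=np.float32)[keep].mean(axis=0)
    return {"sequence": sequence, "label": label, "embedding": embedding}


def make_unpooled_item(sequence, label, hidden, attention_mask, special_tokens_mask):
    """Build an unpooled item: a float16[L, H] matrix stored as nested lists."""
    keep = valid_token_mask(attention_mask, special_tokens_mask)
    embedding = np.asarray(hidden)[keep].astype(np.float16)
    length = int(embedding.shape[0])
    return {
        "sequence": sequence,
        "label": label,
        "embedding": embedding.tolist(),
        # Assumption: after filtering, every remaining position is valid,
        # so the stored mask is all ones.
        "attention_mask": np.ones(length, dtype=np.int8).tolist(),
        "length": length,
    }
```

Here `hidden` is assumed to be the `[T, H]` last-layer output for one padded sequence, with `attention_mask` and `special_tokens_mask` of length `T`. Casting the unpooled matrix to `float16` halves storage relative to `float32` at some cost in precision, which matches the dtypes listed above.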
| 264 |
|
| 265 |
|
| 266 |
### Quick inference by property per model
|