yinuozhang committed on
Commit 413366c · verified · 1 Parent(s): c627ff9

Update README.md

Files changed (1): README.md +16 -16
@@ -235,32 +235,32 @@ print("Downloaded to:", local_dir)
  - Pooled (fixed-length vector per sequence)
  - Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
  - Each item:
- sequence: `str`
- label: `int` (classification) or `float` (regression)
- embedding: `float32[H]` (H=1280 for ESM-2 650M)
+ sequence: `str`;
+ label: `int` (classification) or `float` (regression);
+ embedding: `float32[H]` (H=1280 for ESM-2 650M);
  - Unpooled (variable-length token matrix)
  - Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
  - Each item:
- sequence: `str`
- label: `int` (classification) or `float` (regression)
- embedding: `float16[L, H]` (nested lists)
- attention_mask: `int8[L]`
- length: `int` (=L)
+ sequence: `str`;
+ label: `int` (classification) or `float` (regression);
+ embedding: `float16[L, H]` (nested lists);
+ attention_mask: `int8[L]`;
+ length: `int` (=L);
  - B) SMILES-based ([PeptideCLM](https://github.com/AaronFeller/PeptideCLM) embeddings)
  - Pooled (fixed-length vector per sequence)
  - Generated by mean-pooling token embeddings excluding special tokens (CLS/EOS) and padding.
  - Each item:
- sequence: `str` (SMILES)
- label: `int` (classification) or `float` (regression)
- embedding: `float32[H]`
+ sequence: `str` (SMILES);
+ label: `int` (classification) or `float` (regression);
+ embedding: `float32[H]`;
  - Unpooled (variable-length token matrix)
  - Generated by keeping all valid token embeddings (excluding special tokens + padding) as a per-sequence matrix.
  - Each item:
- sequence: `str` (SMILES)
- label: `int` (classification) or `float` (regression)
- embedding: `float16[L, H]` (nested lists)
- attention_mask: `int8[L]`
- length: `int` (=L)
+ sequence: `str` (SMILES);
+ label: `int` (classification) or `float` (regression);
+ embedding: `float16[L, H]` (nested lists);
+ attention_mask: `int8[L]`;
+ length: `int` (=L);
 
 
  ### Quick inference by property per model
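
Note on the item formats touched by this hunk: the README describes pooled embeddings as a mean over token embeddings that skips special tokens and padding, and unpooled items as nested lists with an `attention_mask` and a `length`. The following is a minimal NumPy sketch of both ideas, not the dataset's actual generation code. The function names `mean_pool` and `load_unpooled`, the `special_mask` argument, and the toy 4-dimensional vectors are all hypothetical and exist only for illustration.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask, special_mask):
    """Average the rows of an [L, H] matrix that are real tokens.

    `special_mask` is a hypothetical 0/1 vector marking CLS/EOS positions;
    `attention_mask` is 1 for non-padding positions.
    """
    emb = np.asarray(token_embeddings, dtype=np.float32)
    keep = np.asarray(attention_mask, dtype=bool) & ~np.asarray(special_mask, dtype=bool)
    if not keep.any():
        raise ValueError("no valid tokens to pool")
    return emb[keep].mean(axis=0)  # shape [H], float32


def load_unpooled(item):
    """Turn one unpooled item (nested lists) into arrays and check its shape fields."""
    emb = np.asarray(item["embedding"], dtype=np.float16)        # [L, H]
    mask = np.asarray(item["attention_mask"], dtype=np.int8)     # [L]
    if emb.shape[0] != item["length"] or mask.shape[0] != item["length"]:
        raise ValueError("embedding, attention_mask and length disagree")
    return emb, mask


# Toy model output: CLS, two residues, EOS, one padding position (H=4 here).
raw = [[0, 0, 0, 0], [1, 2, 3, 4], [3, 4, 5, 6], [9, 9, 9, 9], [0, 0, 0, 0]]
pooled = mean_pool(raw, attention_mask=[1, 1, 1, 1, 0], special_mask=[1, 0, 0, 1, 0])

# Toy unpooled item laid out with the fields listed in the README.
item = {
    "sequence": "AC",
    "label": 1,
    "embedding": [[1, 2, 3, 4], [3, 4, 5, 6]],
    "attention_mask": [1, 1],
    "length": 2,
}
emb, mask = load_unpooled(item)
repooled = emb.astype(np.float32).mean(axis=0)
```

Under these assumptions, averaging the rows of an unpooled item gives the same vector as pooling the raw model output, because the special and padding rows have already been dropped from the unpooled matrix. Casting to `float32` before averaging avoids accumulating rounding error in `float16`.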