Instructions to use wukevin/tcr-bert with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use wukevin/tcr-bert with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="wukevin/tcr-bert")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("wukevin/tcr-bert")
model = AutoModelForSequenceClassification.from_pretrained("wukevin/tcr-bert")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
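The Transformers snippet above loads the model but does not show what an input string should look like. The following is a minimal sketch that assumes, without confirmation from this page, that each amino acid residue must be its own whitespace-delimited token. This would follow from `do_basic_tokenize` being `true` in `tokenizer_config.json`, since the basic tokenizer splits on whitespace. The helper name `space_residues` is hypothetical, and the CDR3 sequence is illustrative only.

```python
# Hypothetical helper: format a TCR CDR3 amino acid sequence for wukevin/tcr-bert.
# ASSUMPTION: the tokenizer splits on whitespace, so each residue must be a separate token.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def space_residues(seq: str) -> str:
    """Return 'C A S S ...' for an amino acid string, rejecting non-standard characters."""
    seq = seq.strip().upper()
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"non-standard residues {bad} in {seq!r}")
    return " ".join(seq)


if __name__ == "__main__":
    example = space_residues("CASSIRSSYEQYF")  # illustrative CDR3 beta sequence
    print(example)
    # With transformers installed and network access, the formatted string could be
    # passed to the pipeline shown above, e.g.:
    # from transformers import pipeline
    # pipe = pipeline("text-classification", model="wukevin/tcr-bert")
    # print(pipe(example))
```

Validating residues up front turns a silent unknown-token input into an explicit error.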
Add missing files

- classifier_training.log +0 -57
- params.json +0 -14
- special_tokens_map.json +1 -0
- tokenizer_config.json +1 -0
classifier_training.log
DELETED
```diff
@@ -1,57 +0,0 @@
-Git commit: 01fe7efc56319afc04ab13136d485a680a1b8a82
-PyTorch CUDA version: 10.2
-Parameter pretrained: ../bert_reduced_intermediate_pe_50_epochs_VDJdb_PIRD
-Parameter data: ['PIRD']
-Parameter segment: TRB
-Parameter outdir: /home/wukevin/projects/tcr/tcr_models/temp3
-Parameter downsample: 1.0
-Parameter monitor: auprc
-Parameter bs: 128
-Parameter lr: 5e-05
-Parameter epochs: 25
-Parameter warmup: 0.1
-PIRD data 0.1655 data labelled with antigen sequence
-PIRD: Removing 95 entires with non amino acid residues
-Entries with antigen sequence: 8429/51044
-Unique antigen sequences: 73
-PIRD data TRA/TRB instances: Counter({'TRB': 46428, 'TRA': 4011, 'TRA-TRB': 605})
-No alphabet given, assuming alphabet of: ['TAAQAAVVRFQEAAN', 'ADTLQSIGATTVASN', 'LLWNGPMAV', 'RPRGEVRFL', 'ATDALMTGY', 'CINGVCWTV', 'HSKKKCDEL', 'KAFSPEVIPMF', 'KRWIILGLNK', 'KRWIIMGLNK', 'TPQDLNTML', 'EIYKRWII', 'GLNKIVRMY', 'ISPRTL-W', 'QASQEVKNW', 'FLKEKGGL', 'HPKVSSEVHI', 'IIKDYGKQM', 'LPPIVAKEI', 'RFPLTFGWCF', 'RYPLTFGWCF', 'TPGPGVRYPL', 'TQGYFPDWQNY', 'FPRPWLHGL', 'RYPLTFGWCF ', 'RYPLTFGW ', 'RYPLTFGW', 'CVETMCNEY', 'DEEDAIAAY', 'ELKRKMIYM', 'ELRRKMMYM', 'LSEFCRVLCCYVLEE', 'QIKVRVDMV', 'QIKVRVKMV', 'VLEETSVML', 'YILEETSVM', 'ARNLVPMVATVQGQN', 'CPSQEPMSIYVY', 'EDVPSGKLFMHVTLG', 'FPTKDVAL', 'IPSINVHHY', 'NLVPMVATV', 'RPHERNGFTVL', 'TPRVTGGGAM', 'YSEHPTFTSQY', 'VTEHDTLLY', 'YLAMPFATPMEAELARRSLA', 'ALTPVVVTL', 'GLCTLVAML', 'YVLDHLIVV', 'EPLPQGQLTAY', 'GPEPLPQGQLTAY', 'LPEGLPQGQLTAY', 'LPEPLGQGQLTAY', 'LPEPLPQAQLTAY', 'LPEPLPQGGLTAY', 'LPEPLPQGQGTAY', 'LPEPLPQGQLGAY', 'LPEPLPQGQLTAY', 'LPEPLPQGQLTGY', 'RAKFKQLL', 'HPVGEADYFEY', 'FLRGRAYGL', 'EENLLDFVRF', 'AVFDRKSDAK', 'IVTDFSVIK', 'NFIRMVISNPAAT', 'KRGIVEQCCTSISSL', 'KRGIVEQSSTSISSL', 'ENPVVHFFKNIVTPR', 'GILGFVFTL', 'PQPELPYPQPE', 'FWIDLFETIG']
-Generated labels of shape (6235,): [ 0 0 0 ... 72 72 72]
-Retaining 44/73 examples with sufficient examples
-Split train with 4365 examples
-Split valid with 935 examples
-Split test with 935 examples
-Loading BERT classifier with 45 labels
-Best AUROC (index=27): 0.9405
-Best AUROC (index=9): 0.9882
-Best AUROC (index=9): 0.9989
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=15): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=9): 1.0000
-Best AUROC (index=21): 1.0000
-Best AUROC (index=9): 1.0000
-epoch 25.0000
-eval_accuracy 0.7401
-eval_auprc 0.3922
-eval_auroc 0.7881
-eval_loss 1.5623
-eval_runtime 1.3134
-eval_samples_per_second 711.9190
```
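The log summarizes several preprocessing steps:

- dropping 95 entries with non amino acid residues;
- inferring an antigen alphabet when none is given;
- turning antigens into integer labels (0 to 72);
- keeping 44 of 73 classes with sufficient examples.

The sketch below illustrates those steps on toy data. Several details are hypothetical because neither the code nor the threshold appears in this commit: the function names, the `min_count` threshold, the first-appearance ordering of the inferred alphabet, and the choice to check the TCR sequence rather than the antigen. The antigen check is ruled out by the logged alphabet, which still contains `'ISPRTL-W'`. The log also does not say how the 44 retained classes map onto the 45 classifier labels, so the sketch does not attempt it.

```python
# Sketch of the preprocessing steps summarized in classifier_training.log.
# ASSUMPTIONS: function names, the min_count threshold, and the toy data are hypothetical;
# they are not taken from the TCR-BERT codebase.
from collections import Counter
from typing import List, Sequence, Set, Tuple

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def drop_nonstandard(pairs: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (tcr, antigen) pairs whose TCR uses only the 20 standard amino acids."""
    return [(tcr, ag) for tcr, ag in pairs if set(tcr) <= STANDARD_AA]


def encode_labels(antigens: Sequence[str], alphabet: Sequence[str] = ()) -> Tuple[List[int], List[str]]:
    """Map each antigen to its alphabet index; if no alphabet is given, infer one
    in order of first appearance (an assumption about the original behavior)."""
    if not alphabet:
        alphabet = list(dict.fromkeys(antigens))
    index = {ag: i for i, ag in enumerate(alphabet)}
    return [index[ag] for ag in antigens], list(alphabet)


def retain_frequent(labels: Sequence[int], min_count: int) -> Set[int]:
    """Return the label indices that occur at least min_count times."""
    counts = Counter(labels)
    return {lab for lab, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    toy = [
        ("CASSLGQAYEQYF", "GILGFVFTL"),
        ("CASSIRSSYEQYF", "GILGFVFTL"),
        ("CASS*LGF", "NLVPMVATV"),  # contains '*', so it is dropped
        ("CASSPDRGEQYF", "NLVPMVATV"),
    ]
    kept = drop_nonstandard(toy)
    labels, alphabet = encode_labels([ag for _, ag in kept])
    print(len(kept), labels, alphabet)
    print(retain_frequent(labels, min_count=2))
```

Two pairs in the logged alphabet differ only by a trailing space: `'RYPLTFGWCF'` and `'RYPLTFGWCF '`, and `'RYPLTFGW'` and `'RYPLTFGW '`. Exact-string encoding like this treats each pair as two classes. Calling `.strip()` on antigens before encoding would merge them.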
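Each epoch logs a line of the form `Best AUROC (index=k): v`. One plausible reading, assumed here rather than confirmed by the commit, is that it reports the highest one-vs-rest AUROC among the output classes together with that class's index.

Under that reading, 1.0000 at index 9 only says that one class is perfectly ranked. The final `eval_auroc` (0.7881) and `eval_auprc` (0.3922) are much lower and presumably aggregate over classes. As a consistency check, 711.9190 samples/s × 1.3134 s ≈ 935, the size of the validation and test splits.

The sketch computes per-class AUROC from its rank (Mann-Whitney) definition in pure Python. The O(n_pos × n_neg) loop is written for clarity, not speed.

```python
# Sketch: one-vs-rest AUROC per class, and the class with the highest value.
# ASSUMPTION: this is one plausible reading of the "Best AUROC (index=k): v" log lines;
# the training script's metric code is not part of this commit.
from typing import List, Optional, Sequence, Tuple


def binary_auroc(scores: Sequence[float], positives: Sequence[bool]) -> Optional[float]:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        return None  # undefined when a class is absent or covers every example
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def per_class_auroc(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[Optional[float]]:
    """One-vs-rest AUROC for each column of an (n_examples x n_classes) score matrix."""
    n_classes = len(probs[0])
    return [
        binary_auroc([row[k] for row in probs], [y == k for y in labels])
        for k in range(n_classes)
    ]


def best_class(aurocs: Sequence[Optional[float]]) -> Tuple[int, float]:
    """Index and value of the highest defined per-class AUROC (first index wins ties)."""
    defined = [(k, a) for k, a in enumerate(aurocs) if a is not None]
    return max(defined, key=lambda kv: kv[1])


if __name__ == "__main__":
    probs = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
    labels = [0, 0, 1, 2, 2]
    aurocs = per_class_auroc(probs, labels)
    k, v = best_class(aurocs)
    print(f"Best AUROC (index={k}): {v:.4f}")
```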
params.json
DELETED
```diff
@@ -1,14 +0,0 @@
-{
-"pretrained": "../bert_reduced_intermediate_pe_50_epochs_VDJdb_PIRD",
-"data": [
-"PIRD"
-],
-"segment": "TRB",
-"outdir": "/home/wukevin/projects/tcr/tcr_models/temp3",
-"downsample": 1.0,
-"monitor": "auprc",
-"bs": 128,
-"lr": 5e-05,
-"epochs": 25,
-"warmup": 0.1
-}
```
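`params.json` fixes the optimization settings: `lr` 5e-05, `bs` 128, `epochs` 25, `warmup` 0.1. Combined with the 4365 training examples in the log, they imply a step count and a learning-rate curve. The following sketch rests on assumptions that the commit does not confirm:

- `warmup` is a fraction of the total optimizer steps;
- the schedule is linear warmup followed by linear decay to zero, the shape of `transformers.get_linear_schedule_with_warmup`;
- training runs on one device without gradient accumulation;
- the last partial batch of each epoch is kept.

```python
# Sketch: learning-rate schedule implied by params.json.
# ASSUMPTIONS: "warmup" is a fraction of total optimizer steps, linear warmup then linear
# decay to zero, a single device, no gradient accumulation, last partial batch kept.
import math

PARAMS = {"lr": 5e-05, "bs": 128, "epochs": 25, "warmup": 0.1}  # copied from params.json
N_TRAIN = 4365  # "Split train with 4365 examples" in classifier_training.log


def schedule_lengths(n_train: int, bs: int, epochs: int, warmup: float):
    """Return (steps per epoch, total steps, warmup steps)."""
    steps_per_epoch = math.ceil(n_train / bs)
    total = steps_per_epoch * epochs
    warmup_steps = int(warmup * total)  # rounding down is an assumption
    return steps_per_epoch, total, warmup_steps


def lr_at(step: int, peak: float, total: int, warmup_steps: int) -> float:
    """Linear warmup from 0 to peak, then linear decay back to 0 at `total`."""
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    return peak * max(0.0, (total - step) / max(1, total - warmup_steps))


if __name__ == "__main__":
    spe, total, wu = schedule_lengths(N_TRAIN, PARAMS["bs"], PARAMS["epochs"], PARAMS["warmup"])
    print(spe, total, wu)  # 35 875 87
    for step in (0, wu, total // 2, total):
        print(step, lr_at(step, PARAMS["lr"], total, wu))
```

Under these assumptions the peak rate of 5e-05 is reached after 87 of 875 steps, roughly 2.5 epochs into training.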
special_tokens_map.json
ADDED
```diff
@@ -0,0 +1 @@
+{"unk_token": "?", "sep_token": "|", "pad_token": "$", "cls_token": "*", "mask_token": "."}
```
tokenizer_config.json
ADDED
```diff
@@ -0,0 +1 @@
+{"do_lower_case": false, "do_basic_tokenize": true, "never_split": null, "unk_token": "?", "sep_token": "|", "pad_token": "$", "cls_token": "*", "mask_token": ".", "tokenize_chinese_chars": false, "strip_accents": null, "model_max_len": 45, "padding_side": "right"}
```
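The two added files define single-character special tokens and a 45-token limit. The sketch below shows how one sequence might be laid out with them. It assumes a BERT-style single-sequence layout (`cls`, residues, `sep`), one token per residue, and right padding, which `padding_side` confirms. The `unk` and `mask` tokens play no part in this layout.

One caution, hedged: the Transformers tokenizer base class takes its length limit from a key named `model_max_length`. Because this file spells the key `model_max_len`, the 45-token limit may not be applied automatically, and passing `max_length` explicitly is the safer route.

```python
# Sketch: lay out one CDR3 sequence with the special tokens from special_tokens_map.json.
# ASSUMPTIONS: BERT-style "cls + sequence + sep" layout, one token per residue, and
# right padding (padding_side is "right" in tokenizer_config.json).
from typing import List

SPECIAL = {"unk_token": "?", "sep_token": "|", "pad_token": "$", "cls_token": "*", "mask_token": "."}
MAX_LEN = 45  # "model_max_len" in tokenizer_config.json


def layout(seq: str, max_len: int = MAX_LEN) -> List[str]:
    """Return cls + residues + sep, truncated and right-padded to exactly max_len tokens."""
    body = list(seq)[: max_len - 2]  # leave room for the cls and sep tokens
    tokens = [SPECIAL["cls_token"], *body, SPECIAL["sep_token"]]
    return tokens + [SPECIAL["pad_token"]] * (max_len - len(tokens))


if __name__ == "__main__":
    toks = layout("CASSIRSSYEQYF")  # illustrative CDR3 beta sequence
    print(" ".join(toks))
    # With the real tokenizer, the equivalent request would set the limit explicitly, e.g.:
    # tokenizer(" ".join("CASSIRSSYEQYF"), padding="max_length", truncation=True, max_length=45)
```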