---
license: other
widget:
- text: "Ḣ Q V Q [MASK] E"
---

## AntiBERTa2 🧬

AntiBERTa2 is an antibody-specific language model based on the [RoFormer model](https://arxiv.org/abs/2104.09864) - it is pre-trained using masked language modelling.
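For intuition, masked language modelling corrupts a random fraction of the input tokens and trains the model to recover them from the surrounding context. A toy Python sketch of the masking step (illustrative only; the 15% rate is the common BERT-style default, not a detail taken from the AntiBERTa2 paper):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Randomly replace tokens with [MASK]; return the corrupted
    sequence plus {position: original_token} labels to recover."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            targets[i] = tok  # the label the model must predict here
        else:
            corrupted.append(tok)
    return corrupted, targets
```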

We also provide a multimodal version of AntiBERTa2, AntiBERTa2-CSSP, that has been trained using a contrastive objective, similar to the [CLIP method](https://arxiv.org/abs/2103.00020).
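A CLIP-style contrastive objective embeds items from two paired modalities and trains the model to identify the matching pair within a batch, via a symmetric cross-entropy (InfoNCE) loss over the pairwise similarity matrix. A minimal NumPy sketch, purely for intuition; the function name and temperature value are our assumptions, not AntiBERTa2-CSSP internals:

```python
import numpy as np

def clip_style_loss(seq_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings."""
    # L2-normalise both sets of embeddings
    seq = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    oth = other_emb / np.linalg.norm(other_emb, axis=1, keepdims=True)
    logits = seq @ oth.T / temperature   # (B, B) cosine similarities
    labels = np.arange(logits.shape[0])  # matching pair sits on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the sequence->other and other->sequence directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```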

Further details on both AntiBERTa2 and AntiBERTa2-CSSP are described in our [paper](), accepted at the NeurIPS MLSB Workshop 2023.

Both AntiBERTa2 models are available only for non-commercial use. Output antibody sequences (e.g. from infilling via masked language models) may likewise only be used non-commercially. For any users seeking commercial use of our model and generated antibodies, please reach out to us at [info@alchemab.com](mailto:info@alchemab.com).

| Model variant | Parameters | Config |
| ------------- | ---------- | ------ |
| [AntiBERTa2](https://huggingface.co/alchemab/antiberta2) | 202M | 24L, 12H, 1024d |
| [AntiBERTa2-CSSP](https://huggingface.co/alchemab/antiberta2-cssp) | 202M | 24L, 12H, 1024d |

## Example usage

```python
>>> from transformers import (
...     RoFormerForMaskedLM,
...     RoFormerTokenizer,
...     pipeline,
...     RoFormerForSequenceClassification,
... )

>>> tokenizer = RoFormerTokenizer.from_pretrained("alchemab/antiberta2")
>>> model = RoFormerForMaskedLM.from_pretrained("alchemab/antiberta2")

>>> filler = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> filler("Ḣ Q V Q ... C A [MASK] D ... T V S S")  # fill in the mask

>>> new_model = RoFormerForSequenceClassification.from_pretrained(
...     "alchemab/antiberta2")  # this will of course raise warnings
...                             # that a new linear layer will be added
...                             # and randomly initialized
```
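Note the input format used above: each residue is a separate whitespace-delimited token, and the sequence starts with the same leading Ḣ token as the examples. A small helper to produce that format from a raw one-letter sequence (the helper is ours, not part of the model's tooling):

```python
def format_for_antiberta2(sequence: str, chain_token: str = "Ḣ") -> str:
    """Space-separate residues and prepend the leading chain token,
    matching the whitespace-delimited format used in the examples."""
    return " ".join([chain_token] + list(sequence.upper()))
```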