---
base_model:
- leaBroe/HeavyBERTa
- leaBroe/LightGPT
pipeline_tag: translation
---
# Heavy2Light
Heavy2Light is a seq2seq model that generates light chain antibody sequences from the corresponding heavy chain inputs. It uses [HeavyBERTa](https://huggingface.co/leaBroe/HeavyBERTa) as the encoder and [LightGPT](https://huggingface.co/leaBroe/LightGPT) as the decoder. The model is fine-tuned on paired antibody chain data from the [OAS](https://opig.stats.ox.ac.uk/webapps/oas/) and [PLAbDab](https://opig.stats.ox.ac.uk/webapps/plabdab/) databases, using [Adapters](https://github.com/adapter-hub/adapters) for efficient fine-tuning. You can either download the full model weights and adapter from this repository, or directly use the Heavy2Light adapter available in its dedicated [directory](https://huggingface.co/leaBroe/Heavy2Light_adapter) on Hugging Face.
For more information, please visit our GitHub [repository](https://github.com/ibmm-unibe-ch/Heavy2Light.git).
## How to use the model
```python
from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
from adapters import init

model_path = "leaBroe/Heavy2Light"
subfolder_path = "heavy2light_final_checkpoint"

# Load the encoder-decoder model and its tokenizer
model = EncoderDecoderModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)

# Add adapter support to the model, then load and activate the Heavy2Light adapter
init(model)
adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
model.set_active_adapters(adapter_name)

generation_config = GenerationConfig.from_pretrained(model_path)

# example input heavy sequence
heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"

# Tokenize the heavy chain, padding or truncating to 250 tokens
inputs = tokenizer(
    heavy_seq,
    padding="max_length",
    truncation=True,
    max_length=250,
    return_tensors="pt",
)

# Sample one light chain; return_dict_in_generate=True exposes `.sequences`
generated_seq = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
    generation_config=generation_config,
    bad_words_ids=[[4]],  # token ID 4 is never generated
    do_sample=True,
    temperature=1.0,
)

# Convert the generated token IDs back to text, dropping special tokens
generated_text = tokenizer.decode(
    generated_seq.sequences[0],
    skip_special_tokens=True,
)
print("Generated light sequence:", generated_text)
```
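
Because generation uses sampling (`do_sample=True`), it can be useful to sanity-check the decoded text before passing it to downstream tools. The following is a minimal sketch, not part of the Heavy2Light repository: the helper names `clean_sequence` and `invalid_residues` are hypothetical, and it assumes the decoded text may contain whitespace between residues and should consist only of the 20 standard amino acid letters.

```python
from typing import List, Tuple

# One-letter codes of the 20 standard amino acids
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(text: str) -> str:
    """Remove all whitespace from the decoded text and upper-case it."""
    return "".join(text.split()).upper()


def invalid_residues(sequence: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for characters that are not standard amino acids."""
    return [
        (index, char)
        for index, char in enumerate(sequence)
        if char not in STANDARD_AMINO_ACIDS
    ]


# Placeholder string for illustration only; this is NOT real model output.
decoded = "D I Q M T Q S P"
light_seq = clean_sequence(decoded)
problems = invalid_residues(light_seq)
print("Cleaned sequence:", light_seq)
print("Non-standard characters:", problems)
```

In practice, `decoded` would be replaced by `generated_text` from the example above. A non-empty list of non-standard characters is a signal to resample or inspect the output manually.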