leaBroe/Heavy2Light

encoder-decoder


leaBroe committed on Jul 31, 2025

Commit 38ba108 (verified) · 1 parent: 8a27a43

Update README.md

Files changed (1)

README.md +53 -1

@@ -3,4 +3,56 @@ base_model:
 - leaBroe/HeavyBERTa
 - leaBroe/LightGPT
 pipeline_tag: translation
----
+---
+# Heavy2Light
+Heavy2Light is a sequence-to-sequence (seq2seq) model that generates antibody light chain sequences from the corresponding heavy chain sequences. It uses [HeavyBERTa](https://huggingface.co/leaBroe/HeavyBERTa) as the encoder and [LightGPT](https://huggingface.co/leaBroe/LightGPT) as the decoder. The model was fine-tuned on paired heavy/light chain data from the Observed Antibody Space ([OAS](https://opig.stats.ox.ac.uk/webapps/oas/)) and [PLAbDab](https://opig.stats.ox.ac.uk/webapps/plabdab/) databases.
+For more information, please visit our GitHub [repository](https://github.com/ibmm-unibe-ch/Heavy2Light.git).
+## How to use the model
+```python
+from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
+from adapters import init
+model_path = "leaBroe/Heavy2Light"
+subfolder_path = "heavy2light_final_checkpoint"
+# Load the encoder-decoder model; the tokenizer is stored in a subfolder of the repo
+model = EncoderDecoderModel.from_pretrained(model_path)
+tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)
+# Enable adapter support, then load and activate the fine-tuned Heavy2Light adapter
+init(model)
+adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
+model.set_active_adapters(adapter_name)
+# Generation defaults stored in the model repo
+generation_config = GenerationConfig.from_pretrained(model_path)
+# Example input: a heavy chain amino acid sequence
+heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"
+# Tokenize the heavy chain, padded/truncated to 250 tokens
+inputs = tokenizer(
+    heavy_seq,
+    padding="max_length",
+    truncation=True,
+    max_length=250,
+    return_tensors="pt"
+)
+# Sample one light chain; bad_words_ids=[[4]] prevents token id 4 from being generated
+generated_seq = model.generate(
+    input_ids=inputs.input_ids,
+    attention_mask=inputs.attention_mask,
+    num_return_sequences=1,
+    output_scores=True,
+    return_dict_in_generate=True,
+    generation_config=generation_config,
+    bad_words_ids=[[4]],
+    do_sample=True,
+    temperature=1.0,
+)
+# Convert the generated token ids back to an amino acid string
+generated_text = tokenizer.decode(
+    generated_seq.sequences[0],
+    skip_special_tokens=True,
+)
+print("Generated light sequence:", generated_text)
+```
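The usage example above passes the heavy chain straight to the tokenizer. Nothing in it checks the input, and `truncation=True` cuts anything beyond 250 tokens. Below is a minimal sketch of two hypothetical helpers, `clean_sequence` and `check_heavy_chain`. Neither is part of the Heavy2Light repository. They normalize a sequence and reject non-canonical residues or inputs that would be truncated. The default `max_residues=248` assumes one token per residue plus two special tokens within `max_length=250`. That tokenization is an assumption, so verify it against the actual tokenizer before relying on the limit.

```python
# Hypothetical helpers (not part of the Heavy2Light repo) for checking a heavy
# chain before tokenization and tidying a decoded light chain afterwards.

CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def clean_sequence(seq: str) -> str:
    """Uppercase a sequence and drop all whitespace (e.g. spaces between residues)."""
    return "".join(seq.split()).upper()


def check_heavy_chain(seq: str, max_residues: int = 248) -> str:
    """Return the cleaned sequence, or raise ValueError if it contains
    non-canonical residues or is longer than max_residues.

    Assumption: max_residues=248 presumes one token per residue plus two
    special tokens inside the README's max_length=250.
    """
    cleaned = clean_sequence(seq)
    unknown = sorted(set(cleaned) - CANONICAL_AA)
    if unknown:
        raise ValueError(f"non-canonical residues: {''.join(unknown)}")
    if len(cleaned) > max_residues:
        raise ValueError(
            f"sequence has {len(cleaned)} residues; limit is {max_residues}"
        )
    return cleaned
```

For example, you could call `check_heavy_chain(heavy_seq)` before `tokenizer(...)`. If the decoded light chain contains spaces between residues, apply `clean_sequence(generated_text)` to it.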