---
base_model:
- leaBroe/HeavyBERTa
- leaBroe/LightGPT
pipeline_tag: translation
---

# Heavy2Light

Heavy2Light is a seq2seq model that generates light chain antibody sequences from the corresponding heavy chain inputs. It uses [HeavyBERTa](https://huggingface.co/leaBroe/HeavyBERTa) as the encoder and [LightGPT](https://huggingface.co/leaBroe/LightGPT) as the decoder. The model is fine-tuned on paired antibody chain data from the [OAS](https://opig.stats.ox.ac.uk/webapps/oas/) and [PLAbDab](https://opig.stats.ox.ac.uk/webapps/plabdab/) databases.

The model uses [Adapters](https://github.com/adapter-hub/adapters) for efficient fine-tuning. You can either download the full model weights and adapter from this repository, or use the Heavy2Light adapter directly from its dedicated [directory](https://huggingface.co/leaBroe/Heavy2Light_adapter) on Hugging Face.

For more information, please visit our GitHub [repository](https://github.com/ibmm-unibe-ch/Heavy2Light.git).

## How to use the model

```python
from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
from adapters import init

model_path = "leaBroe/Heavy2Light"
subfolder_path = "heavy2light_final_checkpoint"

# Load the encoder-decoder model and its tokenizer
model = EncoderDecoderModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)

# Enable adapter support on the model, then load and activate the adapter
init(model)
adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
model.set_active_adapters(adapter_name)

generation_config = GenerationConfig.from_pretrained(model_path)

# example input heavy sequence
heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"

# Tokenize the heavy chain, padding or truncating to 250 tokens
inputs = tokenizer(
    heavy_seq,
    padding="max_length",
    truncation=True,
    max_length=250,
    return_tensors="pt"
)

# Sample one light chain sequence
generated_seq = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
    generation_config=generation_config,
    bad_words_ids=[[4]],  # prevent token id 4 from being generated
    do_sample=True,
    temperature=1.0,
)

# Decode the generated token ids back to text
generated_text = tokenizer.decode(
    generated_seq.sequences[0],
    skip_special_tokens=True,
)

print("Generated light sequence:", generated_text)
```
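
Because the tokenizer call above silently truncates inputs beyond `max_length=250`, it can help to sanity-check a heavy chain sequence before generation. The following is a minimal sketch, not part of the Heavy2Light repository: `check_heavy_sequence`, `STANDARD_AMINO_ACIDS`, and `MAX_INPUT_LENGTH` are hypothetical names, and the length limit assumes roughly one token per residue, which may not match the actual tokenizer.

```python
# Hypothetical helper (not part of the Heavy2Light repository): basic sanity
# checks on a heavy chain sequence before it is passed to the tokenizer.

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Assumption: mirrors max_length=250 from the tokenizer call and treats one
# residue as roughly one token; adjust if the tokenizer behaves differently.
MAX_INPUT_LENGTH = 250


def check_heavy_sequence(seq: str, max_length: int = MAX_INPUT_LENGTH) -> str:
    """Return the cleaned sequence, or raise ValueError if it looks malformed."""
    # Remove all whitespace and normalize to upper case.
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("Sequence is empty.")

    # Reject anything outside the standard one-letter amino acid alphabet.
    unexpected = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"Unexpected characters in sequence: {unexpected}")

    # Guard against inputs that would likely be truncated during tokenization.
    if len(cleaned) > max_length:
        raise ValueError(
            f"Sequence has {len(cleaned)} residues; inputs longer than "
            f"{max_length} would likely be truncated."
        )
    return cleaned
```

With this helper, `heavy_seq = check_heavy_sequence(heavy_seq)` can be placed just before the tokenizer call, so malformed inputs fail early instead of producing a misleading light chain.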