leaBroe committed · verified · Commit 38ba108 · Parent(s): 8a27a43

Update README.md
README.md:

---
base_model:
- leaBroe/HeavyBERTa
- leaBroe/LightGPT
pipeline_tag: translation
---
# Heavy2Light

Heavy2Light is a seq2seq model designed to generate light-chain antibody sequences from corresponding heavy-chain inputs. It leverages [HeavyBERTa](https://huggingface.co/leaBroe/HeavyBERTa) as the encoder and [LightGPT](https://huggingface.co/leaBroe/LightGPT) as the decoder. The model is fine-tuned on paired antibody chain data from the [OAS](https://opig.stats.ox.ac.uk/webapps/oas/) and [PLAbDab](https://opig.stats.ox.ac.uk/webapps/plabdab/) databases.

For more information, please visit our GitHub [repository](https://github.com/ibmm-unibe-ch/Heavy2Light.git).
## How to use the model

```python
from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
from adapters import init

model_path = "leaBroe/Heavy2Light"
subfolder_path = "heavy2light_final_checkpoint"

# Load the encoder-decoder model and its tokenizer
model = EncoderDecoderModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)

# Initialize adapter support and activate the Heavy2Light adapter
init(model)
adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
model.set_active_adapters(adapter_name)

generation_config = GenerationConfig.from_pretrained(model_path)

# Example input heavy-chain sequence
heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"

# Tokenize the heavy chain as encoder input
inputs = tokenizer(
    heavy_seq,
    padding="max_length",
    truncation=True,
    max_length=250,
    return_tensors="pt",
)

# Sample a light-chain sequence from the decoder
generated_seq = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
    generation_config=generation_config,
    bad_words_ids=[[4]],
    do_sample=True,
    temperature=1.0,
)

# Decode the generated token IDs back into an amino-acid sequence
generated_text = tokenizer.decode(
    generated_seq.sequences[0],
    skip_special_tokens=True,
)

print("Generated light sequence:", generated_text)
```
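Because decoding is sampled (`do_sample=True`), repeated runs produce different candidate light chains, and it can be useful to compare a generated sequence against a known reference. The helper below is a minimal sketch using Python's standard-library `difflib`; the two fragments shown are hypothetical examples, not sequences from this model, and for rigorous evaluation a proper pairwise aligner should be used instead.

```python
from difflib import SequenceMatcher

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Approximate percent identity between two amino-acid sequences.

    Uses difflib's matching-block ratio as a cheap, alignment-free proxy;
    a real evaluation should use a pairwise aligner (e.g. Needleman-Wunsch).
    """
    return 100.0 * SequenceMatcher(None, seq_a, seq_b).ratio()

# Hypothetical light-chain fragments (illustration only)
reference = "DIQMTQSPSSLSASVGDRVTITC"
generated = "DIQMTQSPSSLSASVGDRVVITC"

print(f"Percent identity: {percent_identity(reference, generated):.1f}%")
```

This gives a quick sanity check on how close a sampled candidate stays to a reference framework, but it ignores gaps and biochemical similarity, so treat it only as a rough screen.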