---
tags:
- encoder-decoder
- adapter-transformers
---

# Adapter `leaBroe/Heavy2Light_adapter` for the Heavy2Light EncoderDecoder Model

An [adapter](https://adapterhub.ml) for the Heavy2Light EncoderDecoder model (encoder: HeavyBERTa, decoder: LightGPT) that was trained with data from [OAS](https://opig.stats.ox.ac.uk/webapps/oas/) and [PLAbDab](https://opig.stats.ox.ac.uk/webapps/plabdab/).

This adapter was created for usage with the **[Adapters](https://github.com/Adapter-Hub/adapters)** library.

## Usage

First, install `adapters`:

```bash
pip install -U adapters
```

Now, the adapter can be loaded and activated like this:

```python
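from transformers import EncoderDecoderModel, AutoTokenizer, GenerationConfig
from adapters import init

# Base Heavy2Light model; the tokenizer lives in a subfolder of the same repo
model_path = "leaBroe/Heavy2Light"
subfolder_path = "heavy2light_final_checkpoint"

model = EncoderDecoderModel.from_pretrained(model_path)

tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder=subfolder_path)

# Add adapter support to the plain transformers model
init(model)

# Load the adapter from the Hub; set_active=True already activates it,
# the explicit call below just makes the activation visible
adapter_name = model.load_adapter("leaBroe/Heavy2Light_adapter", set_active=True)
model.set_active_adapters(adapter_name)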
```

Then the model can be used for inference:

```python
# Generation settings stored with the base model
generation_config = GenerationConfig.from_pretrained(model_path)

# example input heavy sequence
heavy_seq = "QLQVQESGPGLVKPSETLSLTCTVSGASSSIKKYYWGWIRQSPGKGLEWIGSIYSSGSTQYNPALGSRVTLSVDTSQTQFSLRLTSVTAADTATYFCARQGADCTDGSCYLNDAFDVWGRGTVVTVSS"

# Tokenize the heavy chain, padded/truncated to a fixed length of 250 tokens
inputs = tokenizer(
    heavy_seq,
    padding="max_length",
    truncation=True,
    max_length=250,
    return_tensors="pt"
)

# Sample one light chain; bad_words_ids=[[4]] prevents token id 4 from being generated
generated_seq = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
    generation_config=generation_config,
    bad_words_ids=[[4]],
    do_sample=True,
    temperature=1.0,
)

# Decode the generated token ids, dropping special tokens
generated_text = tokenizer.decode(
    generated_seq.sequences[0],
    skip_special_tokens=True,
)

print("Generated light sequence:", generated_text)
```
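Because generation uses sampling (`do_sample=True`), it can be useful to clean and sanity-check decoded outputs before using them downstream. The following is a minimal sketch of such a post-processing step, not part of the Heavy2Light or `adapters` API: the helper names are hypothetical, and it assumes the decoded text may contain whitespace between residues (this depends on the tokenizer) and that only the 20 canonical amino acids are acceptable. The sample strings in the demo are toy inputs, not model outputs.

```python
from __future__ import annotations

# Hypothetical post-processing helpers (not part of Heavy2Light or adapters).
# Assumption: only the 20 canonical amino acid letters are considered valid.
CANONICAL_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_light_sequence(decoded: str) -> str:
    """Remove all whitespace and upper-case a decoded light-chain sequence."""
    return "".join(decoded.split()).upper()


def is_canonical(seq: str) -> bool:
    """Return True if seq is non-empty and uses only canonical amino acids."""
    return len(seq) > 0 and set(seq) <= CANONICAL_AMINO_ACIDS


def filter_light_sequences(decoded_texts: list[str]) -> list[str]:
    """Clean decoded outputs, drop non-canonical ones, and remove duplicates.

    The order of first occurrence is preserved.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for text in decoded_texts:
        seq = clean_light_sequence(text)
        if is_canonical(seq) and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept


if __name__ == "__main__":
    # Toy strings for illustration only
    samples = [
        "DIQMTQSPSSLSASVGDRVTITC",
        "D I Q M T Q S P",
        "DIQMTQSPSSLSASVGDRVTITC",  # duplicate, dropped
        "DIQXB",  # contains non-canonical letters, dropped
    ]
    print(filter_light_sequences(samples))
```

To apply it to the example above, one option is to raise `num_return_sequences`, decode all returned sequences with `tokenizer.batch_decode(generated_seq.sequences, skip_special_tokens=True)`, and pass the resulting list to `filter_light_sequences`.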