`google/electra-large-discriminator` fine-tuned for regression on IMDb dataset ratings for 3 epochs.

Reviews longer than 512 tokens are truncated by keeping the head and tail of the review, as described in [How to Fine-Tune BERT for Text Classification?](https://arxiv.org/abs/1905.05583)

```python
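from transformers import AutoTokenizer

# Assumption: the tokenizer of the base checkpoint named above.
tokenizer = AutoTokenizer.from_pretrained("google/electra-large-discriminator")

def preprocess_function(example):
    # No truncation here, so the tail of a long review is still available.
    tokens = tokenizer(example["text"], truncation=False)
    if len(tokens['input_ids']) > 512:
        # 129 head ids ([CLS] + 128 tokens) + [SEP] (id 102) + 382 tail ids = 512
        tokens['input_ids'] = tokens['input_ids'][:129] + \
            [102] + tokens['input_ids'][-382:]
        tokens['token_type_ids'] = [0]*512
        tokens['attention_mask'] = [1]*512
    return tokens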
```
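
The slice sizes in the snippet above can be checked without loading a tokenizer. The following is a minimal sketch on plain lists of ids; `head_tail_truncate`, `SEP_ID`, and `fake_ids` are hypothetical names used only for illustration, and the ids 101 and 102 are assumed to be `[CLS]` and `[SEP]` in the uncased BERT/ELECTRA vocabulary.

```python
SEP_ID = 102  # assumed [SEP] id, matching the literal used in preprocess_function


def head_tail_truncate(input_ids, max_len=512, head_len=129, sep_id=SEP_ID):
    """Keep the first head_len ids, insert sep_id, then fill up to max_len with the tail."""
    if len(input_ids) <= max_len:
        return list(input_ids)
    tail_len = max_len - head_len - 1  # 512 - 129 - 1 = 382
    return list(input_ids[:head_len]) + [sep_id] + list(input_ids[-tail_len:])


# A fake 1000-id "review": assumed [CLS] id first, assumed [SEP] id last.
fake_ids = [101] + list(range(1000, 1998)) + [102]
truncated = head_tail_truncate(fake_ids)
print(len(truncated))
```

With the default sizes the result always has exactly 512 ids for any input longer than 512, which is why the snippet above can set `token_type_ids` and `attention_mask` to fixed-length lists.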