`google/electra-large-discriminator` fine-tuned on the IMDb dataset for 2 epochs.
Reviews longer than 512 tokens are truncated to their head and tail parts, as described in [How to Fine-Tune BERT for Text Classification?](https://arxiv.org/abs/1905.05583).
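```python
from transformers import AutoTokenizer

# Assumption: the tokenizer of the base model is the one used here.
tokenizer = AutoTokenizer.from_pretrained("google/electra-large-discriminator")

def preprocess_function(example):
    tokens = tokenizer(example["text"], truncation=False)
    if len(tokens['input_ids']) > 512:
        # Keep the first 129 ids ([CLS] + 128 tokens), insert [SEP] (id 102),
        # then keep the last 382 ids (ending in the final [SEP]): 129 + 1 + 382 = 512.
        tokens['input_ids'] = tokens['input_ids'][:129] + \
            [102] + tokens['input_ids'][-382:]
        tokens['token_type_ids'] = [0]*512
        tokens['attention_mask'] = [1]*512
    return tokens
```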
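
The length arithmetic of the head and tail truncation can be checked without downloading a tokenizer. The following is a minimal sketch, not the code used for training: `head_tail_truncate` and its parameter names are hypothetical, it works on plain lists of integer ids, and it assumes 101 and 102 are the `[CLS]` and `[SEP]` ids, as in the BERT uncased vocabulary.

```python
def head_tail_truncate(input_ids, max_len=512, head_len=129, sep_id=102):
    """Keep the first `head_len` ids, insert `sep_id`, then keep the tail.

    Hypothetical helper mirroring the slicing in preprocess_function,
    but on a plain list of ids so it runs without any library.
    """
    if len(input_ids) <= max_len:
        # Short inputs are returned unchanged (as a copy).
        return list(input_ids)
    tail_len = max_len - head_len - 1  # 512 - 129 - 1 = 382
    # Slice from an explicit start index so tail_len == 0 yields an empty tail.
    tail = input_ids[len(input_ids) - tail_len:]
    return input_ids[:head_len] + [sep_id] + tail


# Fake "review" of 1000 ids: [CLS], 998 word ids, [SEP] (ids are assumed).
fake_ids = [101] + list(range(1000, 1998)) + [102]
truncated = head_tail_truncate(fake_ids)

print(len(truncated))   # 512
print(truncated[129])   # 102, the separator inserted between head and tail
```

With the defaults, the output always has exactly 512 ids for long inputs, which is why the original snippet can hard-code `token_type_ids` and `attention_mask` to length 512.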