Introduction

BERTimbau-CEFR is a fine-tuned version of BERTimbau for classifying the difficulty of Portuguese texts on the CEFR scale.

It is based on bert-base-portuguese-cased, which has 12 layers and 110M parameters, and was fine-tuned on the COPLE2 corpus (Mendes et al., 2016). BERTimbau-CEFR was developed in the context of the Master's thesis "Learning What to Learn: Generating Language Lessons using BERT", whose repository, containing both code and text, is available on GitHub.

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["A1", "A2", "B1", "B2", "C1"]

# Build the base model and tokenizer from the original BERTimbau checkpoint
base_checkpoint = 'neuralmind/bert-base-portuguese-cased'
model = AutoModelForSequenceClassification.from_pretrained(base_checkpoint, num_labels=len(labels))
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint, do_lower_case=False)

# Load the fine-tuned weights (map_location avoids errors on CPU-only machines)
state_dict = torch.load("model_4_acc86%", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
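Once the weights are loaded, prediction reduces to running a tokenized sentence through the model, taking the argmax of the classification head's logits, and mapping that index back to a CEFR label. A minimal sketch of the index-to-label step, using made-up logits in place of a real forward pass:

```python
import torch

labels = ["A1", "A2", "B1", "B2", "C1"]

# The classification head emits one logit per CEFR level; the predicted
# level is the argmax. The logits below are illustrative, standing in
# for `model(**tokenizer(sentence, return_tensors="pt")).logits`.
logits = torch.tensor([[0.1, 0.3, 2.5, 0.4, -1.0]])
predicted = labels[logits.argmax(dim=-1).item()]
print(predicted)  # B1
```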