Getting errors while training the model with a few fine-tuning modifications
Can someone explain what exactly needs to be done to get this up and running? Every time I try to train the model, I get this ValueError: "ValueError: The model did not return a loss from the inputs, only the following keys: last_hidden_state,pooler_output. For reference, the inputs it received are input_ids,token_type_ids,attention_mask."
Hi, what kind of fine-tuning task are you trying? And how are you initializing the model?
It would be good if you could provide a snippet of your code.
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("law-ai/InLegalBERT")
X_train_set = list(train_set['text'])
y_train_set = list(train_set['label'])
X_test_set = list(test_set['text'])
y_test_set = list(test_set['label'])
X_validation_set = list(validation_set['text'])
y_validation_set = list(validation_set['label'])
train_encoded_input = tokenizer(X_train_set, return_tensors="pt", truncation=True, padding=True)
test_encoded_input = tokenizer(X_test_set, return_tensors="pt", truncation=True, padding=True)
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels:
            item["labels"] = torch.tensor(self.labels[idx]-1)
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])
train_dataset = Dataset(train_encoded_input, y_train_set)
test_dataset = Dataset(test_encoded_input, y_test_set)
model = AutoModel.from_pretrained("law-ai/InLegalBERT")
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch", num_train_epochs=5)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
AutoModel.from_pretrained() gives you the bare BERT model (without any classification heads). Consequently, the model returns the last hidden state of BERT and a pooler output, which is constructed from the embedding of the [CLS] token.
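To see this concretely, here is a minimal sketch of what a bare BERT model returns. It assumes a tiny, randomly initialized BertConfig (hypothetical sizes chosen so it runs offline, not the actual InLegalBERT configuration) and random token ids in place of real tokenizer output.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a tiny random config so the sketch runs without downloading
# anything. These sizes are illustrative, not those of law-ai/InLegalBERT.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
bare_model = BertModel(config)  # same kind of model AutoModel returns for BERT
bare_model.eval()

# Fake batch: 2 sequences of 8 token ids each.
input_ids = torch.randint(0, config.vocab_size, (2, 8))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = bare_model(input_ids=input_ids, attention_mask=attention_mask)

output_keys = list(outputs.keys())
print(output_keys)                      # no "loss" entry among the keys
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
print(outputs.pooler_output.shape)      # (batch, hidden size)
```

Since there is no loss among the outputs, the Trainer has nothing to backpropagate, which is what the ValueError is reporting.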
The Trainer, however, requires a model that returns a loss. For this, you need to add a task head on top of the bare BERT model. One way to do this is AutoModelForSequenceClassification.from_pretrained(), which puts a randomly initialized sequence classification head on top of the pretrained BERT encoder and returns a loss whenever labels are passed in, so you should be able to use it directly with the Trainer. I have not verified this on your setup yet, so please check it on your side too in the meantime.
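As a rough illustration of the difference, the sketch below builds a BERT model with a sequence classification head and shows that it returns a loss once labels are supplied. It again assumes a tiny random BertConfig (hypothetical sizes) and a made-up label count of 3; for the real model you would presumably call AutoModelForSequenceClassification.from_pretrained("law-ai/InLegalBERT", num_labels=...) with your own number of classes.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumption: tiny random config and 3 classes, chosen only for illustration.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=3,
)
clf_model = BertForSequenceClassification(config)
clf_model.eval()

# Fake batch: 2 sequences of 8 token ids, with integer class labels in [0, 3).
input_ids = torch.randint(0, config.vocab_size, (2, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 2])

with torch.no_grad():
    outputs = clf_model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        labels=labels,
    )

print(outputs.loss)          # scalar cross-entropy loss, present because labels were passed
print(outputs.logits.shape)  # (batch, num_labels)
```

The labels must be integers from 0 to num_labels - 1, which is consistent with the `-1` shift in the Dataset class above if the original labels start at 1.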