Longformer Legal Entity Classifier

Overview

A fine-tuned Longformer-based model for classifying legal entities (such as locations and dates) within the context of legal decision texts. The model is based on allenai/longformer and is trained to predict the type of a marked entity span, given its context, using special entity markers [E] ... [/E].

Model Details

  • Model Name: longformer-classifier-refinement-abb
  • Architecture: Longformer (allenai/longformer)
  • Task: Entity Classification (NER-style, entity-in-context classification)
  • Framework: PyTorch, Hugging Face Transformers
  • Author: S. Vercoutere

Intended Use

  • Purpose: Automatic classification of legal entities (e.g., location, date) in municipal or governmental decision documents.
  • Not Intended For: General-purpose NER, non-legal domains, or tasks outside entity classification.

Training Data

  • Source: Annotated legal decision texts from Ghent/Freiburg/Bamberg.
  • Entity Types:
    • Locations: impact_location, context_location
    • Dates: publication_date, session_date, entry_date, expiry_date, legal_date, context_date, validity_period, context_period
  • Preprocessing:
    • XML-like tags in text, with entities wrapped in <entity_type>...</entity_type>.
    • For training, one entity per sample is marked with [E] ... [/E] in context.
    • Dataset balanced to max 5000 samples per label.
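The preprocessing steps above can be sketched in plain Python. This is an illustrative sketch, not the card's actual pipeline: the tag pattern, function names, and random seed are assumptions, but it follows the scheme described (one `[E] ... [/E]`-marked entity per sample, other tags stripped, labels capped at 5000 samples each).

```python
import random
import re

# Matches <entity_type>...</entity_type> style tags, e.g. <session_date>3 March 2024</session_date>.
ENTITY_TAG = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>", re.DOTALL)

def make_samples(xml_text):
    """Turn XML-tagged text into one (marked_text, label) pair per entity.

    For each entity, that entity alone is wrapped in [E] ... [/E];
    all other tags are removed so only plain context remains.
    """
    matches = list(ENTITY_TAG.finditer(xml_text))
    samples = []
    for i, target in enumerate(matches):
        out, last = [], 0
        for j, m in enumerate(matches):
            out.append(xml_text[last:m.start()])
            inner = m.group("text")
            out.append(f"[E] {inner} [/E]" if j == i else inner)
            last = m.end()
        out.append(xml_text[last:])
        samples.append(("".join(out), target.group("label")))
    return samples

def balance(samples, cap=5000, seed=42):
    """Cap each label at `cap` samples (shuffled deterministically first)."""
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        rng.shuffle(group)
        balanced.extend(group[:cap])
    return balanced
```

For example, `make_samples("Approved on <session_date>3 March 2024</session_date> in <impact_location>Ghent</impact_location>.")` yields two samples, each marking one of the two entities in an otherwise tag-free context.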

Training Procedure

  • Base Model: allenai/longformer
  • Tokenization: Hugging Face AutoTokenizer, with [E] and [/E] as additional special tokens.
  • Max Sequence Length: 2048 (trained)
  • Batch Size: 4
  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Epochs: 10
  • Mixed Precision: Yes (AMP)
  • Validation Split: 20%
  • Evaluation Metrics: Accuracy, F1, confusion matrix
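A minimal sketch of the 80/20 train/validation split, assuming a deterministic shuffle (the seed value here is an assumption, not taken from the card):

```python
import random

def train_val_split(samples, val_fraction=0.2, seed=42):
    """Shuffle indices deterministically, then hold out val_fraction for validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(samples) * val_fraction)
    val_idx = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_idx]
    val = [s for i, s in enumerate(samples) if i in val_idx]
    return train, val
```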

Evaluation

Validation Accuracy: 0.8454 (on held-out validation set)

Detailed Entity-Level Evaluation:

| Entity Label     | Precision | Recall | F1-score | Support |
|------------------|-----------|--------|----------|---------|
| context_date     | 0.9272    | 0.9405 | 0.9338   | 975     |
| context_location | 0.9671    | 0.9751 | 0.9711   | 843     |
| context_period   | 0.9744    | 0.8321 | 0.8976   | 137     |
| entry_date       | 0.9528    | 0.9587 | 0.9557   | 484     |
| expiry_date      | 0.8980    | 0.9496 | 0.9231   | 139     |
| impact_location  | 0.9501    | 0.9559 | 0.9530   | 997     |
| legal_date       | 1.0000    | 0.9926 | 0.9963   | 943     |
| publication_date | 0.9501    | 0.9870 | 0.9682   | 386     |
| session_date     | 0.9597    | 0.9597 | 0.9597   | 347     |
| validity_period  | 0.9932    | 0.9379 | 0.9648   | 467     |
| accuracy         |           |        | 0.9601   | 5718    |
| macro avg        | 0.9572    | 0.9489 | 0.9523   | 5718    |
| weighted avg     | 0.9606    | 0.9601 | 0.9601   | 5718    |
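The per-label scores above follow the standard precision/recall/F1 definitions; a minimal sketch of how they are derived from per-label counts (the counts in the usage note are toy values, not the card's data):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive/false-positive/false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_avg(scores):
    """Unweighted mean of per-label (precision, recall, f1) tuples, as in the 'macro avg' row."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

For example, `prf1(90, 10, 30)` gives precision 0.90 and recall 0.75; the table's context_period F1 of 0.8976 is exactly the harmonic mean of its precision (0.9744) and recall (0.8321).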

Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("svercoutere/longformer-classifier-refinement-abb")
model = AutoModelForSequenceClassification.from_pretrained("svercoutere/longformer-classifier-refinement-abb")
model.eval()

def classify_entity(entity_text, context_text):
    # Wrap the first occurrence of the entity in the special tokens the model was trained on.
    marked_text = context_text.replace(entity_text, f"[E] {entity_text} [/E]", 1)
    inputs = tokenizer(marked_text, return_tensors="pt", truncation=True, max_length=2048, padding="max_length")
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    return pred  # Map the index to a label name, e.g. via label_encoder.classes_
```

Limitations & Bias

  • The model is trained on legal texts from specific municipalities and may not generalize to other domains or languages.
  • Only entity types present in the training data are supported.
  • The model expects entities to be marked with [E] ... [/E] in the input.

Citation

If you use this model, please cite:

@misc{longformer-classifier-refinement-abb,
  author = {S. Vercoutere},
  title = {Longformer Entity Refinement},
  year = {2026},
  howpublished = {\url{https://huggingface.co/svercoutere/longformer-classifier-refinement-abb}}
}