---
library_name: transformers
tags:
- transformers
- pytorch
- bert
- legal-domain
- entity-classification
- sequence-classification
- NER
- longformer
- token-classification
- label-studio
- english
- fine-tuned
---

# Longformer Legal Entity Classifier

## Overview

A fine-tuned Longformer-based model for classifying legal entities (such as locations and dates) in legal decision texts. The model is based on `allenai/longformer` and is trained to predict the type of a marked entity span, given its context, using the special entity markers `[E] ... [/E]`.

## Model Details

- **Model Name:** longformer-classifier-refinement-abb
- **Architecture:** Longformer (`allenai/longformer`)
- **Task:** Entity classification (NER-style, entity-in-context classification)
- **Framework:** PyTorch, Hugging Face Transformers
- **Author:** S. Vercoutere

## Intended Use

- **Purpose:** Automatic classification of legal entities (e.g., location, date) in municipal or governmental decision documents.
- **Not Intended For:** General-purpose NER, non-legal domains, or tasks outside entity classification.

## Training Data

- **Source:** Annotated legal decision texts from Ghent/Freiburg/Bamberg.
- **Entity Types:**
  - Locations: `impact_location`, `context_location`
  - Dates: `publication_date`, `session_date`, `entry_date`, `expiry_date`, `legal_date`, `context_date`, `validity_period`, `context_period`
- **Preprocessing:**
  - Source texts contain XML-like tags, with entities wrapped in `...`.
  - For training, one entity per sample is marked with `[E] ... [/E]` in context.
  - Dataset balanced to at most 5,000 samples per label.

## Training Procedure

- **Base Model:** `allenai/longformer`
- **Tokenization:** Hugging Face AutoTokenizer, with `[E]` and `[/E]` added as special tokens.
- **Max Sequence Length:** 2048 (trained)
- **Batch Size:** 4
- **Optimizer:** AdamW
- **Learning Rate:** 2e-5
- **Epochs:** 10
- **Mixed Precision:** Yes (AMP)
- **Validation Split:** 20%
- **Evaluation Metrics:** Accuracy, F1, confusion matrix

## Evaluation

**Validation Accuracy:** 0.8454 (on held-out validation set)

**Detailed Entity-Level Evaluation:**

| Entity Label     | Precision | Recall | F1-score   | Support |
| ---------------- | --------- | ------ | ---------- | ------- |
| context_date     | 0.9272    | 0.9405 | 0.9338     | 975     |
| context_location | 0.9671    | 0.9751 | 0.9711     | 843     |
| context_period   | 0.9744    | 0.8321 | 0.8976     | 137     |
| entry_date       | 0.9528    | 0.9587 | 0.9557     | 484     |
| expiry_date      | 0.8980    | 0.9496 | 0.9231     | 139     |
| impact_location  | 0.9501    | 0.9559 | 0.9530     | 997     |
| legal_date       | 1.0000    | 0.9926 | 0.9963     | 943     |
| publication_date | 0.9501    | 0.9870 | 0.9682     | 386     |
| session_date     | 0.9597    | 0.9597 | 0.9597     | 347     |
| validity_period  | 0.9932    | 0.9379 | 0.9648     | 467     |
| **accuracy**     |           |        | **0.9601** | 5718    |
| **macro avg**    | 0.9572    | 0.9489 | 0.9523     | 5718    |
| **weighted avg** | 0.9606    | 0.9601 | 0.9601     | 5718    |

## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("svercoutere/longformer-classifier-refinement-abb")
model = AutoModelForSequenceClassification.from_pretrained("svercoutere/longformer-classifier-refinement-abb")
model.eval()  # disable dropout for inference

def classify_entity(entity_text, context_text):
    # Mark the first occurrence of the entity with the special tokens.
    marked_text = context_text.replace(entity_text, f"[E] {entity_text} [/E]", 1)
    inputs = tokenizer(marked_text, return_tensors="pt", truncation=True,
                       max_length=2048, padding="max_length")
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    return pred  # Map to a label name using label_encoder.classes_
```

## Limitations & Bias

- The model is trained on legal texts from specific municipalities and may not generalize to other domains or languages.
- Only entity types present in the training data are supported.
- The model expects the target entity to be marked with `[E] ... [/E]` in the input.

## Citation

If you use this model, please cite:

```
@misc{longformer-classifier-refinement-abb,
  author       = {S. Vercoutere},
  title        = {Longformer Entity Refinement},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/svercoutere/longformer-classifier-refinement-abb}}
}
```
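The one-entity-per-sample marking described under Training Data can be sketched in a few lines of plain Python. This is a minimal illustration of the `[E] ... [/E]` scheme only; the function names, the label spans, and the example sentence below are assumptions for demonstration, not the released preprocessing code.

```python
def mark_entity(text, start, end):
    """Wrap the character span [start, end) with the [E] ... [/E] markers
    used during fine-tuning (one marked entity per training sample)."""
    return text[:start] + "[E] " + text[start:end] + " [/E]" + text[end:]

def build_samples(text, entities):
    """entities: list of (start, end, label) character spans.
    Returns one (marked_text, label) training pair per entity."""
    return [(mark_entity(text, s, e), label) for s, e, label in entities]

# Hypothetical annotated sentence with two entity spans.
text = "The session of 12 May 2021 approved works at Korenmarkt."
samples = build_samples(text, [(15, 26, "session_date"),
                               (45, 55, "impact_location")])
# samples[0] → ("The session of [E] 12 May 2021 [/E] approved works at Korenmarkt.",
#               "session_date")
```

Each marked text would then be tokenized (with `[E]`/`[/E]` registered as special tokens) and paired with its label for sequence classification.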