LiLT RE β€” Document Relation Extraction

A LiLT model fine-tuned for relation extraction (RE) on documents.

Given the key and value entities detected by the LiLT SER model, it predicts which key is linked to which value.

Results

Metric      Score
F1          97.1%
Precision   97.7%
Recall      96.5%

Architecture

  • Encoder: LiLT-InfoXLM (initialized from SER checkpoint)
  • Head: Bilinear classifier on entity pair embeddings
  • Context: 1024 tokens
  • Input: words + bounding boxes + entity pair positions
  • Output: binary score per (key, value) pair
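The bilinear head described above can be sketched as follows. This is an illustrative sketch, not the released implementation: the class name, the first-token pooling assumption, and the hidden size of 768 are all assumptions.

```python
import torch
import torch.nn as nn

class BilinearREHead(nn.Module):
    """Illustrative bilinear scorer for (key, value) entity pairs.

    Entity embeddings are assumed to be pooled from the encoder's
    token outputs (e.g. the first token of each entity span).
    """
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.bilinear = nn.Bilinear(hidden_size, hidden_size, 1)

    def forward(self, key_emb: torch.Tensor, value_emb: torch.Tensor) -> torch.Tensor:
        # One logit per (key, value) pair; sigmoid turns it into a link probability
        return self.bilinear(key_emb, value_emb).squeeze(-1)

head = BilinearREHead(hidden_size=768)
keys = torch.randn(4, 768)     # 4 key entity embeddings
values = torch.randn(4, 768)   # 4 candidate value embeddings, paired row-wise
logits = head(keys, values)
probs = torch.sigmoid(logits)  # link probability per pair
```

A pair is accepted as a link when its probability clears a decision threshold (typically 0.5 for a binary head).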

Training

Trained on 17,881 documents from bluecopa/samyx-document-ser. Gold SER labels are used during training; at inference, the model consumes the SER model's predictions instead.
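Training examples are built from all candidate (key, value) pairs, with gold links as positives and the remaining pairs as negatives. A minimal sketch, assuming a hypothetical annotation format (the actual dataset schema may differ):

```python
from itertools import product

# Hypothetical gold annotation: entities with SER labels, plus linked id pairs
entities = [
    {"id": 0, "label": "key",   "text": "Invoice No."},
    {"id": 1, "label": "value", "text": "INV-2041"},
    {"id": 2, "label": "key",   "text": "Date"},
    {"id": 3, "label": "value", "text": "2023-04-01"},
]
gold_links = {(0, 1), (2, 3)}  # (key id, value id) pairs that are true links

# Every (key, value) combination becomes a training example:
# label 1 if the pair is in gold_links, else 0.
keys = [e for e in entities if e["label"] == "key"]
values = [e for e in entities if e["label"] == "value"]
pairs = [((k["id"], v["id"]), int((k["id"], v["id"]) in gold_links))
         for k, v in product(keys, values)]
```

With 2 keys and 2 values this yields 4 examples: 2 positives and 2 negatives.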

Usage

Used in combination with the SER model:

# Step 1: SER model labels each word as key/value/other
# Step 2: RE model links keys to values

from transformers import LiltModel
import torch

# Load encoder
encoder = LiltModel.from_pretrained("bluecopa/lilt-re-document-extraction", subfolder="encoder")

# Load RE head weights
checkpoint = torch.load("model.pt", map_location="cpu")
# checkpoint contains: model_state_dict, hidden_size, epoch, f1
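Rebuilding the RE head from the checkpoint might look like the sketch below. The bilinear head shape and state-dict layout are assumptions, and a dummy in-memory checkpoint stands in for model.pt:

```python
import torch
import torch.nn as nn

# Dummy checkpoint with the documented fields (stand-in for model.pt)
hidden_size = 768
head = nn.Bilinear(hidden_size, hidden_size, 1)
checkpoint = {
    "model_state_dict": head.state_dict(),
    "hidden_size": hidden_size,
    "epoch": 10,
    "f1": 0.971,
}

# Rebuild the head from the checkpoint and put it in eval mode
restored = nn.Bilinear(checkpoint["hidden_size"], checkpoint["hidden_size"], 1)
restored.load_state_dict(checkpoint["model_state_dict"])
restored.eval()
```

The `hidden_size` field lets the head be reconstructed without hard-coding the encoder width.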

Pipeline

Document β†’ OCR β†’ words + bboxes
  β†’ LiLT SER β†’ key/value labels
  → LiLT RE → key→value links (this model)
  β†’ MiniLM β†’ match to schema field names
  β†’ Structured JSON output