# LiLT RE: Document Relation Extraction
Fine-tuned LiLT for relation extraction (RE) on documents. Given the key and value entities detected by the LiLT SER model, it predicts which key is linked to which value.
## Results
| Metric | Score |
|---|---|
| F1 | 97.1% |
| Precision | 97.7% |
| Recall | 96.5% |
## Architecture
- Encoder: LiLT-InfoXLM (initialized from SER checkpoint)
- Head: Bilinear classifier on entity pair embeddings
- Context: 1024 tokens
- Input: words + bounding boxes + entity pair positions
- Output: binary score per (key, value) pair
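The bilinear classifier over entity pair embeddings can be sketched as below. This is a hypothetical illustration, not the exact head shipped in the checkpoint: the class name, projections, and activation are assumptions; only "bilinear score per (key, value) pair" comes from the description above.

```python
import torch
import torch.nn as nn

class BilinearREHead(nn.Module):
    """Hypothetical sketch: scores (key, value) entity pairs with a bilinear form."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Assumed projections before the bilinear layer; the real head may differ.
        self.key_proj = nn.Linear(hidden_size, hidden_size)
        self.value_proj = nn.Linear(hidden_size, hidden_size)
        self.bilinear = nn.Bilinear(hidden_size, hidden_size, 1)

    def forward(self, key_emb: torch.Tensor, value_emb: torch.Tensor) -> torch.Tensor:
        k = torch.relu(self.key_proj(key_emb))
        v = torch.relu(self.value_proj(value_emb))
        # One binary logit per (key, value) pair
        return self.bilinear(k, v).squeeze(-1)

head = BilinearREHead(hidden_size=768)
keys = torch.randn(4, 768)    # 4 key entity embeddings from the encoder
values = torch.randn(4, 768)  # 4 candidate value entity embeddings
logits = head(keys, values)
print(logits.shape)  # torch.Size([4])
```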
## Training
Trained on 17,881 documents from bluecopa/samyx-document-ser. Gold SER labels are used during training; at inference, the SER model's predictions are used instead.
## Usage
Used in combination with the SER model:

```python
# Step 1: the SER model labels each word as key/value/other
# Step 2: the RE model links keys to values
import torch
from transformers import LiltModel

# Load the encoder
encoder = LiltModel.from_pretrained(
    "bluecopa/lilt-re-document-extraction", subfolder="encoder"
)

# Load the RE head weights
checkpoint = torch.load("model.pt", map_location="cpu")
# The checkpoint contains: model_state_dict, hidden_size, epoch, f1
```
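The checkpoint layout can be illustrated with a self-contained round trip. The `nn.Bilinear` module here is a hypothetical stand-in for the real head class, and `hidden_size=768`, `epoch`, and `f1` values are placeholders; only the field names come from the checkpoint description.

```python
import torch
import torch.nn as nn

hidden_size = 768  # assumption; in practice read checkpoint["hidden_size"]

# Hypothetical stand-in for the RE head; the real class lives in the training code.
head = nn.Bilinear(hidden_size, hidden_size, 1)

# Write a checkpoint with the documented fields, then restore from it.
torch.save(
    {
        "model_state_dict": head.state_dict(),
        "hidden_size": hidden_size,
        "epoch": 10,      # placeholder
        "f1": 0.971,      # placeholder
    },
    "model_demo.pt",
)

ckpt = torch.load("model_demo.pt", map_location="cpu")
restored = nn.Bilinear(ckpt["hidden_size"], ckpt["hidden_size"], 1)
restored.load_state_dict(ckpt["model_state_dict"])
```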
## Pipeline

```
Document → OCR → words + bboxes
         → LiLT SER → key/value labels
         → LiLT RE → key→value links (this model)
         → MiniLM → match to schema field names
         → Structured JSON output
```
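The final linking step above can be sketched as follows. `link_entities`, the 0.5 threshold, and the greedy one-value-per-key assignment are all assumptions for illustration; the pipeline description only says that keys are linked to values before schema matching.

```python
def link_entities(keys, values, scores, threshold=0.5):
    """Hypothetical linking step: keep high-scoring (key, value) pairs,
    greedily assigning each key and value at most once."""
    links = {}
    used_values = set()
    # Consider pairs from highest to lowest RE score.
    for (ki, vi), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s >= threshold and keys[ki] not in links and vi not in used_values:
            links[keys[ki]] = values[vi]
            used_values.add(vi)
    return links

keys = ["Invoice No", "Date"]            # from SER key entities
values = ["12345", "2024-01-31"]         # from SER value entities
scores = {(0, 0): 0.98, (0, 1): 0.02,    # sigmoid scores per (key, value) pair
          (1, 0): 0.03, (1, 1): 0.95}
print(link_entities(keys, values, scores))
# {'Invoice No': '12345', 'Date': '2024-01-31'}
```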