---
language: en
license: cc-by-4.0
tags:
- document-understanding
- layout
- relation-extraction
- lilt
datasets:
- bluecopa/samyx-document-ser
metrics:
- f1
---

# LiLT RE — Document Relation Extraction

Fine-tuned [LiLT](https://huggingface.co/SCUT-DLVCLab/lilt-infoxlm-base) for **Relation Extraction** on documents. Given the key and value entities detected by [LiLT SER](https://huggingface.co/bluecopa/lilt-ser-document-extraction), it predicts which key is linked to which value.

## Results

| Metric    | Score     |
|-----------|-----------|
| F1        | **97.1%** |
| Precision | 97.7%     |
| Recall    | 96.5%     |

## Architecture

- **Encoder:** LiLT-InfoXLM (initialized from the SER checkpoint)
- **Head:** bilinear classifier on entity pair embeddings
- **Context:** 1024 tokens
- **Input:** words + bounding boxes + entity pair positions
- **Output:** binary score per (key, value) pair

## Training

Trained on 17,881 documents from [bluecopa/samyx-document-ser](https://huggingface.co/datasets/bluecopa/samyx-document-ser). Gold SER labels are used during training; at inference, the model consumes the SER model's predictions instead.

## Usage

Used in combination with the SER model:

```python
# Step 1: the SER model labels each word as key/value/other
# Step 2: the RE model links keys to values

from transformers import LiltModel
import torch

# Load the LiLT encoder from the "encoder" subfolder of this repo
encoder = LiltModel.from_pretrained(
    "bluecopa/lilt-re-document-extraction", subfolder="encoder"
)

# Load the RE head weights
checkpoint = torch.load("model.pt", map_location="cpu")
# checkpoint contains: model_state_dict, hidden_size, epoch, f1
```

## Pipeline

```
Document → OCR → words + bboxes
         → LiLT SER → key/value labels
         → LiLT RE (this model) → key→value links
         → MiniLM → match to schema field names
         → Structured JSON output
```
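For intuition, the RE head described under Architecture can be sketched as a bilinear scorer over (key, value) entity pair embeddings. This is a minimal, hypothetical sketch, not the actual implementation: the class name, projection layers, and `hidden_size` default are assumptions, and the real head is loaded from `model.pt` as shown above.

```python
import torch
import torch.nn as nn


class BilinearREHead(nn.Module):
    """Illustrative bilinear classifier for (key, value) entity pairs."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Separate projections for key and value entity embeddings (assumed)
        self.key_proj = nn.Linear(hidden_size, hidden_size)
        self.value_proj = nn.Linear(hidden_size, hidden_size)
        # Bilinear layer producing one logit per (key, value) pair
        self.bilinear = nn.Bilinear(hidden_size, hidden_size, 1)

    def forward(self, key_emb: torch.Tensor, value_emb: torch.Tensor) -> torch.Tensor:
        k = torch.relu(self.key_proj(key_emb))
        v = torch.relu(self.value_proj(value_emb))
        # One binary score (logit) per pair; sigmoid + threshold at inference
        return self.bilinear(k, v).squeeze(-1)


head = BilinearREHead(hidden_size=768)
keys = torch.randn(4, 768)    # 4 key entity embeddings from the encoder
values = torch.randn(4, 768)  # 4 candidate value entity embeddings
scores = head(keys, values)   # shape: (4,) — one logit per (key, value) pair
```

At inference one would score every candidate (key, value) pair and keep pairs whose sigmoid score exceeds a threshold; how pairs are enumerated and thresholded here is also an assumption.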