---
language: en
license: cc-by-4.0
tags:
- document-understanding
- layout
- relation-extraction
- lilt
datasets:
- bluecopa/samyx-document-ser
metrics:
- f1
---
| |
# LiLT RE — Document Relation Extraction
|
|
| Fine-tuned [LiLT](https://huggingface.co/SCUT-DLVCLab/lilt-infoxlm-base) for **Relation Extraction** on documents. |
|
|
| Given key and value entities detected by [LiLT SER](https://huggingface.co/bluecopa/lilt-ser-document-extraction), |
| predicts which key is linked to which value. |
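In practice this means scoring every candidate (key, value) pair. A minimal sketch of that candidate enumeration, with dummy entity IDs and spans (the actual entity representation is not specified here):

```python
# Hypothetical candidate generation: the RE model assigns a binary
# score to every (key, value) entity pair. IDs and spans are dummies.
keys = [{"id": 0, "span": (0, 2)}, {"id": 1, "span": (5, 6)}]
values = [{"id": 2, "span": (3, 4)}, {"id": 3, "span": (7, 9)}]

candidates = [(k["id"], v["id"]) for k in keys for v in values]
print(candidates)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```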
|
|
| ## Results |
|
|
| | Metric | Score | |
| |--------|-------| |
| | F1 | **97.1%** | |
| | Precision | 97.7% | |
| | Recall | 96.5% | |
|
|
| ## Architecture |
|
|
| - **Encoder:** LiLT-InfoXLM (initialized from SER checkpoint) |
| - **Head:** Bilinear classifier on entity pair embeddings |
| - **Context:** 1024 tokens |
| - **Input:** words + bounding boxes + entity pair positions |
| - **Output:** binary score per (key, value) pair |
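The bilinear head above can be sketched as follows. This is an illustrative module, not the checkpoint's actual architecture: the projection layers, activation, and pooling strategy are assumptions.

```python
import torch
import torch.nn as nn

class BilinearREHead(nn.Module):
    """Hypothetical sketch of the RE head: scores (key, value) entity pairs.

    Assumes each entity is represented by a single pooled embedding from
    the encoder; layer names and sizes here are illustrative only.
    """

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.key_proj = nn.Linear(hidden_size, hidden_size)
        self.value_proj = nn.Linear(hidden_size, hidden_size)
        self.bilinear = nn.Bilinear(hidden_size, hidden_size, 1)

    def forward(self, key_emb: torch.Tensor, value_emb: torch.Tensor) -> torch.Tensor:
        # key_emb, value_emb: (num_pairs, hidden_size)
        k = torch.tanh(self.key_proj(key_emb))
        v = torch.tanh(self.value_proj(value_emb))
        # One binary logit per (key, value) pair
        return self.bilinear(k, v).squeeze(-1)

head = BilinearREHead(hidden_size=768)
logits = head(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4])
```

A sigmoid over each logit then gives the per-pair link probability.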
|
|
| ## Training |
|
|
| Trained on 17,881 documents from [bluecopa/samyx-document-ser](https://huggingface.co/datasets/bluecopa/samyx-document-ser). |
Training uses gold SER labels; at inference, the model consumes the SER model's predictions instead.
|
|
| ## Usage |
|
|
| Used in combination with the SER model: |
|
|
| ```python |
| # Step 1: SER model labels each word as key/value/other |
| # Step 2: RE model links keys to values |

from transformers import LiltModel
import torch

| # Load encoder |
| encoder = LiltModel.from_pretrained("bluecopa/lilt-re-document-extraction", subfolder="encoder") |
| |
| # Load RE head weights |
| checkpoint = torch.load("model.pt", map_location="cpu") |
| # checkpoint contains: model_state_dict, hidden_size, epoch, f1 |
| ``` |
|
|
| ## Pipeline |
|
|
| ``` |
| Document → OCR → words + bboxes |
| → LiLT SER → key/value labels |
| → LiLT RE → key→value links (this model) |
| → MiniLM → match to schema field names |
| → Structured JSON output |
| ``` |
|
|