LiLT FUNSD - GGUF
GGUF conversion of philschmid/lilt-en-funsd for use with CrispEmbed.
LiLT (Language-independent Layout Transformer) is a dual-stream encoder that combines RoBERTa (768d text) with a parallel layout transformer (192d) via BiACM (the bidirectional attention complementation mechanism). It takes OCR text and the bounding box of each word as input and performs token classification for document understanding.
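In the HuggingFace transformers implementation, LiLT expects each bounding box as integer (x0, y0, x1, y1) coordinates scaled to a 0-1000 grid. The sketch below shows that scaling for pixel-space OCR boxes. `normalize_bbox` is a hypothetical helper name, and whether CrispEmbed expects the same scaling or applies it internally is an assumption to verify.

```python
def normalize_bbox(box, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 integer grid."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so OCR boxes that spill slightly past the page edge stay in range.
    return [min(1000, max(0, v)) for v in scaled]


# Illustrative OCR output: two words with pixel boxes on a 1000x2000 page.
words = ["Date:", "2024-01-15"]
pixel_boxes = [(50, 100, 150, 130), (160, 100, 320, 130)]
boxes = [normalize_bbox(b, page_width=1000, page_height=2000) for b in pixel_boxes]
```

Each word then carries one normalized box. When a word is split into several BPE tokens, the usual convention is to repeat the word's box for every sub-token.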
This variant is fine-tuned on FUNSD (Form Understanding in Noisy Scanned Documents) with 7 IOB labels: O, B-HEADER, I-HEADER, B-QUESTION, I-QUESTION, B-ANSWER, I-ANSWER.
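Per-word IOB labels are usually merged into entity spans before further use. A minimal sketch of that grouping follows. The id-to-label order is assumed to follow the list above and should be checked against the model config; `group_entities` is a hypothetical helper, not part of CrispEmbed.

```python
# Label order as listed above; the id-to-label mapping is an assumption
# and should be checked against the model's config.
LABELS = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION", "B-ANSWER", "I-ANSWER"]


def group_entities(words, labels):
    """Merge word-level IOB labels into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, entity = label.partition("-")
        if prefix == "I" and entity == current_type:
            current_words.append(word)  # continue the open span
            continue
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            # A stray I- tag without a matching B- starts a new span.
            current_type, current_words = entity, [word]
        else:
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


words = ["Invoice", "Form", "Date:", "2024-01-15"]
label_ids = [1, 2, 3, 5]
spans = group_entities(words, [LABELS[i] for i in label_ids])
```

Here `spans` holds one HEADER, one QUESTION, and one ANSWER entry, which is the form most downstream key-value extraction code expects.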
Model Details
| Property | Value |
|---|---|
| Architecture | LiLT (RoBERTa + Layout Transformer + BiACM) |
| Parameters | 130.7M |
| Hidden size | 768 (text) / 192 (layout) |
| Layers | 12 |
| Heads | 12 |
| Vocab | 50,265 (RoBERTa BPE) |
| Labels | 7 (FUNSD IOB) |
| License | MIT |
| Base model | SCUT-DLVCLab/lilt-roberta-en-base |
Available Formats
| Format | Size |
|---|---|
| Float32 | 498 MB |
| Q8_0 | 134 MB |
| Q4_K | 90 MB |
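The listed sizes can be sanity-checked against the parameter count. The sketch below estimates the tensor payload from nominal bits per weight, assuming "MB" here means MiB and using the ggml block layouts (Q8_0: 34 bytes per 32 weights; Q4_K: 144 bytes per 256 weights). It ignores GGUF metadata and any tensors left unquantized.

```python
PARAMS = 130.7e6  # parameter count from the Model Details table

# Nominal bits per weight: F32 is 32; Q8_0 stores 32 int8 values plus one
# fp16 scale per block (34 bytes / 32 weights); Q4_K uses 144-byte
# super-blocks of 256 weights.
BITS_PER_WEIGHT = {"Float32": 32.0, "Q8_0": 34 * 8 / 32, "Q4_K": 144 * 8 / 256}


def estimate_mib(params, bits_per_weight):
    """Estimated tensor payload in MiB, ignoring metadata and unquantized tensors."""
    return params * bits_per_weight / 8 / 2**20


estimates = {name: round(estimate_mib(PARAMS, bpw), 1) for name, bpw in BITS_PER_WEIGHT.items()}
```

The Float32 estimate lands close to the listed 498 MB. The quantized estimates come out below the listed sizes, most noticeably for Q4_K, which would be consistent with some tensors being stored at higher precision, though the source does not say which.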
Usage
Python
CLI
Parity
Verified against HuggingFace transformers using the crispembed-diff harness:
- 25/25 encoder stages: cos_min = 1.000000
- 16/16 token labels match (100%)
- max_abs < 1.6e-03 across all layers
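The metrics above can be illustrated with a small, self-contained sketch. It assumes `cos_min` is the minimum cosine similarity over paired per-stage activations and `max_abs` is the largest element-wise absolute difference; the actual crispembed-diff definitions may differ, and `parity_report` is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def parity_report(reference_stages, candidate_stages):
    """Return (cos_min, max_abs) over paired per-stage activation vectors."""
    pairs = list(zip(reference_stages, candidate_stages))
    cos_min = min(cosine_similarity(r, c) for r, c in pairs)
    max_abs = max(abs(x - y) for r, c in pairs for x, y in zip(r, c))
    return cos_min, max_abs


# Toy activations: the second stage differs by 1e-3 in one element.
reference = [[0.10, -0.20, 0.30], [1.00, 2.00, -3.00]]
candidate = [[0.10, -0.20, 0.30], [1.00, 2.001, -3.00]]
cos_min, max_abs = parity_report(reference, candidate)
```

Reporting both values is useful because cosine similarity is insensitive to a uniform scale error, while the absolute difference catches it.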
Citation