Add lilt-only-base
- README.md +48 -0
- config.json +15 -0
- model.safetensors +3 -0
- pytorch_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,48 @@
# lilt-only-base

Layout-only pretrained checkpoint from the official [LiLT repository](https://github.com/jpwang/lilt).

This is **not a complete model**: it contains only the 2D spatial (layout) encoder, with no text encoder. It is intended as a building block to be combined with any RoBERTa-like text encoder.
## What is this?

LiLT (Language-Independent Layout Transformer) decouples text and layout understanding into two separate encoders. `lilt-only-base` contains only the **layout encoder** weights, pretrained for document layout understanding on the IIT-CDIP dataset.

Combining it with any RoBERTa-compatible text encoder produces a language-specific document understanding model.
## Usage

Use [`gen_weight_roberta_like.py`](https://github.com/jpwang/lilt) from the official repository to combine this checkpoint with the text encoder of your choice:

```bash
python gen_weight_roberta_like.py \
    --lilt lilt-only-base/pytorch_model.bin \
    --text your-roberta-model/pytorch_model.bin \
    --config your-roberta-model/config.json \
    --out lilt-your-language-base
```

Compatible text encoders: any RoBERTa-like model (`roberta-base`, `camembert-base`, `microsoft/infoxlm-base`, etc.).
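When the merge has to be repeated for several text encoders, the command above can be assembled programmatically. The following is a minimal sketch, not part of the official repository: the helper name `build_merge_command` and the local directory names are hypothetical, and it assumes each text encoder has been downloaded to a local directory containing `pytorch_model.bin` and `config.json`. Only the four flags shown in the command above are used.

```python
from pathlib import PurePosixPath


def build_merge_command(lilt_dir: str, text_dir: str, out_dir: str) -> list[str]:
    """Assemble the argv for gen_weight_roberta_like.py.

    The flags (--lilt, --text, --config, --out) are copied from the README;
    the helper itself is a hypothetical convenience wrapper.
    """
    lilt = PurePosixPath(lilt_dir)
    text = PurePosixPath(text_dir)
    return [
        "python", "gen_weight_roberta_like.py",
        "--lilt", str(lilt / "pytorch_model.bin"),
        "--text", str(text / "pytorch_model.bin"),
        "--config", str(text / "config.json"),
        "--out", out_dir,
    ]


# Hypothetical local directories; replace with your own paths.
cmd = build_merge_command("lilt-only-base", "camembert-base", "lilt-camembert-base")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a checkout of the LiLT repository.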
## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Layout encoder weights (safetensors format) |
| `pytorch_model.bin` | Layout encoder weights (PyTorch format) |
| `config.json` | Model configuration (`model_type: liltrobertalike`) |
## Note on model type

This checkpoint uses `model_type = liltrobertalike`, a custom type defined in the original LiLT repository. It cannot be loaded directly with `AutoModel` from Hugging Face Transformers without first combining it with a text encoder via the procedure above.
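A loading script can guard against this case by inspecting `config.json` before handing a checkpoint to `AutoModel`. The sketch below is illustrative only: the function name `has_custom_lilt_type` is hypothetical, and it checks nothing beyond the `model_type` string named above. Whether the combined checkpoint keeps or changes this value is not stated here, so treat the check as a hint and not a guarantee.

```python
import json

# Custom model type named in this README.
CUSTOM_LILT_TYPE = "liltrobertalike"


def has_custom_lilt_type(config_text: str) -> bool:
    """Return True when the config declares the custom LiLT model type."""
    config = json.loads(config_text)
    return config.get("model_type") == CUSTOM_LILT_TYPE


# Abbreviated config for illustration; the real file has more keys.
sample = '{"model_type": "liltrobertalike", "hidden_size": 768}'
print(has_custom_lilt_type(sample))  # True
```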
## License

MIT, following the original [jpwang/lilt](https://github.com/jpwang/lilt) repository.
## Acknowledgements

- [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669), Wang et al., 2022
- Original weights: [jpwang/lilt](https://github.com/jpwang/lilt)

> **Note**: This is not an official Hugging Face release from the original authors.
config.json
ADDED
@@ -0,0 +1,15 @@
{
  "model_type": "liltrobertalike",
  "channel_shrink_ratio": 4,
  "max_2d_position_embeddings": 1024,
  "hidden_size": 768,
  "num_hidden_layers": 12,
  "num_attention_heads": 12,
  "intermediate_size": 3072,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "attention_probs_dropout_prob": 0.1,
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-5,
  "type_vocab_size": 1
}
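As a reading aid for the config above, the layout stream's dimensions can be derived from `channel_shrink_ratio`. This sketch assumes the parameter has the meaning documented for the LiLT configuration in Hugging Face Transformers (the layout channel width is `hidden_size` divided by the ratio, and the layout feed-forward and per-head widths shrink by the same factor). The variable names are illustrative, and only the keys needed for the arithmetic are copied from the file.

```python
import json

# Subset of the config.json values shown above.
CONFIG_TEXT = """
{
  "channel_shrink_ratio": 4,
  "hidden_size": 768,
  "num_attention_heads": 12,
  "intermediate_size": 3072
}
"""

config = json.loads(CONFIG_TEXT)
ratio = config["channel_shrink_ratio"]

# Assumed relationship: layout widths are the text widths divided by the ratio.
layout_hidden = config["hidden_size"] // ratio
layout_intermediate = config["intermediate_size"] // ratio
layout_head_dim = layout_hidden // config["num_attention_heads"]

print(layout_hidden, layout_intermediate, layout_head_dim)  # 192 768 16
```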
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a49660daa64a30125150cb00711380af425c99b217448b5fa8228645aae06779
size 24461920
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f109e656c529979904c9a3b35f4eb0ab5ad642e9840a9b444481ff6035ca9fb8
size 24510336
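The two weight files are stored as Git LFS pointers, which record the SHA-256 digest and byte size of the real blob. A downloaded file can be checked against its pointer roughly as follows. This is a minimal sketch: the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the `pytorch_model.bin` entry above.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check a blob's byte length and SHA-256 digest against the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f109e656c529979904c9a3b35f4eb0ab5ad642e9840a9b444481ff6035ca9fb8
size 24510336
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 24510336
```

To verify a real download, read the file's bytes (for example with `pathlib.Path(...).read_bytes()`) and pass them to `matches_pointer`. For much larger files, hashing in chunks would avoid loading everything into memory.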