Add lilt-only-base

Files changed (4)

README.md +48 -0
config.json +15 -0
model.safetensors +3 -0
pytorch_model.bin +3 -0

README.md ADDED

	@@ -0,0 +1,48 @@

+# lilt-only-base
+Layout-only pretrained checkpoint from the official [LiLT repository](https://github.com/jpwang/lilt).
+This is **not a complete model**: it contains only the 2D spatial (layout) encoder and no text encoder. It is intended as a building block to be combined with any RoBERTa-like text encoder.
+## What is this?
+LiLT (Language-Independent Layout Transformer) decouples text and layout understanding into two separate encoders. `lilt-only-base` contains only the **layout encoder** weights, pretrained on the IIT-CDIP document dataset.
+Because the layout encoder is independent of the text encoder, it can be combined with any RoBERTa-compatible text encoder to produce a language-specific document understanding model.
+## Usage
+Use [`gen_weight_roberta_like.py`](https://github.com/jpwang/lilt) from the official repository to combine this checkpoint with the text encoder of your choice:
+```bash
+python gen_weight_roberta_like.py \
+     --lilt lilt-only-base/pytorch_model.bin \
+     --text your-roberta-model/pytorch_model.bin \
+     --config your-roberta-model/config.json \
+     --out lilt-your-language-base
+```
+Compatible text encoders: any RoBERTa-like model (`roberta-base`, `camembert-base`, `microsoft/infoxlm-base`, etc.)
+## Files
+| File | Description |
+|------|-------------|
+| `model.safetensors` | Layout encoder weights (safetensors format) |
+| `pytorch_model.bin` | Layout encoder weights (PyTorch format) |
+| `config.json` | Model configuration (`model_type: liltrobertalike`) |
+## Note on model type
+This checkpoint uses `model_type = liltrobertalike`, a custom type defined in the original LiLT repository. It cannot be loaded directly with `AutoModel` from Hugging Face `transformers`; it must first be combined with a text encoder using the procedure above.
+## License
+MIT — following the original [jpwang/lilt](https://github.com/jpwang/lilt) repository.
+## Acknowledgements
+- [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669) — Wang et al., 2022
+- Original weights: [jpwang/lilt](https://github.com/jpwang/lilt)
+> **Note**: This is not an official Hugging Face release from the original authors.

config.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "model_type": "liltrobertalike",
+  "channel_shrink_ratio": 4,
+  "max_2d_position_embeddings": 1024,
+  "hidden_size": 768,
+  "num_hidden_layers": 12,
+  "num_attention_heads": 12,
+  "intermediate_size": 3072,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "attention_probs_dropout_prob": 0.1,
+  "initializer_range": 0.02,
+  "layer_norm_eps": 1e-5,
+  "type_vocab_size": 1
+}
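
The way `hidden_size` and `channel_shrink_ratio` interact in this config can be made concrete with a little arithmetic. The sketch below assumes, as the Hugging Face `transformers` LiLT implementation does, that the layout stream's width is `hidden_size // channel_shrink_ratio` and that its feed-forward and per-head widths shrink by the same factor. The helper name `layout_dims` is hypothetical and is not part of the LiLT repository.

```python
import json

# Subset of the config.json added in this commit.
CONFIG_JSON = """
{
  "channel_shrink_ratio": 4,
  "hidden_size": 768,
  "num_attention_heads": 12,
  "intermediate_size": 3072
}
"""


def layout_dims(config: dict) -> dict:
    """Derive layout-stream widths (assumed: text widths divided by the shrink ratio)."""
    ratio = config["channel_shrink_ratio"]
    hidden = config["hidden_size"] // ratio
    return {
        "layout_hidden_size": hidden,
        "layout_intermediate_size": config["intermediate_size"] // ratio,
        "layout_head_size": hidden // config["num_attention_heads"],
    }


dims = layout_dims(json.loads(CONFIG_JSON))
print(dims)
# {'layout_hidden_size': 192, 'layout_intermediate_size': 768, 'layout_head_size': 16}
```

Under that assumption the layout encoder is a quarter of the width of a base-size text encoder, which is in line with the small size of the weight files in this commit.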

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a49660daa64a30125150cb00711380af425c99b217448b5fa8228645aae06779
+size 24461920

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f109e656c529979904c9a3b35f4eb0ab5ad642e9840a9b444481ff6035ca9fb8
+size 24510336
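
The two weight files are stored as Git LFS pointers, each recording a SHA-256 digest and a byte size. A minimal sketch of how a downloaded file could be checked against its pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `oid` and `size` fields of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Tolerate a leading "+" so pointer text can be pasted straight from a diff.
        key, _, value = line.lstrip("+").partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and digest both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Hash in chunks so large weight files are never fully loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer for model.safetensors, copied from this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a49660daa64a30125150cb00711380af425c99b217448b5fa8228645aae06779
size 24461920
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

After downloading, `matches_pointer(Path("lilt-only-base/model.safetensors"), pointer)` would return `True` only if the local copy is byte-identical to the file referenced by the pointer.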