RomDev2 committed
Commit 46ecd1e · verified · 1 parent: 2da199a

Add lilt-only-base

Files changed (4)
  1. README.md +48 -0
  2. config.json +15 -0
  3. model.safetensors +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ # lilt-only-base
+
+ Layout-only pretrained checkpoint from the official [LiLT repository](https://github.com/jpwang/lilt).
+
+ This is **not a complete model**: it contains only the 2D spatial (layout) encoder, with no text encoder. It is intended as a building block for combining with any RoBERTa-like text encoder.
+
+ ## What is this?
+
+ LiLT (Language-Independent Layout Transformer) decouples text and layout understanding into two separate encoders. `lilt-only-base` contains exclusively the **layout encoder** weights, pretrained on document layout understanding (IIT-CDIP dataset).
+
+ This allows combining it with any RoBERTa-compatible text encoder to produce a language-specific document understanding model.
+
+ ## Usage
+
+ Use [`gen_weight_roberta_like.py`](https://github.com/jpwang/lilt) from the official repository to combine with your text encoder of choice:
+
+ ```bash
+ python gen_weight_roberta_like.py \
+   --lilt lilt-only-base/pytorch_model.bin \
+   --text your-roberta-model/pytorch_model.bin \
+   --config your-roberta-model/config.json \
+   --out lilt-your-language-base
+ ```
+
+ Compatible text encoders: any RoBERTa-like model (`roberta-base`, `camembert-base`, `microsoft/infoxlm-base`, etc.)
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `model.safetensors` | Layout encoder weights (safetensors format) |
+ | `pytorch_model.bin` | Layout encoder weights (PyTorch format) |
+ | `config.json` | Model configuration (`model_type: liltrobertalike`) |
+
+ ## Note on model type
+
+ This checkpoint uses `model_type = liltrobertalike`, a custom type defined in the original LiLT repository. It cannot be loaded directly with `AutoModel` from HuggingFace transformers without first combining it with a text encoder via the procedure above.
+
+ ## License
+
+ MIT, following the original [jpwang/lilt](https://github.com/jpwang/lilt) repository.
+
+ ## Acknowledgements
+
+ - [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://arxiv.org/abs/2202.13669), Wang et al., 2022
+ - Original weights: [jpwang/lilt](https://github.com/jpwang/lilt)
+
+ > **Note**: This is not an official HuggingFace release from the original authors.
config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "model_type": "liltrobertalike",
+   "channel_shrink_ratio": 4,
+   "max_2d_position_embeddings": 1024,
+   "hidden_size": 768,
+   "num_hidden_layers": 12,
+   "num_attention_heads": 12,
+   "intermediate_size": 3072,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "attention_probs_dropout_prob": 0.1,
+   "initializer_range": 0.02,
+   "layer_norm_eps": 1e-5,
+   "type_vocab_size": 1
+ }
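As a rough illustration of what these values imply, the sketch below parses a subset of the config above and derives the width of the layout stream. It assumes that the layout hidden size is `hidden_size // channel_shrink_ratio`, which is how the Hugging Face `LiltConfig` documentation describes `channel_shrink_ratio`; the official script may treat this config differently. The helper name `layout_dims` is hypothetical and exists only for this example.

```python
import json

# Subset of the config.json added in this commit.
CONFIG_TEXT = """
{
  "model_type": "liltrobertalike",
  "channel_shrink_ratio": 4,
  "max_2d_position_embeddings": 1024,
  "hidden_size": 768,
  "num_hidden_layers": 12,
  "num_attention_heads": 12
}
"""


def layout_dims(config: dict) -> dict:
    """Derive layout-stream sizes (hypothetical helper, not part of LiLT).

    Assumption: the layout stream is the text width divided by
    channel_shrink_ratio, split evenly across the attention heads.
    """
    hidden = config["hidden_size"]
    ratio = config["channel_shrink_ratio"]
    heads = config["num_attention_heads"]
    if hidden % ratio != 0 or (hidden // ratio) % heads != 0:
        raise ValueError("hidden_size must divide evenly by ratio and heads")
    layout_hidden = hidden // ratio
    return {
        "layout_hidden_size": layout_hidden,
        "layout_head_size": layout_hidden // heads,
    }


dims = layout_dims(json.loads(CONFIG_TEXT))
print(dims)  # {'layout_hidden_size': 192, 'layout_head_size': 16}
```

Under that assumption the layout stream is a quarter as wide as the text stream, which is consistent with the small (roughly 24 MB) weight files in this commit.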
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a49660daa64a30125150cb00711380af425c99b217448b5fa8228645aae06779
+ size 24461920
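Once the real `model.safetensors` has been fetched from LFS, its tensor names and shapes can be listed without loading any weights, because a safetensors file starts with an 8-byte little-endian header length followed by a JSON header. The sketch below uses only the standard library. The function names and the `demo.weight` tensor are hypothetical, and the demo runs on a tiny hand-built file, not on this checkpoint.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata entry; it does not describe a tensor.
    header.pop("__metadata__", None)
    return header


def tensor_shapes(header):
    """Map each tensor name to its shape as a tuple."""
    return {name: tuple(info["shape"]) for name, info in header.items()}


# Demo file with one hypothetical 2x3 float32 tensor (24 bytes of zeros).
demo_header = json.dumps(
    {"demo.weight": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]}}
).encode("utf-8")
demo_path = Path(tempfile.mkdtemp()) / "demo.safetensors"
demo_path.write_bytes(struct.pack("<Q", len(demo_header)) + demo_header + bytes(24))

shapes = tensor_shapes(read_safetensors_header(demo_path))
print(shapes)  # {'demo.weight': (2, 3)}
```

Pointing `read_safetensors_header` at the downloaded `model.safetensors` should show which layout-encoder tensors the checkpoint actually contains.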
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f109e656c529979904c9a3b35f4eb0ab5ad642e9840a9b444481ff6035ca9fb8
+ size 24510336
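The two weight files are stored as Git LFS pointers, so the diff shows only a SHA-256 digest and a byte size. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path you pass in is up to you.

```python
import hashlib

# Pointer text for pytorch_model.bin, copied from this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:f109e656c529979904c9a3b35f4eb0ab5ad642e9840a9b444481ff6035ca9fb8
size 24510336
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into version, hash algorithm, digest, size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large checkpoints are not held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])  # sha256 24510336
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy is a quick way to confirm the download is complete before passing it to `gen_weight_roberta_like.py`.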