# MODEL_NAME



This repository contains **layoutlm-camembertv2-qa** weights exported to `safetensors` format.



## Source



These weights are derived from pretrained models:



- **Layout encoder (LayoutLM)**: [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased), pretrained on IIT-CDIP with the masked visual-language modeling objective (see the LayoutLM paper)

- **Text encoder**: [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base) — French language model (RoBERTa-like architecture)



## Methodology



This checkpoint was produced by **weight merging**, not end-to-end training.



1. Load the pretrained layout encoder weights (LayoutLM) and keep them intact

2. Replace the text encoder weights (embeddings, attention layers, FFN) with those from the French model

3. Update the tokenizer and vocabulary configuration accordingly
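
The three steps above can be sketched on plain dictionaries. This is a minimal illustration under stated assumptions, not the script used to build this checkpoint: `merge_state_dicts` and `LAYOUT_ONLY_MARKERS` are hypothetical names, the key names only imitate typical LayoutLM/RoBERTa-style state dicts, and strings stand in for tensors. A real merge would also have to handle shape mismatches (vocabulary size, position-embedding length), which the sketch ignores.

```python
from typing import Any, Dict, Tuple

# Hypothetical marker substrings for parameters that exist only in the layout
# model (2D position embeddings); real key names may differ per checkpoint.
LAYOUT_ONLY_MARKERS: Tuple[str, ...] = (
    "x_position_embeddings",
    "y_position_embeddings",
    "h_position_embeddings",
    "w_position_embeddings",
)


def merge_state_dicts(
    layout_sd: Dict[str, Any],
    text_sd: Dict[str, Any],
    layout_only_markers: Tuple[str, ...] = LAYOUT_ONLY_MARKERS,
) -> Dict[str, Any]:
    """Return a copy of layout_sd whose text-encoder entries come from text_sd."""
    merged: Dict[str, Any] = {}
    for key, value in layout_sd.items():
        is_layout_only = any(marker in key for marker in layout_only_markers)
        if not is_layout_only and key in text_sd:
            merged[key] = text_sd[key]  # step 2: take the French text weights
        else:
            merged[key] = value  # step 1: keep the layout weights intact
    return merged


# Toy stand-ins for tensors so the sketch runs without downloading anything.
layout_sd = {
    "embeddings.word_embeddings.weight": "layoutlm-words",
    "embeddings.x_position_embeddings.weight": "layoutlm-x",
    "encoder.layer.0.attention.self.query.weight": "layoutlm-q0",
}
text_sd = {
    "embeddings.word_embeddings.weight": "camembert-words",
    "encoder.layer.0.attention.self.query.weight": "camembert-q0",
}
merged = merge_state_dicts(layout_sd, text_sd)
print(merged)
```

In this toy run the word embeddings and attention weights come from the text model, while the 2D position embedding stays with the layout model. Step 3 (tokenizer and vocabulary configuration) is a separate change to the config files and is not shown.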



No training or fine-tuning was performed at this stage.  

This checkpoint is intended as a **starting point** for downstream fine-tuning on French document understanding tasks (NER, token classification, extractive QA…).





## Files



| File | Description |
|------|-------------|
| `model.safetensors` | Model weights |
| `pytorch_model.bin` | Model weights (PyTorch format) |
| `config.json` | Model configuration |
| `tokenizer_config.json` | Tokenizer configuration |
| `README.md` | This model card |

## Usage

```python
from transformers import AutoTokenizer, AutoModel

# Replace USERNAME/MODEL_NAME with this repository's Hub ID.
tokenizer = AutoTokenizer.from_pretrained("USERNAME/MODEL_NAME")
model = AutoModel.from_pretrained("USERNAME/MODEL_NAME")
```
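
LayoutLM-style models expect a bounding box per token in addition to the token IDs, with coordinates scaled to a 0-1000 range relative to the page size. Assuming this checkpoint keeps that convention (the layout weights are unchanged), a small helper like the one below can prepare boxes from pixel coordinates. The name `normalize_bbox` and the example page size are illustrative, not part of this repository.

```python
from typing import List, Sequence


def normalize_bbox(box: Sequence[float], width: float, height: float) -> List[int]:
    """Scale pixel coordinates (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# A word box on a hypothetical 1000 x 2000 pixel page.
example = normalize_bbox((50, 100, 250, 140), width=1000, height=2000)
print(example)  # [50, 50, 250, 70]
```

The normalized boxes would then be passed to the model as the `bbox` argument, one box per token.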

## Limitations

- This model has **not been fine-tuned** on any French document dataset
- Performance on downstream tasks is **not guaranteed** without task-specific fine-tuning
- Intended for research and experimentation purposes

## License

Weights are derived from models released under the MIT and Apache-2.0 licenses.  
Please refer to the original repositories for full license terms.

## Acknowledgements

- [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) — Xu et al., 2020
- [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased)


> **Note**: This is not an official release from any of the above organizations.