# MODEL_NAME

This repository contains **layoutlm-camembertv2-qa** weights exported to `safetensors` format.
## Source

These weights are derived from two pretrained models:

- **Layout encoder (LayoutLM)**: [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased), pretrained on IIT-CDIP with the masked visual-language modeling objective described in the LayoutLM paper
- **Text encoder**: [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base), a French language model with a RoBERTa-like architecture
## Methodology

This checkpoint was produced by **weight merging**, not end-to-end training:

1. Load the pretrained layout encoder weights (LayoutLM) and keep them intact
2. Replace the text encoder weights (embeddings, attention layers, FFN) with those from the French model
3. Update the tokenizer and vocabulary configuration accordingly

No training or fine-tuning was performed at this stage. The checkpoint is intended as a **starting point** for downstream fine-tuning on French document understanding tasks such as NER, token classification, and extractive QA.
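
The three merging steps can be sketched on plain dictionaries that stand in for the two state dicts. This is a minimal illustration under stated assumptions, not the script used to build this checkpoint: `merge_state_dicts` and `LAYOUT_MARKERS` are hypothetical names, and treating the 2D position embeddings as the layout-specific parameters is an assumption based on LayoutLM parameter naming in `transformers`.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Assumption: substrings that mark layout-specific parameters (2D position embeddings).
LAYOUT_MARKERS: Tuple[str, ...] = (
    "x_position_embeddings",
    "y_position_embeddings",
    "h_position_embeddings",
    "w_position_embeddings",
)


def merge_state_dicts(
    layout_sd: Mapping[str, Any],
    text_sd: Mapping[str, Any],
    layout_markers: Tuple[str, ...] = LAYOUT_MARKERS,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (merged, unmatched): layout weights kept, text weights swapped in."""
    merged: Dict[str, Any] = {}
    unmatched: List[str] = []
    for name, value in layout_sd.items():
        if any(marker in name for marker in layout_markers):
            merged[name] = value          # step 1: layout weights stay intact
        elif name in text_sd:
            merged[name] = text_sd[name]  # step 2: take the French text weight
        else:
            merged[name] = value          # no counterpart: keep and report it
            unmatched.append(name)
    return merged, unmatched


# Toy example with strings in place of tensors.
layout = {
    "embeddings.word_embeddings.weight": "en_words",
    "embeddings.x_position_embeddings.weight": "layout_x",
    "encoder.layer.0.attention.self.query.weight": "en_query",
    "embeddings.token_type_embeddings.weight": "en_types",
}
french = {
    "embeddings.word_embeddings.weight": "fr_words",
    "encoder.layer.0.attention.self.query.weight": "fr_query",
}
merged, unmatched = merge_state_dicts(layout, french)
```

With real checkpoints the swapped tensors can differ in shape from the ones they replace (for example when the vocabulary size changes), which is why step 3 updates the tokenizer and configuration. The `unmatched` list makes it easy to see which parameters had no counterpart in the text model.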
## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Model weights (safetensors format) |
| `pytorch_model.bin` | Model weights (PyTorch format) |
| `config.json` | Model configuration |
| `tokenizer_config.json` | Tokenizer configuration |
| `README.md` | This model card |
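
To check which tensors `model.safetensors` holds without loading the weights, the file header can be read with the standard library alone. The sketch below relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header). `read_safetensors_header` is a hypothetical helper, not part of any library.

```python
import json
import struct
from typing import Any, Dict


def read_safetensors_header(path: str) -> Dict[str, Any]:
    """Return the JSON header mapping tensor names to dtype, shape and offsets."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit size of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Drop the optional free-form metadata entry, which is not a tensor.
    header.pop("__metadata__", None)
    return header


# Example use, assuming model.safetensors is in the working directory:
#     for name, info in read_safetensors_header("model.safetensors").items():
#         print(name, info["dtype"], info["shape"])
```

Listing names and shapes this way is a quick sanity check that the word-embedding matrix matches the vocabulary size declared in `config.json`.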
## Usage

```python
from transformers import AutoModel, AutoTokenizer

# Replace USERNAME/MODEL_NAME with this repository's Hub id.
tokenizer = AutoTokenizer.from_pretrained("USERNAME/MODEL_NAME")
# AutoModel loads the bare encoder, without a task-specific head.
model = AutoModel.from_pretrained("USERNAME/MODEL_NAME")
```
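
LayoutLM-style models also take one bounding box per token, which the `transformers` LayoutLM documentation expects on a 0 to 1000 scale relative to the page size. A small sketch of that normalization follows; `normalize_bbox` and the sample page size are illustrative assumptions, not part of this repository.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale an (x0, y0, x1, y1) pixel box to the 0-1000 integer range."""
    x0, y0, x1, y1 = bbox
    scaled = [
        1000 * x0 / width,
        1000 * y0 / height,
        1000 * x1 / width,
        1000 * y1 / height,
    ]
    # Clamp so boxes slightly outside the page stay within the valid range.
    return [max(0, min(1000, int(v))) for v in scaled]


# A word box on a hypothetical 1000 x 2000 pixel page.
box = normalize_bbox((50, 100, 200, 300), width=1000, height=2000)
```

The normalized boxes would then be passed to the model as the `bbox` input alongside `input_ids`.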
## Limitations

- This model has **not been fine-tuned** on any French document dataset
- Performance on downstream tasks is **not guaranteed** without task-specific fine-tuning
- Intended for research and experimentation purposes
## License

Weights are derived from models released under the MIT and Apache-2.0 licenses.
Please refer to the original repositories for full license terms.
## Acknowledgements

- [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318), Xu et al., 2020
- [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased)
- [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base)

> **Note**: This is not an official release from any of the above organizations.