# layoutlm-camembertv2-qa

This repository contains **layoutlm-camembertv2-qa** weights exported to `safetensors` format.

## Source

These weights are derived from pretrained models:

- **Layout encoder (LayoutLM)**: [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased), pretrained on the IIT-CDIP document collection with the masked visual-language modeling objective described in the LayoutLM paper
- **Text encoder**: [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base), a French language model with a RoBERTa-like architecture

## Methodology

This checkpoint was produced by **weight merging**, not end-to-end training.

1. Load the pretrained LayoutLM layout encoder weights and keep them intact
2. Replace the text encoder weights (embeddings, attention layers, FFN) with those from the French model (`almanach/camembertv2-base`)
3. Update the tokenizer and vocabulary configuration to match the French model

No training or fine-tuning was performed at this stage.  
This checkpoint is intended as a **starting point** for downstream fine-tuning on French document understanding tasks (NER, token classification, extractive QA…).
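The merge steps listed above can be pictured at the level of parameter dictionaries. The following is a minimal sketch under stated assumptions, not the script used to build this checkpoint: the helper `merge_state_dicts`, the marker list, and the parameter names are all hypothetical, and the toy dictionaries use strings in place of tensors so the logic is easy to follow. With real checkpoints the same function could be applied to the mappings returned by `state_dict()`, but shape differences (for example vocabulary size) are not handled here.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Assumed substrings identifying layout-only parameters (2D position
# embeddings). These names are illustrative and not taken from this card.
LAYOUT_ONLY_MARKERS = (
    "x_position_embeddings",
    "y_position_embeddings",
    "h_position_embeddings",
    "w_position_embeddings",
)


def merge_state_dicts(
    layout_sd: Dict[str, Any],
    text_sd: Dict[str, Any],
    layout_markers: Sequence[str] = LAYOUT_ONLY_MARKERS,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a merged parameter dict and the list of replaced names.

    Layout-only parameters are always kept from `layout_sd`. Any other
    parameter is taken from `text_sd` when a parameter with the same
    name exists there; otherwise the layout model's value is kept.
    """
    merged: Dict[str, Any] = {}
    replaced: List[str] = []
    for name, value in layout_sd.items():
        is_layout_only = any(marker in name for marker in layout_markers)
        if not is_layout_only and name in text_sd:
            merged[name] = text_sd[name]
            replaced.append(name)
        else:
            merged[name] = value
    return merged, replaced


# Toy example: strings stand in for weight tensors.
layout_sd = {
    "embeddings.word_embeddings.weight": "layoutlm-words",
    "embeddings.x_position_embeddings.weight": "layoutlm-x",
    "encoder.layer.0.attention.self.query.weight": "layoutlm-q",
}
text_sd = {
    "embeddings.word_embeddings.weight": "camembert-words",
    "encoder.layer.0.attention.self.query.weight": "camembert-q",
}

merged, replaced = merge_state_dicts(layout_sd, text_sd)
print(replaced)
```

Returning the list of replaced names makes it easy to check which parameters actually came from the text model before saving the result.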


## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Model weights (`safetensors` format) |
| `pytorch_model.bin` | Model weights (PyTorch format) |
| `config.json` | Model configuration |
| `tokenizer_config.json` | Tokenizer configuration |
| `README.md` | This model card |

## Usage

```python
from transformers import AutoModel, AutoTokenizer

# Replace the placeholder with the actual repository ID on the Hub
repo_id = "USERNAME/MODEL_NAME"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)
```
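LayoutLM-style models take a `bbox` input alongside the token IDs, with box coordinates normalized to a 0-1000 scale relative to the page size. Assuming this merged checkpoint keeps that input convention, a small helper along these lines can prepare pixel-space boxes; the function name and the sample values are illustrative only.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 range."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# A word box on a 500 x 1000 pixel page (made-up values).
box = normalize_bbox((50, 100, 250, 150), width=500, height=1000)
print(box)  # [100, 100, 500, 150]
```

Each token produced by the tokenizer then needs one such box, typically the box of the word it came from.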

## Limitations

- This model has **not been fine-tuned** on any French document dataset
- Performance on downstream tasks is **not guaranteed** without task-specific fine-tuning
- Intended for research and experimentation purposes

## License

Weights are derived from models released under the MIT and Apache-2.0 licenses.  
Please refer to the original repositories for full license terms.

## Acknowledgements

- [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318), Xu et al., 2020
- [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased)
- [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base)


> **Note**: This is not an official release from any of the above organizations.