# MODEL_NAME
This repository contains **layoutlm-camembertv2-qa** weights exported to `safetensors` format (a PyTorch `.bin` copy is also provided).
## Source
These weights are derived from two pretrained models:
- **Layout encoder (LayoutLM)**: [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased), pretrained on the IIT-CDIP Test Collection with a masked visual-language modeling objective (see the LayoutLM paper)
- **Text encoder**: [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base), a French language model with a RoBERTa-like architecture
## Methodology
This checkpoint was produced by **weight merging**, not end-to-end training.
1. Load the pretrained LayoutLM weights; the layout-specific weights (the 2D position embeddings) are kept intact
2. Replace the text encoder weights (embeddings, attention layers, FFN) with those from the French model
3. Update the tokenizer and the vocabulary settings in `config.json` (e.g. `vocab_size`) to match the French model
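The three merge steps can be sketched on plain Python dicts. This is a hypothetical illustration, not the script used to build this checkpoint: it assumes both models were loaded as base models (so their `state_dict()` keys carry no `layoutlm.` or `roberta.` prefix), that the encoder layer names match between the two architectures, and that the LayoutLM-only parameters are the four 2D position embedding tables. The names `merge_state_dicts`, `update_config`, `LAYOUT_PREFIXES` and `VOCAB_FIELDS` are invented for the example. A real merge must also make every other shape-determining config field agree with the replaced tensors.

```python
# Hypothetical sketch of the merge steps on plain dicts (assumed key names).
# Keys assume base-model state dicts, e.g. "encoder.layer.0.attention.self.query.weight".

# Step 1: parameters assumed to exist only in LayoutLM (2D layout embeddings).
LAYOUT_PREFIXES = (
    "embeddings.x_position_embeddings.",
    "embeddings.y_position_embeddings.",
    "embeddings.h_position_embeddings.",
    "embeddings.w_position_embeddings.",
)

# Step 3: vocabulary-related config fields copied from the text model (assumed list).
VOCAB_FIELDS = ("vocab_size", "pad_token_id", "bos_token_id", "eos_token_id")


def merge_state_dicts(layout_sd, text_sd):
    """Return (merged, kept_from_layout).

    Layout embeddings always come from layout_sd. Every other key is taken
    from text_sd when present; otherwise the LayoutLM value is kept and the
    key is reported so it can be reviewed. Keys found only in text_sd
    (e.g. an LM head) are ignored.
    """
    merged = {}
    kept_from_layout = []
    for key, value in layout_sd.items():
        if key.startswith(LAYOUT_PREFIXES):
            merged[key] = value  # step 1: keep layout weights intact
        elif key in text_sd:
            merged[key] = text_sd[key]  # step 2: use the French text weights
        else:
            merged[key] = value  # no counterpart in the text model
            kept_from_layout.append(key)
    return merged, kept_from_layout


def update_config(layout_cfg, text_cfg):
    """Return a copy of layout_cfg with vocabulary fields taken from text_cfg."""
    cfg = dict(layout_cfg)
    for field in VOCAB_FIELDS:
        if field in text_cfg:
            cfg[field] = text_cfg[field]
    return cfg
```

Returning `kept_from_layout` instead of silently falling back makes it easy to check which parameters (for instance a pooler) were not replaced before the merged weights are saved.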
No training or fine-tuning was performed at this stage.
This checkpoint is intended as a **starting point** for downstream fine-tuning on French document understanding tasks (e.g. NER and other token classification tasks, or extractive QA).
## Files
| File | Description |
|------|-------------|
| `model.safetensors` | Model weights (`safetensors` format) |
| `pytorch_model.bin` | Model weights (PyTorch `.bin` format) |
| `config.json` | Model configuration |
| `tokenizer_config.json` | Tokenizer configuration |
| `README.md` | This model card |
## Usage
```python
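from transformers import AutoTokenizer, AutoModel

# Replace "USERNAME/MODEL_NAME" with this repository's id.
tokenizer = AutoTokenizer.from_pretrained("USERNAME/MODEL_NAME")
# AutoModel loads the base encoder only, without a task-specific head.
model = AutoModel.from_pretrained("USERNAME/MODEL_NAME")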
```
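LayoutLM-style models take a `bbox` input alongside the token ids: one `[x0, y0, x1, y1]` box per token, scaled to a 0-1000 grid. The helpers below are a minimal sketch for building that input, assuming `config.json` declares the LayoutLM architecture (so `AutoModel` returns a LayoutLM model) and that OCR gives one pixel box per word. The function names are hypothetical, clamping to 0-1000 is a defensive choice of this sketch, and giving special tokens an all-zero box is an assumption, not something this card specifies.

```python
# Hypothetical helpers for building LayoutLM's `bbox` input (assumed conventions).


def normalize_box(box, width, height):
    """Scale a pixel box [x0, y0, x1, y1] to the 0-1000 range."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp so out-of-page coordinates stay inside the 2D embedding tables.
    return [min(max(v, 0), 1000) for v in scaled]


def token_boxes(word_ids, word_boxes, special_box=(0, 0, 0, 0)):
    """Expand word-level boxes to one box per token.

    word_ids has one entry per token: the index of its source word, or None
    for special tokens, which receive special_box.
    """
    return [
        list(special_box) if w is None else list(word_boxes[w])
        for w in word_ids
    ]
```

Assuming a fast tokenizer is available, calling it with `is_split_into_words=True` on the OCR words returns an encoding whose `word_ids()` method yields the `word_ids` list; the resulting boxes can then be wrapped in a tensor and passed as `bbox` together with `input_ids` and `attention_mask`.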
## Limitations
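- This model has **not been trained or fine-tuned** after merging, on French document data or any other dataset; the French text weights and the LayoutLM layout embeddings have never been optimized together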
- Performance on downstream tasks is **not guaranteed** without task-specific fine-tuning
- Intended for research and experimentation purposes
## License
Weights are derived from models released under the MIT and Apache-2.0 licenses.
Please refer to the original repositories for full license terms.
## Acknowledgements
- [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) — Xu et al., 2020
- [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased)
- [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base)
> **Note**: This is not an official release from any of the above organizations.