RomDev2/layoutlm-camembertv2-qa


	# layoutlm-camembertv2-qa

	This repository contains layoutlm-camembertv2-qa weights exported to `safetensors` format.

	## Source

	These weights are derived from pretrained models:

	- Layout encoder (LayoutLM): [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased), pretrained on the IIT-CDIP collection with the masked visual-language modeling objective described in the LayoutLM paper
	- Text encoder: [`almanach/camembertv2-base`](https://huggingface.co/almanach/camembertv2-base), a French language model with a RoBERTa-like architecture

	## Methodology

	This checkpoint was produced by weight merging, not end-to-end training.

	1. Load the pretrained layout encoder weights (LayoutLM, per the Source section) and keep them intact
	2. Replace the text encoder weights (embeddings, attention layers, FFN) with those from the French model
	3. Update the tokenizer and vocabulary configuration accordingly
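
The three steps above can be sketched with plain state dictionaries. This is a minimal illustration, not the script used to build this checkpoint: the function names `merge_state_dicts` and `update_vocab_config`, the `key_map` argument, and the toy parameter names are all hypothetical, and the real parameter names depend on the two source checkpoints.

```python
from typing import Any, Dict, Mapping


def merge_state_dicts(
    layout_sd: Mapping[str, Any],
    text_sd: Mapping[str, Any],
    key_map: Mapping[str, str],
) -> Dict[str, Any]:
    """Return a copy of layout_sd with selected entries taken from text_sd.

    key_map maps a parameter name in the layout checkpoint to the name of the
    text-checkpoint parameter that should replace it (hypothetical names).
    """
    merged = dict(layout_sd)  # step 1: start from the layout checkpoint
    for target_key, source_key in key_map.items():
        if target_key not in layout_sd:
            raise KeyError(f"unknown target parameter: {target_key}")
        if source_key not in text_sd:
            raise KeyError(f"unknown source parameter: {source_key}")
        merged[target_key] = text_sd[source_key]  # step 2: swap in text weights
    return merged


def update_vocab_config(config: Mapping[str, Any], vocab_size: int) -> Dict[str, Any]:
    """Step 3: return a config copy whose vocab_size matches the new tokenizer."""
    updated = dict(config)
    updated["vocab_size"] = vocab_size
    return updated


# Toy example: lists stand in for tensors, and the key names are made up.
layout_sd = {"embeddings.word": [0.1, 0.2], "embeddings.x_position": [0.3]}
text_sd = {"text.embeddings.word": [0.7, 0.8, 0.9]}
merged = merge_state_dicts(
    layout_sd, text_sd, {"embeddings.word": "text.embeddings.word"}
)
config = update_vocab_config(
    {"vocab_size": 2}, vocab_size=len(merged["embeddings.word"])
)
```

With real checkpoints the values would be tensors, and replaced embeddings may change shape because the two vocabularies differ, which is why the configuration is updated alongside the weights.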

	No training or fine-tuning was performed at this stage.
	This checkpoint is intended as a starting point for downstream fine-tuning on French document understanding tasks (NER, token classification, extractive QA…).


	## Files

	| File | Description |
	|------|-------------|
	| `model.safetensors` | Model weights |
	| `pytorch_model.bin` | Model weights (PyTorch format) |
	| `config.json` | Model configuration |
	| `tokenizer_config.json` | Tokenizer configuration |
	| `README.md` | This model card |

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModel

	# Repository ID as shown on the model page
	repo_id = "RomDev2/layoutlm-camembertv2-qa"

	tokenizer = AutoTokenizer.from_pretrained(repo_id)
	model = AutoModel.from_pretrained(repo_id)
	```
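
LayoutLM-style models take one bounding box per token in addition to the token ids, with coordinates scaled to the 0-1000 range. Assuming this checkpoint loads as a LayoutLM model (this card does not state the model class), a box given in pixel coordinates could be scaled as in the sketch below. `normalize_bbox` is a hypothetical helper, not part of `transformers`.

```python
from typing import List, Sequence


def normalize_bbox(box: Sequence[int], width: int, height: int) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 // width,
        1000 * y0 // height,
        1000 * x1 // width,
        1000 * y1 // height,
    ]
    # Clamp so boxes that spill past the page edge stay in range.
    return [min(max(value, 0), 1000) for value in scaled]


# A word box on a 500 x 1000 pixel page.
example = normalize_bbox((50, 100, 150, 200), width=500, height=1000)
```

Integer division keeps the result as plain integers; clamping is a design choice made here for OCR boxes that slightly exceed the page bounds.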

	## Limitations

	- This model has not been fine-tuned on any French document dataset
	- Performance on downstream tasks is not guaranteed without task-specific fine-tuning
	- Intended for research and experimentation purposes

	## License

	Weights are derived from models released under the MIT and Apache-2.0 licenses.
	Please refer to the original repositories for full license terms.

	## Acknowledgements

	- [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) — Xu et al., 2020
	- [`microsoft/layoutlm-base-uncased`](https://huggingface.co/microsoft/layoutlm-base-uncased)


	> Note: This is not an official release from any of the above organizations.