---
language: fr
license: apache-2.0
tags:
- masked-lm
- camembert
- transformers
- tf
- french
- fill-mask
---
# CamemBERT MLM - Fine-tuned Model

This is a TensorFlow masked language model (MLM) built on the [camembert-base](https://huggingface.co/camembert-base) checkpoint, a RoBERTa-like model trained on French text.

## Model description

This model uses the CamemBERT architecture, a RoBERTa-based transformer trained on large-scale French corpora (e.g., OSCAR, CCNet). It is designed for Masked Language Modeling (MLM) tasks.

It was loaded and saved with the `transformers` library in TensorFlow (`TFAutoModelForMaskedLM`) and can be used for fill-in-the-blank tasks in French.
## Intended uses & limitations

### Intended uses

- Fill-mask predictions in French
- Feature extraction for NLP tasks
- Fine-tuning on downstream tasks such as text classification, NER, etc.

### Limitations

- Works best with French text
- May not generalize well to other languages
- Cannot be used for generative tasks (e.g., translation, text generation)
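For quick experiments, the fill-mask use case above can also be run through the `pipeline` helper (a minimal sketch; the example sentence is illustrative):

```python
from transformers import pipeline

# framework="tf" selects the TensorFlow weights of this checkpoint.
fill_mask = pipeline("fill-mask", model="Mhammad2023/my-dummy-model", framework="tf")

# CamemBERT's mask token is "<mask>" (not "[MASK]").
preds = fill_mask("Le camembert est un <mask> délicieux.")
for p in preds:
    print(p["token_str"], round(p["score"], 3))
```

Each prediction is a dict with the filled-in token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).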
## How to use

```python
from transformers import TFAutoModelForMaskedLM, AutoTokenizer
import tensorflow as tf

model = TFAutoModelForMaskedLM.from_pretrained("Mhammad2023/my-dummy-model")
tokenizer = AutoTokenizer.from_pretrained("Mhammad2023/my-dummy-model")

# CamemBERT uses "<mask>" (not "[MASK]") as its mask token.
inputs = tokenizer(f"J'aime le {tokenizer.mask_token} rouge.", return_tensors="tf")
outputs = model(**inputs)
logits = outputs.logits

# Locate the mask token, then take the highest-scoring vocabulary entry.
mask_index = tf.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0, 0]
predicted_token_id = int(tf.argmax(logits[0, mask_index]))
predicted_token = tokenizer.decode([predicted_token_id])
print(f"Predicted word: {predicted_token}")
```
## Limitations and bias

This model inherits the limitations and biases of the camembert-base checkpoint, including:

- Potential biases from the training data (e.g., internet corpora)
- Inappropriate predictions for sensitive topics

Use with caution in production or sensitive applications.
## Training data

The model was not further fine-tuned; it is based directly on camembert-base, which was trained on:

- OSCAR (Open Super-large Crawled ALMAnaCH coRpus)
- CCNet (high-quality monolingual data filtered from Common Crawl)
## Training procedure

No additional training was applied for this version. You can load the model and fine-tune it on your own task using the `Trainer` API (PyTorch) or the Keras `fit()` API (TensorFlow).
## Evaluation results

This version has not been evaluated on downstream tasks. For evaluation metrics and benchmarks, refer to the original camembert-base model card.