# **Khasi Fill-Mask Model**

This project demonstrates how to use the Hugging Face Transformers library to perform a fill-mask task using the **`jefson08/kha-bert`** model. The fill-mask task predicts the most likely token(s) to replace the `[MASK]` token in a given sentence.

---

## **Usage**

### **1. Import Dependencies**

```python
from transformers import pipeline
```

### **2. Initialize the Model and Tokenizer**

Create a fill-mask pipeline that loads both the model and its tokenizer:

```python
# Build the fill-mask pipeline; the model and tokenizer use the same checkpoint name
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-bert",
    tokenizer="jefson08/kha-bert",
)
```
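
Once the pipeline is loaded, the mask string it expects can be read from `fill_mask.tokenizer.mask_token` rather than typed by hand. The sketch below shows one way to build the input sentence from a template. The helper name `build_masked_sentence` and the `{mask}` placeholder are illustrative assumptions, not part of the Transformers API.

```python
def build_masked_sentence(template: str, mask_token: str) -> str:
    """Fill the `{mask}` placeholder in `template` with the given mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a {mask} placeholder")
    return template.replace("{mask}", mask_token)


# Usage with the pipeline created above (left commented out because it
# requires the downloaded model):
# sentence = build_masked_sentence(
#     "Nga dei u briew u ba {mask} bha.",
#     fill_mask.tokenizer.mask_token,
# )
```
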

### **3. Predict the [MASK] Token**

Provide a sentence containing a `[MASK]` token and pass it to the pipeline:

```python
# Predict candidates for the [MASK] token
sentence = "Nga dei u briew u ba [MASK] bha."
predictions = fill_mask(sentence)

# Display each completed sentence with its score
for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")
```
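
By default the fill-mask pipeline returns the five highest-scoring candidates for a single mask, and the call accepts a `top_k` argument to change that number. The sketch below wraps this in a small helper. `top_tokens` is a hypothetical name introduced here for illustration, and the helper assumes the sentence contains exactly one mask token (with several masks the pipeline returns a nested list instead).

```python
from typing import Any, Callable, Dict, List


def top_tokens(fill_mask: Callable[..., List[Dict[str, Any]]],
               sentence: str, k: int = 3) -> List[str]:
    """Return the k highest-scoring replacement tokens for a single-mask sentence.

    `fill_mask` is any callable with the fill-mask pipeline's call signature,
    for example the `fill_mask` object created in step 2.
    """
    predictions = fill_mask(sentence, top_k=k)
    return [prediction["token_str"] for prediction in predictions]


# Usage with the pipeline from step 2 (left commented out because it
# requires the downloaded model):
# print(top_tokens(fill_mask, "Nga dei u briew u ba [MASK] bha.", k=3))
```
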

---

## **Example Output**

Given the input sentence:

```plaintext
"Nga dei u briew u ba [MASK] bha."
```

The raw `predictions` list returned by the pipeline might look like this (the loop in step 3 prints only the `sequence` and `score` fields):

```plaintext
[{'score': 0.05552137270569801,
  'token': 668,
  'token_str': 'kham',
  'sequence': 'Nga dei u briew u ba kham bha.'},
 {'score': 0.03611050173640251,
  'token': 2318,
  'token_str': 'kmen',
  'sequence': 'Nga dei u briew u ba kmen bha.'},
 {'score': 0.029321255162358284,
  'token': 3612,
  'token_str': 'tbit',
  'sequence': 'Nga dei u briew u ba tbit bha.'},
 {'score': 0.028406640514731407,
  'token': 1860,
  'token_str': 'ieit',
  'sequence': 'Nga dei u briew u ba ieit bha.'},
 {'score': 0.027690021321177483,
  'token': 4187,
  'token_str': 'sarong',
  'sequence': 'Nga dei u briew u ba sarong bha.'}]
```
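
Each entry is a plain dictionary, so the result can be post-processed without the model. The sketch below reuses the sample values shown above to build a ranked list and to sum the scores. `summarize` is a hypothetical helper name. For this sample the five listed scores add up to only about 0.18, which suggests the model spreads its probability over many candidates for this sentence.

```python
from typing import Any, Dict, List, Tuple

# Sample predictions copied from the example output above.
SAMPLE_PREDICTIONS: List[Dict[str, Any]] = [
    {"score": 0.05552137270569801, "token": 668, "token_str": "kham",
     "sequence": "Nga dei u briew u ba kham bha."},
    {"score": 0.03611050173640251, "token": 2318, "token_str": "kmen",
     "sequence": "Nga dei u briew u ba kmen bha."},
    {"score": 0.029321255162358284, "token": 3612, "token_str": "tbit",
     "sequence": "Nga dei u briew u ba tbit bha."},
    {"score": 0.028406640514731407, "token": 1860, "token_str": "ieit",
     "sequence": "Nga dei u briew u ba ieit bha."},
    {"score": 0.027690021321177483, "token": 4187, "token_str": "sarong",
     "sequence": "Nga dei u briew u ba sarong bha."},
]


def summarize(predictions: List[Dict[str, Any]]) -> Tuple[List[str], float]:
    """Return ranked 'N. token (score)' lines and the sum of all listed scores."""
    lines = [
        f"{rank}. {p['token_str']} ({p['score']:.4f})"
        for rank, p in enumerate(predictions, start=1)
    ]
    total = sum(p["score"] for p in predictions)
    return lines, total


lines, total = summarize(SAMPLE_PREDICTIONS)
print("\n".join(lines))
print(f"Combined score of listed candidates: {total:.4f}")
```
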

---

## **Model Information**

The `jefson08/kha-bert` model is fine-tuned for Khasi text tasks. In this project it is run through the fill-mask pipeline, which predicts replacements for `[MASK]` tokens from the surrounding words and so gives a quick view of how the model uses context in Khasi sentences.

---

## **Dependencies**

- [Transformers](https://huggingface.co/docs/transformers): Provides the pipeline and model-loading utilities.
- [PyTorch](https://pytorch.org/): Backend framework for running the model.

Install the dependencies with:

```bash
pip install transformers torch
```
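
To confirm that both packages are present in the current environment, their installed versions can be read with the standard library's `importlib.metadata` (Python 3.8 or later). This is an optional check, and `installed_versions` is a hypothetical helper name used only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("transformers", "torch"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Prints a dict such as {'transformers': <version or None>, 'torch': <version or None>}
print(installed_versions())
```
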

---

## **Acknowledgements**

- Hugging Face [Transformers](https://huggingface.co/docs/transformers) library.
- Model by [N Donald Jefferson Thabah](https://huggingface.co/jefson08/kha-roberta).

---

## **License**

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more details.

---