# **Khasi Fill-Mask Model**
This project demonstrates how to use the Hugging Face Transformers library to perform a fill-mask task using the **`jefson08/kha-bert`** model. The fill-mask task predicts the most likely token(s) to replace the `[MASK]` token in a given sentence.
---
## **Usage**
### **1. Import Dependencies**
```python
from transformers import pipeline
```
### **2. Initialize the Model and Tokenizer**
Load the tokenizer and model pipeline:
```python
# Initialise the fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-bert",
    tokenizer="jefson08/kha-bert",
)
```
### **3. Predict the [MASK] Token**
Provide a sentence with a `[MASK]` token for prediction:
```python
# Predict [MASK] token
sentence = "Nga dei u briew u ba [MASK] bha."
predictions = fill_mask(sentence)
# Display each predicted sequence with its confidence score
for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")
```
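The pipeline raises an error if the input does not contain the mask token, so it can help to validate the sentence first. A minimal sketch (the `has_single_mask` helper is our own, not part of this project):

```python
def has_single_mask(sentence: str, mask_token: str = "[MASK]") -> bool:
    """Return True if the sentence contains exactly one mask token."""
    return sentence.count(mask_token) == 1

# Example usage
print(has_single_mask("Nga dei u briew u ba [MASK] bha."))  # True
print(has_single_mask("Nga dei u briew u ba bha."))         # False
```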
---
## **Example Output**
Given the input sentence:
```plaintext
"Nga dei u briew u ba [MASK] bha."
```
The model might output:
```plaintext
[{'score': 0.05552137270569801,
'token': 668,
'token_str': 'kham',
'sequence': 'Nga dei u briew u ba kham bha.'},
{'score': 0.03611050173640251,
'token': 2318,
'token_str': 'kmen',
'sequence': 'Nga dei u briew u ba kmen bha.'},
{'score': 0.029321255162358284,
'token': 3612,
'token_str': 'tbit',
'sequence': 'Nga dei u briew u ba tbit bha.'},
{'score': 0.028406640514731407,
'token': 1860,
'token_str': 'ieit',
'sequence': 'Nga dei u briew u ba ieit bha.'},
{'score': 0.027690021321177483,
'token': 4187,
'token_str': 'sarong',
'sequence': 'Nga dei u briew u ba sarong bha.'}]
```
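Each prediction is a dict with `score`, `token`, `token_str`, and `sequence` keys. Using the example output above as sample data, one way you might post-process the results (the threshold value here is arbitrary, chosen only for illustration):

```python
# Sample predictions mirroring the structure returned by the pipeline
predictions = [
    {"score": 0.0555, "token_str": "kham", "sequence": "Nga dei u briew u ba kham bha."},
    {"score": 0.0361, "token_str": "kmen", "sequence": "Nga dei u briew u ba kmen bha."},
    {"score": 0.0293, "token_str": "tbit", "sequence": "Nga dei u briew u ba tbit bha."},
]

# Pick the single most likely completion
best = max(predictions, key=lambda p: p["score"])
print(best["sequence"])  # Nga dei u briew u ba kham bha.

# Keep only predictions above a confidence threshold
confident = [p["token_str"] for p in predictions if p["score"] >= 0.03]
print(confident)  # ['kham', 'kmen']
```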
---
## **Model Information**
The `jefson08/kha-bert` model is a masked language model trained for Khasi text. Given a sentence containing a `[MASK]` token, it predicts the most likely replacement tokens from the surrounding context, ranked by probability.
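The `score` values returned by the pipeline are softmax probabilities computed over the model's vocabulary at the masked position. A toy illustration of that computation with made-up logits (not actual model outputs):

```python
import math

# Hypothetical logits for three candidate tokens at the [MASK] position
logits = {"kham": 2.1, "kmen": 1.7, "tbit": 1.5}

# Softmax: exponentiate each logit and normalise so the scores sum to 1
total = sum(math.exp(v) for v in logits.values())
scores = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {score:.4f}")
```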
---
## **Dependencies**
- [Transformers](https://huggingface.co/docs/transformers): Provides the pipeline and model-loading utilities.
- [PyTorch](https://pytorch.org/): Backend framework for running the model.
Install the dependencies with:
```bash
pip install transformers torch
```
---
## **Acknowledgements**
- Hugging Face [Transformers](https://huggingface.co/docs/transformers) library.
- Model by [N Donald Jefferson Thabah](https://huggingface.co/jefson08/kha-roberta).
---
## **License**
This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more details.