# **Khasi Fill-Mask Model**

This project demonstrates how to use the Hugging Face Transformers library to perform a fill-mask task using the **`jefson08/kha-roberta`** model. The fill-mask task predicts the most likely token(s) to replace the `[MASK]` token in a given sentence.

---

## **Usage**

### **1. Import Dependencies**

```python
from transformers import pipeline, AutoTokenizer
```

### **2. Initialize the Model and Tokenizer**

Load the tokenizer and model pipeline:

```python
# Initialisation
tokenizer = AutoTokenizer.from_pretrained('jefson08/kha-roberta')
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-roberta",
    tokenizer=tokenizer,
    device="cuda",  # Use "cuda" for GPU or omit for CPU
)
```
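If you are unsure whether a GPU is present, you can pick the device at runtime instead of hard-coding `"cuda"`. A minimal sketch, assuming PyTorch is installed:

```python
import torch
from transformers import pipeline, AutoTokenizer

# Fall back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("jefson08/kha-roberta")
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-roberta",
    tokenizer=tokenizer,
    device=device,
)
```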

### **3. Predict the [MASK] Token**

Provide a sentence with a `[MASK]` token for prediction:

```python
# Predict [MASK] token
sentence = "Nga dei u briew u ba [MASK] bha."
predictions = fill_mask(sentence)

# Display predictions
for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")
```
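The example follows the model card and uses a literal `[MASK]`. Tokenizers differ in their mask token (RoBERTa-style checkpoints often use `<mask>`), so a more portable variant builds the sentence from `tokenizer.mask_token`; a small sketch under that assumption:

```python
# Use the tokenizer's own mask token so the code keeps working
# even if this checkpoint expects e.g. "<mask>" instead of "[MASK]"
sentence = f"Nga dei u briew u ba {tokenizer.mask_token} bha."
predictions = fill_mask(sentence)

for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")
```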

---

## **Example Output**

Given the input sentence:

```plaintext
"Nga dei u briew u ba [MASK] bha."
```

The raw `predictions` list returned by the pipeline might look like:

```plaintext
[{'score': 0.09230164438486099,
  'token': 6086,
  'token_str': 'mutlop',
  'sequence': 'Nga dei u briew u ba  mutlop bha.'},
 {'score': 0.051360130310058594,
  'token': 2059,
  'token_str': 'stad',
  'sequence': 'Nga dei u briew u ba  stad bha.'},
 {'score': 0.045497000217437744,
  'token': 1864,
  'token_str': 'khuid',
  'sequence': 'Nga dei u briew u ba  khuid bha.'},
 {'score': 0.04180142655968666,
  'token': 668,
  'token_str': 'kham',
  'sequence': 'Nga dei u briew u ba  kham bha.'},
 {'score': 0.027332570403814316,
  'token': 2817,
  'token_str': 'khlaiñ',
  'sequence': 'Nga dei u briew u ba  khlaiñ bha.'}]
```
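The candidates come back sorted by score, highest first, so the most likely replacement is simply the first entry:

```python
# Take the highest-scoring candidate (the list is sorted by score)
best = predictions[0]
print(best["token_str"], best["score"])  # e.g. "mutlop" 0.0923 for the sentence above
```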

---

## **Model Information**

The `jefson08/kha-roberta` model is fine-tuned for Khasi text tasks. Used with the fill-mask pipeline, it predicts replacements for the `[MASK]` token in a sentence, which makes it a convenient probe of how well the model captures Khasi context.
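The pipeline is a convenience wrapper; the same prediction can be made with the lower-level API. A minimal sketch, assuming the checkpoint loads with `AutoModelForMaskedLM`:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jefson08/kha-roberta")
model = AutoModelForMaskedLM.from_pretrained("jefson08/kha-roberta")

sentence = f"Nga dei u briew u ba {tokenizer.mask_token} bha."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of the mask token and take the five most likely fillers
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```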


---

## **Dependencies**

- [Transformers](https://huggingface.co/docs/transformers): Provides the pipeline and model-loading utilities.
- [PyTorch](https://pytorch.org/): Backend framework for running the model.

Install the dependencies with:

```bash
pip install transformers torch
```

---

## **Acknowledgements**

- Hugging Face [Transformers](https://huggingface.co/docs/transformers) library.
- Model by [N Donald Jefferson Thabah](https://huggingface.co/jefson08/kha-roberta).

---

## **License**

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more details.

---