---
license: mit
---
# **Khasi Fill-Mask Model**

This project demonstrates how to use the Hugging Face Transformers library to perform a fill-mask task using the **`jefson08/kha-bert`** model. The fill-mask task predicts the most likely token(s) to replace the `[MASK]` token in a given sentence.

---

## **Usage**

### **1. Import Dependencies**

```python
from transformers import pipeline
```

### **2. Initialize the Model and Tokenizer**

Load the tokenizer and model pipeline:

```python
# Initialise the fill-mask pipeline.
# Use the full Hub model ID, including the "jefson08/" namespace.
fill_mask = pipeline(
    "fill-mask",
    model="jefson08/kha-bert",
    tokenizer="jefson08/kha-bert",
)
```

### **3. Predict the [MASK] Token**

Provide a sentence with a `[MASK]` token for prediction:

```python
# Predict the [MASK] token
sentence = "Nga dei u briew u ba [MASK] bha."
predictions = fill_mask(sentence)

# Display predictions
for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")
```

---

## **Example Output**

Given the input sentence:

```plaintext
"Nga dei u briew u ba [MASK] bha."
```

the raw `predictions` list returned by `fill_mask(sentence)` might look like:

```plaintext
[{'score': 0.05552137270569801,
  'token': 668,
  'token_str': 'kham',
  'sequence': 'Nga dei u briew u ba kham bha.'},
 {'score': 0.03611050173640251,
  'token': 2318,
  'token_str': 'kmen',
  'sequence': 'Nga dei u briew u ba kmen bha.'},
 {'score': 0.029321255162358284,
  'token': 3612,
  'token_str': 'tbit',
  'sequence': 'Nga dei u briew u ba tbit bha.'},
 {'score': 0.028406640514731407,
  'token': 1860,
  'token_str': 'ieit',
  'sequence': 'Nga dei u briew u ba ieit bha.'},
 {'score': 0.027690021321177483,
  'token': 4187,
  'token_str': 'sarong',
  'sequence': 'Nga dei u briew u ba sarong bha.'}]
```
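
Each entry in this raw list pairs a candidate token with its probability; the display loop in step 3 simply formats these entries. A minimal standalone sketch of that mapping, reusing the first two example entries above (no model download needed):

```python
# Two entries copied from the example predictions above.
predictions = [
    {"score": 0.05552137270569801, "token": 668, "token_str": "kham",
     "sequence": "Nga dei u briew u ba kham bha."},
    {"score": 0.03611050173640251, "token": 2318, "token_str": "kmen",
     "sequence": "Nga dei u briew u ba kmen bha."},
]

# Format each candidate the way the display loop in step 3 does.
for prediction in predictions:
    print(f"{prediction['sequence']} (score: {prediction['score']:.4f})")

# The entry with the highest score is the model's best guess for [MASK].
best = max(predictions, key=lambda p: p["score"])
print(best["token_str"])  # → kham
```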

---

## **Model Information**

The `jefson08/kha-bert` model is fine-tuned for Khasi text tasks. It uses the fill-mask pipeline to predict and replace `[MASK]` tokens in sentences, providing insights into contextual language understanding.
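
Each `score` in the example output is a probability obtained by applying a softmax to the model's logits over its vocabulary at the masked position. A toy sketch of that normalization step, using made-up logits and a hypothetical four-token vocabulary rather than the real model's:

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a tiny 4-token vocabulary at the [MASK] position
# (illustrative values only, not taken from kha-bert).
vocab = ["kham", "kmen", "tbit", "bha"]
logits = [2.1, 1.7, 1.5, 0.2]

probs = softmax(logits)
for token, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{token}: {p:.4f}")
```

The highest logit yields the highest probability, which is why the fill-mask pipeline's top suggestion is the token the model considers most plausible in context.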

---

## **Dependencies**

- [Transformers](https://huggingface.co/docs/transformers): Provides the pipeline and model-loading utilities.
- [PyTorch](https://pytorch.org/): Backend framework for running the model.

Install the dependencies with:

```bash
pip install transformers torch
```

---

## **Acknowledgements**

- Hugging Face [Transformers](https://huggingface.co/docs/transformers) library.
- Model by [N Donald Jefferson Thabah](https://huggingface.co/jefson08/kha-roberta).

---

## **License**

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for more details.

---