raygx committed on
Commit
3cca050
·
1 Parent(s): deb3194

gpt2NepaliCasualLM;Epoch 3

Files changed (4)
  1. README.md +13 -50
  2. config.json +3 -2
  3. generation_config.json +3 -0
  4. tf_model.h5 +1 -1
README.md CHANGED
@@ -1,21 +1,9 @@
 ---
 model-index:
 - name: GPT2-Nepali-Casual-LM
   results: []
-datasets:
-- raygx/NepaliCorpus
-language:
-- ne
-pipeline_tag: text-generation
-license: gpl
-metrics:
-- accuracy
-library_name: transformers
-tags:
-- text
-- nepali
-- nepali text generation
-- gpt2 for nepali
 ---
 
 <!-- This model card has been generated automatically according to the information Keras had access to. You should
@@ -23,62 +11,37 @@ probably proofread and complete it, then remove this comment. -->
 
 # GPT2-Nepali-Casual-LM
 
-This model was trained from scratch on the raygx/NepaliCorpus dataset.
-It is a causal language model for the Nepali language.
 
 
-<!-- ## Model description
 
-More information needed -->
 
 ## Intended uses & limitations
 
-This can be used as a pretrained model for fine-tuning on other tasks within the Nepali-language domain.
 
 ## Training and evaluation data
 
-The training dataset is available on the Hugging Face Hub as raygx/NepaliCorpus; it combines data from multiple sources, assembled specifically to train this model.
-[This Kaggle notebook](https://www.kaggle.com/code/reganmaharjan/tokenizer-nepcov19tweets) lists the original sources of the data.
 
 ## Training procedure
 
-I trained the model on Kaggle. Because Kaggle limits session time and GPU usage,
-I had to train the model across multiple batches of data and multiple training sessions.
-Beyond that, I followed the Hugging Face course.
-
 ### Training hyperparameters
 
-The hyperparameters are the same as those suggested in the Hugging Face course.
 
 ### Training results
 
-Calling the pipeline with the following inputs: <br>
-model_pipeline(["बिहीबार सिंगापुरदेखि न्यूयोर्कसम्म",<br>
-"अधिकांस दोस्रो रोजाइका खेलाडी",<br>
-"पहिलो हाफमा गरेको ",<br>
-"कालीमाटी फलफूल तथा तरकारी ",<br>
-"हिल्सा नाका हुँदै यस वर्ष ९ हजार",<br>
-"ओलीको सरकार बनेपछि इन्धनको",<br>
-"मेरो नाम श्याम हो"])
-<br><br>
-gave the following outputs:<br>
-[[{'generated_text': 'बिहीबार सिंगापुरदेखि न्यूयोर्कसम्म पनि छन् । तर, यो खबर आजको कान्तिपुर दैनिकमा छ । यो खबर आजको'}],<br>
-[{'generated_text': 'अधिकांस दोस्रो रोजाइका खेलाडी हुन् । उनी भन्छन्, ‘ यो कुरा हो । ’ उनले भने, ‘'}],<br>
-[{'generated_text': 'पहिलो हाफमा गरेको थियो । तर, यो खबर आजको कान्तिपुर दैनिकमा छ । यो खबर आजको कान्तिपुर दैनिकमा छ'}],<br>
-[{'generated_text': 'कालीमाटी फलफूल तथा तरकारी खेती गर्न सकिने व्यवस्था गरिएको छ । काठमाडौं । नेपाल राष्ट्र बैंकले गत आर्थिक वर्षमा १'}],<br>
-[{'generated_text': 'हिल्सा नाका हुँदै यस वर्ष ९ हजार ७ सय ७ ० रुपैयाँ बराबरको शेयर कारोबार भएको छ । यो अवधिमा'}],<br>
-[{'generated_text': 'ओलीको सरकार बनेपछि इन्धनको मूल्य निर्धारण गर्न नसकेको हो । तर, अहिले पनि यो मूल्य निर्धारण गर्न नसकेको हो'}],<br>
-[{'generated_text': 'मेरो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो ।'}]]<br>
-
-In my opinion, these are quite good results.
-Since the dataset was mostly crawled from news portals, the model seems to have caught the gist of news text,
-as can be noted for all the inputs except the last one.<br>
-For the last input, "मेरो नाम श्याम हो", the model recognized the end of the sentence and appended "।",
-and it also seems to know what "श्याम" is, but it was unable to generate a new sentence and kept repeating the same one.
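The repetition noted above for the last sample can be checked mechanically. A minimal sketch in plain Python, using the generated text quoted above (the split on the Nepali danda "।" is illustrative, not part of the model's API):

```python
from collections import Counter

# Generated text from the last example above
generated = "मेरो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो ।"

# Split on the Nepali sentence terminator (danda) and drop empty fragments
sentences = [s.strip() for s in generated.split("।") if s.strip()]

# Count how often each distinct sentence occurs
counts = Counter(sentences)
print(counts.most_common(1))  # the repeated sentence dominates
```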
 
 ### Framework versions
 
 - Transformers 4.28.1
 - TensorFlow 2.11.0
 - Datasets 2.1.0
-- Tokenizers 0.13.3
 ---
+tags:
+- generated_from_keras_callback
 model-index:
 - name: GPT2-Nepali-Casual-LM
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information Keras had access to. You should
 
 # GPT2-Nepali-Casual-LM
 
+This model was trained from scratch on an unknown dataset.
+It achieves the following results on the evaluation set:
 
 
+## Model description
 
+More information needed
 
 ## Intended uses & limitations
 
+More information needed
 
 ## Training and evaluation data
 
+More information needed
 
 ## Training procedure
 
 ### Training hyperparameters
 
+The following hyperparameters were used during training:
+- optimizer: None
+- training_precision: float32
 
 ### Training results
 
+
 
 ### Framework versions
 
 - Transformers 4.28.1
 - TensorFlow 2.11.0
 - Datasets 2.1.0
+- Tokenizers 0.13.3
config.json CHANGED
@@ -6,9 +6,9 @@
   "GPT2LMHeadModel"
 ],
 "attn_pdrop": 0.1,
-"bos_token_id": null,
 "embd_pdrop": 0.1,
-"eos_token_id": null,
 "id2label": {
   "0": "NEUTRAL",
   "1": "POSITIVE",
@@ -28,6 +28,7 @@
 "n_inner": null,
 "n_layer": 6,
 "n_positions": 1024,
 "reorder_and_upcast_attn": false,
 "resid_pdrop": 0.1,
 "scale_attn_by_inverse_layer_idx": false,
   "GPT2LMHeadModel"
 ],
 "attn_pdrop": 0.1,
+"bos_token_id": 1,
 "embd_pdrop": 0.1,
+"eos_token_id": 2,
 "id2label": {
   "0": "NEUTRAL",
   "1": "POSITIVE",
 
 "n_inner": null,
 "n_layer": 6,
 "n_positions": 1024,
+"pad_token_id": 3,
 "reorder_and_upcast_attn": false,
 "resid_pdrop": 0.1,
 "scale_attn_by_inverse_layer_idx": false,
generation_config.json CHANGED
@@ -1,4 +1,7 @@
 {
   "_from_model_config": true,
   "transformers_version": "4.28.1"
 }
 
 {
   "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 3,
   "transformers_version": "4.28.1"
 }
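The same three ids are mirrored into generation_config.json, so generation resolves them without falling back to the model config. A quick consistency check over the two files, with the values copied from the diffs above:

```python
import json

# Special-token ids as set in config.json by this commit
model_config_ids = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 3}

# generation_config.json after this commit
generation_config = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 3,
  "transformers_version": "4.28.1"
}
""")

# Every special-token id in the model config must match the generation config
mismatches = {k: (v, generation_config.get(k))
              for k, v in model_config_ids.items()
              if generation_config.get(k) != v}
assert not mismatches, mismatches
print("special-token ids consistent")
```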
tf_model.h5 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dbb9678235885dc3bcbdb14c2779d74c6611b067eacec4b36f8f89b0d19bca8d
 size 326955968
 
 version https://git-lfs.github.com/spec/v1
+oid sha256:09dd449b2776c96a5de70924d183f332ca28f8ace94cc3d7aa1d7c362d146a4a
 size 326955968
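A Git LFS pointer file like the one above stores only the object's SHA-256 digest and size, not the weights themselves; the commit swaps the digest because the retrained tf_model.h5 has new contents of identical size. A minimal sketch parsing the new pointer:

```python
# The new LFS pointer for tf_model.h5, as shown in the diff above
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:09dd449b2776c96a5de70924d183f332ca28f8ace94cc3d7aa1d7c362d146a4a
size 326955968
"""

# Each pointer line is "<key> <value>"
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())

# The oid field is "<hash-algorithm>:<hex digest>"
algo, digest = fields["oid"].split(":", 1)
assert algo == "sha256" and len(digest) == 64

print(fields["size"])  # 326955968
```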