raygx committed on
Commit
3cca050
·
1 Parent(s): deb3194

gpt2NepaliCasualLM;Epoch 3

Files changed (4)
  1. README.md +13 -50
  2. config.json +3 -2
  3. generation_config.json +3 -0
  4. tf_model.h5 +1 -1
README.md CHANGED
@@ -1,21 +1,9 @@
 ---
 model-index:
 - name: GPT2-Nepali-Casual-LM
   results: []
-datasets:
-- raygx/NepaliCorpus
-language:
-- ne
-pipeline_tag: text-generation
-license: gpl
-metrics:
-- accuracy
-library_name: transformers
-tags:
-- text
-- nepali
-- nepali text generation
-- gpt2 for nepali
 ---
 
 <!-- This model card has been generated automatically according to the information Keras had access to. You should
@@ -23,62 +11,37 @@ probably proofread and complete it, then remove this comment. -->
 
 # GPT2-Nepali-Casual-LM
 
-This model was trained from scratch on the raygx/NepaliCorpus dataset.
-It is a causal language model for the Nepali language.
 
 
-<!-- ## Model description
 
-More information needed -->
 
 ## Intended uses & limitations
 
-This can be used as a pretrained model for fine-tuning on other tasks within the Nepali-language domain.
 
 ## Training and evaluation data
 
-The training dataset is available on the Hugging Face Hub as raygx/NepaliCorpus; it combines data from multiple sources, assembled specifically to train this model.
-[This Kaggle notebook](https://www.kaggle.com/code/reganmaharjan/tokenizer-nepcov19tweets) lists the original sources of the data.
 
 ## Training procedure
 
-I trained the model on Kaggle. Because Kaggle limits session time and GPU usage,
-I had to train the model across multiple batches of data and multiple training sessions.
-Beyond that, I followed the Hugging Face course.
-
 ### Training hyperparameters
 
-The hyperparameters are the same as those suggested in the Hugging Face course.
 
 ### Training results
 
-Calling the pipeline with the following inputs: <br>
-model_pipeline(["बिहीबार सिंगापुरदेखि न्यूयोर्कसम्म",<br>
-"अधिकांस दोस्रो रोजाइका खेलाडी",<br>
-"पहिलो हाफमा गरेको ",<br>
-"कालीमाटी फलफूल तथा तरकारी ",<br>
-"हिल्सा नाका हुँदै यस वर्ष ९ हजार",<br>
-"ओलीको सरकार बनेपछि इन्धनको",<br>
-"मेरो नाम श्याम हो"])
-<br><br>
-gave the following outputs:<br>
-[[{'generated_text': 'बिहीबार सिंगापुरदेखि न्यूयोर्कसम्म पनि छन् । तर, यो खबर आजको कान्तिपुर दैनिकमा छ । यो खबर आजको'}],<br>
-[{'generated_text': 'अधिकांस दोस्रो रोजाइका खेलाडी हुन् । उनी भन्छन्, ‘ यो कुरा हो । ’ उनले भने, ‘'}],<br>
-[{'generated_text': 'पहिलो हाफमा गरेको थियो । तर, यो खबर आजको कान्तिपुर दैनिकमा छ । यो खबर आजको कान्तिपुर दैनिकमा छ'}],<br>
-[{'generated_text': 'कालीमाटी फलफूल तथा तरकारी खेती गर्न सकिने व्यवस्था गरिएको छ । काठमाडौं । नेपाल राष्ट्र बैंकले गत आर्थिक वर्षमा १'}],<br>
-[{'generated_text': 'हिल्सा नाका हुँदै यस वर्ष ९ हजार ७ सय ७ ० रुपैयाँ बराबरको शेयर कारोबार भएको छ । यो अवधिमा'}],<br>
-[{'generated_text': 'ओलीको सरकार बनेपछि इन्धनको मूल्य निर्धारण गर्न नसकेको हो । तर, अहिले पनि यो मूल्य निर्धारण गर्न नसकेको हो'}],<br>
-[{'generated_text': 'मेरो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो ।'}]]<br>
-
-In my opinion, these are quite good results.
-Since the dataset was mostly crawled from news portals, the model seems to have caught the gist of news text,
-as can be noted for all the inputs except the last one.<br>
-For the last input, "मेरो नाम श्याम हो", the model recognized the end of the sentence and appended "।",
-and it also seems to know what "श्याम" is, but it was unable to generate a new sentence and kept repeating the same one.
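The repetition noted above for the last sample can be checked mechanically. A minimal sketch in plain Python, using the generated text quoted above (the split on the Nepali danda "।" is illustrative, not part of the model's API):

```python
from collections import Counter

# Generated text from the last example above
generated = "मेरो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो । यो नाम श्याम हो ।"

# Split on the Nepali sentence terminator (danda) and drop empty fragments
sentences = [s.strip() for s in generated.split("।") if s.strip()]

# Count how often each distinct sentence occurs
counts = Counter(sentences)
print(counts.most_common(1))  # the repeated sentence dominates
```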
 
 ### Framework versions
 
 - Transformers 4.28.1
 - TensorFlow 2.11.0
 - Datasets 2.1.0
-- Tokenizers 0.13.3
 ---
+tags:
+- generated_from_keras_callback
 model-index:
 - name: GPT2-Nepali-Casual-LM
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information Keras had access to. You should
 
 # GPT2-Nepali-Casual-LM
 
+This model was trained from scratch on an unknown dataset.
+It achieves the following results on the evaluation set:
 
 
+## Model description
 
+More information needed
 
 ## Intended uses & limitations
 
+More information needed
 
 ## Training and evaluation data
 
+More information needed
 
 ## Training procedure
 
 ### Training hyperparameters
 
+The following hyperparameters were used during training:
+- optimizer: None
+- training_precision: float32
 
 ### Training results
 
+
 
 ### Framework versions
 
 - Transformers 4.28.1
 - TensorFlow 2.11.0
 - Datasets 2.1.0
+- Tokenizers 0.13.3
config.json CHANGED
@@ -6,9 +6,9 @@
   "GPT2LMHeadModel"
 ],
 "attn_pdrop": 0.1,
-"bos_token_id": null,
 "embd_pdrop": 0.1,
-"eos_token_id": null,
 "id2label": {
   "0": "NEUTRAL",
   "1": "POSITIVE",
@@ -28,6 +28,7 @@
 "n_inner": null,
 "n_layer": 6,
 "n_positions": 1024,
 "reorder_and_upcast_attn": false,
 "resid_pdrop": 0.1,
 "scale_attn_by_inverse_layer_idx": false,
   "GPT2LMHeadModel"
 ],
 "attn_pdrop": 0.1,
+"bos_token_id": 1,
 "embd_pdrop": 0.1,
+"eos_token_id": 2,
 "id2label": {
   "0": "NEUTRAL",
   "1": "POSITIVE",
 
 "n_inner": null,
 "n_layer": 6,
 "n_positions": 1024,
+"pad_token_id": 3,
 "reorder_and_upcast_attn": false,
 "resid_pdrop": 0.1,
 "scale_attn_by_inverse_layer_idx": false,
generation_config.json CHANGED
@@ -1,4 +1,7 @@
 {
   "_from_model_config": true,
   "transformers_version": "4.28.1"
 }
 
 {
   "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 3,
   "transformers_version": "4.28.1"
 }
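The same three ids are mirrored into generation_config.json, so generation resolves them without falling back to the model config. A quick consistency check over the two files, with the values copied from the diffs above:

```python
import json

# Special-token ids as set in config.json by this commit
model_config_ids = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 3}

# generation_config.json after this commit
generation_config = json.loads("""
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 3,
  "transformers_version": "4.28.1"
}
""")

# Every special-token id in the model config must match the generation config
mismatches = {k: (v, generation_config.get(k))
              for k, v in model_config_ids.items()
              if generation_config.get(k) != v}
assert not mismatches, mismatches
print("special-token ids consistent")
```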
tf_model.h5 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:dbb9678235885dc3bcbdb14c2779d74c6611b067eacec4b36f8f89b0d19bca8d
 size 326955968
 
 version https://git-lfs.github.com/spec/v1
+oid sha256:09dd449b2776c96a5de70924d183f332ca28f8ace94cc3d7aa1d7c362d146a4a
 size 326955968
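A Git LFS pointer file like the one above stores only the object's SHA-256 digest and size, not the weights themselves; the commit swaps the digest because the retrained tf_model.h5 has new contents of identical size. A minimal sketch parsing the new pointer:

```python
# The new LFS pointer for tf_model.h5, as shown in the diff above
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:09dd449b2776c96a5de70924d183f332ca28f8ace94cc3d7aa1d7c362d146a4a
size 326955968
"""

# Each pointer line is "<key> <value>"
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())

# The oid field is "<hash-algorithm>:<hex digest>"
algo, digest = fields["oid"].split(":", 1)
assert algo == "sha256" and len(digest) == 64

print(fields["size"])  # 326955968
```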