Surbhipatil committed
Commit 4c9fb93 · 1 parent: 85bfb11

Training in progress epoch 0

Files changed (3)
  1. README.md +15 -17
  2. config.json +0 -1
  3. tf_model.h5 +3 -0
README.md CHANGED
@@ -3,18 +3,22 @@ library_name: transformers
  license: mit
  base_model: gpt2
  tags:
- - generated_from_trainer
+ - generated_from_keras_callback
  model-index:
- - name: codeparrot-ds
+ - name: Surbhipatil/codeparrot-ds
    results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
+ <!-- This model card has been generated automatically according to the information Keras had access to. You should
+ probably proofread and complete it, then remove this comment. -->

- # codeparrot-ds
+ # Surbhipatil/codeparrot-ds

  This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Train Loss: 8.5497
+ - Validation Loss: 6.7307
+ - Epoch: 0

  ## Model description

@@ -33,25 +37,19 @@
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0005
- - train_batch_size: 32
- - eval_batch_size: 32
- - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 256
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 1000
- - num_epochs: 1
- - mixed_precision_training: Native AMP
+ - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'module': 'transformers.optimization_tf', 'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'module': 'keras.optimizers.schedules', 'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': -562, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'registered_name': None}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}, 'registered_name': 'WarmUp'}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
+ - training_precision: mixed_float16

  ### Training results

+ | Train Loss | Validation Loss | Epoch |
+ |:----------:|:---------------:|:-----:|
+ | 8.5497 | 6.7307 | 0 |

  ### Framework versions

  - Transformers 4.48.3
- - Pytorch 2.5.1+cu124
+ - TensorFlow 2.18.0
  - Datasets 3.3.2
  - Tokenizers 0.21.0
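The serialized optimizer added above (an `AdamWeightDecay` whose learning rate is a `WarmUp` wrapper around a `PolynomialDecay`) matches what the TF `create_optimizer` helper in transformers emits. A minimal sketch of recreating it; note that `num_train_steps=438` is inferred rather than recorded in the card, since `create_optimizer` sets `decay_steps = num_train_steps - num_warmup_steps`, which is -562 here:

```python
import tensorflow as tf
from transformers import create_optimizer

# Matches training_precision: mixed_float16 in the card above.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay with linear warmup into a polynomial (linear) decay.
# num_train_steps=438 is an assumption derived from decay_steps=-562 plus
# 1000 warmup steps; a negative decay_steps just means the run had fewer
# total steps than warmup steps.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_warmup_steps=1000,
    num_train_steps=438,
    weight_decay_rate=0.01,
)
```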
config.json CHANGED
@@ -32,7 +32,6 @@
        "max_length": 50
      }
    },
-   "torch_dtype": "float32",
    "transformers_version": "4.48.3",
    "use_cache": true,
    "vocab_size": 50000
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8ed2e7fc1f75a66149be4271ba1f2d9f0f5c7cc12c3d537af22a8ca3415fa483
+ size 497145936
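Only the Git LFS pointer above is stored in the repository; the HDF5 weight file itself (497145936 bytes, roughly 497 MB) lives in LFS storage and is fetched transparently. A loading sketch, assuming the repo id above and that a tokenizer has been (or will be) pushed alongside the weights:

```python
from transformers import AutoTokenizer, TFGPT2LMHeadModel

repo_id = "Surbhipatil/codeparrot-ds"  # assumed Hub repo id

# from_pretrained resolves the LFS pointer and downloads tf_model.h5.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFGPT2LMHeadModel.from_pretrained(repo_id)

# max_length=50 mirrors the text-generation default in config.json.
inputs = tokenizer("def hello_world():", return_tensors="tf")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```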