IParraMartin/impossible-llms-english-random-fourgram

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+model-index:
+- name: impossible-llms-english-random-fourgram
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# impossible-llms-english-random-fourgram
+This model was trained with the Trainer on an unknown dataset; the generated card leaves the base model unspecified.
+It achieves the following results on the evaluation set:
+- Loss: 4.4904
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 12
+- eval_batch_size: 8
+- seed: 0
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 384
+- total_eval_batch_size: 32
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- training_steps: 3000
+- mixed_precision_training: Native AMP
+- label_smoothing_factor: 0.1
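The derived values in the list above (total batch sizes, and the warmup length implied by the warmup ratio) follow from the per-device settings. The sketch below recomputes them and evaluates a linear-warmup plus cosine-decay learning rate. The constant names and the `lr_at` helper are illustrative and not taken from the training script; the schedule formula is the common half-cosine shape and is assumed, not confirmed, to match what the Trainer used; the rounding of the warmup step count is likewise an assumption.

```python
import math

# Values copied from the hyperparameter list; the names are illustrative.
PER_DEVICE_TRAIN_BATCH = 12
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 8
TRAINING_STEPS = 3000
WARMUP_RATIO = 0.1
PEAK_LR = 1e-4

# Effective batch sizes: per-device batch x devices (x accumulation for training).
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Assumption: warmup steps are the warmup ratio applied to the total step count.
warmup_steps = round(TRAINING_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch)
    print("total_eval_batch_size:", total_eval_batch)
    print("warmup_steps:", warmup_steps)
    for step in (0, 150, 300, 1650, 3000):
        print(step, lr_at(step))
```

Under these assumptions the learning rate rises to its peak at the end of warmup and decays to zero at the final training step.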
+### Training results
+| Training Loss | Epoch   | Step | Validation Loss |
+|:-------------:|:-------:|:----:|:---------------:|
+| 21.2318       | 1.0     | 96   | 7.0318          |
+| 17.3066       | 2.0     | 192  | 5.8019          |
+| 16.9078       | 3.0     | 288  | 5.6130          |
+| 16.3046       | 4.0     | 384  | 5.3811          |
+| 15.7736       | 5.0     | 480  | 5.1867          |
+| 15.2597       | 6.0     | 576  | 5.0483          |
+| 14.9974       | 7.0     | 672  | 4.9447          |
+| 14.6845       | 8.0     | 768  | 4.8589          |
+| 14.5313       | 9.0     | 864  | 4.7952          |
+| 14.425        | 10.0    | 960  | 4.7419          |
+| 14.09         | 11.0    | 1056 | 4.6954          |
+| 13.959        | 12.0    | 1152 | 4.6586          |
+| 13.9513       | 13.0    | 1248 | 4.6308          |
+| 13.7675       | 14.0    | 1344 | 4.6051          |
+| 13.6601       | 15.0    | 1440 | 4.5844          |
+| 13.5687       | 16.0    | 1536 | 4.5667          |
+| 13.5257       | 17.0    | 1632 | 4.5534          |
+| 13.4789       | 18.0    | 1728 | 4.5398          |
+| 13.4417       | 19.0    | 1824 | 4.5290          |
+| 13.3908       | 20.0    | 1920 | 4.5210          |
+| 13.307        | 21.0    | 2016 | 4.5132          |
+| 13.3016       | 22.0    | 2112 | 4.5081          |
+| 13.2893       | 23.0    | 2208 | 4.5023          |
+| 13.2032       | 24.0    | 2304 | 4.4990          |
+| 13.1012       | 25.0    | 2400 | 4.4962          |
+| 13.1562       | 26.0    | 2496 | 4.4939          |
+| 12.9843       | 27.0    | 2592 | 4.4923          |
+| 13.0885       | 28.0    | 2688 | 4.4913          |
+| 13.0813       | 29.0    | 2784 | 4.4908          |
+| 13.1086       | 30.0    | 2880 | 4.4904          |
+| 13.1765       | 31.0    | 2976 | 4.4904          |
+| 34.9023       | 31.2516 | 3000 | 4.4904          |
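A few quantities can be read off the table with simple arithmetic: the number of optimizer steps per epoch, how far into the final epoch training stopped, and the overall drop in validation loss. The sketch below uses only numbers that appear in the table; the variable names are illustrative. The step-based epoch count is an approximation and need not match the fractional epoch value reported in the last row exactly.

```python
# Numbers copied from the training results table; names are illustrative.
STEP_AT_EPOCH_1 = 96
LAST_FULL_EPOCH_STEP = 2976
FINAL_STEP = 3000
FIRST_VAL_LOSS = 7.0318
FINAL_VAL_LOSS = 4.4904

# One evaluation per epoch, so the first logged step gives steps per epoch.
steps_per_epoch = STEP_AT_EPOCH_1

# Approximate epochs completed when training stopped at the step limit.
epochs_at_end = FINAL_STEP / steps_per_epoch

# Steps taken in the final, partial epoch.
steps_in_partial_epoch = FINAL_STEP - LAST_FULL_EPOCH_STEP

# Absolute improvement in validation loss from the first to the last evaluation.
val_loss_drop = FIRST_VAL_LOSS - FINAL_VAL_LOSS

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("approx. epochs at final step:", epochs_at_end)
    print("steps in partial epoch:", steps_in_partial_epoch)
    print("validation loss drop:", round(val_loss_drop, 4))
```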
+### Framework versions
+- Transformers 4.49.0
+- Pytorch 2.4.0+cu121
+- Datasets 3.4.0
+- Tokenizers 0.21.0

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "transformers_version": "4.49.0"
+}
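As a quick check of the file above, the following sketch parses the same JSON with the standard library and confirms that the beginning-of-sequence and end-of-sequence token IDs are identical. The JSON is inlined as a string so the example is self-contained; in practice one would read `generation_config.json` from the repository. The variable names are illustrative.

```python
import json

# Contents of generation_config.json as shown in the diff, inlined for the example.
RAW_CONFIG = """
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.49.0"
}
"""

config = json.loads(RAW_CONFIG)

# BOS and EOS share one token ID, so a single special token marks both
# the start and the end of a sequence during generation.
shares_bos_eos = config["bos_token_id"] == config["eos_token_id"]

if __name__ == "__main__":
    print("bos_token_id:", config["bos_token_id"])
    print("eos_token_id:", config["eos_token_id"])
    print("BOS and EOS share an ID:", shares_bos_eos)
```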