IParraMartin/impossible-llms-english-random

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+model-index:
+- name: impossible-llms-english-random
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# impossible-llms-english-random
+This model is a fine-tuned version of an unspecified base model on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 5.0462
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 12
+- eval_batch_size: 8
+- seed: 0
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 384
+- total_eval_batch_size: 32
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- training_steps: 3000
+- mixed_precision_training: Native AMP
+- label_smoothing_factor: 0.1
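The derived values in the list above (total batch sizes, and the warmup length implied by the warmup ratio) follow from simple arithmetic. The sketch below reproduces them in plain Python. The `lr_at` helper is hypothetical and only illustrates the usual shape of a cosine schedule with linear warmup; it assumes the schedule decays to zero at the final step, which the card does not state.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 12            # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 8
training_steps = 3000
warmup_ratio = 0.1
base_lr = 1e-4

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Number of warmup steps implied by the warmup ratio.
warmup_steps = round(training_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size, warmup_steps)
for step in (0, 150, 300, 1650, 3000):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at `base_lr` once warmup ends and reaches zero at the last training step.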
+### Training results
+| Training Loss | Epoch   | Step | Validation Loss |
+|:-------------:|:-------:|:----:|:---------------:|
+| 35.913        | 1.0     | 95   | 7.1506          |
+| 30.1513       | 2.0     | 190  | 5.9945          |
+| 29.2184       | 3.0     | 285  | 5.8395          |
+| 28.5279       | 4.0     | 380  | 5.6620          |
+| 27.8359       | 5.0     | 475  | 5.5429          |
+| 27.5482       | 6.0     | 570  | 5.4532          |
+| 27.0829       | 7.0     | 665  | 5.3803          |
+| 26.7397       | 8.0     | 760  | 5.3227          |
+| 26.4572       | 9.0     | 855  | 5.2749          |
+| 26.2057       | 10.0    | 950  | 5.2360          |
+| 25.9724       | 11.0    | 1045 | 5.2010          |
+| 25.7457       | 12.0    | 1140 | 5.1755          |
+| 25.7047       | 13.0    | 1235 | 5.1526          |
+| 25.5117       | 14.0    | 1330 | 5.1328          |
+| 25.3094       | 15.0    | 1425 | 5.1168          |
+| 25.0625       | 16.0    | 1520 | 5.1017          |
+| 24.9048       | 17.0    | 1615 | 5.0899          |
+| 25.1186       | 18.0    | 1710 | 5.0804          |
+| 25.0563       | 19.0    | 1805 | 5.0721          |
+| 24.8198       | 20.0    | 1900 | 5.0669          |
+| 24.7689       | 21.0    | 1995 | 5.0611          |
+| 24.8698       | 22.0    | 2090 | 5.0565          |
+| 24.5199       | 23.0    | 2185 | 5.0543          |
+| 24.8015       | 24.0    | 2280 | 5.0501          |
+| 24.4517       | 25.0    | 2375 | 5.0494          |
+| 24.5355       | 26.0    | 2470 | 5.0486          |
+| 24.5157       | 27.0    | 2565 | 5.0473          |
+| 24.6138       | 28.0    | 2660 | 5.0470          |
+| 24.4382       | 29.0    | 2755 | 5.0465          |
+| 24.4547       | 30.0    | 2850 | 5.0463          |
+| 24.4558       | 31.0    | 2945 | 5.0462          |
+| 39.0136       | 31.5812 | 3000 | 5.0462          |
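To make the validation losses in the table easier to compare, the sketch below converts a few of them to `exp(loss)`. This assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state. If the loss also includes the label-smoothing term (`label_smoothing_factor: 0.1`), the result is not a true perplexity and should be read only as a relative indicator. The `perplexity` helper name is illustrative.

```python
import math

# (step, validation loss) pairs copied from the table above.
checkpoints = [(95, 7.1506), (950, 5.2360), (1900, 5.0669), (3000, 5.0462)]


def perplexity(loss_nats: float) -> float:
    """exp(loss), assuming the loss is a mean per-token cross-entropy in nats."""
    return math.exp(loss_nats)


for step, loss in checkpoints:
    print(f"step {step:>4}: loss {loss:.4f} -> exp(loss) {perplexity(loss):.1f}")
```

The validation loss changes by less than 0.001 over the last several evaluations, so the corresponding `exp(loss)` values are nearly identical by the end of training.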
+### Framework versions
+- Transformers 4.49.0
+- Pytorch 2.4.0+cu121
+- Datasets 3.4.0
+- Tokenizers 0.21.0

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "transformers_version": "4.49.0"
+}
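The config above assigns id 0 to both `bos_token_id` and `eos_token_id`, so a single token marks both the start and the end of a sequence. A minimal sketch using only the standard library shows how to check this; the JSON is copied inline from the file above, and the variable names are illustrative.

```python
import json

# Contents of generation_config.json, copied from the diff above.
config_text = """
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.49.0"
}
"""

config = json.loads(config_text)

# True when one token id serves as both beginning- and end-of-sequence marker.
shared_bos_eos = config["bos_token_id"] == config["eos_token_id"]

print("bos:", config["bos_token_id"], "eos:", config["eos_token_id"])
print("shared BOS/EOS token:", shared_bos_eos)
```

When reading generated ids from such a model, a leading id 0 is the start marker and any later id 0 signals the end of the sequence.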