---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: TBD-LLaMA-2B-Final-Direction-2B
  results: []
---

# TBD-LLaMA-2B-Final-Direction-2B

This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.8900

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 139
- training_steps: 13966

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 8.9472        | 0.0143 | 200   | 8.9381          |
| 6.7664        | 0.0286 | 400   | 6.7485          |
| 6.6429        | 0.0430 | 600   | 6.6299          |
| 6.5725        | 0.0573 | 800   | 6.5598          |
| 6.4746        | 0.0716 | 1000  | 6.4666          |
| 6.345         | 0.0859 | 1200  | 6.3290          |
| 6.1452        | 0.1002 | 1400  | 6.1231          |
| 5.9711        | 0.1146 | 1600  | 5.9283          |
| 5.8076        | 0.1289 | 1800  | 5.7896          |
| 5.718         | 0.1432 | 2000  | 5.6944          |
| 5.6422        | 0.1575 | 2200  | 5.6219          |
| 5.5956        | 0.1718 | 2400  | 5.5653          |
| 5.5424        | 0.1862 | 2600  | 5.5163          |
| 5.4527        | 0.2005 | 2800  | 5.4252          |
| 4.7472        | 0.2148 | 3000  | 4.6523          |
| 4.5528        | 0.2291 | 3200  | 4.4846          |
| 4.503         | 0.2434 | 3400  | 4.3817          |
| 4.427         | 0.2578 | 3600  | 4.3165          |
| 4.4322        | 0.2721 | 3800  | 4.2725          |
| 4.3265        | 0.2864 | 4000  | 4.2409          |
| 4.3255        | 0.3007 | 4200  | 4.2157          |
| 4.322         | 0.3150 | 4400  | 4.1930          |
| 4.1982        | 0.3294 | 4600  | 4.1759          |
| 4.2197        | 0.3437 | 4800  | 4.1609          |
| 4.2109        | 0.3580 | 5000  | 4.1478          |
| 4.1553        | 0.3723 | 5200  | 4.1329          |
| 4.169         | 0.3866 | 5400  | 4.1215          |
| 4.2068        | 0.4010 | 5600  | 4.1093          |
| 4.182         | 0.4153 | 5800  | 4.0969          |
| 4.2148        | 0.4296 | 6000  | 4.0841          |
| 4.0511        | 0.4439 | 6200  | 4.0716          |
| 4.0997        | 0.4582 | 6400  | 4.0592          |
| 4.0322        | 0.4726 | 6600  | 4.0488          |
| 3.9972        | 0.4869 | 6800  | 4.0372          |
| 4.0335        | 0.5012 | 7000  | 4.0258          |
| 4.0742        | 0.5155 | 7200  | 4.0168          |
| 4.003         | 0.5298 | 7400  | 4.0082          |
| 4.0007        | 0.5442 | 7600  | 3.9992          |
| 4.1114        | 0.5585 | 7800  | 3.9898          |
| 3.8742        | 0.5728 | 8000  | 3.9831          |
| 4.0346        | 0.5871 | 8200  | 3.9765          |
| 3.8871        | 0.6014 | 8400  | 3.9686          |
| 3.9689        | 0.6158 | 8600  | 3.9626          |
| 4.0003        | 0.6301 | 8800  | 3.9580          |
| 4.0529        | 0.6444 | 9000  | 3.9496          |
| 3.9973        | 0.6587 | 9200  | 3.9456          |
| 4.0418        | 0.6730 | 9400  | 3.9409          |
| 4.0237        | 0.6874 | 9600  | 3.9355          |
| 3.9256        | 0.7017 | 9800  | 3.9299          |
| 3.8549        | 0.7160 | 10000 | 3.9249          |
| 3.9872        | 0.7303 | 10200 | 3.9215          |
| 3.9918        | 0.7446 | 10400 | 3.9180          |
| 4.0075        | 0.7590 | 10600 | 3.9137          |
| 3.9235        | 0.7733 | 10800 | 3.9107          |
| 3.9416        | 0.7876 | 11000 | 3.9069          |
| 3.9939        | 0.8019 | 11200 | 3.9053          |
| 4.0625        | 0.8162 | 11400 | 3.9030          |
| 3.9773        | 0.8306 | 11600 | 3.9010          |
| 3.8279        | 0.8449 | 11800 | 3.8990          |
| 3.8631        | 0.8592 | 12000 | 3.8970          |
| 3.8593        | 0.8735 | 12200 | 3.8953          |
| 3.9531        | 0.8878 | 12400 | 3.8938          |
| 3.8922        | 0.9022 | 12600 | 3.8927          |
| 3.9151        | 0.9165 | 12800 | 3.8917          |
| 3.9119        | 0.9308 | 13000 | 3.8910          |
| 3.9261        | 0.9451 | 13200 | 3.8905          |
| 3.9169        | 0.9594 | 13400 | 3.8903          |
| 3.8439        | 0.9738 | 13600 | 3.8900          |
| 3.8795        | 0.9881 | 13800 | 3.8900          |

### Framework versions

- Transformers 4.56.1
- Pytorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0
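The per-device settings, the distributed setup, and the reported totals in the hyperparameter list are mutually consistent; a minimal sketch (plain Python, all values copied from the list above) checking the arithmetic behind the effective batch sizes and the warmup fraction of the cosine schedule:

```python
# Per-device settings from the hyperparameter list above.
train_batch_size = 1            # per-device train micro-batch
eval_batch_size = 1             # per-device eval batch
num_devices = 4                 # multi-GPU data parallelism
gradient_accumulation_steps = 16

# Effective (total) batch sizes, matching the reported values.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size)   # 64
print(total_eval_batch_size)    # 4

# Fraction of training spent warming up before cosine decay.
warmup_steps, training_steps = 139, 13966
print(f"{warmup_steps / training_steps:.3%}")  # 0.995%
```

So the 64-sample effective batch comes from accumulating 16 micro-batches on each of 4 GPUs, and the 139 warmup steps amount to roughly 1% of the 13,966 total training steps.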