Hitisha/orpo-phi3

Generated from Trainer

4-bit precision


Hitisha committed on Jul 16, 2024

Commit 3f58db4 · verified · 1 Parent(s): 84c851a

Model save

Files changed (1)

README.md +25 -3

@@ -14,9 +14,24 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # orpo-phi3
 This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
 ## Model description
@@ -36,16 +51,23 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 8e-06
-- train_batch_size: 2
-- eval_batch_size: 2
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 10
 - num_epochs: 1
 ### Framework versions
 - PEFT 0.11.1

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/12ettnkn)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/12ettnkn)
 # orpo-phi3
 This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 4.1677
+- Rewards/chosen: -0.4133
+- Rewards/rejected: -0.4133
+- Rewards/accuracies: 0.0
+- Rewards/margins: 0.0
+- Logps/rejected: -4.1330
+- Logps/chosen: -4.1330
+- Logits/rejected: 24.1632
+- Logits/chosen: 24.1632
+- Nll Loss: 4.0984
+- Log Odds Ratio: -0.6931
+- Log Odds Chosen: 0.0
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 8e-06
+- train_batch_size: 4
+- eval_batch_size: 4
 - seed: 42
 - gradient_accumulation_steps: 4
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 10
 - num_epochs: 1
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
+|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 3.1975        | 1.0   | 1    | 4.1677          | -0.4133        | -0.4133          | 0.0                | 0.0             | -4.1330        | -4.1330      | 24.1632         | 24.1632       | 4.0984   | -0.6931        | 0.0             |
 ### Framework versions
 - PEFT 0.11.1
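
The evaluation numbers added in this commit can be cross-checked against each other. The sketch below assumes the metrics follow the usual ORPO formulation (loss = NLL loss minus beta times the log-sigmoid of the chosen-vs-rejected log odds, and reward = beta times the log-probability). The value of beta is not stated in the card; it is inferred here from the reported reward and log-probability, so treat it as an assumption rather than a documented hyperparameter.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_loss(nll_loss: float, log_odds_chosen: float, beta: float) -> float:
    """ORPO objective as assumed here: NLL loss minus beta * log-sigmoid(log odds)."""
    return nll_loss - beta * log_sigmoid(log_odds_chosen)


# Values copied from the evaluation results in the model card.
nll_loss = 4.0984
log_odds_chosen = 0.0
logps_chosen = -4.1330
reward_chosen = -0.4133

# Assumption: beta is inferred as reward / log-probability; the card does not state it.
beta = reward_chosen / logps_chosen

log_odds_ratio = log_sigmoid(log_odds_chosen)
loss = orpo_loss(nll_loss, log_odds_chosen, beta)

print(f"beta ~ {beta:.4f}")                      # beta ~ 0.1000
print(f"log odds ratio ~ {log_odds_ratio:.4f}")  # log odds ratio ~ -0.6931
print(f"loss ~ {loss:.4f}")                      # loss ~ 4.1677
```

Under these assumptions the reported Loss, Nll Loss, and Log Odds Ratio agree with one another. The identical chosen and rejected log-probabilities give log odds of 0, which is consistent with the reported margin of 0.0 and accuracy of 0.0.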