Hitisha/orpo-phi3

@@ -14,22 +14,21 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/12ettnkn)
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/12ettnkn)
 # orpo-phi3
 This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 4.1677
-- Rewards/chosen: -0.4133
-- Rewards/rejected: -0.4133
 - Rewards/accuracies: 0.0
 - Rewards/margins: 0.0
-- Logps/rejected: -4.1330
-- Logps/chosen: -4.1330
-- Logits/rejected: 24.1632
-- Logits/chosen: 24.1632
-- Nll Loss: 4.0984
 - Log Odds Ratio: -0.6931
 - Log Odds Chosen: 0.0
@@ -65,7 +64,7 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 3.1975        | 1.0   | 1    | 4.1677          | -0.4133        | -0.4133          | 0.0                | 0.0             | -4.1330        | -4.1330      | 24.1632         | 24.1632       | 4.0984   | -0.6931        | 0.0             |
 ### Framework versions

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/8cgkk8bv)
 # orpo-phi3
 This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 3.9381
+- Rewards/chosen: -0.4248
+- Rewards/rejected: -0.4248
 - Rewards/accuracies: 0.0
 - Rewards/margins: 0.0
+- Logps/rejected: -4.2478
+- Logps/chosen: -4.2478
+- Logits/rejected: 18.4407
+- Logits/chosen: 18.4407
+- Nll Loss: 3.8688
 - Log Odds Ratio: -0.6931
 - Log Odds Chosen: 0.0
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 3.0778        | 1.0   | 1    | 3.9381          | -0.4248        | -0.4248          | 0.0                | 0.0             | -4.2478        | -4.2478      | 18.4407         | 18.4407       | 3.8688   | -0.6931        | 0.0             |
 ### Framework versions
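The summary metrics in both versions of the card are consistent with an ORPO objective. The sketch below rests on two assumptions that the card does not state. First, it assumes the card was produced by TRL's `ORPOTrainer`, since the metric names `Log Odds Ratio` and `Log Odds Chosen` match that trainer's logging. Second, it assumes beta = 0.1, inferred only because `Rewards/chosen` is exactly 0.1 x `Logps/chosen` in both runs. Under those assumptions, Loss = Nll Loss - beta * log sigmoid(log-odds). Because chosen and rejected log-probs are identical here, the log-odds are 0 and the Log Odds Ratio is log sigmoid(0) = -ln 2, about -0.6931. The helper names are hypothetical.

```python
import math

# Assumption: beta = 0.1, inferred from Rewards/chosen == 0.1 * Logps/chosen
# in both runs; the model card itself does not report beta.
BETA = 0.1


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_odds(chosen_logp: float, rejected_logp: float) -> float:
    """log(p_c / (1 - p_c)) - log(p_r / (1 - p_r)), with p = exp(logp)."""
    return (chosen_logp - rejected_logp) - (
        math.log1p(-math.exp(chosen_logp)) - math.log1p(-math.exp(rejected_logp))
    )


def orpo_metrics(nll_loss: float, chosen_logp: float, rejected_logp: float,
                 beta: float = BETA) -> dict:
    """Hypothetical helper: rebuild the card's summary metrics from NLL and logps."""
    lo = log_odds(chosen_logp, rejected_logp)
    ratio = log_sigmoid(lo)  # reported as "Log Odds Ratio"
    return {
        "loss": nll_loss - beta * ratio,        # NLL minus the weighted odds-ratio term
        "rewards_chosen": beta * chosen_logp,
        "rewards_rejected": beta * rejected_logp,
        "log_odds_ratio": ratio,
        "log_odds_chosen": lo,                  # reported as "Log Odds Chosen"
    }


# Nll Loss and Logps values copied from the old (-) and new (+) sides of the diff.
old = orpo_metrics(4.0984, -4.1330, -4.1330)
new = orpo_metrics(3.8688, -4.2478, -4.2478)

for name, metrics in (("old", old), ("new", new)):
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Running it reproduces the Loss, Rewards/chosen, Rewards/rejected, Log Odds Ratio and Log Odds Chosen values of both the old (run `12ettnkn`) and new (run `8cgkk8bv`) evaluation summaries to four decimals. Nll Loss and the Logits values are taken as inputs or left out, since they cannot be derived from the other reported numbers.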