Hitisha/orpo-phi3

@@ -14,21 +14,21 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/8cgkk8bv)
 # orpo-phi3
 This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.9381
-- Rewards/chosen: -0.4248
-- Rewards/rejected: -0.4248
 - Rewards/accuracies: 0.0
 - Rewards/margins: 0.0
-- Logps/rejected: -4.2478
-- Logps/chosen: -4.2478
-- Logits/rejected: 18.4407
-- Logits/chosen: 18.4407
-- Nll Loss: 3.8688
 - Log Odds Ratio: -0.6931
 - Log Odds Chosen: 0.0
@@ -64,7 +64,7 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
-| 3.0778        | 1.0   | 1    | 3.9381          | -0.4248        | -0.4248          | 0.0                | 0.0             | -4.2478        | -4.2478      | 18.4407         | 18.4407       | 3.8688   | -0.6931        | 0.0             |
 ### Framework versions

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/algo-llm/huggingface/runs/g44k7xr4)
 # orpo-phi3
 This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 5.4687
+- Rewards/chosen: -0.5200
+- Rewards/rejected: -0.5200
 - Rewards/accuracies: 0.0
 - Rewards/margins: 0.0
+- Logps/rejected: -5.2002
+- Logps/chosen: -5.2002
+- Logits/rejected: 23.0851
+- Logits/chosen: 23.0851
+- Nll Loss: 5.3994
 - Log Odds Ratio: -0.6931
 - Log Odds Chosen: 0.0
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 2.9122        | 1.0   | 1    | 5.4687          | -0.5200        | -0.5200          | 0.0                | 0.0             | -5.2002        | -5.2002      | 23.0851         | 23.0851       | 5.3994   | -0.6931        | 0.0             |
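
Reviewer note: in both versions of the card the chosen and rejected statistics are identical, which is why the margin, accuracy, and log odds stay at zero while only the NLL term moves. The sketch below recomputes the reported numbers under the standard ORPO formulation. It assumes the metrics follow that formulation and that `beta = 0.1`; neither is stated in this diff, and the beta value is inferred from the ratio between `Rewards/*` and `Logps/*`. The function name `orpo_metrics` is hypothetical.

```python
import math

# Assumption: beta is not shown in this diff; 0.1 is inferred from rewards / logps.
BETA = 0.1


def log_odds(logp: float) -> float:
    """log(p / (1 - p)) for a probability given as a log-probability."""
    return logp - math.log1p(-math.exp(logp))


def orpo_metrics(logp_chosen: float, logp_rejected: float,
                 nll_loss: float, beta: float = BETA) -> dict:
    """Recompute ORPO-style eval metrics from mean log-probabilities."""
    # Difference of log odds between the chosen and rejected responses.
    log_odds_chosen = log_odds(logp_chosen) - log_odds(logp_rejected)
    # log sigmoid(x) = -log(1 + exp(-x)); equals log(0.5) when x == 0.
    log_odds_ratio = -math.log1p(math.exp(-log_odds_chosen))
    reward_chosen = beta * logp_chosen
    reward_rejected = beta * logp_rejected
    return {
        "loss": nll_loss - beta * log_odds_ratio,
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": reward_chosen - reward_rejected,
        # A pair counts as correct only if chosen strictly beats rejected.
        "rewards/accuracies": float(reward_chosen > reward_rejected),
        "log_odds_ratio": log_odds_ratio,
        "log_odds_chosen": log_odds_chosen,
    }


if __name__ == "__main__":
    # Values copied from the removed (old) and added (new) lines of the diff.
    for label, logp, nll in [("old", -4.2478, 3.8688), ("new", -5.2002, 5.3994)]:
        metrics = orpo_metrics(logp, logp, nll)
        print(label, {k: round(v, 4) for k, v in metrics.items()})
```

Under these assumptions, the recomputed loss rounds to 3.9381 for the old run and 5.4687 for the new one, and the log odds ratio is log(0.5) ≈ -0.6931 in both. An accuracy of 0.0 with a margin of exactly 0.0 after a single step suggests the evaluation pairs have identical chosen and rejected responses, which may be worth checking before relying on either run.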
 ### Framework versions