- What is this? Nothing interesting, just an experiment.
- License: CC-BY-NC

Results (this model):

|                         Task                         |Version|    Metric    |Value |   |Stderr|
|------------------------------------------------------|------:|--------------|-----:|---|-----:|
|all                                                   |       |acc           |0.6502|±  |0.0327|
|                                                      |       |acc_norm      |0.6414|±  |0.0095|
|                                                      |       |truthfulqa_mc1|0.3696|±  |0.0169|
|                                                      |       |truthfulqa_mc2|0.5305|±  |0.0159|
|                                                      |       |qem           |0.4670|±  |0.0137|
|leaderboard:arc:challenge:25                          |      0|acc           |0.5555|±  |0.0145|
|                                                      |       |acc_norm      |0.5623|±  |0.0145|
|leaderboard:gsm8k:5                                   |      0|qem           |0.4670|±  |0.0137|
|leaderboard:hellaswag:10                              |      0|acc           |0.5598|±  |0.0050|
|                                                      |       |acc_norm      |0.7205|±  |0.0045|
|leaderboard:mmlu:_average:5                           |       |acc           |0.6527|±  |0.0338|
|leaderboard:mmlu:abstract_algebra:5                   |      0|acc           |0.3300|±  |0.0473|
|leaderboard:mmlu:anatomy:5                            |      0|acc           |0.6593|±  |0.0409|
|leaderboard:mmlu:astronomy:5                          |      0|acc           |0.7303|±  |0.0361|
|leaderboard:mmlu:business_ethics:5                    |      0|acc           |0.6700|±  |0.0473|
|leaderboard:mmlu:clinical_knowledge:5                 |      0|acc           |0.7321|±  |0.0273|
|leaderboard:mmlu:college_biology:5                    |      0|acc           |0.7708|±  |0.0351|
|leaderboard:mmlu:college_chemistry:5                  |      0|acc           |0.4900|±  |0.0502|
|leaderboard:mmlu:college_computer_science:5           |      0|acc           |0.4600|±  |0.0501|
|leaderboard:mmlu:college_mathematics:5                |      0|acc           |0.3900|±  |0.0490|
|leaderboard:mmlu:college_medicine:5                   |      0|acc           |0.6069|±  |0.0372|
|leaderboard:mmlu:college_physics:5                    |      0|acc           |0.4706|±  |0.0497|
|leaderboard:mmlu:computer_security:5                  |      0|acc           |0.7800|±  |0.0416|
|leaderboard:mmlu:conceptual_physics:5                 |      0|acc           |0.5830|±  |0.0322|
|leaderboard:mmlu:econometrics:5                       |      0|acc           |0.5000|±  |0.0470|
|leaderboard:mmlu:electrical_engineering:5             |      0|acc           |0.5862|±  |0.0410|
|leaderboard:mmlu:elementary_mathematics:5             |      0|acc           |0.4630|±  |0.0257|
|leaderboard:mmlu:formal_logic:5                       |      0|acc           |0.5238|±  |0.0447|
|leaderboard:mmlu:global_facts:5                       |      0|acc           |0.4300|±  |0.0498|
|leaderboard:mmlu:high_school_biology:5                |      0|acc           |0.7581|±  |0.0244|
|leaderboard:mmlu:high_school_chemistry:5              |      0|acc           |0.5271|±  |0.0351|
|leaderboard:mmlu:high_school_computer_science:5       |      0|acc           |0.6600|±  |0.0476|
|leaderboard:mmlu:high_school_european_history:5       |      0|acc           |0.7212|±  |0.0350|
|leaderboard:mmlu:high_school_geography:5              |      0|acc           |0.7929|±  |0.0289|
|leaderboard:mmlu:high_school_government_and_politics:5|      0|acc           |0.8756|±  |0.0238|
|leaderboard:mmlu:high_school_macroeconomics:5         |      0|acc           |0.6590|±  |0.0240|
|leaderboard:mmlu:high_school_mathematics:5            |      0|acc           |0.3407|±  |0.0289|
|leaderboard:mmlu:high_school_microeconomics:5         |      0|acc           |0.7563|±  |0.0279|
|leaderboard:mmlu:high_school_physics:5                |      0|acc           |0.4503|±  |0.0406|
|leaderboard:mmlu:high_school_psychology:5             |      0|acc           |0.8294|±  |0.0161|
|leaderboard:mmlu:high_school_statistics:5             |      0|acc           |0.4954|±  |0.0341|
|leaderboard:mmlu:high_school_us_history:5             |      0|acc           |0.8039|±  |0.0279|
|leaderboard:mmlu:high_school_world_history:5          |      0|acc           |0.8186|±  |0.0251|
|leaderboard:mmlu:human_aging:5                        |      0|acc           |0.6951|±  |0.0309|
|leaderboard:mmlu:human_sexuality:5                    |      0|acc           |0.7863|±  |0.0360|
|leaderboard:mmlu:international_law:5                  |      0|acc           |0.8017|±  |0.0364|
|leaderboard:mmlu:jurisprudence:5                      |      0|acc           |0.8056|±  |0.0383|
|leaderboard:mmlu:logical_fallacies:5                  |      0|acc           |0.7362|±  |0.0346|
|leaderboard:mmlu:machine_learning:5                   |      0|acc           |0.4911|±  |0.0475|
|leaderboard:mmlu:management:5                         |      0|acc           |0.8252|±  |0.0376|
|leaderboard:mmlu:marketing:5                          |      0|acc           |0.8718|±  |0.0219|
|leaderboard:mmlu:medical_genetics:5                   |      0|acc           |0.6900|±  |0.0465|
|leaderboard:mmlu:miscellaneous:5                      |      0|acc           |0.8225|±  |0.0137|
|leaderboard:mmlu:moral_disputes:5                     |      0|acc           |0.7052|±  |0.0245|
|leaderboard:mmlu:moral_scenarios:5                    |      0|acc           |0.4190|±  |0.0165|
|leaderboard:mmlu:nutrition:5                          |      0|acc           |0.7353|±  |0.0253|
|leaderboard:mmlu:philosophy:5                         |      0|acc           |0.7203|±  |0.0255|
|leaderboard:mmlu:prehistory:5                         |      0|acc           |0.6975|±  |0.0256|
|leaderboard:mmlu:professional_accounting:5            |      0|acc           |0.5035|±  |0.0298|
|leaderboard:mmlu:professional_law:5                   |      0|acc           |0.4576|±  |0.0127|
|leaderboard:mmlu:professional_medicine:5              |      0|acc           |0.7132|±  |0.0275|
|leaderboard:mmlu:professional_psychology:5            |      0|acc           |0.6879|±  |0.0187|
|leaderboard:mmlu:public_relations:5                   |      0|acc           |0.6545|±  |0.0455|
|leaderboard:mmlu:security_studies:5                   |      0|acc           |0.7388|±  |0.0281|
|leaderboard:mmlu:sociology:5                          |      0|acc           |0.8159|±  |0.0274|
|leaderboard:mmlu:us_foreign_policy:5                  |      0|acc           |0.8500|±  |0.0359|
|leaderboard:mmlu:virology:5                           |      0|acc           |0.5000|±  |0.0389|
|leaderboard:mmlu:world_religions:5                    |      0|acc           |0.8129|±  |0.0299|
|leaderboard:truthfulqa:mc:0                           |      0|truthfulqa_mc1|0.3696|±  |0.0169|
|                                                      |       |truthfulqa_mc2|0.5305|±  |0.0159|
|leaderboard:winogrande:5                              |      0|acc           |0.6938|±  |0.0130|

Baseline:

|                         Task                         |Version|    Metric    |Value |   |Stderr|
|------------------------------------------------------|------:|--------------|-----:|---|-----:|
|all                                                   |       |acc           |0.6635|±  |0.0322|
|                                                      |       |acc_norm      |0.6569|±  |0.0094|
|                                                      |       |truthfulqa_mc1|0.3745|±  |0.0169|
|                                                      |       |truthfulqa_mc2|0.5338|±  |0.0160|
|                                                      |       |qem           |0.6808|±  |0.0128|
|leaderboard:arc:challenge:25                          |      0|acc           |0.5742|±  |0.0144|
|                                                      |       |acc_norm      |0.5828|±  |0.0144|
|leaderboard:gsm8k:5                                   |      0|qem           |0.6808|±  |0.0128|
|leaderboard:hellaswag:10                              |      0|acc           |0.5707|±  |0.0049|
|                                                      |       |acc_norm      |0.7310|±  |0.0044|
|leaderboard:mmlu:_average:5                           |       |acc           |0.6662|±  |0.0333|
|leaderboard:mmlu:abstract_algebra:5                   |      0|acc           |0.3300|±  |0.0473|
|leaderboard:mmlu:anatomy:5                            |      0|acc           |0.6815|±  |0.0402|
|leaderboard:mmlu:astronomy:5                          |      0|acc           |0.7500|±  |0.0352|
|leaderboard:mmlu:business_ethics:5                    |      0|acc           |0.7000|±  |0.0461|
|leaderboard:mmlu:clinical_knowledge:5                 |      0|acc           |0.7472|±  |0.0267|
|leaderboard:mmlu:college_biology:5                    |      0|acc           |0.7917|±  |0.0340|
|leaderboard:mmlu:college_chemistry:5                  |      0|acc           |0.4500|±  |0.0500|
|leaderboard:mmlu:college_computer_science:5           |      0|acc           |0.5200|±  |0.0502|
|leaderboard:mmlu:college_mathematics:5                |      0|acc           |0.3900|±  |0.0490|
|leaderboard:mmlu:college_medicine:5                   |      0|acc           |0.6590|±  |0.0361|
|leaderboard:mmlu:college_physics:5                    |      0|acc           |0.4314|±  |0.0493|
|leaderboard:mmlu:computer_security:5                  |      0|acc           |0.7900|±  |0.0409|
|leaderboard:mmlu:conceptual_physics:5                 |      0|acc           |0.5872|±  |0.0322|
|leaderboard:mmlu:econometrics:5                       |      0|acc           |0.5439|±  |0.0469|
|leaderboard:mmlu:electrical_engineering:5             |      0|acc           |0.6138|±  |0.0406|
|leaderboard:mmlu:elementary_mathematics:5             |      0|acc           |0.4683|±  |0.0257|
|leaderboard:mmlu:formal_logic:5                       |      0|acc           |0.5317|±  |0.0446|
|leaderboard:mmlu:global_facts:5                       |      0|acc           |0.4600|±  |0.0501|
|leaderboard:mmlu:high_school_biology:5                |      0|acc           |0.8065|±  |0.0225|
|leaderboard:mmlu:high_school_chemistry:5              |      0|acc           |0.5419|±  |0.0351|
|leaderboard:mmlu:high_school_computer_science:5       |      0|acc           |0.6800|±  |0.0469|
|leaderboard:mmlu:high_school_european_history:5       |      0|acc           |0.7394|±  |0.0343|
|leaderboard:mmlu:high_school_geography:5              |      0|acc           |0.8131|±  |0.0278|
|leaderboard:mmlu:high_school_government_and_politics:5|      0|acc           |0.8964|±  |0.0220|
|leaderboard:mmlu:high_school_macroeconomics:5         |      0|acc           |0.6769|±  |0.0237|
|leaderboard:mmlu:high_school_mathematics:5            |      0|acc           |0.3259|±  |0.0286|
|leaderboard:mmlu:high_school_microeconomics:5         |      0|acc           |0.7563|±  |0.0279|
|leaderboard:mmlu:high_school_physics:5                |      0|acc           |0.4106|±  |0.0402|
|leaderboard:mmlu:high_school_psychology:5             |      0|acc           |0.8477|±  |0.0154|
|leaderboard:mmlu:high_school_statistics:5             |      0|acc           |0.4769|±  |0.0341|
|leaderboard:mmlu:high_school_us_history:5             |      0|acc           |0.7892|±  |0.0286|
|leaderboard:mmlu:high_school_world_history:5          |      0|acc           |0.8397|±  |0.0239|
|leaderboard:mmlu:human_aging:5                        |      0|acc           |0.7265|±  |0.0299|
|leaderboard:mmlu:human_sexuality:5                    |      0|acc           |0.7939|±  |0.0355|
|leaderboard:mmlu:international_law:5                  |      0|acc           |0.7686|±  |0.0385|
|leaderboard:mmlu:jurisprudence:5                      |      0|acc           |0.7593|±  |0.0413|
|leaderboard:mmlu:logical_fallacies:5                  |      0|acc           |0.7607|±  |0.0335|
|leaderboard:mmlu:machine_learning:5                   |      0|acc           |0.5268|±  |0.0474|
|leaderboard:mmlu:management:5                         |      0|acc           |0.8155|±  |0.0384|
|leaderboard:mmlu:marketing:5                          |      0|acc           |0.9060|±  |0.0191|
|leaderboard:mmlu:medical_genetics:5                   |      0|acc           |0.7900|±  |0.0409|
|leaderboard:mmlu:miscellaneous:5                      |      0|acc           |0.8238|±  |0.0136|
|leaderboard:mmlu:moral_disputes:5                     |      0|acc           |0.7399|±  |0.0236|
|leaderboard:mmlu:moral_scenarios:5                    |      0|acc           |0.4358|±  |0.0166|
|leaderboard:mmlu:nutrition:5                          |      0|acc           |0.7549|±  |0.0246|
|leaderboard:mmlu:philosophy:5                         |      0|acc           |0.7331|±  |0.0251|
|leaderboard:mmlu:prehistory:5                         |      0|acc           |0.7469|±  |0.0242|
|leaderboard:mmlu:professional_accounting:5            |      0|acc           |0.5177|±  |0.0298|
|leaderboard:mmlu:professional_law:5                   |      0|acc           |0.4648|±  |0.0127|
|leaderboard:mmlu:professional_medicine:5              |      0|acc           |0.7279|±  |0.0270|
|leaderboard:mmlu:professional_psychology:5            |      0|acc           |0.6928|±  |0.0187|
|leaderboard:mmlu:public_relations:5                   |      0|acc           |0.6636|±  |0.0453|
|leaderboard:mmlu:security_studies:5                   |      0|acc           |0.7306|±  |0.0284|
|leaderboard:mmlu:sociology:5                          |      0|acc           |0.8557|±  |0.0248|
|leaderboard:mmlu:us_foreign_policy:5                  |      0|acc           |0.8600|±  |0.0349|
|leaderboard:mmlu:virology:5                           |      0|acc           |0.5361|±  |0.0388|
|leaderboard:mmlu:world_religions:5                    |      0|acc           |0.7953|±  |0.0309|
|leaderboard:truthfulqa:mc:0                           |      0|truthfulqa_mc1|0.3745|±  |0.0169|
|                                                      |       |truthfulqa_mc2|0.5338|±  |0.0160|
|leaderboard:winogrande:5                              |      0|acc           |0.6930|±  |0.0130|
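
To see which gaps between the two tables are larger than the reported noise, here is a minimal sketch that takes the headline (value, stderr) pairs copied from the tables above and computes the delta and a rough z-score for each. The short metric labels are hypothetical shorthand, and the z-score assumes the two runs are independent (standard error of the difference = sqrt(se_a² + se_b²)). Both runs score the same items, so a paired test would likely be tighter; treat this as a quick sanity check only. The MMLU `_average` stderr is used as reported.

```python
import math

# (value, stderr) pairs copied from the two tables above.
# Labels are informal shorthand, not lighteval task names.
KTO = {
    "arc_challenge acc_norm": (0.5623, 0.0145),
    "gsm8k qem": (0.4670, 0.0137),
    "hellaswag acc_norm": (0.7205, 0.0045),
    "mmlu average acc": (0.6527, 0.0338),
    "truthfulqa mc2": (0.5305, 0.0159),
    "winogrande acc": (0.6938, 0.0130),
}
BASELINE = {
    "arc_challenge acc_norm": (0.5828, 0.0144),
    "gsm8k qem": (0.6808, 0.0128),
    "hellaswag acc_norm": (0.7310, 0.0044),
    "mmlu average acc": (0.6662, 0.0333),
    "truthfulqa mc2": (0.5338, 0.0160),
    "winogrande acc": (0.6930, 0.0130),
}


def compare(model, baseline):
    """Return {metric: (delta, z)} for metrics present in both dicts.

    Assumes independent runs, so the stderr of the difference is
    hypot(se_a, se_b). With positively correlated (paired) runs this
    overstates the noise, i.e. it is a conservative heuristic.
    """
    out = {}
    for name in model.keys() & baseline.keys():
        (a, se_a), (b, se_b) = model[name], baseline[name]
        delta = a - b
        out[name] = (delta, delta / math.hypot(se_a, se_b))
    return out


if __name__ == "__main__":
    for name, (delta, z) in sorted(compare(KTO, BASELINE).items()):
        flag = "  <-- |z| > 2" if abs(z) > 2 else ""
        print(f"{name:24s} delta={delta:+.4f}  z={z:+.2f}{flag}")
```

With these inputs only GSM8K stands out (delta of about -0.21, z of about -11); every other headline metric moves by less than two combined standard errors.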

Model: dreamgen/llama3-8b-instruct-align-test2-kto, 8B params, BF16 (Safetensors).