xwm/SciWorld-MPO

@@ -1,11 +1,18 @@
 ---
+language:
+- en
 license: apache-2.0
+library_name: transformers
+pipeline_tag: reinforcement-learning
+datasets:
+- xwm/Meta_Plan_Optimization
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+metrics:
+- accuracy
 tags:
 - nlp
 - agent
-language:
-- en
-pipeline_tag: text-generation
 ---
 # SciWorld-MPO
@@ -22,9 +29,13 @@ It achieves the following results on the evaluation set:
 - Logits/chosen: 0.5212
 - Logits/rejected: 0.5151
+See the original paper for more details: [MPO: Boosting LLM Agents with Meta Plan Optimization](https://hf.co/papers/2503.02682).
+Code: https://github.com/WeiminXiong/MPO
 ## Model description
-More information needed
+This model uses Meta Plan Optimization (MPO) to improve the planning capabilities of LLM agents. It leverages high-level general guidance through meta plans and enables continuous optimization based on feedback from the agent's task execution. It achieves state-of-the-art performance on ALFWorld and SciWorld, with an average accuracy of 83.1.
 ## Intended uses & limitations
@@ -32,7 +43,7 @@ More information needed
 ## Training and evaluation data
-More information needed
+The model was trained on the `sciworld-metaplan-preference-pairs` dataset, part of the [Meta_Plan_Optimization](https://huggingface.co/datasets/xwm/Meta_Plan_Optimization) dataset.
 ## Training procedure
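Most of this commit is a change to the card's YAML metadata header: `language` moves up, `pipeline_tag` changes from `text-generation` to `reinforcement-learning`, and `library_name`, `datasets`, `base_model` and `metrics` are added. As a rough illustration of the resulting header, the sketch below rebuilds it from the `+` and context lines of the diff above and reads it with a small helper, `read_front_matter`. The helper is hypothetical, not the Hub's parser: it only understands the flat `key: value` and `key:` plus `- item` forms used in this card and is not a general YAML parser.

```python
from __future__ import annotations

# Post-commit YAML header, rebuilt from the "+" and context lines of the diff.
CARD_HEADER = """---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: reinforcement-learning
datasets:
- xwm/Meta_Plan_Optimization
base_model:
- meta-llama/Llama-3.1-8B-Instruct
metrics:
- accuracy
tags:
- nlp
- agent
---
# SciWorld-MPO
"""


def read_front_matter(text: str) -> dict[str, object]:
    """Hypothetical helper: parse the flat YAML subset used in this card.

    Handles only `key: value` scalars and `key:` followed by `- item` lines;
    anything else (nesting, quoting, multi-line strings) is out of scope.
    """
    lines = text.splitlines()
    # A card without a leading '---' line has no front matter.
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, object] = {}
    current_list: list[str] | None = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter of the metadata block
            break
        if stripped.startswith("- ") and current_list is not None:
            # List item belonging to the most recent `key:` line.
            current_list.append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            value = value.strip()
            if value:
                # Scalar entry, e.g. `license: apache-2.0`.
                meta[key.strip()] = value
                current_list = None
            else:
                # `key:` with no value starts a list; items are appended in place.
                current_list = []
                meta[key.strip()] = current_list
    return meta


if __name__ == "__main__":
    meta = read_front_matter(CARD_HEADER)
    print(meta["pipeline_tag"])  # reinforcement-learning
    print(meta["base_model"])    # ['meta-llama/Llama-3.1-8B-Instruct']
```

For real cards, the `huggingface_hub` library's `ModelCard` class already parses this header into `card.data`, so a hand-rolled parser like this is only useful for illustrating the structure.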