Spaces: LAMDA-NeSy / ChinaTravel

博闻 committed on Feb 4

Commit 13f06e2

1 Parent(s): 6a12be7

add emoji

Files changed (2)

app.py CHANGED

@@ -14,8 +14,9 @@ with gr.Blocks(title="ChinaTravel Benchmark Evaluation") as demo:
     gr.Markdown(content.INTRO_MARKDOWN)
     gr.Markdown(content.SUBMISSION_GUIDE)
-    gr.Markdown("### Leaderboard")
     gr.Markdown("Methods marked with \* leverage Oracle DSL or an Oracle Verifier.")
     if SPLITS_LIST:
         with gr.Tabs():
             for split in SPLITS_LIST:

     gr.Markdown(content.INTRO_MARKDOWN)
     gr.Markdown(content.SUBMISSION_GUIDE)
+    gr.Markdown("### 🏆 Leaderboard")
     gr.Markdown("Methods marked with \* leverage Oracle DSL or an Oracle Verifier.")
+    gr.Markdown("✨ Methods marked with * leverage Oracle DSL or an Oracle Verifier.")
     if SPLITS_LIST:
         with gr.Tabs():
             for split in SPLITS_LIST:

chinatravel/ui/content.py CHANGED

@@ -1,24 +1,24 @@
 TITLE_HTML = """
-<h1 style=\"text-align:center; margin-bottom: 0.25rem;\">ChinaTravel Benchmark Evaluation</h1>
 """
 INTRO_MARKDOWN = """
-ChinaTravel is an open-ended travel planning benchmark with compositional constraint validation for language agents. (See our [paper](https://arxiv.org/abs/2412.13682) for more details.)
 """
 SUBMISSION_GUIDE = """
-**How to submit**
 - Pick a split. The split determines which query UIDs are expected.
 - Upload a `.zip` that contains JSON files named by query UIDs.
 - Each JSON must follow the target schema: see [chinatravel/evaluation/output_schema.json](chinatravel/evaluation/output_schema.json).
 - You can dry-run locally via `python eval_exp.py --splits <split> --method <your_method>` to mirror the hosted evaluation.
-**Output**
 - We compute DR (schema pass rate), EPR_micro/EPR_macro (commonsense), LPR_micro/LPR_macro/C-LPR (logic), and FPR (all-pass rate).
 - A detailed JSON report is produced for download after evaluation.
-**Contact**
 - If you are interested in showing your results on our leaderboard or have any questions, please contact [Jie-Jing Shao](shaojj@lamda.nju.edu.cn), [Bo-Wen Zhang](221900200@smail.nju.edu.cn), [Xiao-Wen Yang](yangxw@lamda.nju.edu.cn)
 """
-CONTACT = "Contact: zbw@smail.nju.edu.cn, shaojj@lamda.nju.edu.cn"

 TITLE_HTML = """
+<h1 style=\"text-align:center; margin-bottom: 0.25rem;\">🧭 ChinaTravel Benchmark Evaluation</h1>
 """
 INTRO_MARKDOWN = """
+✈️ ChinaTravel is an open-ended travel planning benchmark with compositional constraint validation for language agents. (See our [paper](https://arxiv.org/abs/2412.13682) for more details.)
 """
 SUBMISSION_GUIDE = """
+📥 **How to submit**
 - Pick a split. The split determines which query UIDs are expected.
 - Upload a `.zip` that contains JSON files named by query UIDs.
 - Each JSON must follow the target schema: see [chinatravel/evaluation/output_schema.json](chinatravel/evaluation/output_schema.json).
 - You can dry-run locally via `python eval_exp.py --splits <split> --method <your_method>` to mirror the hosted evaluation.
+📊 **Output**
 - We compute DR (schema pass rate), EPR_micro/EPR_macro (commonsense), LPR_micro/LPR_macro/C-LPR (logic), and FPR (all-pass rate).
 - A detailed JSON report is produced for download after evaluation.
+📨 **Contact**
 - If you are interested in showing your results on our leaderboard or have any questions, please contact [Jie-Jing Shao](shaojj@lamda.nju.edu.cn), [Bo-Wen Zhang](221900200@smail.nju.edu.cn), [Xiao-Wen Yang](yangxw@lamda.nju.edu.cn)
 """
+CONTACT = "Contact: ✉️ zbw@smail.nju.edu.cn, ✉️ shaojj@lamda.nju.edu.cn"
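
The submission steps described in SUBMISSION_GUIDE (a `.zip` containing JSON files named by query UIDs) could be packaged roughly as follows. This is a minimal sketch, not the project's own tooling: the helper name `package_submission`, the example UIDs, and the placeholder payloads are hypothetical, and placing the files at the archive root is an assumption. Real plans must follow `chinatravel/evaluation/output_schema.json`.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def package_submission(results: dict, zip_path: Path) -> Path:
    """Write one JSON file per query UID into a .zip archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for uid, plan in results.items():
            # The file is named after the query UID, as the guide asks.
            # Storing it at the archive root is an assumption.
            zf.writestr(f"{uid}.json", json.dumps(plan, ensure_ascii=False, indent=2))
    return zip_path


# Hypothetical UIDs and placeholder payloads, for illustration only.
example_results = {
    "uid_0001": {"placeholder": "plan for query uid_0001"},
    "uid_0002": {"placeholder": "plan for query uid_0002"},
}

out_path = package_submission(
    example_results, Path(tempfile.gettempdir()) / "submission.zip"
)

# Read the archive back to confirm which files were packaged.
with zipfile.ZipFile(out_path) as zf:
    packaged_names = sorted(zf.namelist())
    first_plan = json.loads(zf.read("uid_0001.json").decode("utf-8"))

print(packaged_names)
```

`ensure_ascii=False` keeps any Chinese place names readable in the JSON files instead of escaping them. Per the guide, running `python eval_exp.py --splits <split> --method <your_method>` locally is the way to check the results before uploading.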