Update README.md
README.md CHANGED

@@ -1,15 +1,16 @@
-
---
- title: TRAIL
- emoji:
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: mit
- short_description:
sdk_version: 5.19.0
---
# Model Performance Leaderboard

@@ -17,69 +18,18 @@ This is a Hugging Face Space that hosts a leaderboard for comparing model perfor

## Features

- - **Submit
- - **
- - **Integrated Backend**: Stores all submissions with timestamp and attribution
- - **Customizable Metrics**: Configure which metrics to display and track
-
- ## Installation
-
- ### Setting Up Your Space
-
- 1. Upload all files to your Hugging Face Space
- 2. Make sure to make `start.sh` executable:
- ```bash
- chmod +x start.sh
- ```
- 3. Configure your Space to use the `start.sh` script as the entry point
-
- ### Troubleshooting Installation Issues
-
- If you encounter JSON parsing errors:
- 1. Check if `models.json` exists and is a valid JSON file
- 2. Run `python setup.py` to regenerate configuration files
- 3. If problems persist, delete the `models.json` file and let the setup script create a new one
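For reference, a minimal way to run those checks from a terminal; this is only an illustrative sketch, assuming Python is on the PATH and `setup.py` sits at the repository root as described in the steps above:

```bash
# Verify that models.json parses as valid JSON; json.tool reports the error location if it does not.
python -m json.tool models.json
# If parsing fails (or the file is missing), regenerate the default configuration files.
python setup.py
```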
-
- ## How to Use
-
- ### Viewing the Leaderboard
-
- Navigate to the "Leaderboard" tab to see all submitted models. You can:
- - Sort by any metric (click on the dropdown)
- - Change sort order (ascending/descending)
- - Refresh the leaderboard for the latest submissions
-
- ### Submitting a Model
-
- 1. Go to the "Submit Model" tab
- 2. Fill in your model name, your name, and optional description
- 3. Enter values for the requested metrics
- 4. Click "Submit Model"
-
- ## Configuration
-
- You can customize this leaderboard by modifying the `models.json` file:

- ```
- {
-   "title": "TRAIL Performance Leaderboard",
-   "description": "This leaderboard tracks and compares model performance across multiple metrics. Submit your model results to see how they stack up!",
-   "metrics": ["accuracy", "f1_score", "precision", "recall"],
-   "main_metric": "accuracy"
- }
- ```

-
- - `main_metric`: Default metric for sorting

- ##

-
- - Gradio for the UI components
- - A file-based database to store submissions
- - Pandas for data manipulation and display

## License

---
+ title: TRAIL Leaderboard
+ emoji: 🏆
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: mit
+ short_description: Trace Reasoning and Agentic Issue Localization Leaderboard
sdk_version: 5.19.0
+ tags:
+   - leaderboard
---
# Model Performance Leaderboard

## Features

+ - **Submit Your Answers**: Run your model on the TRAIL dataset and submit your results.
+ - **Leaderboard**: View how your submissions are ranked.

+ ## Instructions

+ 1. Please refer to our GitHub repository at https://github.com/patronus-ai/trail-benchmark for step-by-step instructions on how to run your model on the TRAIL dataset.
+ 2. Compress the resulting JSON outputs into a ZIP archive whose filename begins with SWE_ or GAIA_, and submit it (see the example after this list).
+ 3. Once the evaluation is complete, we'll upload the scores (this process will soon be automated).
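As a rough sketch, packaging the results from a terminal might look like the following. The output directory and archive names are placeholders, not prescribed by the benchmark beyond the SWE_/GAIA_ filename prefix, so substitute whatever your TRAIL run actually produced:

```bash
# Collect the per-run JSON outputs into a single ZIP archive for submission.
# The archive filename must begin with SWE_ or GAIA_, as required above.
zip SWE_my-model-results.zip outputs/*.json
```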

+ ## Benchmarking on TRAIL

+ TRAIL (Trace Reasoning and Agentic Issue Localization) is a benchmark dataset of 148 annotated AI agent execution traces containing 841 errors across reasoning, execution, and planning categories. Created from real-world software engineering and information retrieval tasks, it challenges even state-of-the-art LLMs, with the best model achieving only 11% accuracy, highlighting the difficulty of trace debugging for complex agent workflows.

## License
