# FutureBench Dataset Processing

This directory contains tools for processing FutureBench datasets: downloading the existing dataset from HuggingFace and transforming your own database into the standard format.
## Option 1: Download from HuggingFace (Original)

Use this to download the existing FutureBench dataset:

```bash
python download_data.py
```
## Option 2: Transform Your Own Database

Use this to transform your production database into HuggingFace format:

### Setup

1. **Install dependencies:**

   ```bash
   pip install pandas sqlalchemy huggingface_hub
   ```

2. **Set up HuggingFace token:**

   ```bash
   export HF_TOKEN="your_huggingface_token_here"
   ```
3. **Configure your settings:**

   Edit `config_db.py` to match your needs:

   - Update `HF_CONFIG` with your HuggingFace repository names
   - Adjust `PROCESSING_CONFIG` for data filtering preferences
   - Note: Database connection uses the same setup as the main FutureBench app
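As an illustration, the two dictionaries in `config_db.py` might look something like the following. The key names and values here are assumptions for the sketch, not the real file; check `config_db.py` itself for the actual options.

```python
# Hypothetical sketch of config_db.py -- all key names are illustrative only.

# HuggingFace repository targets (replace with your own repo names).
HF_CONFIG = {
    "dataset_repo": "your-org/futurebench-data",
    "private": False,
}

# Data filtering preferences.
PROCESSING_CONFIG = {
    "only_resolved_events": False,   # keep unresolved events too
    "min_predictions_per_event": 1,  # drop events with no predictions
}
```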
### Usage

```bash
# Transform your database and upload to HuggingFace
python db_to_hf.py

# Or run locally without uploading
HF_TOKEN="" python db_to_hf.py
```
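The empty-token trick works because the script presumably guards the upload step on the token's presence. A minimal sketch of that pattern (not the actual `db_to_hf.py` code):

```python
import os


def should_upload() -> bool:
    """Upload to HuggingFace only when a non-empty HF_TOKEN is set."""
    return bool(os.environ.get("HF_TOKEN", "").strip())
```

With `HF_TOKEN=""`, the transform still runs locally, but the upload branch is skipped.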
### Database Schema

The script uses the same database schema as the main FutureBench application:

- `EventBase` model for events
- `Prediction` model for predictions
- Uses SQLAlchemy ORM (same as `convert_to_csv.py`)

No additional database configuration is needed; the script uses the existing FutureBench database connection.
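Conceptually, the export joins each event to its predictions, producing one row per (event, prediction) pair. The sketch below illustrates that join with stdlib `sqlite3` and made-up table/column names; the actual script uses the SQLAlchemy ORM models above, and the real schema may differ.

```python
import sqlite3

# Illustrative only: table and column names are assumptions, not the
# real FutureBench schema. The actual script uses SQLAlchemy ORM models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, question TEXT, event_type TEXT);
    CREATE TABLE predictions (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES events(id),
        algorithm_name TEXT,
        actual_prediction TEXT
    );
    INSERT INTO events VALUES (1, 'Will it rain tomorrow?', 'weather');
    INSERT INTO predictions VALUES (1, 1, 'model-a', 'yes');
""")

# One row per (event, prediction) pair, mirroring the export shape.
rows = conn.execute("""
    SELECT e.id, e.question, p.algorithm_name, p.actual_prediction
    FROM events e JOIN predictions p ON p.event_id = e.id
""").fetchall()
```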
### Output Format

The script produces data in the same format as the original FutureBench dataset:

- `event_id`, `question`, `event_type`, `algorithm_name`, `actual_prediction`, `result`, `open_to_bet_until`, `prediction_created_at`
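To make the column order concrete, here is a minimal stdlib sketch that writes one row in that shape (the values are made up for illustration):

```python
import csv
import io

COLUMNS = [
    "event_id", "question", "event_type", "algorithm_name",
    "actual_prediction", "result", "open_to_bet_until", "prediction_created_at",
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow({
    "event_id": 1,
    "question": "Will it rain tomorrow?",  # illustrative values throughout
    "event_type": "weather",
    "algorithm_name": "model-a",
    "actual_prediction": "yes",
    "result": "yes",
    "open_to_bet_until": "2024-01-01T00:00:00",
    "prediction_created_at": "2023-12-31T12:00:00",
})
```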
### Automation

You can run this as a scheduled job:

```bash
# Add to crontab to run daily at 2 AM
0 2 * * * cd /path/to/your/project && python leaderboard/process_data/db_to_hf.py
```
## Files

- `download_data.py` - Downloads data from HuggingFace repositories
- `db_to_hf.py` - Transforms your database to HuggingFace format
- `config_db.py` - Configuration for database connection and HF settings
- `config.py` - HuggingFace repository configuration
- `requirements.txt` - Python dependencies
## Data Structure

The main dataset contains:

- `event_id`: Unique identifier for each event
- `question`: The prediction question
- `event_type`: Type of event (polymarket, soccer, etc.)
- `answer_options`: Possible answers in JSON format
- `result`: Actual outcome (if resolved)
- `algorithm_name`: AI model that made the prediction
- `actual_prediction`: The prediction made
- `open_to_bet_until`: Prediction window deadline
- `prediction_created_at`: When the prediction was made
## Output

The script generates:

- Downloaded datasets in local cache folders
- `evaluation_queue.csv` with unique events for processing
- Console output with data statistics and summary
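Since the raw export has one row per prediction, building `evaluation_queue.csv` requires collapsing to unique events. A sketch of that deduplication (the real script's logic may differ; field names follow the dataset schema above):

```python
# Collapse per-prediction rows to one entry per event_id, preserving order.
# Sample rows are made up for illustration.
predictions = [
    {"event_id": 1, "question": "Q1", "algorithm_name": "model-a"},
    {"event_id": 1, "question": "Q1", "algorithm_name": "model-b"},
    {"event_id": 2, "question": "Q2", "algorithm_name": "model-a"},
]

seen = set()
unique_events = []
for row in predictions:
    if row["event_id"] not in seen:
        seen.add(row["event_id"])
        unique_events.append({"event_id": row["event_id"], "question": row["question"]})
```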