# TREA 2.0 Pipeline

Audio question-answering dataset generator built on ESC-50. It creates four task types: COUNT, DURATION, ORDER, and VOLUME.
## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Preprocess ESC-50 (required for the DURATION task only)
python preprocess_esc50.py --config config.yaml

# 3. Generate datasets
python main.py --config config.yaml
```
## Configuration

Edit `config.yaml` to set:

- **Task duration**: `task_duration_size`, the target duration per task (hours)
- **Clip duration range**: `min_clip_duration` to `max_clip_duration` (seconds)
- **ESC-50 paths**: point these to your ESC-50 dataset location
- **Enable/disable tasks**: set `enabled: true` or `enabled: false` for each task
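
As a rough illustration of how these settings relate, the sketch below checks a config that has already been loaded into a dictionary (for example with `yaml.safe_load`). The flat layout, the `tasks` mapping, and the `validate_config` helper are assumptions made for illustration; the real `config.yaml` schema may nest these keys differently.

```python
def validate_config(cfg):
    """Return a list of problems found in a loaded config dict.

    Assumed (hypothetical) layout: top-level numeric keys plus a `tasks`
    mapping of task name -> {"enabled": bool}.
    """
    problems = []

    # Task duration is given in hours and should be a positive number.
    size = cfg.get("task_duration_size")
    if not isinstance(size, (int, float)) or isinstance(size, bool) or size <= 0:
        problems.append("task_duration_size must be a positive number of hours")

    # Clip durations are given in seconds and must form a valid range.
    lo = cfg.get("min_clip_duration")
    hi = cfg.get("max_clip_duration")
    if not all(isinstance(v, (int, float)) for v in (lo, hi)):
        problems.append("min_clip_duration and max_clip_duration must be numbers")
    elif lo <= 0 or lo > hi:
        problems.append("clip duration range must satisfy 0 < min <= max")

    # Each task entry should carry an explicit boolean `enabled` flag.
    for name, task in cfg.get("tasks", {}).items():
        if not isinstance(task.get("enabled"), bool):
            problems.append(f"tasks.{name}.enabled must be true or false")

    return problems


# Hypothetical values, not the project's defaults.
example = {
    "task_duration_size": 1.5,
    "min_clip_duration": 10,
    "max_clip_duration": 30,
    "tasks": {"count": {"enabled": True}, "volume": {"enabled": False}},
}
print(validate_config(example))
```

A check like this could run before `main.py` starts generating audio, so that a swapped minimum and maximum is caught early.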
## Key Files

- **`config.yaml`** - All configuration parameters
- **`main.py`** - Pipeline entry point (runs all tasks)
- **`preprocess_esc50.py`** - Preprocesses ESC-50 for the DURATION task
- **`tasks/task_*.py`** - Individual task generators
## Tasks

| Task | Question | Example |
|------|----------|---------|
| **COUNT** | "How many unique sounds?" | Audio with distinct sound types |
| **DURATION** | "Which sound is longest/shortest?" | Compare sound durations |
| **ORDER** | "Which sound is first/last/after X?" | Temporal sequence questions |
| **VOLUME** | "Which sound is loudest/softest?" | Loudness comparison |
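
To make the four question types concrete, here is a minimal sketch of how a ground-truth answer for each could be read off a list of sound events. The `SoundEvent` fields and the `ground_truth` helper are hypothetical and are not taken from the task generators in `tasks/`, which may represent clips quite differently.

```python
from dataclasses import dataclass


@dataclass
class SoundEvent:
    label: str         # sound class, e.g. an ESC-50 category name
    duration_s: float  # how long the sound plays in the clip, in seconds
    level_db: float    # relative loudness of the sound within the clip


def ground_truth(events):
    """Derive one answer per task type from events listed in playback order."""
    if not events:
        raise ValueError("need at least one sound event")
    return {
        # COUNT: number of distinct sound classes in the clip
        "count": len({e.label for e in events}),
        # DURATION: class of the single longest event
        "duration_longest": max(events, key=lambda e: e.duration_s).label,
        # ORDER: classes of the first and last events
        "order_first": events[0].label,
        "order_last": events[-1].label,
        # VOLUME: class of the loudest event
        "volume_loudest": max(events, key=lambda e: e.level_db).label,
    }


# Illustrative clip: two classes, three events.
clip = [
    SoundEvent("dog", 2.0, -12.0),
    SoundEvent("rain", 5.0, -20.0),
    SoundEvent("dog", 1.5, -6.0),
]
print(ground_truth(clip))
```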
## Output Structure

```
output/{task}/
├── audios/*.wav           # Generated audio files
├── {task}_mcq.csv         # Multiple choice questions
├── {task}_open_text.csv   # Open-ended questions
└── {task}_metadata.csv    # Detailed metadata
```
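
After a run, a small check like the one below can confirm that a task produced the files listed above. It is only a sketch: the helper names `expected_outputs` and `missing_outputs` are made up for this example, and the default `output` root is assumed from the layout shown.

```python
from pathlib import Path


def expected_outputs(output_root, task):
    """Map each expected artifact to its path under output/{task}/."""
    task_dir = Path(output_root) / task
    return {
        "audios": task_dir / "audios",
        "mcq": task_dir / f"{task}_mcq.csv",
        "open_text": task_dir / f"{task}_open_text.csv",
        "metadata": task_dir / f"{task}_metadata.csv",
    }


def missing_outputs(output_root, task):
    """Return the names of expected artifacts that do not exist on disk."""
    paths = expected_outputs(output_root, task)
    return [name for name, path in paths.items() if not path.exists()]


# Report anything missing for each task name used in this README.
for task in ("count", "duration", "order", "volume"):
    print(task, missing_outputs("output", task))
```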
## Shell Scripts

Use the provided shell helpers for simple runs.

Run the full pipeline (calls `python main.py` under the hood):

```bash
# Make executable and run (from pipeline/)
chmod +x run_pipeline.sh
./run_pipeline.sh

# With custom config, tasks, and output
./run_pipeline.sh --config my_config.yaml --tasks count,order --output ./my_dataset
```
Run LLM answer generation across splits (uses `llm_answer_generator.py`):

```bash
# Process the open_text CSVs for the splits and tasks defined in the script
./run_llm_answers_all.sh

# Or run the generator directly on a single file
python llm_answer_generator.py --input /path/to/count_open_text.csv --mode open_text --task count
```
## Advanced Usage

```bash
# Run specific tasks only
python main.py --tasks count order

# Use custom config
python main.py --config my_config.yaml

# Custom output directory
python main.py --output /path/to/output

# Preprocess with custom parameters
python preprocess_esc50.py --config config.yaml \
    --threshold-strategy noise_floor \
    --noise-floor-percentile 2.0 \
    --noise-floor-delta-db 5.0
```
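
The preprocessing flags above suggest a threshold derived from a low percentile of the signal level plus a margin in dB. The sketch below shows one plausible reading of that idea in plain Python; it is an assumption based only on the flag names, not the code in `preprocess_esc50.py`, and `noise_floor_threshold_db` is a hypothetical helper. `DOCS.md` describes the actual preprocessing.

```python
import math


def noise_floor_threshold_db(frame_levels_db, percentile=2.0, delta_db=5.0):
    """Estimate a noise floor as a low percentile of frame levels, then add a margin.

    frame_levels_db: per-frame levels in dB (assumed input format).
    """
    if not frame_levels_db:
        raise ValueError("need at least one frame level")
    ordered = sorted(frame_levels_db)
    # Nearest-rank percentile: the smallest value such that at least
    # `percentile` percent of the frames are at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    noise_floor = ordered[rank - 1]
    # Frames above this level would be treated as containing sound.
    return noise_floor + delta_db


# Illustrative levels: a quiet background with a louder event.
levels = [-60.0] * 10 + [-20.0] * 90
print(noise_floor_threshold_db(levels))
```

Under this reading, a larger delta makes the detector more conservative, since a frame must exceed the estimated floor by a wider margin to count as sound.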
## Documentation

See **`DOCS.md`** for complete technical documentation, including:

- Mathematical formulations
- Detailed algorithm explanations
- Configuration parameter reference
- Preprocessing pipeline details
- Balancing mechanisms
## Requirements

- Python 3.8+
- pydub
- numpy
- pandas
- tqdm
- pyyaml