# TREA 2.0 Pipeline
Audio question-answering dataset generator using ESC-50. Creates four task types: COUNT, DURATION, ORDER, and VOLUME.
## Quick Start
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Preprocess ESC-50 (required for DURATION task only)
python preprocess_esc50.py --config config.yaml
# 3. Generate datasets
python main.py --config config.yaml
```
## Configuration
Edit `config.yaml` to set:
- **Task duration**: `task_duration_size` (hours) per task
- **Clip duration range**: `min_clip_duration` to `max_clip_duration` (seconds)
- **ESC-50 paths**: Point to your ESC-50 dataset location
- **Enable/disable tasks**: Set `enabled: true/false` for each task
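A minimal `config.yaml` sketch covering the options above. Only the key names mentioned in this README (`task_duration_size`, `min_clip_duration`, `max_clip_duration`, `enabled`) come from the pipeline; the nesting and the remaining keys are illustrative assumptions — check the shipped `config.yaml` for the actual schema:

```yaml
# Illustrative sketch -- verify key names against the shipped config.yaml
task_duration_size: 1.0      # hours of audio generated per task
min_clip_duration: 1.0       # seconds
max_clip_duration: 5.0       # seconds

esc50:
  root: /path/to/ESC-50      # ESC-50 dataset location (assumed key)

tasks:
  count:    {enabled: true}
  duration: {enabled: true}
  order:    {enabled: true}
  volume:   {enabled: false}
```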
## Key Files
- **`config.yaml`** - All configuration parameters
- **`main.py`** - Pipeline entry point (runs all tasks)
- **`preprocess_esc50.py`** - Preprocess ESC-50 for duration task
- **`tasks/task_*.py`** - Individual task generators
## Tasks
| Task | Question | Description |
|------|----------|---------|
| **COUNT** | "How many unique sounds?" | Audio with distinct sound types |
| **DURATION** | "Which sound is longest/shortest?" | Compare sound durations |
| **ORDER** | "Which sound is first/last/after X?" | Temporal sequence questions |
| **VOLUME** | "Which sound is loudest/softest?" | Loudness comparison |
## Output Structure
```
output/{task}/
├── audios/*.wav          # Generated audio files
├── {task}_mcq.csv        # Multiple choice questions
├── {task}_open_text.csv  # Open-ended questions
└── {task}_metadata.csv   # Detailed metadata
```
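The CSVs can be inspected with pandas (already in the requirements). The column names below are purely illustrative — the pipeline's actual schema may differ, so check a generated `{task}_mcq.csv` first:

```python
import pandas as pd
from io import StringIO

# Hypothetical rows mimicking a {task}_mcq.csv; real column names may differ.
sample = StringIO(
    "audio,question,option_a,option_b,option_c,option_d,answer\n"
    "count_0001.wav,How many unique sounds?,2,3,4,5,3\n"
)
df = pd.read_csv(sample)
print(len(df), "question(s) loaded")
print(df.loc[0, "question"])
```

For real output, point `pd.read_csv` at e.g. `output/count/count_mcq.csv`.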
## Shell scripts (quick)
Use the provided shell helpers for common runs.
Run the full pipeline (wraps `python main.py`):
```bash
# Make executable and run (from pipeline/)
./run_pipeline.sh
# With custom config, tasks, and output
./run_pipeline.sh --config my_config.yaml --tasks count,order --output ./my_dataset
```
Run the LLM answer generation across splits (uses `llm_answer_generator.py`):
```bash
# Processes open_text CSVs across splits/tasks defined in the script
./run_llm_answers_all.sh
# Or run per-file with the helper script directly
python llm_answer_generator.py --input /path/to/count_open_text.csv --mode open_text --task count
```
## Advanced Usage
```bash
# Run specific tasks only
python main.py --tasks count order
# Use custom config
python main.py --config my_config.yaml
# Custom output directory
python main.py --output /path/to/output
# Preprocess with custom parameters
python preprocess_esc50.py --config config.yaml \
--threshold-strategy noise_floor \
--noise-floor-percentile 2.0 \
--noise-floor-delta-db 5.0
```
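The `noise_floor` strategy above suggests a threshold built from a low percentile of per-frame levels plus a dB margin. A rough numpy sketch of that idea, under that assumption — the actual logic in `preprocess_esc50.py` may differ:

```python
import numpy as np

def noise_floor_threshold_db(frame_db, percentile=2.0, delta_db=5.0):
    """Estimate an activity threshold: the noise floor (a low percentile
    of per-frame dB levels) plus a safety margin in dB."""
    floor = np.percentile(frame_db, percentile)
    return floor + delta_db

# Toy example: mostly quiet frames near -60 dB, one loud event near -20 dB.
frames = np.array([-61.0, -60.0, -59.5, -60.5, -20.0, -21.0, -60.2])
thr = noise_floor_threshold_db(frames)
active = frames > thr
print(f"threshold = {thr:.1f} dB, active frames = {active.sum()}")
```

Frames above the threshold would be treated as containing the sound event; the percentile and delta map to the `--noise-floor-percentile` and `--noise-floor-delta-db` flags.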
## Documentation
See **`DOCS.md`** for complete technical documentation including:
- Mathematical formulations
- Detailed algorithm explanations
- Configuration parameter reference
- Preprocessing pipeline details
- Balancing mechanisms
## Requirements
- Python 3.8+
- pydub
- numpy
- pandas
- tqdm
- pyyaml