# TREA 2.0 Pipeline

A pipeline that generates audio question-answering datasets from ESC-50 clips. It creates four task types: COUNT, DURATION, ORDER, and VOLUME.

## Quick Start

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Preprocess ESC-50 (required for DURATION task only)
python preprocess_esc50.py --config config.yaml

# 3. Generate datasets
python main.py --config config.yaml
```

## Configuration

Edit `config.yaml` to set:
- **Task duration**: `task_duration_size` (hours) per task
- **Clip duration range**: `min_clip_duration` to `max_clip_duration` (seconds)
- **ESC-50 paths**: Point to your ESC-50 dataset location
- **Enable/disable tasks**: Set `enabled: true/false` for each task
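As a rough sketch, the settings above might be laid out like this (the nesting and the ESC-50 path key names are assumptions for illustration; the shipped `config.yaml` is authoritative):

```yaml
# Illustrative layout only -- check the shipped config.yaml for exact keys
esc50:
  audio_dir: /data/ESC-50/audio          # assumed key name
  meta_csv: /data/ESC-50/meta/esc50.csv  # assumed key name

min_clip_duration: 1.0   # seconds
max_clip_duration: 5.0   # seconds

tasks:
  count:
    enabled: true
    task_duration_size: 2   # hours of audio for this task
  duration:
    enabled: false          # requires preprocess_esc50.py first
```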

## Key Files

- **`config.yaml`** - All configuration parameters
- **`main.py`** - Pipeline entry point (runs all tasks)
- **`preprocess_esc50.py`** - Preprocess ESC-50 for duration task
- **`tasks/task_*.py`** - Individual task generators

## Tasks

| Task | Question | Example |
|------|----------|---------|
| **COUNT** | "How many unique sounds?" | Audio with distinct sound types |
| **DURATION** | "Which sound is longest/shortest?" | Compare sound durations |
| **ORDER** | "Which sound is first/last/after X?" | Temporal sequence questions |
| **VOLUME** | "Which sound is loudest/softest?" | Loudness comparison |

## Output Structure

```
output/{task}/
├── audios/*.wav          # Generated audio files
├── {task}_mcq.csv        # Multiple choice questions
├── {task}_open_text.csv  # Open-ended questions
└── {task}_metadata.csv   # Detailed metadata
```
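For downstream consumption, the per-task CSVs can be enumerated with a short helper. This is a sketch against the layout shown above, demonstrated on a mock directory tree rather than a real run:

```python
from pathlib import Path
import tempfile

def list_task_outputs(output_dir: Path) -> dict:
    """Map each task name to the sorted list of CSV files it produced."""
    return {
        task_dir.name: sorted(p.name for p in task_dir.glob("*.csv"))
        for task_dir in output_dir.iterdir()
        if task_dir.is_dir()
    }

# Demo against a mock layout mirroring output/{task}/ from a real run
root = Path(tempfile.mkdtemp()) / "output"
for task in ("count", "order"):
    (root / task / "audios").mkdir(parents=True)
    for suffix in ("mcq", "open_text", "metadata"):
        (root / task / f"{task}_{suffix}.csv").touch()

print(list_task_outputs(root))
```

Point `output_dir` at your real `output/` directory to get the same mapping for generated datasets.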

## Shell Scripts

Use the provided shell helpers for simple runs.

Run the full pipeline (wraps `python main.py`):

```bash
# Make executable and run (from pipeline/)
./run_pipeline.sh

# With custom config, tasks, and output
./run_pipeline.sh --config my_config.yaml --tasks count,order --output ./my_dataset
```

Run the LLM answer generation across splits (uses `llm_answer_generator.py`):

```bash
# Processes open_text CSVs across splits/tasks defined in the script
./run_llm_answers_all.sh

# Or run per-file with the helper script directly
python llm_answer_generator.py --input /path/to/count_open_text.csv --mode open_text --task count
```


## Advanced Usage

```bash
# Run specific tasks only
python main.py --tasks count order

# Use custom config
python main.py --config my_config.yaml

# Custom output directory
python main.py --output /path/to/output

# Preprocess with custom parameters
python preprocess_esc50.py --config config.yaml \
    --threshold-strategy noise_floor \
    --noise-floor-percentile 2.0 \
    --noise-floor-delta-db 5.0
```

## Documentation

See **`DOCS.md`** for complete technical documentation including:
- Mathematical formulations
- Detailed algorithm explanations
- Configuration parameter reference
- Preprocessing pipeline details
- Balancing mechanisms

## Requirements

- Python 3.8+
- pydub
- numpy
- pandas
- tqdm
- pyyaml