	# TREA 2.0 Pipeline

	Audio question-answering dataset generator using ESC-50. Creates four task types: COUNT, DURATION, ORDER, and VOLUME.

	## Quick Start

	```bash
	# 1. Install dependencies
	pip install -r requirements.txt

	# 2. Preprocess ESC-50 (required for DURATION task only)
	python preprocess_esc50.py --config config.yaml

	# 3. Generate datasets
	python main.py --config config.yaml
	```

	## Configuration

	Edit `config.yaml` to set:
	- Task duration: `task_duration_size`, in hours of audio per task
	- Clip duration range: `min_clip_duration` to `max_clip_duration` (seconds)
	- ESC-50 paths: Point to your ESC-50 dataset location
	- Enable/disable tasks: Set `enabled: true/false` for each task
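
Before a long generation run, it can help to sanity-check the loaded configuration. The sketch below is illustrative only: the key names come from the list above, but the nesting (top-level duration keys and a `tasks` mapping with one `enabled` flag per task) and the `check_config` helper are assumptions, not the actual layout of `config.yaml`.

```python
from typing import Any, Dict, List


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a loaded config dict."""
    problems = []

    hours = cfg.get("task_duration_size")
    if not isinstance(hours, (int, float)) or hours <= 0:
        problems.append("task_duration_size must be a positive number of hours")

    lo = cfg.get("min_clip_duration")
    hi = cfg.get("max_clip_duration")
    if not all(isinstance(v, (int, float)) for v in (lo, hi)):
        problems.append("min_clip_duration and max_clip_duration must be numbers (seconds)")
    elif not 0 < lo <= hi:
        problems.append("need 0 < min_clip_duration <= max_clip_duration")

    # Assumed layout: a "tasks" mapping, one entry per task, each with an "enabled" flag.
    tasks = cfg.get("tasks", {})
    if not any(t.get("enabled") for t in tasks.values()):
        problems.append("no task is enabled")

    return problems


example = {
    "task_duration_size": 1.0,
    "min_clip_duration": 10,
    "max_clip_duration": 30,
    "tasks": {"count": {"enabled": True}, "duration": {"enabled": False}},
}
print(check_config(example))  # → []
```

With pyyaml installed, the dict would typically come from `yaml.safe_load(open("config.yaml"))`; adapt the key paths to the real file.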

	## Key Files

	- `config.yaml` - All configuration parameters
	- `main.py` - Pipeline entry point (runs all tasks)
	- `preprocess_esc50.py` - Preprocess ESC-50 for duration task
	- `tasks/task_*.py` - Individual task generators

	## Tasks

	| Task | Question | Example |
	|------|----------|---------|
	| COUNT | "How many unique sounds?" | Audio with distinct sound types |
	| DURATION | "Which sound is longest/shortest?" | Compare sound durations |
	| ORDER | "Which sound is first/last/after X?" | Temporal sequence questions |
	| VOLUME | "Which sound is loudest/softest?" | Loudness comparison |
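
To make the four question types concrete, the sketch below derives one answer per question from a time-ordered list of sound events. The `Event` record and the `ground_truth` helper are hypothetical and exist only for illustration; the actual task generators may define duration and loudness differently (for example, after preprocessing).

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    label: str         # sound class, e.g. an ESC-50 category name
    duration_s: float  # how long the sound plays in the mix
    level_db: float    # relative loudness of the sound in the mix


def ground_truth(events: List[Event]) -> Dict[str, object]:
    """Derive one answer per question type from a time-ordered list of events."""
    return {
        "count": len({e.label for e in events}),                   # COUNT
        "longest": max(events, key=lambda e: e.duration_s).label,  # DURATION
        "shortest": min(events, key=lambda e: e.duration_s).label,
        "first": events[0].label,                                  # ORDER
        "last": events[-1].label,
        "loudest": max(events, key=lambda e: e.level_db).label,    # VOLUME
        "softest": min(events, key=lambda e: e.level_db).label,
    }


clip = [
    Event("dog", 2.0, -12.0),
    Event("rain", 5.0, -20.0),
    Event("siren", 3.5, -6.0),
]
answers = ground_truth(clip)
print(answers["count"], answers["longest"], answers["last"], answers["loudest"])
# → 3 rain siren siren
```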

	## Output Structure

	```
	output/{task}/
	├── audios/*.wav # Generated audio files
	├── {task}_mcq.csv # Multiple choice questions
	├── {task}_open_text.csv # Open-ended questions
	└── {task}_metadata.csv # Detailed metadata
	```
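
The layout above can be checked programmatically after a run. This is a minimal sketch: the helper names `expected_outputs` and `missing_outputs` are made up for illustration, and the lowercase task folder name (such as `count`) is an assumption based on the example paths in this README.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(output_root: str, task: str) -> Dict[str, Path]:
    """Map each expected artifact of one task to its path under output_root."""
    task_dir = Path(output_root) / task
    return {
        "audios": task_dir / "audios",
        "mcq": task_dir / f"{task}_mcq.csv",
        "open_text": task_dir / f"{task}_open_text.csv",
        "metadata": task_dir / f"{task}_metadata.csv",
    }


def missing_outputs(output_root: str, task: str) -> List[str]:
    """Names of expected artifacts that do not exist on disk."""
    return [
        name
        for name, path in expected_outputs(output_root, task).items()
        if not path.exists()
    ]


print(expected_outputs("output", "count")["mcq"])
```

An empty list from `missing_outputs("output", "count")` would indicate that all four artifacts for the COUNT task are present.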

	## Shell Scripts

	Use the provided shell helpers for simple runs.

	Run the full pipeline (uses `python main.py` under the hood):

	```bash
	# Make executable and run (from pipeline/)
	chmod +x run_pipeline.sh
	./run_pipeline.sh

	# With custom config, tasks, and output
	./run_pipeline.sh --config my_config.yaml --tasks count,order --output ./my_dataset
	```

	Run the LLM answer generation across splits (uses `llm_answer_generator.py`):

	```bash
	# Processes open_text CSVs across splits/tasks defined in the script
	./run_llm_answers_all.sh

	# Or run per-file with the helper script directly
	python llm_answer_generator.py --input /path/to/count_open_text.csv --mode open_text --task count
	```
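
If you prefer to drive the per-file command from Python rather than the shell script, the commands can be assembled as below. This is a sketch under assumptions: the `llm_commands` helper is hypothetical, the CSV locations follow the output layout described in this README, and passing lowercase task names other than `count` to `--task` is assumed rather than confirmed.

```python
from pathlib import Path
from typing import List, Sequence

TASKS = ("count", "duration", "order", "volume")


def llm_commands(output_root: str, tasks: Sequence[str] = TASKS) -> List[List[str]]:
    """Build one llm_answer_generator.py command per task's open-text CSV."""
    commands = []
    for task in tasks:
        csv_path = Path(output_root) / task / f"{task}_open_text.csv"
        commands.append([
            "python", "llm_answer_generator.py",
            "--input", str(csv_path),
            "--mode", "open_text",
            "--task", task,
        ])
    return commands


# Print the commands instead of running them; nothing is executed here.
for cmd in llm_commands("output"):
    print(" ".join(cmd))
```

Each command list can be handed to `subprocess.run(cmd, check=True)` once the paths are confirmed to exist.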


	## Advanced Usage

	```bash
	# Run specific tasks only
	python main.py --tasks count order

	# Use custom config
	python main.py --config my_config.yaml

	# Custom output directory
	python main.py --output /path/to/output

	# Preprocess with custom parameters
	python preprocess_esc50.py --config config.yaml \
	--threshold-strategy noise_floor \
	--noise-floor-percentile 2.0 \
	--noise-floor-delta-db 5.0
	```
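
The README does not define the `noise_floor` strategy, so the following is only one plausible reading of the flags: treat a low percentile of per-frame levels as the noise floor and place the activity threshold a fixed number of dB above it. The function names, the frame-based RMS measure, and the nearest-rank percentile are all assumptions; see `DOCS.md` for the actual formulation.

```python
import math
from typing import List


def frame_levels_db(samples: List[float], frame_len: int) -> List[float]:
    """RMS level of each non-overlapping frame, in dB relative to amplitude 1.0."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # clamp to avoid log10(0)
    return levels


def noise_floor_threshold(levels_db: List[float],
                          percentile: float = 2.0,
                          delta_db: float = 5.0) -> float:
    """Threshold = low percentile of frame levels (assumed noise floor) + delta_db."""
    if not levels_db:
        raise ValueError("need at least one frame level")
    ordered = sorted(levels_db)
    # Nearest-rank percentile: smallest value with at least `percentile` % of data at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1] + delta_db


levels = [-20.0, -60.0, -10.0, -58.0]
print(noise_floor_threshold(levels))  # → -55.0
```

Under this reading, frames above the threshold would count as active sound, and a larger `delta_db` would make the detection stricter.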

	## Documentation

	See `DOCS.md` for complete technical documentation including:
	- Mathematical formulations
	- Detailed algorithm explanations
	- Configuration parameter reference
	- Preprocessing pipeline details
	- Balancing mechanisms

	## Requirements

	- Python 3.8+
	- pydub
	- numpy
	- pandas
	- tqdm
	- pyyaml