# AGENTS Guide - audio-embeddings This file is for coding agents working in this repository. Follow these repo-specific rules over generic defaults. ## 1) Environment Snapshot - Python: `>=3.12` (from `pyproject.toml`). - Dependency manager: `uv`. - Main stack: PyTorch, PyTorch Lightning, Hydra, OmegaConf. - Project root marker: `.project-root`. - Main entrypoint: `src/train.py`. ## 2) Cursor / Copilot Rule Files - Checked `.cursor/rules/`: not present. - Checked `.cursorrules`: not present. - Checked `.github/copilot-instructions.md`: not present. - Therefore, no additional Cursor/Copilot rule files are currently enforced. ## 3) Install / Setup Commands ```bash uv sync uv run uv add ``` ## 4) Build / Train / Eval Commands There is no separate "build" step (this is a training codebase). Use quick-run training as the integration sanity check. ```bash uv run src/train.py uv run src/train.py trainer.fast_dev_run=True uv run src/train.py trainer=cpu trainer.fast_dev_run=True uv run src/train.py experiment=local/audio_jepa uv run src/train.py trainer.max_epochs=10 data.batch_size=32 model.optimizer.lr=1e-4 ``` Cluster-style execution (existing project pattern): ```bash srun .venv/bin/python -u -O src/train.py experiment=cluster_jepa_audioset_rope +trainer.max_time="00:19:50:00" ``` ## 5) Lint / Formatting / Static Checks Use the commands below as pragmatic checks: ```bash uv run pre-commit run --all-files uv run pre-commit run ruff --all-files uv run pre-commit run ruff-format --all-files uv run python -m compileall src ``` Ruff is configured via `.pre-commit-config.yaml` and runs both lint fixes and formatting. ## 6) Test Commands (Including Single Test) Primary validation in this repo is script-based verification under `tests/`. Run test files directly as native Python files: ```bash uv run tests/verify_rope.py uv run tests/verify_custom_rope.py uv run tests/verify_data.py ``` Useful single-file checks (native execution): ```bash uv run src/train.py trainer.fast_dev_run=True uv run src/train.py trainer=cpu trainer.fast_dev_run=True uv run scripts/verify_shapes.py uv run scripts/verify_scheduler.py ``` Notes: - `tests/test_*.py` are pytest-style and are not part of the default native-file workflow. - Prefer `tests/verify_*.py` and `scripts/verify_*.py` for lightweight checks. ## 7) Repository Architecture Expectations - `configs/`: Hydra composition (trainer/data/model/logger/callbacks/experiment). - `src/train.py`: orchestration only (instantiate and run). - `src/models/`: LightningModules (high-level training logic). - `src/models/components/`: reusable `nn.Module` building blocks. - `src/data/`: DataModules/Datasets and collate logic. - `src/utils/`: logging, instantiation, wrappers, scheduler helpers. When possible, prefer config changes over hardcoded Python changes. ## 8) Code Style Guidelines ### Imports - Group imports as: standard library -> third-party -> local `src.*`. - Keep one import per line unless importing multiple names from same module. - Avoid wildcard imports. - Prefer absolute imports from `src...`. ### Formatting - Use 4-space indentation and readable line lengths. - Keep functions small; extract helpers for complex logic. - Do not introduce unrelated reformatting in touched files. - Keep comments for non-obvious intent, not obvious mechanics. ### Typing - Type hints are expected for function arguments and return values. - Use concrete tensor/container types when practical. - Use `Optional[T]` / `T | None` consistently within a file. - For dict-like configs, type as `DictConfig` when passing Hydra config objects. ### Naming - `snake_case`: functions, variables, module filenames. - `PascalCase`: classes (`AudioJEPAModule`, `AudioSetDataModule`). - `UPPER_SNAKE_CASE`: constants. - Prefer descriptive names (`mask_indices`) over short names (`m2`) except local math temporaries. ### PyTorch / Lightning / Hydra Conventions - Keep heavy compute out of `__init__` where possible. - `forward()` for inference logic; training behavior in `training_step()`. - Use `self.log(...)` with explicit flags (`on_step`, `on_epoch`, `prog_bar`, `batch_size`). - Instantiate components through Hydra (`hydra.utils.instantiate`). - Expose tunable parameters in config files, not hardcoded literals. ### Error Handling and Validation - Raise informative `ValueError` / `RuntimeError` for invalid config/state. - Validate critical tensor assumptions with assertions or explicit checks. - Prefer logger/warnings over bare `print()` in new code. - For file I/O, prefer `pathlib.Path` and existence checks. ### Data and Paths - Do not hardcode absolute machine paths. - Use `rootutils.setup_root(..., indicator=".project-root", pythonpath=True)` in entrypoints/scripts when needed. - Respect `cfg.paths.*` outputs for logs/checkpoints/artifacts. ## 9) Agent Workflow Rules - Reuse existing components before adding new abstractions. - Keep `src/train.py` generic; place model/data logic in dedicated modules. - Prefer minimal, focused diffs. - Update configs and docs when behavior changes. - Validate with the smallest meaningful command first (`fast_dev_run`, single test), then broader checks. ## 10) Git / Change Hygiene - Do not revert unrelated local changes. - Keep commits scoped to one concern. - Write clear commit messages describing intent. - Prefer Conventional Commit-like format: `type(scope): intent`. - Common types in this repo: `feat`, `fix`, `conf`, `build`, `docs`, `style`, `chore`. - Never commit secrets, credentials, or environment-specific absolute paths. ## 11) Practical Agent Defaults - Prefer reusing existing modules over creating new abstractions. - Keep edits local to the requested change; avoid drive-by refactors. - Run the smallest useful verification command after changes. - If you touch training logic, run at least one fast training sanity check. - If you touch model components, run relevant verify script(s) in `tests/`. - If you touch Hydra config wiring, run a config-backed entry command via `uv run src/train.py ...`. ## 12) Common Pitfalls - Avoid hardcoding data paths; use config (`cfg.paths`, data config fields). - Avoid printing in new code paths; use ranked loggers/warnings. - Avoid putting heavy tensor compute in constructors. - Avoid bypassing Hydra by manually instantiating configurable components. - Avoid changing unrelated formatting in files you touch.