File size: 6,420 Bytes
eca55dc | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 | # AGENTS Guide - audio-embeddings
This file is for coding agents working in this repository.
Follow these repo-specific rules over generic defaults.
## 1) Environment Snapshot
- Python: `>=3.12` (from `pyproject.toml`).
- Dependency manager: `uv`.
- Main stack: PyTorch, PyTorch Lightning, Hydra, OmegaConf.
- Project root marker: `.project-root`.
- Main entrypoint: `src/train.py`.
## 2) Cursor / Copilot Rule Files
- Checked `.cursor/rules/`: not present.
- Checked `.cursorrules`: not present.
- Checked `.github/copilot-instructions.md`: not present.
- Therefore, no additional Cursor/Copilot rule files are currently enforced.
## 3) Install / Setup Commands
```bash
uv sync
uv run <command>
uv add <package>
```
## 4) Build / Train / Eval Commands
There is no separate "build" step (this is a training codebase).
Use quick-run training as the integration sanity check.
```bash
uv run src/train.py
uv run src/train.py trainer.fast_dev_run=True
uv run src/train.py trainer=cpu trainer.fast_dev_run=True
uv run src/train.py experiment=local/audio_jepa
uv run src/train.py trainer.max_epochs=10 data.batch_size=32 model.optimizer.lr=1e-4
```
Cluster-style execution (existing project pattern):
```bash
srun .venv/bin/python -u -O src/train.py experiment=cluster_jepa_audioset_rope +trainer.max_time="00:19:50:00"
```
## 5) Lint / Formatting / Static Checks
Use the commands below as pragmatic checks:
```bash
uv run pre-commit run --all-files
uv run pre-commit run ruff --all-files
uv run pre-commit run ruff-format --all-files
uv run python -m compileall src
```
Ruff is configured via `.pre-commit-config.yaml` and runs both lint fixes and formatting.
## 6) Test Commands (Including Single Test)
Primary validation in this repo is script-based verification under `tests/`.
Run test files directly as native Python files:
```bash
uv run tests/verify_rope.py
uv run tests/verify_custom_rope.py
uv run tests/verify_data.py
```
Useful single-file checks (native execution):
```bash
uv run src/train.py trainer.fast_dev_run=True
uv run src/train.py trainer=cpu trainer.fast_dev_run=True
uv run scripts/verify_shapes.py
uv run scripts/verify_scheduler.py
```
Notes:
- `tests/test_*.py` are pytest-style and are not part of the default native-file workflow.
- Prefer `tests/verify_*.py` and `scripts/verify_*.py` for lightweight checks.
## 7) Repository Architecture Expectations
- `configs/`: Hydra composition (trainer/data/model/logger/callbacks/experiment).
- `src/train.py`: orchestration only (instantiate and run).
- `src/models/`: LightningModules (high-level training logic).
- `src/models/components/`: reusable `nn.Module` building blocks.
- `src/data/`: DataModules/Datasets and collate logic.
- `src/utils/`: logging, instantiation, wrappers, scheduler helpers.
When possible, prefer config changes over hardcoded Python changes.
## 8) Code Style Guidelines
### Imports
- Group imports as: standard library -> third-party -> local `src.*`.
- Keep one import per line unless importing multiple names from same module.
- Avoid wildcard imports.
- Prefer absolute imports from `src...`.
### Formatting
- Use 4-space indentation and readable line lengths.
- Keep functions small; extract helpers for complex logic.
- Do not introduce unrelated reformatting in touched files.
- Keep comments for non-obvious intent, not obvious mechanics.
### Typing
- Type hints are expected for function arguments and return values.
- Use concrete tensor/container types when practical.
- Use `Optional[T]` / `T | None` consistently within a file.
- For dict-like configs, type as `DictConfig` when passing Hydra config objects.
### Naming
- `snake_case`: functions, variables, module filenames.
- `PascalCase`: classes (`AudioJEPAModule`, `AudioSetDataModule`).
- `UPPER_SNAKE_CASE`: constants.
- Prefer descriptive names (`mask_indices`) over short names (`m2`) except local math temporaries.
### PyTorch / Lightning / Hydra Conventions
- Keep heavy compute out of `__init__` where possible.
- `forward()` for inference logic; training behavior in `training_step()`.
- Use `self.log(...)` with explicit flags (`on_step`, `on_epoch`, `prog_bar`, `batch_size`).
- Instantiate components through Hydra (`hydra.utils.instantiate`).
- Expose tunable parameters in config files, not hardcoded literals.
### Error Handling and Validation
- Raise informative `ValueError` / `RuntimeError` for invalid config/state.
- Validate critical tensor assumptions with assertions or explicit checks.
- Prefer logger/warnings over bare `print()` in new code.
- For file I/O, prefer `pathlib.Path` and existence checks.
### Data and Paths
- Do not hardcode absolute machine paths.
- Use `rootutils.setup_root(..., indicator=".project-root", pythonpath=True)` in entrypoints/scripts when needed.
- Respect `cfg.paths.*` outputs for logs/checkpoints/artifacts.
## 9) Agent Workflow Rules
- Reuse existing components before adding new abstractions.
- Keep `src/train.py` generic; place model/data logic in dedicated modules.
- Prefer minimal, focused diffs.
- Update configs and docs when behavior changes.
- Validate with the smallest meaningful command first (`fast_dev_run`, single test), then broader checks.
## 10) Git / Change Hygiene
- Do not revert unrelated local changes.
- Keep commits scoped to one concern.
- Write clear commit messages describing intent.
- Prefer Conventional Commit-like format: `type(scope): intent`.
- Common types in this repo: `feat`, `fix`, `conf`, `build`, `docs`, `style`, `chore`.
- Never commit secrets, credentials, or environment-specific absolute paths.
## 11) Practical Agent Defaults
- Prefer reusing existing modules over creating new abstractions.
- Keep edits local to the requested change; avoid drive-by refactors.
- Run the smallest useful verification command after changes.
- If you touch training logic, run at least one fast training sanity check.
- If you touch model components, run relevant verify script(s) in `tests/`.
- If you touch Hydra config wiring, run a config-backed entry command via `uv run src/train.py ...`.
## 12) Common Pitfalls
- Avoid hardcoding data paths; use config (`cfg.paths`, data config fields).
- Avoid printing in new code paths; use ranked loggers/warnings.
- Avoid putting heavy tensor compute in constructors.
- Avoid bypassing Hydra by manually instantiating configurable components.
- Avoid changing unrelated formatting in files you touch.
|