File size: 6,420 Bytes
eca55dc
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
# AGENTS Guide - audio-embeddings

This file is for coding agents working in this repository.
Follow these repo-specific rules over generic defaults.

## 1) Environment Snapshot
- Python: `>=3.12` (from `pyproject.toml`).
- Dependency manager: `uv`.
- Main stack: PyTorch, PyTorch Lightning, Hydra, OmegaConf.
- Project root marker: `.project-root`.
- Main entrypoint: `src/train.py`.

## 2) Cursor / Copilot Rule Files
- Checked `.cursor/rules/`: not present.
- Checked `.cursorrules`: not present.
- Checked `.github/copilot-instructions.md`: not present.
- Therefore, no additional Cursor/Copilot rule files are currently enforced.

## 3) Install / Setup Commands
```bash
uv sync
uv run <command>
uv add <package>
```

## 4) Build / Train / Eval Commands
There is no separate "build" step (this is a training codebase).
Use quick-run training as the integration sanity check.
```bash
uv run src/train.py
uv run src/train.py trainer.fast_dev_run=True
uv run src/train.py trainer=cpu trainer.fast_dev_run=True
uv run src/train.py experiment=local/audio_jepa
uv run src/train.py trainer.max_epochs=10 data.batch_size=32 model.optimizer.lr=1e-4
```
Cluster-style execution (existing project pattern):
```bash
srun .venv/bin/python -u -O src/train.py experiment=cluster_jepa_audioset_rope +trainer.max_time="00:19:50:00"
```

## 5) Lint / Formatting / Static Checks
Use the commands below as pragmatic checks:
```bash
uv run pre-commit run --all-files
uv run pre-commit run ruff --all-files
uv run pre-commit run ruff-format --all-files
uv run python -m compileall src
```
Ruff is configured via `.pre-commit-config.yaml` and runs both lint fixes and formatting.

## 6) Test Commands (Including Single Test)
Primary validation in this repo is script-based verification under `tests/`.
Run test files directly as native Python files:
```bash
uv run tests/verify_rope.py
uv run tests/verify_custom_rope.py
uv run tests/verify_data.py
```
Useful single-file checks (native execution):
```bash
uv run src/train.py trainer.fast_dev_run=True
uv run src/train.py trainer=cpu trainer.fast_dev_run=True
uv run scripts/verify_shapes.py
uv run scripts/verify_scheduler.py
```
Notes:
- `tests/test_*.py` are pytest-style and are not part of the default native-file workflow.
- Prefer `tests/verify_*.py` and `scripts/verify_*.py` for lightweight checks.

## 7) Repository Architecture Expectations
- `configs/`: Hydra composition (trainer/data/model/logger/callbacks/experiment).
- `src/train.py`: orchestration only (instantiate and run).
- `src/models/`: LightningModules (high-level training logic).
- `src/models/components/`: reusable `nn.Module` building blocks.
- `src/data/`: DataModules/Datasets and collate logic.
- `src/utils/`: logging, instantiation, wrappers, scheduler helpers.
When possible, prefer config changes over hardcoded Python changes.

## 8) Code Style Guidelines
### Imports
- Group imports as: standard library -> third-party -> local `src.*`.
- Keep one import per line unless importing multiple names from same module.
- Avoid wildcard imports.
- Prefer absolute imports from `src...`.

### Formatting
- Use 4-space indentation and readable line lengths.
- Keep functions small; extract helpers for complex logic.
- Do not introduce unrelated reformatting in touched files.
- Keep comments for non-obvious intent, not obvious mechanics.

### Typing
- Type hints are expected for function arguments and return values.
- Use concrete tensor/container types when practical.
- Use `Optional[T]` / `T | None` consistently within a file.
- For dict-like configs, type as `DictConfig` when passing Hydra config objects.

### Naming
- `snake_case`: functions, variables, module filenames.
- `PascalCase`: classes (`AudioJEPAModule`, `AudioSetDataModule`).
- `UPPER_SNAKE_CASE`: constants.
- Prefer descriptive names (`mask_indices`) over short names (`m2`) except local math temporaries.

### PyTorch / Lightning / Hydra Conventions
- Keep heavy compute out of `__init__` where possible.
- `forward()` for inference logic; training behavior in `training_step()`.
- Use `self.log(...)` with explicit flags (`on_step`, `on_epoch`, `prog_bar`, `batch_size`).
- Instantiate components through Hydra (`hydra.utils.instantiate`).
- Expose tunable parameters in config files, not hardcoded literals.

### Error Handling and Validation
- Raise informative `ValueError` / `RuntimeError` for invalid config/state.
- Validate critical tensor assumptions with assertions or explicit checks.
- Prefer logger/warnings over bare `print()` in new code.
- For file I/O, prefer `pathlib.Path` and existence checks.

### Data and Paths
- Do not hardcode absolute machine paths.
- Use `rootutils.setup_root(..., indicator=".project-root", pythonpath=True)` in entrypoints/scripts when needed.
- Respect `cfg.paths.*` outputs for logs/checkpoints/artifacts.

## 9) Agent Workflow Rules
- Reuse existing components before adding new abstractions.
- Keep `src/train.py` generic; place model/data logic in dedicated modules.
- Prefer minimal, focused diffs.
- Update configs and docs when behavior changes.
- Validate with the smallest meaningful command first (`fast_dev_run`, single test), then broader checks.

## 10) Git / Change Hygiene
- Do not revert unrelated local changes.
- Keep commits scoped to one concern.
- Write clear commit messages describing intent.
- Prefer Conventional Commit-like format: `type(scope): intent`.
- Common types in this repo: `feat`, `fix`, `conf`, `build`, `docs`, `style`, `chore`.
- Never commit secrets, credentials, or environment-specific absolute paths.

## 11) Practical Agent Defaults
- Prefer reusing existing modules over creating new abstractions.
- Keep edits local to the requested change; avoid drive-by refactors.
- Run the smallest useful verification command after changes.
- If you touch training logic, run at least one fast training sanity check.
- If you touch model components, run relevant verify script(s) in `tests/`.
- If you touch Hydra config wiring, run a config-backed entry command via `uv run src/train.py ...`.

## 12) Common Pitfalls
- Avoid hardcoding data paths; use config (`cfg.paths`, data config fields).
- Avoid printing in new code paths; use ranked loggers/warnings.
- Avoid putting heavy tensor compute in constructors.
- Avoid bypassing Hydra by manually instantiating configurable components.
- Avoid changing unrelated formatting in files you touch.