	# Installation

	The project is based on Python and PyTorch. We usually run experiments with multi-GPU training.

	Tested runtime:
	- Python `3.12.3`
	- PyTorch `2.8.0+cu128`

	## 📥 Clone the Git repo

	``` shell
	$ git clone https://github.com/yyliu01/AuralSAM2
	$ cd AuralSAM2
	```

	## 🧩 Install dependencies

	1) create conda env from yaml
	```shell
	$ conda env create -f docs/auralsam2.yml
	```

	2) activate env
	```shell
	$ conda activate auralsam2
	```

	3) install PyTorch (recommended: match tested runtime)
	```shell
	# CUDA 12.8 (tested):
	$ pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
	```

	4) install python packages (if needed)
	```shell
	$ pip install -r docs/requirements.txt
	```
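
Once the steps above have run, it can help to confirm that the active environment matches the tested runtime. The following is a minimal sketch, not part of the repository: the helper name `runtime_report` is hypothetical, and it reads the installed `torch` version from package metadata rather than importing PyTorch.

```python
import platform
from importlib import metadata

# Versions listed under "Tested runtime" in this document.
TESTED = {"python": "3.12.3", "torch": "2.8.0+cu128"}


def runtime_report():
    """Return the installed Python and PyTorch versions.

    `runtime_report` is a hypothetical helper for illustration only.
    The torch entry is None when PyTorch is not installed.
    """
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {"python": platform.python_version(), "torch": torch_version}


if __name__ == "__main__":
    report = runtime_report()
    for name, tested in TESTED.items():
        status = "matches" if report[name] == tested else "differs from"
        print(f"{name}: {report[name]} ({status} tested {tested})")
```

A mismatch does not necessarily mean the code will fail; it only means the setup differs from the one that was tested.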

	## 🗂️ Prepare dataset

	### AVSBench (`avs.code`)

	1) download and prepare AVSBench under the repository root.
	2) ensure the following paths exist:
	- `AVSBench/`
	- `AVSBench/avss_index/metadata.csv` (and subset folders `v1s/`, `v1m/`, `v2/`)

	### Ref-AVS (`ref-avs.code`)

	1) download and prepare the Ref-AVS (REFAVS) dataset under the repository root.
	2) ensure the following paths exist:
	- `REFAVS/`
	- `REFAVS/metadata.csv` (splits: `train`, `test_s`, `test_u`, `test_n`)
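
To check that all four splits are present in `REFAVS/metadata.csv`, a small row counter along these lines may be useful. It is a sketch under a loud assumption: the header name `split` is a guess, not something this document specifies, so pass the real column name if it differs. `count_splits` is a hypothetical helper.

```python
import csv
from collections import Counter

# Split names listed in this document for Ref-AVS.
EXPECTED_SPLITS = ("train", "test_s", "test_u", "test_n")


def count_splits(csv_path, column="split"):
    """Count rows per split value in a metadata CSV.

    ASSUMPTION: `column="split"` is a guessed header name; the actual
    header of REFAVS/metadata.csv is not given in this document.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not in header: {reader.fieldnames}")
        return Counter(row[column] for row in reader)


def absent_splits(counts):
    """Return the expected split names that have no rows."""
    return [name for name in EXPECTED_SPLITS if counts.get(name, 0) == 0]
```

Calling `absent_splits(count_splits("REFAVS/metadata.csv"))` from the repository root should return an empty list when every split has at least one row.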


	### Checkpoints (shared)

	Place the following checkpoints under the repository root:

	- `ckpts/sam_ckpts/sam2_hiera_large.pt`
	- `ckpts/vggish-10086976.pth`
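
The dataset and checkpoint paths listed in this section can be verified in one pass before training. The snippet below is a minimal sketch that only checks for existence; `missing_paths` is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path

# Paths named in this document, relative to the repository root.
REQUIRED = [
    "AVSBench/avss_index/metadata.csv",
    "AVSBench/v1s",
    "AVSBench/v1m",
    "AVSBench/v2",
    "REFAVS/metadata.csv",
    "ckpts/sam_ckpts/sam2_hiera_large.pt",
    "ckpts/vggish-10086976.pth",
]


def missing_paths(repo_root="."):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing paths:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required paths found.")
```

Run it from the repository root. It does not validate file contents or checksums.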

	## 🏗️ Workspace structure

	```shell
	AuralSAM2/
	├── avs.code/
	│   ├── v1s.code/
	│   ├── v1m.code/
	│   └── v2.code/
	├── ref-avs.code/
	├── scripts/
	│   ├── run_avs_train.sh
	│   └── run_ref_train.sh
	├── AVSBench/
	│   ├── avss_index
	│   │   ├── metadata.csv
	│   │   ├── metadata_v1m_man.csv
	│   │   └── metadata_v2_man.csv
	│   ├── v1m
	│   │   ├── 01uIJMwnUvA_0
	│   │   ├── 0WxgIKuetYI_0
	│   │   ... (419 more)
	│   ├── v1s
	│   │   ├── --FenyW2i_4_5000_10000
	│   │   ├── --ZHUMfueO0_5000_10000
	│   │   ... (4927 more)
	│   └── v2
	│       ├── --KCIeTv6PM_14000_24000
	│       ├── --iSerV5DbY_68000_78000
	│       ... (5995 more)
	├── REFAVS/
	│   ├── gt_mask
	│   │   ├── --KCIeTv6PM_14000_24000
	│   │   ├── --iSerV5DbY_68000_78000
	│   │   ... (~4000 more)
	│   ├── media
	│   │   ├── --KCIeTv6PM_14000_24000
	│   │   ├── --iSerV5DbY_68000_78000
	│   │   ... (~4300 more)
	│   └── metadata.csv
	├── ckpts/
	│   ├── sam_ckpts/
	│   │   └── sam2_hiera_large.pt
	│   └── vggish-10086976.pth
	└── docs/
	    ├── installation.md
	    ├── before_start.md
	    ├── requirements.txt
	    └── auralsam2.yml
	```
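
The tree above also implies how many clip folders each AVSBench subset should contain, which gives a quick check for incomplete downloads. The sketch below assumes each count is the two listed entries plus the "N more" figure (for example, 2 + 419 = 421 for `v1m`); the helper names are hypothetical, and the approximate REFAVS counts are left out because the tree only gives them as estimates.

```python
from pathlib import Path

# ASSUMPTION: counts derived from the tree above as "2 listed + N more".
EXPECTED_COUNTS = {"v1m": 421, "v1s": 4929, "v2": 5997}


def count_subdirs(folder):
    """Count immediate subdirectories; return 0 if folder is absent."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for entry in folder.iterdir() if entry.is_dir())


def compare_counts(avsbench_root="AVSBench"):
    """Map each subset name to a (found, expected) pair."""
    root = Path(avsbench_root)
    return {
        name: (count_subdirs(root / name), expected)
        for name, expected in EXPECTED_COUNTS.items()
    }


if __name__ == "__main__":
    for name, (found, expected) in compare_counts().items():
        flag = "ok" if found == expected else "check"
        print(f"{name}: found {found}, expected {expected} [{flag}]")
```

A different count may simply reflect a different dataset release, so treat a mismatch as a prompt to look closer rather than as an error.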

	## 📝 Notes

	- see `docs/before_start.md` for training and inference commands.
	- if wandb is not needed, disable online logging in your config.
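
The config key that controls wandb logging is not shown in this document. As an alternative that works independently of the project config, wandb itself reads the `WANDB_MODE` environment variable. A minimal sketch, assuming it runs before the training code calls `wandb.init()`:

```python
import os

# WANDB_MODE is read by the wandb library itself, not by this repository.
# "offline" keeps logs on local disk; "disabled" turns wandb calls into no-ops.
# This must be set before wandb.init() is called.
os.environ["WANDB_MODE"] = "offline"
```

The same effect can be had by exporting `WANDB_MODE` in the shell before launching the training script.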