---
name: env-setup
description: "Training environment setup on lucia6750000000: conda env, dependencies, permissions"
metadata:
  node_type: memory
  type: reference
  originSessionId: a902e50d-bd1f-422b-8298-552e3fb0a73f
---
## Environment on lucia6750000000

- **User:** tunneladmin (in sudo group, NOT in sigma group)
- **Machine:** 8x H100 80GB, 32TB disk at `/data`
- `/data/xuano/` is owned by sigma; write access was granted via `sudo chmod -R o+w /data/xuano/`
- **Conda env `ttt`:** `/home/tunneladmin/.conda/envs/ttt/`
  - Python 3.11, PyTorch 2.8+cu128, transformers 4.57.3, VeOmni 0.1.0
  - FlashAttention 2.8.3, liger-kernel, datasets 2.21.0
  - Installed via `conda create -n ttt python=3.11`, then pip per [[qwen3-4b-cpt-experiment]]
- **VeOmni:** installed from git commit `9b91e164bea9e17f17ed490aab5e076c2335ca25` (ByteDance-Seed/VeOmni)
- **Project code:** `/data/xuano/Plug-In-Test-time-training/` (In-Place TTT repo; also registers custom HF models for Qwen3/LLaMA/Mistral)
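
The pinned versions listed for the `ttt` env can be checked from inside the env with a short script. This is a minimal sketch, not part of the original setup: the helper names (`installed_version`, `check_versions`) are hypothetical, and the distribution names in `EXPECTED` are assumptions about how each package is registered with pip. liger-kernel is left out because the note records no version for it.

```python
from importlib import metadata

# Expected versions copied from the note. The keys are ASSUMED pip
# distribution names and may need adjusting (e.g. "flash_attn").
EXPECTED = {
    "torch": "2.8",
    "transformers": "4.57.3",
    "veomni": "0.1.0",
    "flash-attn": "2.8.3",
    "datasets": "2.21.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_matches(found, want):
    """True if `found` equals `want` or extends it, e.g. 2.8 -> 2.8.0+cu128."""
    return found == want or found.startswith((want + ".", want + "+"))


def check_versions(expected, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found is None or not version_matches(found, want):
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

Run it with the env's interpreter (`/home/tunneladmin/.conda/envs/ttt/bin/python`, assuming the standard conda layout); no output means every listed package matched.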
### Key notes

- VeOmni's `lr_decay_ratio` is the fraction of total steps over which cosine decay runs (NOT the minimum-LR ratio). Set it to 1.0 for cosine decay across the full run.
- `FLOPS_DISABLE=1` is needed for mbs>=4 at 32K context on H100 80GB (FlopCounterMode causes OOM otherwise).
- Use `nohup` for long training runs so the process is not killed when the session ends.
- `hf` CLI is installed at `/home/tunneladmin/.local/bin/hf` (v1.14.0) for HF bucket sync.
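
To make the `lr_decay_ratio` note concrete, here is a minimal sketch of a schedule in which the ratio sets how long the cosine decay lasts rather than where it ends. It is an illustration only, not VeOmni's implementation: the function name `lr_at_step` and the `min_lr` parameter are hypothetical, warmup is omitted, and holding the rate at `min_lr` once the decay window ends is an assumption.

```python
import math


def lr_at_step(step, total_steps, base_lr, lr_decay_ratio=1.0, min_lr=0.0):
    """Cosine decay over the first `lr_decay_ratio` fraction of training.

    ASSUMPTION: after the decay window the rate is held at `min_lr`.
    """
    # Number of steps over which the cosine curve is stretched.
    decay_steps = max(1, int(total_steps * lr_decay_ratio))
    if step >= decay_steps:
        return min_lr
    progress = step / decay_steps  # 0.0 at the start, approaches 1.0
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine
```

With `total_steps=1000` and `lr_decay_ratio=1.0`, step 500 sits at the midpoint of the cosine curve. With `lr_decay_ratio=0.5`, the same step is already at the end of the decay window, which is why a value below 1.0 shortens the decay instead of raising the floor.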