---
name: env-setup
description: >-
  Training environment setup on lucia6750000000: conda env, dependencies,
  permissions
metadata:
  node_type: memory
  type: reference
  originSessionId: a902e50d-bd1f-422b-8298-552e3fb0a73f
---
Environment on lucia6750000000
- User: tunneladmin (in sudo group, NOT in sigma group)
- Machine: 8x H100 80GB, 32TB disk at /data
- `/data/xuano/` owned by sigma; write access granted via `sudo chmod -R o+w /data/xuano/`
- Conda env `ttt`: `/home/tunneladmin/.conda/envs/ttt/`
- Python 3.11, PyTorch 2.8+cu128, transformers 4.57.3, VeOmni 0.1.0
- FlashAttention 2.8.3, liger-kernel, datasets 2.21.0
- Installed via: `conda create -n ttt python=3.11` + pip per [[qwen3-4b-cpt-experiment]]
- VeOmni: installed from git commit `9b91e164bea9e17f17ed490aab5e076c2335ca25` (ByteDance-Seed/VeOmni)
- Project code: `/data/xuano/Plug-In-Test-time-training/` (In-Place TTT repo; also registers custom HF models for Qwen3/LLaMA/Mistral)
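To confirm that the active `ttt` env still matches the versions recorded above, a small check like the following can be run inside it. This is a sketch, not part of the project: the helper names (`installed_version`, `release_matches`, `check_versions`) are hypothetical, and the distribution names `flash-attn` and `veomni` are assumptions about how those packages were installed. liger-kernel is skipped because the note does not pin its version.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Versions recorded in this note. Distribution names are assumptions:
# FlashAttention is assumed to be "flash-attn" and VeOmni "veomni".
# liger-kernel is omitted because no version is pinned for it.
EXPECTED: Dict[str, str] = {
    "torch": "2.8",  # note says "2.8+cu128"; only the release prefix is checked
    "transformers": "4.57.3",
    "flash-attn": "2.8.3",
    "datasets": "2.21.0",
    "veomni": "0.1.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def release_matches(got: str, want: str) -> bool:
    """True if `got` starts with the dotted release segments of `want`.

    Local labels such as "+cu128" are ignored, so "2.8.0+cu128" matches "2.8".
    """
    got_parts = got.split("+", 1)[0].split(".")
    want_parts = want.split(".")
    return got_parts[: len(want_parts)] == want_parts


def check_versions(
    expected: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Return a human-readable list of mismatches (empty if everything matches)."""
    problems: List[str] = []
    for dist, want in expected.items():
        got = get_version(dist)
        if got is None:
            problems.append(f"{dist}: not installed (expected {want})")
        elif not release_matches(got, want):
            problems.append(f"{dist}: found {got}, expected {want}")
    return problems


if __name__ == "__main__":
    # Report problems without exiting non-zero, so the script is safe to import.
    if sys.version_info[:2] != (3, 11):
        print(f"python: found {sys.version.split()[0]}, expected 3.11")
    for line in check_versions(EXPECTED):
        print(line)
```

Comparing release segments rather than raw string prefixes keeps a version like `2.80.0` from being mistaken for `2.8`.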
Key notes
- VeOmni's `lr_decay_ratio` is the fraction of total steps that use cosine decay (NOT the min LR ratio). Set it to 1.0 for a full cosine schedule.
- `FLOPS_DISABLE=1` is needed for mbs (micro-batch size) >= 4 at 32K context on H100 80GB; otherwise FlopCounterMode causes OOM.
- Use `nohup` for long training runs so they keep running after the terminal or SSH session ends (`nohup` makes the process ignore SIGHUP).
- `hf` CLI installed at `/home/tunneladmin/.local/bin/hf` (v1.14.0) for HF bucket sync.
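The note only fixes what `lr_decay_ratio` means. It does not record how VeOmni schedules the steps that are not decayed, or how warmup interacts with them. The following is a minimal sketch under one assumed reading: the LR stays at its peak for the first `(1 - lr_decay_ratio)` fraction of steps, then follows a cosine curve down to a separate floor over the remaining steps. `lr_at_step` and its `min_lr` parameter are hypothetical and are not VeOmni's API. Warmup is omitted.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,
    peak_lr: float,
    lr_decay_ratio: float = 1.0,
    min_lr: float = 0.0,
) -> float:
    """Hypothetical schedule illustrating the assumed meaning of lr_decay_ratio.

    lr_decay_ratio is the FRACTION OF STEPS that decay (not a min-LR ratio):
    the last int(total_steps * lr_decay_ratio) steps follow a cosine from
    peak_lr down to min_lr; earlier steps stay at peak_lr.
    """
    decay_steps = int(total_steps * lr_decay_ratio)
    decay_start = total_steps - decay_steps
    # No decay window at all, or still on the constant plateau before it.
    if decay_steps == 0 or step < decay_start:
        return peak_lr
    # Progress through the decay window: 0.0 at its start, 1.0 at total_steps.
    progress = (step - decay_start) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    # With lr_decay_ratio=1.0 the whole run is one cosine; with 0.5 only the
    # second half decays.
    for ratio in (1.0, 0.5):
        print(ratio, [round(lr_at_step(s, 100, 1e-4, ratio), 8) for s in (0, 25, 50, 75, 100)])
```

Under this reading, `lr_decay_ratio=1.0` gives a full cosine over the whole run, which matches the note's advice. The floor is a separate knob (`min_lr` here), which is why the two should not be confused.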