Commit History

Add cuda-toolkit and cuda-nvcc to Conda environment to provide nvcc for Transformer Engine compilation
f52a4d0

elungky committed on

Fix cudnn.h not found error by dynamically locating and symlinking it from pip installed package
8da02bd

elungky committed on

Fix cudnn.h not found error by copying to Conda env include path and setting CUDA_HOME
a27c594

elungky committed on

Fix 'conda: not found' by moving global ENV PATH for Conda earlier in Dockerfile
e0e9267

elungky committed on

Fix 'conda: not found' during Miniconda installation by updating PATH within the RUN command
193cb9f

elungky committed on

Merged Dockerfile with robust build environment for transformer-engine compilation
9679875

elungky committed on

Final Dockerfile syntax correction: ensure chmod +x is a standalone RUN command
95d53e3

elungky committed on

Final Dockerfile syntax correction: ensure chmod +x is a standalone RUN command
fb926f5

elungky committed on

Final fix for Dockerfile syntax: ensure chmod +x is a standalone RUN command
598f651

elungky committed on

Fix Dockerfile syntax: separate chmod +x into its own RUN instruction
9530488

elungky committed on

Re-add transformer_engine.whl with Git LFS tracking
785a197

elungky committed on

Add Git LFS tracking for transformer_engine.whl
b25a23c

elungky committed on

Install transformer-engine using pre-built wheel
9b739b4

elungky committed on

Removed header symlinks before transformer-engine install
a752929

elungky committed on

Removed header symlinks before transformer-engine install
106eac9

elungky committed on

Removed --no-cache-dir --no-build-isolation on transformer-engine
46cbf58

elungky committed on

Attempt to fix Exit code 137 (OOM) by using --no-build-isolation for transformer-engine
54bda79

elungky committed on

Fix cudnn.h not found during Transformer Engine build by adding symlinks as per INSTALL.md
62d1e04

elungky committed on

Attempt to fix libcudnn.so.9 error by installing cudnn via conda and transformer_engine separately
cf41009

elungky committed on

Add 'attrs' package to cosmos-predict1.yaml dependencies
f305527

elungky committed on

Add 'attrs' package to cosmos-predict1.yaml dependencies
17c6444

elungky committed on

Add 'attrs' package to cosmos-predict1.yaml dependencies
8bf5f16

elungky committed on

Fix ImportError: libgthread-2.0.so.0 by installing libglib2.0-0
d6f9440

elungky committed on

Attempt to fix torchvision::nms error by installing PyTorch via pip with official CUDA index
cbe7167

elungky committed on

Attempt to fix torchvision::nms error by aligning pytorch-cuda to 12.4
9ec2085

elungky committed on

Attempt to fix torchvision::nms error by adding cudnn and libcublas to conda dependencies
d708e6e

elungky committed on

Add 'omegaconf' package to cosmos-predict1.yaml dependencies
488121d

elungky committed on

Add 'omegaconf' package to cosmos-predict1.yaml dependencies
61d27a8

elungky committed on

Add 'omegaconf' package to cosmos-predict1.yaml dependencies
33eb31d

elungky committed on

Add 'omegaconf' package to cosmos-predict1.yaml dependencies
579d922

elungky committed on

Add 'omegaconf' package to cosmos-predict1.yaml dependencies
7104534

elungky committed on

Add 'einops' package to cosmos-predict1.yaml dependencies
3cc0f0e

elungky committed on

Fix PyTorch verification command in Dockerfile using heredoc for robust multi-line Python
ff1b85e

elungky committed on

Fix PyTorch verification command in Dockerfile using heredoc for robust multi-line Python
10e72c6

elungky committed on

Further refine PyTorch verification command in Dockerfile for robust syntax parsing
b7ff06a

elungky committed on

Install MoGe from Git repository as specified in INSTALL.md
701e903

elungky committed on

Add 'moge' package to cosmos-predict1.yaml dependencies
660ae5e

elungky committed on

Fix PyTorch verification command in Dockerfile to avoid f-string syntax error
e5064c2

elungky committed on

Temporarily disable 'set -u' around conda activate to resolve MKL unbound variable error
8c586aa

elungky committed on

Fix pip FileNotFoundError by using absolute path in cosmos-predict1.yaml
0b8e777

elungky committed on

Resolve Conda environment conflicts by simplifying CUDA dependencies and using stable PyTorch
8bdfbb7

elungky committed on

Fix 'source: not found' error by using '.' in Dockerfile and start.sh
59d6df8

elungky committed on

Explicitly accept conda TOS in Dockerfile
ca59c13

elungky committed on

Configure Dockerfile with provided cosmos-predict1.yaml and install pip deps
5fa8a70

elungky committed on

Add directory listings for conda diagnosis in start.sh
a1c20fc

elungky committed on

Attempt to fix conda path by adding /opt/conda/bin to PATH
4ad0a28

elungky committed on

Robustly locate and activate conda environment in start.sh
5114a95

elungky committed on

Activate conda environment 'cosmos-predict1' in start.sh
4a9d38b

elungky committed on

Fix ModuleNotFoundError: Add /app/gui/api to PYTHONPATH
86ed0ec

elungky committed on

Ensured gui/requirements.txt is explicitly added and committed
1e34ce9

elungky committed on