Add cuda-toolkit and cuda-nvcc to Conda environment to provide nvcc for Transformer Engine compilation f52a4d0 elungky committed on Jul 23
Fix cudnn.h not found error by dynamically locating and symlinking it from pip installed package 8da02bd elungky committed on Jul 23
Fix cudnn.h not found error by copying to Conda env include path and setting CUDA_HOME a27c594 elungky committed on Jul 23
Fix 'conda: not found' by moving global ENV PATH for Conda earlier in Dockerfile e0e9267 elungky committed on Jul 23
Fix 'conda: not found' during Miniconda installation by updating PATH within the RUN command 193cb9f elungky committed on Jul 23
Merged Dockerfile with robust build environment for transformer-engine compilation 9679875 elungky committed on Jul 23
Final Dockerfile syntax correction: ensure chmod +x is a standalone RUN command 95d53e3 elungky committed on Jul 23
Final Dockerfile syntax correction: ensure chmod +x is a standalone RUN command fb926f5 elungky committed on Jul 23
Final fix for Dockerfile syntax: ensure chmod +x is a standalone RUN command 598f651 elungky committed on Jul 23
Fix Dockerfile syntax: separate chmod +x into its own RUN instruction 9530488 elungky committed on Jul 23
Attempt to fix Exit code 137 (OOM) by using --no-build-isolation for transformer-engine 54bda79 elungky committed on Jul 23
Fix cudnn.h not found during Transformer Engine build by adding symlinks as per INSTALL.md 62d1e04 elungky committed on Jul 22
Attempt to fix libcudnn.so.9 error by installing cudnn via conda and transformer_engine separately cf41009 elungky committed on Jul 22
Attempt to fix torchvision::nms error by installing PyTorch via pip with official CUDA index cbe7167 elungky committed on Jul 22
Attempt to fix torchvision::nms error by aligning pytorch-cuda to 12.4 9ec2085 elungky committed on Jul 22
Attempt to fix torchvision::nms error by adding cudnn and libcublas to conda dependencies d708e6e elungky committed on Jul 22
Fix PyTorch verification command in Dockerfile using heredoc for robust multi-line Python ff1b85e elungky committed on Jul 22
Fix PyTorch verification command in Dockerfile using heredoc for robust multi-line Python 10e72c6 elungky committed on Jul 22
Further refine PyTorch verification command in Dockerfile for robust syntax parsing b7ff06a elungky committed on Jul 22
Fix PyTorch verification command in Dockerfile to avoid f-string syntax error e5064c2 elungky committed on Jul 22
Temporarily disable 'set -u' around conda activate to resolve MKL unbound variable error 8c586aa elungky committed on Jul 22
Fix pip FileNotFoundError by using absolute path in cosmos-predict1.yaml 0b8e777 elungky committed on Jul 22
Resolve Conda environment conflicts by simplifying CUDA dependencies and using stable PyTorch 8bdfbb7 elungky committed on Jul 22
Fix 'source: not found' error by using '.' in Dockerfile and start.sh 59d6df8 elungky committed on Jul 22
Configure Dockerfile with provided cosmos-predict1.yaml and install pip deps 5fa8a70 elungky committed on Jul 22
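
Two of the shell fixes in this history (59d6df8 and 8c586aa) can be illustrated together. The following is a minimal sketch, not the repository's actual start.sh: the temporary hook file and the variable name DEMO_UNSET_VAR are hypothetical stand-ins for a conda activation script that reads an unset variable. It uses the POSIX `.` command, since `source` is a bash extension and is not available in shells such as dash, and it relaxes `set -u` only for the duration of the sourced script.

```shell
#!/bin/sh
# Sketch only: the hook file and DEMO_UNSET_VAR are hypothetical stand-ins
# for an activation script that reads a variable which may not be set.
set -eu

# Write a throwaway script that expands an unset variable when sourced.
hook="$(mktemp)"
printf '%s\n' 'HOOK_SEEN="${DEMO_UNSET_VAR}"' 'HOOK_RAN=yes' > "$hook"
unset DEMO_UNSET_VAR

# Relax nounset just around the sourced script, then restore it.
set +u
. "$hook"   # POSIX '.' rather than the bash-only 'source'
set -u

rm -f "$hook"
echo "hook ran: $HOOK_RAN"
```

Keeping `set -u` active everywhere else preserves the protection against typos in variable names in the rest of the script.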