Project

This guide provides instructions for setting up the environment, training the model, and running inference.

Quick Start

Follow these steps to set up the required environment.

Create and activate a new Conda environment:

conda create -n creatidesign python=3.10 -y
conda activate creatidesign

Install PyTorch 2.4.0 with CUDA 12.1 support:

conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
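After installation, a quick check run inside the activated environment can confirm that the pinned versions were actually installed and that PyTorch can see a GPU. The following is a minimal sketch. It uses the standard library's `importlib.metadata` and an optional `torch` import. The script name (for example `check_env.py`) and the helpers `installed_version` and `version_mismatches` are hypothetical and not part of the repository.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions pinned by the conda install command above.
EXPECTED: Dict[str, str] = {"torch": "2.4.0", "torchvision": "0.19.0", "torchaudio": "2.4.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected: Dict[str, str] = EXPECTED) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from `expected` to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Drop a local build suffix such as "+cu121" before comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print("version mismatches:", version_mismatches() or "none")
    try:
        import torch  # optional: only used to report CUDA status
        print("CUDA available:", torch.cuda.is_available(), "| CUDA build:", torch.version.cuda)
    except ImportError:
        print("torch is not importable in this environment")
```

An empty mismatch report combined with `CUDA available: True` indicates the environment matches the pinned setup.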

To start training the model, run the following command:

bash train/train_coco.sh

To run inference using a trained model, execute the test script:

python test_coco.py
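The training and inference steps can also be chained so that inference only starts if training succeeds. The following is a minimal sketch. It assumes both commands are launched from the repository root and that `test_coco.py` locates the trained checkpoint on its own. `PIPELINE` and `run_pipeline` are hypothetical names and not part of the repository.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical helper (not part of the repository). Assumes the current working
# directory is the repository root, so the relative paths below resolve.
PIPELINE: List[List[str]] = [
    ["bash", "train/train_coco.sh"],   # training step
    [sys.executable, "test_coco.py"],  # inference, using the active environment's Python
]


def run_pipeline(commands: Sequence[Sequence[str]] = PIPELINE, dry_run: bool = False) -> List[List[str]]:
    """Run each command in order and stop at the first failure.

    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit code, so later commands never start after a failure. With
    dry_run=True the commands are only printed. Returns the commands handled.
    """
    handled = []
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(list(cmd), check=True)
        handled.append(list(cmd))
    return handled
```

Calling `run_pipeline(dry_run=True)` prints the two commands without running them. `run_pipeline()` executes them in order.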

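Model Configuration: The main model configuration is defined in `train_coco.py`, where it can be edited.
RMA (Region Mask Attention) Settings: RMA can be fully enabled, partially enabled, or disabled, depending on how much GPU memory is available. The table below lists the corresponding settings in `train_coco.py`.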

| Configuration | Settings in `train_coco.py` | Requirements and performance |
|---|---|---|
| With RMA (full) | `mask_cross_attention_double_layers: 1`, `mask_cross_attention_single_layers: 1` | Slower training. Requires more than 96 GB of GPU memory. |
| With RMA (partial) | `mask_cross_attention_double_layers: 0`, `mask_cross_attention_single_layers: 1` | Requires more than 64 GB of GPU memory (e.g., an ~80 GB GPU). |
| Without RMA | `mask_cross_attention_double_layers: 0`, `mask_cross_attention_single_layers: 0` | Faster training. Requires less than 64 GB of GPU memory. |
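Because the choice of RMA variant depends only on available GPU memory, it can be expressed as a small lookup. The following is a minimal sketch. It assumes the thresholds in the table refer to the memory of a single GPU, measured here in GiB, and that both keys take plain integer values in `train_coco.py`. `rma_settings` and `detect_gpu_memory_gb` are hypothetical helpers and not part of the repository.

```python
from typing import Dict, Optional


def rma_settings(gpu_memory_gb: float) -> Dict[str, int]:
    """Pick the largest RMA variant from the table that fits in `gpu_memory_gb`.

    Hypothetical helper; thresholds copied from the table
    (> 96 GB: full RMA, > 64 GB: partial RMA, otherwise RMA disabled).
    """
    if gpu_memory_gb > 96:  # With RMA (full)
        double, single = 1, 1
    elif gpu_memory_gb > 64:  # With RMA (partial)
        double, single = 0, 1
    else:  # Without RMA
        double, single = 0, 0
    return {
        "mask_cross_attention_double_layers": double,
        "mask_cross_attention_single_layers": single,
    }


def detect_gpu_memory_gb(device: int = 0) -> Optional[float]:
    """Total memory of one CUDA device in GiB, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


if __name__ == "__main__":
    memory = detect_gpu_memory_gb()
    print("detected GPU memory (GiB):", memory)
    # Falls back to the no-RMA settings when no GPU is detected.
    print(rma_settings(memory or 0.0))
```

The sketch only prints the suggested values. They still have to be copied into `train_coco.py` by hand, because how that file stores its configuration is not described here.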