Sapiens-Lite

⚑ Introduction

Sapiens-Lite is our optimized "inference-only" solution, offering:

  • Up to 4x faster inference
  • Minimal dependencies
  • Negligible accuracy loss

🚀 Getting Started

  • Set the sapiens_lite code root.

    export SAPIENS_LITE_ROOT=$SAPIENS_ROOT/lite
    
  • We support lite-inference for multiple GPU architectures, primarily in two modes.

    • MODE=torchscript: All GPUs with PyTorch 2.2+. Runs inference in float32; slower, but closest to the original model's performance.
    • MODE=bfloat16: Optimized mode for A100 GPUs with PyTorch 2.3. Uses FlashAttention for accelerated inference. Coming soon!
  • Note to Windows users: Please use the python scripts in ./demo instead of ./scripts.

  • Please download the checkpoints from Hugging Face.
    Checkpoints are suffixed with "_$MODE.pt2".
    You can download only the checkpoints you need.
    Set $SAPIENS_LITE_CHECKPOINT_ROOT to the path of sapiens_lite_host/$MODE. Checkpoint directory structure:

    sapiens_lite_host/
    ├── torchscript
    │   ├── pretrain/
    │   │   └── checkpoints/
    │   │       ├── sapiens_0.3b/
    │   │       ├── sapiens_0.6b/
    │   │       ├── sapiens_1b/
    │   │       └── sapiens_2b/
    │   ├── pose/
    │   ├── seg/
    │   ├── depth/
    │   └── normal/
    └── bfloat16
        ├── pretrain/
        ├── pose/
        ├── seg/
        ├── depth/
        └── normal/
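As a rough illustration of the layout above, the following Python sketch collects the checkpoint files for one task. The helper name list_checkpoints is hypothetical and not part of the sapiens_lite code; it assumes only the "_$MODE.pt2" suffix and the directory structure shown above, with the root pointing at sapiens_lite_host/$MODE.

```python
import os
from pathlib import Path


def list_checkpoints(checkpoint_root, task, mode="torchscript"):
    """Return sorted paths under <checkpoint_root>/<task> that end in _<mode>.pt2.

    Hypothetical helper: `task` is one of the directory names in the tree
    (pretrain, pose, seg, depth, normal).
    """
    suffix = f"_{mode}.pt2"
    task_dir = Path(checkpoint_root) / task
    if not task_dir.is_dir():
        return []
    # rglob descends into nested folders such as pretrain/checkpoints/sapiens_1b/.
    return sorted(
        p for p in task_dir.rglob("*") if p.is_file() and p.name.endswith(suffix)
    )


if __name__ == "__main__":
    root = os.environ.get("SAPIENS_LITE_CHECKPOINT_ROOT")
    if root is None:
        print("SAPIENS_LITE_CHECKPOINT_ROOT is not set")
    else:
        for path in list_checkpoints(root, "pretrain"):
            print(path)
```

An empty result usually means the root points at sapiens_lite_host/ rather than sapiens_lite_host/$MODE, or that the checkpoints for that task have not been downloaded yet.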
    

🔧 Installation

Set up the minimal sapiens_lite conda environment (pytorch >= 2.2):

conda create -n sapiens_lite python=3.10
conda activate sapiens_lite
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install opencv-python tqdm json-tricks
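After installing, it can be worth confirming that the environment satisfies the pytorch >= 2.2 requirement. The sketch below is one possible check, not part of the repository; the function names are made up for illustration, and the only PyTorch attributes it relies on are torch.__version__ and torch.cuda.is_available().

```python
import re


def version_tuple(version):
    """Extract (major, minor) from a version string such as '2.2.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(2, 2)):
    """True if `version` is at least `minimum` (compared as integer tuples)."""
    return version_tuple(version) >= minimum


def check_environment():
    """Return a one-line status string describing the local PyTorch install."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    status = "OK" if meets_minimum(torch.__version__) else "too old (need >= 2.2)"
    return (
        f"PyTorch {torch.__version__}: {status}; "
        f"CUDA available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(check_environment())
```

Comparing integer tuples rather than raw strings avoids the pitfall where "2.10" sorts before "2.2" lexicographically.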

🌟 Sapiens-Lite Inference

Note: For inference in bfloat16 mode:

  • Outputs may differ slightly from the original float32 predictions.
  • The first model run will autotune the model and print the log. Subsequent runs automatically load the tuned model.
  • Because torch.compile needs warmup iterations, speedups improve with a larger number of images, where the warmup cost is amortized.
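The amortization effect can be made concrete with a little arithmetic. The sketch below models total time as a one-time warmup cost plus a constant per-image cost; the 60-second warmup and 0.05-second per-image figures are invented for illustration and are not measurements of Sapiens-Lite.

```python
def per_image_seconds(num_images, warmup_seconds, steady_seconds_per_image):
    """Average wall-clock time per image when a fixed warmup cost is included."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    total = warmup_seconds + num_images * steady_seconds_per_image
    return total / num_images


if __name__ == "__main__":
    # Hypothetical numbers: 60 s of warmup, 0.05 s per image afterwards.
    for n in (10, 1000, 100000):
        print(n, round(per_image_seconds(n, 60.0, 0.05), 4))
```

Under this simple model the average approaches the steady-state cost as the image count grows, which is why short runs can look slower than expected.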

Available tasks:

⚙️ Converting Models to Lite

Obtain a torch.ExportedProgram or TorchScript module from the existing sapiens model checkpoint. Note that this requires the full-install sapiens conda environment.

cd $SAPIENS_ROOT/scripts/[pretrain,pose,seg]/optimize/local
./[feature_extracter,keypoints*,seg,depth,normal]_optimizer.sh

For inference:

  • Use demo.AdhocImageDataset wrapped with a DataLoader for image fetching and preprocessing.
  • Utilize the WorkerPool class for multiprocessing capabilities in tasks like saving predictions and visualizations.
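The general shape of such a pipeline (batched loading, a forward pass, and saving handed off to background workers) can be sketched with the standard library alone. Everything below is a stand-in: batched plays the role of the DataLoader, fake_model replaces the real forward pass, and a ThreadPoolExecutor is swapped in for WorkerPool to keep the example short. None of these names reflect the actual demo.AdhocImageDataset or WorkerPool interfaces.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def batched(items, batch_size):
    """Yield consecutive slices of `items`; a stand-in for a DataLoader."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_model(batch):
    """Placeholder for the forward pass: one dummy result per input name."""
    return [{"name": name, "score": len(name)} for name in batch]


def save_prediction(prediction, out_dir):
    """Write one prediction to <out_dir>/<name>.json and return the path."""
    path = Path(out_dir) / f"{prediction['name']}.json"
    path.write_text(json.dumps(prediction))
    return path


def run_pipeline(names, out_dir, batch_size=2, workers=4):
    """Run the dummy model batch by batch, saving results in the background."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(names, batch_size):
            for prediction in fake_model(batch):
                # Saving is queued so the loop can move on to the next batch.
                futures.append(pool.submit(save_prediction, prediction, out_dir))
    # Leaving the `with` block waits for every queued save to finish.
    return sorted(f.result() for f in futures)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = run_pipeline(["img_a", "img_b", "img_c"], tmp)
        print([p.name for p in saved])
```

The point of the pattern is that writing predictions and visualizations to disk does not block the next batch; a process-based pool serves the same purpose when the post-processing is CPU-bound.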