<p align="center">
  <img src="../assets/sapiens_lite_logo.png" alt="Sapiens-Lite" title="Sapiens-Lite" width="500"/>
</p>

## ⚡ Introduction

Sapiens-Lite is our optimized "inference-only" solution, offering:
- Up to 4x faster inference
- Minimal dependencies
- Negligible accuracy loss
## 🚀 Getting Started

- Set the `sapiens_lite` code root:
```bash
export SAPIENS_LITE_ROOT=$SAPIENS_ROOT/lite
```
- We support lite inference on multiple GPU architectures, primarily in two modes (see the loading sketch after the directory listing below).
  - `MODE=torchscript`: All GPUs, with PyTorch 2.2+. Inference at `float32`; slower, but closest to the original model performance.
  - `MODE=bfloat16`: Optimized mode for A100 GPUs with PyTorch 2.3. Uses [FlashAttention](https://github.com/Dao-AILab/flash-attention) for accelerated inference. Coming Soon!
- Note for Windows users: please use the Python scripts in `./demo` instead of `./scripts`.
- Please download the checkpoints from [Hugging Face](https://huggingface.co/facebook/sapiens).\
  Checkpoints are suffixed with `_$MODE.pt2`.\
  You can download only the checkpoints of interest.\
  Set `$SAPIENS_LITE_CHECKPOINT_ROOT` to the path of `sapiens_lite_host/$MODE`. Checkpoint directory structure:
```plaintext
sapiens_lite_host/
├── torchscript
│   ├── pretrain/
│   │   └── checkpoints/
│   │       ├── sapiens_0.3b/
│   │       ├── sapiens_0.6b/
│   │       ├── sapiens_1b/
│   │       └── sapiens_2b/
│   ├── pose/
│   ├── seg/
│   ├── depth/
│   └── normal/
└── bfloat16
    ├── pretrain/
    ├── pose/
    ├── seg/
    ├── depth/
    └── normal/
```
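Once downloaded, a torchscript-mode checkpoint can be loaded directly. Below is a minimal sketch, assuming the `.pt2` file is a plain TorchScript module; the checkpoint path and the 1024x768 input resolution are placeholders, so substitute the file and size for your task:

```python
import os

import torch

# Placeholder path -- substitute any "*_torchscript.pt2" file you downloaded.
ckpt_path = os.path.join(
    os.environ["SAPIENS_LITE_CHECKPOINT_ROOT"],
    "pose/checkpoints/sapiens_1b/sapiens_1b_torchscript.pt2",
)

model = torch.jit.load(ckpt_path).eval().to("cuda")

# Dummy batch at an assumed 1024x768 input resolution, float32 as in
# MODE=torchscript; check the task README for your model's expected size.
x = torch.randn(1, 3, 1024, 768, device="cuda")
with torch.inference_mode():
    y = model(x)
print(y.shape)
```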
## 🔧 Installation

Set up the minimal `sapiens_lite` conda environment (PyTorch >= 2.2):
```bash
conda create -n sapiens_lite python=3.10
conda activate sapiens_lite
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install opencv-python tqdm json-tricks
```
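A quick sanity check that the environment satisfies the PyTorch >= 2.2 / CUDA requirement:

```python
import torch

print(torch.__version__)              # expect >= 2.2
print(torch.cuda.is_available())      # expect True on a GPU host
print(torch.cuda.get_device_name(0))  # e.g. an A100 for bfloat16 mode
```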
## 🌟 Sapiens-Lite Inference

Note: for inference in `bfloat16` mode:
- Outputs may vary slightly from the original `float32` predictions.
- The first run autotunes the model and prints a log; subsequent runs automatically load the tuned model.
- Due to `torch.compile` warmup iterations, you'll observe better speedups with a larger number of images, as the warmup cost is amortized.
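For illustration, bfloat16-mode inference might look like the sketch below. It assumes the checkpoint is a `torch.export` program saved as `.pt2` and uses placeholder names; treat it as a sketch of the autotune/warmup behavior, not the shipped demo code:

```python
import torch

# Placeholder checkpoint name; assumes a torch.export program saved as .pt2.
model = torch.export.load("sapiens_1b_bfloat16.pt2").module()
model = torch.compile(model, mode="max-autotune")  # first run autotunes

x = torch.randn(8, 3, 1024, 768, device="cuda", dtype=torch.bfloat16)
with torch.inference_mode():
    _ = model(x)  # warmup: compile + autotune, slow
    y = model(x)  # subsequent calls amortize the warmup cost
```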
Available tasks:
- ### [Image Encoder](docs/PRETRAIN_README.md)
- ### [Pose Estimation](docs/POSE_README.md)
- ### [Body Part Segmentation](docs/SEG_README.md)
- ### [Depth Estimation](docs/DEPTH_README.md)
- ### [Surface Normal Estimation](docs/NORMAL_README.md)
## ⚙️ Converting Models to Lite

Obtain a `torch.ExportedProgram` or TorchScript module from an existing sapiens model checkpoint. Note that this requires the full-install `sapiens` conda environment.
```bash
cd $SAPIENS_ROOT/scripts/[pretrain,pose,seg]/optimize/local
./[feature_extracter,keypoints*,seg,depth,normal]_optimizer.sh
```
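For torchscript mode, the conversion essentially traces the float32 model and saves it. A hand-rolled sketch of that step, where the helper name and input shape are illustrative rather than taken from the actual optimizer scripts:

```python
import torch

def to_torchscript(model: torch.nn.Module, out_path: str,
                   input_shape=(1, 3, 1024, 768)) -> None:
    """Trace a float32 model and save it as a TorchScript .pt2 file.

    A sketch of what the *_optimizer.sh scripts do in torchscript mode;
    the real scripts live under $SAPIENS_ROOT/scripts/*/optimize/local.
    """
    model = model.eval().float()
    example = torch.randn(*input_shape)
    with torch.inference_mode():
        traced = torch.jit.trace(model, example)
    traced.save(out_path)

# Usage (hypothetical): to_torchscript(pose_model, "sapiens_1b_torchscript.pt2")
```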
For inference:
- Use `demo.AdhocImageDataset` wrapped with a `DataLoader` for image fetching and preprocessing.
- Use the `WorkerPool` class for multiprocessing tasks like saving predictions and visualizations, as in the sketch below.
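Putting the two helpers together, an inference loop could look like the following sketch. Both classes ship with the demo code, but the constructor arguments and method names used here are assumptions for illustration, not the documented API:

```python
import torch
from torch.utils.data import DataLoader

# Helpers from the demo code; the arguments below are assumptions.
from demo import AdhocImageDataset, WorkerPool

def run_inference(model: torch.nn.Module, image_paths: list) -> None:
    # Batched image fetching + preprocessing on CPU workers.
    loader = DataLoader(AdhocImageDataset(image_paths),  # args assumed
                        batch_size=8, num_workers=4)

    # Offload slow I/O (saving predictions/visualizations) to a process pool.
    def save_fn(args):
        out_path, pred = args
        torch.save(pred, out_path)

    pool = WorkerPool(save_fn, processes=4)  # signature assumed

    with torch.inference_mode():
        for i, batch in enumerate(loader):
            preds = model(batch.to("cuda"))
            pool.run((f"pred_{i:05d}.pt", preds.cpu()))  # method name assumed
```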