This repository contains the model weights and configuration files for the Sphere Encoder project.
These model weights have been reproduced with the released code and yield slightly different evaluation results compared to those reported in the original paper.
Model Card
| dataset | 🤗 hf model repo | params |
|---|---|---|
| Animal-Faces | sphere-l-af | 642M |
| Oxford-Flowers | sphere-l-of | 948M |
| ImageNet | sphere-l-imagenet | 950M |
| ImageNet | sphere-xl-imagenet | 1.3B |
Download model checkpoints and put them in ./workspace/experiments.
The directory tree should look like this:
./workspace/experiments/
├── sphere-l-af
│   ├── ckpt/ep0999.pth
│   └── config.json
├── sphere-l-of
├── sphere-l-imagenet
└── sphere-xl-imagenet
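Before running evaluation, it can help to confirm that the downloaded files match the tree above. The following is a minimal sketch, not part of the released code: the helper name `missing_files` is hypothetical, and it assumes every job directory follows the layout shown for `sphere-l-af` (a `config.json` plus at least one `.pth` file under `ckpt/`).

```python
from pathlib import Path

# Job directory names taken from the model table above.
EXPECTED_JOBS = [
    "sphere-l-af",
    "sphere-l-of",
    "sphere-l-imagenet",
    "sphere-xl-imagenet",
]


def missing_files(root, jobs=EXPECTED_JOBS):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    for job in jobs:
        job_dir = root / job
        if not (job_dir / "config.json").is_file():
            problems.append(f"{job}: config.json not found")
        # Assumption: checkpoints are .pth files directly under ckpt/.
        ckpt_dir = job_dir / "ckpt"
        if not (ckpt_dir.is_dir() and any(ckpt_dir.glob("*.pth"))):
            problems.append(f"{job}: no .pth checkpoint under ckpt/")
    return problems


if __name__ == "__main__":
    report = missing_files("./workspace/experiments")
    for line in report or ["layout looks complete"]:
        print(line)
```

An empty list means every expected job directory has both pieces. Pass a shorter `jobs` list if only some of the checkpoints were downloaded.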
Evaluation Results
Evaluate ImageNet models with CFG = 1.4:
# --job_dir can be
# sphere-l-imagenet, or sphere-xl-imagenet
./run.sh eval.py \
--job_dir sphere-xl-imagenet \
--forward_steps 1 4 \
--report_fid rfid gfid \
--use_cfg True \
--cfg_min 1.4 \
--cfg_max 1.4 \
--cfg_position combo \
--rm_folder_after_eval True
The evaluation results will be saved in ./workspace/experiments/sphere-xl-imagenet/eval/:
| dataset | model | steps | rFID ↓ | gFID ↓ | IS ↑ |
|---|---|---|---|---|---|
| ImageNet 256x256 | Sphere-L | 1 | 0.62 | 15.69 | 274.5 |
| ImageNet 256x256 | Sphere-L | 4 | - | 4.78 | 259.1 |
| ImageNet 256x256 | Sphere-XL | 1 | 0.62 | 14.52 | 299.3 |
| ImageNet 256x256 | Sphere-XL | 4 | - | 4.05 | 266.0 |
Evaluate unconditional Animal-Faces model:
./run.sh eval.py \
--job_dir sphere-l-af \
--forward_steps 1 4 \
--report_fid gfid \
--rm_folder_after_eval True
| dataset | model | steps | rFID ↓ | gFID ↓ | IS ↑ |
|---|---|---|---|---|---|
| Animal-Faces 256x256 | Sphere-L | 1 | - | 21.56 | 8.3 |
| Animal-Faces 256x256 | Sphere-L | 4 | - | 18.73 | 9.8 |
Evaluate Oxford-Flowers model with CFG = 1.6:
./run.sh eval.py \
--job_dir sphere-l-of \
--forward_steps 1 4 \
--report_fid gfid \
--use_cfg True \
--cfg_min 1.6 \
--cfg_max 1.6 \
--cfg_position combo \
--num_eval_samples 51000 \
--rm_folder_after_eval True \
--cache_sampling_noise False
--num_eval_samples 51000 is set so that each of the 102 classes has 500 samples (102 × 500 = 51,000) when evaluating on 8 GPUs.
Adjust it accordingly if you use a different number of GPUs or want to evaluate on a different number of samples.
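The arithmetic behind that sample count can be sketched as follows. This is illustrative only: `eval_sample_budget` is a hypothetical helper, not a function in the released code, and whether `eval.py` needs the total to split evenly across GPUs is an assumption to check against your setup.

```python
def eval_sample_budget(num_classes, samples_per_class, num_gpus):
    """Return (total samples, samples per GPU, leftover samples)."""
    total = num_classes * samples_per_class
    per_gpu, remainder = divmod(total, num_gpus)
    return total, per_gpu, remainder


# Oxford-Flowers setting from the command above: 102 classes, 500 each, 8 GPUs.
total, per_gpu, remainder = eval_sample_budget(102, 500, 8)
print(total, per_gpu, remainder)  # 51000 6375 0
```

A non-zero remainder means the total does not divide evenly across the GPUs, in which case changing the per-class count is one way to restore an even split.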
| dataset | model | steps | rFID ↓ | gFID ↓ | IS ↑ |
|---|---|---|---|---|---|
| Oxford-Flowers 256x256 | Sphere-L | 1 | - | 25.10 | 3.4 |
| Oxford-Flowers 256x256 | Sphere-L | 4 | - | 11.27 | 3.2 |