---
license: cc-by-nc-4.0
---
This repository contains the model weights and configuration files for the [**Sphere Encoder**](https://github.com) project.
> [!Note]
> These model weights have been **reproduced** with the released code and yield slightly different evaluation results compared to those reported in the original paper.
# Model Card
| dataset | 🤗 hf model repo | params |
|:--:|:--:|:--:|
| Animal-Faces | [`sphere-l-af`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-l-af) | 642M |
| Oxford-Flowers | [`sphere-l-of`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-l-of) | 948M |
| ImageNet | [`sphere-l-imagenet`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-l-imagenet) | 950M |
| ImageNet | [`sphere-xl-imagenet`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-xl-imagenet) | 1.3B |
Download model checkpoints and put them in `./workspace/experiments`.
The directory tree should look like this:
```bash
./workspace/experiments/
├── sphere-l-af
│   ├── ckpt/ep0999.pth
│   └── config.json
├── sphere-l-of
├── sphere-l-imagenet
└── sphere-xl-imagenet
```
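
Before running an evaluation, it can help to confirm that the downloaded files match the tree above. The following is a minimal sketch, not part of the released code: `check_layout` is a hypothetical helper, and it assumes every model folder follows the layout shown for `sphere-l-af` (a `config.json` plus at least one `.pth` file under `ckpt/`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): verify that each job dir under ROOT
# contains config.json and at least one checkpoint in ckpt/.
# Assumption: all models share the layout shown for sphere-l-af.
check_layout() {
  local root="$1"; shift
  local status=0 job
  for job in "$@"; do
    if [ ! -f "$root/$job/config.json" ]; then
      echo "missing: $root/$job/config.json" >&2
      status=1
    fi
    # The glob stays literal when nothing matches, so ls fails in that case.
    if ! ls "$root/$job"/ckpt/*.pth >/dev/null 2>&1; then
      echo "missing: $root/$job/ckpt/*.pth" >&2
      status=1
    fi
  done
  return "$status"
}

check_layout ./workspace/experiments \
  sphere-l-af sphere-l-of sphere-l-imagenet sphere-xl-imagenet \
  || echo "layout incomplete; see messages above"
```

Pass only the job dirs you actually downloaded if you do not need all four models.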
<br>
# Evaluation Results
Evaluate **ImageNet** models with `CFG = 1.4`:
```bash
# --job_dir can be
# sphere-l-imagenet, or sphere-xl-imagenet
./run.sh eval.py \
--job_dir sphere-xl-imagenet \
--forward_steps 1 4 \
--report_fid rfid gfid \
--use_cfg True \
--cfg_min 1.4 \
--cfg_max 1.4 \
--cfg_position combo \
--rm_folder_after_eval True
```
The evaluation results will be saved in `./workspace/experiments/sphere-xl-imagenet/eval/`:
| dataset | model | steps | rFID ↓ | gFID ↓ | IS ↑ |
|:--:|:--|:--:|:--:|:--:|:--:|
| ImageNet 256x256 | Sphere-L | 1 | 0.62 | 15.69 | 274.5 |
|| Sphere-L | 4 | - | 4.78 | 259.1 |
|| Sphere-XL | 1 | 0.62 | 14.52 | 299.3 |
|| Sphere-XL | 4 | - | 4.05 | 266.0 |
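
Because `--job_dir` accepts either ImageNet model, both can be evaluated back to back with the same flags. The sketch below is a dry run under stated assumptions: `imagenet_eval_cmd` is a hypothetical helper that only prints the command shown earlier for a given job dir, so nothing is executed until you choose to run the printed line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): print the ImageNet eval command
# for one job dir. Flags are copied from the command documented above.
imagenet_eval_cmd() {
  local job_dir="$1"
  echo "./run.sh eval.py" \
    "--job_dir $job_dir" \
    "--forward_steps 1 4" \
    "--report_fid rfid gfid" \
    "--use_cfg True" \
    "--cfg_min 1.4 --cfg_max 1.4" \
    "--cfg_position combo" \
    "--rm_folder_after_eval True"
}

# Dry run: show the command for each ImageNet model.
for job in sphere-l-imagenet sphere-xl-imagenet; do
  imagenet_eval_cmd "$job"
done
```

Once the printed commands look right, they can be run from the repository root, for example with `eval "$(imagenet_eval_cmd "$job")"` inside the loop.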
Evaluate unconditional **Animal-Faces** model:
```bash
./run.sh eval.py \
--job_dir sphere-l-af \
--forward_steps 1 4 \
--report_fid gfid \
--rm_folder_after_eval True
```
| dataset | model | steps | rFID ↓ | gFID ↓ | IS ↑ |
|:--:|:--|:--:|:--:|:--:|:--:|
| Animal-Faces 256x256 | Sphere-L | 1 | - | 21.56 | 8.3 |
|| Sphere-L | 4 | - | 18.73 | 9.8 |
Evaluate **Oxford-Flowers** model with `CFG = 1.6`:
```bash
./run.sh eval.py \
--job_dir sphere-l-of \
--forward_steps 1 4 \
--report_fid gfid \
--use_cfg True \
--cfg_min 1.6 \
--cfg_max 1.6 \
--cfg_position combo \
--num_eval_samples 51000 \
--rm_folder_after_eval True \
--cache_sampling_noise False
```
`--num_eval_samples 51000` gives each of the 102 Oxford-Flowers classes 500 samples (102 × 500 = 51,000) when evaluating on 8 GPUs.
Adjust the value if you use a different number of GPUs or want a different number of samples per class.
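
The arithmetic behind that value can be sketched as follows. The helper names `eval_samples` and `per_gpu_samples` are hypothetical, and the even-split check is an assumption about what "adjust accordingly" means for the GPU count; the text above does not state that `eval.py` requires it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the repo) for picking --num_eval_samples.
num_classes=102        # Oxford-Flowers classes (from the text above)
samples_per_class=500  # from the text above
num_gpus=8             # from the text above

# Total samples = classes * samples per class.
eval_samples() {
  echo $(( $1 * $2 ))
}

# Per-GPU share; fails when the total does not split evenly.
# Assumption: an even split across GPUs is desirable.
per_gpu_samples() {
  local total="$1" gpus="$2"
  if [ $(( total % gpus )) -ne 0 ]; then
    echo "error: $total samples do not split evenly across $gpus GPUs" >&2
    return 1
  fi
  echo $(( total / gpus ))
}

total="$(eval_samples "$num_classes" "$samples_per_class")"
echo "--num_eval_samples $total"
echo "samples per GPU: $(per_gpu_samples "$total" "$num_gpus")"
```

With the values above this prints `--num_eval_samples 51000` and `samples per GPU: 6375`.
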
| dataset | model | steps | rFID ↓ | gFID ↓ | IS ↑ |
|:--:|:--|:--:|:--:|:--:|:--:|
| Oxford-Flowers 256x256 | Sphere-L | 1 | - | 25.10 | 3.4 |
|| Sphere-L | 4 | - | 11.27 | 3.2 |