---
license: cc-by-nc-4.0
---

This repository contains the model weights and configuration files for the [**Sphere Encoder**](https://github.com) project.

> [!Note]
> These model weights have been **reproduced** with the released code and yield slightly different evaluation results compared to those reported in the original paper.

# Model Card

| dataset | 🤗 hf model repo | params |
|:--:|:--:|:--:|
| Animal-Faces | [`sphere-l-af`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-l-af) | 642M |
| Oxford-Flowers | [`sphere-l-of`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-l-of) | 948M |
| ImageNet | [`sphere-l-imagenet`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-l-imagenet) | 950M |
| ImageNet | [`sphere-xl-imagenet`](https://huggingface.co/kaiyuyue/sphere-encoder-models/tree/main/sphere-xl-imagenet) | 1.3B |

Download model checkpoints and put them in `./workspace/experiments`.
The directory tree should look like this:

```bash
./workspace/experiments/
├── sphere-l-af
│   ├── ckpt/ep0999.pth
│   └── config.json
├── sphere-l-of
├── sphere-l-imagenet
└── sphere-xl-imagenet
```
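
After downloading, a quick sanity check can catch a misplaced folder before a long evaluation run. The sketch below is a hypothetical helper, not part of the released code. It assumes every model folder mirrors the `sphere-l-af` layout shown above: a `config.json` plus at least one `*.pth` file under `ckpt/`.

```bash
# Sketch: verify that each model folder contains the expected files.
# Assumption: all models follow the sphere-l-af layout
# (config.json and at least one *.pth checkpoint under ckpt/).
# check_layout is a hypothetical helper name.

check_layout() {
  local root="$1"
  shift
  local status=0
  local name
  for name in "$@"; do
    if [ ! -f "$root/$name/config.json" ]; then
      echo "missing: $root/$name/config.json"
      status=1
    fi
    # ls fails when the glob matches no checkpoint file
    if ! ls "$root/$name"/ckpt/*.pth > /dev/null 2>&1; then
      echo "missing: $root/$name/ckpt/*.pth"
      status=1
    fi
  done
  return "$status"
}

# Usage (uncomment to run against your local copy):
# check_layout ./workspace/experiments \
#   sphere-l-af sphere-l-of sphere-l-imagenet sphere-xl-imagenet
```

The function returns a non-zero status and prints one line per missing file, so it can gate an evaluation script. One way to fetch a single model folder, assuming the `huggingface_hub` CLI is installed, is `huggingface-cli download kaiyuyue/sphere-encoder-models --include "sphere-l-af/*" --local-dir ./workspace/experiments`.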

<br>

# Evaluation Results 

Evaluate **ImageNet** models with `CFG = 1.4`:

```bash
# --job_dir can be
#   sphere-l-imagenet, or sphere-xl-imagenet

./run.sh eval.py \
  --job_dir sphere-xl-imagenet \
  --forward_steps 1 4 \
  --report_fid rfid gfid \
  --use_cfg True \
  --cfg_min 1.4 \
  --cfg_max 1.4 \
  --cfg_position combo \
  --rm_folder_after_eval True
```

With the command above, the evaluation results are saved in `./workspace/experiments/sphere-xl-imagenet/eval/`. Reproduced results:

| dataset | model | steps | rFID &darr; | gFID &darr; | IS &uarr; |
|:--:|:--|:--:|:--:|:--:|:--:|
| ImageNet 256x256 | Sphere-L | 1 | 0.62 | 15.69 | 274.5 |
| | Sphere-L | 4 | - | 4.78 | 259.1 |
| | Sphere-XL | 1 | 0.62 | 14.52 | 299.3 |
| | Sphere-XL | 4 | - | 4.05 | 266.0 |

Evaluate the unconditional **Animal-Faces** model:

```bash
./run.sh eval.py \
  --job_dir sphere-l-af \
  --forward_steps 1 4 \
  --report_fid gfid \
  --rm_folder_after_eval True
```

| dataset | model | steps | rFID &darr; | gFID &darr; | IS &uarr; |
|:--:|:--|:--:|:--:|:--:|:--:|
| Animal-Faces 256x256 | Sphere-L | 1 | - | 21.56 | 8.3 |
| | Sphere-L | 4 | - | 18.73 | 9.8 |

Evaluate the **Oxford-Flowers** model with `CFG = 1.6`:

```bash
./run.sh eval.py \
  --job_dir sphere-l-of \
  --forward_steps 1 4 \
  --report_fid gfid \
  --use_cfg True \
  --cfg_min 1.6 \
  --cfg_max 1.6 \
  --cfg_position combo \
  --num_eval_samples 51000 \
  --rm_folder_after_eval True \
  --cache_sampling_noise False
```

`--num_eval_samples 51000` is chosen for the 102 classes so that each class gets 500 evaluation samples (102 × 500 = 51000) when running on 8 GPUs.
Adjust it if you use a different number of GPUs or want to evaluate on a different number of samples.
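
The arithmetic behind the sample count can be scripted when changing the setup. This is a minimal sketch with hypothetical helper names. It assumes the total should split evenly across GPUs, which the 8-GPU example satisfies but which the evaluation code may not strictly require.

```bash
# Sketch: derive --num_eval_samples from the class count and samples per class,
# then check how the total splits across GPUs.
# Assumption: an even split across GPUs is desired; helper names are hypothetical.

total_eval_samples() {
  # $1 = number of classes, $2 = samples per class
  echo $(( $1 * $2 ))
}

samples_per_gpu() {
  # $1 = total samples, $2 = number of GPUs; fails if the split is uneven
  if [ $(( $1 % $2 )) -ne 0 ]; then
    echo "error: $1 samples do not divide evenly across $2 GPUs" >&2
    return 1
  fi
  echo $(( $1 / $2 ))
}

total_eval_samples 102 500   # prints 51000
samples_per_gpu 51000 8      # prints 6375
```

For a different GPU count, pick a samples-per-class value for which `samples_per_gpu` succeeds, then pass the resulting total to `--num_eval_samples`.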

| dataset | model | steps | rFID &darr; | gFID &darr; | IS &uarr; |
|:--:|:--|:--:|:--:|:--:|:--:|
| Oxford-Flowers 256x256 | Sphere-L | 1 | - | 25.10 | 3.4 |
| | Sphere-L | 4 | - | 11.27 | 3.2 |