---
license: mit
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- jit
- image-generation
- class-conditional
widget:
- output:
    url: demo.png
language:
- en
---

# JiT-diffusers

Native diffusers implementation of **JiT** (Just image Transformer). Each variant folder is self-contained:

- `pipeline.py` — `JiTPipeline`
- `scheduler/scheduler_config.json` — `FlowMatchHeunDiscreteScheduler` config (default `shift=4.0`)
- `transformer/jit_transformer_2d.py` — `JiTTransformer2DModel`
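
The `shift` value in the scheduler config reshapes the noise schedule. As a rough sketch, the flow-matching schedulers in diffusers apply a time shift of the form `shift * sigma / (1 + (shift - 1) * sigma)` to each sigma in [0, 1]. The helper name `shift_sigma` below is illustrative only and is not part of this repo.

```python
def shift_sigma(sigma: float, shift: float = 4.0) -> float:
    """Time-shift a flow-matching sigma in [0, 1] (illustrative helper)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


if __name__ == "__main__":
    # Print how a few evenly spaced sigmas move under the default shift.
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"{s:.2f} -> {shift_sigma(s):.4f}")
```

With `shift > 1` the endpoints stay fixed while intermediate sigmas move toward 1, so more of the sampling steps are spent at high noise levels.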

The pipeline supports dynamic inference resolution: pass `height` and `width` to `__call__`, and the model adapts to the requested size through positional interpolation.
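
To give a feel for what a non-native resolution means for the transformer, here is a small sketch. It assumes that the `/16` and `/32` suffixes denote the patch size (the usual ViT naming convention) and that positions are rescaled linearly onto the native grid, which is one common form of positional interpolation. How `JiTPipeline` actually interpolates is not described here, and both helper names are hypothetical.

```python
def token_grid(height: int, width: int, patch_size: int) -> tuple:
    """Return (rows, cols) of patch tokens for an image of the given size."""
    # Assumption: sizes must be multiples of the patch size.
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    return height // patch_size, width // patch_size


def interpolated_positions(new_len: int, native_len: int) -> list:
    """Spread new_len positions evenly over the native range [0, native_len - 1]."""
    if new_len == 1:
        return [0.0]
    step = (native_len - 1) / (new_len - 1)
    return [i * step for i in range(new_len)]


if __name__ == "__main__":
    native_rows, native_cols = token_grid(512, 512, patch_size=32)
    rows, cols = token_grid(768, 512, patch_size=32)
    print((native_rows, native_cols), (rows, cols))
    # Fractional row positions expressed in the native grid's coordinates.
    print(interpolated_positions(rows, native_rows)[:3])
```

Under these assumptions both the 256×256 `/16` and the 512×512 `/32` checkpoints see a 16×16 token grid at native resolution, and a larger request simply yields more tokens whose positions fall between the native ones.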

There is no separate `jit_diffusers` package. Everything runs on `diffusers` from PyPI plus the custom code bundled in each variant directory.

## Available checkpoints

| Checkpoint | Path | Resolution | Recommended CFG |
| --- | --- | --- | --- |
| JiT-B/16 | `./JiT-B-16` | 256×256 | 3.0 |
| JiT-L/16 | `./JiT-L-16` | 256×256 | 2.4 |
| JiT-H/16 | `./JiT-H-16` | 256×256 | 2.2 |
| JiT-B/32 | `./JiT-B-32` | 512×512 | 3.0 |
| JiT-L/32 | `./JiT-L-32` | 512×512 | 2.5 |
| JiT-H/32 | `./JiT-H-32` | 512×512 | 2.3 |
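
If you switch between checkpoints in a script, the table can be kept as a small lookup. This is only a convenience sketch: `Checkpoint`, `CHECKPOINTS`, and `recommended_settings` are hypothetical names, not something the repo ships.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    path: str
    resolution: int
    guidance_scale: float


# Values copied from the table above.
CHECKPOINTS = {
    "JiT-B/16": Checkpoint("./JiT-B-16", 256, 3.0),
    "JiT-L/16": Checkpoint("./JiT-L-16", 256, 2.4),
    "JiT-H/16": Checkpoint("./JiT-H-16", 256, 2.2),
    "JiT-B/32": Checkpoint("./JiT-B-32", 512, 3.0),
    "JiT-L/32": Checkpoint("./JiT-L-32", 512, 2.5),
    "JiT-H/32": Checkpoint("./JiT-H-32", 512, 2.3),
}


def recommended_settings(name: str) -> dict:
    """Return call kwargs (height, width, guidance_scale) for a checkpoint."""
    ckpt = CHECKPOINTS[name]
    return {
        "height": ckpt.resolution,
        "width": ckpt.resolution,
        "guidance_scale": ckpt.guidance_scale,
    }


if __name__ == "__main__":
    print(CHECKPOINTS["JiT-H/32"].path, recommended_settings("JiT-H/32"))
```

The returned dict can be unpacked into the pipeline call, for example `pipe(class_labels=..., **recommended_settings("JiT-H/32"))`.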

## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label` — inspect id → English label correspondence
- `pipe.labels` — reverse map (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)` — string labels resolved automatically
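
The sketch below shows one way such a reverse map and string lookup could work, assuming DiT-style entries where synonyms are comma-separated. It is not the pipeline's actual code; `build_label_index` and `resolve_label_ids` are hypothetical names, and `ID2LABEL` is a three-entry excerpt for illustration.

```python
# Tiny excerpt of an ImageNet id -> label map (synonyms are comma-separated).
ID2LABEL = {
    0: "tench, Tinca tinca",
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_index(id2label: dict) -> dict:
    """Build a sorted, case-insensitive synonym -> id map."""
    index = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            # Keep the first id seen if two classes share a synonym.
            index.setdefault(name.strip().lower(), int(class_id))
    return dict(sorted(index.items()))


def resolve_label_ids(labels, index: dict) -> list:
    """Accept ints, strings, or a list mixing both; return a list of class ids."""
    if isinstance(labels, (str, int)):
        labels = [labels]
    ids = []
    for label in labels:
        if isinstance(label, int):
            ids.append(label)
            continue
        key = label.strip().lower()
        if key not in index:
            raise KeyError(f"unknown class label: {label!r}")
        ids.append(index[key])
    return ids


if __name__ == "__main__":
    index = build_label_index(ID2LABEL)
    print(resolve_label_ids("golden retriever", index))
```

Normalizing to lowercase and stripping whitespace keeps lookups forgiving, and raising on unknown names avoids silently generating the wrong class.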

Chinese labels are preserved in the main source repo under `src/labels/id2label_cn.json` for reference.

## Inference

Run the bundled demo script from the repo root:

```bash
python demo_inference.py
```

This writes `demo.png` using `JiT-H-32` with the settings below.

```python
from pathlib import Path
from diffusers import DiffusionPipeline, FlowMatchHeunDiscreteScheduler
import torch

model_dir = Path("./JiT-H-32")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
)
pipe.scheduler = FlowMatchHeunDiscreteScheduler.from_config(pipe.scheduler.config, shift=4.0)
pipe.to("cuda")

# Look up a label by numeric id, or the id(s) for a human-readable label
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.3,
    generator=generator,
).images[0]
image.save("demo.png")
```

`height` and `width` default to the checkpoint's native resolution when omitted.

Load a **variant subfolder** (e.g. `./JiT-H-32`), not the repo root.
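
A quick pre-flight check can catch the repo-root mistake before loading. The sketch below only tests for the files this README says each variant folder contains; `REQUIRED_FILES` and `missing_variant_files` are hypothetical names, and a real variant folder will also hold weights and configs that are not checked here.

```python
from pathlib import Path

# Files that each self-contained variant folder is described as having.
REQUIRED_FILES = (
    "model_index.json",
    "pipeline.py",
    "scheduler/scheduler_config.json",
    "transformer/jit_transformer_2d.py",
)


def missing_variant_files(model_dir) -> list:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_variant_files("./JiT-H-32")
    if missing:
        print("Not a variant folder, missing:", ", ".join(missing))
    else:
        print("Looks like a variant folder.")
```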