PiD — Pixel Diffusion Decoder (sdxl 2kto4k student)
====================================================

This repository redistributes a converted copy of a model checkpoint
originally produced and released by NVIDIA Corporation.

Original work
-------------
  Name:     PiD (Pixel Diffusion) — PixelDiT distillation decoders
  Author:   NVIDIA Corporation and its affiliates (NVIDIA Toronto AI Lab)
  Source:   https://huggingface.co/nvidia/PiD
  Project:  https://research.nvidia.com/labs/sil/projects/pid/
  Code:     https://github.com/nv-tlabs/pid
  Paper:    arXiv:2605.23902
  Original checkpoint:
            checkpoints/PiD_res2kto4k_sr4x_official_sdxl_distill_4step/model_ema_bf16.pth

Latent space
------------
  This is the `sdxl` PiD student — the SDXL VAE latent space (4-channel, affine
  scale 0.13025 / shift 0.0). In SceneWorks it serves every model in that latent
  space: SDXL base, RealVisXL (incl. RealVisXL Lightning), and Kolors (which
  reuses the SDXL VAE). This is the variance-preserving (VP-frame) student; the
  shipped clean (sigma=0) decode path is frame-agnostic.
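
  As a rough illustration of what the affine scale/shift pair means, the sketch
  below applies and inverts it on plain Python floats. The order of operations
  (subtract the shift, then multiply by the scale) is an assumption borrowed
  from common VAE-latent conventions and is not confirmed by this notice; with
  a shift of 0.0 the order makes no difference to the result. The helper names
  are hypothetical and are not part of `mlx-gen-pid` or the reference code.

```python
# Assumed convention (not confirmed by this notice):
#   normalized = (raw - SHIFT) * SCALE
#   raw        = normalized / SCALE + SHIFT
SCALE = 0.13025  # affine scale stated above for the SDXL VAE latent space
SHIFT = 0.0      # affine shift stated above


def normalize_latent(raw):
    """Map raw VAE latent values into the scaled space (hypothetical helper)."""
    return [(v - SHIFT) * SCALE for v in raw]


def denormalize_latent(normalized):
    """Invert normalize_latent (hypothetical helper)."""
    return [v / SCALE + SHIFT for v in normalized]


if __name__ == "__main__":
    raw = [-2.0, 0.0, 1.0, 7.5]  # made-up values, one per latent channel
    scaled = normalize_latent(raw)
    restored = denormalize_latent(scaled)
    print(scaled)
    print(restored)
```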

What was changed
----------------
  The original PyTorch checkpoint (`model_ema_bf16.pth`) was converted to
  safetensors for SceneWorks' native MLX/candle PiD decoder (`mlx-gen-pid`). The
  conversion is a lossless key/format transform only: training-only tensors
  (`net_ema.*`, `fake_score.*`, `discriminator.*`) are dropped and the `net.`
  prefix is stripped, per the reference inference loader
  (`pid_distill_model.py::PidDistillModel.load_state_dict`). Tensor values and
  dtype (bfloat16) are unchanged. No re-training or fine-tuning was performed.
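
  The key transform described above can be sketched on an ordinary dictionary.
  This is a minimal illustration, not the conversion script that was actually
  used: `convert_keys` is a hypothetical name, the values stand in for tensors
  and are passed through untouched, and keeping keys that carry none of the
  listed prefixes is an assumption (the notice does not say how such keys, if
  any exist, are handled).

```python
# Prefixes of training-only tensors that the notice says are dropped.
DROP_PREFIXES = ("net_ema.", "fake_score.", "discriminator.")
# Prefix that the notice says is stripped from the remaining keys.
STRIP_PREFIX = "net."


def convert_keys(state_dict):
    """Return a new dict with training-only keys removed and `net.` stripped.

    Values are never modified, so tensor contents and dtype are preserved.
    """
    converted = {}
    for key, value in state_dict.items():
        if key.startswith(DROP_PREFIXES):
            continue  # training-only tensor, not needed for inference
        if key.startswith(STRIP_PREFIX):
            key = key[len(STRIP_PREFIX):]
        # Assumption: keys with no recognised prefix are kept unchanged.
        converted[key] = value
    return converted


if __name__ == "__main__":
    # Made-up key names; strings stand in for tensors.
    example = {
        "net.blocks.0.weight": "w0",
        "net_ema.blocks.0.weight": "w0_ema",
        "fake_score.head.bias": "b_fs",
        "discriminator.conv.weight": "w_d",
    }
    print(convert_keys(example))
```

  Note that `net_ema.` does not match the `net.` prefix test, so the two rules
  do not interfere with each other regardless of the order in which they are
  checked.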

License
-------
  This work and the original are licensed under the NVIDIA License (the license
  HuggingFace tags as "NSCLv1"). The full license text is in the accompanying
  LICENSE file and applies to this redistribution and to any derivative works.

  USE LIMITATION (NVIDIA License §3.3): The Work and any derivative works thereof
  may only be used, or be intended for use, NON-COMMERCIALLY — i.e. for RESEARCH
  OR EVALUATION PURPOSES ONLY. (NVIDIA Corporation and its affiliates may use the
  Work commercially.)

  This non-commercial restriction FLOWS TO THE OUTPUT: images decoded with this
  PiD decoder are for research/evaluation use only, and in that respect they
  differ from images produced by the rest of the SceneWorks pipeline.

  Per NVIDIA License §3.1, this distribution (a) is under the same license,
  (b) includes a complete copy of it, and (c) retains all copyright/patent/
  trademark/attribution notices present in the original Work. NVIDIA's name,
  logos, and trademarks are not used except as necessary to reproduce these
  notices (NVIDIA License §3.5).