KSimplex Geometric Prior for Stable Diffusion: Complete Mathematical Reference

Community Article Published February 6, 2026

A Cayley-Menger validated pentachoron attention skeleton for rectified flow diffusion.

Author: AbstractPhil
License: MIT
Weights: AbstractPhil/sd15-rectified-geometric-matching


Abstract

We present KSimplex, a geometric cross-attention prior that modulates CLIP conditioning before it enters the frozen UNet of a Stable Diffusion 1.5 rectified flow pipeline. The prior operates in a 4-dimensional simplex (pentachoron) coordinate space, using Cayley-Menger determinants to validate geometric configurations and enforce structural coherence. With only 4.8M trainable parameters (0.56% of the 859M frozen backbone), the prior produces measurable improvements in spatial coherence and anatomical correctness after a single epoch on 10,000 synthetic images.

This document catalogs every mathematical formula implemented in the system, from simplex construction through training loss to ODE sampling.


1. Architecture

CLIP(prompt) → [B, 77, 768]
    → KSimplexCrossAttentionPrior  (4.8M params, fp32)
        → StackedKSimplexAttention (4 layers)
            → KSimplexAttentionLayer × 4
                → coordinate projection
                → soft vertex assignment
                → deformed template anchoring
                → geometric attention (distance-based)
                → CM-validated value projection
        → residual blend
    → modulated [B, 77, 768]
    → Frozen SD1.5 UNet (859M params, fp16)
    → v_predicted

All original SD1.5 weights load unmodified. Geometric parameters are purely additive.


2. Simplex Foundations

2.1 Regular Simplex Template

A k-simplex is the k-dimensional generalization of a triangle. For k=4 (pentachoron), we embed n = k+1 = 5 vertices in ℝ^{d_e} (d_e = 32).

The regular template T ∈ ℝ^{(k+1) × d_e} has unit edge lengths:

$$d_{ij}^2 = \|\mathbf{T}_i - \mathbf{T}_j\|^2 = 1 \quad \forall\; i \neq j$$

Constructed via geovocab2.shapes.factory.SimplexFactory(k=4, embed_dim=32, method="regular").

2.2 Edge Count

For n = k+1 vertices:

$$|\mathcal{E}| = \binom{n}{2} = \frac{n(n-1)}{2}$$

For k=4: |E| = 5·4/2 = 10 edges.

2.3 Stability Constraint

Empirically validated: the embedding dimension must satisfy:

$$\frac{d_e}{k} \geq 8$$

For k=4, d_e=32: ratio = 8 ✓. Below this ratio, CM determinants become numerically unstable.

2.4 Pairwise Squared Distances (Vectorized)

Given vertices V ∈ ℝ^{n × d_e}:

$$D_{ij}^2 = \sum_{m=1}^{d_e} (V_{im} - V_{jm})^2$$

Computed as: diff = V.unsqueeze(1) - V.unsqueeze(0), D² = (diff²).sum(dim=-1). Upper triangle extracted via torch.triu_indices(n, n, offset=1).
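The same computation in a minimal pure-Python sketch (the repo uses the equivalent broadcasted torch ops above; the vertex values here are made up for illustration):

```python
def pairwise_sq_dists(V):
    """All pairwise squared distances D2[i][j] = ||V_i - V_j||^2."""
    n = len(V)
    return [[sum((vi - vj) ** 2 for vi, vj in zip(V[i], V[j]))
             for j in range(n)] for i in range(n)]

def upper_triangle(D2):
    """Flatten the strict upper triangle: the |E| = n(n-1)/2 edges."""
    n = len(D2)
    return [D2[i][j] for i in range(n) for j in range(i + 1, n)]

V = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # a unit right triangle
D2 = pairwise_sq_dists(V)
edges = upper_triangle(D2)                  # [1.0, 1.0, 2.0]
```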


3. Cayley-Menger Determinant

3.1 Bordered Distance Matrix

For n = k+1 points with pairwise squared distances d²_{ij}, the CM determinant is the determinant of the (n+1)×(n+1) bordered matrix:

$$\text{CM}(d^2) = \begin{vmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & d_{01}^2 & \cdots & d_{0,n-1}^2 \\ 1 & d_{01}^2 & 0 & \cdots & d_{1,n-1}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & d_{0,n-1}^2 & d_{1,n-1}^2 & \cdots & 0 \end{vmatrix}$$

Structure:

  • cm[0, 0] = 0 (corner)
  • cm[0, 1:] = cm[1:, 0] = 1 (border row/column)
  • cm[i+1, j+1] = d²_{ij} for i ≠ j (squared distances)
  • cm[i+1, i+1] = 0 (diagonal)

Computed via torch.linalg.det(cm) in float32 (half precision not supported).

3.2 Simplex Validity Condition

A set of distances defines a valid k-simplex in Euclidean space iff:

$$(-1)^{k+1} \cdot \text{CM}(d^2) > 0$$

For k=4: (-1)^5 · CM = -CM > 0, so CM must be negative.

3.3 Simplex Volume from CM

The squared volume of a k-simplex:

$$\text{vol}^2(\Delta_k) = \frac{(-1)^{k+1}}{2^k \cdot (k!)^2} \cdot \text{CM}(d^2)$$

For k=4: denom = 2^4 · (4!)^2 = 16 · 576 = 9216.

$$\text{vol}^2(\Delta_4) = \frac{-1}{9216} \cdot \text{CM}(d^2)$$
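These k=4 constants can be sanity-checked end to end in exact arithmetic. The sketch below (pure Python, independent of the repo's torch.linalg.det path) builds the bordered matrix for the regular unit pentachoron, where every d²_{ij} = 1:

```python
import math
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (exact for ints)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

k = 4
n = k + 1
# Bordered (n+1)x(n+1) Cayley-Menger matrix for unit edge lengths:
# zero corner, 1-border, d^2 = 1 off-diagonal, 0 diagonal.
cm = [[0] + [1] * n] + [[1] + [0 if i == j else 1 for j in range(n)]
                        for i in range(n)]
cm_det = det(cm)                      # -5 for the unit pentachoron
valid = (-1) ** (k + 1) * cm_det > 0  # (-1)^5 * (-5) = 5 > 0: a valid 4-simplex
vol_sq = Fraction((-1) ** (k + 1), 2 ** k * math.factorial(k) ** 2) * cm_det
```

The determinant comes out to -5, which has the required sign for k=4, and vol² = 5/9216 matches the known volume √5/96 of the unit 4-simplex.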


4. Deformed Template

4.1 Per-Layer Deformation

Each attention layer maintains learnable offsets ΔV ∈ ℝ^{(k+1) × d_e}, initialized as N(0, 0.01):

$$\mathbf{V}_{\text{deformed}} = \mathbf{T} + \delta \cdot \Delta\mathbf{V}$$

where the deformation scale δ is clamped:

$$\delta = \text{clamp}(\delta_{\text{raw}},\; 0.05,\; 0.5)$$

Stability zone: δ ∈ [0.15, 0.35] empirically optimal. The clamp bounds allow the optimizer to explore slightly beyond.

4.2 Timestep-Conditioned Deformation (Phase 3)

A small MLP maps the normalized timestep t ∈ [0,1] to per-layer deformation factors:

$$\mathbf{f}(t) = \sigma\!\left(W_2 \cdot \text{SiLU}(W_1 \cdot t)\right) \in [0, 1]^{L}$$

where W₁ ∈ ℝ^{64×1}, W₂ ∈ ℝ^{L×64}, σ = sigmoid.

The effective deformation scale for layer ℓ at timestep t:

$$\delta_\ell^{\text{eff}}(t) = \text{clamp}\!\Big(\delta_\ell \cdot \big(0.5 + f_\ell(t)\big),\; 0.05,\; 0.5\Big)$$

The multiplier (0.5 + f) ∈ [0.5, 1.5] can both reduce and amplify the base deformation.

Critical implementation detail: Effective scales are computed transiently — the underlying parameter δ_ℓ is never mutated during forward passes. An earlier version used .data.copy_() which caused training drift.
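A minimal sketch of the resulting scale arithmetic; the learned MLP f(t) is replaced here by hand-picked values in [0, 1], since its weights are trained:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def effective_scale(delta_base, f_t):
    """delta_eff = clamp(delta * (0.5 + f(t)), 0.05, 0.5), with f(t) in [0, 1]."""
    return clamp(delta_base * (0.5 + f_t), 0.05, 0.5)

# With the default delta_base = 0.25 the multiplier sweeps [0.5, 1.5]:
lo = effective_scale(0.25, 0.0)   # 0.125: deformation halved
hi = effective_scale(0.25, 1.0)   # 0.375: amplified, still inside the clamp
capped = effective_scale(0.5, 1.0)  # 0.5: the clamp's upper bound engages
```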


5. Token-to-Simplex Mapping

5.1 Coordinate Projection

Each CLIP token h_i ∈ ℝ^{768} is projected into simplex coordinate space:

$$\mathbf{c}_i = W_{\text{coord}} \cdot \text{LayerNorm}(\mathbf{h}_i) \quad \in \mathbb{R}^{d_e}$$

where W_coord ∈ ℝ^{d_e × 768}.

5.2 Soft Vertex Assignment

Tokens are softly routed to simplex vertices:

$$\mathbf{w}_i = \text{softmax}\!\left(W_{\text{vertex}} \cdot \text{LayerNorm}(\mathbf{h}_i)\right) \in \mathbb{R}^{k+1}$$

where W_vertex ∈ ℝ^{(k+1) × 768}.

5.3 Template Anchoring

The geometric contribution for each token blends the deformed template vertices:

$$\mathbf{a}_i = \sum_{j=0}^{k} w_{ij} \cdot \mathbf{V}_{\text{deformed},j} = \mathbf{w}_i^T \cdot \mathbf{V}_{\text{deformed}}$$

Batched: template_contribution = vertex_weights @ deformed # (B, 77, d_e)

5.4 Final Coordinates

Combined projection + anchor:

$$\hat{\mathbf{c}}_i = \mathbf{c}_i + \mathbf{a}_i$$

5.5 Unit Sphere Normalization

L2-normalized to bound pairwise squared distances to [0, 4]:

$$\tilde{\mathbf{c}}_i = \frac{\hat{\mathbf{c}}_i}{\|\hat{\mathbf{c}}_i\|_2}$$

This keeps the CM determinant well-conditioned regardless of initialization scale.
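A quick pure-Python illustration of the bound (the vectors are made-up 2-D stand-ins): after L2 normalization the worst case is a pair of antipodal points, where the squared distance is exactly 4.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([3.0, 4.0])       # [0.6, 0.8]
b = normalize([-6.0, -8.0])     # antipodal direction: [-0.6, -0.8]
d2 = sum((x - y) ** 2 for x, y in zip(a, b))   # 4.0, the maximum on the sphere
```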


6. Geometric Attention

6.1 Distance-Based Logits

Unlike standard dot-product attention, KSimplex attention derives from pairwise distances in simplex space:

$$d_{ij}^2 = \|\tilde{\mathbf{c}}_i - \tilde{\mathbf{c}}_j\|^2$$

$$\text{logits}_{ij} = -\frac{d_{ij}^2}{\sqrt{d_e}}$$

The 1/√d_e scaling mirrors standard attention's 1/√d_k. Tokens that are geometrically close in simplex space attend more strongly to each other.
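A toy version of the distance-to-weights pipeline; the coordinates below are made-up 2-D stand-ins (the real model uses d_e = 32 and batched torch ops):

```python
import math

d_e = 32  # scaling constant from the text; stand-in coords are lower-dimensional

def geo_attention_row(c_i, others):
    """softmax over -||c_i - c_j||^2 / sqrt(d_e) for each candidate c_j."""
    logits = [-sum((a - b) ** 2 for a, b in zip(c_i, c_j)) / math.sqrt(d_e)
              for c_j in others]
    m = max(logits)                         # stabilized softmax
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# c0 is geometrically close to c1 and far from c2:
c0, c1, c2 = [0.0, 1.0], [0.1, 0.995], [1.0, 0.0]
alpha = geo_attention_row(c0, [c0, c1, c2])
# alpha sums to 1, and the nearer token gets the larger weight.
```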

6.2 Attention Weights

$$\alpha_{ij} = \text{softmax}_j(\text{logits}_{ij})$$

Geometric attention is shared across all H heads. Each head has independent value projections.

6.3 Multi-Head Value Projection

$$\mathbf{V}^{(h)} = \text{LayerNorm}(\mathbf{H}) \cdot W_V, \quad \text{reshaped to } (B, T, H, d_h)$$

$$\text{out}^{(h)} = \alpha \cdot \mathbf{V}^{(h)} \quad \in \mathbb{R}^{B \times T \times d_h}$$

$$\text{output} = \text{Concat}(\text{out}^{(1)}, \ldots, \text{out}^{(H)}) \cdot W_O$$

where d_h = 768/8 = 96.

6.4 Residual Connection

Each layer adds to the residual stream:

$$\mathbf{H}^{(\ell+1)} = \mathbf{H}^{(\ell)} + \text{output}^{(\ell)}$$


7. Entropy Sharpening (Emergent Property)

7.1 Per-Layer Attention Entropy

$$\mathcal{H}_\ell = -\sum_{i} \sum_{j} \alpha_{ij}^{(\ell)} \log\!\left(\alpha_{ij}^{(\ell)} + 10^{-10}\right)$$

averaged over batch and token dimensions.

7.2 Monotonic Decrease

Validated across classification, language modeling, and now diffusion:

$$\mathcal{H}_1 > \mathcal{H}_2 > \mathcal{H}_3 > \mathcal{H}_4$$

This coarse-to-fine sharpening emerges naturally from stacked geometric constraints — no explicit entropy loss is applied. Early layers attend broadly (soft geometry), later layers attend sharply (crystallized structure).
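The ordering is easy to see on hand-made attention rows (these distributions are illustrative, not measured): the sharper, later-layer-like row has strictly lower entropy.

```python
import math

def entropy(row, eps=1e-10):
    """Shannon entropy of one attention row, with the text's 1e-10 guard."""
    return -sum(a * math.log(a + eps) for a in row)

broad = [0.25, 0.25, 0.25, 0.25]   # early-layer-like: attends everywhere
sharp = [0.85, 0.05, 0.05, 0.05]   # late-layer-like: crystallized
# entropy(broad) = log 4 ≈ 1.386 > entropy(sharp) ≈ 0.588
```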


8. Residual Blending

8.1 Learnable Blend

The geo_prior output blends with original CLIP conditioning:

$$\mathbf{E}_{\text{out}} = (1 - \beta) \cdot \mathbf{E}_{\text{clip}} + \beta \cdot \mathbf{E}_{\text{geo}}$$

$$\beta = \sigma(\beta_{\text{logit}})$$

Initialized: β_logit = 0.0 → β = 0.5 (equal mix). Sigmoid bounds β ∈ (0, 1).

8.2 Timestep-Conditioned Blend (Optional)

$$\beta(t) = \sigma\!\left(f_\beta(t)\right)$$

$$f_\beta: \mathbb{R}^1 \xrightarrow{W_1 \in \mathbb{R}^{64 \times 1}} \text{SiLU} \xrightarrow{W_2 \in \mathbb{R}^{1 \times 64}} \mathbb{R}^1$$

Allows the network to learn different blend strengths at different noise levels.
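A minimal sketch of the blend from §8.1 on stand-in embeddings (the real tensors are [B, 77, 768]):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def blend(e_clip, e_geo, beta_logit=0.0):
    """E_out = (1 - beta) * E_clip + beta * E_geo, beta = sigmoid(beta_logit)."""
    b = sigmoid(beta_logit)
    return [(1 - b) * c + b * g for c, g in zip(e_clip, e_geo)]

# At init beta_logit = 0 -> beta = 0.5, an equal mix:
out = blend([1.0, 0.0], [0.0, 1.0])   # [0.5, 0.5]
```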


9. Geometric Loss Functions

9.1 CM Validity Loss

Penalizes invalid simplex configurations with a hinge loss:

$$\mathcal{L}_{\text{CM}} = \frac{1}{L} \sum_{\ell=1}^{L} \text{ReLU}\!\left(-(-1)^{k+1} \cdot \text{CM}(d_\ell^2) + \epsilon\right)$$

where ε = 10⁻⁶ is a small margin preventing zero-volume degeneracies. Distances are sampled from the first k+1 tokens of each layer (deterministic sampling for stability).

9.2 Volume Consistency Loss

Prevents all layers from collapsing to identical geometry:

$$\mathcal{L}_{\text{vol}} = -\text{std}\!\left(\log|\text{vol}^2_1|,\; \log|\text{vol}^2_2|,\; \ldots,\; \log|\text{vol}^2_L|\right)$$

The negative standard deviation of log-volumes rewards spread: each layer should carve a distinct geometric structure, not duplicate.

9.3 Combined Geometric Loss

$$\mathcal{L}_{\text{geo}} = \lambda_{\text{CM}} \cdot \mathcal{L}_{\text{CM}} + \lambda_{\text{vol}} \cdot \mathcal{L}_{\text{vol}}$$

Defaults: λ_CM = 0.01, λ_vol = 0.005.

9.4 Geometric Loss Warmup

Linear ramp over W steps prevents geometric regularization from dominating early training:

$$\lambda_{\text{geo}}^{\text{eff}}(s) = \lambda_{\text{geo}} \cdot \min\!\left(\frac{s}{W},\; 1\right)$$

Default: W = 200 steps.
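The ramp as a one-liner, using the defaults stated in the text:

```python
def geo_weight(step, lam_geo=0.01, warmup=200):
    """Effective geometric-loss weight: linear ramp to lam_geo over `warmup` steps."""
    return lam_geo * min(step / warmup, 1.0)

# geo_weight(0) == 0.0, geo_weight(100) == 0.005, geo_weight(500) == 0.01
```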


10. Rectified Flow Matching

10.1 Linear Interpolation

Rectified flow defines a straight path between clean data x₀ and Gaussian noise ε:

$$\mathbf{x}_t = (1 - t)\,\mathbf{x}_0 + t\,\boldsymbol{\epsilon}, \quad t \in [0, 1]$$

10.2 Velocity Target

The derivative along this path is constant:

$$\mathbf{v} = \frac{d\mathbf{x}_t}{dt} = \boldsymbol{\epsilon} - \mathbf{x}_0$$

10.3 Shifted Schedule

Shift parameter s > 1 biases timestep distribution toward higher noise:

$$t_s = \frac{s \cdot t}{1 + (s - 1) \cdot t}$$

| t_base | t_shifted (s=2.5) |
|--------|-------------------|
| 0.1    | 0.217             |
| 0.3    | 0.517             |
| 0.5    | 0.714             |
| 0.7    | 0.854             |
| 0.9    | 0.957             |

Applied identically during training (timestep sampling) and inference (sigma schedule).
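The shift formula reproduces the table above directly, and fixes both endpoints for any s:

```python
def shift_t(t, s=2.5):
    """t_s = s*t / (1 + (s-1)*t); identity at t = 0 and t = 1 for any s > 0."""
    return s * t / (1 + (s - 1) * t)

table = {0.1: 0.217, 0.3: 0.517, 0.5: 0.714, 0.7: 0.854, 0.9: 0.957}
# shift_t(0.0) == 0.0 and shift_t(1.0) == 1.0; intermediate timesteps are
# pushed toward higher noise (e.g. shift_t(0.5) ≈ 0.714).
```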

10.4 Logit-Normal Timestep Sampling

Biases samples toward mid-range timesteps where learning signal is strongest:

$$u \sim \mathcal{N}(\mu, \sigma^2), \quad t = \sigma(u) = \frac{1}{1 + e^{-u}}$$

Default: μ = 0, σ = 1. Then scaled to [t_min, t_max] and shifted.

10.5 Continuous Float Timesteps

Timesteps passed to the UNet are not quantized:

$$\tau = t \times 1000.0 \quad \text{(float, not .long())}$$

The raw continuous t is additionally passed to the geo_prior as t_continuous to avoid quantization artifacts in the deformation schedule.


11. Min-SNR Loss Weighting

11.1 Signal-to-Noise Ratio

For the flow matching parameterization where σ = t:

$$\text{SNR}(t) = \frac{(1 - t)^2}{t^2 + 10^{-8}}$$

11.2 Min-SNR-γ Clamping

Prevents high-noise timesteps (large t, low SNR) from dominating gradients:

$$w_{\text{snr}}(t) = \frac{\min(\text{SNR}(t),\; \gamma)}{\text{SNR}(t)}$$

11.3 Velocity Prediction Adjustment

An additional factor accounts for the velocity parameterization:

$$w_{\text{vel}}(t) = \frac{w_{\text{snr}}(t)}{\text{SNR}(t) + 1}$$

11.4 Weighted Task Loss

$$\mathcal{L}_{\text{task}} = \frac{1}{B} \sum_{b=1}^{B} w_{\text{vel}}(t_b) \cdot \|\mathbf{v}_\theta(\mathbf{x}_{t_b}, t_b) - \mathbf{v}_b\|^2$$

Default: γ = 5.0.
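The three formulas above as plain functions, with γ = 5.0 as in the text. At t = 0.5 the SNR is ≈ 1, so the clamp is inactive and w_vel ≈ 0.5; at low t (high SNR) the clamp kicks in and sharply down-weights the loss.

```python
def snr(t, eps=1e-8):
    """SNR(t) = (1 - t)^2 / (t^2 + eps) for the sigma = t parameterization."""
    return (1 - t) ** 2 / (t ** 2 + eps)

def w_vel(t, gamma=5.0):
    """Min-SNR-gamma weight with the velocity-prediction adjustment."""
    s = snr(t)
    return min(s, gamma) / s / (s + 1)

# snr(0.1) ≈ 81, so the clamp is active there; w_vel(0.1) << w_vel(0.5) ≈ 0.5
```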


12. Classifier-Free Guidance

12.1 CFG Dropout (Training)

Encoder hidden states are zeroed with probability p_drop to train the unconditional path:

$$\mathbf{E}' = \begin{cases} \mathbf{0} & \text{with probability } p_{\text{drop}} \\ \mathbf{E} & \text{otherwise} \end{cases}$$

Default: p_drop = 0.1.

12.2 CFG at Inference

$$\mathbf{v}_{\text{guided}} = \mathbf{v}_{\text{uncond}} + s_{\text{cfg}} \cdot (\mathbf{v}_{\text{cond}} - \mathbf{v}_{\text{uncond}})$$

Implemented via batched forward pass: latent_input = cat([x, x]), enc_input = cat([E_cond, E_uncond]), then chunk the output.
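The guidance formula itself, on stand-in velocity vectors (the batched cat/chunk plumbing is omitted): scale 1 recovers the conditional prediction, scale 0 the unconditional one, and larger scales extrapolate past the conditional direction.

```python
def cfg(v_uncond, v_cond, scale):
    """v_guided = v_u + scale * (v_c - v_u), elementwise."""
    return [u + scale * (c - u) for u, c in zip(v_uncond, v_cond)]

v_u, v_c = [0.0, 1.0], [1.0, 3.0]
boosted = cfg(v_u, v_c, 2.0)   # [2.0, 5.0]: extrapolated past v_c
```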


13. Total Training Loss

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda_{\text{geo}}^{\text{eff}}(s) \cdot \mathcal{L}_{\text{geo}}$$

Expanding fully:

$$\mathcal{L}_{\text{total}} = \underbrace{\frac{1}{B}\sum_b w_{\text{vel}}(t_b) \|\mathbf{v}_\theta - \mathbf{v}_b\|^2}_{\text{Min-SNR flow matching}} + \underbrace{\lambda_{\text{geo}} \cdot \min\!\left(\frac{s}{W}, 1\right)}_{\text{warmup}} \cdot \left[\underbrace{\lambda_{\text{CM}} \cdot \mathcal{L}_{\text{CM}}}_{\text{validity}} + \underbrace{\lambda_{\text{vol}} \cdot \mathcal{L}_{\text{vol}}}_{\text{diversity}}\right]$$

Only the geo_prior parameters (4.8M) receive gradients. The UNet backbone (859M) and CLIP encoder are frozen.


14. Euler ODE Sampling

14.1 Sigma Schedule

Linear spacing from 1→0 with shift applied:

$$\tau_i = 1 - \frac{i}{N}, \quad \sigma_i = \frac{s \cdot \tau_i}{1 + (s-1) \cdot \tau_i}$$

producing N+1 values from σ₀ = 1 to σ_N = 0.

14.2 Euler Integration Step

$$\mathbf{x}_{i+1} = \mathbf{x}_i + (\sigma_{i+1} - \sigma_i) \cdot \mathbf{v}_\theta(\mathbf{x}_i, \sigma_i)$$

Note: dt = σ_{i+1} - σ_i is negative (moving from noise toward clean).
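A toy 1-D run of the sampler, using a stand-in velocity in place of the UNet. For rectified flow the true velocity ε - x₀ is constant along the path, so Euler integration from σ = 1 to σ = 0 recovers x₀ at any step count:

```python
def sigma_schedule(num_steps, s=2.5):
    """Linear taus from 1 to 0 with the Section 10.3 shift applied."""
    taus = [1 - i / num_steps for i in range(num_steps + 1)]
    return [s * t / (1 + (s - 1) * t) for t in taus]

def euler_sample(x_start, velocity, sigmas):
    """x_{i+1} = x_i + (sigma_{i+1} - sigma_i) * v(x_i, sigma_i); dt < 0."""
    x = x_start
    for sig_cur, sig_next in zip(sigmas, sigmas[1:]):
        x = x + (sig_next - sig_cur) * velocity(x, sig_cur)
    return x

x0, eps = 0.7, -1.3            # stand-in "clean" sample and noise values

def v(x, sigma):
    return eps - x0            # the exact rectified-flow velocity

sigmas = sigma_schedule(20)    # sigma_0 = 1.0 ... sigma_N = 0.0
x_final = euler_sample(eps, v, sigmas)   # starts at pure noise, ends at ~x0
```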


15. Learning Rate Schedule

15.1 Linear Warmup

$$\eta(s) = \eta_0 \cdot \frac{s + 1}{W_{\text{warmup}}} \quad \text{for } s < W_{\text{warmup}}$$

15.2 Cosine Decay

$$p = \frac{s - W_{\text{warmup}}}{\max(S_{\text{total}} - W_{\text{warmup}},\; 1)}$$

$$\eta(s) = \eta_{\min} + (\eta_0 - \eta_{\min}) \cdot \frac{1 + \cos(\pi \cdot p)}{2}$$

Defaults: η₀ = 10⁻⁴, η_min = 10⁻⁶, W_warmup = 100.
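The full schedule as one function, with the stated defaults: it ramps to η₀ at the end of warmup, then cosine-decays to η_min at the final step.

```python
import math

def lr(step, total, base=1e-4, minimum=1e-6, warmup=100):
    """Linear warmup for `warmup` steps, then cosine decay to `minimum`."""
    if step < warmup:
        return base * (step + 1) / warmup
    p = (step - warmup) / max(total - warmup, 1)
    return minimum + (base - minimum) * (1 + math.cos(math.pi * p)) / 2

# lr(0, 1000) == 1e-6, lr(99, 1000) == 1e-4, lr(1000, 1000) ≈ 1e-6
```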


16. VAE Latent Space

16.1 Encoding

$$\mathbf{z} = \text{VAE}_{\text{enc}}(\mathbf{x}) \cdot s_f$$

where s_f = 0.18215 (SD1.5 scaling factor).

16.2 Decoding

$$\mathbf{x} = \text{VAE}_{\text{dec}}\!\left(\frac{\mathbf{z}}{s_f}\right)$$

Latent shape: (B, 4, H/8, W/8). For 512×512 images: (B, 4, 64, 64).


17. Mixed Precision Strategy

| Component | Dtype | Rationale |
|-----------|-------|-----------|
| Frozen UNet backbone | fp16 | 859M params, memory efficiency |
| Geo prior (all ops) | fp32 | CM determinant numerical stability |
| CM determinant | fp32 | torch.linalg.det requires it |
| Geo prior output | fp16 (cast) | Cast to match UNet cross-attention |
| GradScaler | enabled | Mixed fp32 geo + fp16 UNet pass |

The geo_prior runs inside torch.amp.autocast("cuda", enabled=False) with explicit .float() casts. On first inference call, geo_prior auto-casts itself to fp32 (one-time, 4.8M params ≈ 19MB).


18. Complete Hyperparameter Table

| Symbol | Parameter | Value | Source |
|--------|-----------|-------|--------|
| k | Simplex dimension | 4 | SimplexConfig |
| d_e | Coordinate embedding dim | 32 | SimplexConfig |
| L | Stacked attention layers | 4 | SimplexConfig |
| H | Attention heads | 8 | SimplexConfig |
| d_h | Head dimension (768/H) | 96 | Derived |
| n | Vertices (k+1) | 5 | Derived |
| \|E\| | Edges (n choose 2) | 10 | Derived |
| δ_base | Base deformation scale | 0.25 | SimplexConfig |
| β_init | Initial blend logit | 0.0 (→ β = 0.5) | SimplexConfig |
| λ_CM | CM validity weight | 0.01 | SimplexConfig |
| λ_vol | Volume consistency weight | 0.005 | SimplexConfig |
| λ_geo | Total geo loss weight | 0.01 | TrainConfig |
| W_geo | Geo loss warmup steps | 200 | TrainConfig |
| s | Schedule shift | 2.5 | TrainConfig |
| γ | Min-SNR gamma | 5.0 | TrainConfig |
| p_drop | CFG dropout | 0.1 | TrainConfig |
| η₀ | Base learning rate | 10⁻⁴ | TrainConfig |
| η_min | Minimum learning rate | 10⁻⁶ | TrainConfig |
| W_lr | LR warmup steps | 100 | TrainConfig |
| s_f | VAE scaling factor | 0.18215 | SD1.5 |

19. Empirical Validation

Prior KSimplex Research

| Benchmark | Result |
|-----------|--------|
| FMNIST classification | 89.13% |
| CIFAR-10 classification | 84.59% |
| CIFAR-100 classification | 69.08% |
| KSimplex LLM (54M params) | PPL 113 |
| Geometric validity (CM sign) | 100% |
| Attention entropy | Monotonically decreasing across layers |
| Token sweet spot | 25–77 tokens |
| Deformation stability zone | δ ∈ [0.15, 0.35] |

Diffusion Results (1 epoch, 10k ImageNet-synthetic)

  • Trainable params: 4,845,725 (0.56% of model)
  • Training time: ~7 min on L4 (24GB)
  • Batch size: 6
  • Fragmented anatomy (split bodies, duplicated heads) → coherent spatial composition
  • Circular geometry (bowls, wheels) maintains curvature
  • Novel prompts (unseen during training) produce equally coherent outputs
  • Lune base capabilities fully preserved — purely additive improvement

20. Formula Index

Quick reference of all 42 formulas by section:

| # | Formula | Section |
|---|---------|---------|
| 1 | Regular simplex: d²_{ij} = 1 ∀ i≠j | §2.1 |
| 2 | Edge count: n(n-1)/2 | §2.2 |
| 3 | Stability: d_e/k ≥ 8 | §2.3 |
| 4 | Pairwise distances: D²_{ij} = Σ(V_i - V_j)² | §2.4 |
| 5 | CM bordered matrix construction | §3.1 |
| 6 | CM determinant: det(CM) | §3.1 |
| 7 | Validity: (-1)^{k+1} · CM > 0 | §3.2 |
| 8 | Volume²: (-1)^{k+1}/(2^k·(k!)²) · CM | §3.3 |
| 9 | k=4 volume: -1/9216 · CM | §3.3 |
| 10 | Deformed template: T + δ·ΔV | §4.1 |
| 11 | Deformation clamp: [0.05, 0.5] | §4.1 |
| 12 | Deform schedule MLP: σ(W₂·SiLU(W₁·t)) | §4.2 |
| 13 | Effective scale: δ·(0.5 + f(t)) | §4.2 |
| 14 | Coord projection: W_coord · LN(h) | §5.1 |
| 15 | Soft vertex assign: softmax(W_vertex · LN(h)) | §5.2 |
| 16 | Template anchor: w^T · V_deformed | §5.3 |
| 17 | Final coords: c + a | §5.4 |
| 18 | L2 normalize: ĉ/‖ĉ‖₂ | §5.5 |
| 19 | Distance logits: -d²/√d_e | §6.1 |
| 20 | Attention weights: softmax(logits) | §6.2 |
| 21 | Multi-head output: Concat(α·V^h)·W_O | §6.3 |
| 22 | Residual: H + output | §6.4 |
| 23 | Entropy: -Σ α log α | §7.1 |
| 24 | Entropy ordering: H₁ > H₂ > H₃ > H₄ | §7.2 |
| 25 | Blend: (1-β)·E_clip + β·E_geo | §8.1 |
| 26 | Blend factor: σ(β_logit) | §8.1 |
| 27 | Timestep blend: σ(MLP(t)) | §8.2 |
| 28 | CM validity loss: ReLU hinge | §9.1 |
| 29 | Volume consistency: -std(log\|vol²\|) | §9.2 |
| 30 | Combined geo: λ_CM·L_CM + λ_vol·L_vol | §9.3 |
| 31 | Geo warmup: λ·min(s/W, 1) | §9.4 |
| 32 | Flow interpolation: (1-t)·x₀ + t·ε | §10.1 |
| 33 | Velocity target: ε - x₀ | §10.2 |
| 34 | Schedule shift: s·t/(1+(s-1)·t) | §10.3 |
| 35 | Logit-normal: σ(N(μ,σ²)) | §10.4 |
| 36 | SNR: (1-t)²/t² | §11.1 |
| 37 | Min-SNR weight: min(SNR,γ)/SNR | §11.2 |
| 38 | Velocity adjustment: w/(SNR+1) | §11.3 |
| 39 | CFG dropout: E'=0 w.p. p_drop | §12.1 |
| 40 | CFG guidance: v_u + s·(v_c - v_u) | §12.2 |
| 41 | Euler step: x + dt·v_θ | §14.2 |
| 42 | Cosine LR: η_min + (η₀-η_min)·½(1+cos(πp)) | §15.2 |

21. Usage

```python
import torch

from sd15_trainer_geo.pipeline import load_pipeline, load_geo_from_hub
from sd15_trainer_geo.generate import generate, show_images

pipe = load_pipeline(device="cuda", dtype=torch.float16)
pipe.unet.load_pretrained(
    "AbstractPhil/tinyflux-experts",
    subfolder="", filename="sd15-flow-lune-unet.safetensors"
)
load_geo_from_hub(pipe, "AbstractPhil/sd15-rectified-geometric-matching")

out = generate(pipe, ["a tabby cat on a windowsill"], shift=2.5, seed=42)
show_images(out)
```

Citation

@misc{abstractphil2026ksimplex,
  title   = {KSimplex Geometric Prior for Stable Diffusion},
  author  = {AbstractPhil},
  year    = {2026},
  url     = {https://huggingface.co/AbstractPhil/sd15-rectified-geometric-matching}
}

License: MIT — Build on it freely.
