	---
	license: other
	license_name: krea-2-research
	license_link: https://huggingface.co/dataautogpt3/Krea2-weights-experiments/blob/main/LICENSE
	language:
	- en
	library_name: diffusers
	tags:
	- krea-2
	- turbo
	- weight-editing
	- diffusion
	- dit
	- mmdit
	- safetensors
	- comfyui
	- experimental
	pipeline_tag: text-to-image
	---

	# Krea 2 Turbo — Hand-Edited Weight Experiments

	![Comparison Grid](comparison_ALL.png)

	## Overview

	This repository contains weight-edited variants of the Krea 2 Turbo diffusion model. Each variant was created by scaling selected transformer-block weights of the 12.8B-parameter single-stream MMDiT, producing artistic and functional model variations without any retraining.

	These are research artifacts from hand-editing diffusion model weights using the methodology described below. The base models (Krea 2 Turbo and Krea 2 Raw) are NOT included — only the edited variants.

	## Method

	All variants use the core formula:

	```
	theta_new = theta_original * (1 - 2 * alpha)
	```

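	Where `alpha` controls the inversion strength. For `alpha` below 0.5 the scale stays positive, so the selected weights shrink toward zero without changing sign: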
	- `alpha=0.05` → scale 0.90 (subtle)
	- `alpha=0.10` → scale 0.80 (artistic sweet spot)
	- `alpha=0.15` → scale 0.70 (strong)
	- `alpha=0.20` → scale 0.60 (aggressive but functional)
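The mapping between `alpha` and the scale factor can be checked with a few lines of Python. This is a minimal sketch of the formula above; the function names `alpha_to_scale` and `scale_to_alpha` are illustrative and do not come from the original tooling.

```python
def alpha_to_scale(alpha):
    """Scale factor applied to a tensor: theta_new = theta_original * scale."""
    return 1.0 - 2.0 * alpha


def scale_to_alpha(scale):
    """Inverse mapping, useful when starting from a target scale."""
    return (1.0 - scale) / 2.0


# Reproduce the alpha/scale pairs listed above.
for alpha in (0.05, 0.10, 0.15, 0.20):
    print(f"alpha={alpha:.2f} -> scale={alpha_to_scale(alpha):.2f}")
```

The scale reaches zero at `alpha=0.5` and only becomes negative beyond that point.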

	Full negation (`alpha=1.0`, scale=-1.0) breaks the model and is excluded from this repository. Note that `alpha=0.5` gives scale 0, which zeroes the weights rather than negating them.
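The Credits section mentions mmap-based tensor scaling. The sketch below shows one way such an edit could look using only the Python standard library; it is not the author's actual tool. It relies on the standard safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor data with `data_offsets` relative to the start of the data section). The function names are hypothetical, and the pure-Python loops are for illustration only: a 25 GB file would need a vectorized implementation.

```python
import json
import mmap
import struct


def read_header(buf):
    """Return (header dict, byte offset where the tensor data section starts)."""
    (header_len,) = struct.unpack_from("<Q", buf, 0)
    header = json.loads(bytes(buf[8:8 + header_len]))
    return header, 8 + header_len


def bf16_to_float(u16):
    """BF16 is the upper 16 bits of an IEEE float32."""
    return struct.unpack("<f", struct.pack("<I", u16 << 16))[0]


def float_to_bf16(x):
    """Round a float to BF16 (round to nearest, ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def scale_tensor_in_place(path, name, scale):
    """Multiply one named tensor by `scale`, writing through a memory map."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        header, data_start = read_header(mm)
        info = header[name]
        begin, end = (data_start + off for off in info["data_offsets"])
        if info["dtype"] == "F32":
            n = (end - begin) // 4
            values = struct.unpack_from(f"<{n}f", mm, begin)
            struct.pack_into(f"<{n}f", mm, begin, *(v * scale for v in values))
        elif info["dtype"] == "BF16":
            n = (end - begin) // 2
            raw = struct.unpack_from(f"<{n}H", mm, begin)
            out = [float_to_bf16(bf16_to_float(u) * scale) for u in raw]
            struct.pack_into(f"<{n}H", mm, begin, *out)
        else:
            raise ValueError(f"unsupported dtype: {info['dtype']}")
        mm.flush()
```

Because the edit happens in place, it would normally be run on a copy of the base checkpoint, for example `scale_tensor_in_place("copy.safetensors", "blocks.12.attn.wq", 0.80)` with a key name taken from the file's own header.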

	## Architecture: Krea 2 Turbo

	- Type: Single-stream MMDiT (Diffusion Transformer)
	- Parameters: 12.8B
	- File size: ~25GB per variant (BF16 + F32 tensors)
	- Structure: 28 uniform transformer blocks
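	- Block sub-layers (13 tensors per block, consistent with the 39 tensors reported for a three-block group):
	  - `blocks.N.attn.*` (7 tensors): gate, qknorm, wq, wk, wv, wo (six names for seven tensors, so one entry, presumably `qknorm`, covers two)
	  - `blocks.N.mlp.*` (3 tensors): gate, up, down (SwiGLU)
	  - `blocks.N.mod.lin` (1 tensor): conditioning modulation
	  - `blocks.N.prenorm.scale` / `blocks.N.postnorm.scale` (1 tensor each)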
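The layout above can be checked against a downloaded file without loading any weights, because a safetensors file starts with a JSON index of every tensor. The following is a minimal sketch using only the standard library; the function names are illustrative, and the key pattern assumes tensor names begin with `blocks.N.<group>.` as listed above.

```python
import json
import re
import struct
from collections import Counter


def read_tensor_index(path):
    """Read only the JSON header: tensor name -> dtype, shape, data_offsets."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def group_counts(header, block):
    """Count tensors per sub-layer group (attn, mlp, ...) for one block index."""
    pattern = re.compile(
        rf"^blocks\.{block}\.(attn|mlp|mod|prenorm|postnorm)\."
    )
    counts = Counter()
    for name in header:
        match = pattern.match(name)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

Calling `group_counts(read_tensor_index(path), 12)` on one of the variant files should show whether block 12 really holds 7 attention tensors and 3 MLP tensors.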

	## Variants

	### B1 — Partial Inversion (Most Artistic)
	| Property | Value |
	|---|---|
	| File | `Krea_2_turbo_inv_B1_partial10.safetensors` |
	| Blocks | 12-14 (mid) |
	| Layers | ALL (39 tensors per block group) |
	| Alpha | 0.10 (scale=0.80) |
	| Result | Most artistic variant: strong style/content shift while remaining coherent |

	### B3 — Attention-Only Partial Inversion
	| Property | Value |
	|---|---|
	| File | `Krea_2_turbo_inv_B3_attn_p10.safetensors` |
	| Blocks | 12-14 (mid) |
	| Layers | attn only (21 tensors) |
	| Alpha | 0.10 (scale=0.80) |
	| Result | Functional, subtler than B1; attention-specific perturbation |

	### D — Gate Scaling (All Blocks)
	| Property | Value |
	|---|---|
	| File | `Krea_2_turbo_inv_D_gate_p20.safetensors` |
	| Blocks | 0-27 (all) |
	| Layers | attn.gate only (28 tensors) |
	| Alpha | 0.20 (scale=0.60) |
	| Result | Functional, moderate effect; gate weights are more tolerant of aggressive scaling |
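The tensor counts quoted for B1, B3, and D follow from the per-block layout in the Architecture section. The sketch below reproduces that arithmetic. The per-group counts for `attn`, `mlp`, and `mod` come from that section; one tensor each for `prenorm` and `postnorm` is an assumption inferred from 39 tensors over three blocks.

```python
# Tensors per block, by sub-layer group. The prenorm/postnorm values are
# assumed (inferred from 39 tensors / 3 blocks = 13 per block).
TENSORS_PER_BLOCK = {"attn": 7, "mlp": 3, "mod": 1, "prenorm": 1, "postnorm": 1}


def tensors_edited(blocks, groups):
    """Number of tensors touched when scaling `groups` in each of `blocks`."""
    per_block = sum(TENSORS_PER_BLOCK[group] for group in groups)
    return len(blocks) * per_block


mid_blocks = range(12, 15)                               # blocks 12-14
b1_count = tensors_edited(mid_blocks, TENSORS_PER_BLOCK)  # ALL layers
b3_count = tensors_edited(mid_blocks, ["attn"])           # attention only
d_count = len(range(28))                                  # one attn.gate per block

print(b1_count, b3_count, d_count)
```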

	### F — Early/Late Block Inversion
	| Property | Value |
	|---|---|
	| File | `Krea_2_turbo_F_early_a10.safetensors` |
	| Blocks | 0-2 (early) |
	| Layers | ALL |
	| Alpha | 0.10 (scale=0.80) |
	| Result | Affects structure, composition, spatial layout |

	| Property | Value |
	|---|---|
	| File | `Krea_2_turbo_F_late_a10.safetensors` |
	| Blocks | 25-27 (late) |
	| Layers | ALL |
	| Alpha | 0.10 (scale=0.80) |
	| Result | Affects style, color, detail, texture refinement |

	### G — Mid-Block Alpha Sweep
	Three variants at different inversion strengths on the same block zone. The `alpha=0.10` point of the sweep uses the same blocks, layers, and alpha as B1 above:

	| File | Alpha | Scale | Notes |
	|---|---|---|---|
	| `Krea_2_turbo_G_mid_a05.safetensors` | 0.05 | 0.90 | Subtle |
	| `Krea_2_turbo_G_mid_a15.safetensors` | 0.15 | 0.70 | Strong |
	| `Krea_2_turbo_G_mid_a20.safetensors` | 0.20 | 0.60 | Aggressive but functional |

	All target blocks 12-14, ALL layers.

	### H — Layer-Selective Mid-Block
	| File | Blocks | Layers | Alpha |
	|---|---|---|---|
	| `Krea_2_turbo_H_mid_attn_a10.safetensors` | 12-14 | attn only | 0.10 |
	| `Krea_2_turbo_H_mid_mlp_a10.safetensors` | 12-14 | mlp only | 0.10 |

	Isolates the effect of attention vs MLP perturbation on the same block zone.
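Every variant in this section is defined by a block range plus a layer filter ("ALL", "attn only", "mlp only", "attn.gate only"). One way to express that selection is a small predicate over tensor names. This is a sketch, not the original tooling: `is_selected` and the filter strings are hypothetical, and the pattern assumes key names of the form `blocks.N.<group>.<rest>`.

```python
import re

BLOCK_KEY = re.compile(
    r"^blocks\.(\d+)\.(attn|mlp|mod|prenorm|postnorm)\.(.+)$"
)


def is_selected(key, blocks, layers="ALL"):
    """Return True if tensor `key` falls inside the variant's edit zone.

    blocks: collection of block indices, e.g. range(12, 15)
    layers: "ALL", a group such as "attn", or group.name such as "attn.gate"
    """
    match = BLOCK_KEY.match(key)
    if match is None:
        return False  # not a per-block tensor (embeddings, final layer, ...)
    index, group, rest = int(match.group(1)), match.group(2), match.group(3)
    if index not in blocks:
        return False
    if layers == "ALL":
        return True
    wanted_group, _, wanted_rest = layers.partition(".")
    if group != wanted_group:
        return False
    # Accept an exact name or a name followed by a suffix such as ".weight".
    return (
        wanted_rest == ""
        or rest == wanted_rest
        or rest.startswith(wanted_rest + ".")
    )
```

Under these assumptions, the two H variants correspond to `is_selected(key, range(12, 15), "attn")` and `is_selected(key, range(12, 15), "mlp")`, and D corresponds to `is_selected(key, range(28), "attn.gate")`.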

	### I — Gradient Alpha
	| Property | Value |
	|---|---|
	| File | `Krea_2_turbo_I_gradient.safetensors` |
	| Blocks | 0-27 (all) |
	| Layers | ALL |
	| Alpha | 0.03 → 0.17 (gradient across blocks) |
	| Scale | 0.94 → 0.66 |
	| Result | Smooth global perturbation: early blocks barely touched, late blocks aggressively inverted |
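The table gives only the endpoints of the gradient. The sketch below assumes a linear ramp from block 0 to block 27, which is one plausible reading; the actual schedule used for this file is not stated, and `gradient_alphas` is an illustrative name.

```python
def gradient_alphas(n_blocks=28, first=0.03, last=0.17):
    """Per-block alpha, assuming a linear ramp from `first` to `last`."""
    step = (last - first) / (n_blocks - 1)
    return [first + i * step for i in range(n_blocks)]


alphas = gradient_alphas()
scales = [1.0 - 2.0 * alpha for alpha in alphas]

# Show the two ends and the middle of the schedule.
for block in (0, 13, 27):
    print(f"block {block:2d}: alpha={alphas[block]:.3f} scale={scales[block]:.3f}")
```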

	## Excluded Variants (Broken)

	The following variants were created but are broken (model produces noise/garbage) and are NOT included:

	| Variant | What was done | Why it broke |
	|---|---|---|
	| B2_attn_full | attn weights * -1.0 | Full negation destroys attention computation |
	| D_wv_all | wv weights * -1.0 | Full negation of value projection |
	| E_ties_mid | TIES-style sign flip on mid blocks | Full negation variant |

	## Usage

	### ComfyUI

	1. Place `.safetensors` files in `ComfyUI/models/diffusion_models/`
	2. Load via `UNETLoader` node
	3. Use the same VAE, CLIP, and text encoder as Krea 2 Turbo
	4. Generate with your standard Krea 2 workflow

	### Diffusers

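	```python
	from diffusers import DiffusionPipeline
	import torch

	# Assumes the repository resolves as a complete Diffusers pipeline
	# (model_index.json plus VAE, text encoders, and tokenizer). The files
	# here are transformer weights only, so this call may fail until those
	# components are supplied from the original Krea 2 Turbo release.
	pipe = DiffusionPipeline.from_pretrained(
	    "dataautogpt3/Krea2-weights-experiments",
	    torch_dtype=torch.bfloat16,
	    variant="bf16",
	).to("cuda")
	```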

	> Note: These are diffusion model weights only. You need the corresponding VAE, text encoders, and tokenizer from the original Krea 2 Turbo release.

	## Key Findings

	1. Scaling works, full negation breaks. Partial inversion (scale 0.60-0.90) produces functional, artistic variants. Full negation (scale=-1.0) leaves the model producing only noise.

	2. Alpha=0.10 is the sweet spot. Scale 0.80 (a 20% reduction in weight magnitude) on mid blocks 12-14 produces the most artistically interesting results.

	3. Mid blocks are safest to modify. In these experiments blocks 12-14 tolerated perturbation best, which suggests they are the most redundant.

	4. Gate weights are most tolerant. Attention gate weights can be scaled to 0.60 across all blocks while remaining functional; other layers break sooner.

	5. The artistic effects come from compensation. Partial perturbation appears to trigger creative reorganization in unedited blocks, the compensatory masquerade effect.

	## Research Context

	This work draws on findings from:
	- Task Arithmetic (Ilharco et al., ICLR 2023) — formal basis for weight negation
	- weights2weights (NeurIPS 2024) — diffusion weight space as meta-latent
	- Unraveling MMDiT Blocks (2025) — per-block role mapping for MMDiT
	- C3: Creative Concept Catalyst (CVPR 2025) — low-frequency amplification in shallow blocks
	- ConceptPrune (ICLR 2025) — tiny weight changes shift semantic output

	## Credits

	- Base model: Krea 2 Turbo (Krea AI)
	- Weight editing: DataPlusEngine
	- Methodology: Hand-editing diffusion weights via mmap-based surgical tensor scaling
