	---
	license: mit
	base_model: stepfun-ai/Step1X-Edit
	tags:
	- depth-estimation
	- normal-estimation
	- quantized
	- int8
	---

	# FE2E INT8 (Pre-quantized for CPU)

	Pre-quantized INT8 weights for [FE2E](https://github.com/AMAP-ML/FE2E) (CVPR 2026), which estimates monocular depth and surface normals from a single image.

	Demo Space: [WeReCooking2/FE2E-CPU](https://huggingface.co/spaces/WeReCooking2/FE2E-CPU)

	## Files

	| File | Size | Description |
	|------|------|-------------|
	| `dit_int8_full.pt` | 12.4 GB | Step1X-Edit DiT (12.4B params) + LDRN LoRA merged, dynamic INT8 quantized |
	| `vae_full.pt` | 335 MB | AutoEncoder, FP32 |

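	Both files are saved with `torch.save(model)`, so each one is a full pickled model rather than a state_dict. Loading therefore needs `weights_only=False` and the model's class definitions importable in the loading process. Load with `torch.load(..., mmap=True)` to avoid doubling memory.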

	## How it was made

	1. Loaded FP32 base model (`step1x-edit-i1258.safetensors`) on GPU
	2. Cast to FP32 on CPU
	3. Merged LDRN LoRA in full precision
	4. Applied `torch.quantization.quantize_dynamic` (INT8 on all `nn.Linear` layers)
	5. Saved full model with `torch.save(model)`
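
The merge and quantization steps above can be sketched on a toy model. This is not the script used to produce these files: the layer sizes, the LoRA rank and scale, and the `merge_lora` helper are all hypothetical, and only `torch.quantization.quantize_dynamic` is taken from the steps listed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the DiT: two linear layers (hypothetical sizes), FP32 on CPU.
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16)).float().eval()


def merge_lora(linear, lora_a, lora_b, scale):
    """Fold a LoRA update into the base weight: W <- W + scale * (B @ A)."""
    with torch.no_grad():
        linear.weight.add_(scale * (lora_b @ lora_a))


# Hypothetical rank-4 LoRA factors for the first layer.
rank, scale = 4, 0.5
lora_a = torch.randn(rank, 16) * 0.1  # (rank, in_features)
lora_b = torch.randn(32, rank) * 0.1  # (out_features, rank)

x = torch.randn(2, 16)
first = model[0]

with torch.no_grad():
    # Output of the unmerged layer plus the separate LoRA branch.
    expected = first(x) + scale * (x @ lora_a.T @ lora_b.T)
    merge_lora(first, lora_a, lora_b, scale)
    # After merging, the plain layer should reproduce that output.
    merge_ok = torch.allclose(first(x), expected, atol=1e-5)

# Dynamic INT8 quantization of every nn.Linear: weights are stored as INT8,
# activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    max_err = (model(x) - qmodel(x)).abs().max().item()

print(merge_ok, type(qmodel[0]).__module__, max_err)
```

Merging before quantizing means the LoRA update is folded into the weights in full precision, so only the combined weight is rounded to INT8.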

	## Usage

	```python
	import torch

	dit = torch.load("dit_int8_full.pt", map_location="cpu", weights_only=False, mmap=True)
	vae = torch.load("vae_full.pt", map_location="cpu", weights_only=False, mmap=True)
	```

	Requires ~12 GB RAM with mmap loading.
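
If the files are not on disk yet, a download-then-load helper could look like the sketch below. The repo id `WeReCooking2/FE2E-INT8` and the helper name `load_fe2e_int8` are assumptions, and the sketch assumes `huggingface_hub` is installed and that the FE2E model classes are importable when `torch.load` runs.

```python
import torch

REPO_ID = "WeReCooking2/FE2E-INT8"  # assumed repo id, adjust if it differs


def load_fe2e_int8(repo_id=REPO_ID):
    """Download both checkpoints and load them on CPU with mmap."""
    # Imported lazily so this module imports even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    models = {}
    for name in ("dit_int8_full.pt", "vae_full.pt"):
        # Returns the path of the file in the local Hugging Face cache.
        local_path = hf_hub_download(repo_id=repo_id, filename=name)
        models[name] = torch.load(
            local_path, map_location="cpu", weights_only=False, mmap=True
        )
    return models
```

Calling `load_fe2e_int8()` returns a dict keyed by file name, so `models["dit_int8_full.pt"]` would be the quantized DiT and `models["vae_full.pt"]` the autoencoder.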

	## Performance

	| Platform | Time per image |
	|----------|---------------|
	| GPU (RTX 5090, FP8 original) | ~2s |
	| CPU (HF free Space, INT8) | ~29 min (768x1024) |

	Inference uses a single denoising step and outputs the depth map and the surface normal map together.

	> No ONNX export is provided: the PyTorch dynamo exporter produces a broken graph (100% NaN output).

	## Credits

	- [FE2E](https://github.com/AMAP-ML/FE2E) (CVPR 2026)
	- [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) base model
	- [rkfg/Step1X-Edit-FP8](https://huggingface.co/rkfg/Step1X-Edit-FP8) FP8 quantization