| --- |
| license: mit |
| base_model: stepfun-ai/Step1X-Edit |
| tags: |
| - depth-estimation |
| - normal-estimation |
| - quantized |
| - int8 |
| --- |
| |
| # FE2E INT8 (Pre-quantized for CPU) |
|
|
| Pre-quantized INT8 weights for [FE2E](https://github.com/AMAP-ML/FE2E) (CVPR 2026), which estimates monocular depth and surface normals from a single image. |
|
|
| **Demo Space:** [WeReCooking2/FE2E-CPU](https://huggingface.co/spaces/WeReCooking2/FE2E-CPU) |
|
|
| ## Files |
|
|
| | File | Size | Description | |
| |------|------|-------------| |
| | `dit_int8_full.pt` | 12.4 GB | Step1X-Edit DiT (12.4B params) + LDRN LoRA merged, dynamic INT8 quantized | |
| | `vae_full.pt` | 335 MB | AutoEncoder, FP32 | |
|
|
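| Both files are saved with `torch.save(model)` (the full pickled module, not a `state_dict`), so the Python classes that define the models must be importable when loading. Load with `torch.load(..., mmap=True)` to memory-map the weights from disk and avoid doubling memory use during loading. |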
| |
| ## How it was made |
| |
| 1. Loaded FP8 base model (`step1x-edit-i1258.safetensors`) on GPU |
| 2. Cast to FP32 on CPU |
| 3. Merged LDRN LoRA in full precision |
| 4. Applied `torch.quantization.quantize_dynamic` (INT8 on all `nn.Linear` layers) |
| 5. Saved full model with `torch.save(model)` |
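
The steps above can be sketched on a toy model. This is a minimal illustration, not the script used to build these files: the two-layer network, the LoRA factors `lora_A` / `lora_B`, the `rank` and `scale` values, and the output filename are all hypothetical stand-ins, and the real LDRN LoRA merge may differ in detail.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the DiT (hypothetical, not the real architecture).
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 8)).float().eval()

# Hypothetical rank-4 LoRA factors for the first layer: delta_W = scale * (B @ A).
rank, scale = 4, 0.5
lora_A = torch.randn(rank, 16) * 0.01  # (rank, in_features)
lora_B = torch.randn(32, rank) * 0.01  # (out_features, rank)

# Merge the LoRA update into the FP32 weight before quantizing.
with torch.no_grad():
    model[0].weight += scale * (lora_B @ lora_A)

x = torch.randn(2, 16)
with torch.no_grad():
    ref = model(x)  # FP32 reference output

# Dynamic INT8 quantization of every nn.Linear: weights are stored as INT8,
# activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Save the whole module object (not a state_dict), then reload it.
path = os.path.join(tempfile.mkdtemp(), "toy_int8_full.pt")
torch.save(qmodel, path)
reloaded = torch.load(path, map_location="cpu", weights_only=False)

with torch.no_grad():
    out = reloaded(x)
max_abs_err = (out - ref).abs().max().item()
print(type(reloaded[0]).__module__, f"max abs error vs FP32: {max_abs_err:.4f}")
```

Merging before quantizing matters: once the `nn.Linear` weights are packed as INT8, a low-rank FP32 update can no longer be added to them directly.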
|
|
| ## Usage |
|
|
| ```python |
| import torch |
| |
| dit = torch.load("dit_int8_full.pt", map_location="cpu", weights_only=False, mmap=True) |
| vae = torch.load("vae_full.pt", map_location="cpu", weights_only=False, mmap=True) |
| ``` |
|
|
| Requires ~12 GB RAM with mmap loading. |
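
The RAM figure follows from the parameter count. The back-of-the-envelope sketch below assumes every one of the 12.4B parameters is stored at the given width; it ignores quantization scales, zero points, activations, and any layers left in FP32, so real usage will be somewhat higher.

```python
# Rough weight memory for a 12.4B-parameter model at different precisions.
# Assumption: all parameters share one storage width (overheads ignored).
N_PARAMS = 12.4e9
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gb(n_params: float, dtype: str) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(N_PARAMS, dtype):.1f} GB")
```

At one byte per parameter the weights alone come to about 12.4 GB, which matches the size of `dit_int8_full.pt`; the same model in FP32 would need roughly four times as much.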
|
|
| ## Performance |
|
|
| | Platform | Time per image | |
| |----------|---------------| |
| | GPU (RTX 5090, FP8 original) | ~2s | |
| | CPU (HF free Space, INT8) | ~29 min (768x1024) | |
|
|
| Inference takes a single denoising step and outputs both depth and surface normal maps at once. |
|
|
| > No ONNX export is provided: the PyTorch dynamo exporter produces a broken graph (100% NaN output). |
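
A failure like this is easy to detect by measuring the share of NaN values in a model output. The helper below is a generic diagnostic sketch, not the check the author ran: `nan_fraction` is a hypothetical name and the two tensors are synthetic examples.

```python
import torch


def nan_fraction(t: torch.Tensor) -> float:
    """Fraction of elements in t that are NaN, between 0.0 and 1.0."""
    return torch.isnan(t).float().mean().item()


healthy = torch.rand(1, 1, 4, 4)                 # synthetic finite output
broken = torch.full((1, 1, 4, 4), float("nan"))  # synthetic all-NaN output

print(nan_fraction(healthy), nan_fraction(broken))
```

A value of 1.0 corresponds to the "100% NaN output" case described above.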
|
|
| ## Credits |
|
|
| - [FE2E](https://github.com/AMAP-ML/FE2E) (CVPR 2026) |
| - [Step1X-Edit](https://github.com/stepfun-ai/Step1X-Edit) base model |
| - [rkfg/Step1X-Edit-FP8](https://huggingface.co/rkfg/Step1X-Edit-FP8) FP8 quantization |
|
|