postprocess at model res, defer resize+write to CPU (saves ~35s GPU) f4a7288 Nekochu committed 18 days ago
add blue 1024 ONNX (FP16 on disk, FP32 at runtime), rename models 646f0cd Nekochu committed 18 days ago
safetensors loading, Phase 0 4x faster (uint8), total time in status 33e616b Nekochu committed 18 days ago
quality: lower clean_matte threshold 0.25→0.02, always keep largest component 1363975 Nekochu committed 18 days ago
cleanup: stale comments, dead import, redundant makedirs, fix batch size in UI a2a7a3e Nekochu committed 18 days ago
simplify: merge write functions, fix missing Processed output, bulk transfer 9d23c67 Nekochu committed 18 days ago
remove dead code: AOTI export, inductor/triton cache, shared_results, deferred write 2a4471f Nekochu committed 18 days ago
disable torch.compile on ZeroGPU: net negative for GreenFormer f4a2965 Nekochu committed 19 days ago
fix: reduce-overhead instead of max-autotune (118s→~30s), dedicated export endpoint c53eb28 Nekochu committed 19 days ago
fix README: accurate torch.compile description, no triton/AOTI claim cdef1d9 Nekochu committed 19 days ago
GPU postprocessing pipeline + TF32 + conditional torch.compile 8ea0c8b Nekochu committed 19 days ago
add ZeroGPU GPU inference (FP16, flash-attn, batch=32@1024/16@2048) 0b6961f Nekochu committed on Mar 25