UniCalli_Dev / inference.py

Commit History

precache fa3 kernel and font before gpu task
743a20a

Tianshuo-Xu committed

optimize cold start with local cache paths and font resolution
e51b773

Tianshuo-Xu committed

fix(cuda): prevent flash_sdp init on cpu
0108eaf

Tianshuo-Xu committed

Preserve native FP8 quantization instead of un/re-quantizing
d477f9b

Tianshuo-Xu committed

Pre-load InternVL embedding at startup to save GPU time
4c08c35

TSXu committed

Refactor to use Float8 + torch.compile from FLUX-Kontext-fp8
a1f5b88

TSXu committed

Add Float8 quantization and torch.compile optimizations
a53108a

TSXu committed

Use fp32 for inference to fix CUBLAS errors on ZeroGPU
9d88d74

TSXu committed

Always use fp16 for inference, convert bf16 to fp16
e6fb03f

TSXu committed

Add dtype parameter to fix CUDA bf16 compatibility issues
8bfa41d

TSXu committed

Fix: improve HF repo ID detection to avoid confusing local paths with repo IDs
f36f495

TSXu committed

Switch to full model with fp16/bf16 inference for better performance
39fa408

Txu647 committed

Add NF4 4-bit inference with bitsandbytes
414150e

Txu647 committed

UI improvements: move status bar to right side, simplify layout, update defaults to Wang Xizhi
89e2699

TSXu committed

fix: use float32 instead of bfloat16 for compatibility
e7cbbce

Txu647 committed

perf: parallel loading of safetensors shards
0634e0c

Txu647 committed

perf: enable FlashAttention/MemEfficient SDPA backends instead of torch.compile
9de4f7d

Txu647 committed

Add batch generation, torch.compile acceleration, fix dtype issues
d3ccd4b

TSXu committed

fix: use assign=True in load_state_dict to preserve checkpoint dtype
c6a1e05

Txu647 committed

fix: ensure model dtype is bfloat16 after loading safetensors
d4a5608

Txu647 committed

feat: support sharded safetensors for faster model loading
a65f47f

Txu647 committed

fix: version compatibility and HF_TOKEN for private repo
9e1bb06

Txu647 committed

fix: update checkpoint path to TSXu/Unicalli_Pro
c2f3dfc

Txu647 committed

feat: Add UniCalli Chinese calligraphy generator
5c86cdc

Txu647 committed
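
Note on f36f495 (HF repo ID detection): the log does not show how inference.py tells a Hugging Face repo ID apart from a local path. The sketch below is one possible heuristic, not the code from that commit. The function name `looks_like_hf_repo_id`, the regular expression, and the list of checkpoint suffixes are all assumptions made for illustration.

```python
import os
import re

# Assumed pattern: exactly one "namespace/name" pair built from
# letters, digits, '.', '_' and '-'. This is an illustrative
# approximation, not the Hub's official validation rule.
_REPO_ID_RE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$"
)

# Hypothetical list of suffixes that suggest a checkpoint file.
_CHECKPOINT_SUFFIXES = (".safetensors", ".pt", ".bin", ".ckpt")


def looks_like_hf_repo_id(path_or_id: str) -> bool:
    """Heuristic: True if the string is more likely a repo ID than a local path."""
    # Anything that already exists on disk is treated as a local path.
    if os.path.exists(path_or_id):
        return False
    # Explicit path markers rule out a repo ID.
    if path_or_id.startswith(("/", "./", "../", "~")) or "\\" in path_or_id:
        return False
    # A checkpoint-like suffix suggests a file, not a repo.
    if path_or_id.endswith(_CHECKPOINT_SUFFIXES):
        return False
    # What remains must have the "namespace/name" shape.
    return _REPO_ID_RE.match(path_or_id) is not None
```

Checking the filesystem first means a local directory that happens to be named like a repo ID is never sent to the Hub; whether the actual fix makes the same trade-off is not stated in the log.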