LFED / model_inference.py

Commit History

fix: load LoRA adapter weights to CPU for ZeroGPU startup compat
e8c46ef

Kasualdad committed on

Switch inference to transformers + bnb-4bit + LoRA for ZeroGPU
de794a7

Kasualdad committed on

Switch from Zero GPU to T4: remove Dockerfile, simplify theme
53a83b7

Kasualdad committed on

Day 1 expanded schema: 5 tables, 14B fine-tuned model
240383f

Kasualdad committed on

fix: Qwen2.5 chat template for fine-tuned model
c36c8fb

Kasualdad committed on

feat: swap to fine-tuned Q4_K_M model (4.68 GB)
e0c4e2f

Kasualdad committed on

feat: fine-tuned model + GGUF export pipeline fixes
31c493f

Kasualdad committed on

fix: remove nonexistent fine-tuned model reference to prevent double-linking on HF Space
14ee1d7

Kasualdad committed on

fix: restore GPU offload on Zero GPU (emulation mode handles module-level CUDA)
47f650c

Kasualdad committed on

fix: CPU-only on Zero GPU (model persists in RAM between queries), 8 threads, restore eager load
7a51b3e

Kasualdad committed on

fix: preload ALL CUDA .so files, add nvidia-cublas-cu12 + nvidia-cusparse-cu12
a963ee6

Kasualdad committed on

fix: preload libcudart.so.12 via ctypes (LD_LIBRARY_PATH has no effect at runtime)
adf4793

Kasualdad committed on

fix: resolve CUDA runtime from pip package, add GPU→CPU fallback
a6f6cbd

Kasualdad committed on

Add CUDA library path resolution for Zero GPU Spaces
903ae4b

Kasualdad committed on

Switch to CUDA wheel + n_gpu_layers=-1 for Zero GPU (T4)
7e04581

Kasualdad committed on

Optimize for free HF Space: n_threads=2, remove threading timeout overhead
4c667d9

Kasualdad committed on

Initial commit: Kasualdad LFED — Phases 0-8 complete
17674c2

Kasualdad committed on