refactor: enhance model unloading and memory management for improved GPU efficiency 371aac9 Patryk Studzinski committed 7 days ago
refactor: enable CPU offload and adjust model loading for improved performance 9ecca89 Patryk Studzinski committed 7 days ago
refactor: disable KV cache to prevent quality degradation after multiple requests 4297da2 Patryk Studzinski committed 8 days ago
refactor: enable 8-bit quantization and adjust device map for improved model loading diagnostics 19175de Patryk Studzinski committed 8 days ago
refactor: disable 8-bit quantization and set device map to CPU when GPU is unavailable 31d96e8 Patryk Studzinski committed 8 days ago
refactor: enable 8-bit quantization for improved memory efficiency in Transformers model loading 4a88d6f Patryk Studzinski committed 8 days ago
refactor: disable 8-bit quantization and CPU offload for optimized model loading on T4 GPUs b95b5b2 Patryk Studzinski committed 14 days ago
fix: improve error handling during model loading and fallback for quantization failures 0916214 Patryk Studzinski committed 15 days ago
fix: streamline CPU offload handling in model loading for better memory management 1784558 Patryk Studzinski committed 16 days ago
feat: add CPU offload support for Transformers model to optimize memory usage f639230 Patryk Studzinski committed 16 days ago
feat: add Transformers model support with GPU optimization and 8-bit quantization 470149b Patryk Studzinski committed 16 days ago