bielik_app_service/app/models/transformers_model.py

Commit History

refactor: enhance model unloading and memory management for improved GPU efficiency
371aac9

Patryk Studzinski committed on

refactor: enable CPU offload and adjust model loading for improved performance
9ecca89

Patryk Studzinski committed on

refactor: disable KV cache to prevent quality degradation after multiple requests
4297da2

Patryk Studzinski committed on

refactor: enable 8-bit quantization and adjust device map for improved model loading diagnostics
19175de

Patryk Studzinski committed on

refactor: disable 8-bit quantization and set device map to CPU when GPU is unavailable
31d96e8

Patryk Studzinski committed on

refactor: enable 8-bit quantization for improved memory efficiency in Transformers model loading
4a88d6f

Patryk Studzinski committed on

refactor: disable 8-bit quantization and CPU offload for optimized model loading on T4 GPUs
b95b5b2

Patryk Studzinski committed on

fix: improve error handling during model loading and fallback for quantization failures
0916214

Patryk Studzinski committed on

fix: streamline CPU offload handling in model loading for better memory management
1784558

Patryk Studzinski committed on

feat: add CPU offload support for Transformers model to optimize memory usage
f639230

Patryk Studzinski committed on

feat: add Transformers model support with GPU optimization and 8-bit quantization
470149b

Patryk Studzinski committed on