refactor: enhance model unloading and memory management for improved GPU efficiency 371aac9 Patryk Studzinski committed 5 days ago
refactor: enable CPU offload and adjust model loading for improved performance 9ecca89 Patryk Studzinski committed 6 days ago
refactor: disable KV cache to prevent quality degradation after multiple requests 4297da2 Patryk Studzinski committed 6 days ago
refactor: enable 8-bit quantization and adjust device map for improved model loading diagnostics 19175de Patryk Studzinski committed 7 days ago
refactor: disable 8-bit quantization and set device map to CPU when GPU is unavailable 31d96e8 Patryk Studzinski committed 7 days ago
refactor: enable 8-bit quantization for improved memory efficiency in Transformers model loading 4a88d6f Patryk Studzinski committed 7 days ago
refactor: disable 8-bit quantization and CPU offload for optimized model loading on T4 GPUs b95b5b2 Patryk Studzinski committed 13 days ago
fix: improve error handling during model loading and fallback for quantization failures 0916214 Patryk Studzinski committed 14 days ago
refactor: remove unused model configurations and streamline model creation logic 36a4581 Patryk Studzinski committed 14 days ago
refactor: remove runtime installation of llama-cpp-python, now pre-installed via requirements.txt 45df19f Patryk Studzinski committed 14 days ago
feat: Add main backup and simplified service implementations with API endpoints 9222e8a Patryk Studzinski committed 14 days ago
fix: streamline CPU offload handling in model loading for better memory management 1784558 Patryk Studzinski committed 14 days ago
feat: add CPU offload support for Transformers model to optimize memory usage f639230 Patryk Studzinski committed 14 days ago
feat: add Transformers model support with GPU optimization and 8-bit quantization 470149b Patryk Studzinski committed 15 days ago
feat: add model size and Polish support to model info b31e4c3 Patryk Studzinski committed 15 days ago
refactor: defer llama-cpp-python install to runtime 1caee5e Patryk Studzinski committed 16 days ago
feat: add GPU-enabled Dockerfile.gpu for HF Spaces CUDA support a957e36 Patryk Studzinski committed 16 days ago
fix: graceful fallback for llama-cpp-python installation on HF Spaces 21b6bfe Patryk Studzinski committed 16 days ago
perf: defer llama-cpp-python build to runtime startup 4a91398 Patryk Studzinski committed 16 days ago
feat: enable GPU acceleration for Bielik GGUF models 7c2f84b Patryk Studzinski committed 16 days ago
update Dockerfile and README.md to replace Qwen2.5-3B and Gemma-2-2B with Bielik-1.5B-GGUF; adjust model loading instructions in the API documentation 812e56d Patryk Studzinski committed on Dec 30, 2025
update HuggingFaceInferenceAPI comment for clarity; change huggingface_hub version to minimum required f4ce3a1 Patryk Studzinski committed on Dec 29, 2025
refine GBNF grammar for car advertisement; ensure compact JSON output and improve gap-item structure 068583f Patryk Studzinski committed on Dec 29, 2025
add model management methods to ModelRegistry; include model listing, loading, and unloading functionalities c50ae32 Patryk Studzinski committed on Dec 29, 2025
add HuggingFace Inference API model; implement async initialization and text generation with caching b2cbc2b Patryk Studzinski committed on Dec 29, 2025
add GBNF grammar for car advertisement gap filling; update LlamaCppModel to support loading grammar from file c14ac43 Patryk Studzinski committed on Dec 29, 2025
add GBNF grammar utilities for structured LLM output; integrate grammar in model generation 329abd1 Patryk Studzinski committed on Dec 29, 2025
enhance infill processing to handle custom messages; return cleaned output directly when provided 89e4dfe Patryk Studzinski committed on Dec 29, 2025
update llama-cpp-python installation to version 0.3.16 for improved compatibility 3aec39a Patryk Studzinski committed on Dec 29, 2025
install llama-cpp-python at runtime to avoid build issues in HuggingFace Spaces; update requirements.txt to reflect this change c704a06 Patryk Studzinski committed on Dec 29, 2025
update LlamaCppModel initialization parameters and enable verbose logging for model loading; update llama-cpp-python requirement fb1531e Patryk Studzinski committed on Dec 29, 2025
enhance error handling in LlamaCppModel initialization; include full traceback on failure cdff838 Patryk Studzinski committed on Dec 29, 2025
add get_info method to return model details for /models endpoint baa08b7 Patryk Studzinski committed on Dec 29, 2025
add debug logging for batch infill and model generation processes; update bielik model configuration 9d2cc15 Patryk Studzinski committed on Dec 29, 2025
increase context size and improve message handling in LlamaCppModel db4996d Patryk Studzinski committed on Dec 29, 2025
fix: Add optional attributes parameter to create_infill_prompt and update InfillItem schema 1569809 Patryk Studzinski committed on Dec 16, 2025
fix: Refine infill prompt instructions for clarity and improved user guidance 48663e4 Patryk Studzinski committed on Dec 16, 2025
fix: Simplify infill prompt creation and enhance JSON parsing logic bb4c63f Patryk Studzinski committed on Dec 16, 2025
fix: Improve infill prompt instructions for clarity and structured JSON output bfb930c Patryk Studzinski committed on Dec 16, 2025
fix: Enhance infill prompt instructions for clarity and structured JSON output a67d61c Patryk Studzinski committed on Dec 15, 2025
fix: Update infill prompt instructions for clarity and structured JSON output 2997045 Patryk Studzinski committed on Dec 15, 2025
fix: Refine infill prompt instructions for clarity and strict JSON output bbce3d6 Patryk Studzinski committed on Dec 15, 2025
feat: Add flexible prompt with 2 and 3 gap examples plus explicit instruction to only fill specified gaps 8faec17 Patryk Studzinski committed on Dec 15, 2025
fix: Adjust prompt example to use 2 gaps instead of 3 to match common use case 6e635a1 Patryk Studzinski committed on Dec 15, 2025
fix: Improve infill prompt with real example to prevent model from copying generic placeholder ee4a115 Patryk Studzinski committed on Dec 15, 2025