studzinsky/bielik_app_service

Commit History

refactor: enhance model unloading and memory management for improved GPU efficiency

371aac9

Patryk Studzinski committed on Feb 13

refactor: enable CPU offload and adjust model loading for improved performance

9ecca89

Patryk Studzinski committed on Feb 12

refactor: disable KV cache to prevent quality degradation after multiple requests

4297da2

Patryk Studzinski committed on Feb 12

refactor: enable 8-bit quantization and adjust device map for improved model loading diagnostics

19175de

Patryk Studzinski committed on Feb 11

refactor: disable 8-bit quantization and set device map to CPU when GPU is unavailable

31d96e8

Patryk Studzinski committed on Feb 11

refactor: enable 8-bit quantization for improved memory efficiency in Transformers model loading

4a88d6f

Patryk Studzinski committed on Feb 11

refactor: disable 8-bit quantization and CPU offload for optimized model loading on T4 GPUs

b95b5b2

Patryk Studzinski committed on Feb 5

fix: improve error handling during model loading and fallback for quantization failures

0916214

Patryk Studzinski committed on Feb 4

refactor: remove unused model configurations and streamline model creation logic

36a4581

Patryk Studzinski committed on Feb 4

refactor: remove runtime installation of llama-cpp-python, now pre-installed via requirements.txt

45df19f

Patryk Studzinski committed on Feb 4

feat: Add main backup and simplified service implementations with API endpoints

9222e8a

Patryk Studzinski committed on Feb 4

fix: streamline CPU offload handling in model loading for better memory management

1784558

Patryk Studzinski committed on Feb 4

feat: add CPU offload support for Transformers model to optimize memory usage

f639230

Patryk Studzinski committed on Feb 4

feat: add Transformers model support with GPU optimization and 8-bit quantization

470149b

Patryk Studzinski committed on Feb 3

feat: add model size and Polish support to model info

b31e4c3

Patryk Studzinski committed on Feb 3

fix: use prebuilt CUDA wheel for llama-cpp-python

3d43242

Patryk Studzinski committed on Feb 2

fix: use python3.10 instead of python3.9 for ubuntu 22.04

9cab5ee

Patryk Studzinski committed on Feb 2

fix: defer model downloads to first request

6415787

Patryk Studzinski committed on Feb 2

refactor: defer llama-cpp-python install to runtime

1caee5e

Patryk Studzinski committed on Feb 2

fix: use symlinks instead of update-alternatives for python

ba285b0

Patryk Studzinski committed on Feb 2

fix: correct Dockerfile syntax for llama-cpp-python fallback

421d61e

Patryk Studzinski committed on Feb 2

refactor: consolidate to single unified Dockerfile with GPU support

afbf927

Patryk Studzinski committed on Feb 2

config: add GPU Dockerfile to README frontmatter

4349abd

Patryk Studzinski committed on Feb 2

feat: add GPU-enabled Dockerfile.gpu for HF Spaces CUDA support

a957e36

Patryk Studzinski committed on Feb 2

fix: graceful fallback for llama-cpp-python installation on HF Spaces

21b6bfe

Patryk Studzinski committed on Feb 2

fix: enable CUDA compilation for llama-cpp-python

ba31957

Patryk Studzinski committed on Feb 2

perf: defer llama-cpp-python build to runtime startup

4a91398

Patryk Studzinski committed on Feb 2

fix: remove invalid chown command from Dockerfile

08f73ce

Patryk Studzinski committed on Feb 2

feat: enable GPU acceleration for Bielik GGUF models

7c2f84b

Patryk Studzinski committed on Feb 2

update Dockerfile and README.md to replace Qwen2.5-3B and Gemma-2-2B with Bielik-1.5B-GGUF; adjust model loading instructions in the API documentation

812e56d

Patryk Studzinski committed on Dec 30, 2025

update HuggingFaceInferenceAPI comment for clarity; change huggingface_hub version to minimum required

f4ce3a1

Patryk Studzinski committed on Dec 29, 2025

refine GBNF grammar for car advertisement; ensure compact JSON output and improve gap-item structure

068583f

Patryk Studzinski committed on Dec 29, 2025

add model management methods to ModelRegistry; include model listing, loading, and unloading functionalities

c50ae32

Patryk Studzinski committed on Dec 29, 2025

add HuggingFace Inference API model; implement async initialization and text generation with caching

b2cbc2b

Patryk Studzinski committed on Dec 29, 2025

add GBNF grammar for car advertisement gap filling; update LlamaCppModel to support loading grammar from file

c14ac43

Patryk Studzinski committed on Dec 29, 2025

add GBNF grammar utilities for structured LLM output; integrate grammar in model generation

329abd1

Patryk Studzinski committed on Dec 29, 2025

enhance infill processing to handle custom messages; return cleaned output directly when provided

89e4dfe

Patryk Studzinski committed on Dec 29, 2025

update llama-cpp-python installation to version 0.3.16 for improved compatibility

3aec39a

Patryk Studzinski committed on Dec 29, 2025

install llama-cpp-python at runtime to avoid build issues in HuggingFace Spaces; update requirements.txt to reflect this change

c704a06

Patryk Studzinski committed on Dec 29, 2025

update LlamaCppModel initialization parameters and enable verbose logging for model loading; update llama-cpp-python requirement

fb1531e

Patryk Studzinski committed on Dec 29, 2025

enhance error handling in LlamaCppModel initialization; include full traceback on failure

cdff838

Patryk Studzinski committed on Dec 29, 2025

add get_info method to return model details for /models endpoint

baa08b7

Patryk Studzinski committed on Dec 29, 2025

add debug logging for batch infill and model generation processes; update bielik model configuration

9d2cc15

Patryk Studzinski committed on Dec 29, 2025

increase context size and improve message handling in LlamaCppModel

db4996d

Patryk Studzinski committed on Dec 29, 2025

update requirements with libmetadata

d9b1571

Patryk Studzinski committed on Dec 23, 2025

dockerfile fix

87ebbc6

Patryk Studzinski committed on Dec 23, 2025

fixing naming for bielik gguf

858725c

Patryk Studzinski committed on Dec 23, 2025

improved-index-url

821afac

Patryk Studzinski committed on Dec 23, 2025

fix-docker-error-for-gguf

2d2d7ff

Patryk Studzinski committed on Dec 23, 2025

adding-bielik-gguf

8cde7d1

Patryk Studzinski committed on Dec 23, 2025