perf: switch default LLM to SmolLM2-1.7B - 40-50% faster tok/s, better instruction following 18dc770 imtrt004 committed 19 days ago
perf: greedy decoding + dtype fix - 2-3x faster inference on CPU d16b829 imtrt004 committed 19 days ago
fix: rewrite loader.py as clean UTF-8 - remove Windows-1252 em-dashes causing SyntaxError 8210d54 imtrt004 committed 19 days ago
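The encoding cleanup in 8210d54 can be sketched as follows. This is a minimal illustration, not the repo's actual loader.py code: the helper name `clean_file` and the choice to normalize em-dashes to plain hyphens are assumptions.

```python
from pathlib import Path

def clean_file(path: str) -> str:
    """Re-save a Windows-1252 source file as clean UTF-8.

    Stray 0x97 bytes (cp1252 em-dashes) are not valid UTF-8, so they
    make Python's parser raise a SyntaxError; decode them explicitly
    and normalize to ASCII hyphens before writing the file back.
    """
    raw = Path(path).read_bytes()
    text = raw.decode("cp1252").replace("\u2014", "-")
    Path(path).write_text(text, encoding="utf-8")
    return text
```

After this pass the file decodes as UTF-8 everywhere, so the interpreter can import it regardless of locale.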
feat: self-hosted Qwen2.5-1.5B-Instruct via transformers - no external API, no compilation deea70e imtrt004 committed 19 days ago
feat: replace llama-cpp-python/Groq with free HF InferenceClient (zero compilation) 98e3f05 imtrt004 committed 19 days ago
fix: double .gguf extension - skip symlink when path already ends in .gguf; add verbose step logging dbce995 imtrt004 committed 20 days ago
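The guard described in dbce995 (building on the symlink workaround from 915613c) might look like the sketch below. `ensure_gguf` and the log line format are assumed names for illustration, not the repository's actual code.

```python
import os

def ensure_gguf(path: str) -> str:
    """Return a path ending in .gguf that llama.cpp's C loader will accept.

    Skip the symlink when the resolved path already has the extension,
    which avoids creating a broken double "model.gguf.gguf" link.
    """
    real = os.path.realpath(path)
    if real.endswith(".gguf"):
        print(f"[loader] path already ends in .gguf, using {real}")
        return real
    link = real + ".gguf"
    if not os.path.islink(link):
        os.symlink(real, link)
        print(f"[loader] symlinked {real} -> {link}")
    return link
```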
feat: LLM readiness tracking - 503 while loading, llm_ready in /health 256f0fc imtrt004 committed 20 days ago
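The readiness pattern from 256f0fc reduces to a small handler, sketched here framework-free: the global flag `LLM_READY` and the `health()` helper are illustrative assumptions, with the actual app presumably wiring this into its web framework's route.

```python
import json

LLM_READY = False  # flipped to True once model loading finishes

def health():
    """Return (status_code, body) for the /health endpoint.

    Serve 503 while the model is still loading so load balancers and
    health checks hold traffic, and expose the flag as llm_ready.
    """
    status = 200 if LLM_READY else 503
    body = json.dumps({"status": "ok" if LLM_READY else "loading",
                       "llm_ready": LLM_READY})
    return status, body
```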
fix: symlink blob to .gguf extension so llama.cpp C loader accepts it 915613c imtrt004 committed 20 days ago
fix: use hf_hub_download + realpath to avoid snapshot ./path crash fd0d531 imtrt004 committed 20 days ago
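The realpath half of fd0d531 can be sketched with the stdlib alone; `resolve_model_path` is an assumed helper name, and the `hf_hub_download` call it would wrap in the app is left as a comment rather than executed here.

```python
import os

def resolve_model_path(downloaded_path: str) -> str:
    """Resolve a Hugging Face cache path for llama.cpp's C loader.

    In the app, `downloaded_path` would be the value returned by
    huggingface_hub.hf_hub_download(repo_id, filename), which is
    typically a symlink into the cache's blobs/ directory. realpath
    follows the symlink and yields an absolute path, avoiding the
    relative "./" snapshot paths that crashed the loader.
    """
    return os.path.realpath(downloaded_path)
```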