perf: switch default LLM to SmolLM2-1.7B - 40-50% faster tok/s, better instruction following 18dc770 imtrt004 committed on Feb 27
fix: JSON-encode SSE tokens to preserve newlines in markdown; reduce top_k to 3 ae897ea imtrt004 committed on Feb 27
fix: rewrite loader.py as clean UTF-8 - remove Windows-1252 em-dashes causing SyntaxError 8210d54 imtrt004 committed on Feb 27
feat: self-hosted Qwen2.5-1.5B-Instruct via transformers - no external API, no compilation deea70e imtrt004 committed on Feb 27
feat: replace llama-cpp-python/Groq with free HF InferenceClient (zero compilation) 98e3f05 imtrt004 committed on Feb 27
fix: restore build-essential+cmake, pin llama-cpp-python==0.3.16 for stable layer cache bfaa120 imtrt004 committed on Feb 27
fix: use pre-built llama-cpp-python CPU wheel - eliminates 8-minute C++ compile 6e6147b imtrt004 committed on Feb 27
fix: Dockerfile - pre-install CPU torch, upgrade llama-cpp-python to >=0.3.14 (qwen3 support) 5cfcd30 imtrt004 committed on Feb 27
fix: upgrade llama-cpp-python to >=0.3.14 for qwen3 arch support (was 0.3.8, pre-May 2025) a0250ac imtrt004 committed on Feb 27
fix: double .gguf extension - skip symlink when path already ends in .gguf; add verbose step logging dbce995 imtrt004 committed on Feb 27
feat: LLM readiness tracking - 503 while loading, llm_ready in /health 256f0fc imtrt004 committed on Feb 27
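The SSE fix in ae897ea can be sketched roughly as below. This is a minimal illustration, not the repo's actual code; `format_sse_token` is a hypothetical helper name. The idea: a bare newline inside an SSE `data:` field terminates the field, so markdown line breaks get lost unless each token is JSON-encoded (escaping `\n`) before framing.

```python
import json

def format_sse_token(token: str) -> str:
    # json.dumps escapes newlines as \n inside the string literal,
    # so a multi-line markdown token survives SSE "data:" framing.
    # The client JSON-decodes each event payload to recover the text.
    return f"data: {json.dumps(token)}\n\n"
```

A token like `"line1\nline2"` becomes the single-line event `data: "line1\nline2"` followed by the blank line that ends the event.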
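The double-extension fix in dbce995 amounts to an early-return guard before symlinking. A hedged sketch, assuming a hypothetical `link_model` helper (names and layout are illustrative, not the repo's actual loader):

```python
from pathlib import Path

def link_model(model_path: str, target_dir: str) -> Path:
    src = Path(model_path)
    if src.suffix == ".gguf":
        # Path already ends in .gguf: appending the extension again
        # would produce "model.gguf.gguf", so use the file directly.
        print(f"[loader] using {src} directly")  # verbose step logging
        return src
    dst = Path(target_dir) / (src.name + ".gguf")
    if not dst.exists():
        dst.symlink_to(src)
    print(f"[loader] symlinked {src} -> {dst}")
    return dst
```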
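The readiness tracking in 256f0fc could look like the following framework-agnostic sketch: inference routes answer 503 until the model finishes loading, while `/health` always answers 200 and reports the flag. `AppState`, `health`, and `require_llm` are assumed names for illustration, not the actual implementation.

```python
class AppState:
    """Shared server state; llm_ready flips to True after model load."""
    def __init__(self):
        self.llm_ready = False

def health(state: AppState):
    # /health is always reachable and exposes the readiness flag,
    # so orchestrators can distinguish "up" from "ready to serve".
    return 200, {"status": "ok", "llm_ready": state.llm_ready}

def require_llm(state: AppState):
    # Guard for inference routes: 503 while the model is still loading,
    # None (no error) once it is ready.
    if not state.llm_ready:
        return 503, {"error": "LLM still loading, retry shortly"}
    return None
```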