Commit History

Indentation error
917b429
verified

gyrmo committed on

Where is this bracket that HF is finding?
d0c9e97
verified

gyrmo committed on

Added please to the placeholder text.
f04eab4
verified

gyrmo committed on

Syntax error, bracket on line 343
a7e924f
verified

gyrmo committed on

Updated how the gradio looks, and doubled down on the system prompt to ensure that the system refers to the documents as its knowledge base.
09ea7c4
verified

gyrmo committed on

Updated the system prompt a bit.
b5a2695
verified

gyrmo committed on

Called results ranked_results
b380e7b
verified

gyrmo committed on

Updated the reranker so that it stops breaking the llm section.
bcd5f13
verified

gyrmo committed on

Reducing the batch size to 1 - it'll be slower, but at least this way the model output doesn't collapse.
04ce9a3
verified

gyrmo committed on

Updated the reranker function to push the rerank model to the CPU to save space, instead of having it eat up the very limited GPU memory that we are working with. Also reduced the batch size to 8.
4e77426
verified

gyrmo committed on

Forgot that I called the function reranker_v1.
444a359
verified

gyrmo committed on

Added the rerankers package for reranking.
8e9dee9
verified

gyrmo committed on

Added the reranker to improve the quality of the nodes passed on to the query engine.
491dcef
verified

gyrmo committed on

Create reranker_v1.py
ece3f9f
verified

gyrmo committed on

It was the docstring with the indentation error lol.
5a3cec3
verified

gyrmo committed on

Indentation error.
7611dd1
verified

gyrmo committed on

Indent error fix
031a6bf
verified

gyrmo committed on

Forgot to update the package imports to bring in the extractor for the log handler
6f3f6e2
verified

gyrmo committed on

Changed the temperature to 0.5, and added a function that will extract the condensed question for analysis.
bd30577
verified

gyrmo committed on

Change the temp to 1
5cfb2df
verified

gyrmo committed on

Changed temperature
646e55b
verified

gyrmo committed on

Update app.py
be2873f
verified

gyrmo committed on

Added a prompt helper to help manage the tokens, reduced the summary size to 800
abf50ce
verified

gyrmo committed on

Memory issues solved
9312fe3
verified

gyrmo committed on

I now have more GPU memory, so I have reduced the GPU utilisation to 0.8.
9b002ee
verified

gyrmo committed on

Removed the quantization line - that's crashed vLLM.
5788d26
verified

gyrmo committed on

Increase the max model length, and corrected the quantisation to awq_merlin.
d5cb9c0
verified

gyrmo committed on

Added a background wait to counter the problem
887dea0
verified

gyrmo committed on

Changed max model length to 3600 to improve the KV cache issue.
6e99e67
verified

gyrmo committed on

Reduced the time in the wait-for-vLLM function.
dee8a12
verified

gyrmo committed on

Reduced max model length, and increased gpu utilisation to 0.9
647e94c
verified

gyrmo committed on

Specified chat mode, and made sure that the message was streamed for a nice UI action.
47362ec
verified

gyrmo committed on

Reduced the GPU utilization and specified the quantization method.
7c71431
verified

gyrmo committed on

Added a memory buffer, and moved the wait llm function to the main bit for gradio.
6c941da
verified

gyrmo committed on

Checking something
09d0f27
verified

gyrmo committed on

Update vllm_server.py
a2902fc
verified

gyrmo committed on

Changed the GPU utilisation to 0.95
c8e1e0f
verified

gyrmo committed on

I have added some server specifics because the gradio bit isn't starting up.
c54877c
verified

gyrmo committed on

Upgraded the max model length to 8092.
49b04f7
verified

gyrmo committed on

Changed the model from FP4 to AWQ
49d9cf3
verified

gyrmo committed on

Changed the model from Instruct FP4 to AWQ
4c45212
verified

gyrmo committed on

Decreased the maximum model length to 3408.
5691068
verified

gyrmo committed on

Moved the embedding model to the CPU. This will allow me to have more space on the GPU for the LLM.
824aa63
verified

gyrmo committed on

Changed the time from 5 to 20 seconds.
faefdf8
verified

gyrmo committed on

Switched the model from Llama 3.3-70B to Llama-3.3-70B-Instruct-FP4.
152d1ec
verified

gyrmo committed on

Switching to a pre-quantised version of llama 3.3-70B sourced from Nvidia.
4a09bfe
verified

gyrmo committed on

Added missing library socket.
1a107ca
verified

gyrmo committed on

Rectified indentation error on line 47.
ac2bec1
verified

gyrmo committed on

Added start server and wait for server
45a3359
verified

gyrmo committed on

Updated vllm_server to include a wait for vllm portion that ensures that the model is up before the chat section loads.
10f7946
verified

gyrmo committed on
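
Several of the commits above mention a "wait for vLLM" step that holds back the chat section until the model server is up, using the socket library and a configurable wait time. The repository code is not shown here, so the following is only a minimal sketch of how such a check might look. The function name `wait_for_server`, its parameters, and the default values are assumptions made for illustration, not the author's implementation.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 300.0, interval: float = 20.0) -> bool:
    """Poll a TCP port until it accepts connections or the timeout expires.

    Hypothetical helper: name, parameters, and defaults are illustrative only.
    Returns True once a connection succeeds, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused or timed out: wait, then retry.
            time.sleep(interval)
    return False
```

A caller could run `wait_for_server("127.0.0.1", 8000)` before launching the UI, where the host and port are again placeholders. Note that an open port only shows that a process is listening; it does not by itself prove that the model has finished loading, so a real check may also need to query the server.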