CitizenClimate

Commit History

Reduced the documents retrieved to 10.

e1b326e
verified

gyrmo committed on about 16 hours ago

Reduced the batch size because there isn't enough space.

b0e5869
verified

gyrmo committed on about 16 hours ago

Updated the reranking functions!

f66db31
verified

gyrmo committed on about 16 hours ago

Increasing the batch size.

51db0e2
verified

gyrmo committed on about 16 hours ago

Adding in a wrapper so that LlamaIndex can interact with the reranker instead of crashing

5df94e1
verified

gyrmo committed on about 17 hours ago

Indentation error

917b429
verified

gyrmo committed on Apr 8

Where is this bracket that HF is finding?

d0c9e97
verified

gyrmo committed on Apr 8

Added please to the placeholder text.

f04eab4
verified

gyrmo committed on Apr 8

Syntax error, bracket on line 343

a7e924f
verified

gyrmo committed on Apr 8

Updated how the gradio looks, and doubled down on the system prompt to ensure that the system refers to the documents as its knowledge base.

09ea7c4
verified

gyrmo committed on Apr 8

Updated the system prompt a bit.

b5a2695
verified

gyrmo committed on Apr 2

Called results ranked_results

b380e7b
verified

gyrmo committed on Apr 1

Updated the reranker so that it stops breaking the llm section.

bcd5f13
verified

gyrmo committed on Apr 1

Reducing the batch size to 1 - it'll be slower, but at least this way the model output doesn't collapse.

04ce9a3
verified

gyrmo committed on Apr 1

Updated the reranker function to push the rerank model to the CPU, so that it no longer eats up the very limited GPU space that we are working with. Also reduced the batch size to 8.

4e77426
verified

gyrmo committed on Apr 1

Forgot that I called the function reranker_v1.

444a359
verified

gyrmo committed on Apr 1

Added the rerankers package for the reranking package.

8e9dee9
verified

gyrmo committed on Apr 1

Added the reranker to improve the quality of the nodes passed on to the query engine.

491dcef
verified

gyrmo committed on Apr 1

Create reranker_v1.py

ece3f9f
verified

gyrmo committed on Apr 1

It was the docstring with the indentation error lol.

5a3cec3
verified

gyrmo committed on Mar 26

Indentation error.

7611dd1
verified

gyrmo committed on Mar 26

Indent error fix

031a6bf
verified

gyrmo committed on Mar 26

Forgot to update the package imports to bring in the extractor for the log handler

6f3f6e2
verified

gyrmo committed on Mar 26

Changed the temperature to 0.5, and added a function that will extract the condensed question for analysis.

bd30577
verified

gyrmo committed on Mar 26

Change the temp to 1

5cfb2df
verified

gyrmo committed on Mar 26

Changed temperature

646e55b
verified

gyrmo committed on Mar 26

Update app.py

be2873f
verified

gyrmo committed on Mar 26

Added a prompt helper to help manage the tokens, reduced the summary size to 800

abf50ce
verified

gyrmo committed on Feb 26

Memory issues solved

9312fe3
verified

gyrmo committed on Feb 26

I now have more GPU, therefore I have now reduced the GPU utilisation to 0.8

9b002ee
verified

gyrmo committed on Feb 26

Removed the quantization line - it crashed vLLM.

5788d26
verified

gyrmo committed on Feb 25

Increased the max model length, and corrected the quantisation to awq_marlin.

d5cb9c0
verified

gyrmo committed on Feb 25

Added a background wait to counter the problem

887dea0
verified

gyrmo committed on Feb 25

Changed max model length to 3600 to improve the KV cache issue.

6e99e67
verified

gyrmo committed on Feb 25

Reduced the time in the wait for vLLM function.

dee8a12
verified

gyrmo committed on Feb 25

Reduced max model length, and increased gpu utilisation to 0.9

647e94c
verified

gyrmo committed on Feb 25

Specified chat mode, and made sure that the message was streamed for a nice UI action.

47362ec
verified

gyrmo committed on Feb 25

Reduced the GPU utilization and specified the quantization method.

7c71431
verified

gyrmo committed on Feb 25

Added a memory buffer, and moved the wait llm function to the main bit for gradio.

6c941da
verified

gyrmo committed on Feb 25

Checking something

09d0f27
verified

gyrmo committed on Feb 24

Update vllm_server.py

a2902fc
verified

gyrmo committed on Feb 24

Changed the GPU utilisation to 0.95

c8e1e0f
verified

gyrmo committed on Feb 24

I have added some server specifics because the gradio bit isn't starting up.

c54877c
verified

gyrmo committed on Feb 24

Upgraded the max model length to 8092.

49b04f7
verified

gyrmo committed on Feb 24

Changed the model from FP4 to AWQ

49d9cf3
verified

gyrmo committed on Feb 24

Changed the model from Instruct FP4 to AWQ

4c45212
verified

gyrmo committed on Feb 24

Decreased the maximum model length to 3408.

5691068
verified

gyrmo committed on Feb 24

Moved the embedding model to the CPU. This will allow me to have more space on the GPU for the LLM.

824aa63
verified

gyrmo committed on Feb 24

Changed the time from 5 to 20 seconds.

faefdf8
verified

gyrmo committed on Feb 24

Switched the model from Llama 3.3-70B to Llama-3.3-70B-Instruct-FP4.

152d1ec
verified

gyrmo committed on Feb 24
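
Several of the Feb 24-25 commits tune a "wait for vLLM" step (the delay changed from 5 to 20 seconds, then was reduced again, and a background wait was added). The repository's actual function is not shown in this history, so the following is only a minimal sketch of what such a readiness poll might look like. The names `http_ok` and `wait_for_server`, the default timings, and the `http://localhost:8000/health` URL are all assumptions for illustration, not taken from the project.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, request_timeout_s=5.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: server is not ready yet.
        return False


def wait_for_server(url, timeout_s=600.0, interval_s=20.0,
                    probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `timeout_s` elapses.

    Hypothetical helper: `probe` and `sleep` are injectable so the loop
    can be exercised without a running server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Assumed address of a locally started inference server.
    ready = wait_for_server("http://localhost:8000/health", timeout_s=60.0)
    print("server ready" if ready else "server did not come up in time")
```

A shorter `interval_s` notices the server sooner at the cost of more requests during model loading; running the same call in a background thread would let the UI start before the model is ready.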
