---
title: Esg Countries Chatbot
emoji: 🚀
colorFrom: gray
colorTo: pink
sdk: streamlit
sdk_version: 1.31.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# ESG Countries Chatbot

## LLM
I am using the HuggingFace Hub Inference API for the LLM (Mixtral-8x7B). It is free to use via the link above without any setup or hardware. The downside is that it unfortunately does not support token streaming, so each response only appears after some time, as a single chunk of text, which can make the app feel slow.
## Main App
The main Streamlit chatbot scripts are:
- `app.py` (all the chat logic)
- `web_scrape_and_pdf_loader.py` (web scraping using DuckDuckGo and WebLoader into documents; PDF upload and PDFLoader into documents; chunking, embedding and vector store creation logic)

There are 3 accompanying Jupyter notebooks which document all my thought processes and experiments (more details below).
## Run
Run `streamlit run app.py` to start the app, with all libraries in `requirements.txt` installed. To deploy locally, you need to get a HuggingFace API token from https://huggingface.co/settings/tokens and set it as the environment variable `HUGGINGFACEHUB_API_TOKEN`. In the HuggingFace Spaces link above I have already put mine in the secret variables.

Under the app menu, there is a page for users to scrape real-time data and upload their own PDFs to override the older data from the existing pre-built retrievers.
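For local deployment, the token check at startup can be sketched roughly as follows (the function name is illustrative, not necessarily what `app.py` uses):

```python
import os

def get_hf_token() -> str:
    """Read the HuggingFace API token from the environment, failing fast
    with a clear message if it has not been set."""
    token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
    if not token:
        raise RuntimeError(
            "Set HUGGINGFACEHUB_API_TOKEN before starting the app, e.g.\n"
            "  export HUGGINGFACEHUB_API_TOKEN=hf_..."
        )
    return token
```

On HuggingFace Spaces, a secret variable with the same name is exposed to the app as an ordinary environment variable, so the same check covers both deployments.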
I have prepared a set of pre-built retrievers for the countries "Australia", "China", "Japan", "Malaysia", "Singapore" and "Germany", so there is no need to wait for them to be built in real time (which can take very long). The retrievers are at the following links:
- https://drive.google.com/uc?id=1q-hNnyyBA8tKyF3vR69nkwCk9kJj7WHi (BM25 Retriever, for keyword search)
- https://drive.google.com/uc?id=1zad6tgYm2o5M9E2dTLQqmm6GoI8kxNC3 (Chroma DB, for semantic search)
- When the app first starts, it automatically pulls the 2 retrievers from the above links and unzips them into the `bm25` and `chromadb` folders.
- When a user uploads their own PDF file or conducts real-time web scraping in the app, new retrievers are built and saved locally for the country of interest. The user can then select in the Document Config in the sidebar (below) which countries to override with this new data.
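The unzip-on-startup step can be sketched like this (a minimal stand-in; the real app also downloads the archives from the Google Drive links above, which is omitted here):

```python
import zipfile
from pathlib import Path

def unzip_retriever(archive: Path, dest: Path) -> None:
    """Unzip a downloaded retriever archive into dest, skipping the work
    if the folder is already present from a previous start-up."""
    if dest.exists():
        return  # already unpacked, nothing to do
    dest.mkdir(parents=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```

Making the step idempotent means a Space restart does not re-extract (or re-download) the archives unnecessarily.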
The yellow box in the sidebar above would not be populated until the user uploads his/her own PDF or perform web scraping in the app. Countries will be present inside as options only if user has uploaded data for those countries. Multiple countries can be selected and the new data in the new retrievers will override the old data in the pre-built retrievers only for these countries.
The process of creating the vector stores (especially the semantic ones) is very slow. Hence, only one retriever is created, for a single chunk size and overlap, each time the user uses the app to build them. The pre-built retriever folders contain more than 50 retrievers covering different chunk sizes and overlaps to experiment with.
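Conceptually, each (chunk size, overlap) pair produces a different sliding-window split of the documents; a simplified character-level version (the app itself uses LangChain text splitters, which also respect separators) looks like:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks, where consecutive
    chunks share `overlap` characters of context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from both sides, at the cost of storing (and embedding) more text overall.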
The retrievers (Chroma DB persist directory and BM25 pickle files) are only suitable for prototyping, not production.
There is an option for users to use one of:
- bm25 (keyword)
- chromadb (semantic)
- ensemble (both keyword and semantic then re-ranked using reciprocal rank fusion)
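For the ensemble option, the re-ranking step can be illustrated with a plain-Python reciprocal rank fusion over the two ranked lists (doc ids here are placeholders; LangChain's `EnsembleRetriever` implements the same idea):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids: each document scores
    sum(1 / (k + rank)) over the lists it appears in, then documents are
    sorted by total score. k=60 is the value commonly used in practice."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["a", "b", "c"],   # e.g. BM25 (keyword) ranking
    ["c", "a", "b"],   # e.g. Chroma (semantic) ranking
])
# -> ["a", "c", "b"]
```

Because RRF only uses ranks, it needs no score normalisation between the keyword and semantic retrievers, which score documents on incompatible scales.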
Document config, LLM config and Retriever config are all in the sidebar.
The sidebar also has a menu for the user to choose between the main chatbot, viewing the source documents of the previous query (only after a full query has run), or uploading a PDF / scraping new data.
Do not touch the sidebar while a query is running!
There is no Conversational Buffer Memory, due to RAM constraints on HuggingFace Spaces and also because the aim is to get the comparison between countries right. Furthermore, a lot of tokens already go into the LLM on each call due to the agent logic and scratchpad.
Note: I have chosen to keep all chat logic in a single app.py for ease of tracking the whole flow, even though it would also have been good to split the different categories of Python functions into separate scripts.
## Three Accompanying Jupyter Notebooks for Initial Experimentation
There are 3 accompanying Jupyter notebooks which document all my thought processes, experiments and tests before making the app. All the notebooks, my written notes and the outputs can be viewed below. They are all in the zipped folder and also in the HuggingFace Space. I have also included HTML versions so the original outputs will not be deleted accidentally.
### 1. 1_Scrape_Web_and_Process_Data_for_Countries_ESG_Policies.ipynb
This notebook documents the process of using DuckDuckGo to scrape the web for ESG-related policy/practice articles for each country, then processing the results with WebLoaders and PDFLoaders to convert them into documents. (The scraped links include both normal HTML pages and PDF links.) The documents are then saved into Chroma DB vector stores and BM25 retrievers.
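Since the scraped results mix HTML pages and PDF links, the routing between the two loaders can be sketched as a simple URL filter (names are illustrative, not taken from the notebook):

```python
from urllib.parse import urlparse

def split_links(urls: list[str]) -> tuple[list[str], list[str]]:
    """Separate scraped result links into HTML pages (for WebLoader)
    and PDF links (for PDFLoader), based on the URL path extension."""
    html_links: list[str] = []
    pdf_links: list[str] = []
    for url in urls:
        path = urlparse(url).path.lower()  # ignore query strings like ?dl=1
        (pdf_links if path.endswith(".pdf") else html_links).append(url)
    return html_links, pdf_links
```

Checking the parsed path rather than the raw string avoids misclassifying URLs whose query string merely mentions "pdf".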
### 2. 2_LLM_Agent_Countries_ESG_Comparison.ipynb
This notebook documents the chat logic that is eventually used: a single agent with appropriate tools, prompting and descriptions. There are also tools for the case where the user engages in casual chat with the agent, though these are commented out for now.
Multiple scenarios are documented in the notebook, for example:
- Comparing between 2 countries
- Comparing between 3 countries (to check that the tool is called 3 times)
- When a country is not present in the vector store (to ensure there is no hallucination and the agent says it does not know)
- When the information is not in the vector store for either country (to ensure there is no hallucination and the agent says it does not know)
- When the agent is asked to give an opinion, e.g. which policy is more stringent/effective
- A query that asks multiple questions about multiple countries
- If a query asks for 2 things for 2 countries, the tool should be called 2 x 2 = 4 times
- Difficult case: if a query asks for 3 things for 5 countries, the tool should ideally be called 3 x 5 = 15 times. The agent did try to do this but failed due to the time limit (scroll down to the second-last test for details). This is acceptable as long as we tell the user not to ask about so many things for so many countries at once; it is also not advisable to make the LLM process that many inputs at one time.
- Difficult case: if a query asks for 3 things for 3 countries, the tool should be called 3 x 3 = 9 times. The agent passed this test, although it is still not advisable to ask so many things at once (scroll down to the last cell in the notebook for more).
The list above is not exhaustive.
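The expected tool-call count in these scenarios is just the cross product of topics and countries, which is easy to pre-compute (a small sketch; the app does not actually gate queries this way):

```python
from itertools import product

def expected_tool_calls(topics: list[str], countries: list[str]) -> list[tuple[str, str]]:
    """The agent should call its retrieval tool once per (topic, country)
    pair, so T topics about C countries need T * C calls."""
    return list(product(topics, countries))

calls = expected_tool_calls(["net zero targets", "carbon tax"],
                            ["Singapore", "Japan"])
# 2 topics x 2 countries -> 4 tool calls
```

A check like this could warn the user up front when a query implies too many calls (e.g. the 15-call case above) before the agent hits the time limit.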
### 3. [Experimental] Three_Agents_Countries_ESG_Comparison.ipynb
This is an experimental notebook exploring the concept of 3 interacting agents for the same purpose. It was eventually not implemented in the Streamlit app, for the reasons below.
Idea: 3 agents talk to each other. The first 2 agents are ESG policy experts, one for each of the 2 countries the user selects; they contribute to the conversation by using their RetrievalQA tools to find the relevant ESG policies for their own countries. The 3rd agent is a moderator with no access to any tool, which ideally uses the first 2 agents' input to reach a conclusion about the user's query or topic of interest.
Result: This does not work very well. Each agent tends to keep simulating the conversation for the other parties after giving its own input, despite being prompted not to do so. The moderator also tends to hallucinate no matter which temperature is used. For now, too many things can go wrong in the middle of the conversation between the 2 agents, and too many prompts are needed to get everything right.
By contrast, even with the current free LLM API, the second notebook makes it clear that a single agent can reason out all the steps needed to answer a query about 2 countries. Hence the approach from that notebook, a single agent equipped with the RetrievalQA tool, is what the app eventually uses.
Perhaps multiple agents like these could instead be used for a debate between ESG policies, where each agent tries to convince the other of its views.