# Ask ANRG Project Description

Our demo is available [here](https://huggingface.co/spaces/FloraJ/Ask-ANRG).

A concise and structured guide to setting up and understanding the ANRG project.
---

## Setup

1. **Clone the Repository**:
   ```
   git clone git@github.com:ANRGUSC/ask-anrg.git
   ```
2. **Navigate to the Directory**:
   ```
   cd ask-anrg/
   ```
3. **Create a Conda Environment**:
   ```
   conda create --name ask_anrg
   ```
4. **Activate the Conda Environment**:
   ```
   conda activate ask_anrg
   ```
5. **Install Required Dependencies**:
   ```
   pip3 install -r requirements.txt
   ```
6. **Download the database from [here](https://drive.google.com/file/d/1-TV70IFIzjO4uPzNRzef3FLhssAfK2g3/view?usp=sharing) for demo purposes, unzip it, and put it directly under the root directory, or place your own documents under [original_documents](database/original_documents)**:
   ```
   ask-anrg/
   |-- database/
       |-- original_documents/
   |-- openai_function_utils/
       |-- openai_function_impl.py
       |-- openai_function_interface.py
   |-- configs.py
   |-- requirements.txt
   |-- utils.py
   |-- main.py
   |-- Readme.md
   |-- project_description.md
   |-- result_report.txt
   |-- .gitignore
   ```
7. **Set Up Database Data**:
   If you place your own documents inside the [original_documents](database/original_documents) directory, run the following command to prepare embeddings for your documents:
   ```
   python3 utils.py
   ```
   This creates `database/embeddings/` to store the embeddings of the original documents, and a CSV file `database/document_name_to_embedding.csv` that maps each document name to its embedding vector.
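Once that document-name-to-embedding mapping exists, retrieval typically means ranking documents by similarity to a query embedding. The sketch below shows that idea in minimal form; the dictionary layout and function names here are illustrative assumptions, not the project's actual schema or API.

```python
# Hypothetical sketch of similarity-based retrieval over a
# {document name -> embedding vector} mapping like the one utils.py produces.
# The data layout and function names are assumptions for illustration.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, name_to_vec, k=3):
    """Return the k document names whose embeddings are closest to query_vec."""
    ranked = sorted(name_to_vec.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

In practice the query vector would come from the same embedding model used to embed the documents, so that distances are comparable.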
## How to Run

```
python main.py
```

After the prompt "Hi! What question do you have for ANRG? Press 0 to exit", you can reply with your question.
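The interaction described above amounts to a simple read-answer loop. A minimal sketch, assuming a hypothetical `answer_question` stand-in for the project's actual retrieval-plus-ChatGPT pipeline:

```python
# Minimal sketch of the interactive loop described above. answer_question is
# a hypothetical stand-in for the project's actual answering pipeline; the
# input/print hooks just make the loop easy to test.
def run_chat(answer_question, input_fn=input, print_fn=print):
    print_fn("Hi! What question do you have for ANRG? Press 0 to exit")
    while True:
        question = input_fn("> ").strip()
        if question == "0":  # sentinel value that ends the session
            break
        print_fn(answer_question(question))
```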
## Structure

* database: Contains scraped and processed data related to the lab.
  * embeddings: Processed embeddings for the publications.
  * original_documents: Original texts scraped from the lab website.
  * document_name_to_embedding.csv: Embeddings for all publications.
* openai_function_utils: Utility functions related to OpenAI.
  * openai_function_impl.py: Implementations of the OpenAI functions.
  * openai_function_interface.py: Interfaces (descriptions) for the OpenAI functions.
* configs.py: Configuration settings, e.g., the OpenAI API key.
* requirements.txt: Required Python libraries for the project.
* utils.py: Utility functions, such as embedding, searching, and retrieving answers from ChatGPT.
* main.py: Main entry point of the project.
## Implemented Functions for OpenAI

ChatGPT can call these functions while handling user questions:

- `get_lab_member_info`: Retrieve details (name, photo URL, links, description) of a lab member by name.
- `get_lab_member_detailed_info`: Retrieve detailed information (link, photo, description) of a lab member.
- `get_publication_by_year`: List all publication information for a given year.
- `get_pub_info`: Access details (title, venue, authors, year, link) of a publication by its title.
- `get_pub_by_name`: Get information on all publications written by a specific lab member.

More details on the functions can be found under `openai_function_utils/`.
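For readers unfamiliar with OpenAI function calling: each function is described to the model as a name, a description, and a JSON-schema parameter spec, and the model decides when to invoke it. The snippet below shows the general shape of such a description for one of the functions listed above; the actual schemas in `openai_function_interface.py` may differ.

```python
# Illustrative shape of a function description as consumed by OpenAI's
# function-calling API. The exact fields defined in
# openai_function_interface.py for this project may differ.
GET_PUBLICATION_BY_YEAR = {
    "name": "get_publication_by_year",
    "description": "List all publication information for a given year.",
    "parameters": {
        "type": "object",
        "properties": {
            "year": {
                "type": "integer",
                "description": "Publication year, e.g. 2023.",
            },
        },
        "required": ["year"],
    },
}
```

When the model returns a call to `get_publication_by_year`, the matching implementation in `openai_function_impl.py` is executed and its result is fed back to the model.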
## Evaluation: Turing Test

We follow the steps below to evaluate our chatbot:

1. Based on the information scraped from the lab's website, we come up with questions that the chatbot's users may ask, including both general questions (applicable to any lab) and lab-specific questions. Some examples:
   - Who works here?
   - List all publications of this lab.
   - What are some recent publications by this lab in the area of [x]?
   - What conferences does this lab usually publish to?
   - What kind of undergraduate projects does this lab work on?
   - Give me the link to [x]'s homepage.
   - Give me a publication written by [x].
   - How long has [x] been doing research in [y] area?
   - Who in the lab is currently working on [x]?
   - Where does former member [x] work now?
2. Given four team members A, B, C, and D, A and B manually write down answers to the evaluation questions from each category.
3. C then tests the questions on the chatbot and collects its answers.
4. Without knowing which answers were provided by the human or the chatbot, D compares the answers for every question and chooses the one a human would prefer.
5. The chatbot's winning rate (i.e., how many times the chatbot wins over the human answerer) is calculated.
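The winning-rate computation in step 5 can be sketched as a simple tally over the blind judge's picks. This is a minimal illustration of the metric, not the project's actual evaluation script:

```python
# Sketch of the winning-rate metric from step 5: given the blind judge's
# per-question picks, compute the fraction won by the chatbot.
# This is illustrative; the project's actual evaluation code may differ.
def winning_rate(judgments):
    """judgments: list of 'chatbot' or 'human', one pick per question."""
    if not judgments:
        return 0.0  # avoid division by zero when no questions were judged
    wins = sum(1 for pick in judgments if pick == "chatbot")
    return wins / len(judgments)
```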
| Overall Winning Rate |
|:--------------------:|
|         N/A          |
Refer to [ask_anrg_eval_question.csv](ask_anrg_eval_question.csv) for more details on the questions used for evaluation and the evaluation results.