DanielKiani committed on
Commit a202a46 · verified · 1 Parent(s): 974e083
Files changed (1)
  1. README.md +66 -58
README.md CHANGED
@@ -8,13 +8,13 @@ app_file: scripts/app.py
8
  ---
9
 
10
  ![Banner](assets/banner.png)
11
- [![Python](https://img.shields.io/badge/Python-3.12.11-blue?logo=python)](https://www.python.org/)[![PyTorch](https://img.shields.io/badge/PyTorch-2.8-EE4C2C?logo=pytorch)](https://pytorch.org/)![Made with ML](https://img.shields.io/badge/Made%20with-ML-blueviolet?logo=openai)[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
12
 
13
  # 🤖 Advanced Customer Service Agent
14
 
15
- An intelligent, multi-modal customer service agent built with a Retrieval-Augmented Generation (RAG) pipeline. This agent can understand user sentiment, retrieve relevant information from a knowledge base, and provide empathetic, context-aware responses in both text and voice.
16
 
17
- the gradio demo can be found [Here](https://huggingface.co/spaces/Deathshot78/CustomerServiceAgent)
18
 
19
  ![Gradio](assets/gradio.png)
20
 
@@ -24,102 +24,103 @@ the gradio demo can be found [Here](https://huggingface.co/spaces/Deathshot78/Cu
24
 
25
  - [📖 About The Project](#-about-the-project)
26
  - [✨ Features](#-features)
27
- - [🛠️ Tech Stack & Model Architecture](#️-tech-stack--model-architecture)
28
- - [Model Selection Rationale](#model-selection-rationale)
29
- - [📊 Performance Benchmark](#-performance-benchmark)
30
  - [🔮 Future Improvements](#-future-improvements)
31
  - [🚀 Getting Started](#-getting-started)
32
- - [Prerequisites](#prerequisites)
33
- - [Installation & Usage](#installation--usage)
34
 
35
  ---
36
 
37
  ## 📖 About The Project
38
 
39
- This project is a complete implementation of an advanced AI customer service agent. The core of the agent is a RAG pipeline that allows it to answer user queries based on a predefined knowledge base, ensuring factual and relevant responses. It includes conversation memory to handle follow-up questions and sentiment analysis to adapt its tone, making the interaction feel more natural and empathetic.
40
 
41
  ---
42
 
43
  ## ✨ Features
44
 
45
- - **🧠 Conversation Memory**: Remembers previous turns in the conversation to understand context.
46
- - **😠 Sentiment-Aware**: Detects user sentiment (Positive/Negative) and adjusts its persona to be more helpful or empathetic.
47
- - **📚 Retrieval-Augmented Generation (RAG)**: Retrieves relevant information from a vector database to provide accurate, knowledge-based answers.
48
  - **🔊 Text-to-Speech**: Can read its responses aloud for a complete voice-enabled experience.
49
  - **🌐 Interactive UI**: Built with Gradio for an easy-to-use web interface.
50
 
51
  ---
52
 
53
- ## ๐Ÿ› ๏ธ Tech Stack & Model Architecture
54
 
55
- The agent is built on a modern RAG architecture using the Hugging Face ecosystem.
56
 
57
- 1. **User Query**: The user asks a question.
58
- 2. **Sentiment Analysis**: The query's sentiment is analyzed.
59
- 3. **Embedding & Retrieval**: The query is converted into a vector embedding. This embedding is used to search a FAISS vector database to find the most relevant documents from the knowledge base.
60
- 4. **Prompt Engineering**: A detailed prompt is constructed containing the agent's persona (based on sentiment), the conversation history, the retrieved documents (context), and the user's current query.
61
- 5. **LLM Response Generation**: The complete prompt is sent to the LLM, which generates a context-aware and tonally appropriate response.
62
- 6. **Text-to-Speech**: The final text response can be converted to audio.
63
 
64
- ### Model Selection Rationale
65
 
66
- | Component | Model | Reason for Choice |
67
- | :--- | :--- | :--- |
68
- | **Embedding** | `sentence-transformers/all-MiniLM-L6-v2` | A very lightweight and fast model that provides excellent performance for semantic retrieval. It's ideal for creating knowledge base embeddings without requiring massive computational resources. |
69
- | **Response Generation** | `google/flan-t5-large` | We chose this model after benchmarking it against the smaller `flan-t5-base`. While slower, `flan-t5-large` is significantly better at following complex instructions, such as adopting an empathetic persona. This was crucial for handling negative user sentiment effectively. |
70
- | **Sentiment Analysis** | `distilbert-base-uncased-finetuned-sst-2-english` | A small, fast, and accurate sentiment classifier. Its efficiency ensures that adding sentiment awareness doesn't create a bottleneck in the response pipeline. |
71
- | **Text-to-Speech** | `gTTS` (Google Text-to-Speech) | Chosen for its simplicity and reliability. It's very easy to implement and works consistently across different environments, making it perfect for this project. |
72
 
73
  ---
74
 
75
- ## 📊 Performance Benchmark
76
 
77
- A key decision in this project was selecting the right LLM for response generation. We tested two models on a Google Colab CPU environment to measure the trade-off between response time and quality.
78
 
79
- | Model | Average Response Time (Colab CPU) | Response Quality |
80
- | :--- | :--- | :--- |
81
- | `google/flan-t5-base` | ~4 seconds | Fast, but often ignored persona instructions and provided blunt, unhelpful answers to negative queries. |
82
- | `google/flan-t5-large` | ~20 seconds | Significantly slower, but consistently followed the empathetic persona instructions, leading to much higher-quality, more appropriate responses. |
83
 
84
- **Conclusion**: We chose `flan-t5-large` because the improvement in response quality and instruction-following was critical for the agent's primary function, justifying the longer response time for a portfolio demonstration.
85
 
86
  ---
87
 
88
  ## 🔮 Future Improvements
89
 
90
- While this project is a fully functional proof-of-concept, there are several ways it could be enhanced for a production environment:
91
-
92
- - **🤖 RLHF-lite for Continuous Improvement**: Extend the agent with reinforcement learning from human feedback (RLHF) using Hugging Face's TRL library and PPO. This would allow the model to learn from thumbs-up/down feedback or simulated reward signals, improving response quality, politeness, and relevance over time.
93
-
94
- - **📈 Scale the LLM**: For even higher quality responses and more nuanced conversations, we could upgrade to a much larger model (e.g., Llama 3, Mistral Large). This would require a more powerful GPU for inference to maintain an acceptable response time.
95
-
96
- - **🎯 Customize the Knowledge Base**: Instead of a generic FAQ dataset [(MakTek/Customer_support_faqs_dataset)](https://huggingface.co/datasets/MakTek/Customer_support_faqs_dataset), the agent could be provided with a company's internal documentation, product manuals, or past support tickets. This would make it a highly specialized and valuable internal tool.
97
-
98
- - **⚙️ Fine-Tune the Embedding Model**: For a highly specific domain (e.g., medical or legal support), the `all-MiniLM-L6-v2` embedding model could be fine-tuned on domain-specific text to improve the accuracy of the document retrieval step.
99
-
100
- - **🗣️ Higher-Quality TTS**: While `gTTS` is reliable, we could integrate a more advanced, natural-sounding TTS model (like those from Coqui AI or Microsoft) for a more polished user experience.
101
-
102
- - **🎤 Add Speech-to-Text (STT)**: Re-integrate a robust STT model (like `openai/whisper`) to create a full voice-to-voice conversation flow, allowing users to speak their queries directly to the agent.
103
-
104
- - **🐳 Dockerize for Deployment**: The application could be containerized using Docker, making it easy to deploy consistently across different environments, from local machines to cloud servers.
105
 
106
  ---
107
 
108
  ## 🚀 Getting Started
109
 
110
  Follow these steps to get the agent running locally.
 
111
 
112
- ### Prerequisites
 
113
 
114
- You need to have Python 3.8+ installed on your system.
 
115
 
116
  ### Installation & Usage
117
-
118
- 1. **Clone the repository (or download the files):**
119
-
120
  ```sh
121
- git clone https://github.com/Deathshot78/CustomerServiceAgent
122
- cd <your-repo-directory>
123
  ```
124
 
125
  2. **Install the dependencies:**
@@ -128,7 +129,14 @@ You need to have Python 3.8+ installed on your system.
128
  ```
129
 
130
  3. **Run the terminal-based demo (optional):**
131
- To see the core agent logic in action, run the `agent.py` script.
132
  ```sh
133
- python agent.py
134
  ```
8
  ---
9
 
10
  ![Banner](assets/banner.png)
11
+ [![Python](https://img.shields.io/badge/Python-3.12-blue?logo=python)](https://www.python.org/)[![PyTorch](https://img.shields.io/badge/PyTorch-2.8-EE4C2C?logo=pytorch)](https://pytorch.org/)[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces)[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
12
 
13
  # 🤖 Advanced Customer Service Agent
14
 
15
+ An intelligent customer service agent built on a Retrieval-Augmented Generation (RAG) pipeline, implemented both from scratch and with LangChain. This agent understands user sentiment, retrieves information from a knowledge base, and provides empathetic, context-aware responses. It features a robust, multi-layered safeguard system to keep conversations on-topic and safe.
16
 
17
+ The live Gradio demo is hosted on Hugging Face Spaces: **[🚀 View Demo Here](https://huggingface.co/spaces/Deathshot78/CustomerServiceAgent)**
18
 
19
  ![Gradio](assets/gradio.png)
20
 
 
24
 
25
  - [📖 About The Project](#-about-the-project)
26
  - [✨ Features](#-features)
27
+ - [🧠 Project Journey & Key Learnings](#-project-journey--key-learnings)
28
+ - [🛠️ Final Architecture & Tech Stack](#️-final-architecture--tech-stack)
29
  - [🔮 Future Improvements](#-future-improvements)
30
  - [🚀 Getting Started](#-getting-started)
 
 
31
 
32
  ---
33
 
34
  ## 📖 About The Project
35
 
36
+ This project chronicles the end-to-end development of an AI customer service agent, from a simple prototype to a production-ready application with advanced safeguards, implemented both from scratch and with LangChain. The agent's core is a RAG pipeline that answers queries based on a predefined knowledge base. The final version integrates a custom, multi-signal moderation system to handle off-topic questions and a dynamic prompting strategy that adapts its tone to user sentiment.
37
 
38
  ---
39
 
40
  ## ✨ Features
41
 
42
+ - **๐Ÿ›ก๏ธ Advanced Safeguards**: A custom, multi-signal moderation system rejects off-topic queries by combining keyword matching, embedding similarity, and zero-shot classification.
43
+ - **๐Ÿง  Conversation Memory**: Remembers previous turns to understand context and handle follow-up questions effectively.
44
+ - **๐Ÿ˜  Dynamic Persona**: Detects user sentiment (`Positive`/`Negative`) and dynamically adjusts its persona in the prompt to be more helpful or empathetic.
45
+ - **๐Ÿ“š Retrieval-Augmented Generation (RAG)**: Retrieves relevant "chunks" of information from a FAISS vector database to provide accurate, knowledge-based answers.
46
  - **๐Ÿ”Š Text-to-Speech**: Can read its responses aloud for a complete voice-enabled experience.
47
  - **๐ŸŒ Interactive UI**: Built with Gradio for an easy-to-use web interface.
48
 
49
  ---
50
 
51
+ ## 🧠 Project Journey & Key Learnings
52
 
53
+ This project evolved significantly, with each phase revealing new challenges and leading to more sophisticated solutions.
54
 
55
+ #### 1. The Quality vs. Speed Dilemma
56
+ The initial prototype used `google/flan-t5-base` for fast responses (~4 seconds). However, it struggled to follow persona instructions, often giving blunt or unhelpful answers to frustrated users. We benchmarked this against `google/flan-t5-large`. While significantly slower (~20 seconds on a CPU), the larger model's ability to adopt an empathetic persona was a non-negotiable requirement for a customer service agent. **Key Learning:** For user-facing applications, response quality and the ability to follow nuanced instructions are often more important than raw speed.
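+ A latency comparison like the one above can be reproduced with a small harness along these lines (a minimal sketch; `fake_generate` is a stand-in for the actual flan-t5 pipeline call, and the warm-up/run counts are illustrative):

```python
import time

def benchmark(generate_fn, prompts, warmup=1, runs=3):
    """Average wall-clock latency per response for a generation callable."""
    for p in prompts[:warmup]:
        generate_fn(p)  # warm-up: first calls often pay one-time model-load cost
    start = time.perf_counter()
    for _ in range(runs):
        for p in prompts:
            generate_fn(p)
    elapsed = time.perf_counter() - start
    return elapsed / (runs * len(prompts))

# Stand-in for the real model call; swap in the flan-t5 pipeline to measure it.
def fake_generate(prompt):
    return f"echo: {prompt}"

avg = benchmark(fake_generate, ["Where is my order?", "I want a refund!"])
print(f"average latency: {avg:.6f}s per response")
```

Running the same harness once with `flan-t5-base` and once with `flan-t5-large` gives a like-for-like comparison on identical prompts and hardware.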
 
 
 
 
57
 
58
+ #### 2. The Safeguard Challenge
59
+ Ensuring the agent stayed on-topic was the most critical challenge.
60
+ - **Initial Failure:** A simple, prompt-based moderator using `flan-t5-base` proved unreliable. It failed to understand context-dependent follow-up questions and was easily fooled by well-formed but irrelevant queries (e.g., "What's the recipe for lasagna?").
61
+ - **The Breakthrough - A Multi-Signal Approach:** The final solution was a "Defense in Depth" strategy implemented in a single function. Instead of relying on one signal, our safeguard combines three:
62
+ 1. **Keyword Heuristics:** A fast check for obvious on-topic words.
63
+ 2. **Embedding Similarity:** Measures the semantic relevance of a query against the entire knowledge base.
64
+ 3. **Zero-Shot Classification:** Uses a dedicated classifier (`facebook/bart-large-mnli`) to explicitly categorize the query into allowed topics or "off-topic."
65
+ - **Final Logic:** By combining these signals with weighted scores, we created a robust and nuanced gatekeeper that successfully rejects irrelevant queries while understanding legitimate follow-ups.
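+ A minimal sketch of this weighted, multi-signal gate (the keyword set, weights, and threshold are illustrative; in the real pipeline `embed_sim` and `zero_shot_on_topic` would come from `all-MiniLM-L6-v2` and `facebook/bart-large-mnli` respectively, and are passed in here for clarity):

```python
import re

ON_TOPIC_KEYWORDS = {"order", "refund", "shipping", "account", "payment", "return"}

def keyword_score(query):
    """Signal 1: fast heuristic -- 1.0 if any obvious on-topic word appears."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    return 1.0 if words & ON_TOPIC_KEYWORDS else 0.0

def moderate(query, embed_sim, zero_shot_on_topic,
             weights=(0.2, 0.4, 0.4), threshold=0.5):
    """Combine the three signals with weighted scores; True means the query passes.

    embed_sim: cosine similarity of the query vs. the knowledge base (0..1).
    zero_shot_on_topic: classifier probability mass on allowed topics (0..1).
    """
    signals = (keyword_score(query), embed_sim, zero_shot_on_topic)
    score = sum(w * s for w, s in zip(weights, signals))
    return score >= threshold

# An off-topic query with low similarity and low classifier confidence is rejected:
print(moderate("What's the recipe for lasagna?", embed_sim=0.15, zero_shot_on_topic=0.1))  # False
# An on-topic query passes:
print(moderate("Where is my refund?", embed_sim=0.8, zero_shot_on_topic=0.9))  # True
```

Because no single signal can veto or approve on its own, a well-formed but irrelevant query (high fluency, low relevance) fails the combined score even if one weak signal fires.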
66
 
67
+ #### 3. Refactoring for Production
68
+ With the core logic proven, the final step was to refactor the project for maintainability and scalability. I explored both a from-scratch implementation and a version using the **LangChain** framework. The final version combines the best of both worlds: it uses LangChain's powerful components (like `ConversationalRetrievalChain`) but replaces its default moderation with our superior, custom-built multi-signal safeguard.
69
+
70
+ #### 4. The "Garbage-In, Garbage-Out" Principle: Knowledge Base is King
71
+ Even with advanced safeguards and a capable LLM, the agent's performance is fundamentally limited by the quality of its knowledge base. We observed several "retrieval failures" where the agent gave factually incorrect or irrelevant answers. For example, when asked for the information needed to find a lost package, the retriever found a document about returning a *wrong item* because it was the most semantically similar text in the generic FAQ dataset. The LLM then correctly answered based on this faulty context. **Key Learning:** A RAG system is only as good as its knowledge. The most significant improvement for a production system is not a better model, but a highly curated, accurate, and specific knowledge base tailored to the agent's exact domain.
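+ This failure mode is easy to reproduce with a toy retriever. Here a bag-of-words cosine similarity (a crude stand-in for the MiniLM embeddings; the documents are invented for illustration) pulls the "wrong item" policy for a lost-package query simply because it shares the most vocabulary:

```python
import math
import re
from collections import Counter

DOCS = [
    "To return a wrong item, contact support with your order number and item details.",
    "Our store hours are 9am to 5pm on weekdays.",
]

def bow_vector(text):
    """Bag-of-words term counts -- a crude proxy for a dense embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query):
    """Return the single most similar document, even if none is truly relevant."""
    q = bow_vector(query)
    return max(DOCS, key=lambda d: cosine(q, bow_vector(d)))

# There is no "lost package" document, so the retriever falls back to the
# most lexically similar text: the wrong-item return policy.
print(retrieve("What information do you need to find my lost package order?"))
```

The retriever always returns *something*; without a relevant document in the knowledge base, the LLM faithfully answers from the wrong context.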
 
72
 
73
  ---
74
 
75
+ ## 🛠️ Final Architecture & Tech Stack
76
 
77
+ The final architecture is a robust pipeline with a pre-processing safeguard gate.
78
 
79
+ 1. **User Query**: The user asks a question.
80
+ 2. **🛡️ Safeguard Gate**: The query is first sent to our multi-signal moderator. If it's off-topic, the process stops and a polite refusal is returned.
81
+ 3. **Sentiment Analysis**: If the query is on-topic, its sentiment is analyzed.
82
+ 4. **Conversational Rewriting**: Follow-up questions are rewritten into standalone queries for better retrieval.
83
+ 5. **RAG Pipeline**: The standalone query is used to retrieve context from the FAISS index.
84
+ 6. **Dynamic Prompting**: A prompt is constructed with the persona, guardrails, and retrieved context.
85
+ 7. **LLM Response Generation**: The prompt is sent to `google/flan-t5-large` to generate the final answer.
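+ The numbered flow above can be sketched as one dispatch function (every helper here is a stub standing in for the real component named in the step it implements):

```python
def answer(query, history):
    # Step 2: safeguard gate -- off-topic queries are refused before retrieval.
    if not passes_safeguard(query):
        return "I'm sorry, I can only help with customer-service questions."
    # Step 3: sentiment selects the persona used in the prompt.
    persona = "empathetic" if sentiment(query) == "NEGATIVE" else "helpful"
    # Step 4: rewrite follow-ups into standalone queries for better retrieval.
    standalone = rewrite(query, history)
    # Step 5: RAG -- fetch context chunks (FAISS index in the real pipeline).
    context = retrieve(standalone)
    # Steps 6-7: build the dynamic prompt and generate with the LLM.
    prompt = (f"You are a {persona} support agent.\n"
              f"Context: {context}\n"
              f"Question: {standalone}")
    return llm_generate(prompt)

# --- illustrative stubs; the real versions wrap the models listed below ---
def passes_safeguard(q): return "lasagna" not in q.lower()
def sentiment(q): return "NEGATIVE" if "!" in q else "POSITIVE"
def rewrite(q, history): return q
def retrieve(q): return "Refunds are issued within 5-7 business days."
def llm_generate(prompt): return prompt.splitlines()[-1]  # echo stub

print(answer("Where is my refund?", []))
```

The ordering matters: the safeguard runs first so that no retrieval or generation cost is paid for queries that will be refused anyway.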
86
 
87
+ | Component | Model / Library |
88
+ | :--- | :--- |
89
+ | **Orchestration** | LangChain / Custom Python |
90
+ | **Embedding** | `sentence-transformers/all-MiniLM-L6-v2` |
91
+ | **Response Generation** | `google/flan-t5-large` |
92
+ | **Safeguard (Moderation)** | Custom multi-signal logic using `facebook/bart-large-mnli` |
93
+ | **Vector Store** | `faiss-cpu` |
94
+ | **User Interface** | `gradio` |
95
+ | **Text-to-Speech** | `gTTS` |
96
 
97
  ---
98
 
99
  ## 🔮 Future Improvements
100
 
101
+ - **Fine-Tune a Specialized Moderator**: For ultimate accuracy, the zero-shot classifier in the safeguard could be replaced with a smaller model (like DistilBERT) fine-tuned on thousands of company-specific on-topic/off-topic examples.
102
+ - **Output Moderation**: Add a final check on the agent's response *before* it's sent to the user to scan for PII, harmful language, or factual inconsistencies against the source context.
103
+ - **Customize the Knowledge Base**: Replace the generic FAQ dataset with a company's internal documentation and past support tickets to create a highly specialized and valuable internal tool.
104
+ - **๐Ÿณ Dockerize for Deployment**: Containerize the application using Docker for consistent and scalable deployment across different environments.
 
 
 
 
 
 
 
 
 
 
 
105
 
106
  ---
107
 
108
  ## 🚀 Getting Started
109
 
110
  Follow these steps to get the agent running locally.
111
+ **Note:** You can find the code for both the from-scratch implementation and the LangChain version in the `scripts` folder. You can run the Gradio app for the LangChain version locally; only the from-scratch implementation is deployed on Hugging Face Spaces.
112
 
113
+ **From-scratch implementation scripts:** `agent.py`, `app.py`
114
+ **LangChain version scripts:** `agent_langchain.py`, `app_langchain.py`
115
 
116
+ ### Prerequisites
117
+ You need to have Python 3.8+ and Git installed.
118
 
119
  ### Installation & Usage
120
+ 1. **Clone the repository:**
 
 
121
  ```sh
122
+ git clone https://github.com/DanielKiani/CustomerServiceAgent
123
+ cd CustomerServiceAgent
124
  ```
125
 
126
  2. **Install the dependencies:**
 
129
  ```
130
 
131
  3. **Run the terminal-based demo (optional):**
132
+ To see the core agent logic and debug output in your terminal, run `agent.py` or `agent_langchain.py`.
133
+ ```sh
134
+ python agent_langchain.py
135
+ ```
136
+
137
+ 4. **Launch the Gradio Web App:**
138
+ To start the interactive user interface, run `app.py` or `app_langchain.py`.
139
  ```sh
140
+ python app_langchain.py
141
  ```
142
+ This will print a local URL in your terminal. Open it in your browser to interact with the agent.