DanilaKopitayko committed
Commit ef62c1c · 1 Parent(s): 7bd9d7f

README fixed, EXAMPLES file added


Screenshots as a use case added, Jupyter Notebook added

EXAMPLES/Example.ipynb ADDED
@@ -0,0 +1,249 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "48250b17-aada-4838-8fe9-843fe970b904",
+ "metadata": {
+ "id": "48250b17-aada-4838-8fe9-843fe970b904"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import pandas as pd\n",
+ "from IPython.display import Markdown, HTML, display"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "146be3c7-90df-4fbe-bff6-00166f3d61d2",
+ "metadata": {
+ "id": "146be3c7-90df-4fbe-bff6-00166f3d61d2"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "# Replace with your actual values\n",
+ "os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"INSERT THE OPENAI ENDPOINT\"\n",
+ "os.environ[\"AZURE_OPENAI_API_KEY\"] = \"INSERT YOUR OPENAI API KEY\"\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f5e1b596-4568-4078-ae14-b20d25cba62b",
+ "metadata": {
+ "id": "f5e1b596-4568-4078-ae14-b20d25cba62b"
+ },
+ "outputs": [],
+ "source": [
+ "# 2nd Cell: Azure OpenAI setup\n",
+ "import os\n",
+ "from langchain_openai import AzureChatOpenAI\n",
+ "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
+ "\n",
+ "# Load your Azure environment variables\n",
+ "AZURE_OPENAI_ENDPOINT = os.getenv(\"AZURE_OPENAI_ENDPOINT\")\n",
+ "AZURE_DEPLOYMENT_NAME = \"gpt-4.1\" # 👈 Change if needed\n",
+ "AZURE_API_VERSION = \"2025-01-01-preview\" # 👈 Use your correct version\n",
+ "\n",
+ "# Define Azure LLM with streaming enabled\n",
+ "model = AzureChatOpenAI(\n",
+ " openai_api_version=AZURE_API_VERSION,\n",
+ " azure_deployment=AZURE_DEPLOYMENT_NAME,\n",
+ " azure_endpoint=AZURE_OPENAI_ENDPOINT,\n",
+ " streaming=True,\n",
+ " callbacks=[StreamingStdOutCallbackHandler()],\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "789b46d9-2189-4d3c-8f77-61b4675bf950",
+ "metadata": {
+ "id": "789b46d9-2189-4d3c-8f77-61b4675bf950"
+ },
+ "outputs": [],
+ "source": [
+ "# --- Setup ---\n",
+ "import os\n",
+ "import gradio as gr\n",
+ "import pandas as pd\n",
+ "import io\n",
+ "import contextlib\n",
+ "\n",
+ "from langchain.agents.agent_types import AgentType\n",
+ "from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent\n",
+ "\n",
+ "# Replace this with your actual LLM setup\n",
+ "# Example:\n",
+ "# from langchain_openai import AzureChatOpenAI\n",
+ "# model = AzureChatOpenAI(...)\n",
+ "\n",
+ "# Prompt\n",
+ "CSV_PROMPT_PREFIX = \"\"\"\n",
+ "Set pandas to show all columns.\n",
+ "Get the column names and infer data types.\n",
+ "Then attempt to answer the question using multiple methods.\n",
+ "Please provide only the Python code required to perform the action, and nothing else.\n",
+ "\"\"\"\n",
+ "\n",
+ "CSV_PROMPT_SUFFIX = \"\"\"\n",
+ "- Try at least 2 different methods of calculation or filtering.\n",
+ "- Reflect: Do they give the same result?\n",
+ "- After performing all necessary actions and analysis with the dataframe, return the answer in clean **Markdown**, include summary table if needed.\n",
+ "- Include **Execution Recommendation** and **Web Insight** in the final Markdown.\n",
+ "- Always conclude the final Markdown with:\n",
+ "\n",
+ "### Final Answer\n",
+ "\n",
+ "Your conclusion here.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "### Explanation\n",
+ "\n",
+ "Mention specific columns you used.\n",
+ "Please provide only the Python code required to perform the action, and nothing else until the final Markdown output.\n",
+ "\"\"\"\n",
+ "\n",
+ "# --- Agent Logic ---\n",
+ "def ask_agent(files, question):\n",
+ " try:\n",
+ " dfs = [pd.read_csv(f.name) for f in files]\n",
+ " df = pd.concat(dfs, ignore_index=True)\n",
+ " except Exception as e:\n",
+ " return f\"❌ Could not read CSVs: {e}\", \"\"\n",
+ "\n",
+ " try:\n",
+ " agent = create_pandas_dataframe_agent(\n",
+ " llm=model,\n",
+ " df=df,\n",
+ " verbose=True,\n",
+ " agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
+ " allow_dangerous_code=True,\n",
+ " handle_parsing_errors=True, # 👈 this is the fix\n",
+ " )\n",
+ "\n",
+ "\n",
+ " full_prompt = CSV_PROMPT_PREFIX + question + CSV_PROMPT_SUFFIX\n",
+ "\n",
+ " buffer = io.StringIO()\n",
+ " with contextlib.redirect_stdout(buffer):\n",
+ " result = agent.invoke(full_prompt)\n",
+ " trace = buffer.getvalue()\n",
+ " output = result[\"output\"]\n",
+ "\n",
+ "\n",
+ " return output, trace\n",
+ "\n",
+ " except Exception as e:\n",
+ " return f\"❌ Agent error: {e}\", \"\"\n",
+ "\n",
+ "# --- Gradio UI ---\n",
+ "with gr.Blocks(\n",
+ " css=\"\"\"\n",
+ " body, .gradio-container {\n",
+ " background: #ffffff !important;\n",
+ " color: #1f2937 !important;\n",
+ " font-family: 'Segoe UI', sans-serif;\n",
+ " }\n",
+ "\n",
+ " #title {\n",
+ " color: #1f2937 !important;\n",
+ " font-size: 2rem;\n",
+ " font-weight: 600;\n",
+ " text-align: center;\n",
+ " padding-top: 20px;\n",
+ " padding-bottom: 10px;\n",
+ " }\n",
+ "\n",
+ " .gr-box, .gr-input, .gr-output, .gr-markdown, .gr-textbox, .gr-file, textarea, input {\n",
+ " background: rgba(0, 0, 0, 0.04) !important;\n",
+ " border: 1px solid rgba(0, 0, 0, 0.1);\n",
+ " border-radius: 12px !important;\n",
+ " color: #1f2937 !important;\n",
+ " }\n",
+ "\n",
+ " textarea::placeholder, input::placeholder {\n",
+ " color: rgba(31, 41, 55, 0.6) !important;\n",
+ " }\n",
+ "\n",
+ " button {\n",
+ " background: rgba(0, 0, 0, 0.07) !important;\n",
+ " color: #1f2937 !important;\n",
+ " border: 1px solid rgba(0, 0, 0, 0.15) !important;\n",
+ " border-radius: 8px !important;\n",
+ " }\n",
+ "\n",
+ " button:hover {\n",
+ " background: rgba(0, 0, 0, 0.15) !important;\n",
+ " }\n",
+ " \"\"\"\n",
+ ") as demo:\n",
+ "\n",
+ " gr.Markdown(\"<h2 id='title'>📊 NexDatawork Data Agent</h2>\")\n",
+ "\n",
+ " with gr.Column():\n",
+ " result_display = gr.Markdown(label=\"📌 Report Output (Markdown)\")\n",
+ " trace_display = gr.Textbox(label=\"🛠️ Data Agent Reasoning - Your Explainable Agent\", lines=20)\n",
+ "\n",
+ " with gr.Row(equal_height=True):\n",
+ " file_input = gr.File(label=\"📁 Upload CSV(s)\", file_types=[\".csv\"], file_count=\"multiple\")\n",
+ " question_input = gr.Textbox(\n",
+ " label=\"💬 Ask Your Data\",\n",
+ " placeholder=\"e.g., What is the trend for revenue over time?\",\n",
+ " lines=9\n",
+ ")\n",
+ "\n",
+ "\n",
+ " ask_button = gr.Button(\"💡 Analyze\")\n",
+ "\n",
+ " ask_button.click(\n",
+ " fn=ask_agent,\n",
+ " inputs=[file_input, question_input],\n",
+ " outputs=[result_display, trace_display]\n",
+ " )\n",
+ "\n",
+ "demo.launch(share=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "fM4cO6jTgXlu"
+ },
+ "id": "fM4cO6jTgXlu",
+ "execution_count": null,
+ "outputs": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3.10 (LangChain)",
+ "language": "python",
+ "name": "langchain310"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.16"
+ },
+ "colab": {
+ "provenance": []
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }
Images/Data_evidence.png ADDED
Images/Methodology.png ADDED
Images/Statistical_insights.png ADDED
Images/analysis_summary.png ADDED
Images/business_insights.png ADDED
Images/categorical_distribution.png ADDED
Images/executive_summary.png ADDED
Images/file_information.png ADDED
Images/graph1.png ADDED
Images/graph2.png ADDED
Images/graph3.png ADDED
README.md CHANGED
@@ -79,21 +79,51 @@ In the **Chat** tab you can ask the bot about the details of the data.
 After the analysis is completed the results are received in two tabs: **Data Brain** and **Dashboard**.
 
 ### Data Brain
- 1) General overview of the data is presented as well as the methodology of approaching the dataset
+ 1) General overview of the data is presented as well as the methodology of approaching the dataset
+
+ <p align='center'>
+ <img src='Images/executive_summary.png' alt='executive summary' width='500' />
+ </p>
 
 2) Recommendations on possible aspects of the data are generated
 
+ <p align='center'>
+ <img src='Images/business_insights.png' alt='business insights' width='500' />
+ </p>
+
 3) A conclusive overview of the data and statistical insights are presented
 
 
+
+ <p align='center'>
+ <img src='Images/Methodology.png' alt='Methodology' width='500' />
+ <img src='Images/Data_evidence.png' alt='Data evidence' width='500' />
+ <img src='Images/Statistical_insights.png' alt='Statistical insights' width='500' />
+ <img src='Images/categorical_distribution.png' alt='categorical distribution' width='500' />
+ </p>
+
+
 ### Dashboard
 
 Brief overview of the data with only the most important metrics and figures, such as:
 * file information
+ <p align='center'>
+ <img src='Images/file_information.png' alt='file_information' width='500' />
+ </p>
+
 * number of columns of each type (numerical, categorical and temporal)
+
 * data quality and statistical summary
+ <p align='center'>
+ <img src='Images/analysis_summary.png' alt='analysis_summary' width='500' />
+ </p>
 
 Finally, graphs of the most important variables are presented.
+ <p align='center'>
+ <img src='Images/graph1.png' alt='graph1' width='225' />
+ <img src='Images/graph2.png' alt='graph2' width='250' />
+ <img src='Images/graph3.png' alt='graph3' width='225' />
+ </p>
 
 ## <a name='requirenments--starting-procedures'></a>Requirements & Starting Procedures
 
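The Dashboard's "number of columns of each type" metric maps naturally onto pandas dtype introspection. The project's actual implementation is not shown in this diff; the following is a minimal sketch, under the assumption that "numerical", "categorical", and "temporal" correspond to numeric, object/category, and datetime dtypes respectively (all names below are illustrative):

```python
import pandas as pd

# Tiny illustrative frame with one column of each kind
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "region": ["EU", "US"],
    "revenue": [100.0, 200.0],
})

# Count columns per broad dtype family using select_dtypes
counts = {
    "numerical": df.select_dtypes(include="number").shape[1],
    "categorical": df.select_dtypes(include=["object", "category"]).shape[1],
    "temporal": df.select_dtypes(include="datetime").shape[1],
}
print(counts)  # {'numerical': 1, 'categorical': 1, 'temporal': 1}
```

In practice, temporal columns loaded from CSV arrive as strings unless parsed (e.g. with `parse_dates` in `pd.read_csv`), so a real dashboard would need that parsing step before these counts come out right.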