Maheshsr/Insightlab

Maheshsr committed on Mar 2, 2025

Commit 485dd62

1 Parent(s): f130825

modifying the graph prompt

Files changed (2)

pages/__pycache__/solution.cpython-312.pyc +0 -0
pages/solution.py +112 -83

pages/__pycache__/solution.cpython-312.pyc CHANGED

Binary files a/pages/__pycache__/solution.cpython-312.pyc and b/pages/__pycache__/solution.cpython-312.pyc differ

pages/solution.py CHANGED

@@ -14,7 +14,7 @@ from openai import AzureOpenAI
 import os
 import json
 import altair as alt
-import plotly
 import ast
 import streamlit as st
 from streamlit_navigation_bar import st_navbar
@@ -230,12 +230,12 @@ def get_existing_token(current_month):
         blobs = container_client.list_blobs(name_starts_with=token_directory)
         for blob in blobs:
             blob_name = blob.name  # Extract the blob names
-            print(blob_name)
             file_name_with_extension = blob_name.split('/')[-1]
             file_name = file_name_with_extension.split('.')[0]
             blob_client = container_client.get_blob_client(blob_name)
             blob_content = blob_client.download_blob().readall()
-            print(blob_content)
             token_data = json.loads(blob_content)
             if token_data['year-month'] == current_month:
                 logger.info("Existing token_consumed found for month: {}", current_month)
@@ -493,7 +493,7 @@ def update_insight(insight_data, user_persona, file_number):
         logger.error("Error while updating insight: %s", e)
         return False
-def save_insight(next_file_number, user_persona, insight_desc, base_prompt, base_code,selected_db, insight_prompt, insight_code, chart_prompt, chart_code):
     new_insight = {
         'description': insight_desc,
         'base_prompt': base_prompt,
@@ -508,6 +508,7 @@ def save_insight(next_file_number, user_persona, insight_desc, base_prompt, base
         'chart': {
             'chart_1': {
                 'chart_prompt': chart_prompt,
                 'chart_code': chart_code
             }
         }
@@ -793,21 +794,52 @@ def answer_guide_question(question, dframe, df_structure, selected_db):
             logger.error("Trouble writing the code file for {} and method number {}: {}", question, last_method_num + 1, e)
         return duckdb_query, last_method_num + 1
-def generate_graph(query, df, df_structure,generate_graph):
-    if query is None or df is None or df_structure is None:
-        logger.error("generate_graph received None values for query, df, or df_structure")
         return None, None
     if len(query) == 0:
         return None, None
-    df_summary = {
-        "columns": df.columns.tolist(),
-        "dtypes": df.dtypes.astype(str).to_dict(),
-        "describe": df.describe().to_dict()
-    }
     with st.spinner('Generating graph'):
         graph_prompt = f"""
         You are an expert in understanding English language instructions to generate a graph based on a given dataframe.
@@ -815,39 +847,11 @@ def generate_graph(query, df, df_structure,generate_graph):
         I am providing you the dataframe structure as a dictionary in double backticks.
         Dataframe structure: ``{df_structure}``
-        I am also providing you a summary of the dataframe as a dictionary in double backticks.
-        Dataframe summary: ``{df_summary}``
-        I have provided the dataframe structure and its summary. I can't provide the entire dataframe.
         I am also giving you the intent instruction in triple backticks.
         Instruction for generating the graph: ```{query}```
-        Your task is to write the code that will generate a Plotly chart.
-        You should be able to derive the chart type from the instruction.
-        Graphs may need calculations, such as aggregating or calculating averages for some of the numeric columns.
-        You should generate the code that will allow me to create the Plotly chart object that can then be used as the parameter in Streamlit's `st.plotly_chart()` method.
-        Pay special attention to the field names. Some field names have an underscore (_) and some do not. You need to be accurate while generating the query.
-        Pay special attention when you need to group by based on two categorical columns to create things like bubble charts. For example, the sample code within four backticks below is the correct way to prepare a dataframe with procedure code, a categorical variable in one axis, and diagnosis code, another categorical variable in another axis, and the size of the bubble would be based on the sum of 'Total Paid' values for each procedure and diagnosis code combination.
-        Sample code: ````grouped_df = df_ma.groupby(['Procedure Code', 'Diagnosis Codes'])['Total Paid'].sum().reset_index()````
-        If you need to add a filter criterion, then you need to add a second step as indicated in five backticks below. This shows it is filtering the dataframe for all groups with a sum of 'Total Paid' more than 1000. You can feed the last dataframe to the Plotly chart.
-        Sample code: `````grouped_df = df.groupby(['Procedure Code', 'Diagnosis Codes'])['Total Paid'].sum().reset_index() \\n\\nfiltered_df = grouped_df[grouped_df['Total Paid'] > 1000]`````
-        If there is a space in the column name, then you need to fully enclose each occurrence of the column name with double quotes in the query.
-        While creating the Plotly chart, you need to get the top 5000 rows since Plotly chart cannot handle more than 5000 rows.
-        Pay special attention to grouped bar charts. For grouped bar charts, there should be at least two x-axis columns. One can be the actual x-axis and the other can be used in the 'column' parameter of the Plotly Chart object. For example, the following code in four backticks shows a grouped bar chart with the x-axis showing 'year' and each 'site' for each year.
-        Grouped bar chart sample code: ````alt.Chart(source).mark_bar().encode(
-                                                x='year:O',
-                                                y='sum(yield):Q',
-                                                column='site:N'
-                                            )````
-        A grouped bar chart will be explicitly asked for in the instructions.
-        Only produce the Python code.
         Do NOT produce any backticks or double quotes or single quotes before or after the code.
         Do generate the Plotly import statement as part of the code.
         Do NOT justify your code.
@@ -856,28 +860,21 @@ def generate_graph(query, df, df_structure,generate_graph):
         Do not print or return the chart object at the end.
         Do NOT produce any additional text that is not part of the query itself.
         Always name the final Plotly chart object as 'chart'.
-        Go back and check if the generated code can be used in the `st.plotly_chart()` method.
         """
-        logger.info(f"Generating graph with prompt: {graph_prompt}")
-        graph_response = run_prompt(graph_prompt,query,"generate graph",generate_graph)
-        logger.debug("Graph response: {}", graph_response)
-        try:
-            # Create a dictionary to capture local variables
-            local_vars = {}
-            # Execute the chart generation code and update the local_vars dictionary
-            exec(graph_response, {}, local_vars)  # type: ignore
-            logger.debug("Graph code executed.")
-            # Extract the chart object from local_vars
-            chart = local_vars['chart']
-            logger.info("Plotly chart object created successfully.")
-        except Exception as e:
-            logger.error("Error creating plotly chart object: {}", e)
-            return None, None
-    return chart, graph_response
 def get_table_details(engine,selected_db):
     query_tables = """
@@ -1134,7 +1131,9 @@ def design_insight():
         if 'selected_query' not in st.session_state or st.session_state['selected_query'] != selected_query:
             st.session_state['selected_query'] = selected_query
             st.session_state['data_obj'] = None
             st.session_state['graph_obj'] = None
             st.session_state['data_prompt'] = ''
             st.session_state['graph_prompt'] = ''
             st.session_state['data_prompt_value']= ''
@@ -1235,29 +1234,51 @@ def design_insight():
                     logger.debug("Graph prompt: %s | Previous graph prompt: %s", st.session_state.get('graph_prompt'), graph_prompt)
                     if st.session_state['graph_prompt'] != graph_prompt:
                         try:
-                            graph_obj, st.session_state['graph_code'] = generate_graph(graph_prompt, st.session_state['explore_df'], st.session_state['explore_dtype'], selected_db)
-                            st.session_state['graph_obj'] = graph_obj
-                            if graph_obj is not None:
-                                # st.text(st.session_state['graph_prompt'])
-                                st.plotly_chart(graph_obj, use_container_width=True)
-                                logger.info("Graph generated and displayed using Plotly.")
                             else:
-                                st.session_state['graph_obj'] = None
-                                st.text('Error in generating graph, please try again.')
                         except Exception as e:
-                            logger.error("Error in generating graph: %s", e)
                             st.write("Error in generating graph, please try again")
                     else:
                         try:
-                            st.plotly_chart(st.session_state['graph_obj'], use_container_width=True)
                         except Exception as e:
                             st.write("Error in displaying graph, please try again")
                 st.session_state['graph_prompt'] = graph_prompt
             else:
-                if st.session_state['graph_obj'] is not None:
                     try:
-                        st.plotly_chart(st.session_state['graph_obj'], use_container_width=True)
                     except Exception as e:
                         st.write("Error in displaying graph, please try again")
                         logger.error("Error in displaying graph: %s", e)
@@ -1271,9 +1292,10 @@ def design_insight():
                     insight_prompt = st.session_state.get('data_prompt', '')
                     insight_code = st.session_state.get('query', '')
                     chart_prompt = st.session_state.get('graph_prompt', '')
-                    chart_code = st.session_state.get('graph_code', '')
                     try:
                         result = get_existing_insight(base_code, user_persona)
@@ -1287,6 +1309,7 @@ def design_insight():
                             if chart_prompt and chart_code is not None:
                                 existing_insight['chart'][f'chart_{len(existing_insight["chart"]) + 1}'] = {
                                     'chart_prompt': chart_prompt,
                                     'chart_code': chart_code
                                 }
                             try:
@@ -1308,7 +1331,7 @@ def design_insight():
                             # logger.info(f"Next file number: {next_file_number}")
                             try:
-                                save_insight(next_file_number, user_persona, insight_desc, base_prompt, base_code,selected_db, insight_prompt, insight_code, chart_prompt, chart_code)
                                 st.text(f'Insight #{next_file_number} with Graph and/or Data saved.')
                                 # logger.info(f'Insight #{next_file_number} with Graph and/or Data saved.')
                             except Exception as e:
@@ -1400,12 +1423,18 @@ def insight_library():
                 for key, value in charts.items():
                     st.markdown(f"**{value.get('chart_prompt', 'No chart prompt available')}**")
                     try:
-                        local_vars = {}
-                        exec(value.get('chart_code', ''), {}, local_vars)
-                        chart = local_vars.get('chart', None)
-                        if chart is not None:
-                            st.plotly_chart(chart, use_container_width=True)
-                            st.session_state['print_chart'] = chart
                     except Exception as e:
                         logger.error(f"Error generating chart: {repr(e)}")
                         st.error("Please try again")

 import os
 import json
 import altair as alt
+import plotly.express as px
 import ast
 import streamlit as st
 from streamlit_navigation_bar import st_navbar
         blobs = container_client.list_blobs(name_starts_with=token_directory)
         for blob in blobs:
             blob_name = blob.name  # Extract the blob names
+            # print(blob_name)
             file_name_with_extension = blob_name.split('/')[-1]
             file_name = file_name_with_extension.split('.')[0]
             blob_client = container_client.get_blob_client(blob_name)
             blob_content = blob_client.download_blob().readall()
+            # print(blob_content)
             token_data = json.loads(blob_content)
             if token_data['year-month'] == current_month:
                 logger.info("Existing token_consumed found for month: {}", current_month)
         logger.error("Error while updating insight: %s", e)
         return False
+def save_insight(next_file_number, user_persona, insight_desc, base_prompt, base_code,selected_db, insight_prompt, insight_code, chart_prompt, chart_query, chart_code):
     new_insight = {
         'description': insight_desc,
         'base_prompt': base_prompt,
         'chart': {
             'chart_1': {
                 'chart_prompt': chart_prompt,
+                'chart_query': chart_query,
                 'chart_code': chart_code
             }
         }
             logger.error("Trouble writing the code file for {} and method number {}: {}", question, last_method_num + 1, e)
         return duckdb_query, last_method_num + 1
+def generate_duckdb_query(question, mydf, df_structure, selected_db):
+    # Generate the DuckDB query based on the graph prompt and dataframe structure
+    code_gen_prompt = f"""
+    You are an expert in writing SQL queries for DuckDB. Given the task and the structure of a dataframe, your goal is to generate only the SQL query string that can be executed directly on DuckDB, **without any extra code or formatting**.
+    The user prompt is a graph prompt: generate a 2-column dataset for that graph.
+    Task: ``{question}``
+    The dataframe structure is provided as a dictionary where the column names are the keys, and their data types are the values:
+    DataFrame Structure: ```{df_structure}```
+    Your goal is to generate a **clean, valid DuckDB SQL query** that can be executed with `duckdb.query()`. Do **NOT** include any assignment to variables (e.g., `result_df`), comments, backticks, or any additional text.
+    The **output should be a valid SQL query string**, ready to be executed directly in DuckDB. **Do not include any extra SQL keywords like `sql` or backticks around the query**.
+    Return **only the raw SQL query string**, without any additional formatting, comments, or explanation.
+    """
+    logger.info(f"Generating insight with prompt: {code_gen_prompt}")
+    analysis_code = run_prompt(code_gen_prompt, question, "generate graph query", selected_db)
+    # Ensure analysis_code is a string
+    if not isinstance(analysis_code, str):
+        logger.error("Generated code is not a string: {}", analysis_code)
+        raise ValueError("Generated code is not a string")
+    # Strip any unwanted formatting
+    duckdb_query = analysis_code.strip()
+    # Replace "FROM dataframe" with "FROM mydf"
+    duckdb_query = duckdb_query.replace("FROM dataframe", "FROM mydf")
+    # Ensure no additional modifications like newlines or extra spaces
+    graph_query = duckdb_query.strip()
+    logger.debug(f"Generated graph query: {graph_query}")
+    return graph_query
+def generate_graph(query, df_structure, selected_db):
+    if query is None or df_structure is None:
+        logger.error("generate_graph received None values for query or df_structure")
         return None, None
     if len(query) == 0:
         return None, None
     with st.spinner('Generating graph'):
         graph_prompt = f"""
         You are an expert in understanding English language instructions to generate a graph based on a given dataframe.
         I am providing you the dataframe structure as a dictionary in double backticks.
         Dataframe structure: ``{df_structure}``
         I am also giving you the intent instruction in triple backticks.
         Instruction for generating the graph: ```{query}```
+        # Ensure deterministic behavior in graph code
+        Only produce the Python code for creating the Plotly chart.
         Do NOT produce any backticks or double quotes or single quotes before or after the code.
         Do generate the Plotly import statement as part of the code.
         Do NOT justify your code.
         Do not print or return the chart object at the end.
         Do NOT produce any additional text that is not part of the query itself.
         Always name the final Plotly chart object as 'chart'.
+        The task is to generate a Plotly chart using the 2-column dataset. Mention the x, y, title, and type of chart based on the user prompt and dataframe structure.
+        Extract only the Plotly chart creation code segment like `px.bar(graph_df, x='discharge_disposition', y='record_count', color='condition_class', title='Count of Records for Every Condition Class with X Axis Showing Discharge Dispositions')`.
         """
+        logger.info(f"Generating graph with prompt: {graph_prompt}")
+        graph_response = run_prompt(graph_prompt, query, "generate graph", selected_db)
+        logger.debug(f"Graph response: {graph_response}")
+    # Extract the specific Plotly chart creation code segment
+    import re
+    pattern = r'px\.[a-z]+\([^\)]*\)'  # Regex pattern to match Plotly chart code
+    match = re.search(pattern, graph_response)
+    graph_code = match.group(0) if match else ""
+    return graph_code
 def get_table_details(engine,selected_db):
     query_tables = """
         if 'selected_query' not in st.session_state or st.session_state['selected_query'] != selected_query:
             st.session_state['selected_query'] = selected_query
             st.session_state['data_obj'] = None
+            st.session_state['graph_query'] = None
             st.session_state['graph_obj'] = None
+            st.session_state['graph_chart'] = None
             st.session_state['data_prompt'] = ''
             st.session_state['graph_prompt'] = ''
             st.session_state['data_prompt_value']= ''
                     logger.debug("Graph prompt: %s | Previous graph prompt: %s", st.session_state.get('graph_prompt'), graph_prompt)
                     if st.session_state['graph_prompt'] != graph_prompt:
                         try:
+                            duckdb_query =generate_duckdb_query(graph_prompt, st.session_state['explore_df'], st.session_state['explore_dtype'], selected_db)
+                            mydf=df
+                            st.session_state['graph_query'] = duckdb_query
+                            result_df = duckdb.query(duckdb_query).to_df()
+                            result_df = drop_duplicate_columns(result_df)
+                            result_df_dict = get_column_types(result_df)
+                            result_df_dtypes = pd.DataFrame.from_dict(result_df_dict, orient='index', columns=['Dtype'])
+                            result_df_dtypes.reset_index(inplace=True)
+                            result_df_dtypes.rename(columns={'index': 'Column'}, inplace=True)
+                            graph_df=result_df
+                            graph_response = generate_graph(graph_prompt, result_df_dtypes, selected_db)
+                            graph_code = graph_response  # Extract the graph code from the response
+                            st.session_state['graph_obj'] = graph_code
+                            # Ensure 'graph_df' is replaced by 'df' in the generated code
+                            graph_code = graph_code.replace('graph_df', 'df')
+                            # Check and print the generated graph code for debugging
+                            print("Generated graph code:", graph_code)
+                            # Execute the graph code to create the Plotly figure object
+                            local_vars = {'df': graph_df}  # Define the dataframe as 'df'
+                            exec(f"import plotly.express as px\nchart = {graph_code}", local_vars)
+                            if 'chart' in local_vars:
+                                chart = local_vars['chart']  # Extract the Plotly chart object
+                                st.session_state['graph_chart'] = chart
+                                st.session_state['graph_df'] = graph_df
+                                st.plotly_chart(chart, use_container_width=True)
                             else:
+                                st.write("Chart object was not created.")
                         except Exception as e:
+                            logger.error(f"Error in generating graph: {e}")
                             st.write("Error in generating graph, please try again")
                     else:
                         try:
+                            st.plotly_chart(st.session_state['graph_chart'], use_container_width=True)
                         except Exception as e:
                             st.write("Error in displaying graph, please try again")
                 st.session_state['graph_prompt'] = graph_prompt
             else:
+                if st.session_state['graph_chart'] is not None:
                     try:
+                        graph_df = st.session_state['graph_df']
+                        st.plotly_chart(st.session_state['graph_chart'], use_container_width=True)
                     except Exception as e:
                         st.write("Error in displaying graph, please try again")
                         logger.error("Error in displaying graph: %s", e)
                     insight_prompt = st.session_state.get('data_prompt', '')
                     insight_code = st.session_state.get('query', '')
                     chart_prompt = st.session_state.get('graph_prompt', '')
+                    chart_query = st.session_state.get('graph_query','')
+                    chart_code = st.session_state.get('graph_obj', '')
                     try:
                         result = get_existing_insight(base_code, user_persona)
                             if chart_prompt and chart_code is not None:
                                 existing_insight['chart'][f'chart_{len(existing_insight["chart"]) + 1}'] = {
                                     'chart_prompt': chart_prompt,
+                                    'chart_query': chart_query,
                                     'chart_code': chart_code
                                 }
                             try:
                             # logger.info(f"Next file number: {next_file_number}")
                             try:
+                                save_insight(next_file_number, user_persona, insight_desc, base_prompt, base_code,selected_db, insight_prompt, insight_code, chart_prompt, chart_query, chart_code)
                                 st.text(f'Insight #{next_file_number} with Graph and/or Data saved.')
                                 # logger.info(f'Insight #{next_file_number} with Graph and/or Data saved.')
                             except Exception as e:
                 for key, value in charts.items():
                     st.markdown(f"**{value.get('chart_prompt', 'No chart prompt available')}**")
                     try:
+                        mydf=df
+                        query_code = value.get('chart_query','')
+                        result_df = duckdb.query(query_code).to_df()
+                        graph_df=result_df
+                        graph_code = value.get('chart_code', '')
+                        graph_code = graph_code.replace('graph_df', 'df')
+                        local_vars = {'df': graph_df}  # Define the dataframe as 'df'
+                        exec(f"import plotly.express as px\nchart = {graph_code}", local_vars)
+                        if 'chart' in local_vars:
+                            chart = local_vars['chart']  # Extract the Plotly chart object
+                            st.plotly_chart(chart, use_container_width=True, key=f"chart_{key}")
+                            st.session_state[f'print_chart_{key}'] = chart
                     except Exception as e:
                         logger.error(f"Error generating chart: {repr(e)}")
                         st.error("Please try again")
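
Review note on the extraction step in `generate_graph`: the pattern `px\.[a-z]+\([^\)]*\)` does not match Plotly Express functions whose names contain underscores or digits (for example `px.density_heatmap`), and it stops at the first `)`, so a call with a nested `dict(...)` argument or a parenthesis inside a title string is truncated. The sketch below is one possible alternative, not the code in this commit. The helper name `extract_px_call` is hypothetical. It scans for the balanced closing parenthesis while skipping string literals, then uses `ast.parse` to confirm the candidate is a single valid expression. It does not handle triple-quoted strings.

```python
import ast
import re


def extract_px_call(text):
    """Return the first balanced `px.<name>(...)` call in text, or '' if none.

    Hypothetical helper for illustration; not part of pages/solution.py.
    """
    # Allow underscores and digits in the function name (e.g. scatter_3d).
    match = re.search(r"px\.[A-Za-z_][A-Za-z0-9_]*\(", text)
    if match is None:
        return ""
    depth = 0
    quote = None  # active string delimiter while inside a string literal
    i = match.end() - 1  # index of the opening parenthesis
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                candidate = text[match.start(): i + 1]
                try:
                    # Accept only a syntactically valid single expression.
                    ast.parse(candidate, mode="eval")
                except SyntaxError:
                    return ""
                return candidate
        i += 1
    return ""  # parentheses never balanced


# Example: a response with an underscore in the function name,
# a nested call, and parentheses inside a string literal.
response = (
    "import plotly.express as px\n"
    "chart = px.density_heatmap(graph_df, x='a', y='b', "
    "labels=dict(a='A (units)'), title='T')"
)
print(extract_px_call(response))
```

Returning an empty string on failure keeps the same contract as the `match.group(0) if match else ""` line in the commit, so the caller's existing handling would not need to change under this assumption.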