Space: Vashishta-S-2141/LLM_Powered_Database_Chatbot

SVashishta1 committed on Apr 24, 2025

Commit 36118e8 (1 parent: 42d8891)

Update: Added visualization tab and improved UI

Files changed (5)

.DS_Store +0 -0
README.md +57 -21
app.py +185 -143
huggingface.yml +5 -4
requirements.txt +7 -6

.DS_Store CHANGED

Binary files a/.DS_Store and b/.DS_Store differ

README.md CHANGED

@@ -1,38 +1,74 @@
 ---
-title: Testing Space
-emoji: 🐢
-colorFrom: green
-colorTo: red
 sdk: gradio
-sdk_version: 5.20.0
 app_file: app.py
 pinned: false
 license: mit
-short_description: testing
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-# 📚 AI Document Assistant
-This is an AI-powered document analysis chatbot that allows you to:
-- Upload documents (PDF, TXT, DOCX, CSV, XLSX)
-- Ask questions about your documents
-- Get AI-generated responses based on document content
 ## Features
-- **Document Parsing**: Extract text from various file formats
-- **Vector Storage**: Efficient document retrieval with ChromaDB
-- **AI-Powered Responses**: Using Groq API for natural language understanding
-- **User-Friendly Interface**: Built with Gradio
 ## How to Use
-1. **Upload Documents**: Go to the "Document Upload" tab and upload your files
-2. **Ask Questions**: Go to the "Chat" tab and ask questions about your documents
-3. **Configure Settings**: Set your API keys in the "Settings" tab
 ## Technical Details

 ---
+title: "LLM Powered Database Chatbot"
+emoji: "🤖"
+colorFrom: "blue"
+colorTo: "purple"
 sdk: gradio
+sdk_version: 4.19.0
 app_file: app.py
 pinned: false
+space: Vashishta-S-2141/LLM_Powered_Database_Chatbot
 license: mit
+hardware: cpu
+persistentStorage: true
 ---
+# 🤖 LLM Powered Database Chatbot
+A powerful chatbot that can analyze your documents and data, providing insights and visualizations through natural language queries.
 ## Features
+- **Document Analysis**: Upload and query PDFs, TXT, DOCX, CSV, and XLSX files
+- **Data Visualization**: Generate interactive plots and charts from your data
+- **Natural Language Interface**: Ask questions in plain English
+- **Multiple Data Sources**: Work with both documents and structured data
+- **Interactive Visualizations**: View and save your data visualizations
 ## How to Use
+1. **Upload Documents**:
+   - Go to the "Document Upload" tab
+   - Upload your files (PDF, TXT, DOCX, CSV, or XLSX)
+   - Click "Process & Index Documents"
+2. **Ask Questions**:
+   - Type your question in the chat interface
+   - The bot will analyze your documents and provide answers
+   - For data-related questions, it will generate visualizations
+3. **View Visualizations**:
+   - Switch to the "Visualizations" tab to see your plots
+   - Use the buttons to save or clear visualizations
+## Requirements
+- Groq API key (set in environment variables)
+- Python 3.8 or higher
+## Local Development
+1. Clone this repository
+2. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. Set up environment variables:
+   ```bash
+   export GROQ_API_KEY=your_api_key_here
+   ```
+4. Run the application:
+   ```bash
+   python app.py
+   ```
+## License
+MIT License
+## Author
+Vashishta-S-2141
 ## Technical Details
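The README's Local Development steps export `GROQ_API_KEY` before running `python app.py`, and the Requirements list says the key comes from environment variables. A minimal sketch of a fail-fast startup check, assuming the key is read from the process environment (`require_env` is a hypothetical helper, not part of the repository):

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a required environment variable or raise with a setup hint.

    `environ` defaults to os.environ; passing a plain dict makes it testable.
    Hypothetical helper, not taken from the repository.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it before running the app, e.g. "
            f"`export {name}=your_api_key_here`."
        )
    return value
```

On a Space, the key would typically be added as a repository secret, which Spaces exposes to the app as an environment variable; calling `require_env("GROQ_API_KEY")` near the top of `app.py` then reports a missing secret clearly instead of failing later inside the Groq client.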

app.py CHANGED

@@ -6,7 +6,7 @@ import tempfile
 import pandas as pd
 import sqlite3
 from langchain_core.prompts import ChatPromptTemplate
-from langchain_groq import ChatGroq
 import plotly.express as px
 import time
 import plotly.io as pio
@@ -45,6 +45,17 @@ llm = ChatGroq(
 DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data", "csv_data.db")
 os.makedirs(os.path.dirname(DB_PATH), exist_ok=True)
 # Current context to track what we're working with
 current_context = {
     "file_type": None,
@@ -248,98 +259,53 @@ def process_text_query(query, history):
                         cols_str = ", ".join(cols_to_use)
                         sql_query = f"SELECT {cols_str} FROM data_tab WHERE {numeric_cols[0]} IS NOT NULL LIMIT 1000;"
                     else:
-                        # Not enough numeric columns
                         sql_query = "SELECT * FROM data_tab LIMIT 10;"
             else:
-                # Generate SQL query using LLM
-                ai_msg = query_prompt | llm
-                raw_sql_query = ai_msg.invoke({"question": question_with_context}).content.strip()
-                # Clean the SQL query
-                sql_query = clean_sql_query(raw_sql_query)
-            print(f"Generated SQL Query: {sql_query}")
-            try:
-                # Execute the query
-                result_df = pd.read_sql_query(sql_query, conn)
-                # Generate data summary
-                if not result_df.empty:
-                    data_summary = result_df.describe(include='all').to_string()
-                    # For small result sets, include the actual data
-                    if len(result_df) <= 10:
-                        data_summary += f"\n\nFull Results:\n{result_df.to_string()}"
-                    else:
-                        data_summary += f"\n\nFirst 5 rows:\n{result_df.head(5).to_string()}"
-                else:
-                    data_summary = "No relevant data found."
-                # Generate interpretation
-                answer_chain = interpret_prompt | llm
-                interpretation = answer_chain.invoke({
-                    "question": query,
-                    "sql_query": sql_query,
-                    "data_summary": data_summary
-                }).content.strip()
-                # Create the response
-                response = f"**SQL Query:**\n```sql\n{sql_query}\n```\n\n"
-                if not result_df.empty:
-                    if len(result_df) > 10:
-                        response += f"**Results (first 5 of {len(result_df)} rows):**\n```\n{result_df.head(5).to_string()}\n```\n\n"
-                    else:
-                        response += f"**Results:**\n```\n{result_df.to_string()}\n```\n\n"
-                else:
-                    response += "**No results found.**\n\n"
-                response += f"**Analysis:**\n{interpretation}"
-                # Add visualization if requested
-                if is_visualization and not result_df.empty:
-                    try:
-                        # Generate visualization
-                        viz_html = generate_visualization(result_df, query)
-                        if viz_html:
-                            # Add the visualization to the response
-                            response += f"\n\n{viz_html}"
-                            # Add note about visualization
-                            response += "\n\n**A visualization has been generated and is displayed above.**"
-                        else:
-                            response += "\n\n**Could not generate visualization due to an error.**"
-                    except Exception as viz_error:
-                        print(f"Visualization error: {str(viz_error)}")
-                        import traceback
-                        traceback.print_exc()
-            except Exception as e:
-                response = f"**SQL Query:**\n```sql\n{sql_query}\n```\n\n**Error executing query:** {str(e)}"
-            conn.close()
         except Exception as e:
-            response = f"Error processing query: {str(e)}"
     else:
-        # For non-CSV queries, use the document assistant
         try:
             response = document_assistant.process_query(query)
         except Exception as e:
-            response = f"Error processing document query: {str(e)}"
-    # Calculate processing time
-    processing_time = time.time() - start_time
-    response += f"\n\n(Query processed in {processing_time:.2f} seconds)"
-    # Add the response to history
-    history.append({"role": "assistant", "content": response})
-    return "", history
 def process_file_upload(files):
     """Process uploaded files and index them"""
@@ -638,8 +604,8 @@ def generate_visualization(result_df, query):
         print("Visualization requested, attempting to create plot...")
         # Set common figure parameters
-        fig_width = 900  # Adjusted for a more square shape
-        fig_height = 800  # Increased to make it more square
         # Determine visualization type from query
         viz_type = 'bar'  # Default
@@ -749,17 +715,13 @@ def generate_visualization(result_df, query):
                         result_df,
                         x=x_col,
                         y=y_col,
-                        title=f'Bar Chart of {y_col} by {x_col}',
-                        width=900,
-                        height=800
                     )
                 else:
                     fig = px.bar(
                         result_df,
                         x=x_col,
-                        title=f'Bar Chart of {x_col}',
-                        width=900,
-                        height=800
                     )
             # Improve bar chart layout
@@ -777,14 +739,25 @@ def generate_visualization(result_df, query):
             margin=dict(l=40, r=40, t=80, b=80, pad=4),  # Balanced margins
             autosize=True,  # Allow the plot to resize with the container
             plot_bgcolor='rgba(240,240,240,0.2)',  # Light gray background
-            paper_bgcolor='white'
         )
         print(f"Created figure with width={fig_width}, height={fig_height}")
-        # Convert to image
         print("Converting figure to image...")
-        img_bytes = pio.to_image(fig, format="png", width=fig_width, height=fig_height, scale=2)
         print("Image conversion successful")
         # Encode as base64
@@ -794,8 +767,14 @@ def generate_visualization(result_df, query):
         print("HTML conversion successful")
-        # Return the HTML img tag
-        return f"<img src='{img_src}' width='100%' style='max-width:900px; height:800px; object-fit:contain; display:block; margin:0 auto;' />"
     except Exception as e:
         import traceback
@@ -808,6 +787,9 @@ with gr.Blocks(title="LLM Powered Database Chatbot") as demo:
     gr.Markdown("# 🤖 LLM Powered Database Chatbot")
     gr.Markdown("Upload documents, ask questions, and get AI-powered responses!")
     with gr.Tab("Chat"):
         # Use a custom CSS to ensure images are displayed properly
         gr.HTML("""
@@ -839,70 +821,124 @@ with gr.Blocks(title="LLM Powered Database Chatbot") as demo:
                     show_label=False
                 )
             with gr.Column(scale=1):
-                # I am commenting out the voice button because we are not using it
-                # voice_btn = gr.Button("🎤")
-                pass  # I am using pass so the code still works
         with gr.Row():
             submit_btn = gr.Button("Submit")
             clear_btn = gr.Button("Clear")
             clear_context_btn = gr.Button("Clear Context")
-        # I am commenting out audio output because we are not using it
-        # audio_output = gr.Audio(label="Voice Response", type="filepath")
-        # I am commenting out voice input because we are not using it
-        """
-        voice_input = gr.Audio(
-            label="Voice Input",
-            type="filepath",
-            visible=False
-        )
-        """
-        # Event handlers
-        submit_btn.click(
-            process_text_query,
-            inputs=[msg, chatbot],
-            outputs=[msg, chatbot]
-        )
-        msg.submit(
-            process_text_query,
-            inputs=[msg, chatbot],
-            outputs=[msg, chatbot]
-        )
-        clear_btn.click(lambda: None, None, [chatbot], queue=False)
-        clear_context_btn.click(clear_context, inputs=[], outputs=[chatbot])
-        # I am commenting out voice button click because it is still in development phase
-        """
-        voice_btn.click(
-            lambda: gr.update(visible=True),
-            None,
-            voice_input
         )
-        """
-        # I am commenting out voice input change because it is still in development phase
-        """
-        voice_input.change(
-            process_voice_input,
-            inputs=[voice_input],
-            outputs=[msg]
         )
-        """
-        # I am commenting out TTS button because it is still in development phase
-        """
-        tts_btn = gr.Button("🔊 Speak Response")
-        tts_btn.click(
-            text_to_speech_output,
-            inputs=[chatbot],
-            outputs=[audio_output]
-        )
-        """
     with gr.Tab("Document Upload"):
         file_upload = gr.File(
@@ -985,4 +1021,10 @@ with gr.Blocks(title="LLM Powered Database Chatbot") as demo:
 # Launch the app
 if __name__ == "__main__":
-    demo.launch()

 import pandas as pd
 import sqlite3
 from langchain_core.prompts import ChatPromptTemplate
+from langchain_community.chat_models import ChatGroq
 import plotly.express as px
 import time
 import plotly.io as pio
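The hunk above swaps `from langchain_groq import ChatGroq` for `from langchain_community.chat_models import ChatGroq`, and requirements.txt drops `langchain-groq`. As far as I know, `ChatGroq` is provided by the `langchain-groq` package rather than by `langchain_community.chat_models`, so the new import is likely to raise `ImportError` at startup. A hedged sketch that resolves the class from `langchain_groq` and fails with an install hint (`load_chat_groq` is a hypothetical helper):

```python
import importlib
from typing import Any


def load_chat_groq() -> Any:
    """Return the ChatGroq class from the langchain-groq package.

    Raises ImportError with an install hint when the package is missing.
    Hypothetical helper, not taken from the repository.
    """
    try:
        module = importlib.import_module("langchain_groq")
    except ImportError as exc:
        raise ImportError(
            "ChatGroq is provided by the langchain-groq package; "
            "add `langchain-groq` to requirements.txt"
        ) from exc
    return module.ChatGroq
```

Restoring the original `from langchain_groq import ChatGroq` line together with `langchain-groq>=0.0.5` in requirements.txt is the more direct fix; the helper only makes a missing package easier to diagnose.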
 DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data", "csv_data.db")
 os.makedirs(os.path.dirname(DB_PATH), exist_ok=True)
+# Create data directory if it doesn't exist
+DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")
+os.makedirs(DATA_DIR, exist_ok=True)
+# Create chroma_db directory if it doesn't exist
+CHROMA_DB_DIR = os.path.join(DATA_DIR, "chroma_db")
+os.makedirs(CHROMA_DB_DIR, exist_ok=True)
+# Set environment variables for ChromaDB
+os.environ["CHROMA_DB_PATH"] = CHROMA_DB_DIR
 # Current context to track what we're working with
 current_context = {
     "file_type": None,
                         cols_str = ", ".join(cols_to_use)
                         sql_query = f"SELECT {cols_str} FROM data_tab WHERE {numeric_cols[0]} IS NOT NULL LIMIT 1000;"
                     else:
                         sql_query = "SELECT * FROM data_tab LIMIT 10;"
             else:
+                # For other queries, use the LLM to generate SQL
+                sql_query = llm.invoke(query_prompt.format(question=question_with_context)).content
+                sql_query = clean_sql_query(sql_query)
+            # Execute the query
+            result_df = pd.read_sql_query(sql_query, conn)
+            # Close the connection
+            conn.close()
+            # Generate visualization if requested
+            if is_visualization:
+                viz_html = generate_visualization(result_df, query)
+                if viz_html:
+                    # Add the visualization to history
+                    history.append({"role": "assistant", "content": viz_html})
+                    return viz_html, history
+            # If no visualization or visualization failed, generate text response
+            data_summary = result_df.to_string()
+            response = llm.invoke(interpret_prompt.format(
+                question=query,
+                sql_query=sql_query,
+                data_summary=data_summary
+            )).content
+            # Add the response to history
+            history.append({"role": "assistant", "content": response})
+            return response, history
         except Exception as e:
+            error_msg = f"Error processing query: {str(e)}"
+            history.append({"role": "assistant", "content": error_msg})
+            return error_msg, history
     else:
+        # Handle non-CSV queries (document queries)
         try:
             response = document_assistant.process_query(query)
+            history.append({"role": "assistant", "content": response})
+            return response, history
         except Exception as e:
+            error_msg = f"Error processing query: {str(e)}"
+            history.append({"role": "assistant", "content": error_msg})
+            return error_msg, history
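In the rewritten CSV branch, `result_df.to_string()` sends the entire result set to `interpret_prompt` (the hard-coded plotting query alone allows 1000 rows), whereas the removed code sent `describe()` output plus at most ten rows. The connection is also no longer closed if `read_sql_query` raises. A minimal standard-library sketch of the same clean, execute, and summarize flow with a bounded preview. `clean_sql_query` here is only a hypothetical stand-in (the repository's version lies outside the diff), and the keyword-based read-only check is an added assumption, not something the app does:

```python
import re
import sqlite3
from typing import List, Tuple

# Matches an opening ``` or ```sql fence, or a closing ``` fence, around LLM output.
FENCE_RE = re.compile(r"^```(?:sql)?\s*|\s*```$", re.IGNORECASE)


def clean_sql_query(raw: str) -> str:
    """Hypothetical stand-in: strip markdown fences and trailing semicolons."""
    sql = FENCE_RE.sub("", raw.strip()).strip()
    return sql.rstrip(";").strip()


def is_read_only(sql: str) -> bool:
    """Accept a single statement that starts with SELECT or WITH.

    Conservative: any remaining ';' (even inside a string literal) is rejected.
    """
    if not sql.strip() or ";" in sql:
        return False
    first_word = sql.split(None, 1)[0].upper()
    return first_word in {"SELECT", "WITH"}


def run_and_summarize(
    conn: sqlite3.Connection, sql: str, max_rows: int = 10
) -> Tuple[List[str], List[tuple], str]:
    """Execute a query and build a text preview capped at max_rows for the prompt."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in rows[:max_rows]]
    if len(rows) > max_rows:
        lines.append(f"... ({len(rows) - max_rows} more rows not shown)")
    return columns, rows, "\n".join(lines)
```

The app itself uses `pd.read_sql_query(sql_query, conn)`, which could take the place of the cursor. Opening the database with `sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)` gives a read-only connection, and wrapping it in `contextlib.closing(...)` closes it on the error path too (a `sqlite3.Connection` used directly as a context manager commits or rolls back but does not close).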
 def process_file_upload(files):
     """Process uploaded files and index them"""
         print("Visualization requested, attempting to create plot...")
         # Set common figure parameters
+        fig_width = 1200  # Increased for better quality
+        fig_height = 800  # Maintain aspect ratio
         # Determine visualization type from query
         viz_type = 'bar'  # Default
                         result_df,
                         x=x_col,
                         y=y_col,
+                        title=f'Bar Chart of {y_col} by {x_col}'
                     )
                 else:
                     fig = px.bar(
                         result_df,
                         x=x_col,
+                        title=f'Bar Chart of {x_col}'
                     )
             # Improve bar chart layout
             margin=dict(l=40, r=40, t=80, b=80, pad=4),  # Balanced margins
             autosize=True,  # Allow the plot to resize with the container
             plot_bgcolor='rgba(240,240,240,0.2)',  # Light gray background
+            paper_bgcolor='white',
+            font=dict(size=12)  # Increase font size
+        )
+        # Add hover information
+        fig.update_traces(
+            hovertemplate="%{x}: %{y}<extra></extra>",
+            hoverlabel=dict(
+                bgcolor="white",
+                font_size=12,
+                font_family="Arial"
+            )
         )
         print(f"Created figure with width={fig_width}, height={fig_height}")
+        # Convert to image with higher quality
         print("Converting figure to image...")
+        img_bytes = pio.to_image(fig, format="png", width=fig_width, height=fig_height, scale=3)  # Increased scale for better quality
         print("Image conversion successful")
         # Encode as base64
         print("HTML conversion successful")
+        # Return the HTML img tag with responsive sizing
+        return f"""
+        <div class="visualization-wrapper">
+            <img src='{img_src}'
+                 style='max-width:100%; height:auto; display:block; margin:0 auto;'
+                 alt='Data Visualization' />
+        </div>
+        """
     except Exception as e:
         import traceback
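`generate_visualization` renders the figure to PNG with `pio.to_image` (which relies on the kaleido package for static export), base64-encodes it, and returns it inside `<div class="visualization-wrapper">`; the encoding lines themselves fall between hunks. A minimal standard-library sketch of that last step, mirroring the new return value (`png_to_img_html` is a hypothetical name):

```python
import base64
import html


def png_to_img_html(png_bytes: bytes, alt: str = "Data Visualization") -> str:
    """Embed PNG bytes as a base64 data URI inside the app's wrapper div."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    img_src = f"data:image/png;base64,{encoded}"
    return (
        '<div class="visualization-wrapper">'
        f"<img src='{img_src}' "
        "style='max-width:100%; height:auto; display:block; margin:0 auto;' "
        # Escape the alt text so quotes cannot break out of the attribute.
        f"alt='{html.escape(alt, quote=True)}' />"
        "</div>"
    )
```

The output keeps the literal `<img src=` prefix, which is the marker `process_text_query_with_visualization` checks for.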
     gr.Markdown("# 🤖 LLM Powered Database Chatbot")
     gr.Markdown("Upload documents, ask questions, and get AI-powered responses!")
+    # Add a global variable to store the current visualization
+    current_visualization = gr.State(None)
     with gr.Tab("Chat"):
         # Use a custom CSS to ensure images are displayed properly
         gr.HTML("""
                     show_label=False
                 )
             with gr.Column(scale=1):
+                pass
         with gr.Row():
             submit_btn = gr.Button("Submit")
             clear_btn = gr.Button("Clear")
             clear_context_btn = gr.Button("Clear Context")
+    with gr.Tab("Visualizations"):
+        gr.Markdown("## 📊 Data Visualizations")
+        with gr.Row():
+            with gr.Column(scale=3):
+                visualization_output = gr.HTML(
+                    label="Current Visualization",
+                    elem_classes="visualization-container"
+                )
+            with gr.Column(scale=1):
+                with gr.Group():
+                    clear_viz_btn = gr.Button("🗑️ Clear Visualization", variant="stop")
+                    save_viz_btn = gr.Button("💾 Save Visualization")
+                    save_status = gr.Textbox(label="Save Status", visible=False)
+        gr.Markdown("""
+        ### How to use:
+        1. Ask a question about your data in the Chat tab
+        2. If your question involves visualization, the plot will appear here
+        3. You can switch between Chat and Visualizations tabs to see both the conversation and the plots
+        4. Use the buttons above to clear or save the current visualization
+        """)
+        # Add custom CSS for better visualization display
+        gr.HTML("""
+        <style>
+        .visualization-container {
+            min-height: 600px;
+            max-height: 800px;
+            overflow: auto;
+            padding: 20px;
+            background-color: #f8f9fa;
+            border-radius: 8px;
+        }
+        .visualization-container img {
+            max-width: 100%;
+            height: auto;
+            display: block;
+            margin: 0 auto;
+        }
+        </style>
+        """)
+        def clear_visualization():
+            return "", ""
+        def save_visualization(viz_html):
+            if not viz_html:
+                return "No visualization to save", gr.update(visible=True)
+            try:
+                # Create a unique filename
+                timestamp = time.strftime("%Y%m%d_%H%M%S")
+                filename = f"visualization_{timestamp}.html"
+                filepath = os.path.join(DATA_DIR, filename)
+                # Save the visualization
+                with open(filepath, "w") as f:
+                    f.write(viz_html)
+                return f"Visualization saved as {filename}", gr.update(visible=True)
+            except Exception as e:
+                return f"Error saving visualization: {str(e)}", gr.update(visible=True)
+        clear_viz_btn.click(
+            clear_visualization,
+            outputs=[visualization_output, current_visualization]
         )
+        save_viz_btn.click(
+            save_visualization,
+            inputs=[current_visualization],
+            outputs=[save_status, save_status]
         )
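In the Visualizations tab, `save_viz_btn.click` lists `save_status` twice in `outputs` so that one return value sets the text and the other its visibility. In Gradio 4 a single output can carry both, for example returning `gr.update(value=message, visible=True)` with `outputs=[save_status]`. The file-writing part is plain Python; a sketch with an explicit UTF-8 encoding and a guard against two saves in the same second overwriting each other (`save_visualization_html` is a hypothetical name, and the returned path is an addition for testing):

```python
import os
import time
from typing import Optional, Tuple


def save_visualization_html(viz_html: Optional[str], out_dir: str) -> Tuple[str, Optional[str]]:
    """Write the visualization HTML to a timestamped file in out_dir.

    Returns (status_message, path_or_None). In the Gradio handler, the message
    could be sent to one save_status output via gr.update(value=..., visible=True).
    """
    if not viz_html:
        return "No visualization to save", None
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    path = os.path.join(out_dir, f"visualization_{stamp}.html")
    # Append a counter if a file with this timestamp already exists.
    counter = 1
    while os.path.exists(path):
        path = os.path.join(out_dir, f"visualization_{stamp}_{counter}.html")
        counter += 1
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(viz_html)
    except OSError as exc:
        return f"Error saving visualization: {exc}", None
    return f"Visualization saved as {os.path.basename(path)}", path
```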
+    # Update the process_text_query function to handle visualizations
+    def process_text_query_with_visualization(query, history, current_viz):
+        """Process a text query and update chat history and visualization"""
+        if not query:
+            return "", history, current_viz
+        # Process the query and get the response
+        response, new_history = process_text_query(query, history)
+        # Check if the response contains a visualization
+        if "<img src=" in response:
+            # Extract the visualization HTML
+            viz_html = response
+            # Update the visualization state
+            current_viz = viz_html
+            # Remove the visualization from the chat response
+            response = "I've created a visualization for your query. Please check the 'Visualizations' tab to see it."
+        return response, new_history, current_viz
+    # Update the button click handlers
+    submit_btn.click(
+        process_text_query_with_visualization,
+        inputs=[msg, chatbot, current_visualization],
+        outputs=[msg, chatbot, current_visualization]
+    ).then(
+        lambda: None,  # Clear the input
+        outputs=[msg]
+    ).then(
+        lambda viz: viz if viz else "",  # Update visualization tab
+        inputs=[current_visualization],
+        outputs=[visualization_output]
+    )
+    clear_btn.click(lambda: None, None, chatbot, queue=False)
+    clear_context_btn.click(clear_context, None, chatbot, queue=False)
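`process_text_query_with_visualization` replaces `response` with a pointer to the Visualizations tab, but `response` is routed to the `msg` textbox (and cleared by the next `.then`), while `new_history` still holds the `<img>` HTML that `process_text_query` appended. As written, the image HTML stays in the chat history and the pointer never reaches the chat. No `msg.submit` binding appears in the new lines either, so pressing Enter in the textbox no longer sends a query. A sketch of moving the visualization out of the history instead; `split_visualization` is hypothetical and assumes the messages-style `{"role": ..., "content": ...}` history used in the diff:

```python
from typing import Dict, List, Optional, Tuple

VIZ_MARKER = "<img src="
VIZ_NOTE = (
    "I've created a visualization for your query. "
    "Please check the 'Visualizations' tab to see it."
)


def split_visualization(
    history: List[Dict[str, str]], current_viz: Optional[str]
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """Move the newest assistant visualization out of the chat history.

    Returns a new history whose last assistant message is replaced by a short
    note, plus the visualization HTML for the Visualizations tab. The input
    list is not modified.
    """
    if not history:
        return history, current_viz
    last = history[-1]
    if last.get("role") == "assistant" and VIZ_MARKER in last.get("content", ""):
        new_history = history[:-1] + [{"role": "assistant", "content": VIZ_NOTE}]
        return new_history, last["content"]
    return history, current_viz
```

Inside the wrapper, calling `split_visualization(new_history, current_viz)` after `process_text_query` and returning `"", new_history, current_viz` would keep the chat and the tab in sync; binding `msg.submit` to the same function would restore Enter-to-send.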
     with gr.Tab("Document Upload"):
         file_upload = gr.File(
 # Launch the app
 if __name__ == "__main__":
+    demo.launch(
+        share=True,
+        server_name="0.0.0.0",
+        server_port=7860,
+        show_error=True,
+        debug=True
+    )
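The new launch call hard-codes `share=True`. Locally that opens a temporary public gradio.live link; on a Space the app is already served at its public URL, and as far as I know Gradio skips the share tunnel there and logs a warning. A sketch that requests a share link only outside Spaces, assuming the Spaces runtime sets a `SPACE_ID` environment variable (`launch_kwargs` is hypothetical; the other values copy the diff):

```python
import os
from typing import Any, Dict, Mapping, Optional


def launch_kwargs(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Keyword arguments for demo.launch(), matching the diff except for share.

    Assumes Hugging Face Spaces sets SPACE_ID in the container environment.
    """
    env = os.environ if environ is None else environ
    on_spaces = bool(env.get("SPACE_ID"))
    return {
        "server_name": "0.0.0.0",
        "server_port": 7860,
        "show_error": True,
        "debug": True,
        # A public share link is only useful for local runs.
        "share": not on_spaces,
    }
```

Usage: `demo.launch(**launch_kwargs())`.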

huggingface.yml CHANGED

@@ -1,11 +1,12 @@
-title: AI Document Assistant
-emoji: 📚
-colorFrom: blue
-colorTo: indigo
 sdk: gradio
 sdk_version: 4.19.0
 app_file: app.py
 pinned: false
 license: mit
 hardware: cpu
 persistentStorage: true

+title: "LLM Powered Database Chatbot"
+emoji: "🤖"
+colorFrom: "blue"
+colorTo: "purple"
 sdk: gradio
 sdk_version: 4.19.0
 app_file: app.py
 pinned: false
+space: Vashishta-S-2141/LLM_Powered_Database_Chatbot
 license: mit
 hardware: cpu
 persistentStorage: true

requirements.txt CHANGED

@@ -1,4 +1,6 @@
 langchain>=0.1.0
 groq>=0.4.0
 chromadb>=0.4.22
 pymupdf>=1.23.0
@@ -6,9 +8,8 @@ pandas>=2.0.0
 python-docx>=0.8.11
 gradio>=4.19.0
 python-dotenv>=1.0.0
-langchain-community>=0.0.10
-langchain-groq>=0.0.5
-plotly>=5.14.0
-gtts>=2.3.1
-SpeechRecognition>=3.10.0
-kaleido>=0.2.1

 langchain>=0.1.0
+langchain-core>=0.1.0
+langchain-community>=0.0.10
 groq>=0.4.0
 chromadb>=0.4.22
 pymupdf>=1.23.0
 python-docx>=0.8.11
 gradio>=4.19.0
 python-dotenv>=1.0.0
+plotly>=5.14.0
+kaleido>=0.2.1
+numpy>=1.24.0
+sqlite3>=3.35.0
+python-multipart>=0.0.6
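`sqlite3` ships with Python's standard library and is not a pip-installable distribution, so the added `sqlite3>=3.35.0` line will most likely make `pip install -r requirements.txt` fail during the Space build. The SQLite version is whatever library the Python build links against. If the app really depends on SQLite 3.35 features, a runtime check is one option; a sketch with hypothetical helper names:

```python
import sqlite3
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.45.1' into (3, 45, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def sqlite_at_least(minimum: str) -> bool:
    """True if the SQLite library linked into this Python is >= minimum."""
    return parse_version(sqlite3.sqlite_version) >= parse_version(minimum)
```

Usage: drop the `sqlite3` line from requirements.txt and, if needed, call `sqlite_at_least("3.35.0")` at startup and raise a clear error when it returns `False`.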