Spaces: GFiaMon/meeting-agent-docker

GFiaMon committed on Dec 6, 2025

Commit

b812a47

1 Parent(s): be4bd63

Updated many system prompt bugs

Files changed (5)

app.py +1 -1
src/agents/conversational.py +98 -28
src/retrievers/pinecone.py +3 -3
src/tools/general.py +309 -22
src/ui/gradio_app.py +13 -3

app.py CHANGED

@@ -62,5 +62,5 @@ if __name__ == "__main__":
         demo.launch(
             server_name="0.0.0.0",
             server_port=7860,
-            share=True
         )

         demo.launch(
             server_name="0.0.0.0",
             server_port=7860,
+            share=False
         )

src/agents/conversational.py CHANGED

@@ -27,6 +27,8 @@ from src.tools.general import (
     list_recent_meetings,
     search_meetings,
     upsert_text_to_pinecone,
 )
 from src.tools.video import (
     cancel_video_workflow,
@@ -60,6 +62,21 @@ class ConversationalMeetingAgent:
     # Enhanced system prompt for conversational workflow
     SYSTEM_PROMPT = """You are a friendly and helpful Meeting Intelligence Assistant. You help users manage their meeting recordings through natural conversation.
 **IMPORTANT: Handling Meeting References**
 - If the user refers to a meeting by index (e.g., "meeting 1", "the second meeting"), you MUST first call `list_recent_meetings` to find the actual `meeting_id` (e.g., "meeting_abc123").
 - NEVER use "meeting 1" or "meeting 2" as a `meeting_id` in tool calls. Always map it to the real ID first.
@@ -101,48 +118,92 @@ You can help users with two main workflows:
 - `list_recent_meetings`: Show available meetings
 - `search_meetings`: Search meeting content semantically
 - `get_meeting_metadata`: Get meeting details
-**Notion Integration (Export & Documentation):**
 **IMPORTANT: You CAN and SHOULD use Notion tools when the user asks!**
-When the user asks to create/upload content to Notion:
-1. **Use `API-post-page` to create a new page**:
    ```
-   API-post-page(
-       parent={"page_id": "2bc5a424-5cbb-80ec-8aa9-c4fd989e67bc"},
-       properties={"title": [{"text": {"content": "Your Page Title"}}]},
-       children=["Content goes here as a string"]
    )
    ```
-2. **Default Parent Page**: Use `2bc5a424-5cbb-80ec-8aa9-c4fd989e67bc` (the "Meetings Summary Test" page).
-3. **Alternative**: If the user specifies a different location, use `API-post-search(query="page name")` to find it first.
-**Available Tools:**
-- `API-post-page`: Create new pages (USE THIS!)
 - `API-post-search`: Search for pages
-- `API-append-block-children`: Add content to existing pages
 - `API-patch-page`: Update page properties
-**Example:**
-User: "Create a test page in Notion"
-You: Call `API-post-page(parent={"page_id": "2bc5a424-5cbb-80ec-8aa9-c4fd989e67bc"}, properties={"title": [{"text": {"content": "Test Page"}}]}, children=["This is a test"])`
-**Generic Document/Text Upsert:**
-**IMPORTANT: Saving Content from Notion or Manual Entry**
-When the user wants to save content from Notion, manual notes, or any text that is NOT a video transcription, you MUST use the `upsert_text_to_pinecone` tool.
-1. **Get the content**: If it's a Notion page, first read it using `API-get-block-children` or `API-retrieve-page`. If it's manual text, use the text provided by the user.
-2. **Upsert**: Call `upsert_text_to_pinecone(text="...", title="...", source="Notion/Manual")`.
-**Example:**
-User: "Save this Notion page to Pinecone"
-You: [After reading page content] Call `upsert_text_to_pinecone(text="Page content...", title="Page Title", source="Notion")`
 **Conversational Guidelines:**
@@ -165,7 +226,11 @@ You: [After reading page content] Call `upsert_text_to_pinecone(text="Page conte
    - Confirm success and offer to help with queries
 4. **Meeting Query Flow**:
-   - For "what meetings": call `list_recent_meetings`
    - For specific questions: call `search_meetings`
    - For meeting details: call `get_meeting_metadata`
    - To create minutes/summaries:
@@ -260,7 +325,11 @@ Remember: You're a helpful assistant focused on making meeting management effort
             search_meetings,
             get_meeting_metadata,
             list_recent_meetings,
-            upsert_text_to_pinecone
         ]
         # Load MCP tools (Notion integration)
@@ -310,6 +379,7 @@ Remember: You're a helpful assistant focused on making meeting management effort
             if success:
                 tools = mcp_manager.get_langchain_tools()
                 print(f"✅ Integrated {len(tools)} MCP tools into agent")
                 return tools
             else:
                 print("⚠️  MCP initialization failed")

     list_recent_meetings,
     search_meetings,
     upsert_text_to_pinecone,
+    import_notion_to_pinecone,
+    create_notion_page,
 )
 from src.tools.video import (
     cancel_video_workflow,
     # Enhanced system prompt for conversational workflow
     SYSTEM_PROMPT = """You are a friendly and helpful Meeting Intelligence Assistant. You help users manage their meeting recordings through natural conversation.
+**CRITICAL: INTENT ROUTING (READ FIRST)**
+Before calling ANY tool, determine the user's intent:
+1. **"Create a Notion page"** / **"Save to Notion"** / **"Export to Notion"** / **"Upload to Notion"**
+   - **ACTION**: You MUST use `create_notion_page(title=..., content=...)`.
+   - **FORBIDDEN TOOLS**: Do NOT use `upsert_text_to_pinecone` or `import_notion_to_pinecone`.
+   - **Example**: "Create a Notion page with these minutes" -> `create_notion_page(...)`
+2. **"Save to Database"** / **"Save to Memory"** / **"Upload to Pinecone"** / **"Ingest this"**
+   - **ACTION**: Use `upsert_text_to_pinecone` (for manual text) or `import_notion_to_pinecone` (for Notion pages).
+   - **FORBIDDEN TOOLS**: Do NOT use Notion creation tools.
+3. **"Import from Notion"** / **"Sync from Notion"**
+   - **ACTION**: Use `import_notion_to_pinecone`.
 **IMPORTANT: Handling Meeting References**
 - If the user refers to a meeting by index (e.g., "meeting 1", "the second meeting"), you MUST first call `list_recent_meetings` to find the actual `meeting_id` (e.g., "meeting_abc123").
 - NEVER use "meeting 1" or "meeting 2" as a `meeting_id` in tool calls. Always map it to the real ID first.
 - `list_recent_meetings`: Show available meetings
 - `search_meetings`: Search meeting content semantically
 - `get_meeting_metadata`: Get meeting details
+- `get_current_time` (from World Time MCP): Check today's date (use this for questions like "last week", "yesterday", etc.)
+**Notion Integration & Retrieval:**
 **IMPORTANT: You CAN and SHOULD use Notion tools when the user asks!**
+**A. RETRIEVING from Notion (Workflow):**
+To retrieve a full page from Notion, you MUST follow these steps (Notion pages are split into metadata and content):
+1. **Find Page**: Use `API-post-search(query="name")` to get the `page_id`.
+2. **Get Metadata**: Use `API-retrieve-a-page(page_id=...)` to get the title and properties. *This does NOT return the page content/text.*
+3. **Get Content (CRITICAL)**: Use `API-get-block-children(block_id=page_id)` to get the actual text blocks.
+   - You MUST iterate through the blocks to extract the "plain_text" or "content".
+   - If you skip this, you will only have an empty page!
+**B. CREATING in Notion:**
+1. **Use `create_notion_page`**:
+   - Simply provide the `title` and the `content` (plain text or markdown).
+   - This tool handles paragraph formatting and the 2000-character block limit automatically.
+   - Do NOT try to build complex JSON blocks yourself.
    ```
+   create_notion_page(
+       title="Meeting Minutes - Dec 24",
+       content="Here is the summary...\n\n- Point 1\n- Point 2"
    )
    ```
+**Available Notion Tools:**
 - `API-post-search`: Search for pages
+- `API-retrieve-a-page`: Get page metadata (Title, Date, etc.)
+- `API-get-block-children`: Get page content/blocks (USE THIS FOR CONTENT!)
+- `API-post-page`: Create new pages
+- `API-patch-block-children`: Add content to existing pages (Append)
 - `API-patch-page`: Update page properties
+**C. APPENDING to Notion:**
+When adding content to an existing page, you MUST use `API-patch-block-children`.
+**CRITICAL**: The `children` argument MUST be a list of Block Objects (like `API-post-page`).
+```
+API-patch-block-children(
+    block_id="page_id_here",
+    children=[
+        {
+            "object": "block",
+            "type": "heading_2",
+            "heading_2": {"rich_text": [{"type": "text", "text": {"content": "New Section"}}]}
+        },
+        {
+            "object": "block",
+            "type": "paragraph",
+            "paragraph": {"rich_text": [{"type": "text", "text": {"content": "New content..."}}]}
+        }
+    ]
+)
+```
+**D. SAVING to Pinecone (Generic Document/Text Upsert):**
+1. **Importing from Notion (MANDATORY)**:
+   - **ALWAYS** call `import_notion_to_pinecone(query='Meeting Title')`.
+   - **Context Resolution**: If the user says "upload the first one" or "that meeting", you MUST resolve the reference to the actual **Page Title** from the conversation history (e.g., "Meeting 1"). Do NOT pass "first one" as the query.
+   - **No Batch Uploads**: If the user asks to "upload all", "upload the missing ones", or provides a list, you MUST call `import_notion_to_pinecone` SEPARATELY for each meeting title. Do NOT call the tool once with a list or a summary. Provide one confirmation message after all are done.
+   - **NEVER** use `upsert_text_to_pinecone` for Notion content, even if you think you have the text in your history.
+   - **REASON**: Using `upsert_text_to_pinecone` for Notion content risks you summarizing it. `import_notion_to_pinecone` transfers the raw data in code, which is safer.
+   - This single tool handles search, content fetching, and saving automatically.
+2. **Manual Entry (User types text directly)**:
+   - Use `upsert_text_to_pinecone` with the FULL text provided by the user.
+   - Ensure you pass the raw text without summarizing.
+**Example 1 (Notion -> Pinecone):**
+User: "Save 'Meeting 3' from Notion to Pinecone"
+You: `import_notion_to_pinecone(query="Meeting 3")`
+**Example 2 (Notion -> Pinecone):**
+User: "Sync 'Project Kickoff' to database"
+You: `import_notion_to_pinecone(query="Project Kickoff")`
+**Example 3 (Pinecone/Agent -> Notion):**
+User: "Save this summary to a Notion page"
+You: `create_notion_page(title="Summary", content="The summary...")`
+**Example 4 (Manual -> Pinecone):**
+User: "Save this note: 'Discussion about budget'"
+You: `upsert_text_to_pinecone(text="Discussion about budget", title="Manual Note")`
 **Conversational Guidelines:**
    - Confirm success and offer to help with queries
 4. **Meeting Query Flow**:
+   - For "what meetings" (db): call `list_recent_meetings`
+   - For "meetings in Notion" or "Notion pages": call `API-post-search(query="Meeting")`. Do NOT use `list_recent_meetings`.
+   - For "compare Notion and Database" or "what is missing": Call BOTH `list_recent_meetings` AND `API-post-search(query="Meeting")`, then compare the lists. Report any missing meetings clearly. If meetings are missing, ASK "Would you like to sync [Meeting Name]?" before uploading. Do NOT auto-upload.
+   - For "find meeting about X", "do I have...", or "search everywhere": Call BOTH `search_meetings(query='X')` AND `API-post-search(query='X')` and report all findings.
+   - For time-based questions (e.g., "last week", "yesterday"): FIRST call the available time tool (e.g., `get_current_time` from World Time MCP), THEN calculate the date, THEN call `search_meetings`.
    - For specific questions: call `search_meetings`
    - For meeting details: call `get_meeting_metadata`
    - To create minutes/summaries:
             search_meetings,
             get_meeting_metadata,
             list_recent_meetings,
+            upsert_text_to_pinecone,
+            import_notion_to_pinecone,
+            create_notion_page
         ]
         # Load MCP tools (Notion integration)
             if success:
                 tools = mcp_manager.get_langchain_tools()
                 print(f"✅ Integrated {len(tools)} MCP tools into agent")
+                print(f"📋 Available Tools: {[t.name for t in tools]}")
                 return tools
             else:
                 print("⚠️  MCP initialization failed")
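The "compare Notion and Database" flow in the prompt above comes down to a set difference over meeting titles. A minimal sketch, assuming the outputs of `list_recent_meetings` and `API-post-search` have already been reduced to plain title lists; `find_missing_titles` is a hypothetical helper for illustration and is not part of this commit:

```python
from typing import List


def find_missing_titles(notion_titles: List[str], db_titles: List[str]) -> List[str]:
    """Return Notion titles that have no case-insensitive match in the database.

    Hypothetical helper: in the commit, the agent performs this comparison
    itself from the two tool outputs.
    """
    known = {title.strip().lower() for title in db_titles}
    # Keep Notion's order so the follow-up question lists meetings predictably.
    return [t for t in notion_titles if t.strip().lower() not in known]


missing = find_missing_titles(
    ["Meeting 1", "Meeting 2", "Project Kickoff"],
    ["meeting 1", "Project Kickoff "],
)
print(missing)
```

Per the prompt, the agent should then ask before syncing each missing title rather than uploading automatically.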

src/retrievers/pinecone.py CHANGED

@@ -161,12 +161,12 @@ class PineconeManager:
                     meetings[meeting_id] = {
                         "meeting_id": meeting_id,
                         "meeting_date": metadata.get("meeting_date"),
-                        "title": metadata.get("title", "Untitled Meeting"),
-                        "duration": metadata.get("duration", "N/A"),
                         "source_file": metadata.get("source_file", "N/A"),
                     }
-            return list(meetings.values())
         except Exception as e:
             print(f"Error listing meetings: {e}")

                     meetings[meeting_id] = {
                         "meeting_id": meeting_id,
                         "meeting_date": metadata.get("meeting_date"),
+                        "meeting_title": metadata.get("meeting_title", metadata.get("title", "Untitled Meeting")),
+                        "meeting_duration": metadata.get("duration", metadata.get("meeting_duration", "N/A")),
                         "source_file": metadata.get("source_file", "N/A"),
                     }
+            return list(meetings.values())
         except Exception as e:
             print(f"Error listing meetings: {e}")
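The nested `metadata.get(...)` calls above read the new field name first and fall back to the legacy one. The same pattern can be sketched as a small helper, assuming plain dict metadata; `first_present` is a hypothetical name and does not exist in the repository:

```python
from typing import Any, Mapping, Sequence


def first_present(metadata: Mapping[str, Any], keys: Sequence[str], default: Any) -> Any:
    """Return the value of the first key in `keys` that exists in `metadata`."""
    for key in keys:
        if key in metadata:
            return metadata[key]
    return default


legacy = {"title": "Kickoff"}                 # record written before the rename
current = {"meeting_title": "Budget Review"}  # record written after the rename

print(first_present(legacy, ["meeting_title", "title"], "Untitled Meeting"))
print(first_present(current, ["meeting_title", "title"], "Untitled Meeting"))
```

Like `dict.get`, this returns a stored value even if it is `None`; only a missing key triggers the fallback.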

src/tools/general.py CHANGED

@@ -11,10 +11,12 @@ Reference: https://docs.langchain.com/oss/python/langchain/tools#create-tools
 from typing import List, Dict, Any, Optional
 from datetime import datetime
 import uuid
 from langchain.tools import tool
 from langchain_core.documents import Document
 from src.retrievers.pipeline import process_transcript_to_documents
 from src.config.settings import Config
 # Global reference to PineconeManager (will be set during initialization)
@@ -80,12 +82,18 @@ def search_meetings(query: str, max_results: int = 5, meeting_id: Optional[str]
         for i, doc in enumerate(docs, 1):
             metadata = doc.metadata
             meeting_id = metadata.get("meeting_id", "unknown")
-            meeting_date = metadata.get("meeting_date", "unknown")  # ✅ Fixed: was "date"
             chunk_index = metadata.get("chunk_index", "?")
             result_parts.append(
                 f"\n--- Segment {i} ---\n"
-                f"Meeting: {meeting_id} (Date: {meeting_date})\n"
                 f"Chunk: {chunk_index}\n"
                 f"Content:\n{doc.page_content}\n"
             )
@@ -177,7 +185,7 @@ def list_recent_meetings(limit: int = 10) -> str:
         # Get retriever with high k to fetch many documents
         retriever = _pinecone_manager.get_retriever(
             namespace=Config.PINECONE_NAMESPACE,
-            search_kwargs={"k": 100}  # Fetch many to find unique meetings
         )
         # Use a generic query to get documents
@@ -223,29 +231,285 @@ def list_recent_meetings(limit: int = 10) -> str:
         return f"Error listing meetings: {str(e)}"
 # Export all tools for easy import
 __all__ = [
     "initialize_tools",
     "search_meetings",
     "get_meeting_metadata",
     "list_recent_meetings",
-    "upsert_text_to_pinecone"
 ]
 @tool
 def upsert_text_to_pinecone(text: str, title: str, source: str = "Manual Entry", date: str = None) -> str:
     """
     Upsert any text content (e.g., Notion pages, manual notes) to Pinecone.
-    Use this tool when the user wants to save a Notion page, meeting notes, or any other text
-    that is NOT a video transcription.
     Args:
-        text: The content to save
         title: Title of the document/meeting
         source: Source of the content (e.g., "Notion", "Manual Entry")
-        date: Date of the content (YYYY-MM-DD). Defaults to today.
     Returns:
         Success message with the generated meeting_id
@@ -254,36 +518,59 @@ def upsert_text_to_pinecone(text: str, title: str, source: str = "Manual Entry",
         return "Error: Pinecone service is not initialized."
     try:
-        # Generate ID and defaults
-        meeting_id = f"doc_{uuid.uuid4().hex[:8]}"
-        if not date:
-            date = datetime.now().strftime("%Y-%m-%d")
-        # Create comprehensive metadata with consistent field names
         meeting_metadata = {
             "meeting_id": meeting_id,
-            "meeting_date": date,  # ✅ Fixed: was "date"
             "date_transcribed": datetime.now().strftime("%Y-%m-%d"),
             "source": source,
-            "meeting_title": title,  # ✅ Fixed: was "title"
-            "summary": f"Imported from {source}",  # ✅ Added summary
             "source_file": f"{source.lower()}_upload",
             "transcription_model": "text_import",
-            "language": "en"
         }
-        # Process text into documents (using fallback chunking since no speaker data)
         docs = process_transcript_to_documents(
-            transcript_text=text,
-            speaker_data=None,
             meeting_id=meeting_id,
             meeting_metadata=meeting_metadata
         )
-        # Upsert to Pinecone
         _pinecone_manager.upsert_documents(docs, namespace=Config.PINECONE_NAMESPACE)
-        return f"✅ Successfully saved '{title}' to Pinecone (ID: {meeting_id})"
     except Exception as e:
         return f"❌ Error saving to Pinecone: {str(e)}"

 from typing import List, Dict, Any, Optional
 from datetime import datetime
 import uuid
+import requests
 from langchain.tools import tool
 from langchain_core.documents import Document
 from src.retrievers.pipeline import process_transcript_to_documents
+from src.processing.metadata_extractor import MetadataExtractor
 from src.config.settings import Config
 # Global reference to PineconeManager (will be set during initialization)
         for i, doc in enumerate(docs, 1):
             metadata = doc.metadata
             meeting_id = metadata.get("meeting_id", "unknown")
+            meeting_date = metadata.get("meeting_date", "N/A")  # Fixed missing variable
+            meeting_title = metadata.get("meeting_title", "Untitled") # Added title
             chunk_index = metadata.get("chunk_index", "?")
+            summary = metadata.get("summary", "N/A")
+            speakers = metadata.get("speaker_mapping", "N/A")
             result_parts.append(
                 f"\n--- Segment {i} ---\n"
+                f"Meeting: {meeting_title} (ID: {meeting_id})\n"
+                f"Date: {meeting_date}\n"
+                f"Summary: {summary}\n"
+                f"Speakers: {speakers}\n"
                 f"Chunk: {chunk_index}\n"
                 f"Content:\n{doc.page_content}\n"
             )
         # Get retriever with high k to fetch many documents
         retriever = _pinecone_manager.get_retriever(
             namespace=Config.PINECONE_NAMESPACE,
+            search_kwargs={"k": 500}  # Fetch many to find unique meetings
         )
         # Use a generic query to get documents
         return f"Error listing meetings: {str(e)}"
+@tool
+def get_current_time() -> str:
+    """
+    Get the current date and time.
+    Use this tool when you need to answer questions about relative time
+    (e.g., "what happened yesterday?", "meetings from last week?").
+    Returns:
+        Current date and time in YYYY-MM-DD HH:MM format
+    """
+    return datetime.now().strftime("%Y-%m-%d %H:%M")
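For relative-time questions, the timestamp returned above still has to be turned into a concrete date range before searching. A minimal sketch of that step, assuming "last week" means the previous Monday-to-Sunday calendar week; `last_week_range` is a hypothetical helper, and in the commit the agent is expected to do this calculation itself:

```python
from datetime import datetime, timedelta
from typing import Tuple


def last_week_range(now_text: str) -> Tuple[str, str]:
    """Return (monday, sunday) of the calendar week before `now_text`.

    `now_text` uses the same "YYYY-MM-DD HH:MM" format as get_current_time.
    """
    now = datetime.strptime(now_text, "%Y-%m-%d %H:%M")
    # weekday() is 0 for Monday, so this steps back to the current week's Monday.
    this_monday = now.date() - timedelta(days=now.weekday())
    start = this_monday - timedelta(days=7)
    end = this_monday - timedelta(days=1)
    return start.isoformat(), end.isoformat()


print(last_week_range("2025-12-06 10:30"))
```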
+@tool
+def import_notion_to_pinecone(query: str) -> str:
+    """
+    Directly import a Notion page to Pinecone by name.
+    Fetch a Notion page and save it TO Pinecone.
+    Use this tool ONLY when the user wants to *Import* or *Sync* a page FROM Notion INTO the database.
+    Do NOT use this tool to write content TO Notion. Use `create_notion_page` or `API-patch-block-children` for that.
+    This tool handles the entire process (Search -> Fetch Content -> Upsert) automatically.
+    Args:
+        query: The name of the Notion page to find (e.g., "Meeting 1").
+    Returns:
+        Status message indicating success or failure.
+    """
+    if not Config.NOTION_TOKEN:
+        return "❌ Error: NOTION_TOKEN not set in configuration."
+    headers = {
+        "Authorization": f"Bearer {Config.NOTION_TOKEN}",
+        "Notion-Version": "2022-06-28",
+        "Content-Type": "application/json"
+    }
+    def fetch_blocks_recursive(block_id: str, depth: int = 0) -> List[str]:
+        """Recursive helper to fetch blocks and their children."""
+        if depth > 5: # Safety limit for recursion depth
+            return []
+        collected_text = []
+        cursor = None
+        has_more = True
+        while has_more:
+            blocks_url = f"https://api.notion.com/v1/blocks/{block_id}/children"
+            params = {"page_size": 100}
+            if cursor:
+                params["start_cursor"] = cursor
+            resp = requests.get(blocks_url, headers=headers, params=params)
+            if resp.status_code != 200:
+                print(f"⚠️ Error fetching sub-blocks for {block_id}: {resp.text}")
+                return []
+            data = resp.json()
+            blocks = data.get("results", [])
+            for block in blocks:
+                # 1. Extract text from this block
+                b_type = block.get("type")
+                plain_text = ""
+                if b_type and block.get(b_type) and "rich_text" in block[b_type]:
+                    rich_text = block[b_type]["rich_text"]
+                    plain_text = "".join([t.get("plain_text", "") for t in rich_text])
+                # Append text if present
+                if plain_text.strip():
+                    collected_text.append(plain_text)
+                # 2. Check for children (Recursion)
+                if block.get("has_children", False):
+                    # Fetch children text and append
+                    children_text = fetch_blocks_recursive(block["id"], depth + 1)
+                    collected_text.extend(children_text)
+            has_more = data.get("has_more", False)
+            cursor = data.get("next_cursor")
+        return collected_text
+    try:
+        # 1. Search for the page
+        print(f"🔍 Searching Notion for: {query}...")
+        search_url = "https://api.notion.com/v1/search"
+        search_payload = {
+            "query": query,
+            "filter": {"value": "page", "property": "object"},
+            "sort": {"direction": "descending", "timestamp": "last_edited_time"},
+            "page_size": 25
+        }
+        response = requests.post(search_url, headers=headers, json=search_payload)
+        if response.status_code != 200:
+            return f"❌ Notion Search Error: {response.text}"
+        results = response.json().get("results", [])
+        if not results:
+            return f"❌ No Notion page found matching '{query}'."
+        # Select best match
+        best_page = None
+        exact_match = None
+        substring_match = None
+        for p in results:
+            # Extract title for this page
+            p_props = p.get("properties", {})
+            # Use .get("id") so a property without an "id" key cannot raise KeyError
+            p_title_prop = next((v for v in p_props.values() if v.get("id") == "title"), None)
+            p_title = ""
+            if p_title_prop and p_title_prop.get("title"):
+                p_title = "".join([t.get("plain_text", "") for t in p_title_prop.get("title", [])])
+            p_title_clean = p_title.lower().strip()
+            query_clean = query.lower().strip()
+            # Check 1: Exact Match
+            if p_title_clean == query_clean:
+                exact_match = p
+                print(f"✅ Exact match found: '{p_title}'")
+                break # Found the perfect match
+            # Check 2: Substring Match (save the first one found)
+            if query_clean in p_title_clean and substring_match is None:
+                substring_match = p
+                print(f"🔍 Substring match candidate: '{p_title}'")
+            # Print for debugging
+            print(f"   - Found result: '{p_title}'")
+        # Decide which page to use
+        if exact_match:
+            best_page = exact_match
+        elif substring_match:
+            best_page = substring_match
+            print("⚠️ Using substring match.")
+        else:
+            # Generate list of titles found to guide the user
+            titles_found = []
+            for p in results:
+                p_props = p.get("properties", {})
+                p_title_prop = next((v for v in p_props.values() if v.get("id") == "title"), None)
+                if p_title_prop and p_title_prop.get("title"):
+                    titles_found.append("".join([t.get("plain_text", "") for t in p_title_prop.get("title", [])]))
+            return f"❌ Could not find a specific match for '{query}'. Found these pages instead: {', '.join(titles_found)}. Please try again with the exact name."
+        page = best_page
+        page_id = page["id"]
+        # Re-extract title for the selected page for final usage
+        props = page.get("properties", {})
+        title_prop = next((v for v in props.values() if v.get("id") == "title"), None)
+        title = "Untitled"
+        if title_prop and title_prop.get("title"):
+            title = "".join([t.get("plain_text", "") for t in title_prop.get("title", [])])
+        print(f"📄 Found Page: '{title}' ({page_id})")
+        # 2. Recursive Fetch of All Content
+        all_text_lines = fetch_blocks_recursive(page_id)
+        if not all_text_lines:
+            return f"⚠️ Page '{title}' found but appears empty or has no text blocks."
+        full_content = "\n\n".join(all_text_lines)
+        # 3. Upsert to Pinecone
+        return upsert_text_to_pinecone.invoke({"text": full_content, "title": title, "source": "Notion"})
+    except Exception as e:
+        return f"❌ Import failed: {str(e)}"
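The page-selection logic in `import_notion_to_pinecone` prefers an exact title match and otherwise takes the first substring match. A minimal sketch of that rule on plain strings, assuming titles have already been extracted from the search results; `pick_best_title` is a hypothetical name used only for illustration:

```python
from typing import List, Optional


def pick_best_title(query: str, titles: List[str]) -> Optional[str]:
    """Prefer an exact case-insensitive match, else the first substring match."""
    query_clean = query.lower().strip()
    substring_match = None
    for title in titles:
        title_clean = title.lower().strip()
        if title_clean == query_clean:
            return title  # an exact match wins immediately
        if substring_match is None and query_clean in title_clean:
            substring_match = title  # remember only the first candidate
    return substring_match


# "Meeting 10" contains "meeting 1", but the exact match is still chosen.
print(pick_best_title("meeting 1", ["Meeting 10", "Meeting 1"]))
```

The exact-match check matters for numbered titles: without it, a query for "Meeting 1" could import "Meeting 10" simply because it appeared first in the results.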
 # Export all tools for easy import
 __all__ = [
     "initialize_tools",
     "search_meetings",
     "get_meeting_metadata",
     "list_recent_meetings",
+    "upsert_text_to_pinecone",
+    "import_notion_to_pinecone",
+    "create_notion_page",
+    "get_current_time"
 ]
+@tool
+def create_notion_page(title: str, content: str) -> str:
+    """
+    Create a new page in Notion with a Title and Text Content.
+    Use this tool for ANY request to "Write to Notion", "Save to Notion", "Create a page", "Draft an email in Notion".
+    This tool handles all the formatting automatically.
+    Args:
+        title: The title of the new page.
+        content: The text content of the page.
+    Returns:
+        Status message with link to the new page.
+    """
+    if not Config.NOTION_TOKEN:
+        return "❌ Error: NOTION_TOKEN not set."
+    headers = {
+        "Authorization": f"Bearer {Config.NOTION_TOKEN}",
+        "Notion-Version": "2022-06-28",
+        "Content-Type": "application/json"
+    }
+    # Split content into chunks of 2000 chars (Notion block limit)
+    chunks = [content[i:i+2000] for i in range(0, len(content), 2000)]
+    children_blocks = []
+    for chunk in chunks:
+        children_blocks.append({
+            "object": "block",
+            "type": "paragraph",
+            "paragraph": {
+                "rich_text": [{"type": "text", "text": {"content": chunk}}]
+            }
+        })
+    # Default parent page: Meetings Summary Test
+    parent_page_id = "2bc5a424-5cbb-80ec-8aa9-c4fd989e67bc"
+    payload = {
+        "parent": {"page_id": parent_page_id},
+        "properties": {
+            "title": [
+                {
+                    "text": {
+                        "content": title
+                    }
+                }
+            ]
+        },
+        "children": children_blocks
+    }
+    try:
+        url = "https://api.notion.com/v1/pages"
+        resp = requests.post(url, headers=headers, json=payload)
+        if resp.status_code == 200:
+            data = resp.json()
+            url = data.get('url', 'URL not found')
+            return f"✅ Successfully created Notion page: '{title}'.\nLink: {url}"
+        else:
+            return f"❌ Failed to create Notion page: {resp.status_code} - {resp.text}"
+    except Exception as e:
+        return f"❌ Error creating page: {str(e)}"
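The chunking step in `create_notion_page` can be isolated as a pure function, which makes the 2000-character split easy to check without calling the API. A minimal sketch under that assumption; `text_to_paragraph_blocks` and `NOTION_TEXT_LIMIT` are hypothetical names, and the limit value is the one used in the commit:

```python
from typing import Any, Dict, List

NOTION_TEXT_LIMIT = 2000  # chunk size used in the commit


def text_to_paragraph_blocks(content: str, limit: int = NOTION_TEXT_LIMIT) -> List[Dict[str, Any]]:
    """Split `content` into fixed-size chunks and wrap each in a paragraph block."""
    chunks = [content[i:i + limit] for i in range(0, len(content), limit)]
    return [
        {
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [{"type": "text", "text": {"content": chunk}}]},
        }
        for chunk in chunks
    ]


blocks = text_to_paragraph_blocks("a" * 4500)
print([len(b["paragraph"]["rich_text"][0]["text"]["content"]) for b in blocks])
```

Note that the split is purely positional, so a chunk boundary can fall mid-word, and empty content yields no blocks at all (the page would be created with a title only).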
 @tool
 def upsert_text_to_pinecone(text: str, title: str, source: str = "Manual Entry", date: str = None) -> str:
     """
     Upsert any text content (e.g., Notion pages, manual notes) to Pinecone.
+    Automatically extracts metadata (summary, date, speakers) from the text.
+    Use this tool when retrieving full content from Notion or other sources.
+    CRITICAL: Do NOT use this tool if the user wants to "Save to Notion" or "Create a Page".
+    Use `create_notion_page` for that. Use this ONLY for saving to Pinecone/Database.
     Args:
+        text: The FULL content to save (do not summarize!)
         title: Title of the document/meeting
         source: Source of the content (e.g., "Notion", "Manual Entry")
+        date: Optional date override (YYYY-MM-DD). If not provided, AI extracts it from text or uses today.
     Returns:
         Success message with the generated meeting_id
         return "Error: Pinecone service is not initialized."
     try:
+        # 1. Extract intelligent metadata
+        print(f"🔍 Extracting metadata for '{title}'...")
+        extractor = MetadataExtractor()
+        extracted = extractor.extract_metadata(text)
+        # 2. Resolve final metadata values
+        final_summary = extracted.get("summary") or f"Imported from {source}"
+        # Date logic: Argument > Extracted > Today
+        if date:
+            final_date = date
+        elif extracted.get("meeting_date"):
+            final_date = extracted.get("meeting_date")
+        else:
+            final_date = datetime.now().strftime("%Y-%m-%d")
+        speaker_mapping = extracted.get("speaker_mapping", {})
+        # 3. Apply speaker mapping to text (improves searchability)
+        # Replaces "SPEAKER_00" -> "Name" directly in the text content
+        processed_text = extractor.apply_speaker_mapping(text, speaker_mapping)
+        # 4. Generate ID and prepare metadata
+        meeting_id = f"doc_{uuid.uuid4().hex[:8]}"
         meeting_metadata = {
             "meeting_id": meeting_id,
+            "meeting_date": final_date,
             "date_transcribed": datetime.now().strftime("%Y-%m-%d"),
             "source": source,
+            "meeting_title": title,
+            "summary": final_summary,
             "source_file": f"{source.lower()}_upload",
             "transcription_model": "text_import",
+            "language": "en",
+            "speaker_mapping": speaker_mapping
         }
+        # 5. Process text into documents
         docs = process_transcript_to_documents(
+            transcript_text=processed_text,
+            speaker_data=None, # Uses fallback chunking
             meeting_id=meeting_id,
             meeting_metadata=meeting_metadata
         )
+        # 6. Upsert to Pinecone
         _pinecone_manager.upsert_documents(docs, namespace=Config.PINECONE_NAMESPACE)
+        return (f"✅ Successfully saved '{title}' to Pinecone (ID: {meeting_id})\n"
+                f"   - Date: {final_date}\n"
+                f"   - Extracted Speakers: {', '.join(speaker_mapping.values()) if speaker_mapping else 'None'}")
     except Exception as e:
         return f"❌ Error saving to Pinecone: {str(e)}"
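The date handling in the new `upsert_text_to_pinecone` follows the precedence "argument > extracted > today". A minimal sketch of that rule as a standalone function; `resolve_meeting_date` and its `today` parameter are hypothetical (the parameter exists only to make the fallback deterministic in examples):

```python
from datetime import datetime
from typing import Any, Mapping, Optional


def resolve_meeting_date(
    date: Optional[str],
    extracted: Mapping[str, Any],
    today: Optional[str] = None,
) -> str:
    """Apply the precedence: explicit argument > extracted value > today."""
    if date:
        return date
    if extracted.get("meeting_date"):
        return extracted["meeting_date"]
    # Fall back to the current date when nothing else is available.
    return today or datetime.now().strftime("%Y-%m-%d")


print(resolve_meeting_date("2025-12-01", {"meeting_date": "2025-11-30"}))
print(resolve_meeting_date(None, {"meeting_date": "2025-11-30"}))
print(resolve_meeting_date(None, {}, today="2025-12-06"))
```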

src/ui/gradio_app.py CHANGED

@@ -315,9 +315,9 @@ The agent will acknowledge your upload and help you analyze the meeting.
             for i, meeting in enumerate(meetings, 1):
                 meeting_id = meeting.get('meeting_id', 'N/A')
-                title = meeting.get('title', 'Untitled')
                 date = meeting.get('meeting_date', 'N/A')
-                duration = meeting.get('duration', 'N/A')
                 source_file = meeting.get('source_file', 'N/A')
                 table_md += f"| {i} | `{meeting_id}` | {title} | {date} | {duration} | {source_file} |\n"
@@ -375,7 +375,7 @@ The agent will acknowledge your upload and help you analyze the meeting.
                 # Custom Chatbot with responsive height
                 custom_chatbot = gr.Chatbot(
-                    height=650,
                     show_label=False
                 )
@@ -384,6 +384,16 @@ The agent will acknowledge your upload and help you analyze the meeting.
                     "Summarize the key decisions from the last meeting",
                     "What are the action items assigned to me?",
                     "List all meetings from October",
                     "Find discussions about 'budget' and 'costs'",
                     "What did John say about the deadline?",
                     "Draft a follow-up email based on this meeting",

             for i, meeting in enumerate(meetings, 1):
                 meeting_id = meeting.get('meeting_id', 'N/A')
+                title = meeting.get('meeting_title', 'Untitled')
                 date = meeting.get('meeting_date', 'N/A')
+                duration = meeting.get('meeting_duration', 'N/A')
                 source_file = meeting.get('source_file', 'N/A')
                 table_md += f"| {i} | `{meeting_id}` | {title} | {date} | {duration} | {source_file} |\n"
                 # Custom Chatbot with responsive height
                 custom_chatbot = gr.Chatbot(
+                    height="70vh",
                     show_label=False
                 )
                     "Summarize the key decisions from the last meeting",
                     "What are the action items assigned to me?",
                     "List all meetings from October",
+                    "Create a summary for sendout",
+                    # Self-created examples below; TODO: review
+                    "Show me the meeting minutes from last week",
+                    "Who were the attendees at last week's meeting?",
+                    "When was the last meeting where budget was discussed?",
+                    "Who is responsible for what in that meeting?",
+                    "What tasks have been assigned to whom?",
+                    "What should person abc do?",
                     "Find discussions about 'budget' and 'costs'",
                     "What did John say about the deadline?",
                     "Draft a follow-up email based on this meeting",