---
title: Automated Task Manager
emoji: π
colorFrom: pink
colorTo: pink
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Extract tasks from Gmail with AI-powered GraphRAG
license: mit
---
# Automated Task Manager

A production-ready, graph-aware reasoning assistant for task understanding, recommendations, and trustworthy GPT answers, grounded in structured user-uploaded email data with persistent storage.
## Latest Major Upgrades
- Hybrid QA Pipeline: GraphRAG (topic-centered retrieval) + ChainQA (LLM answer synthesis) + RAGAS (automated answer evaluation)
- LangSmith Integration: End-to-end tracing, debugging, and evaluation for every Q&A session
- Neo4j Graph Database: Enterprise-grade graph storage for performance and scalability
- Neon PostgreSQL: Persistent database storage for emails and tasks with serverless scaling
- OpenAI Embeddings: `text-embedding-3-small` for superior semantic search
- Production Architecture: full ETL pipeline with ACID compliance, data integrity, and horizontal scaling
## Project Structure
The project is organized into modular components for database storage, graph operations, reasoning, LangGraph workflows, and Streamlit UI delivery:
| File | Purpose |
|---|---|
| **Core Database & Graph** | |
| `utils/database.py` | PostgreSQL operations for persistent email/task storage (Neon-optimized) |
| `utils/neo4j_graph_writer.py` | Converts extracted task JSON to the Neo4j graph database |
| `utils/neo4j_graphrag.py` | Neo4j-powered GraphRAG query system with OpenAI embeddings |
| **Email Processing** | |
| `utils/email_parser.py` | Parses Gmail Takeout `.mbox` files into a structured email DataFrame |
| `utils/embedding.py` | OpenAI `text-embedding-3-small` + FAISS index creation |
| **AI Pipeline** | |
| `utils/prompt_template.py` | GPT prompt templates for reasoning and extraction |
| `utils/langgraph_nodes.py` | LangGraph node definitions for each pipeline step |
| `utils/langgraph_dag.py` | Defines DAGs: agent chat and email-to-graph extraction |
| **User Interface** | |
| `app.py` | Streamlit entry point: upload `.mbox`, run the extraction pipeline |
| `pages/My_Calendar.py` | Monthly calendar view of extracted tasks |
| `pages/AI_Chatbot.py` | Neo4j-powered chatbot interface for graph-based QA |
| **Configuration** | |
| `requirements.txt` | Python dependencies (OpenAI, Neo4j, PostgreSQL, LangSmith, RAGAS) |
| `.env` | Environment variables (`DATABASE_URL`, `NEO4J_URI`, `OPENAI_API_KEY`, `LANGCHAIN_API_KEY`) |
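As a rough sketch of what `utils/email_parser.py` does (the actual implementation may differ), Gmail Takeout `.mbox` files can be read with Python's standard `mailbox` module. `parse_mbox` below is a hypothetical helper, not the project's code:

```python
import mailbox
from email.header import decode_header, make_header

def parse_mbox(path):
    """Parse a Gmail Takeout .mbox file into a list of structured records."""
    rows = []
    for msg in mailbox.mbox(path):
        # Decode RFC 2047 encoded headers (e.g. non-ASCII subjects).
        subject = str(make_header(decode_header(msg.get("Subject", ""))))
        body = ""
        if msg.is_multipart():
            # Take the first text/plain part as the body.
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    body = part.get_payload(decode=True).decode(errors="replace")
                    break
        else:
            payload = msg.get_payload(decode=True)
            body = payload.decode(errors="replace") if payload else ""
        rows.append({
            "from": msg.get("From", ""),
            "to": msg.get("To", ""),
            "date": msg.get("Date", ""),
            "subject": subject,
            "body": body.strip(),
        })
    return rows
```

The resulting records can then be loaded into a DataFrame (e.g. `pd.DataFrame(rows)`) for filtering and storage.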
## Hybrid QA Pipeline (GraphRAG + ChainQA + RAGAS)
Each Q&A session uses a hybrid pipeline:
- GraphRAG provides graph-based retrieval for structure and grounding (using OpenAI embeddings and topic expansion).
- ChainQA (LLM-based reasoning) synthesizes a fluent, explainable answer from the retrieved graph context.
- RAGAS evaluates the answer's quality against the retrieved context and user query, using metrics such as faithfulness and context recall. This evaluation runs automatically after each answer-generation step (as long as the question, answer, and context are formatted correctly).
This approach replaces Cypher-generating QA with graph-grounded semantic retrieval (GraphRAG) and LLM answer synthesis (ChainQA).
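In outline, each Q&A turn composes three stages. The sketch below is illustrative rather than the project's actual code; the three callables are assumptions standing in for the GraphRAG retriever, the ChainQA LLM chain, and the RAGAS evaluator:

```python
def hybrid_qa(question, retrieve, synthesize, evaluate):
    """Run one hybrid Q&A turn: retrieve -> synthesize -> evaluate.

    retrieve(question)                  -> list[str]  graph-grounded context passages
    synthesize(question, context)       -> str        LLM-generated answer
    evaluate(question, answer, context) -> dict       RAGAS-style scores
    """
    context = retrieve(question)                  # GraphRAG: topic-centered retrieval
    answer = synthesize(question, context)        # ChainQA: LLM answer synthesis
    scores = evaluate(question, answer, context)  # RAGAS: automated evaluation
    return {"answer": answer, "context": context, "scores": scores}
```

Injecting the stages as callables keeps the orchestration testable with stubs before wiring in Neo4j, the LLM, and RAGAS.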
## Streamlined UI

All Q&A now goes through the hybrid pipeline above. The interface exposes only two actions, "Load Emails" and "Extract Tasks with AI"; the separate "Parse Email" button has been removed.
## RAGAS Evaluation

RAGAS scores each answer against the retrieved context and user query, using metrics like faithfulness and context recall. Note: RAGAS requires the question, answer, and contexts to be formatted and passed correctly; otherwise scores may be missing or inaccurate.
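For reference, RAGAS expects each sample as a record containing the question, the generated answer, and the retrieved contexts as a list of strings. A minimal sketch of shaping that record (`build_ragas_sample` is a hypothetical helper; the field names follow the ragas library's conventions):

```python
def build_ragas_sample(question, answer, contexts):
    """Shape one Q&A turn into the record format RAGAS evaluates.

    contexts must be a list of strings, even for a single passage;
    passing a bare string is a common cause of missing scores.
    """
    if isinstance(contexts, str):
        contexts = [contexts]
    return {
        "question": question,
        "answer": answer,
        "contexts": list(contexts),
    }
```

A list of such records can be wrapped in a `datasets.Dataset` and passed to `ragas.evaluate` with metrics like faithfulness; context recall additionally needs a reference (ground-truth) field.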
## LangSmith Integration (Tracing & Evaluation)

LangSmith lets you trace, debug, and evaluate every Q&A session:

1. Sign up at smith.langchain.com and get your API key
2. Install the LangSmith SDK:

   ```bash
   pip install langsmith
   ```

3. Set environment variables:

   ```bash
   LANGCHAIN_TRACING_V2=true
   LANGCHAIN_API_KEY=your-langsmith-api-key
   LANGCHAIN_PROJECT=your-project-name
   ```

4. Tracing is automatic for all LangChain chains, retrievers, and LLM calls. For custom code, use:

   ```python
   from langsmith import traceable

   @traceable(name="HybridQAChain")
   def hybrid_qa_chain(user_query, graph_context):
       # ... LLM and RAGAS logic ...
       return answer, ragas_scores
   ```

5. View traces and RAGAS scores in your LangSmith dashboard for every user question.
## Full ETL Workflow

1. Upload: the user uploads an `Inbox.mbox` file (extracted from Gmail Takeout)
2. Smart Email Filtering: apply intelligent filters (date range, keywords, content length, etc.)
3. Clean & Normalize: parse emails, sanitize content, and store in Neon PostgreSQL
4. Embed & Index: create OpenAI vector embeddings and store them in a FAISS index for semantic search
5. LLM Extraction: extract tasks, people, dates, etc. as structured JSON
6. Human-in-the-Loop Validation (if needed): the user can review and correct extracted tasks
7. Persist: validated tasks and relationships are stored in PostgreSQL and Neo4j
8. Answer: all Q&A runs through the hybrid GraphRAG + ChainQA + RAGAS pipeline for reliable, explainable answers
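To illustrate the "store in Neo4j" step, extracted task JSON can be converted into parameterized Cypher `MERGE` statements. This is a hypothetical sketch, not the actual `utils/neo4j_graph_writer.py`, and the task schema (`title`, `person`, `due_date`) is assumed for illustration:

```python
def task_to_cypher(task):
    """Convert one extracted task dict into a (query, params) pair.

    MERGE keeps re-runs idempotent: nodes and relationships are
    created only if they do not already exist.
    """
    query = (
        "MERGE (t:Task {title: $title}) "
        "SET t.due_date = $due_date "
        "MERGE (p:Person {name: $person}) "
        "MERGE (p)-[:ASSIGNED_TO]->(t)"
    )
    params = {
        "title": task["title"],
        "due_date": task.get("due_date"),
        "person": task["person"],
    }
    return query, params
```

With the official `neo4j` Python driver, each pair would then be executed as `session.run(query, params)` inside a transaction.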
## Technical Architecture
- GraphRAG: Topic-centered retrieval from Neo4j using OpenAI embeddings
- ChainQA: LLM (GPT-4) generates answers from graph context
- RAGAS: Automated evaluation of answer quality (faithfulness, context recall, etc.)
- LangSmith: Tracing, debugging, and experiment tracking for every Q&A session
### Data Flow

```
Gmail Takeout → .mbox Upload → Smart Filtering → Email Parsing → Neon PostgreSQL (parsed_email) →
OpenAI Embedding → FAISS Index → LLM Extraction → HITL Validation →
PostgreSQL (tasks) + Neo4j Graph → GraphRAG Retrieval → ChainQA Answer → RAGAS Evaluation → UI
```
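The embedding and retrieval stage can be sketched as follows. `embed_texts` assumes the OpenAI Python SDK v1 client interface with `text-embedding-3-small` as used in this project; the nearest-neighbor search is shown as plain cosine similarity, standing in for the FAISS index:

```python
import math

def embed_texts(client, texts, model="text-embedding-3-small"):
    """Batch-embed texts with the OpenAI embeddings endpoint."""
    resp = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in resp.data]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index_vecs, k=3):
    """Return indices of the k most similar vectors (FAISS stand-in)."""
    ranked = sorted(range(len(index_vecs)),
                    key=lambda i: cosine(query_vec, index_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In production, FAISS replaces the brute-force `top_k` with an approximate index so retrieval stays fast as the email corpus grows.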
## Environment Setup

### Required Environment Variables

Create a `.env` file in your project root:

```bash
# OpenAI Configuration (Required)
OPENAI_API_KEY=sk-your_openai_api_key_here

# Neon PostgreSQL (Required)
DATABASE_URL=postgresql://username:password@ep-xxx-xxx.us-east-1.aws.neon.tech/neondb?sslmode=require

# Neo4j Configuration (Required)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password

# LangSmith (Optional but recommended)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=your-project-name
```
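A small startup check can fail fast when required variables are missing. A sketch using only the standard library (`check_env` is a hypothetical helper; the variable names match the `.env` above):

```python
import os

REQUIRED = ["OPENAI_API_KEY", "DATABASE_URL", "NEO4J_URI",
            "NEO4J_USERNAME", "NEO4J_PASSWORD"]

def check_env(env=os.environ):
    """Return the required variables that are missing or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

Calling this at app startup lets the Streamlit UI surface missing configuration before the pipeline runs.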
## Troubleshooting

- Database Connection Issues: check the logs for PostgreSQL, Neo4j, or OpenAI error messages
- LangSmith Issues: ensure your API key and project name are set; see smith.langchain.com
- RAGAS Evaluation: if RAGAS scores are missing, check your Python environment and logs for errors
- File Upload: only `.mbox` files are supported; files up to 200 MB are processed
## System Benefits
- Reliable, explainable answers: All Q&A is grounded in graph data, with automated evaluation
- Traceable and debuggable: Every Q&A session is logged and traceable in LangSmith
- Production-ready: Enterprise databases, scalable architecture, and robust error handling
- Easy to use: Streamlined UI, clear workflow, and persistent storage
## Ready to Deploy

Your automated task manager is now production-ready with:

- ✅ Hybrid QA pipeline (GraphRAG + ChainQA + RAGAS)
- ✅ LangSmith tracing and evaluation
- ✅ Enterprise-grade databases (Neo4j + Neon PostgreSQL)
- ✅ Superior AI embeddings (OpenAI `text-embedding-3-small`)
- ✅ Persistent data storage
- ✅ Scalable architecture
- ✅ Professional deployment

Perfect for: teams, consultants, project managers, and anyone who needs to extract actionable insights from email data with enterprise-grade reliability and AI-powered intelligence.