Timothy Eastridge committed
Commit 8648083 · 1 Parent(s): c3a1aee

full project scope
app_requirements/1_feature_KG_backend.txt CHANGED
@@ -1,13 +1,29 @@
- 1. Feature: Knowledge Graph Backend
-
- 1.1 Story: As a developer, I need a Dockerized Neo4j instance so that the graph runs in a portable, consistent environment.
-  1.1.1 Task: Dockerfile builds successfully
-  1.1.2 Task: Neo4j container starts with correct version
-  1.1.3 Task: Database accessible on localhost with default creds
-
- 1.2 Story: As a system, I need to ingest mock data into the graph so that it can be queried and tested.
-  1.2.1 Task: Sample CSV/JSON loaded into graph
-  1.2.2 Task: Nodes and relationships appear in Neo4j browser
-  1.2.3 Task: Queries return expected sample data
-
-
+ 1. Feature: Neo4j Knowledge Graph Core
+ 1.1 Story: As a developer, I need a flexible Neo4j deployment that serves as the central nervous system for all data and metadata.
+
+ 1.1.1 Task: Create Dockerfile for Neo4j Community Edition with APOC plugins
+ 1.1.2 Task: Configure environment variables for deployment modes (Docker/Enterprise/Aura)
+ 1.1.3 Task: Set up persistent volumes for graph data and backups
+ 1.1.4 Task: Implement connection pooling and retry logic
+ 1.1.5 Task: Create migration scripts from Community → Enterprise → Aura
+ 1.1.6 Task: Configure Neo4j for vector similarity search support
+
+ 1.2 Story: As a system, I need a comprehensive graph schema that models workflows, source systems, and their relationships.
+
+ 1.2.1 Task: Create operational nodes: Workflow, Phase, Instruction, Execution, Checkpoint, HumanIntervention, MonitoringQA
+ 1.2.2 Task: Create metadata nodes: SourceSystem, Database, Schema, Table, Column, DataType
+ 1.2.3 Task: Create knowledge nodes: CrossReference, SchemaVersion, SchemaChange, DataQuality, QueryTemplate
+ 1.2.4 Task: Implement all relationships with cardinality constraints
+ 1.2.5 Task: Add vector embedding properties for similarity search
+ 1.2.6 Task: Create composite indexes for query performance
+
+ 1.3 Story: As a system, I need automatic schema introspection and documentation generation.
+
+ 1.3.1 Task: Build meta-queries that extract complete graph structure
+ 1.3.2 Task: Generate JSON Schema from Neo4j model for API contracts
+ 1.3.3 Task: Create GraphQL schema from Neo4j structure
+ 1.3.4 Task: Auto-generate API documentation with example queries
+ 1.3.5 Task: Implement schema versioning with migration tracking
+ 1.3.6 Task: Cache schema with intelligent invalidation
+
+
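Task 1.1.4 above calls for connection pooling and retry logic. A minimal sketch of the retry half, assuming a generic callable stands in for a Neo4j driver session call (`flaky_query` and the error counts are illustrative, not part of any real driver API):

```python
import random
import time


def with_retry(fn, max_attempts=5, base_delay=0.5, retriable=(ConnectionError,)):
    """Call fn(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts:
                raise
            # Back off 0.5s, 1s, 2s, ... with a little jitter before retrying.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))


# Hypothetical stand-in for a driver call that fails twice, then succeeds.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return [{"name": "Workflow"}]


result = with_retry(flaky_query, base_delay=0.01)
```

With the real driver, `fn` would wrap a session acquisition from the pool; the backoff shape stays the same.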
app_requirements/2_feature_API_integration.txt CHANGED
@@ -1,17 +1,28 @@
- 2. Feature: External MCP Connector
-
- Story 2.1: As an external organization, I want an MCP connector so I can connect my own LLM to the app and query Neo4j without needing to know Cypher.
-  2.1.1 Task: Define MCP tool schema for external use (get_schema, query_graph, write_graph, run_workflow)
-  2.1.2 Task: Package connector with documentation for external deployment
-  2.1.3 Task: Provide authentication mechanism for external calls
-
- Story 2.2: As an external organization, I want the MCP connector to guarantee schema alignment so my LLM always receives accurate context for queries.
-  2.2.1 Task: Implement schema introspection endpoint in the MCP connector
-  2.2.2 Task: Ensure schema updates automatically reflect in exposed MCP capabilities
-  2.2.3 Task: Validate with external LLMs against test datasets
-
- Story 2.3: As a developer, I want observability hooks in the MCP connector so that I can monitor usage and troubleshoot external LLM calls.
-  2.3.1 Task: Log each connector request and response
-  2.3.2 Task: Record errors and failed queries for QA review
-  2.3.3 Task: Expose basic usage metrics for external organizations
-
+ 2. Feature: Unified MCP Server Hub
+ 2.1 Story: As a system, I need a central MCP server that orchestrates all interactions between agents, Neo4j, and external sources.
+
+ 2.1.1 Task: Define core MCP tools: get_schema, query_graph, write_graph, run_workflow
+ 2.1.2 Task: Add orchestration tools: get_next_instruction, update_instruction, checkpoint_workflow
+ 2.1.3 Task: Add source tools: discover_sources, query_source, refresh_schema, get_lineage
+ 2.1.4 Task: Implement authentication layers (JWT internal, API key external)
+ 2.1.5 Task: Create permission matrix for tool access by caller type
+ 2.1.6 Task: Build request router that directs calls to appropriate handlers
+
+ 2.2 Story: As an external consumer, I need safe, governed access to the knowledge graph and connected sources.
+
+ 2.2.1 Task: Implement query sanitization and parameterization
+ 2.2.2 Task: Add query cost estimation and limits
+ 2.2.3 Task: Create result pagination for large datasets
+ 2.2.4 Task: Build response caching with smart invalidation
+ 2.2.5 Task: Implement field-level access controls
+ 2.2.6 Task: Generate audit trail for all external access
+
+ 2.3 Story: As a developer, I need comprehensive observability across all MCP operations.
+
+ 2.3.1 Task: Create MCP_Log nodes with full request/response capture
+ 2.3.2 Task: Link logs to workflows, sources, and users
+ 2.3.3 Task: Track metrics: latency, data volume, token usage, error rates
+ 2.3.4 Task: Build real-time monitoring dashboard
+ 2.3.5 Task: Implement alerting for anomalies and failures
+ 2.3.6 Task: Create performance optimization recommendations
+
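Tasks 2.1.5 and 2.1.6 pair a permission matrix with a request router. One way to sketch that pairing, using tool names from the stories above; the caller types and handler bodies are placeholders, not a committed design:

```python
# Permission matrix: which caller type may invoke which MCP tool.
PERMISSIONS = {
    "internal_agent": {
        "get_schema", "query_graph", "write_graph", "run_workflow",
        "get_next_instruction", "update_instruction",
    },
    # External API-key callers get read-only, governed access (story 2.2).
    "external_api_key": {"get_schema", "query_graph"},
}

# Router table: tool name -> handler. Bodies are illustrative stubs.
HANDLERS = {
    "get_schema": lambda params: {"nodes": ["Workflow", "SourceSystem"]},
    "query_graph": lambda params: {"rows": []},
}


def route(caller_type, tool, params):
    """Reject calls the permission matrix forbids, then dispatch to the handler."""
    if tool not in PERMISSIONS.get(caller_type, set()):
        raise PermissionError(f"{caller_type} may not call {tool}")
    return HANDLERS[tool](params)
```

The matrix is checked before the handler lookup, so a forbidden tool never reaches its implementation.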
app_requirements/3_feature_agentic_reasoning_loop.txt CHANGED
@@ -1,20 +1,38 @@
- 3. Feature: Agentic Reasoning & Self-Learning (via Neo4j MCP Server)
-
- Story 3.1: As a system, I need to use the Neo4j MCP server for all interactions with the Neo4j database so that reasoning steps are structured and controlled.
-  3.1.1 Task: Route all schema discovery, query, and write operations through the Neo4j MCP server
-  3.1.2 Task: Confirm LLM cannot issue raw Cypher queries directly
-  3.1.3 Task: Validate structured MCP responses feed into reasoning loop
-
- Story 3.2: As a system, I need to iteratively refine my problem-solving by generating requirements, code, and QA steps, with each step documented in Neo4j as a node linked to the prior step so that a full audit trail of learning is preserved.
-  3.2.1 Task: Implement workflow for entity resolution using vector embeddings + LLM review of candidates (test objective)
-  3.2.2 Task: Generate a requirement (e.g., "link entity candidates") and write it into Neo4j as a node
-  3.2.3 Task: Link each requirement node to its predecessor node to preserve chain-of-thought traceability
-  3.2.4 Task: Write Python script to satisfy the requirement, execute it, and record the output in Neo4j as a result node
-  3.2.5 Task: Generate QA requirement, store as a new node, link it to the corresponding step, and implement QA script
-  3.2.6 Task: Run QA cycle; if unsatisfied, ideate new requirement nodes, link them to prior steps, and repeat the loop
-
- Story 3.3: As a developer, I need the agentic loop to pause for 5 minutes between steps so that a human can edit the Neo4j node instructions before the agent proceeds.
-  3.3.1 Task: Implement configurable delay (default = 5 minutes) between loop phases
-  3.3.2 Task: Allow human edits to Neo4j requirement nodes during the pause
-  3.3.3 Task: Ensure the agent re-reads the latest node state after the pause before executing the next step
-  3.3.4 Task: Log cycle timing and human edits in Neo4j for observability
+ 3. Feature: Intelligent Agent Orchestration Layer
+ 3.1 Story: As an agent, I need to operate entirely from graph-stored instructions for full auditability.
+
+ 3.1.1 Task: Implement instruction fetcher that queries Neo4j for next task
+ 3.1.2 Task: Load instruction context including parameters and dependencies
+ 3.1.3 Task: Check for human interventions that modify instructions
+ 3.1.4 Task: Update instruction status atomically with optimistic locking
+ 3.1.5 Task: Implement instruction timeout and retry logic
+ 3.1.6 Task: Validate workflow iteration limits before proceeding
+
+ 3.2 Story: As an agent, I need to execute complex multi-phase workflows with continuous learning.
+
+ 3.2.1 Task: Initialize workflows from templates or custom definitions
+ 3.2.2 Task: Generate requirement nodes by analyzing data sources
+ 3.2.3 Task: Create implementation plans based on available MCP tools
+ 3.2.4 Task: Execute code/queries and store results as Execution nodes
+ 3.2.5 Task: Run QA validations with configurable success criteria
+ 3.2.6 Task: Generate refinement instructions when QA fails
+ 3.2.7 Task: Update QueryTemplate nodes with successful patterns
+ 3.2.8 Task: Create checkpoints for workflow state recovery
+
+ 3.3 Story: As an operations team, I need human-in-the-loop controls for oversight and guidance.
+
+ 3.3.1 Task: Implement configurable pause points between phases
+ 3.3.2 Task: Create approval workflow for high-risk operations
+ 3.3.3 Task: Build real-time notification system for required approvals
+ 3.3.4 Task: Store all human edits as HumanIntervention nodes
+ 3.3.5 Task: Implement emergency stop with graceful state preservation
+ 3.3.6 Task: Add scheduled review points for long-running workflows
+
+ 3.4 Story: As an agent, I need LLM integration for reasoning, embedding generation, and natural language processing.
+
+ 3.4.1 Task: Create LLM abstraction layer supporting multiple providers
+ 3.4.2 Task: Implement secure credential management (vault/environment)
+ 3.4.3 Task: Generate and store embeddings for semantic search
+ 3.4.4 Task: Build similarity graph with SIMILAR_TO relationships
+ 3.4.5 Task: Track token usage and costs per workflow
+ 3.4.6 Task: Implement fallback strategies for LLM failures
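Task 3.1.4 specifies updating instruction status atomically with optimistic locking. In Neo4j this would be a conditional Cypher `SET` guarded by a version property; the same compare-and-swap idea can be sketched on a plain in-memory record (the store and field names here are illustrative):

```python
# In-memory stand-in for Instruction nodes; a real implementation would run
# a version-guarded Cypher update through the MCP server instead.
instructions = {"inst-1": {"status": "PENDING", "version": 1}}


def claim_instruction(inst_id, expected_version):
    """Move PENDING -> RUNNING only if nobody else bumped the version first."""
    node = instructions[inst_id]
    if node["version"] != expected_version or node["status"] != "PENDING":
        return False  # another agent (or a human edit) won the race
    node["status"] = "RUNNING"
    node["version"] += 1
    return True
```

A second agent retrying with the stale version sees `False` and re-fetches, which is exactly the behavior tasks 3.1.3 and 3.3.4 rely on when humans edit instructions mid-flight.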
app_requirements/4_feature_UI.txt DELETED
@@ -1,22 +0,0 @@
- 4. Feature: Front-End Chat Interface & Visualization
-
- Story 4.1: As a user, I want to enter questions into a chat interface so I can query my data in natural language without needing Cypher.
-  4.1.1 Task: Connect the chat input to the Neo4j MCP server so queries are routed through MCP
-  4.1.2 Task: Ensure MCP translates queries into Cypher and returns structured results
-  4.1.3 Task: Handle response errors and retries gracefully in the UI
-
- Story 4.2: As a user, I want responses displayed in clear natural language so that I can understand the results.
-  4.2.1 Task: Parse MCP responses into user-friendly text
-  4.2.2 Task: Highlight key details (nodes, relationships, counts) in the response
-  4.2.3 Task: Verify outputs with test queries for readability
-
- Story 4.3: As a user, I want to see supporting evidence from the graph (nodes, relationships) so that I can verify why an answer was given.
-  4.3.1 Task: Build R Shiny visualizations of graph substructures returned by Neo4j MCP server
-  4.3.2 Task: Link visual nodes and relationships directly to natural-language responses
-  4.3.3 Task: Allow user to toggle between text and graph visualization modes
-
- Story 4.4: As a user, I want to trigger domain-specific workflows (e.g., fraud detection, entity resolution) from the chat so that I can act on results.
-  4.4.1 Task: Add workflow trigger buttons in the R Shiny UI
-  4.4.2 Task: Ensure workflow triggers call Neo4j MCP server functions correctly
-  4.4.3 Task: Display confirmation and output of workflow execution in R Shiny dashboard
-
app_requirements/4_feature_source_system_repo.txt ADDED
@@ -0,0 +1,46 @@
+ 4. Feature: Source System Integration & Schema Repository
+ 4.1 Story: As a system, I need to connect to and catalog all available data sources through MCP.
+
+ 4.1.1 Task: Implement MCP client for PostgreSQL with full introspection
+ 4.1.2 Task: Implement MCP client for MySQL/MariaDB
+ 4.1.3 Task: Implement MCP client for MongoDB with schema inference
+ 4.1.4 Task: Implement MCP client for S3/filesystem with format detection
+ 4.1.5 Task: Implement MCP client for REST APIs with OpenAPI import
+ 4.1.6 Task: Create SourceSystem nodes with connection metadata
+
+ 4.2 Story: As an agent, I need to automatically discover and map data across all sources.
+
+ 4.2.1 Task: Run initial discovery to catalog all tables/collections/endpoints
+ 4.2.2 Task: Extract column-level metadata (types, constraints, statistics)
+ 4.2.3 Task: Identify primary/foreign keys and relationships
+ 4.2.4 Task: Sample data for profiling and example generation
+ 4.2.5 Task: Detect potential cross-source join keys
+ 4.2.6 Task: Generate and store example queries for each source
+
+ 4.3 Story: As an agent, I need to continuously monitor sources for changes.
+
+ 4.3.1 Task: Implement scheduled schema comparison workflows
+ 4.3.2 Task: Run lightweight heartbeat queries to detect changes
+ 4.3.3 Task: Create SchemaChange nodes when differences found
+ 4.3.4 Task: Assess impact of changes on existing workflows
+ 4.3.5 Task: Alert on breaking changes requiring attention
+ 4.3.6 Task: Update statistics and samples periodically
+
+ 4.4 Story: As an agent, I need to intelligently route queries to appropriate sources.
+
+ 4.4.1 Task: Parse user questions for entity and domain references
+ 4.4.2 Task: Match entities to source tables using schema repository
+ 4.4.3 Task: Generate source-specific queries via MCP
+ 4.4.4 Task: Create QueryPlan nodes showing execution strategy
+ 4.4.5 Task: Execute parallel queries when multiple sources needed
+ 4.4.6 Task: Merge and reconcile results from multiple sources
+
+ 4.5 Story: As a system, I need to track data lineage and dependencies.
+
+ 4.5.1 Task: Create lineage relationships between source and derived data
+ 4.5.2 Task: Store transformation logic as nodes
+ 4.5.3 Task: Build impact analysis queries
+ 4.5.4 Task: Generate data flow documentation
+ 4.5.5 Task: Identify redundant or conflicting data sources
+ 4.5.6 Task: Recommend source consolidation opportunities
+
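Task 4.2.5 asks the agent to detect potential cross-source join keys. A naive first-pass heuristic, assuming each source's catalog is a list of (column name, type) pairs; the source and column names below are made up for illustration, and real detection would also profile value overlap:

```python
# Illustrative catalogs as the discovery workflow (4.2.1/4.2.2) might store them.
catalogs = {
    "postgres_crm": [("customer_id", "int"), ("email", "text")],
    "mongo_orders": [("customer_id", "int"), ("order_total", "float")],
}


def candidate_join_keys(catalogs):
    """Flag columns whose (name, type) pair appears in more than one source."""
    seen = {}
    for source, columns in catalogs.items():
        for name, dtype in columns:
            seen.setdefault((name, dtype), []).append(source)
    return {key: srcs for key, srcs in seen.items() if len(srcs) > 1}
```

Candidates found this way would feed the CrossReference nodes and the query planner of story 4.4 after validation against sampled data.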
app_requirements/5_feature_UI.txt ADDED
@@ -0,0 +1,45 @@
+ 5. Feature: Next.js Intelligent Frontend
+ 5.1 Story: As a user, I need a modern web interface to interact with the system.
+
+ 5.1.1 Task: Set up Next.js with TypeScript, Tailwind, and shadcn/ui
+ 5.1.2 Task: Implement tRPC for type-safe API communication
+ 5.1.3 Task: Add WebSocket support for real-time updates
+ 5.1.4 Task: Create Zustand stores for state management
+ 5.1.5 Task: Implement NextAuth with role-based access
+ 5.1.6 Task: Build responsive layout with dark mode
+
+ 5.2 Story: As a user, I need natural language interaction with intelligent query routing.
+
+ 5.2.1 Task: Create chat interface with context awareness
+ 5.2.2 Task: Display query routing decisions and source selection
+ 5.2.3 Task: Show real-time execution progress through sources
+ 5.2.4 Task: Present unified results with source attribution
+ 5.2.5 Task: Highlight confidence scores and data quality
+ 5.2.6 Task: Implement follow-up question suggestions
+
+ 5.3 Story: As a user, I need to visualize and explore the knowledge graph and data relationships.
+
+ 5.3.1 Task: Integrate Cytoscape.js for large graph exploration
+ 5.3.2 Task: Implement React Flow for workflow building
+ 5.3.3 Task: Create schema browser with source system navigation
+ 5.3.4 Task: Build lineage visualization showing data flow
+ 5.3.5 Task: Add search and filter capabilities
+ 5.3.6 Task: Implement node/edge inspection panels
+
+ 5.4 Story: As a user, I need to monitor and control workflow execution.
+
+ 5.4.1 Task: Create workflow dashboard with status overview
+ 5.4.2 Task: Build approval queue for pending instructions
+ 5.4.3 Task: Implement instruction editor with validation
+ 5.4.4 Task: Add execution timeline with phase progress
+ 5.4.5 Task: Create audit trail viewer
+ 5.4.6 Task: Build performance analytics dashboard
+
+ 5.5 Story: As a user, I need to manage data sources and their schemas.
+
+ 5.5.1 Task: Create source system configuration interface
+ 5.5.2 Task: Build schema change notification center
+ 5.5.3 Task: Implement data quality monitoring dashboard
+ 5.5.4 Task: Add query performance analytics by source
+ 5.5.5 Task: Create cross-source entity mapping tool
+ 5.5.6 Task: Build source health status monitor
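Task 5.2.4 wants unified results with source attribution. Backend-side, that can be as simple as tagging every merged row with the source it came from before the UI renders it; a minimal sketch with made-up source names and rows:

```python
def unify(results_by_source):
    """Flatten per-source result sets into one list, tagging each row's origin."""
    unified = []
    for source, rows in results_by_source.items():
        for row in rows:
            # The "_source" key is an assumed convention for UI attribution.
            unified.append({**row, "_source": source})
    return unified


rows = unify({
    "postgres_crm": [{"customer_id": 1}],
    "mongo_orders": [{"customer_id": 1, "order_total": 9.5}],
})
```

The chat UI can then group or footnote rows by `_source`, which also gives task 5.2.5 a hook for per-source confidence display.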
app_requirements/5_feature_deployment.txt DELETED
@@ -1,21 +0,0 @@
- 5. Feature: Deployment & Operations
-
- 5.1 Story: As a developer, I need a Docker Compose configuration so that the app can be deployed as a single package.
-  5.1.1 Task: Compose file includes Neo4j, API, and UI containers
-  5.1.2 Task: docker-compose up starts all services
-  5.1.3 Task: All services communicate correctly
-
- 5.2 Story: As a system, I need monitoring and logs for the containers so that issues can be quickly diagnosed.
-  5.2.1 Task: Logs accessible from host machine
-  5.2.2 Task: Health checks for each container
-  5.2.3 Task: Alerts triggered on container failure
-
- 5.3 Story: As a developer, I need CI/CD integration so deployments are automated and reliable.
-  5.3.1 Task: GitHub/GitLab pipeline runs tests on push
-  5.3.2 Task: Successful build auto-deploys to staging
-  5.3.3 Task: Failed build blocks deployment
-
- 5.4 Story: As a system, I need role-based access control so customer data stays private and secure.
-  5.4.1 Task: Users authenticated before accessing data
-  5.4.2 Task: Different roles tested (admin, user, read-only)
-  5.4.3 Task: Unauthorized requests blocked
app_requirements/6_feature_MCP.txt DELETED
@@ -1,13 +0,0 @@
- 6. Feature: Neo4j MCP Server Integration (Middleware Layer)
-
- Story 6.1: As a system, I need the Neo4j MCP server to expose graph capabilities (schema, query, write, workflow) so the LLM can interact with Neo4j safely.
-  6.1.1 Task: Configure Neo4j MCP server with available tools (get_schema, query_graph, write_graph, run_workflow)
-  6.1.2 Task: Implement adapters or extensions to map MCP calls into Cypher queries
-  6.1.3 Task: Ensure schema metadata is exposed in JSON so the LLM understands available entities/relationships
-  6.1.4 Task: Test end-to-end MCP calls against a sample Neo4j instance
-
- Story 6.2: As a developer, I need the Neo4j MCP server to write logs into Neo4j so that queries, results, and errors can be monitored directly within the graph.
-  6.2.1 Task: Design log schema in Neo4j (nodes/relationships for queries, responses, errors)
-  6.2.2 Task: Configure MCP server to persist function call logs into Neo4j
-  6.2.3 Task: Store execution time, query text, and results summary in logs
-  6.2.4 Task: Enable querying of logs via Cypher for observability dashboards
app_requirements/6_feature_QA.txt ADDED
@@ -0,0 +1,27 @@
+ 6. Feature: Testing, Quality & Learning
+ 6.1 Story: As a developer, I need comprehensive testing across all system layers.
+
+ 6.1.1 Task: Unit tests for MCP server and source clients
+ 6.1.2 Task: Integration tests for Neo4j operations
+ 6.1.3 Task: End-to-end tests for complete workflows
+ 6.1.4 Task: Test schema change detection and handling
+ 6.1.5 Task: Validate cross-source query execution
+ 6.1.6 Task: Load test concurrent workflow execution
+
+ 6.2 Story: As a system, I need to continuously improve through learning from operations.
+
+ 6.2.1 Task: Analyze query patterns to optimize routing
+ 6.2.2 Task: Learn entity relationships from successful joins
+ 6.2.3 Task: Identify and cache frequently accessed data
+ 6.2.4 Task: Generate new workflow templates from patterns
+ 6.2.5 Task: Recommend schema optimizations
+ 6.2.6 Task: Build anomaly detection for data quality
+
+ 6.3 Story: As an operations team, I need monitoring of system health and effectiveness.
+
+ 6.3.1 Task: Track QA pass rates by workflow type
+ 6.3.2 Task: Monitor source system response times
+ 6.3.3 Task: Measure human intervention frequency
+ 6.3.4 Task: Analyze workflow completion rates
+ 6.3.5 Task: Create SLA compliance reports
+ 6.3.6 Task: Generate daily operations summary
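Task 6.3.1 tracks QA pass rates by workflow type. Assuming execution records can be reduced to (workflow type, passed) pairs, the aggregation is a small fold; the workflow-type names below are illustrative:

```python
from collections import defaultdict


def qa_pass_rates(records):
    """Aggregate (workflow_type, passed) records into a pass rate per type."""
    totals = defaultdict(lambda: [0, 0])  # type -> [passed count, total count]
    for wf_type, passed in records:
        totals[wf_type][0] += int(passed)
        totals[wf_type][1] += 1
    return {t: passed / total for t, (passed, total) in totals.items()}


records = [
    ("entity_resolution", True),
    ("entity_resolution", False),
    ("schema_sync", True),
    ("schema_sync", True),
]
rates = qa_pass_rates(records)
```

The same shape works for tasks 6.3.3 and 6.3.4 by swapping the boolean for an intervention flag or completion flag.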
app_requirements/7_feature_deployment.md ADDED
@@ -0,0 +1,27 @@
+ 7. Feature: Deployment & Operations
+ 7.1 Story: As a developer, I need containerized deployment with production readiness.
+
+ 7.1.1 Task: Create multi-stage Docker builds for all services
+ 7.1.2 Task: Write Docker Compose for local development
+ 7.1.3 Task: Create Kubernetes manifests for production
+ 7.1.4 Task: Implement health checks and readiness probes
+ 7.1.5 Task: Configure resource limits and auto-scaling
+ 7.1.6 Task: Set up distributed tracing with OpenTelemetry
+
+ 7.2 Story: As an operations team, I need security and compliance controls.
+
+ 7.2.1 Task: Implement RBAC with fine-grained permissions
+ 7.2.2 Task: Add data encryption at rest and in transit
+ 7.2.3 Task: Create data masking for sensitive fields
+ 7.2.4 Task: Build compliance audit reports
+ 7.2.5 Task: Implement secret rotation for credentials
+ 7.2.6 Task: Add penetration testing to CI/CD
+
+ 7.3 Story: As a developer, I need automated CI/CD with quality gates.
+
+ 7.3.1 Task: Set up GitHub Actions for automated testing
+ 7.3.2 Task: Add static analysis and security scanning
+ 7.3.3 Task: Implement database migration automation
+ 7.3.4 Task: Create blue-green deployment strategy
+ 7.3.5 Task: Add automated rollback on failures
+ 7.3.6 Task: Set up monitoring with Prometheus/Grafana
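Task 7.1.4 covers health checks and readiness probes. The core of a readiness endpoint is just aggregating per-dependency checks into one verdict; in this sketch the checks are stand-in callables (a real probe would ping Neo4j, the MCP server, and so on, with Kubernetes polling the HTTP endpoint that exposes this result):

```python
def readiness(checks):
    """Run each named dependency check; ready only if every check passes."""
    results = {name: check() for name, check in checks.items()}
    return {"ready": all(results.values()), "checks": results}


# Hypothetical dependency names; each lambda stands in for a real probe.
status = readiness({"neo4j": lambda: True, "mcp_server": lambda: True})
```

Returning the per-check breakdown alongside the boolean makes failures diagnosable from the probe response itself, which supports the alerting task in story 5.2 of the removed deployment file and its replacement here.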