Timothy Eastridge committed
Commit 8648083 · 1 Parent(s): c3a1aee

full project scope
app_requirements/1_feature_KG_backend.txt CHANGED
@@ -1,13 +1,29 @@
- 1. Feature: Knowledge Graph Backend
-
- 1.1 Story: As a developer, I need a Dockerized Neo4j instance so that the graph runs in a portable, consistent environment.
-  1.1.1 Task: Dockerfile builds successfully
-  1.1.2 Task: Neo4j container starts with correct version
-  1.1.3 Task: Database accessible on localhost with default creds
-
- 1.2 Story: As a system, I need to ingest mock data into the graph so that it can be queried and tested.
-  1.2.1 Task: Sample CSV/JSON loaded into graph
-  1.2.2 Task: Nodes and relationships appear in Neo4j browser
-  1.2.3 Task: Queries return expected sample data
-
-
+ 1. Feature: Neo4j Knowledge Graph Core
+ 1.1 Story: As a developer, I need a flexible Neo4j deployment that serves as the central nervous system for all data and metadata.
+
+ 1.1.1 Task: Create Dockerfile for Neo4j Community Edition with APOC plugins
+ 1.1.2 Task: Configure environment variables for deployment modes (Docker/Enterprise/Aura)
+ 1.1.3 Task: Set up persistent volumes for graph data and backups
+ 1.1.4 Task: Implement connection pooling and retry logic
+ 1.1.5 Task: Create migration scripts from Community → Enterprise → Aura
+ 1.1.6 Task: Configure Neo4j for vector similarity search support
+
+ 1.2 Story: As a system, I need a comprehensive graph schema that models workflows, source systems, and their relationships.
+
+ 1.2.1 Task: Create operational nodes: Workflow, Phase, Instruction, Execution, Checkpoint, HumanIntervention, MonitoringQA
+ 1.2.2 Task: Create metadata nodes: SourceSystem, Database, Schema, Table, Column, DataType
+ 1.2.3 Task: Create knowledge nodes: CrossReference, SchemaVersion, SchemaChange, DataQuality, QueryTemplate
+ 1.2.4 Task: Implement all relationships with cardinality constraints
+ 1.2.5 Task: Add vector embedding properties for similarity search
+ 1.2.6 Task: Create composite indexes for query performance
+
+ 1.3 Story: As a system, I need automatic schema introspection and documentation generation.
+
+ 1.3.1 Task: Build meta-queries that extract complete graph structure
+ 1.3.2 Task: Generate JSON Schema from Neo4j model for API contracts
+ 1.3.3 Task: Create GraphQL schema from Neo4j structure
+ 1.3.4 Task: Auto-generate API documentation with example queries
+ 1.3.5 Task: Implement schema versioning with migration tracking
+ 1.3.6 Task: Cache schema with intelligent invalidation
+
+
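Task 1.1.4 above calls for connection pooling and retry logic. A minimal sketch of the retry half, assuming a generic callable stands in for a Neo4j driver session call (`flaky_query` and the error counts are illustrative, not part of any real driver API):

```python
import random
import time


def with_retry(fn, max_attempts=5, base_delay=0.5, retriable=(ConnectionError,)):
    """Call fn(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts:
                raise
            # Back off 0.5s, 1s, 2s, ... with a little jitter before retrying.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))


# Hypothetical stand-in for a driver call that fails twice, then succeeds.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return [{"name": "Workflow"}]


result = with_retry(flaky_query, base_delay=0.01)
```

With the real driver, `fn` would wrap a session acquisition from the pool; the backoff shape stays the same.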
app_requirements/2_feature_API_integration.txt CHANGED
@@ -1,17 +1,28 @@
- 2. Feature: External MCP Connector
-
- Story 2.1: As an external organization, I want an MCP connector so I can connect my own LLM to the app and query Neo4j without needing to know Cypher.
-  2.1.1 Task: Define MCP tool schema for external use (get_schema, query_graph, write_graph, run_workflow)
-  2.1.2 Task: Package connector with documentation for external deployment
-  2.1.3 Task: Provide authentication mechanism for external calls
-
- Story 2.2: As an external organization, I want the MCP connector to guarantee schema alignment so my LLM always receives accurate context for queries.
-  2.2.1 Task: Implement schema introspection endpoint in the MCP connector
-  2.2.2 Task: Ensure schema updates automatically reflect in exposed MCP capabilities
-  2.2.3 Task: Validate with external LLMs against test datasets
-
- Story 2.3: As a developer, I want observability hooks in the MCP connector so that I can monitor usage and troubleshoot external LLM calls.
-  2.3.1 Task: Log each connector request and response
-  2.3.2 Task: Record errors and failed queries for QA review
-  2.3.3 Task: Expose basic usage metrics for external organizations
-
+ 2. Feature: Unified MCP Server Hub
+ 2.1 Story: As a system, I need a central MCP server that orchestrates all interactions between agents, Neo4j, and external sources.
+
+ 2.1.1 Task: Define core MCP tools: get_schema, query_graph, write_graph, run_workflow
+ 2.1.2 Task: Add orchestration tools: get_next_instruction, update_instruction, checkpoint_workflow
+ 2.1.3 Task: Add source tools: discover_sources, query_source, refresh_schema, get_lineage
+ 2.1.4 Task: Implement authentication layers (JWT internal, API key external)
+ 2.1.5 Task: Create permission matrix for tool access by caller type
+ 2.1.6 Task: Build request router that directs calls to appropriate handlers
+
+ 2.2 Story: As an external consumer, I need safe, governed access to the knowledge graph and connected sources.
+
+ 2.2.1 Task: Implement query sanitization and parameterization
+ 2.2.2 Task: Add query cost estimation and limits
+ 2.2.3 Task: Create result pagination for large datasets
+ 2.2.4 Task: Build response caching with smart invalidation
+ 2.2.5 Task: Implement field-level access controls
+ 2.2.6 Task: Generate audit trail for all external access
+
+ 2.3 Story: As a developer, I need comprehensive observability across all MCP operations.
+
+ 2.3.1 Task: Create MCP_Log nodes with full request/response capture
+ 2.3.2 Task: Link logs to workflows, sources, and users
+ 2.3.3 Task: Track metrics: latency, data volume, token usage, error rates
+ 2.3.4 Task: Build real-time monitoring dashboard
+ 2.3.5 Task: Implement alerting for anomalies and failures
+ 2.3.6 Task: Create performance optimization recommendations
+
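Tasks 2.1.5 and 2.1.6 pair a permission matrix with a request router. One way to sketch that pairing, using tool names from the stories above; the caller types and handler bodies are placeholders, not a committed design:

```python
# Permission matrix: which caller type may invoke which MCP tool.
PERMISSIONS = {
    "internal_agent": {
        "get_schema", "query_graph", "write_graph", "run_workflow",
        "get_next_instruction", "update_instruction",
    },
    # External API-key callers get read-only, governed access (story 2.2).
    "external_api_key": {"get_schema", "query_graph"},
}

# Router table: tool name -> handler. Bodies are illustrative stubs.
HANDLERS = {
    "get_schema": lambda params: {"nodes": ["Workflow", "SourceSystem"]},
    "query_graph": lambda params: {"rows": []},
}


def route(caller_type, tool, params):
    """Reject calls the permission matrix forbids, then dispatch to the handler."""
    if tool not in PERMISSIONS.get(caller_type, set()):
        raise PermissionError(f"{caller_type} may not call {tool}")
    return HANDLERS[tool](params)
```

The matrix is checked before the handler lookup, so a forbidden tool never reaches its implementation.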
app_requirements/3_feature_agentic_reasoning_loop.txt CHANGED
@@ -1,20 +1,38 @@
- 3. Feature: Agentic Reasoning & Self-Learning (via Neo4j MCP Server)
-
- Story 3.1: As a system, I need to use the Neo4j MCP server for all interactions with the Neo4j database so that reasoning steps are structured and controlled.
-  3.1.1 Task: Route all schema discovery, query, and write operations through the Neo4j MCP server
-  3.1.2 Task: Confirm LLM cannot issue raw Cypher queries directly
-  3.1.3 Task: Validate structured MCP responses feed into reasoning loop
-
- Story 3.2: As a system, I need to iteratively refine my problem-solving by generating requirements, code, and QA steps, with each step documented in Neo4j as a node linked to the prior step so that a full audit trail of learning is preserved.
-  3.2.1 Task: Implement workflow for entity resolution using vector embeddings + LLM review of candidates (test objective)
-  3.2.2 Task: Generate a requirement (e.g., "link entity candidates") and write it into Neo4j as a node
-  3.2.3 Task: Link each requirement node to its predecessor node to preserve chain-of-thought traceability
-  3.2.4 Task: Write Python script to satisfy the requirement, execute it, and record the output in Neo4j as a result node
-  3.2.5 Task: Generate QA requirement, store as a new node, link it to the corresponding step, and implement QA script
-  3.2.6 Task: Run QA cycle; if unsatisfied, ideate new requirement nodes, link them to prior steps, and repeat the loop
-
- Story 3.3: As a developer, I need the agentic loop to pause for 5 minutes between steps so that a human can edit the Neo4j node instructions before the agent proceeds.
-  3.3.1 Task: Implement configurable delay (default = 5 minutes) between loop phases
-  3.3.2 Task: Allow human edits to Neo4j requirement nodes during the pause
-  3.3.3 Task: Ensure the agent re-reads the latest node state after the pause before executing the next step
-  3.3.4 Task: Log cycle timing and human edits in Neo4j for observability
+ 3. Feature: Intelligent Agent Orchestration Layer
+ 3.1 Story: As an agent, I need to operate entirely from graph-stored instructions for full auditability.
+
+ 3.1.1 Task: Implement instruction fetcher that queries Neo4j for next task
+ 3.1.2 Task: Load instruction context including parameters and dependencies
+ 3.1.3 Task: Check for human interventions that modify instructions
+ 3.1.4 Task: Update instruction status atomically with optimistic locking
+ 3.1.5 Task: Implement instruction timeout and retry logic
+ 3.1.6 Task: Validate workflow iteration limits before proceeding
+
+ 3.2 Story: As an agent, I need to execute complex multi-phase workflows with continuous learning.
+
+ 3.2.1 Task: Initialize workflows from templates or custom definitions
+ 3.2.2 Task: Generate requirement nodes by analyzing data sources
+ 3.2.3 Task: Create implementation plans based on available MCP tools
+ 3.2.4 Task: Execute code/queries and store results as Execution nodes
+ 3.2.5 Task: Run QA validations with configurable success criteria
+ 3.2.6 Task: Generate refinement instructions when QA fails
+ 3.2.7 Task: Update QueryTemplate nodes with successful patterns
+ 3.2.8 Task: Create checkpoints for workflow state recovery
+
+ 3.3 Story: As an operations team, I need human-in-the-loop controls for oversight and guidance.
+
+ 3.3.1 Task: Implement configurable pause points between phases
+ 3.3.2 Task: Create approval workflow for high-risk operations
+ 3.3.3 Task: Build real-time notification system for required approvals
+ 3.3.4 Task: Store all human edits as HumanIntervention nodes
+ 3.3.5 Task: Implement emergency stop with graceful state preservation
+ 3.3.6 Task: Add scheduled review points for long-running workflows
+
+ 3.4 Story: As an agent, I need LLM integration for reasoning, embedding generation, and natural language processing.
+
+ 3.4.1 Task: Create LLM abstraction layer supporting multiple providers
+ 3.4.2 Task: Implement secure credential management (vault/environment)
+ 3.4.3 Task: Generate and store embeddings for semantic search
+ 3.4.4 Task: Build similarity graph with SIMILAR_TO relationships
+ 3.4.5 Task: Track token usage and costs per workflow
+ 3.4.6 Task: Implement fallback strategies for LLM failures
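Task 3.1.4 specifies updating instruction status atomically with optimistic locking. In Neo4j this would be a conditional Cypher `SET` guarded by a version property; the same compare-and-swap idea can be sketched on a plain in-memory record (the store and field names here are illustrative):

```python
# In-memory stand-in for Instruction nodes; a real implementation would run
# a version-guarded Cypher update through the MCP server instead.
instructions = {"inst-1": {"status": "PENDING", "version": 1}}


def claim_instruction(inst_id, expected_version):
    """Move PENDING -> RUNNING only if nobody else bumped the version first."""
    node = instructions[inst_id]
    if node["version"] != expected_version or node["status"] != "PENDING":
        return False  # another agent (or a human edit) won the race
    node["status"] = "RUNNING"
    node["version"] += 1
    return True
```

A second agent retrying with the stale version sees `False` and re-fetches, which is exactly the behavior tasks 3.1.3 and 3.3.4 rely on when humans edit instructions mid-flight.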
app_requirements/4_feature_UI.txt DELETED
@@ -1,22 +0,0 @@
- 4. Feature: Front-End Chat Interface & Visualization
-
- Story 4.1: As a user, I want to enter questions into a chat interface so I can query my data in natural language without needing Cypher.
-  4.1.1 Task: Connect the chat input to the Neo4j MCP server so queries are routed through MCP
-  4.1.2 Task: Ensure MCP translates queries into Cypher and returns structured results
-  4.1.3 Task: Handle response errors and retries gracefully in the UI
-
- Story 4.2: As a user, I want responses displayed in clear natural language so that I can understand the results.
-  4.2.1 Task: Parse MCP responses into user-friendly text
-  4.2.2 Task: Highlight key details (nodes, relationships, counts) in the response
-  4.2.3 Task: Verify outputs with test queries for readability
-
- Story 4.3: As a user, I want to see supporting evidence from the graph (nodes, relationships) so that I can verify why an answer was given.
-  4.3.1 Task: Build R Shiny visualizations of graph substructures returned by Neo4j MCP server
-  4.3.2 Task: Link visual nodes and relationships directly to natural-language responses
-  4.3.3 Task: Allow user to toggle between text and graph visualization modes
-
- Story 4.4: As a user, I want to trigger domain-specific workflows (e.g., fraud detection, entity resolution) from the chat so that I can act on results.
-  4.4.1 Task: Add workflow trigger buttons in the R Shiny UI
-  4.4.2 Task: Ensure workflow triggers call Neo4j MCP server functions correctly
-  4.4.3 Task: Display confirmation and output of workflow execution in R Shiny dashboard
-
app_requirements/4_feature_source_system_repo.txt ADDED
@@ -0,0 +1,46 @@
+ 4. Feature: Source System Integration & Schema Repository
+ 4.1 Story: As a system, I need to connect to and catalog all available data sources through MCP.
+
+ 4.1.1 Task: Implement MCP client for PostgreSQL with full introspection
+ 4.1.2 Task: Implement MCP client for MySQL/MariaDB
+ 4.1.3 Task: Implement MCP client for MongoDB with schema inference
+ 4.1.4 Task: Implement MCP client for S3/filesystem with format detection
+ 4.1.5 Task: Implement MCP client for REST APIs with OpenAPI import
+ 4.1.6 Task: Create SourceSystem nodes with connection metadata
+
+ 4.2 Story: As an agent, I need to automatically discover and map data across all sources.
+
+ 4.2.1 Task: Run initial discovery to catalog all tables/collections/endpoints
+ 4.2.2 Task: Extract column-level metadata (types, constraints, statistics)
+ 4.2.3 Task: Identify primary/foreign keys and relationships
+ 4.2.4 Task: Sample data for profiling and example generation
+ 4.2.5 Task: Detect potential cross-source join keys
+ 4.2.6 Task: Generate and store example queries for each source
+
+ 4.3 Story: As an agent, I need to continuously monitor sources for changes.
+
+ 4.3.1 Task: Implement scheduled schema comparison workflows
+ 4.3.2 Task: Run lightweight heartbeat queries to detect changes
+ 4.3.3 Task: Create SchemaChange nodes when differences found
+ 4.3.4 Task: Assess impact of changes on existing workflows
+ 4.3.5 Task: Alert on breaking changes requiring attention
+ 4.3.6 Task: Update statistics and samples periodically
+
+ 4.4 Story: As an agent, I need to intelligently route queries to appropriate sources.
+
+ 4.4.1 Task: Parse user questions for entity and domain references
+ 4.4.2 Task: Match entities to source tables using schema repository
+ 4.4.3 Task: Generate source-specific queries via MCP
+ 4.4.4 Task: Create QueryPlan nodes showing execution strategy
+ 4.4.5 Task: Execute parallel queries when multiple sources needed
+ 4.4.6 Task: Merge and reconcile results from multiple sources
+
+ 4.5 Story: As a system, I need to track data lineage and dependencies.
+
+ 4.5.1 Task: Create lineage relationships between source and derived data
+ 4.5.2 Task: Store transformation logic as nodes
+ 4.5.3 Task: Build impact analysis queries
+ 4.5.4 Task: Generate data flow documentation
+ 4.5.5 Task: Identify redundant or conflicting data sources
+ 4.5.6 Task: Recommend source consolidation opportunities
+
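Task 4.2.5 asks the agent to detect potential cross-source join keys. A naive first-pass heuristic, assuming each source's catalog is a list of (column name, type) pairs; the source and column names below are made up for illustration, and real detection would also profile value overlap:

```python
# Illustrative catalogs as the discovery workflow (4.2.1/4.2.2) might store them.
catalogs = {
    "postgres_crm": [("customer_id", "int"), ("email", "text")],
    "mongo_orders": [("customer_id", "int"), ("order_total", "float")],
}


def candidate_join_keys(catalogs):
    """Flag columns whose (name, type) pair appears in more than one source."""
    seen = {}
    for source, columns in catalogs.items():
        for name, dtype in columns:
            seen.setdefault((name, dtype), []).append(source)
    return {key: srcs for key, srcs in seen.items() if len(srcs) > 1}
```

Candidates found this way would feed the CrossReference nodes and the query planner of story 4.4 after validation against sampled data.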
app_requirements/5_feature_UI.txt ADDED
@@ -0,0 +1,45 @@
+ 5. Feature: Next.js Intelligent Frontend
+ 5.1 Story: As a user, I need a modern web interface to interact with the system.
+
+ 5.1.1 Task: Set up Next.js with TypeScript, Tailwind, and shadcn/ui
+ 5.1.2 Task: Implement tRPC for type-safe API communication
+ 5.1.3 Task: Add WebSocket support for real-time updates
+ 5.1.4 Task: Create Zustand stores for state management
+ 5.1.5 Task: Implement NextAuth with role-based access
+ 5.1.6 Task: Build responsive layout with dark mode
+
+ 5.2 Story: As a user, I need natural language interaction with intelligent query routing.
+
+ 5.2.1 Task: Create chat interface with context awareness
+ 5.2.2 Task: Display query routing decisions and source selection
+ 5.2.3 Task: Show real-time execution progress through sources
+ 5.2.4 Task: Present unified results with source attribution
+ 5.2.5 Task: Highlight confidence scores and data quality
+ 5.2.6 Task: Implement follow-up question suggestions
+
+ 5.3 Story: As a user, I need to visualize and explore the knowledge graph and data relationships.
+
+ 5.3.1 Task: Integrate Cytoscape.js for large graph exploration
+ 5.3.2 Task: Implement React Flow for workflow building
+ 5.3.3 Task: Create schema browser with source system navigation
+ 5.3.4 Task: Build lineage visualization showing data flow
+ 5.3.5 Task: Add search and filter capabilities
+ 5.3.6 Task: Implement node/edge inspection panels
+
+ 5.4 Story: As a user, I need to monitor and control workflow execution.
+
+ 5.4.1 Task: Create workflow dashboard with status overview
+ 5.4.2 Task: Build approval queue for pending instructions
+ 5.4.3 Task: Implement instruction editor with validation
+ 5.4.4 Task: Add execution timeline with phase progress
+ 5.4.5 Task: Create audit trail viewer
+ 5.4.6 Task: Build performance analytics dashboard
+
+ 5.5 Story: As a user, I need to manage data sources and their schemas.
+
+ 5.5.1 Task: Create source system configuration interface
+ 5.5.2 Task: Build schema change notification center
+ 5.5.3 Task: Implement data quality monitoring dashboard
+ 5.5.4 Task: Add query performance analytics by source
+ 5.5.5 Task: Create cross-source entity mapping tool
+ 5.5.6 Task: Build source health status monitor
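Task 5.2.4 wants unified results with source attribution. Backend-side, that can be as simple as tagging every merged row with the source it came from before the UI renders it; a minimal sketch with made-up source names and rows:

```python
def unify(results_by_source):
    """Flatten per-source result sets into one list, tagging each row's origin."""
    unified = []
    for source, rows in results_by_source.items():
        for row in rows:
            # The "_source" key is an assumed convention for UI attribution.
            unified.append({**row, "_source": source})
    return unified


rows = unify({
    "postgres_crm": [{"customer_id": 1}],
    "mongo_orders": [{"customer_id": 1, "order_total": 9.5}],
})
```

The chat UI can then group or footnote rows by `_source`, which also gives task 5.2.5 a hook for per-source confidence display.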
app_requirements/5_feature_deployment.txt DELETED
@@ -1,21 +0,0 @@
- 5. Feature: Deployment & Operations
-
- 5.1 Story: As a developer, I need a Docker Compose configuration so that the app can be deployed as a single package.
-  5.1.1 Task: Compose file includes Neo4j, API, and UI containers
-  5.1.2 Task: docker-compose up starts all services
-  5.1.3 Task: All services communicate correctly
-
- 5.2 Story: As a system, I need monitoring and logs for the containers so that issues can be quickly diagnosed.
-  5.2.1 Task: Logs accessible from host machine
-  5.2.2 Task: Health checks for each container
-  5.2.3 Task: Alerts triggered on container failure
-
- 5.3 Story: As a developer, I need CI/CD integration so deployments are automated and reliable.
-  5.3.1 Task: GitHub/GitLab pipeline runs tests on push
-  5.3.2 Task: Successful build auto-deploys to staging
-  5.3.3 Task: Failed build blocks deployment
-
- 5.4 Story: As a system, I need role-based access control so customer data stays private and secure.
-  5.4.1 Task: Users authenticated before accessing data
-  5.4.2 Task: Different roles tested (admin, user, read-only)
-  5.4.3 Task: Unauthorized requests blocked
app_requirements/6_feature_MCP.txt DELETED
@@ -1,13 +0,0 @@
- 6. Feature: Neo4j MCP Server Integration (Middleware Layer)
-
- Story 6.1: As a system, I need the Neo4j MCP server to expose graph capabilities (schema, query, write, workflow) so the LLM can interact with Neo4j safely.
-  6.1.1 Task: Configure Neo4j MCP server with available tools (get_schema, query_graph, write_graph, run_workflow)
-  6.1.2 Task: Implement adapters or extensions to map MCP calls into Cypher queries
-  6.1.3 Task: Ensure schema metadata is exposed in JSON so the LLM understands available entities/relationships
-  6.1.4 Task: Test end-to-end MCP calls against a sample Neo4j instance
-
- Story 6.2: As a developer, I need the Neo4j MCP server to write logs into Neo4j so that queries, results, and errors can be monitored directly within the graph.
-  6.2.1 Task: Design log schema in Neo4j (nodes/relationships for queries, responses, errors)
-  6.2.2 Task: Configure MCP server to persist function call logs into Neo4j
-  6.2.3 Task: Store execution time, query text, and results summary in logs
-  6.2.4 Task: Enable querying of logs via Cypher for observability dashboards
app_requirements/6_feature_QA.txt ADDED
@@ -0,0 +1,27 @@
+ 6. Feature: Testing, Quality & Learning
+ 6.1 Story: As a developer, I need comprehensive testing across all system layers.
+
+ 6.1.1 Task: Unit tests for MCP server and source clients
+ 6.1.2 Task: Integration tests for Neo4j operations
+ 6.1.3 Task: End-to-end tests for complete workflows
+ 6.1.4 Task: Test schema change detection and handling
+ 6.1.5 Task: Validate cross-source query execution
+ 6.1.6 Task: Load test concurrent workflow execution
+
+ 6.2 Story: As a system, I need to continuously improve through learning from operations.
+
+ 6.2.1 Task: Analyze query patterns to optimize routing
+ 6.2.2 Task: Learn entity relationships from successful joins
+ 6.2.3 Task: Identify and cache frequently accessed data
+ 6.2.4 Task: Generate new workflow templates from patterns
+ 6.2.5 Task: Recommend schema optimizations
+ 6.2.6 Task: Build anomaly detection for data quality
+
+ 6.3 Story: As an operations team, I need monitoring of system health and effectiveness.
+
+ 6.3.1 Task: Track QA pass rates by workflow type
+ 6.3.2 Task: Monitor source system response times
+ 6.3.3 Task: Measure human intervention frequency
+ 6.3.4 Task: Analyze workflow completion rates
+ 6.3.5 Task: Create SLA compliance reports
+ 6.3.6 Task: Generate daily operations summary
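Task 6.3.1 tracks QA pass rates by workflow type. Assuming execution records can be reduced to (workflow type, passed) pairs, the aggregation is a small fold; the workflow-type names below are illustrative:

```python
from collections import defaultdict


def qa_pass_rates(records):
    """Aggregate (workflow_type, passed) records into a pass rate per type."""
    totals = defaultdict(lambda: [0, 0])  # type -> [passed count, total count]
    for wf_type, passed in records:
        totals[wf_type][0] += int(passed)
        totals[wf_type][1] += 1
    return {t: passed / total for t, (passed, total) in totals.items()}


records = [
    ("entity_resolution", True),
    ("entity_resolution", False),
    ("schema_sync", True),
    ("schema_sync", True),
]
rates = qa_pass_rates(records)
```

The same shape works for tasks 6.3.3 and 6.3.4 by swapping the boolean for an intervention flag or completion flag.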
app_requirements/7_feature_deployment.md ADDED
@@ -0,0 +1,27 @@
+ 7. Feature: Deployment & Operations
+ 7.1 Story: As a developer, I need containerized deployment with production readiness.
+
+ 7.1.1 Task: Create multi-stage Docker builds for all services
+ 7.1.2 Task: Write Docker Compose for local development
+ 7.1.3 Task: Create Kubernetes manifests for production
+ 7.1.4 Task: Implement health checks and readiness probes
+ 7.1.5 Task: Configure resource limits and auto-scaling
+ 7.1.6 Task: Set up distributed tracing with OpenTelemetry
+
+ 7.2 Story: As an operations team, I need security and compliance controls.
+
+ 7.2.1 Task: Implement RBAC with fine-grained permissions
+ 7.2.2 Task: Add data encryption at rest and in transit
+ 7.2.3 Task: Create data masking for sensitive fields
+ 7.2.4 Task: Build compliance audit reports
+ 7.2.5 Task: Implement secret rotation for credentials
+ 7.2.6 Task: Add penetration testing to CI/CD
+
+ 7.3 Story: As a developer, I need automated CI/CD with quality gates.
+
+ 7.3.1 Task: Set up GitHub Actions for automated testing
+ 7.3.2 Task: Add static analysis and security scanning
+ 7.3.3 Task: Implement database migration automation
+ 7.3.4 Task: Create blue-green deployment strategy
+ 7.3.5 Task: Add automated rollback on failures
+ 7.3.6 Task: Set up monitoring with Prometheus/Grafana
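Task 7.1.4 covers health checks and readiness probes. The core of a readiness endpoint is just aggregating per-dependency checks into one verdict; in this sketch the checks are stand-in callables (a real probe would ping Neo4j, the MCP server, and so on, with Kubernetes polling the HTTP endpoint that exposes this result):

```python
def readiness(checks):
    """Run each named dependency check; ready only if every check passes."""
    results = {name: check() for name, check in checks.items()}
    return {"ready": all(results.values()), "checks": results}


# Hypothetical dependency names; each lambda stands in for a real probe.
status = readiness({"neo4j": lambda: True, "mcp_server": lambda: True})
```

Returning the per-check breakdown alongside the boolean makes failures diagnosable from the probe response itself, which supports the alerting task in story 5.2 of the removed deployment file and its replacement here.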