diff --git "a/index.html" "b/index.html"
--- "a/index.html"
+++ "b/index.html"
@@ -3,11 +3,12 @@
-Knowledge Engine for Scientific Discovery
-- A collaborative research platform that transforms cutting-edge scientific research into accessible, - multi-format tools for collective knowledge exploration. These are research instruments—like microscopes - for observing the collective knowledge of humanity—enabling hypothesis formation, testing, and discovery - across scientific disciplines. +
A Universal Method for Process Analysis
+ Combining Large Language Models with Mermaid visualization to dissect and understand
+ complex processes across any discipline: from biology to business, physics to psychology.
- CopernicusAI is an operational research platform that synthesizes scientific literature from 250+ million papers into AI-generated podcasts, integrates with a knowledge graph of 23,246 indexed papers, and provides collaborative tools for research discovery. The system demonstrates production-ready multi-source research synthesis with full citation tracking and evidence-based content generation requiring minimum 3 research sources per episode.
+ The Programming Framework is a universal meta-tool for analyzing complex processes across any discipline by combining Large Language Models (LLMs) with visual flowchart representation. The Framework transforms textual process descriptions into structured, interactive Mermaid flowcharts stored as JSON, enabling systematic analysis, visualization, and integration with knowledge systems.
+
+ The Framework has been demonstrated through GLMP (Genome Logic Modeling Project), which comprises 100+ regulatory-process flowcharts, and has been extended to mathematics (algorithms plus axiomatic dependency graphs), chemistry, physics, and computer science. It serves as the foundational methodology for the CopernicusAI Knowledge Engine.
-- The platform includes a fully operational Research Tools Dashboard (deployed December 2025) with interactive knowledge graph visualization, vector search, and RAG capabilities, enabling researchers to explore, query, and synthesize scientific knowledge across disciplines. +
+ Foundational typology (2026):
+ GLMP Foundational Typology
+ (Primitive Relations and Computational Complexity) bridges the public
+ Algorithms and Axiomatic Theories table and the
+ GLMP database table
+ (regulatory algorithms). This Space remains the hub for interactive viewers; the Google Cloud Storage (GCS) tables are the authoritative machine-readable indices.
- The CopernicusAI Knowledge Engine systematically transforms information into knowledge through integrated capabilities. At its core, a knowledge engine is any system—biological or artificial—that systematically transforms information into knowledge, performing work by converting raw materials (information) into useful outputs (knowledge, understanding, insights). -
-- The system architecture demonstrates the integration of data ingestion, processing, storage, and query capabilities across multiple modalities—research papers, process descriptions, and media content—enabling comprehensive knowledge discovery and synthesis. -
+ +
- - Figure: Knowledge Engine Architecture - Data flow from ingestion through processing and storage to query interfaces +
+ The Programming Framework represents prior work that demonstrates a novel methodology for analyzing complex processes by combining Large Language Models (LLMs) with visual flowchart representation. This research establishes a universal, domain-agnostic approach to process analysis that transforms textual descriptions into structured, interactive visualizations.
- Multi-source acquisition from academic databases (PubMed, arXiv, NASA ADS), literature sources (textbooks, reviews), and educational content (videos, transcripts), with quality assessment and type classification. -
-- LLM-powered entity extraction and process logic extraction, structured data storage (JSON metadata, Mermaid flowcharts, transcripts), and specialized databases for papers, processes, and media. -
+- Multiple access interfaces including RAG queries, vector search, knowledge graph visualization, API endpoints, and web interfaces, converging to unified knowledge output. -
+- CopernicusAI is an active research prototype exploring AI-generated audio briefings as an interface for assisted scientific research. -
-- The system allows any user to generate, refine, and share AI-generated science podcasts based on structured prompts, enabling rapid orientation to a topic, iterative deepening, and personalized research briefings. -
-- Rather than functioning as a static content platform, CopernicusAI supports collectively generated and shared research artifacts, analogous to community-driven knowledge platforms (e.g., discussion forums), but grounded in scientific sources and metadata-aware workflows. +
+ The Programming Framework serves as the foundational meta-tool of the CopernicusAI Knowledge Engine, providing the underlying methodology that enables specialized applications:
-- The Research Tools Dashboard is fully operational and deployed to Google Cloud Run, providing unified access to all components with interactive knowledge graph visualization, vector search, RAG queries, and content browsing. -
-- See the "Knowledge Engine Ecosystem" section below for details. +
+ This work establishes a proof-of-concept for AI-assisted process analysis, demonstrating how LLMs can systematically extract and visualize complex logic from textual sources across diverse domains.
- Inspired by Nicolaus Copernicus who challenged accepted knowledge with evidence and rigorous analysis, - CopernicusAI creates collaborative research tools that enable collective participation in - scientific discovery. These platforms are instruments for exploring humanity's collective knowledge—tools for - hypothesis formation, testing, and collaborative research, not just educational content. -
-- Just as a microscope enables observation of the microscopic world, CopernicusAI tools enable observation and - exploration of humanity's collective knowledge. Subscribers collaborate to prompt, generate, and refine research - content—sharing discoveries publicly or keeping them private. As large language models (LLMs) and AI systems - gain unprecedented knowledge, CopernicusAI provides the infrastructure for human-AI collaborative knowledge - exploration, with evidence-based truth-seeking as our guiding principle. -
-- An integrated ecosystem of research and collaboration tools designed to assist scientists in their workflow, - from research discovery through knowledge synthesis to multi-format content generation. - - View Public Project Interface → - -
- -Synthesis & distribution platform for AI-powered research briefing podcast generation
- Visit Website → -Foundational meta-tool for universal process analysis across disciplines
- Explore → -Mermaid markdown format flowcharts modeling 100+ biochemical processes in Yeast and E. Coli
- Explore → -Core data infrastructure for research paper metadata and citation networks
- Explore → -Multi-modal content with transcript-based search for scientific videos
- Explore → -✅ Prototype web interface for testing knowledge graph, vector search, RAG queries, and content browsing
- Live System → -
- Collaborative research platform where subscribers prompt and generate multi-voice AI podcasts
- (5-10 minutes) synthesizing research from multiple academic sources. Subscribers can share their
- podcasts publicly or keep them private. Evidence-based content generation requiring minimum 3
- research sources per episode.
+
+
+ The Programming Framework is a meta-tool—a tool for creating tools. It provides a
+ systematic method for analyzing any complex process by combining the analytical power of Large Language
+ Models with the clarity of visual flowcharts.
+
+ Complex processes, whether biological, computational, or organizational, are difficult to
+ understand because they involve many steps, decision points, and interactions. Plain-text
+ descriptions of them are hard to follow.
Multi-model architecture with intelligent model selection:
+ Use LLMs to extract process logic from the literature, then encode it as Mermaid flowcharts
+ stored in JSON. The result is a set of clear, interactive visualizations that reveal hidden patterns and
+ enable systematic analysis.
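The Python below is a minimal sketch of the "encode as Mermaid, store as JSON" step. It turns an already-extracted set of steps and branches into `flowchart TD` text and wraps that text in a JSON record. It does not show the LLM extraction itself. The record fields (`process_id`, `title`, `mermaid`, `sources`) and the helper names are hypothetical, because this page does not specify the Framework's actual JSON schema.

```python
import json


def to_mermaid(nodes, edges):
    """Render extracted steps as Mermaid flowchart text.

    nodes: dict of node_id -> (label, kind), where kind is "step" or "decision"
    edges: list of (src, dst, label) triples; label may be None
    """
    lines = ["flowchart TD"]
    for node_id, (label, kind) in nodes.items():
        if kind == "decision":
            lines.append(f'    {node_id}{{"{label}"}}')  # rhombus: B{"text"}
        else:
            lines.append(f'    {node_id}["{label}"]')  # rectangle: A["text"]
    for src, dst, label in edges:
        arrow = f" -->|{label}| " if label else " --> "
        lines.append(f"    {src}{arrow}{dst}")
    return "\n".join(lines)


def make_record(process_id, title, nodes, edges, sources):
    """Wrap the diagram in a JSON-serializable record (field names are hypothetical)."""
    return {
        "process_id": process_id,
        "title": title,
        "mermaid": to_mermaid(nodes, edges),
        "sources": sources,
    }


nodes = {
    "A": ("ORC binds replication origin", "step"),
    "B": ("Origin recognized?", "decision"),
    "C": ("Load MCM2-7 helicase", "step"),
    "D": ("Unwind DNA double helix", "step"),
}
edges = [("A", "B", None), ("B", "C", "yes"), ("C", "D", None)]
record = make_record("dna_replication_sketch", "DNA replication (sketch)", nodes, edges, [])
print(json.dumps(record, indent=2))
```

Labels are wrapped in double quotes so that punctuation such as `?` is safe inside Mermaid. A label that itself contains a double quote would need escaping, which this sketch omits.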
+
- Comprehensive academic database coverage with 250+ million research papers accessible
- through integrated APIs.
- Provide scientific papers, documentation, or process descriptions AI extracts steps, decisions, branches, and logic flow Create Mermaid diagram encoded as JSON structure Interactive flowchart reveals insights and enables refinement
- Operating Audio Podcast System: Full production and distribution platform for subscriber-generated
- podcasts. Users can prompt, generate, publish, and distribute audio podcasts with RSS feed support for
- Spotify, Apple Podcasts, and Google Podcasts.
- Advanced video features planned for future development:
- See: Science Video Database - Companion project for research video content management.
- Input:
+ "DNA replication begins when the origin recognition complex (ORC) binds to DNA replication origins. This triggers the loading of the MCM2-7 helicase complex, which unwinds the DNA double helix. DNA polymerases then synthesize new strands using the unwound strands as templates..."
+ LLM Analysis:
+ Extracts 15 steps, identifies 3 decision points (origin recognition, helicase loading, polymerase binding), recognizes 4 key enzymes (ORC, MCM2-7, DNA polymerase, ligase), and maps regulatory checkpoints.
+ Output:
+ Mermaid flowchart with 25 nodes, 28 edges, and 3 decision gates, colored according to the five-color scheme (red for inputs, yellow for structures, green for operations, blue for intermediates, violet for products) and stored as structured JSON for interactive visualization and programmatic access.
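The five-color scheme maps naturally onto Mermaid's `classDef` and `class` statements. The sketch below assumes the role names (`input`, `structure`, `operation`, `intermediate`, `product`) and uses pastel hex values chosen purely for illustration. This page does not give the Framework's exact palette.

```python
# Illustrative pastel hex values; the Framework's actual palette is an assumption here.
PALETTE = {
    "input": "#ffcccc",         # red: inputs
    "structure": "#fff2a8",     # yellow: structures
    "operation": "#ccffcc",     # green: operations
    "intermediate": "#cce0ff",  # blue: intermediates
    "product": "#e0ccff",       # violet: products
}


def color_directives(node_roles):
    """Return Mermaid classDef/class lines for a mapping of node_id -> role."""
    lines = [f"    classDef {role} fill:{color},stroke:#333" for role, color in PALETTE.items()]
    by_role = {}
    for node_id, role in node_roles.items():
        if role not in PALETTE:
            raise ValueError(f"unknown role {role!r} for node {node_id}")
        by_role.setdefault(role, []).append(node_id)
    for role, ids in by_role.items():
        lines.append(f"    class {','.join(ids)} {role}")
    return "\n".join(lines)


roles = {"ORC": "input", "MCM": "structure", "Unwind": "operation",
         "Fork": "intermediate", "Daughter": "product"}
print(color_directives(roles))
```

Appending these lines after the node and edge definitions of a `flowchart` colors each node by its role. Because unknown roles are rejected, a mislabeled node fails loudly instead of silently rendering uncolored.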
+
- A centralized metadata repository (not a file archive) providing structured JSON objects
- with AI-powered preprocessing.
- Color Legend:
- The system requires a minimum of 3 research sources per podcast episode. Each source is:
-
+ Works across any field: biology, chemistry, software engineering, business processes,
+ legal workflows, manufacturing, and beyond.
+
- The system automatically extracts and formats citations from research papers:
+
+
+ Start with a rough analysis, visualize it, identify gaps, refine with the LLM, and repeat until
+ the process logic is fully clear.
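The refinement loop can be sketched as alternating gap detection and refinement. Here a "gap" is assumed to mean one of two things: an edge that points at an undefined node, or a node that a breadth-first search cannot reach from the start node. `stub_refine` is a hypothetical stand-in for the LLM call, which in practice would receive the gaps as part of a prompt.

```python
from collections import deque


def find_gaps(nodes, edges, start):
    """Return sorted gap descriptions; nodes is {node_id: label}, edges is [(src, dst)]."""
    gaps = set()
    for src, dst in edges:
        for end in (src, dst):
            if end not in nodes:
                gaps.add(f"undefined node {end}")
    # Breadth-first search from the start node marks every reachable step.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    gaps.update(f"unreachable node {n}" for n in nodes if n not in seen)
    return sorted(gaps)


def refine_until_clear(nodes, edges, start, refine, max_rounds=5):
    """Repeat gap detection and refinement; return (nodes, edges, rounds_used)."""
    for round_no in range(max_rounds + 1):
        gaps = find_gaps(nodes, edges, start)
        if not gaps:
            return nodes, edges, round_no
        if round_no == max_rounds:
            break
        nodes, edges = refine(nodes, edges, gaps)
    raise RuntimeError(f"gaps remain after {max_rounds} refinement rounds: {gaps}")


def stub_refine(nodes, edges, gaps):
    """Hypothetical stand-in for an LLM call: define missing nodes with placeholders."""
    nodes = dict(nodes)
    for gap in gaps:
        if gap.startswith("undefined node "):
            nodes[gap.split(" ", 2)[2]] = "TODO: describe step"
    return nodes, edges


nodes = {"A": "ORC binds origin", "B": "Load MCM2-7 helicase"}
edges = [("A", "B"), ("B", "C")]
nodes, edges, rounds = refine_until_clear(nodes, edges, "A", stub_refine)
print(rounds, sorted(nodes))  # prints: 1 ['A', 'B', 'C']
```

The `max_rounds` cap keeps a refinement step that never converges from looping forever.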
- The system uses LLM analysis to identify paradigm-shifting research by:
+
+
+ JSON storage enables programmatic access, version control, cross-referencing,
+ and integration with other tools and databases.
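Because each flowchart is plain JSON with Mermaid text inside, ordinary tooling can query it. The sketch below uses hypothetical field names and does three things:

- It extracts node labels with a regular expression that handles only quoted rectangle and decision nodes (not full Mermaid syntax).
- It builds a cross-reference from node labels to processes.
- It serializes records with sorted keys so that version-control diffs stay stable.

```python
import json
import re

# Matches quoted rectangle A["..."] and decision B{"..."} nodes only; not a full Mermaid parser.
NODE_RE = re.compile(r'\b([A-Za-z_]\w*)\s*[\[{]"([^"]+)"[\]}]')


def node_labels(mermaid_text):
    """Return {node_id: label} for the node definitions NODE_RE recognizes."""
    return dict(NODE_RE.findall(mermaid_text))


def cross_reference(records):
    """Map each node label to the process_ids whose flowcharts contain it."""
    index = {}
    for rec in records:
        for label in sorted(set(node_labels(rec["mermaid"]).values())):
            index.setdefault(label, []).append(rec["process_id"])
    return index


def serialize(record):
    """Byte-stable JSON: sorted keys and fixed indentation keep version-control diffs small."""
    return json.dumps(record, indent=2, sort_keys=True) + "\n"


records = [
    {"process_id": "p1", "mermaid": 'flowchart TD\n    A["DNA polymerase"] --> B{"Primer present?"}'},
    {"process_id": "p2", "mermaid": 'flowchart TD\n    X["DNA polymerase"]'},
]
print(cross_reference(records))
# {'DNA polymerase': ['p1', 'p2'], 'Primer present?': ['p1']}
```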
+ The Programming Framework has been applied across multiple scientific disciplines. Explore interactive flowchart collections organized by domain:
+
+ GLMP regulatory flowcharts (gene circuits, logic gates) are indexed in the GLMP table; higher-level organismal processes use the Biology Processes database. Both use the same Mermaid idiom as the mathematics corpus (see the 2026 typology paper).
+
+ GLMP table: sortable metadata and viewers for 100+ regulatory processes ·
+ GLMP Space
+
+ Hugging Face batch pages are being reorganized. The current chemistry index and metadata are hosted on Google Cloud Storage.
+
+ Growing collection; see the table for current counts and subcategories.
+ · Local batch index (preview)
+
+ Algorithms and Axiomatic Theories: procedural flowcharts plus axiom–theorem dependency graphs, indexed in the mathematics processes database (see the 2026 foundational typology paper).
+
+ Local batch index on this Space ·
+ Working paper
+
+ Hugging Face batch pages are under construction. The physics database table on GCS remains the primary index.
+
+ Local batch index (preview)
+
+ Hugging Face batch pages are under construction. The computer science database table on GCS remains the primary index.
+
+ Local batch index (preview)
+
- These platforms enable collective participation and collaboration across diverse user communities:
-
- Like a microscope enables observation of the microscopic world, these tools enable observation and
- exploration of humanity's collective knowledge.
-
- This platform represents prior work that demonstrates foundational research and development
- achievements in AI-powered scientific knowledge synthesis, collaborative research tools, and multi-modal content
- generation. These contributions establish the technical foundation and proof-of-concept for the broader
- CopernicusAI Knowledge Engine initiative.
-
- This platform serves as the core synthesis and distribution component of the CopernicusAI Knowledge Engine.
- The Knowledge Engine is an integrated ecosystem of research and collaboration tools that work together to assist scientists
- in their workflow, from research discovery through knowledge synthesis to multi-format content generation.
- 100% of published flowcharts render without Mermaid syntax errors >=85% average quality score across all processes (exceeds NSF requirements) All processes include 1-3 verified research paper citations with accessible links
- The Knowledge Engine is designed to grow and evolve. Additional tools, databases, and collaboration components
- will be added as the project develops, expanding capabilities for AI-assisted scientific research and knowledge discovery.
-
- For Grant Proposals (NSF/DOE):
- Welz, G. (2025). CopernicusAI: Knowledge Engine for Scientific Discovery. Hugging Face Space. https://huggingface.co/spaces/garywelz/copernicusai Live Platform: https://www.copernicusai.fyi BibTeX Format:
+ Regulatory “algorithms” for microbial circuits—indexed in the
+ GLMP database table,
+ with interactive viewers on the GLMP Space.
+
+ Knowledge engine integrating the Programming Framework with AI podcasts, research papers,
+ and knowledge graph for scientific discovery.
+
- Welz, G. (2024–2025). CopernicusAI: AI-Generated Audio Briefings as a Research Interface. BibTeX Format:
+ Welz, G. (2024–2025). The Programming Framework: A Universal Method for Process Analysis. BibTeX Format:
- This platform is designed to support grant applications to:
- National Science Foundation - Science education and research infrastructure Department of Energy - Scientific computing and data science AI research and development initiatives
- The CopernicusAI Knowledge Engine is an integrated ecosystem of research and collaboration tools.
- The Research Tools Dashboard is now fully operational (December 2025) with a working web interface providing unified access to all components.
+
+ Welz, G. (2024). From Inspiration to AI: Biology as Visual Programming.
- Fully operational web interface with knowledge graph visualization (23,246 papers), vector search, RAG queries, and content browsing.
-
- Foundational meta-tool for universal process analysis across any discipline
-
- First application of Programming Framework to biology - 50+ biological processes visualized
-
- Core data infrastructure for structured research paper metadata and citation networks
-
- Multi-modal content component with transcript-based search for scientific videos
- Base URL: POST /api/papers/query API uses Bearer token authentication. Include in request headers: Standard rate limits apply: 100 requests/minute per API key. Contact for higher limits. Current version: v1.0. API is stable and backward-compatible.
+ This project serves as a foundational meta-tool for AI-assisted process analysis, enabling systematic extraction and visualization of complex logic from textual sources across diverse scientific and technical domains.
+
+ The Programming Framework is designed as infrastructure for AI-assisted science, providing a universal methodology that can be specialized for domain-specific applications.
+ 🎯 What is the Programming Framework?
+ 🔍 The Problem
+ Key Features:
-
-
- Research Integration:
-
-
- Advanced LLM Integration
- Primary Models:
-
-
- Capabilities:
-
-
- ✨ The Solution
+ Research Resource Access
- Academic Databases:
-
-
- ⚙️ How It Works
+
+ Input Process
+ LLM Analysis
+ Generate Flowchart
+ Visualize & Iterate
+ Audio and Video Podcast Production
- Current Audio Capabilities (Operational):
-
-
- Video Production (Future - Phase 2+):
-
-
- 📝 Concrete Example:
+ Research Papers Metadata Database (Phase 2)
- Structured JSON Objects:
-
-
- AI-Powered Preprocessing:
-
-
- 📊 Live Interactive Example:
+ 🔬 Methodology & System Design
-
- Multi-Source Validation Process
-
-
- Quality Assurance Mechanisms
-
-
- 💡 Core Principles
+
+ Domain Agnostic
+ Citation Extraction & Verification
- Iterative Refinement
+
-
Paradigm Shift Detection Implementation
- Structured Data
+
-
⚙️ Technology Stack
+ 📚 Process Diagram Collections
+ AI & Machine Learning
-
-
+
+ 🧬 Biology
+
+ Backend Infrastructure
-
-
+
+
+
+ ⚗️ Chemistry
+ Under construction
+
+ Frontend
-
-
+
+
+
+ 🔢 Mathematics
+
+
+ ⚛️ Physics
+ Under construction
+
+
+ 💻 Computer Science
+ Under construction
+
+ 🔍 Limitations & Future Directions
+ ⚙️ Technical Architecture
- Current Limitations
-
-
🤖 LLM Integration
+
+
Future Development
-
-
📊 Visualization Stack
+
+
🔬 Collaborative Research Tools
-
- Collaborative Research Tools
-
-
💾 Data Storage
+
+
- Key Innovations
-
-
🔗 Integration Points
+
+
📚 Prior Work & Research Contributions
+ ✅ Validation & Accuracy
- Overview
- 🔬 Research Contributions
+ 🔍 Quality Assurance Process
-
⚙️ Technical Achievements
+ 📊 Scale & Coverage
-
🎯 Position Within CopernicusAI Knowledge Engine
- Current Components:
-
-
-
-
+ 🎯 Accuracy Measures
+ Syntax Accuracy
+ Metadata Completeness
+ Source Coverage
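One way to compute the three accuracy measures over a set of JSON records is sketched below. The required fields and the 1-to-3 source range are illustrative assumptions. The syntax check is only a shallow pre-filter: it looks for a flowchart header line and balanced brackets outside quotes. Confirming that a chart actually renders still requires Mermaid's own parser.

```python
REQUIRED_FIELDS = ("process_id", "title", "mermaid", "sources")  # hypothetical schema
PAIRS = {"]": "[", "}": "{", ")": "("}


def shallow_syntax_ok(mermaid_text):
    """Cheap pre-check: flowchart header plus balanced brackets outside quoted labels.

    Shapes such as A>"flag"] are not handled; real validation needs Mermaid's parser.
    """
    lines = mermaid_text.strip().splitlines()
    if not lines or not lines[0].strip().startswith(("flowchart", "graph")):
        return False
    stack, in_quotes = [], False
    for ch in mermaid_text:
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif ch in "[{(":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack and not in_quotes


def metadata_completeness(record):
    """Fraction of REQUIRED_FIELDS that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, "", []))
    return filled / len(REQUIRED_FIELDS)


def source_coverage_ok(record, min_sources=1, max_sources=3):
    """True if the record cites between min_sources and max_sources sources."""
    return min_sources <= len(record.get("sources") or []) <= max_sources


def accuracy_report(records):
    """Average each measure over a non-empty list of records."""
    if not records:
        raise ValueError("no records to score")
    n = len(records)
    return {
        "syntax_accuracy": sum(shallow_syntax_ok(r.get("mermaid") or "") for r in records) / n,
        "metadata_completeness": sum(metadata_completeness(r) for r in records) / n,
        "source_coverage": sum(source_coverage_ok(r) for r in records) / n,
    }


good = {"process_id": "p1", "title": "Sketch", "mermaid": 'flowchart TD\n    A["x"] --> B{"y?"}',
        "sources": ["source-1"]}
bad = {"process_id": "p2", "title": "", "mermaid": 'flowchart TD\n    A["x"', "sources": []}
print(accuracy_report([good, bad]))
# {'syntax_accuracy': 0.5, 'metadata_completeness': 0.75, 'source_coverage': 0.5}
```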
+ Future Development:
- 📖 Citation Information
-
- @misc{welz2025copernicusai,
- title={CopernicusAI: Knowledge Engine for Scientific Discovery},
- author={Welz, Gary},
- year={2025},
- url={https://huggingface.co/spaces/garywelz/copernicusai},
- note={Hugging Face Space, Live Platform: https://www.copernicusai.fyi}
-}
- ⚠️ Known Limitations
+
+
📊 Data Availability Statement
-
- Platform Access
-
-
- https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app
- Data & Code Availability
-
-
+ 🔗 Related Projects
+
+ 🧬 GLMP - Genome Logic Modeling
+ Reproducibility Information
-
-
+
+ 🔬 CopernicusAI
+ How to Cite This Work
-
- Hugging Face Spaces. https://huggingface.co/spaces/garywelz/copernicusai
- @misc{welz2025copernicusai,
- title={CopernicusAI: AI-Generated Audio Briefings as a Research Interface},
+
+ Hugging Face Spaces. https://huggingface.co/spaces/garywelz/programming_framework
+
- @misc{welz2025programmingframework,
+ title={The Programming Framework: A Universal Method for Process Analysis},
author={Welz, Gary},
year={2024--2025},
- url={https://huggingface.co/spaces/garywelz/copernicusai},
- note={Hugging Face Space}
+ url={https://huggingface.co/spaces/garywelz/programming_framework},
+ note={Hugging Face Spaces}
}
🌐 Grant Support & Collaboration
-
- Grant Applications Supported
- NSF
- DOE
- SAIR Foundation
- Collaboration Opportunities
-
-
- 🔗 Live Platform & Resources
-
- 🌐 Production Deployment
-
- 🧩 Knowledge Engine Components
-
+ Medium. https://medium.com/@garywelz_47126/from-inspiration-to-ai-biology-as-visual-programming-520ee523029a
✅ Research Tools Dashboard (Implemented)
-
-
- Research Tools Dashboard → (opens in new tab)
-
-
-
🔌 API Documentation
- https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app
- Podcast Generation
-
-
- Research Endpoints
-
-
- Admin Endpoints
-
-
- 📝 Example Request
-
- {
- "discipline": "biology",
- "keywords": ["DNA replication", "cell cycle"],
- "date_range": {
- "start": "2020-01-01",
- "end": "2025-01-01"
- },
- "limit": 10
-}
- 📤 Example Response
-
- {
- "status": "success",
- "count": 10,
- "papers": [
- {
- "id": "pmid_12345678",
- "title": "Mechanisms of DNA Replication...",
- "authors": ["Smith, J.", "Doe, A."],
- "journal": "Nature",
- "year": 2023,
- "doi": "10.1038/s41586-023-01234",
- "abstract": "..."
- }
- ]
-}
- 🔐 Authentication
-
- Authorization: Bearer YOUR_API_TOKEN
- ⚡ Rate Limits
- 📚 API Version
-