diff --git "a/index.html" "b/index.html"
--- "a/index.html"
+++ "b/index.html"
@@ -3,11 +3,12 @@
- CopernicusAI - Research-Driven Podcast Generation Platform
+ The Programming Framework - Universal Process Analysis
+
@@ -30,14 +24,12 @@
-
🔬
-

CopernicusAI

-

Knowledge Engine for Scientific Discovery

-

- A collaborative research platform that transforms cutting-edge scientific research into accessible, - multi-format tools for collective knowledge exploration. These are research instruments—like microscopes - for observing the collective knowledge of humanity—enabling hypothesis formation, testing, and discovery - across scientific disciplines. +

🛠️
+

The Programming Framework

+

A Universal Method for Process Analysis

+

+ Combining Large Language Models with Mermaid visualization to dissect and understand + complex processes across any discipline—from biology to business, physics to psychology.

@@ -45,695 +37,522 @@
-
+

📋 Summary

- CopernicusAI is an operational research platform that synthesizes scientific literature from 250+ million papers into AI-generated podcasts, integrates with a knowledge graph of 23,246 indexed papers, and provides collaborative tools for research discovery. The system demonstrates production-ready multi-source research synthesis with full citation tracking and evidence-based content generation requiring minimum 3 research sources per episode. + The Programming Framework is a universal meta-tool for analyzing complex processes across any discipline by combining Large Language Models (LLMs) with visual flowchart representation. The Framework transforms textual process descriptions into structured, interactive Mermaid flowcharts stored as JSON, enabling systematic analysis, visualization, and integration with knowledge systems. +

+

+ Successfully demonstrated through GLMP (Genome Logic Modeling Project) with 100+ regulatory-process flowcharts, and extended to mathematics (algorithms plus axiomatic dependency graphs), chemistry, physics, and computer science. The Framework serves as the foundational methodology for the CopernicusAI Knowledge Engine.

-

- The platform includes a fully operational Research Tools Dashboard (deployed December 2025) with interactive knowledge graph visualization, vector search, and RAG capabilities, enabling researchers to explore, query, and synthesize scientific knowledge across disciplines. +

+ Foundational typology (2026): + GLMP Foundational Typology + (Primitive Relations and Computational Complexity) bridges the public + Algorithms and Axiomatic Theories table and the + GLMP database table + (regulatory algorithms). This Space remains the hub for interactive viewers; the GCS tables are the authoritative machine-readable indices.

- -
-
-

🏗️ Knowledge Engine Architecture

-

- The CopernicusAI Knowledge Engine systematically transforms information into knowledge through integrated capabilities. At its core, a knowledge engine is any system—biological or artificial—that systematically transforms information into knowledge, performing work by converting raw materials (information) into useful outputs (knowledge, understanding, insights). -

-

- The system architecture demonstrates the integration of data ingestion, processing, storage, and query capabilities across multiple modalities—research papers, process descriptions, and media content—enabling comprehensive knowledge discovery and synthesis. -

+ +
+
+

📚 Prior Work & Research Contributions

-
- Knowledge Engine Architecture Diagram showing data ingestion, processing, storage, and query layers -

- Figure: Knowledge Engine Architecture - Data flow from ingestion through processing and storage to query interfaces +

+

Overview

+

+ The Programming Framework represents prior work that demonstrates a novel methodology for analyzing complex processes by combining Large Language Models (LLMs) with visual flowchart representation. This research establishes a universal, domain-agnostic approach to process analysis that transforms textual descriptions into structured, interactive visualizations.

-
-
-

📥 Data Ingestion

-

- Multi-source acquisition from academic databases (PubMed, arXiv, NASA ADS), literature sources (textbooks, reviews), and educational content (videos, transcripts), with quality assessment and type classification. -

-
- -
-

⚙️ Processing & Storage

-

- LLM-powered entity extraction and process logic extraction, structured data storage (JSON metadata, Mermaid flowcharts, transcripts), and specialized databases for papers, processes, and media. -

+
+
+

🔬 Research Contributions

+
    +
  • Universal Process Analysis: Domain-agnostic methodology applicable across multiple fields
  • +
  • LLM-Powered Extraction: Automated extraction using Google Gemini 2.0 Flash
  • +
  • Structured Visualization: Mermaid.js-based flowchart generation encoded as JSON
  • +
  • Iterative Refinement: Systematic approach enabling continuous improvement
  • +
  • Scale Demonstration: Applied to 314 processes across 6 databases (Biology: 52, Chemistry: 91, Physics: 21, Computer Science: 21, Mathematics: 20, GLMP: 109)
  • +
  • Validation: Successfully processes complex biological, chemical, and computational workflows with high accuracy
  • +
-
-

🔍 Query & Output

-

- Multiple access interfaces including RAG queries, vector search, knowledge graph visualization, API endpoints, and web interfaces, converging to unified knowledge output. -

+
+

⚙️ Technical Achievements

+
    +
  • Meta-Tool Architecture: Framework for creating specialized analysis tools
  • +
  • JSON-Based Storage: Structured format enabling version control and API integration
  • +
  • Multi-Domain Application: Successfully applied to biological processes (GLMP) and extended to mathematics, chemistry, physics, and computer science
  • +
  • Integration Framework: Designed for knowledge engines and collaborative platforms
  • +
-
-
- -
-
-

Prior Work & Current Status

- -
-

Prior Work (2024-2025)

-

- CopernicusAI is an active research prototype exploring AI-generated audio briefings as an interface for assisted scientific research. -

-

- The system allows any user to generate, refine, and share AI-generated science podcasts based on structured prompts, enabling rapid orientation to a topic, iterative deepening, and personalized research briefings. -

-

- Rather than functioning as a static content platform, CopernicusAI supports collectively generated and shared research artifacts, analogous to community-driven knowledge platforms (e.g., discussion forums), but grounded in scientific sources and metadata-aware workflows. +

+

🎯 Position Within CopernicusAI Knowledge Engine

+

+ The Programming Framework serves as the foundational meta-tool of the CopernicusAI Knowledge Engine, providing the underlying methodology that enables specialized applications:

-
-

This work demonstrates technical feasibility for:

+
    -
  • • AI-assisted research briefing and orientation
  • -
  • • Iterative question refinement via conversational interfaces
  • -
  • • Integration of text, audio, and metadata in research workflows
  • +
  • • GLMP (Genome Logic Modeling Project)
  • +
  • • CopernicusAI (main knowledge engine)
  • +
  • • Research Papers Metadata Database
  • +
+
    +
  • • Science Video Database
  • +
  • • Multi-domain process analysis
-
- -
-

Current Implementation (December 2025)

-

- The Research Tools Dashboard is fully operational and deployed to Google Cloud Run, providing unified access to all components with interactive knowledge graph visualization, vector search, RAG queries, and content browsing. -

-

- See the "Knowledge Engine Ecosystem" section below for details. +

+ This work establishes a proof-of-concept for AI-assisted process analysis, demonstrating how LLMs can systematically extract and visualize complex logic from textual sources across diverse domains.

- -
-
-

🎯 Mission & Vision

-

- Inspired by Nicolaus Copernicus who challenged accepted knowledge with evidence and rigorous analysis, - CopernicusAI creates collaborative research tools that enable collective participation in - scientific discovery. These platforms are instruments for exploring humanity's collective knowledge—tools for - hypothesis formation, testing, and collaborative research, not just educational content. -

-

- Just as a microscope enables observation of the microscopic world, CopernicusAI tools enable observation and - exploration of humanity's collective knowledge. Subscribers collaborate to prompt, generate, and refine research - content—sharing discoveries publicly or keeping them private. As large language models (LLMs) and AI systems - gain unprecedented knowledge, CopernicusAI provides the infrastructure for human-AI collaborative knowledge - exploration, with evidence-based truth-seeking as our guiding principle. -

-
-
- - -
-
-

🧩 CopernicusAI Knowledge Engine

-

- An integrated ecosystem of research and collaboration tools designed to assist scientists in their workflow, - from research discovery through knowledge synthesis to multi-format content generation. - - View Public Project Interface → - -

- -
-
-
🎙️
-

CopernicusAI Podcast Generation

-

Synthesis & distribution platform for AI-powered research briefing podcast generation

- Visit Website → -
- -
-
🛠️
-

Programming Framework

-

Foundational meta-tool for universal process analysis across disciplines

- Explore → -
- -
-
🧬
-

Genome Logic Modeling Project

-

Mermaid markdown format flowcharts modeling 100+ biochemical processes in Yeast and E. Coli

- Explore → -
- -
-
📚
-

Research Paper Database

-

Core data infrastructure for research paper metadata and citation networks

- Explore → -
- -
-
🎬
-

Science Video Database

-

Multi-modal content with transcript-based search for scientific videos

- Explore → -
- -
-
🗺️
-

Research Tools Dashboard

-

✅ Prototype web interface for testing knowledge graph, vector search, RAG queries, and content browsing

- Live System → -
+ +
+
+
+
Any
+
Discipline
-
-
- - -
-
-
-
23,246
-
Research Papers
-
Indexed in Knowledge Engine (As of January 2025)
+
+
LLM
+
Powered
-
-
314
-
Processes
-
Visualized across 6 databases (As of January 2025)
+
+
Visual
+
Flowcharts
-
-
753
-
Videos
-
Science videos indexed (As of January 2025)
-
-
-
79
-
Podcasts
-
Generated across 5 disciplines (As of January 2025)
+
+
JSON
+
Structured Data
- -
-

🌟 Core Platform Capabilities

- -
- -
-
- 🎙️ -
-

AI-Powered Podcast Generation

-

- Collaborative research platform where subscribers prompt and generate multi-voice AI podcasts - (5-10 minutes) synthesizing research from multiple academic sources. Subscribers can share their - podcasts publicly or keep them private. Evidence-based content generation requiring minimum 3 - research sources per episode. + +

+
+

🎯 What is the Programming Framework?

+
+

+ The Programming Framework is a meta-tool—a tool for creating tools. It provides a + systematic method for analyzing any complex process by combining the analytical power of Large Language + Models with the clarity of visual flowcharts. +

+ +
+
+

🔍 The Problem

+

+ Complex processes—whether biological, computational, or organizational—are difficult to + understand because they involve many steps, decision points, and interactions. Traditional + descriptions in text are hard to follow.

-
-
-

Key Features:

-
    -
  • ✓ Comprehensive research integration (8+ databases)
  • -
  • ✓ Professional multi-speaker dialogue
  • -
  • ✓ AI-generated scientific visualizations
  • -
  • ✓ RSS feed distribution
  • -
  • ✓ Quality scoring & relevance ranking
  • -
  • ✓ Paradigm shift identification
  • -
-
-
-

Research Integration:

-
    -
  • ✓ Real-time discovery from 8+ APIs
  • -
  • ✓ Parallel search across databases
  • -
  • ✓ Automatic citation extraction
  • -
  • ✓ Source validation & verification
  • -
  • ✓ Interdisciplinary connection analysis
  • -
-
-
-
-
- -
-
- 🤖 -
-

Advanced LLM Integration

-

Multi-model architecture with intelligent model selection:

-
-
-

Primary Models:

-
    -
  • Google Gemini 3 - Latest research analysis and content generation
  • -
  • OpenAI GPT-4/GPT-3.5 - Content synthesis and quality validation
  • -
  • Anthropic Claude 3 (Sonnet, Haiku) - Alternative reasoning paths
  • -
  • ElevenLabs TTS - Multi-voice text-to-speech synthesis
  • -
-
-
-

Capabilities:

-
    -
  • • Multi-paper analysis & synthesis
  • -
  • • Paradigm shift detection
  • -
  • • Entity extraction (genes, proteins, compounds)
  • -
  • • Citation tracking & cross-references
  • -
  • • Content quality scoring
  • -
-
-
+
+

✨ The Solution

+

+ Use LLMs to extract process logic from literature, then encode it as Mermaid flowcharts + stored in JSON. Result: Clear, interactive visualizations that reveal hidden patterns and + enable systematic analysis. +

+
+
- -
-
- 📊 -
-

Research Resource Access

-

- Comprehensive academic database coverage with 250+ million research papers accessible - through integrated APIs. -

-
-
-

Academic Databases:

-
    -
  • • PubMed/NCBI (~30+ million papers)
  • -
  • • arXiv (~2+ million preprints)
  • -
  • • NASA ADS (~15+ million papers)
  • -
  • • Zenodo (100K+ datasets)
  • -
  • • bioRxiv/medRxiv (preprints)
  • -
  • • CORE (~200+ million papers)
  • -
  • • Google Scholar (comprehensive)
  • -
  • • News API (current events)
  • -
  • • YouTube Data API (academic videos)
  • -
-
-
-
+ +
+
+

⚙️ How It Works

+ +
+
+
1️⃣
+

Input Process

+

Provide scientific papers, documentation, or process descriptions

+
+ +
+
2️⃣
+

LLM Analysis

+

AI extracts steps, decisions, branches, and logic flow

+
+ +
+
3️⃣
+

Generate Flowchart

+

Create Mermaid diagram encoded as JSON structure

+
+ +
+
4️⃣
+

Visualize & Iterate

+

Interactive flowchart reveals insights and enables refinement

- -
-
- 🎙️ -
-

Audio and Video Podcast Production

-

- Operating Audio Podcast System: Full production and distribution platform for subscriber-generated - podcasts. Users can prompt, generate, publish, and distribute audio podcasts with RSS feed support for - Spotify, Apple Podcasts, and Google Podcasts. -

-
-

Current Audio Capabilities (Operational):

-
    -
  • ✓ Multi-voice AI podcast generation
  • -
  • ✓ Research-driven content creation
  • -
  • ✓ RSS feed distribution
  • -
  • ✓ Public and private podcast options
  • -
  • ✓ Professional audio quality
  • -
-
-
-

Video Production (Future - Phase 2+):

-

Advanced video features planned for future development:

-
    -
  • Visual Content Integration: Automated extraction from papers, web scraping, JSON database integration
  • -
  • Dynamic Visualizations: Scientific animations, real-time charts, LaTeX rendering
  • -
  • External Video Quoting: YouTube segment extraction with attribution & fair use compliance
  • -
  • Advanced Composition: Multi-layer video, auto subtitles, text overlays, professional transitions
  • -
-

- See: Science Video Database - Companion project for research video content management. -

-
-
+
+

📝 Concrete Example:

+
+

Input:

+

+ "DNA replication begins when the origin recognition complex (ORC) binds to DNA replication origins. This triggers the loading of the MCM2-7 helicase complex, which unwinds the DNA double helix. DNA polymerases then synthesize new strands using the unwound strands as templates..." +

+

LLM Analysis:

+

+ Extracts 15 steps, identifies 3 decision points (origin recognition, helicase loading, polymerase binding), recognizes 4 key enzymes (ORC, MCM2-7, DNA polymerase, ligase), and maps regulatory checkpoints. +

+

Output:

+

+ Mermaid flowchart with 25 nodes, 28 edges, 3 decision gates, properly colored using the 5-color scheme (red for inputs, yellow for structures, green for operations, blue for intermediates, violet for products), stored as structured JSON enabling interactive visualization and programmatic access. +
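A minimal sketch of how a flowchart like the one described above could be wrapped in a JSON record with node and edge counts. The record layout (`id`, `title`, `mermaid`, `node_count`, `edge_count`), the function `build_record`, and the sample diagram are assumptions for illustration; the Framework's actual schema is not shown on this page. The parser only handles simple one-edge-per-line `graph TD` syntax.

```python
import json
import re

# Matches one simple edge per line, e.g.  A[Label] --> B{Label}  or  B -->|Yes| C
EDGE_RE = re.compile(r"^(\w+)(?:[\[\{].*?[\]\}])?\s*-->(?:\|[^|]*\|)?\s*(\w+)")


def build_record(process_id: str, title: str, mermaid_source: str) -> dict:
    """Wrap Mermaid source in a JSON-serializable record (hypothetical layout)."""
    nodes = set()
    edge_count = 0
    for raw_line in mermaid_source.strip().splitlines():
        match = EDGE_RE.match(raw_line.strip())
        if match:
            # Both endpoints of an edge count as nodes.
            nodes.update(match.groups())
            edge_count += 1
    return {
        "id": process_id,
        "title": title,
        "mermaid": mermaid_source,
        "node_count": len(nodes),
        "edge_count": edge_count,
    }


# Illustrative three-node diagram, far smaller than the 25-node example above.
SAMPLE = """graph TD
    A[ORC binds origin] --> B{Helicase loaded?}
    B -->|Yes| C[Unwind DNA]
    B -->|No| A
"""

record = build_record("dna-replication-demo", "DNA replication (demo)", SAMPLE)
serialized = json.dumps(record, indent=2)
```

Keeping the raw Mermaid text as a single string field means the record can be rendered directly by a viewer, while the derived counts support simple programmatic checks.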

- -
-
- 📚 -
-

Research Papers Metadata Database (Phase 2)

-

- A centralized metadata repository (not a file archive) providing structured JSON objects - with AI-powered preprocessing. -

-
-
-

Structured JSON Objects:

-
    -
  • • DOI, arXiv ID, publication info
  • -
  • • Abstracts & key findings
  • -
  • • Extracted entities (genes, proteins, compounds, equations)
  • -
  • • Citation networks & cross-references
  • -
  • • Paradigm shift indicators
  • -
  • • Quality scores & relevance metrics
  • -
-
-
-

AI-Powered Preprocessing:

-
    -
  • • LLM-based entity extraction
  • -
  • • Automatic categorization
  • -
  • • Keyword extraction & semantic tagging
  • -
  • • Citation tracking & mapping
  • -
  • • Quality assessment
  • -
  • • RESTful API access
  • -
-
-
+
+

📊 Live Interactive Example:

+
+ graph TD + A[Complex Process Input] --> B{LLM Analysis} + B -->|Extract Logic| C[Identify Steps] + B -->|Extract Decisions| D[Identify Branches] + C --> E[Create Flowchart Nodes] + D --> F[Create Decision Points] + E --> G[Generate Mermaid Syntax] + F --> G + G --> H[Store as JSON] + H --> I[Interactive Visualization] + I --> J{Insights Gained?} + J -->|No| K[Refine Analysis] + J -->|Yes| L[Apply Knowledge] + K --> B + + style A fill:#ff6b6b,color:#fff + style B fill:#74c0fc,color:#fff + style C fill:#51cf66,color:#fff + style D fill:#51cf66,color:#fff + style E fill:#ffd43b,color:#000 + style F fill:#ffd43b,color:#000 + style G fill:#51cf66,color:#fff + style H fill:#74c0fc,color:#fff + style I fill:#74c0fc,color:#fff + style J fill:#74c0fc,color:#fff + style K fill:#51cf66,color:#fff + style L fill:#b197fc,color:#fff +
+
+

Color Legend:

+
+ Red - Triggers & Inputs + Yellow - Structures & Objects + Green - Processing & Operations + Blue - Intermediates & States + Violet - Products & Outputs
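The legend above can be sketched as a lookup table that emits Mermaid `style` lines. The hex values are copied from the example diagram; the category keys (`input`, `structure`, `operation`, `intermediate`, `product`) and the function `style_lines` are hypothetical names chosen for this sketch, not part of the Framework's published tooling.

```python
# Category keys are hypothetical; (fill, text) colors come from the diagram above.
COLOR_SCHEME = {
    "input": ("#ff6b6b", "#fff"),         # Red: triggers and inputs
    "structure": ("#ffd43b", "#000"),     # Yellow: structures and objects
    "operation": ("#51cf66", "#fff"),     # Green: processing and operations
    "intermediate": ("#74c0fc", "#fff"),  # Blue: intermediates and states
    "product": ("#b197fc", "#fff"),       # Violet: products and outputs
}


def style_lines(node_categories: dict) -> list:
    """Return one Mermaid `style` line per node, in insertion order."""
    lines = []
    for node_id, category in node_categories.items():
        fill, text = COLOR_SCHEME[category]
        lines.append(f"style {node_id} fill:{fill},color:{text}")
    return lines


styles = style_lines({"A": "input", "E": "structure", "L": "product"})
```

Generating the `style` lines from one table, rather than writing them by hand, is one way to keep the 5-color scheme consistent across diagrams.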
- +
-
-

🔬 Methodology & System Design

- -
-
-

Multi-Source Validation Process

-

- The system requires a minimum of 3 research sources per podcast episode. Each source is: -

-
    -
  • • Retrieved from authoritative academic databases (PubMed, arXiv, NASA ADS, etc.)
  • -
  • • Validated for authenticity and publication status
  • -
  • • Scored for quality and relevance to the research topic
  • -
  • • Cross-referenced to verify consistency and eliminate conflicting information
  • -
  • • Processed through parallel API queries for comprehensive coverage
  • -
-
- -
-

Quality Assurance Mechanisms

-
    -
  • Source Verification: Automated checking of DOI, arXiv IDs, and publication metadata
  • -
  • Relevance Scoring: LLM-based assessment of paper relevance to query
  • -
  • Paradigm Shift Detection: Identification of revolutionary vs. incremental research
  • -
  • Citation Extraction: Automatic extraction and formatting of citations
  • -
  • Content Validation: Multi-model verification (Gemini, GPT-4, Claude) for accuracy
  • -
-
+

💡 Core Principles

+ +
+
+
🌍
+

Domain Agnostic

+

+ Works across any field: biology, chemistry, software engineering, business processes, + legal workflows, manufacturing, and beyond. +

- -
-

Citation Extraction & Verification

-

- The system automatically extracts and formats citations from research papers: + +

+
🔄
+

Iterative Refinement

+

+ Start with rough analysis, visualize, identify gaps, refine with LLM, repeat until + the process logic is crystal clear.

-
    -
  • • DOI resolution and metadata enrichment
  • -
  • • arXiv ID parsing and preprint identification
  • -
  • • Author, title, and publication information extraction
  • -
  • • Cross-reference linking between related papers
  • -
  • • Citation network analysis for relationship mapping
  • -
- -
-

Paradigm Shift Detection Implementation

-

- The system uses LLM analysis to identify paradigm-shifting research by: + +

+
📦
+

Structured Data

+

+ JSON storage enables programmatic access, version control, cross-referencing, + and integration with other tools and databases.

-
    -
  • • Analyzing citation patterns and impact metrics
  • -
  • • Detecting novel methodologies or breakthrough discoveries
  • -
  • • Comparing against established knowledge frameworks
  • -
  • • Identifying interdisciplinary connections and cross-domain insights
  • -
  • • Flagging research that challenges existing paradigms
  • -
- + +
-
-

⚙️ Technology Stack

+
+

📚 Process Diagram Collections

+

+ The Programming Framework has been applied across multiple scientific disciplines. Explore interactive flowchart collections organized by domain: +

-
-
-

AI & Machine Learning

-
    -
  • • Google Gemini 3
  • -
  • • Google Vertex AI (model orchestration)
  • -
  • • OpenAI GPT-4/GPT-3.5
  • -
  • • Anthropic Claude 3
  • -
  • • ElevenLabs TTS
  • -
  • • DALL-E 3
  • -
  • • Cloud Vision API
  • -
  • • Video Intelligence API
  • -
+
+ +
+

+ 🧬 Biology +

+

+ GLMP regulatory flowcharts (gene circuits, logic gates) are indexed in the GLMP table; higher-level organismal processes use the Biology Processes database. Both use the same Mermaid idiom as the mathematics corpus (see the 2026 typology paper). +

+ +

+ GLMP table: sortable metadata and viewers for 100+ regulatory processes · + GLMP Space +

- -
-

Backend Infrastructure

-
    -
  • • FastAPI (Python)
  • -
  • • Google Cloud Run
  • -
  • • Firestore (NoSQL)
  • -
  • • Cloud Storage
  • -
  • • Cloud Functions
  • -
  • • Cloud Tasks
  • -
  • • Secret Manager
  • -
+ + +
+

+ ⚗️ Chemistry + Under construction +

+

+ Hugging Face batch pages are being reorganized. The live chemistry index and metadata live on Google Cloud Storage. +

+ + 🗄️ Chemistry Database Table → + +

+ Growing collection; see the table for current counts and subcategories. + · Local batch index (preview) +

- -
-

Frontend

-
    -
  • • Next.js 15.5.7
  • -
  • • Alpine.js
  • -
  • • Tailwind CSS
  • -
  • • Vercel
  • -
+ + +
+

+ 🔢 Mathematics +

+

+ Algorithms and Axiomatic Theories: procedural flowcharts plus axiom–theorem dependency graphs, indexed in the mathematics processes database (see the 2026 foundational typology paper). +

+ + 🗄️ Mathematics Database Table → + +

+ Local batch index on this Space · + Working paper +

+
+ + +
+

+ ⚛️ Physics + Under construction +

+

+ Hugging Face batch pages are under construction. The physics database table on GCS remains the primary index. +

+ + 🗄️ Physics Database Table → + +

+ Local batch index (preview) +

+
+ + +
+

+ 💻 Computer Science + Under construction +

+

+ Hugging Face batch pages are under construction. The computer science database table on GCS remains the primary index. +

+ + 🗄️ Computer Science Database Table → + +

+ Local batch index (preview) +

- +
-
-

🔍 Limitations & Future Directions

+
+

⚙️ Technical Architecture

-
-
-

Current Limitations

-
    -
  • Discipline Coverage: Currently indexing 23,246 papers across multiple disciplines; expansion to additional disciplines in progress
  • -
  • Source Bias: Coverage depends on database API availability and open access policies
  • -
  • LLM Accuracy: Content generation relies on LLM accuracy; multi-source validation mitigates but doesn't eliminate errors
  • -
  • Real-Time Updates: Knowledge graph updates require manual or scheduled processing cycles
  • -
  • Language: Currently optimized for English-language research papers
  • +
    +
    +

    🤖 LLM Integration

    +
      +
    • • Google Gemini 2.0 Flash for analysis
    • +
    • • Vertex AI for enterprise deployment
    • +
    • • Custom prompts for process extraction
    • +
    • • Structured JSON output formatting
    - -
    -

    Future Development

    -
      -
    • Multi-Discipline Expansion: Expanding knowledge graph to Biology, Chemistry, Physics, Computer Science
    • -
    • Process Databases: Creating comprehensive flowchart databases for all 5 disciplines (~50 processes each)
    • -
    • Advanced Video Features: Dynamic visualizations, animations, and multi-layer composition
    • -
    • Multi-Language Support: Extending to non-English research papers
    • -
    • Enhanced Validation: Peer review mechanisms and user feedback integration
    • -
    • Real-Time Updates: Automated continuous knowledge graph updates
    • + +
      +

      📊 Visualization Stack

      +
        +
      • • Mermaid.js for flowchart rendering
      • +
      • • JSON schema for data validation
      • +
      • • Interactive SVG output
      • +
      • • Export to PNG/PDF supported
      -
    -
    -
- -
-
-

🔬 Collaborative Research Tools

- -
-

Collaborative Research Tools

-

- These platforms enable collective participation and collaboration across diverse user communities: -

-
    -
  • Researchers - Tools for hypothesis formation and testing, cross-disciplinary synthesis
  • -
  • Collaborators - Collective knowledge exploration and refinement
  • -
  • Subscribers - Prompt, generate, and share podcasts (public or private)
  • -
  • Community - User suggestions, comments, and collaborative flowchart improvement (GLMP)
  • +

    💾 Data Storage

    +
      +
    • • Google Cloud Storage for JSON files
    • +
    • • Firestore for metadata indexing
    • +
    • • Version control with Git
    • +
    • • Cross-referencing with papers database
    -

    - Like a microscope enables observation of the microscopic world, these tools enable observation and - exploration of humanity's collective knowledge. -

- +
-

Key Innovations

-
    -
  • • Multi-source validation (min 3 sources)
  • -
  • • Evidence-based generation
  • -
  • • Paradigm shift detection
  • -
  • • Interdisciplinary connections
  • -
  • • Multiple expertise levels
  • -
  • • Full citation tracking
  • +

    🔗 Integration Points

    +
      +
    • • GLMP specialized collections
    • +
    • • CopernicusAI knowledge graph
    • +
    • • Research papers database
    • +
    • • API endpoints for programmatic access
- +
-
-

📚 Prior Work & Research Contributions

+
+

✅ Validation & Accuracy

-
-

Overview

-

- This platform represents prior work that demonstrates foundational research and development - achievements in AI-powered scientific knowledge synthesis, collaborative research tools, and multi-modal content - generation. These contributions establish the technical foundation and proof-of-concept for the broader - CopernicusAI Knowledge Engine initiative. -

-
-
-

🔬 Research Contributions

+

🔍 Quality Assurance Process

    -
  • AI-Powered Research Synthesis: Production system for multi-source research synthesis using LLMs
  • -
  • Multi-Model Architecture: Intelligent model selection with Gemini 3, GPT-4, Claude 3
  • -
  • Collaborative Platform: Subscriber-driven content generation with public/private sharing
  • -
  • Knowledge Engine Integration: Architecture for Research Papers DB, Video DB, GLMP, Framework
  • +
  • Automated Validation: All flowcharts validated for Mermaid syntax correctness before publication
  • +
  • Metadata Quality Checks: JSON schema validation ensures >=85% metadata completeness (NSF standard)
  • +
  • Source Citation Verification: All processes include verified research paper citations with DOI/PubMed links
  • +
  • Cross-Reference Validation: Automated checks ensure discipline links and back-references are correct
  • +
  • Color Scheme Consistency: All processes follow standardized 5-color scheme for visual consistency
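A rough sketch of what a metadata completeness check against the 85% threshold might look like. The threshold comes from the list above; the field names in `REQUIRED_FIELDS` and both helper functions are assumptions for illustration, since the actual JSON schema is not reproduced on this page.

```python
# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = [
    "id", "title", "discipline", "mermaid", "sources", "description", "keywords",
]


def completeness(record: dict, required=REQUIRED_FIELDS) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for field in required if record.get(field) not in (None, "", [], {})
    )
    return filled / len(required)


def passes_quality_check(record: dict, threshold: float = 0.85) -> bool:
    """True when completeness meets the threshold quoted in the list above."""
    return completeness(record) >= threshold


# Six of seven fields filled: 6/7 is about 0.857, just above the threshold.
mostly_complete = {
    "id": "glmp-001",
    "title": "Example process",
    "discipline": "biology",
    "mermaid": "graph TD\n    A --> B",
    "sources": ["doi:placeholder"],
    "description": "Illustrative record",
    "keywords": [],
}
# Five of seven fields filled: 5/7 is about 0.714, below the threshold.
sparse = dict(mostly_complete, description="")
```

A real pipeline would more likely validate against a JSON Schema document; this field-count version only illustrates how a single percentage score could be derived.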
-

⚙️ Technical Achievements

+

📊 Scale & Coverage

    -
  • 250+ Million Papers: Accessible via 8+ integrated academic databases
  • -
  • 79 Episodes: Generated across 5 scientific disciplines
  • -
  • Production Deployment: Live platform with operational API and RSS distribution
  • -
  • Scalable Architecture: Serverless microservices on Google Cloud
  • +
  • 314 Processes Validated: Successfully applied across 6 discipline databases (Biology, Chemistry, Physics, CS, Mathematics, GLMP)
  • +
  • Multi-Domain Testing: Framework validated on biological pathways, chemical reactions, computational algorithms, and mathematical proofs
  • +
  • Iterative Refinement: Processes refined through multiple LLM analysis cycles to improve accuracy
  • +
  • User Feedback Integration: Community feedback mechanism enables continuous improvement (see "Improve this process" on each flowchart)
  • +
  • Expert Validation: GLMP processes validated against established biochemical pathway databases
-

🎯 Position Within CopernicusAI Knowledge Engine

-

- This platform serves as the core synthesis and distribution component of the CopernicusAI Knowledge Engine. - The Knowledge Engine is an integrated ecosystem of research and collaboration tools that work together to assist scientists - in their workflow, from research discovery through knowledge synthesis to multi-format content generation. -

-
-

Current Components:

-
-
    -
  • 1. CopernicusAI (This platform) - Core synthesis & distribution
  • -
  • 2. Programming Framework - Foundational meta-tool
  • -
  • 3. GLMP - Biological process visualization
  • -
-
    -
  • 4. Research Paper Metadata Database - Data infrastructure
  • -
  • 5. Science Video Database - Multi-modal content
  • -
+

🎯 Accuracy Measures

+
+
+

Syntax Accuracy

+

100% of published flowcharts render without Mermaid syntax errors

+
+
+

Metadata Completeness

+

>=85% average quality score across all processes (exceeds NSF requirements)

+
+
+

Source Coverage

+

All processes include 1-3 verified research paper citations with accessible links

-
-
-

Future Development:

-

- The Knowledge Engine is designed to grow and evolve. Additional tools, databases, and collaboration components - will be added as the project develops, expanding capabilities for AI-assisted scientific research and knowledge discovery. -

-
-

📖 Citation Information

-

- For Grant Proposals (NSF/DOE): -

-
-

Welz, G. (2025). CopernicusAI: Knowledge Engine for Scientific Discovery.

-

Hugging Face Space. https://huggingface.co/spaces/garywelz/copernicusai

-

Live Platform: https://www.copernicusai.fyi

-
-
-

BibTeX Format:

-
@misc{welz2025copernicusai,
-  title={CopernicusAI: Knowledge Engine for Scientific Discovery},
-  author={Welz, Gary},
-  year={2025},
-  url={https://huggingface.co/spaces/garywelz/copernicusai},
-  note={Hugging Face Space, Live Platform: https://www.copernicusai.fyi}
-}
-
+
+

⚠️ Known Limitations

+
    +
  • LLM-Dependent Accuracy: Flowchart accuracy depends on LLM interpretation of source material; complex processes may require multiple refinement cycles
  • +
  • Domain Expertise Required: While the Framework is domain-agnostic, optimal results benefit from domain-specific knowledge for validation
  • +
  • Source Material Quality: Accuracy is limited by the quality and completeness of input source material
  • +
  • Continuous Improvement: Framework is actively refined based on user feedback and validation results
  • +
- +
-
-

📊 Data Availability Statement

- -
-

Platform Access

- -
- -
-

Data & Code Availability

-
    -
  • Hugging Face Spaces: All components accessible at https://huggingface.co/garywelz (opens in new tab)
  • -
  • Process Flowcharts (GLMP): JSON files stored in Google Cloud Storage, accessible via GLMP Database Table (opens in new tab)
  • -
  • Research Paper Metadata: 23,246 indexed papers with metadata accessible through Research Tools Dashboard
  • -
  • API Documentation: RESTful API endpoints available for programmatic access (see API Documentation section)
  • -
+

🔗 Related Projects

+ +
+
+

🧬 GLMP - Genome Logic Modeling

+

+ Regulatory “algorithms” for microbial circuits—indexed in the + GLMP database table, + with interactive viewers on the GLMP Space. +

+ + Explore GLMP → (opens in new tab) +
- -
-

Reproducibility Information

-
    -
  • Technology Stack: All technologies and versions documented in Technology Stack section
  • -
  • LLM Models: Google Gemini 3, OpenAI GPT-4/GPT-3.5, Anthropic Claude 3 (versions specified in documentation)
  • -
  • Source Citations: All podcast episodes include full citations to source papers
  • -
  • Metadata: Complete metadata for all generated content available through API
  • -
  • License: MIT License - see license information in space metadata
  • -
+ +
+

🔬 CopernicusAI

+

+ Knowledge engine integrating the Programming Framework with AI podcasts, research papers, + and knowledge graph for scientific discovery. +

+ + Visit CopernicusAI → (opens in new tab) +
@@ -742,245 +561,37 @@

How to Cite This Work

-
-

- Welz, G. (2024–2025). CopernicusAI: AI-Generated Audio Briefings as a Research Interface.
- Hugging Face Spaces. https://huggingface.co/spaces/garywelz/copernicusai -

- -
-

BibTeX Format:

-
@misc{welz2025copernicusai,
-  title={CopernicusAI: AI-Generated Audio Briefings as a Research Interface},
+            
+
+

+ Welz, G. (2024–2025). The Programming Framework: A Universal Method for Process Analysis.
+ Hugging Face Spaces. https://huggingface.co/spaces/garywelz/programming_framework (opens in new tab) +

+
+

BibTeX Format:

+
@misc{welz2025programmingframework,
+  title={The Programming Framework: A Universal Method for Process Analysis},
   author={Welz, Gary},
   year={2024--2025},
-  url={https://huggingface.co/spaces/garywelz/copernicusai},
-  note={Hugging Face Space}
+  url={https://huggingface.co/spaces/garywelz/programming_framework},
+  note={Hugging Face Spaces}
 }
-
-
-
-
- - -
-
-

🌐 Grant Support & Collaboration

- -
-

Grant Applications Supported

-

- This platform is designed to support grant applications to: -

-
-
-

NSF

-

National Science Foundation - Science education and research infrastructure

-
-
-

DOE

-

Department of Energy - Scientific computing and data science

-
-
-

SAIR Foundation

-

AI research and development initiatives

-
- -
-

Collaboration Opportunities

-
    -
  • • Integration with academic institutions
  • -
  • • Partnership with research organizations
  • -
  • • Open data initiatives
  • -
  • • Educational program development
  • -
-
-
-
- - -
-
-

🔗 Live Platform & Resources

- -
- - -
-

🧩 Knowledge Engine Components

-

- The CopernicusAI Knowledge Engine is an integrated ecosystem of research and collaboration tools. - The Research Tools Dashboard is now fully operational (December 2025) with a working web interface providing unified access to all components. +

+

+ Welz, G. (2024). From Inspiration to AI: Biology as Visual Programming.
+ Medium. https://medium.com/@garywelz_47126/from-inspiration-to-ai-biology-as-visual-programming-520ee523029a (opens in new tab)

-
-

✅ Research Tools Dashboard (Implemented)

-

- Fully operational web interface with knowledge graph visualization (23,246 papers), vector search, RAG queries, and content browsing. -

- - Public Project Interface → (opens in new tab) - -
- - Research Tools Dashboard → (opens in new tab) - -
-
-
-
- - -
-
-

🔌 API Documentation

-

Base URL: https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app

- -
-
-

Podcast Generation

-
    -
  • POST /generate-podcast-with-subscriber
  • -
  • GET /api/subscribers/podcasts/{id}
  • -
  • POST /api/subscribers/podcasts/submit-to-rss
  • -
-
- -
-

Research Endpoints

-
    -
  • POST /api/papers/upload
  • -
  • GET /api/papers/{paper_id}
  • -
  • POST /api/papers/query
  • -
  • POST /api/papers/{id}/link-podcast/{id}
  • -
-
- -
-

Admin Endpoints

-
    -
  • GET /api/admin/subscribers
  • -
  • POST /api/admin/podcasts/fix-missing-titles
  • -
  • GET /api/admin/podcasts/catalog
  • -
-
-
- -
-

📝 Example Request

-
-

POST /api/papers/query

-
{
-  "discipline": "biology",
-  "keywords": ["DNA replication", "cell cycle"],
-  "date_range": {
-    "start": "2020-01-01",
-    "end": "2025-01-01"
-  },
-  "limit": 10
-}
-
- -

📤 Example Response

-
-
{
-  "status": "success",
-  "count": 10,
-  "papers": [
-    {
-      "id": "pmid_12345678",
-      "title": "Mechanisms of DNA Replication...",
-      "authors": ["Smith, J.", "Doe, A."],
-      "journal": "Nature",
-      "year": 2023,
-      "doi": "10.1038/s41586-023-01234",
-      "abstract": "..."
-    }
-  ]
-}
-
- -
-

🔐 Authentication

-

API uses Bearer token authentication. Include in request headers:

-
Authorization: Bearer YOUR_API_TOKEN
-
- -
-

⚡ Rate Limits

-

Standard rate limits apply: 100 requests/minute per API key. Contact for higher limits.

-
- -
-

📚 API Version

-

Current version: v1.0. API is stable and backward-compatible.

-
+
+

+ This project serves as a foundational meta-tool for AI-assisted process analysis, enabling systematic extraction and visualization of complex logic from textual sources across diverse scientific and technical domains. +

+

+ The Programming Framework is designed as infrastructure for AI-assisted science, providing a universal methodology that can be specialized for domain-specific applications. +

@@ -988,10 +599,15 @@
-

CopernicusAI - Advancing Scientific Knowledge

-

Built with Google Cloud, Gemini AI, OpenAI, Anthropic Claude, and ElevenLabs

-

© 2025 CopernicusAI. All rights reserved.

+

The Programming Framework

+

A Universal Method for Process Analysis

+

© 2025 Gary Welz. All rights reserved.

+ + +