- **GLMP (Genome Logic Modeling Project)** - First specialized application demonstrating biological process visualization
- **CopernicusAI** - Main knowledge engine integrating Framework outputs with AI podcasts and research synthesis
- **Research Tools Dashboard** (✅ Implemented December 2025) - Fully operational web interface with knowledge graph visualization, vector search, RAG queries, and content browsing. Processes from Chemistry, Physics, Mathematics, and Computer Science are accessible through the unified dashboard. Live at: https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine
- **Public Project Interface** (✅ Implemented January 2025) - Comprehensive public-facing page providing access to all CopernicusAI Knowledge Engine components. Live at: https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html
- **Research Papers Metadata Database** - Integration for linking processes to source literature (12,000+ papers indexed)
- **Science Video Database** - Potential integration for multi-modal process explanations

The Programming Framework has been applied across multiple scientific disciplines. Explore interactive flowchart collections organized by domain:

### Process Database Statistics (As of January 2025)

| Discipline | Processes | Subcategories | Status | Database Table |
|------------|-----------|---------------|--------|----------------|
| Biology | 52 | 8 | ✅ Complete | [View Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/biology-processes-database/biology-database-table.html) |
| Chemistry | 91 | 14 | ✅ Complete | [View Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/chemistry-processes-database/chemistry-database-table.html) |
| Physics | 21 | 7 | ✅ Complete | [View Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/physics-processes-database/physics-database-table.html) |
| Computer Science | 21 | 7 | ✅ Complete | [View Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/computer-science-processes-database/computer-science-database-table.html) |
| Mathematics | 20 | 7 | ✅ Complete | [View Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/mathematics-database-table.html) |
| GLMP (Molecular Biology) | 108 | 10+ | ✅ Complete | [View Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp-database-table.html) |
| **Total** | **313** | **53+** | **✅ Operational** | **All databases publicly accessible** |

**Note:** All processes include Mermaid flowcharts, source citations, and comprehensive metadata. See individual database tables for detailed statistics, complexity metrics, and process details. Statistics are dynamically updated; see [Public Project Interface](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html) for current counts.

### 🧬 Biology
- [Biology Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/biology-processes-database/biology-database-table.html) - Interactive database with 52 higher-level organismal processes across 8 categories (reproduction, development, behavior, defense, nutrition, sensory, transport, coordination)
- [GLMP Database Table](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp-database-table.html) - Genome Logic Modeling Project: Biochemical/molecular processes database (108 processes)
- **Note:** The Biology Processes Database focuses on organismal, developmental, behavioral, and ecological processes, while GLMP focuses on molecular-level biochemical processes. Together they cover biological processes from the molecular to the organismal level.

### ⚗️ Chemistry
- [Chemistry Database Table](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/chemistry-processes-database/chemistry-database-table.html) - Interactive database with 91 processes across 14 subcategories

### 🔢 Mathematics
- [Mathematics Database Table](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/mathematics-database-table.html) - Interactive database with 20 processes across 7 subcategories

### 💻 Computer Science
- [Computer Science Database Table](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/computer-science-processes-database/computer-science-database-table.html) - Interactive database with 21 processes across 7 subcategories

## ⚠️ Limitations & Future Directions

### Current Limitations
- **Process Validation:** Flowcharts are LLM-generated and benefit from expert validation for domain-specific accuracy (validation process ongoing)
- **Source Linking:** Not all processes are linked to specific research papers yet (work in progress per Quality Standards)
- **Scale:** The current database (313 processes) is a proof of concept; the target is 1,000+ processes
- **Domain Coverage:** Some disciplines are better represented than others; coverage is being actively expanded
- **LLM Dependency:** The framework requires LLM access (Google Gemini 2.0 Flash); alternative models may produce different results
- **Complexity Limits:** Very complex processes (>100 nodes) may require manual refinement

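The >100-node threshold above can be checked mechanically before a flowchart is queued for manual refinement. The sketch below is a rough heuristic, not the project's validator: it assumes simple `flowchart`/`graph` syntax with one edge chain per line and counts distinct node ids. The function names and regexes are illustrative assumptions, and constructs such as `A & B --> C` or arrow-like text inside labels are not handled.

```python
import re

# Heuristic (assumption): arrow tokens such as -->, --->, ---, -.-> or ==>,
# optionally followed by an |edge label|.
ARROW = re.compile(r"\s*(?:-{2,}>|-{3,}|-\.+->|={2,}>)(?:\|[^|]*\|)?\s*")
# The node id is the identifier at the start of each arrow-separated segment,
# e.g. "B" in "B{Primer present?}".
NODE_ID = re.compile(r"[A-Za-z_]\w*")
# Statements that declare layout or styling rather than nodes or edges.
KEYWORDS = {"flowchart", "graph", "subgraph", "end", "direction",
            "classDef", "class", "style", "linkStyle", "click"}


def count_nodes(mermaid_src: str) -> int:
    """Approximate the number of distinct node ids in a simple Mermaid flowchart."""
    nodes = set()
    for raw in mermaid_src.splitlines():
        line = raw.strip().rstrip(";")
        if not line or line.startswith("%%") or line.split()[0] in KEYWORDS:
            continue
        for segment in ARROW.split(line):
            match = NODE_ID.match(segment)
            if match:
                nodes.add(match.group(0))
    return len(nodes)


def needs_manual_refinement(mermaid_src: str, max_nodes: int = 100) -> bool:
    """Flag flowcharts above the >100-node complexity limit."""
    return count_nodes(mermaid_src) > max_nodes


if __name__ == "__main__":
    example = "\n".join([
        "flowchart TD",
        "    A[Template strand] --> B{Primer present?}",
        "    B -->|yes| C[DNA polymerase extends]",
        "    B -->|no| D[Primase lays RNA primer]",
        "    D --> C",
    ])
    print(count_nodes(example))              # 4
    print(needs_manual_refinement(example))  # False
```

A real check would more robustly use Mermaid's own parser; the regex approach is only meant to show where the threshold fits in the workflow.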
### Future Work
- **Expansion:** Scale to 1,000+ processes across all disciplines (see DISCIPLINE_DATABASES_PLAN.md)
- **Validation:** Implement a systematic peer review process for process flowcharts
- **Source Integration:** Improve linking to research papers using vector search across the 23,246+ indexed papers
- **Automation:** Automatically suggest and link source papers
- **Quality Assurance:** Build a systematic validation framework for flowchart accuracy
- **Multi-LLM Support:** Extend support to multiple LLM providers for comparison and validation
- **Interactive Refinement:** Provide a user interface for iterative flowchart improvement

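The vector-search linking and automated source suggestion listed above could work roughly as follows: embed each process description and each paper, then rank papers by cosine similarity to the process. This is a brute-force sketch with made-up three-dimensional vectors and hypothetical paper ids. It does not describe the project's actual embedding model or index, and a collection of 23,246+ papers would normally sit behind an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_sources(process_vec: list[float],
                    paper_vecs: dict[str, list[float]],
                    k: int = 3) -> list[tuple[str, float]]:
    """Return the k papers whose embeddings are most similar to the process embedding."""
    scored = [(paper_id, cosine_similarity(process_vec, vec))
              for paper_id, vec in paper_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings and paper ids (hypothetical); real vectors would come from an embedding model.
    papers = {
        "paper-A": [0.9, 0.1, 0.0],
        "paper-B": [0.0, 1.0, 0.0],
        "paper-C": [0.7, 0.7, 0.0],
    }
    process = [1.0, 0.0, 0.0]
    for paper_id, score in suggest_sources(process, papers, k=2):
        print(paper_id, round(score, 3))
```

Suggested papers would still need human confirmation before being recorded as a process's source citation.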
### Known Areas for Improvement
- **Accuracy Validation:** Not all flowcharts have been validated by domain experts yet; systematic validation is in progress
- **Source Citations:** Some processes need additional source paper citations (work in progress)
- **Cross-Discipline Links:** Cross-referencing between related processes in different disciplines needs to be expanded

## 🔧 Technical Architecture

### LLM Integration
- **Primary Model:** Google Gemini 2.0 Flash for process analysis
- **Deployment:** Vertex AI for enterprise-scale deployment
- **Prompt Engineering:** Custom prompts optimized for process extraction and structured output
- **Output Format:** Structured JSON with Mermaid flowchart syntax
- **Model Compatibility:** Framework tested with Gemini 2.0 Flash; other LLMs can be used, though their outputs may differ

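Because the model is asked for structured JSON that carries Mermaid syntax, each reply has to be parsed and sanity-checked before storage. A minimal sketch, assuming the reply may arrive wrapped in a Markdown code fence and that the flowchart sits in a field named `mermaid` (a hypothetical field name, not the project's documented format). It does not call Gemini or Vertex AI itself.

```python
import json
import re

# Replies sometimes wrap the JSON in a fenced block (three backticks, optional "json" tag).
FENCED_JSON = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)


def parse_process_reply(reply: str) -> dict:
    """Extract the JSON object from an LLM reply and check its (hypothetical) 'mermaid' field."""
    match = FENCED_JSON.search(reply)
    payload = match.group(1) if match else reply.strip()
    record = json.loads(payload)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    mermaid = record.get("mermaid", "")
    if not isinstance(mermaid, str) or not mermaid.lstrip().startswith(("flowchart", "graph")):
        raise ValueError("'mermaid' must hold a flowchart or graph definition")
    return record


if __name__ == "__main__":
    body = json.dumps({
        "process_name": "DNA replication",
        "mermaid": "flowchart TD\n    A[Origin recognized] --> B[Helicase unwinds DNA]",
    })
    fence = "`" * 3
    reply = f"Here is the analysis:\n{fence}json\n{body}\n{fence}"
    print(parse_process_reply(reply)["process_name"])  # DNA replication
```

Rejecting malformed replies at this point keeps invalid records out of storage, where they would otherwise surface later as rendering failures.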
### Visualization Stack
- **Rendering Engine:** Mermaid.js for flowchart visualization
- **Data Validation:** JSON schema to enforce a consistent data structure
- **Output Formats:** Interactive SVG output, with export to PNG/PDF supported
- **Color Schemes:** Discipline-based color coding following Programming Framework standards

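The JSON-schema validation step can be illustrated with a small, dependency-free checker. The schema below is hypothetical: its field names are assumptions loosely based on the process metadata described under Data Availability. The checker implements only the `type`, `required`, and `properties` keywords; a full JSON Schema validator such as the `jsonschema` package would be the practical choice.

```python
from typing import Any

# Hypothetical schema for one process record; field names are illustrative assumptions.
PROCESS_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["process_id", "name", "discipline", "mermaid", "sources"],
    "properties": {
        "process_id": {"type": "string"},
        "name": {"type": "string"},
        "discipline": {"type": "string"},
        "mermaid": {"type": "string"},
        "sources": {"type": "array"},
        "node_count": {"type": "integer"},
    },
}

# Python types corresponding to JSON Schema primitive type names.
JSON_TYPES: dict[str, Any] = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "array": list, "object": dict,
}


def schema_errors(record: Any, schema: dict[str, Any] = PROCESS_SCHEMA) -> list[str]:
    """Return violations of the top-level object type, required fields, and property types."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    errors = [f"missing required field '{key}'"
              for key in schema.get("required", []) if key not in record]
    for key, rule in schema.get("properties", {}).items():
        if key not in record:
            continue
        value = record[key]
        # bool is a subclass of int in Python, so it must not pass as "integer" or "number".
        wrong_bool = isinstance(value, bool) and rule["type"] != "boolean"
        if wrong_bool or not isinstance(value, JSON_TYPES[rule["type"]]):
            errors.append(f"field '{key}' should be of type {rule['type']}")
    return errors


if __name__ == "__main__":
    record = {"process_id": "chem-001", "name": "Esterification", "discipline": "chemistry",
              "mermaid": "flowchart TD\n    A[Acid + alcohol] --> B[Ester + water]", "sources": []}
    print(schema_errors(record))  # []
    print(schema_errors({"name": 42}))
```

Returning a list of messages rather than raising on the first problem makes it easier to report every defect in a record at once.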
### Data Storage
- **Primary Storage:** Google Cloud Storage for JSON process files
- **Metadata Indexing:** Firestore for metadata indexing and search
- **Version Control:** Git for code and documentation versioning
- **Cross-Referencing:** Integration with the research papers database (23,246+ papers indexed)

### Integration Points
- **GLMP:** Specialized biological process collections
- **CopernicusAI:** Knowledge graph integration for unified exploration
- **Research Papers Database:** Cross-linking with 23,246+ indexed papers
- **API Endpoints:** Programmatic access for integration with other systems
- **Research Tools Dashboard:** Unified interface for exploring processes alongside papers and other content

### How to Cite This Work

#### BibTeX Format
```bibtex
@article{welz2025programming,
  title={The Programming Framework: A General Method for Process Analysis Using LLMs and Mermaid Visualization},
  author={Welz, Gary},
  journal={Nature Communications},
  year={2025},
  note={Submitted; preprint available upon publication},
  url={https://huggingface.co/spaces/garywelz/programming_framework}
}
```

#### Standard Citation Format
Welz, G. (2024–2025). *The Programming Framework: A Universal Method for Process Analysis*.
Hugging Face Spaces. https://huggingface.co/spaces/garywelz/programming_framework

Welz, G. (2024). *From Inspiration to AI: Biology as Visual Programming*. Medium.
https://medium.com/@garywelz_47126/from-inspiration-to-ai-biology-as-visual-programming-520ee523029a

**Note:** When published, this citation will be updated with DOI and publication details from Nature Communications.

This project serves as a foundational meta-tool for AI-assisted process analysis, enabling systematic extraction and visualization of complex logic from textual sources across diverse scientific and technical domains.

The Programming Framework is designed as infrastructure for AI-assisted science, providing a universal methodology that can be specialized for domain-specific applications.

## 📊 Data Availability

**Research Data:**
- **Process Flowcharts:** All process flowcharts are publicly available in Google Cloud Storage with interactive database tables:
  - [Biology Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/biology-processes-database/biology-database-table.html) - 52 processes across 8 subcategories
  - [Chemistry Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/chemistry-processes-database/chemistry-database-table.html) - 91 processes across 14 subcategories
  - [Physics Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/physics-processes-database/physics-database-table.html) - 21 processes across 7 subcategories
  - [Mathematics Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/mathematics-database-table.html) - 20 processes across 7 subcategories
  - [Computer Science Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/computer-science-processes-database/computer-science-database-table.html) - 21 processes across 7 subcategories
  - [GLMP Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp-database-table.html) - 108+ molecular biology processes
- **Process Metadata:** Each process includes JSON metadata with Mermaid flowchart syntax, source citations, complexity metrics, and related process links.
- **Current Statistics:** Dynamically updated statistics are available at the [Public Project Interface](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html).

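All of the database tables listed above follow the public Google Cloud Storage URL pattern `https://storage.googleapis.com/<bucket>/<object path>`, so links can be generated programmatically. The helper below only reproduces paths already listed in this README; the slug-to-path convention is inferred from those links (GLMP is the exception, stored at the bucket root), and `database_table_url` is a hypothetical convenience, not part of the project.

```python
BUCKET = "regal-scholar-453620-r7-podcast-storage"
BASE_URL = f"https://storage.googleapis.com/{BUCKET}"

# Discipline slugs as they appear in the table URLs listed above (inferred convention).
DISCIPLINE_SLUGS = ("biology", "chemistry", "physics", "mathematics", "computer-science")


def database_table_url(slug: str) -> str:
    """Build the public URL of a discipline's interactive database table (hypothetical helper)."""
    if slug == "glmp":
        # The GLMP table lives at the bucket root rather than in a *-processes-database folder.
        return f"{BASE_URL}/glmp-database-table.html"
    if slug not in DISCIPLINE_SLUGS:
        raise ValueError(f"unknown discipline slug: {slug!r}")
    return f"{BASE_URL}/{slug}-processes-database/{slug}-database-table.html"


if __name__ == "__main__":
    for slug in (*DISCIPLINE_SLUGS, "glmp"):
        print(slug, database_table_url(slug))
```

Because the objects are publicly readable, these URLs can be opened directly without authentication.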
**Source Code & Methodology:**
- **Methodology:** Fully documented in this README and the Programming Framework paper (submitted to Nature Communications).
- **Process Generation:** LLM-powered extraction using Google Gemini 2.0 Flash via Vertex AI, with custom prompts for process extraction and structured JSON output formatting.
- **Visualization:** Mermaid.js-based flowchart generation with JSON schema for data validation.
- **Data Format:** Standardized JSON structure documented in project files (see Technical Architecture section).
- **Database Schemas:** Process database schemas and metadata structures are documented in the project documentation.

**Access:**
- **Public Access:** All process databases and database tables are publicly accessible (no authentication required).
- **Individual Process Viewers:** Each process has a dedicated viewer accessible via links in the database tables.
- **Research Tools Dashboard:** Processes are integrated into the [Research Tools Dashboard](https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine) for unified exploration alongside research papers and other content.
- **Hugging Face Spaces:** Framework documentation and examples are available at the [Programming Framework Space](https://huggingface.co/spaces/garywelz/programming_framework).

**Reproducibility:**
- All process flowcharts include source citations; linking each flowchart to the specific research papers used to create it is still in progress (see Current Limitations).
- The methodology is fully documented and can be replicated using Google Gemini 2.0 Flash or other compatible LLMs, although results may differ across models.
- JSON schema and data structures are standardized and documented.
- The process generation workflow is transparent: input (textual process description) → LLM analysis → Mermaid flowchart generation → JSON storage.
- All components are publicly accessible for verification, reuse, and extension to other domains.

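The four-stage workflow described above (description → LLM analysis → Mermaid generation → JSON storage) can be sketched end to end. In this sketch the LLM stage is replaced by a hypothetical stub, `stub_extract_steps`, that simply splits the description on semicolons; a real run would prompt Gemini 2.0 Flash for structured output instead. Only linear processes are rendered, and "storage" is reduced to serializing the record as JSON.

```python
import json
from typing import Callable


def stub_extract_steps(description: str) -> list[str]:
    """Hypothetical stand-in for the LLM analysis stage: one step per ';'-separated clause."""
    return [part.strip() for part in description.split(";") if part.strip()]


def steps_to_mermaid(steps: list[str]) -> str:
    """Render an ordered list of steps as a linear top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for i, label in enumerate(steps):
        # A double quote would end the quoted Mermaid label early, so swap it for a single quote.
        safe = label.replace('"', "'")
        lines.append(f'    S{i}["{safe}"]')
    lines.extend(f"    S{i} --> S{i + 1}" for i in range(len(steps) - 1))
    return "\n".join(lines)


def run_pipeline(process_id: str, description: str,
                 extract: Callable[[str], list[str]] = stub_extract_steps) -> str:
    """Description -> step extraction -> Mermaid flowchart -> serialized JSON record."""
    steps = extract(description)
    record = {
        "process_id": process_id,
        "steps": steps,
        "mermaid": steps_to_mermaid(steps),
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    print(run_pipeline("demo-dna-replication",
                       "Helicase unwinds the double helix; Primase lays down an RNA primer; "
                       "DNA polymerase extends the new strand"))
```

Passing the extraction stage in as a parameter keeps the rest of the pipeline unchanged when the stub is swapped for a real model call, or for a different LLM provider.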
**Process Database Statistics:**
- **Total Processes:** 313+ processes across 6 databases, validated by automated checks (domain-expert validation is ongoing)
- **Disciplines Covered:** Biology, Chemistry, Physics, Mathematics, Computer Science, Molecular Biology (GLMP)
- **Validation:** 100% syntax accuracy, ≥85% metadata quality, all processes include source citations
- **Format:** All processes stored as JSON files with Mermaid flowchart syntax, publicly accessible via Google Cloud Storage

## 🔗 Related Projects

### 🧬 GLMP - Genome Logic Modeling