Upload 2 files
README.md
CHANGED
|
@@ -65,6 +65,29 @@ Just as a microscope enables observation of the microscopic world, CopernicusAI
|
|
| 65 |
- Automatic citation extraction and formatting
|
| 66 |
- Source validation and authenticity verification
|
| 67 |
|
| 68 |
### Advanced LLM Integration
|
| 69 |
|
| 70 |
**Multi-Model Architecture:**
|
|
@@ -254,19 +277,46 @@ A centralized **metadata repository** (not a file archive) that provides:
|
|
| 254 |
|
| 255 |
## Platform Capabilities
|
| 256 |
|
| 257 |
-
### Research Coverage
|
| 258 |
- **250+ million research papers** accessible through integrated APIs
|
| 259 |
- **8+ academic databases** integrated with parallel search
|
|
|
|
| 260 |
- **Minimum 3 sources** required per episode for quality assurance
|
| 261 |
- **Multi-paper analysis** for comprehensive coverage
|
| 262 |
|
| 263 |
### Platform Features
|
| 264 |
- **Subscriber-driven content generation** - Users prompt and create podcasts
|
|
|
|
| 265 |
- **RSS feed distribution** to major podcast platforms
|
| 266 |
- **Public and private podcast options** - Share discoveries or keep them private
|
|
|
|
| 267 |
|
| 268 |
---
|
| 269 |
|
| 270 |
## Live Platform & Resources
|
| 271 |
|
| 272 |
### Production Deployment
|
|
@@ -537,9 +587,60 @@ This platform is designed to support grant applications to:
|
|
| 537 |
|
| 538 |
## How to Cite This Work
|
| 539 |
|
| 540 |
Welz, G. (2024–2025). *CopernicusAI: AI-Generated Audio Briefings as a Research Interface*.
|
| 541 |
Hugging Face Spaces. https://huggingface.co/spaces/garywelz/copernicusai
|
| 542 |
|
| 543 |
---
|
| 544 |
|
| 545 |
## License & Attribution
|
|
|
|
| 65 |
- Automatic citation extraction and formatting
|
| 66 |
- Source validation and authenticity verification
|
| 67 |
|
| 68 |
+
## Methodology & Quality Assurance
|
| 69 |
+
|
| 70 |
+
### Multi-Source Validation Process
|
| 71 |
+
1. **Source Discovery:** Parallel search across 8+ academic databases (PubMed, arXiv, NASA ADS, Zenodo, bioRxiv, CORE, Google Scholar, News API)
|
| 72 |
+
2. **Quality Scoring:** Relevance ranking using citation counts, journal impact factors, recency, and peer-review status
|
| 73 |
+
3. **Minimum Requirements:** At least 3 research sources required per episode for quality assurance
|
| 74 |
+
4. **Citation Extraction:** Automated extraction with manual verification and formatting
|
| 75 |
+
5. **Content Generation:** LLM synthesis (Google Gemini 3, GPT-4, Claude 3) with source attribution at each claim
|
| 76 |
+
6. **Validation:** Manual review of sample episodes by domain experts (ongoing)
|
| 77 |
+
|
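The scoring and minimum-source steps above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: the `Source` fields, the `WEIGHTS` values, and the function names are assumptions, and journal impact factor is left out for brevity. Only the signals themselves and the three-source minimum come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    citations: int
    year: int
    peer_reviewed: bool


# Hypothetical weights: the README names the signals but not how they are combined.
WEIGHTS = {"citations": 0.5, "recency": 0.3, "peer_review": 0.2}
MIN_SOURCES = 3  # stated requirement: at least 3 sources per episode


def quality_score(src: Source, current_year: int, max_citations: int) -> float:
    """Combine citation count, recency, and peer-review status into a score in [0, 1]."""
    citation_part = src.citations / max_citations if max_citations else 0.0
    age = max(current_year - src.year, 0)
    recency_part = 1.0 / (1.0 + age)  # 1.0 for the current year, decaying with age
    review_part = 1.0 if src.peer_reviewed else 0.0
    return (
        WEIGHTS["citations"] * citation_part
        + WEIGHTS["recency"] * recency_part
        + WEIGHTS["peer_review"] * review_part
    )


def select_sources(candidates: list[Source], current_year: int, top_k: int = 5) -> list[Source]:
    """Rank candidates by score and refuse to proceed with fewer than MIN_SOURCES."""
    if len(candidates) < MIN_SOURCES:
        raise ValueError(f"need at least {MIN_SOURCES} sources, got {len(candidates)}")
    max_citations = max(s.citations for s in candidates)
    ranked = sorted(
        candidates,
        key=lambda s: quality_score(s, current_year, max_citations),
        reverse=True,
    )
    return ranked[:top_k]
```

Normalizing citations against the best candidate in the batch keeps the score comparable across fields with very different citation volumes, which is one possible design choice among many.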
| 78 |
+
### Paradigm Shift Detection
|
| 79 |
+
- **Citation Network Analysis:** Identifies highly cited recent papers that may represent paradigm shifts
|
| 80 |
+
- **Interdisciplinary Connection Analysis:** Detects connections across domains that may indicate emerging fields
|
| 81 |
+
- **Expert Review:** Validation of identified paradigm shifts against domain expert knowledge
|
| 82 |
+
- **Temporal Analysis:** Tracks citation patterns over time to identify emerging trends
|
| 83 |
+
|
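One simple way to approximate "highly cited recent papers" is citation velocity (citations per year since publication). The sketch below is illustrative only: the thresholds `max_age` and `min_velocity`, the tuple layout, and the function names are assumptions, and the README does not state which measure the platform actually uses.

```python
def citations_per_year(citations: int, pub_year: int, current_year: int) -> float:
    """Average citations per year, counting the publication year as year 1."""
    years_active = max(current_year - pub_year, 0) + 1
    return citations / years_active


def flag_candidates(papers, current_year, max_age=3, min_velocity=50.0):
    """Return (title, velocity) pairs for recent papers cited unusually fast.

    `papers` is a list of (title, citations, pub_year) tuples.
    `max_age` and `min_velocity` are hypothetical thresholds.
    """
    flagged = []
    for title, citations, pub_year in papers:
        if current_year - pub_year > max_age:
            continue  # too old to count as "recent"
        velocity = citations_per_year(citations, pub_year, current_year)
        if velocity >= min_velocity:
            flagged.append((title, velocity))
    # Fastest-rising papers first
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Papers flagged this way would still go through the expert review step listed above; a velocity threshold only narrows the candidate set.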
| 84 |
+
### Quality Metrics
|
| 85 |
+
- **Source Quality:** Average citation count, journal impact factors, peer-review status
|
| 86 |
+
- **Coverage:** Number of sources per topic, cross-database coverage, temporal distribution
|
| 87 |
+
- **Accuracy:** Manual validation of sample episodes by domain experts (ongoing process)
|
| 88 |
+
- **Reproducibility:** Full citation tracking enables verification of all claims
|
| 89 |
+
- **Transparency:** All source papers accessible via Research Tools Dashboard and database tables
|
| 90 |
+
|
| 91 |
### Advanced LLM Integration
|
| 92 |
|
| 93 |
**Multi-Model Architecture:**
|
|
|
|
| 277 |
|
| 278 |
## Platform Capabilities
|
| 279 |
|
| 280 |
+
### Research Coverage (As of January 2025)
|
| 281 |
- **250+ million research papers** accessible through integrated APIs
|
| 282 |
- **8+ academic databases** integrated with parallel search
|
| 283 |
+
- **23,246+ papers indexed** with full metadata and vector embeddings (dynamically growing - see [Public Project Interface](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html) for current count)
|
| 284 |
- **Minimum 3 sources** required per episode for quality assurance
|
| 285 |
- **Multi-paper analysis** for comprehensive coverage
|
| 286 |
|
| 287 |
### Platform Features
|
| 288 |
- **Subscriber-driven content generation** - Users prompt and create podcasts
|
| 289 |
+
- **64+ podcast episodes** generated across 5 disciplines (as of January 2025)
|
| 290 |
- **RSS feed distribution** to major podcast platforms
|
| 291 |
- **Public and private podcast options** - Share discoveries or keep them private
|
| 292 |
+
- **Knowledge Engine Dashboard:** Operational since December 2025
|
| 293 |
|
| 294 |
---
|
| 295 |
|
| 296 |
+
## ⚠️ Limitations & Future Directions
|
| 297 |
+
|
| 298 |
+
### Current Limitations
|
| 299 |
+
- **Discipline Coverage:** Currently strongest in mathematics (23,246+ papers indexed); expansion to other disciplines in progress (see Ramp-Up Plan)
|
| 300 |
+
- **Process Validation:** Flowcharts are LLM-generated and benefit from expert validation for domain-specific accuracy (validation process ongoing)
|
| 301 |
+
- **Source Linking:** Not all processes yet linked to specific research papers (work in progress per Quality Standards)
|
| 302 |
+
- **Scale:** The current process database (~313 processes) is a proof of concept; the target is 1,000+ processes
|
| 303 |
+
- **Podcast Generation:** Requires manual review for accuracy; fully automated quality assurance in development
|
| 304 |
+
- **Video Production:** Advanced video features (Phase 2+) are planned but not yet implemented
|
| 305 |
+
|
| 306 |
+
### Future Work
|
| 307 |
+
- **Expansion:** Scale to 200,000+ papers across all disciplines (see RAMP_UP_PLAN.md)
|
| 308 |
+
- **Validation:** Implement systematic peer review process for process flowcharts
|
| 309 |
+
- **Integration:** Enhanced cross-linking between processes, papers, and podcasts
|
| 310 |
+
- **Automation:** Automated source paper suggestion and linking using vector search
|
| 311 |
+
- **Quality Assurance:** Systematic validation framework for flowchart accuracy and podcast content
|
| 312 |
+
- **Video Features:** Implement advanced video production capabilities (Phase 2+)
|
| 313 |
+
- **Multi-modal Integration:** Enhanced integration of visual content, animations, and interactive elements
|
| 314 |
+
|
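The planned vector-search linking could look something like the sketch below, which ranks paper embeddings by cosine similarity to a process embedding. It is a minimal illustration under stated assumptions: the embeddings are plain Python lists, the identifiers are made up, and `suggest_papers` is a hypothetical name, not part of the platform's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_papers(process_vec, paper_vecs, top_k=3):
    """Return the top_k (paper_id, similarity) pairs for one process embedding.

    `paper_vecs` maps a paper id to its embedding vector.
    """
    scored = [
        (paper_id, cosine_similarity(process_vec, vec))
        for paper_id, vec in paper_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A brute-force scan like this is fine for a few thousand vectors; at the 200,000+ paper scale mentioned above, an approximate nearest-neighbor index would likely be needed instead.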
| 315 |
+
### Known Areas for Improvement
|
| 316 |
+
- **Bias in Source Selection:** The current system may favor highly cited papers; work is under way to balance this with recent, emerging research
|
| 317 |
+
- **Domain Expertise:** Some domains better represented than others; actively expanding coverage
|
| 318 |
+
- **Validation Coverage:** Not all content yet validated by domain experts; systematic validation in progress
|
| 319 |
+
|
| 320 |
## Live Platform & Resources
|
| 321 |
|
| 322 |
### Production Deployment
|
|
|
|
| 587 |
|
| 588 |
## How to Cite This Work
|
| 589 |
|
| 590 |
+
### BibTeX Format
|
| 591 |
+
```bibtex
|
| 592 |
+
@article{welz2025copernicusai,
|
| 593 |
+
title={CopernicusAI: AI-Generated Audio Briefings as a Research Interface},
|
| 594 |
+
author={Welz, Gary},
|
| 595 |
+
journal={Nature Communications},
|
| 596 |
+
year={2025},
|
| 597 |
+
note={Submitted; preprint available upon publication},
|
| 598 |
+
url={https://huggingface.co/spaces/garywelz/copernicusai}
|
| 600 |
+
}
|
| 601 |
+
```
|
| 602 |
+
|
| 603 |
+
### Standard Citation Format
|
| 604 |
Welz, G. (2024–2025). *CopernicusAI: AI-Generated Audio Briefings as a Research Interface*.
|
| 605 |
Hugging Face Spaces. https://huggingface.co/spaces/garywelz/copernicusai
|
| 606 |
|
| 607 |
+
**Note:** When published, this citation will be updated with DOI and publication details from Nature Communications.
|
| 608 |
+
|
| 609 |
+
---
|
| 610 |
+
|
| 611 |
+
## Data Availability
|
| 612 |
+
|
| 613 |
+
**Research Data:**
|
| 614 |
+
- **Research Paper Metadata:** Publicly accessible via the [Research Tools Dashboard](https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine) and the [Research Paper Database Table](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/papers-database-table.html). Current statistics are updated dynamically at the [Public Project Interface](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html).
|
| 615 |
+
- **Podcast Episodes:** All generated podcast episodes, transcripts, and metadata are accessible via the [Podcast Database](https://www.copernicusai.fyi/episodes) and [RSS Feed](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/feeds/copernicus-mvp-rss-feed.xml).
|
| 616 |
+
- **Process Flowcharts:** Process flowcharts across all disciplines are publicly available in Google Cloud Storage:
|
| 617 |
+
- [Biology Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/biology-processes-database/biology-database-table.html)
|
| 618 |
+
- [Chemistry Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/chemistry-processes-database/chemistry-database-table.html)
|
| 619 |
+
- [Physics Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/physics-processes-database/physics-database-table.html)
|
| 620 |
+
- [Mathematics Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/mathematics-processes-database/mathematics-database-table.html)
|
| 621 |
+
- [Computer Science Processes Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/computer-science-processes-database/computer-science-database-table.html)
|
| 622 |
+
- [GLMP Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp-database-table.html)
|
| 623 |
+
- **Science Videos:** Video database accessible via [Science Video Database](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/videos-database-table.html) and [Live Demo](https://scienceviddb-web-204731194849.us-central1.run.app/).
|
| 624 |
+
|
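As a rough illustration of how the podcast RSS feed listed above could be read programmatically, the sketch below uses only the Python standard library. It assumes the feed follows the usual RSS 2.0 layout (`item` elements with `title`, `pubDate`, and an `enclosure` carrying the audio URL); that layout has not been verified against the live feed, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Feed URL as given in the Data Availability list above
FEED_URL = (
    "https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/"
    "feeds/copernicus-mvp-rss-feed.xml"
)


def parse_episodes(xml_data):
    """Extract title, publication date, and audio URL from RSS 2.0 XML (str or bytes)."""
    root = ET.fromstring(xml_data)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "published": item.findtext("pubDate"),
                "audio_url": enclosure.get("url") if enclosure is not None else None,
            }
        )
    return episodes


def fetch_episodes(url=FEED_URL):
    """Download the feed and parse it; requires network access."""
    with urlopen(url, timeout=30) as response:
        return parse_episodes(response.read())
```

Calling `fetch_episodes()` returns one dictionary per episode; items without an enclosure get `audio_url` set to `None` rather than raising.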
| 625 |
+
**Source Code & Methodology:**
|
| 626 |
+
- **Methodology:** Fully documented in the Programming Framework paper (see [Programming Framework Space](https://huggingface.co/spaces/garywelz/programming_framework) for methodology details).
|
| 627 |
+
- **Process Generation:** LLM-powered extraction using Google Gemini 2.0 Flash, documented in the Programming Framework.
|
| 628 |
+
- **Database Schemas:** Documented in project documentation files (SCHEMA_EXTENSIBILITY_GUIDE.md, UNIFIED_METADATA_SCHEMA_MASTER.md).
|
| 629 |
+
- **API Documentation:** RESTful API endpoints documented in the API Documentation section above.
|
| 630 |
+
|
| 631 |
+
**Access:**
|
| 632 |
+
- **Public Access:** All process databases, database tables, public interfaces, and podcast episodes are publicly accessible (no authentication required).
|
| 633 |
+
- **Research Tools Dashboard:** [https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine](https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine) - Interactive knowledge graph, vector search, and RAG queries (public access).
|
| 634 |
+
- **Public Project Interface:** [https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html](https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html) - Comprehensive access to all public components with dynamically updated statistics.
|
| 635 |
+
- **API Endpoints:** [https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app](https://copernicus-podcast-api-phzp4ie2sq-uc.a.run.app) - RESTful API with full documentation (see API Documentation section above).
|
| 636 |
+
|
| 637 |
+
**Reproducibility:**
|
| 638 |
+
- Process flowcharts include source citations linking to research papers where available (source linking is still in progress; see Current Limitations).
|
| 639 |
+
- Podcast generation methodology is fully documented and reproducible.
|
| 640 |
+
- Database structures are standardized and documented.
|
| 641 |
+
- Research synthesis workflow is transparent and can be replicated.
|
| 642 |
+
- All components are publicly accessible for verification and reuse.
|
| 643 |
+
|
| 644 |
---
|
| 645 |
|
| 646 |
## License & Attribution
|