Spaces:
Running
Running
| title: Research Paper Metadata Database | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: static | |
| pinned: false | |
| license: apache-2.0 | |
| # π Research Paper Metadata Database | |
| A centralized metadata repository for scientific research papers, designed to enable AI-powered visualization and analysis of research structure with the goal of expanding research in interesting, useful, and practical ways. | |
| ## π Prior Work & Research Contributions | |
| ### Overview | |
| The Research Paper Metadata Database represents **prior work** that demonstrates the creation of a structured metadata repository for scientific research papers. This project establishes a foundation for using AI tools to visualize and analyze the structure of scientific research, enabling systematic exploration of research patterns, citation networks, and interdisciplinary connections. | |
| ### π¬ Research Contributions | |
| - **Structured Metadata Repository:** Centralized database of research paper metadata (not a file archive) | |
| - **AI-Powered Preprocessing:** LLM-based entity extraction and annotation for research papers | |
| - **Citation Network Analysis:** Cross-reference linking and relationship mapping between papers | |
| - **Integration Framework:** Designed for integration with CopernicusAI Knowledge Engine components | |
| ### βοΈ Technical Achievements | |
| - **JSON-Based Storage:** Structured metadata format enabling programmatic access and analysis | |
| - **Entity Extraction:** Automated extraction of genes, proteins, chemical compounds, equations, and key concepts | |
| - **Quality Assessment:** Automated quality scoring and relevance metrics for research papers | |
| - **API Architecture:** RESTful API design for external access and integration | |
| ### π― Position Within CopernicusAI Knowledge Engine | |
| The Research Paper Metadata Database serves as a **core data infrastructure component** of the CopernicusAI Knowledge Engine, providing: | |
| - **Foundation for Knowledge Graph Construction:** Structured metadata enables relationship mapping - **β Now Fully Operational** (December 2025) with 12,000+ mathematics papers indexed, interactive knowledge graph visualization, and relationship extraction (citations, semantic similarity, categories) | |
| - **Research Tools Dashboard** (β Implemented December 2025) - Fully operational web interface providing unified access to research papers through knowledge graph visualization, vector search, RAG queries, and content browsing. Live at: https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine | |
| - **Vector Search:** Semantic search using Vertex AI embeddings across papers, podcasts, and processes | |
| - **RAG System:** Retrieval-augmented generation with citation support and multi-modal content integration | |
| - **Integration with AI Podcast Generation:** Links research papers to generated podcast content | |
| - **Support for GLMP:** Provides source paper references for biological process visualizations | |
| - **Science Video Database Integration:** Potential linking between papers and related video content | |
| - **Programming Framework Support:** Supplies structured data for process analysis applications | |
| This work establishes a proof-of-concept for AI-assisted research metadata management, demonstrating how structured data can enable systematic analysis and visualization of scientific research patterns. The Knowledge Engine now provides a fully operational system for exploring research papers through multiple interfaces. | |
| ## π― Project Goals | |
| This project creates a database of scientific research paper metadata for the purpose of: | |
| - Using AI tools to visualize and analyze the structure of scientific research | |
| - Expanding research in interesting, useful, and practical ways | |
| - Enabling systematic exploration of research patterns and connections | |
| - Supporting knowledge graph construction and semantic search | |
| ## π§ Technical Architecture | |
| ### Metadata Structure | |
| - **DOI, arXiv ID, Publication Information:** Standard identifiers and publication details | |
| - **Abstracts and Key Findings:** Extracted summaries and main contributions | |
| - **Extracted Entities:** Genes, proteins, chemical compounds, equations, mathematical concepts | |
| - **Citation Networks:** Cross-references and relationship mapping | |
| - **Paradigm Shift Indicators:** Flags for revolutionary vs. incremental research | |
| - **Interdisciplinary Connections:** Links between different research domains | |
| - **Quality Scores:** Relevance metrics and validation scores | |
| ### AI-Powered Preprocessing | |
| - LLM-based entity extraction and annotation | |
| - Automatic categorization by discipline and subdomain | |
| - Keyword extraction and semantic tagging | |
| - Citation tracking and relationship mapping | |
| - Quality assessment and validation | |
| ### Integration Features | |
| - DOI/arXiv ID resolution and metadata enrichment | |
| - Cross-reference linking between papers | |
| - Podcast-to-paper relationship tracking | |
| - Search and query capabilities | |
| - API access for programmatic retrieval | |
| ## π Related Projects | |
| - [CopernicusAI](https://huggingface.co/spaces/garywelz/copernicusai) - Main knowledge engine integrating metadata with AI podcasts | |
| - [GLMP](https://huggingface.co/spaces/garywelz/glmp) - Genome Logic Modeling Project using metadata for source references | |
| - [Programming Framework](https://huggingface.co/spaces/garywelz/programming_framework) - Universal process analysis tool that can utilize metadata | |
| - [Science Video Database](https://huggingface.co/spaces/garywelz/sciencevideodb) - Video content management with potential metadata linking | |
| ## π» Technology Stack | |
| - **Database:** Firestore NoSQL for flexible JSON storage | |
| - **Processing:** Google Cloud Functions for automated metadata processing | |
| - **AI/ML:** Vertex AI for entity extraction and analysis | |
| - **API:** RESTful API for external access | |
| - **Storage:** Google Cloud Storage for associated assets | |
| ## π Resources | |
| - **GitHub Repository:** [garywelz/copernicusai-research-metadata](https://github.com/garywelz/copernicusai-research-metadata) | |
| - **Hugging Face Space:** [garywelz/metadata_database](https://huggingface.co/spaces/garywelz/metadata_database) | |
| ### How to Cite This Work | |
| Welz, G. (2024β2025). *Research Paper Metadata Database*. | |
| Hugging Face Spaces. https://huggingface.co/spaces/garywelz/metadata_database | |
| This project serves as infrastructure for AI-assisted research analysis, enabling systematic visualization and exploration of scientific research patterns through structured metadata management. | |
| The Research Paper Metadata Database is designed as infrastructure for AI-assisted science, providing the foundational data layer for knowledge graph construction and semantic search capabilities within the CopernicusAI Knowledge Engine. | |
| --- | |
| **Part of the CopernicusAI Knowledge Engine** | |
| Β© 2025 Gary Welz. All rights reserved. | |