title: Research Paper Metadata Database
emoji: π
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
license: apache-2.0
Research Paper Metadata Database
A centralized metadata repository for scientific research papers, designed to enable AI-powered visualization and analysis of research structure with the goal of expanding research in interesting, useful, and practical ways.
Prior Work & Research Contributions
Overview
The Research Paper Metadata Database is prior work demonstrating a structured metadata repository for scientific research papers. It lays a foundation for using AI tools to visualize and analyze the structure of scientific research, enabling systematic exploration of research patterns, citation networks, and interdisciplinary connections.
🔬 Research Contributions
- Structured Metadata Repository: Centralized database of research paper metadata (not a file archive)
- AI-Powered Preprocessing: LLM-based entity extraction and annotation for research papers
- Citation Network Analysis: Cross-reference linking and relationship mapping between papers
- Integration Framework: Designed for integration with CopernicusAI Knowledge Engine components
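The cross-reference linking described above can be sketched as a pair of adjacency maps. This is a minimal illustration, not the project's implementation: the record keys `id` and `references`, the function names, and the sample paper IDs are all hypothetical.

```python
from collections import defaultdict


def build_citation_graph(papers):
    """Return (cites, cited_by) adjacency maps keyed by paper ID.

    `papers` is an iterable of dicts with the hypothetical keys
    "id" and "references" (a list of cited paper IDs).
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for paper in papers:
        pid = paper["id"]
        cites[pid]  # touch the key so papers without references still appear
        for ref in paper.get("references", []):
            cites[pid].add(ref)
            cited_by[ref].add(pid)
    return dict(cites), dict(cited_by)


def most_cited(cited_by, top_n=3):
    """Rank paper IDs by number of incoming citations, ties broken by ID."""
    ranked = sorted(cited_by.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(pid, len(citers)) for pid, citers in ranked[:top_n]]


# Placeholder records, for illustration only.
papers = [
    {"id": "paper-a", "references": ["paper-b", "paper-c"]},
    {"id": "paper-b", "references": ["paper-c"]},
    {"id": "paper-c", "references": []},
]
cites, cited_by = build_citation_graph(papers)
print(most_cited(cited_by))
```

Keeping both directions (`cites` and `cited_by`) makes "what does this paper build on" and "what builds on this paper" equally cheap to answer, at the cost of storing each edge twice.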
⚙️ Technical Achievements
- JSON-Based Storage: Structured metadata format enabling programmatic access and analysis
- Entity Extraction: Automated extraction of genes, proteins, chemical compounds, equations, and key concepts
- Quality Assessment: Automated quality scoring and relevance metrics for research papers
- API Architecture: RESTful API design for external access and integration
🎯 Position Within CopernicusAI Knowledge Engine
The Research Paper Metadata Database serves as a core data infrastructure component of the CopernicusAI Knowledge Engine, providing:
- Foundation for Knowledge Graph Construction: Structured metadata enables relationship mapping. Fully operational as of December 2025, with 12,000+ mathematics papers indexed, interactive knowledge graph visualization, and relationship extraction (citations, semantic similarity, categories)
- Research Tools Dashboard (implemented December 2025): Web interface providing unified access to research papers through knowledge graph visualization, vector search, RAG queries, and content browsing. Live at: https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine
- Vector Search: Semantic search using Vertex AI embeddings across papers, podcasts, and processes
- RAG System: Retrieval-augmented generation with citation support and multi-modal content integration
- Integration with AI Podcast Generation: Links research papers to generated podcast content
- Support for GLMP: Provides source paper references for biological process visualizations
- Science Video Database Integration: Potential linking between papers and related video content
- Programming Framework Support: Supplies structured data for process analysis applications
This work establishes a proof-of-concept for AI-assisted research metadata management, demonstrating how structured data can enable systematic analysis and visualization of scientific research patterns. The Knowledge Engine now provides a fully operational system for exploring research papers through multiple interfaces.
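As a rough illustration of the vector search component listed above, the sketch below ranks items by cosine similarity over precomputed embeddings. It is an assumption-laden toy: the three-dimensional vectors are invented stand-ins for real Vertex AI embeddings, and the `index` layout, item IDs, and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k (item_id, score) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), item_id)
              for item_id, vec in index.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(item_id, round(score, 3)) for score, item_id in scored[:top_k]]


# Toy embeddings for a paper, a podcast, and a process (illustrative only).
index = {
    "paper:topology": [0.9, 0.1, 0.0],
    "podcast:knots": [0.7, 0.3, 0.1],
    "process:glycolysis": [0.0, 0.2, 0.9],
}
results = search([1.0, 0.0, 0.0], index)
print(results)
```

A linear scan like this is only practical for small collections; at the scale of thousands of papers a dedicated vector index would normally replace the loop in `search`.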
🎯 Project Goals
This project creates a database of scientific research paper metadata for the purpose of:
- Using AI tools to visualize and analyze the structure of scientific research
- Expanding research in interesting, useful, and practical ways
- Enabling systematic exploration of research patterns and connections
- Supporting knowledge graph construction and semantic search
🔧 Technical Architecture
Metadata Structure
- DOI, arXiv ID, Publication Information: Standard identifiers and publication details
- Abstracts and Key Findings: Extracted summaries and main contributions
- Extracted Entities: Genes, proteins, chemical compounds, equations, mathematical concepts
- Citation Networks: Cross-references and relationship mapping
- Paradigm Shift Indicators: Flags for revolutionary vs. incremental research
- Interdisciplinary Connections: Links between different research domains
- Quality Scores: Relevance metrics and validation scores
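One way to picture a record with the fields listed above is a small dataclass that serializes to JSON. This is a sketch under assumptions: the field names, types, and defaults are hypothetical and are not taken from the project's actual schema, and the sample values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class PaperMetadata:
    """Hypothetical metadata record mirroring the fields listed above."""
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    abstract: str = ""
    key_findings: List[str] = field(default_factory=list)
    # Entity type (e.g. "genes", "equations") mapped to extracted mentions.
    entities: Dict[str, List[str]] = field(default_factory=dict)
    references: List[str] = field(default_factory=list)
    paradigm_shift: bool = False
    disciplines: List[str] = field(default_factory=list)
    quality_score: Optional[float] = None

    def to_json(self) -> str:
        """Serialize the record to a stable, human-readable JSON string."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values, not a real paper.
record = PaperMetadata(
    title="Example paper title",
    arxiv_id="0000.00000",
    entities={"equations": ["E = mc^2"]},
    disciplines=["physics"],
    quality_score=0.8,
)
round_trip = PaperMetadata(**json.loads(record.to_json()))
print(record.to_json())
```

Because every field is a plain JSON type, the same dictionary produced by `asdict` could be written to a document store without further conversion.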
AI-Powered Preprocessing
- LLM-based entity extraction and annotation
- Automatic categorization by discipline and subdomain
- Keyword extraction and semantic tagging
- Citation tracking and relationship mapping
- Quality assessment and validation
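To make the keyword extraction step concrete, here is a deliberately simple stand-in. It counts term frequencies rather than calling an LLM, so it is not the method described above, only a baseline that shows the expected input and output shape. The stopword list, the function name, and the sample abstract are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "of", "on", "are", "with", "and", "for", "that", "this"}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword terms in `text`."""
    tokens = re.findall(r"[a-z][a-z\-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Sort by descending count, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Invented sample text, for illustration only.
abstract = (
    "Protein folding depends on protein sequence. "
    "Folding pathways of the protein are studied with simulation."
)
print(extract_keywords(abstract, top_n=2))
```

A frequency baseline like this can also serve as a sanity check on LLM output: keywords the model returns that never occur in the text are candidates for review.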
Integration Features
- DOI/arXiv ID resolution and metadata enrichment
- Cross-reference linking between papers
- Podcast-to-paper relationship tracking
- Search and query capabilities
- API access for programmatic retrieval
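Identifier resolution usually starts by normalizing whatever form the identifier arrives in. The sketch below assumes common DOI and arXiv ID syntaxes and strips URL or scheme prefixes; the function name, the regular expressions, and the choice to drop arXiv version suffixes are assumptions for illustration, not the project's documented behavior.

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_NEW_RE = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")
ARXIV_OLD_RE = re.compile(r"^([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(v\d+)?$")

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")
ARXIV_PREFIXES = ("https://arxiv.org/abs/", "http://arxiv.org/abs/", "arxiv:")


def _strip_prefix(value, prefixes):
    """Remove the first matching prefix, compared case-insensitively."""
    lowered = value.lower()
    for prefix in prefixes:
        if lowered.startswith(prefix):
            return value[len(prefix):]
    return value


def normalize_identifier(raw):
    """Return ("doi", id), ("arxiv", id), or None if unrecognized."""
    value = raw.strip()
    doi = _strip_prefix(value, DOI_PREFIXES)
    if DOI_RE.match(doi):
        # DOIs are case-insensitive, so lowercase gives a canonical key.
        return ("doi", doi.lower())
    arxiv = _strip_prefix(value, ARXIV_PREFIXES)
    for pattern in (ARXIV_NEW_RE, ARXIV_OLD_RE):
        match = pattern.match(arxiv)
        if match:
            # Drop the version suffix so all versions share one key.
            return ("arxiv", match.group(1))
    return None


# Syntactic placeholders, not references to real papers.
print(normalize_identifier("https://doi.org/10.1000/XYZ123"))
print(normalize_identifier("arXiv:0000.00000v2"))
```

Normalizing before lookup means the same paper is not stored twice under, say, a bare DOI and its `https://doi.org/` URL form.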
Related Projects
- CopernicusAI - Main knowledge engine integrating metadata with AI podcasts
- GLMP - Genome Logic Modeling Project using metadata for source references
- Programming Framework - Universal process analysis tool that can utilize metadata
- Science Video Database - Video content management with potential metadata linking
💻 Technology Stack
- Database: Firestore NoSQL for flexible JSON storage
- Processing: Google Cloud Functions for automated metadata processing
- AI/ML: Vertex AI for entity extraction and analysis
- API: RESTful API for external access
- Storage: Google Cloud Storage for associated assets
Resources
- GitHub Repository: garywelz/copernicusai-research-metadata
- Hugging Face Space: garywelz/metadata_database
How to Cite This Work
Welz, G. (2024–2025). Research Paper Metadata Database. Hugging Face Spaces. https://huggingface.co/spaces/garywelz/metadata_database
The Research Paper Metadata Database is designed as infrastructure for AI-assisted science. Its structured metadata provides the foundational data layer for knowledge graph construction, semantic search, and systematic visualization of research patterns within the CopernicusAI Knowledge Engine.
Part of the CopernicusAI Knowledge Engine
© 2025 Gary Welz. All rights reserved.