metadata_database / README.md
garywelz's picture
Upload 2 files
72629fd verified
metadata
title: Research Paper Metadata Database
emoji: πŸ“š
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
license: apache-2.0

πŸ“š Research Paper Metadata Database

A centralized metadata repository for scientific research papers, designed to enable AI-powered visualization and analysis of research structure with the goal of expanding research in interesting, useful, and practical ways.

πŸ“š Prior Work & Research Contributions

Overview

The Research Paper Metadata Database represents prior work that demonstrates the creation of a structured metadata repository for scientific research papers. This project establishes a foundation for using AI tools to visualize and analyze the structure of scientific research, enabling systematic exploration of research patterns, citation networks, and interdisciplinary connections.

πŸ”¬ Research Contributions

  • Structured Metadata Repository: Centralized database of research paper metadata (not a file archive)
  • AI-Powered Preprocessing: LLM-based entity extraction and annotation for research papers
  • Citation Network Analysis: Cross-reference linking and relationship mapping between papers
  • Integration Framework: Designed for integration with CopernicusAI Knowledge Engine components

βš™οΈ Technical Achievements

  • JSON-Based Storage: Structured metadata format enabling programmatic access and analysis
  • Entity Extraction: Automated extraction of genes, proteins, chemical compounds, equations, and key concepts
  • Quality Assessment: Automated quality scoring and relevance metrics for research papers
  • API Architecture: RESTful API design for external access and integration

🎯 Position Within CopernicusAI Knowledge Engine

The Research Paper Metadata Database serves as a core data infrastructure component of the CopernicusAI Knowledge Engine, providing:

  • Foundation for Knowledge Graph Construction: Structured metadata enables relationship mapping - βœ… Now Fully Operational (December 2025) with 12,000+ mathematics papers indexed, interactive knowledge graph visualization, and relationship extraction (citations, semantic similarity, categories)
  • Research Tools Dashboard (βœ… Implemented December 2025) - Fully operational web interface providing unified access to research papers through knowledge graph visualization, vector search, RAG queries, and content browsing. Live at: https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine
  • Vector Search: Semantic search using Vertex AI embeddings across papers, podcasts, and processes
  • RAG System: Retrieval-augmented generation with citation support and multi-modal content integration
  • Integration with AI Podcast Generation: Links research papers to generated podcast content
  • Support for GLMP: Provides source paper references for biological process visualizations
  • Science Video Database Integration: Potential linking between papers and related video content
  • Programming Framework Support: Supplies structured data for process analysis applications

This work establishes a proof-of-concept for AI-assisted research metadata management, demonstrating how structured data can enable systematic analysis and visualization of scientific research patterns. The Knowledge Engine now provides a fully operational system for exploring research papers through multiple interfaces.

🎯 Project Goals

This project creates a database of scientific research paper metadata for the purpose of:

  • Using AI tools to visualize and analyze the structure of scientific research
  • Expanding research in interesting, useful, and practical ways
  • Enabling systematic exploration of research patterns and connections
  • Supporting knowledge graph construction and semantic search

πŸ”§ Technical Architecture

Metadata Structure

  • DOI, arXiv ID, Publication Information: Standard identifiers and publication details
  • Abstracts and Key Findings: Extracted summaries and main contributions
  • Extracted Entities: Genes, proteins, chemical compounds, equations, mathematical concepts
  • Citation Networks: Cross-references and relationship mapping
  • Paradigm Shift Indicators: Flags for revolutionary vs. incremental research
  • Interdisciplinary Connections: Links between different research domains
  • Quality Scores: Relevance metrics and validation scores

AI-Powered Preprocessing

  • LLM-based entity extraction and annotation
  • Automatic categorization by discipline and subdomain
  • Keyword extraction and semantic tagging
  • Citation tracking and relationship mapping
  • Quality assessment and validation

Integration Features

  • DOI/arXiv ID resolution and metadata enrichment
  • Cross-reference linking between papers
  • Podcast-to-paper relationship tracking
  • Search and query capabilities
  • API access for programmatic retrieval

πŸ”— Related Projects

  • CopernicusAI - Main knowledge engine integrating metadata with AI podcasts
  • GLMP - Genome Logic Modeling Project using metadata for source references
  • Programming Framework - Universal process analysis tool that can utilize metadata
  • Science Video Database - Video content management with potential metadata linking

πŸ’» Technology Stack

  • Database: Firestore NoSQL for flexible JSON storage
  • Processing: Google Cloud Functions for automated metadata processing
  • AI/ML: Vertex AI for entity extraction and analysis
  • API: RESTful API for external access
  • Storage: Google Cloud Storage for associated assets

πŸ”— Resources

How to Cite This Work

Welz, G. (2024–2025). Research Paper Metadata Database. Hugging Face Spaces. https://huggingface.co/spaces/garywelz/metadata_database

This project serves as infrastructure for AI-assisted research analysis, enabling systematic visualization and exploration of scientific research patterns through structured metadata management.

The Research Paper Metadata Database is designed as infrastructure for AI-assisted science, providing the foundational data layer for knowledge graph construction and semantic search capabilities within the CopernicusAI Knowledge Engine.


Part of the CopernicusAI Knowledge Engine

Β© 2025 Gary Welz. All rights reserved.