
GitHub Copilot Instructions for MCP4RDF Project

Project Context

This is an RDF validation tool with AI features, deployed on Hugging Face Spaces. It validates RDF/XML against SHACL schemas and provides AI-powered suggestions for fixing validation errors.

Key Technologies

  • Frontend: Gradio 5.33.0
  • RDF Processing: rdflib, pyshacl
  • AI Integration: Hugging Face Inference API
  • Protocol: MCP (Model Context Protocol)
  • Deployment: Hugging Face Spaces

Project Structure

mcp4rdf-hf-space/
├── app.py                    # Main Gradio application
├── validator.py              # Core SHACL validation logic
├── mcp_server_gradio.py      # MCP server implementation
├── MonographDCTAP/           # TSV files with SHACL definitions
├── electronic_MonographDCTAP/ # Electronic format SHACL definitions
└── requirements.txt          # Python dependencies

Code Style Guidelines

Python Standards

  • Use type hints for function parameters and return values
  • Follow PEP 8 naming conventions
  • Add docstrings for all public functions
  • Use logging instead of print statements

RDF/SHACL Patterns

# Always bind common namespaces
STANDARD_NAMESPACES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "bflc": "http://id.loc.gov/ontologies/bflc/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#"
}

# Use URIRef for RDF predicates
from rdflib import URIRef, Literal, Graph
sh_path = URIRef("http://www.w3.org/ns/shacl#path")

Common Tasks and Templates

1. Adding New SHACL Validation Rules

# Template for adding a new property constraint
def add_property_constraint(shape_node, property_id, constraints):
    """
    Add SHACL property constraints to a shape.
    
    Args:
        shape_node: RDF node representing the shape
        property_id: Property identifier (e.g., "bf:title")
        constraints: Dict with keys like 'mandatory', 'repeatable', 'datatype'
    """
    # Copilot: implement SHACL property shape creation

2. Parsing TSV to SHACL

# When converting TSV rows to SHACL shapes, use this pattern:
def tsv_row_to_shacl(row, graph, prefixes):
    """
    Convert a TSV row to SHACL constraints.
    Expected columns: shapeID, propertyID, mandatory, repeatable, valueShape
    """
    # Copilot: handle prefix expansion and constraint mapping

3. Error Message Formatting

# Format validation errors for user display
def format_validation_error(result):
    """
    Format pyshacl validation result for Gradio display.
    Include: severity, focus node, property path, and message
    """
    # Copilot: create user-friendly error messages with context

4. AI Integration Patterns

# Template for AI API calls
async def get_ai_suggestion(error_context, rdf_snippet):
    """
    Get AI suggestions for fixing RDF validation errors.
    Uses Hugging Face Inference API with proper error handling.
    """
    # Copilot: implement with retry logic and timeout handling
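The retry and timeout logic can be kept separate from the Hugging Face client itself. The sketch below is a hypothetical helper, call_with_retry, that takes a zero-argument function returning a fresh coroutine (for example a lambda around whichever async client call the project uses), applies asyncio.wait_for per attempt, and backs off exponentially. Treating only timeouts and ConnectionError as retryable is an assumption; the real client may raise its own exception types.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


async def call_with_retry(
    make_call: Callable[[], Awaitable[T]],
    retries: int = 3,
    timeout: float = 30.0,
    base_delay: float = 1.0,
) -> T:
    """Await make_call() with a per-attempt timeout and exponential backoff."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, retries + 1):
        try:
            # make_call() builds a fresh coroutine each time; one cannot be awaited twice
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            logger.warning("AI call attempt %d/%d failed: %r", attempt, retries, exc)
            if attempt < retries:
                # Delays grow as base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"AI call failed after {retries} attempts") from last_error
```

Inside get_ai_suggestion this would be used roughly as await call_with_retry(lambda: client_call(prompt)), where client_call stands for the project's actual async request.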

Debugging Helpers

SHACL Validation Issues

# Debug template for missing validations
def debug_shacl_targeting():
    """
    Log all target classes and matching nodes in the data graph.
    Helps diagnose why validations aren't triggering.
    """
    # Copilot: implement comprehensive logging of shapes and targets

Namespace Resolution

# Helper for namespace issues
def resolve_prefixed_uri(prefixed_id, namespace_map):
    """
    Resolve prefixed identifiers like 'bf:Work' to full URIs.
    Handle edge cases: no prefix, already full URI, unknown prefix
    """
    # Copilot: implement robust prefix resolution
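A sketch of the edge cases named in the docstring, under stated assumptions: a bare name with no colon falls back to the default ("") prefix if one is mapped, strings that already look like absolute URIs (http://..., https://..., urn:..., mailto:..., or <...>) pass through unchanged, and an unknown prefix raises ValueError so configuration errors surface early instead of producing silently wrong URIs.

```python
from typing import Dict


def resolve_prefixed_uri(prefixed_id: str, namespace_map: Dict[str, str]) -> str:
    """Expand 'bf:Work' to a full URI, handling bare names and full URIs."""
    value = prefixed_id.strip()
    if value.startswith("<") and value.endswith(">"):
        return value[1:-1]  # <http://...> written in angle brackets
    prefix, sep, local = value.partition(":")
    if not sep:
        # No prefix at all: fall back to the default namespace if one is mapped
        if "" in namespace_map:
            return namespace_map[""] + value
        raise ValueError(f"No prefix and no default namespace for {prefixed_id!r}")
    if prefix in namespace_map:
        return namespace_map[prefix] + local
    if local.startswith("//") or prefix in {"urn", "mailto"}:
        return value  # already an absolute URI such as http://... or urn:...
    raise ValueError(f"Unknown prefix {prefix!r} in {prefixed_id!r}")


NS = {"bf": "http://id.loc.gov/ontologies/bibframe/", "": "http://example.org/"}
print(resolve_prefixed_uri("bf:Work", NS))
```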

MCP Server Implementation

Tool Registration Pattern

# MCP tool definition template
from typing import Optional

@mcp_server.tool()
async def new_mcp_tool(param1: str, param2: Optional[str] = None) -> dict:
    """
    MCP tool implementation.
    Returns: {"success": bool, "result": Any, "error": Optional[str]}
    """
    # Copilot: implement with proper error handling and logging

SSE Event Formatting

# Server-Sent Events response pattern
def format_sse_response(tool_name, result):
    """
    Format MCP tool response as SSE event.
    Include proper event type and JSON encoding.
    """
    # Copilot: implement SSE formatting with error states
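A minimal sketch of one SSE frame, assuming result is the success/result/error envelope used by the MCP tools. The event names tool_result and tool_error are placeholders; use whatever the client expects. A frame is an event: line and a data: line followed by a blank line, and because json.dumps without indent emits no raw newlines, a single data: line is safe.

```python
import json
from typing import Any, Dict


def format_sse_response(tool_name: str, result: Dict[str, Any]) -> str:
    """Serialise a tool result envelope as a single Server-Sent Events frame."""
    event = "tool_result" if result.get("success") else "tool_error"
    # default=str turns non-JSON values (e.g. exception objects) into strings
    payload = json.dumps({"tool": tool_name, **result}, default=str)
    return f"event: {event}\ndata: {payload}\n\n"


frame = format_sse_response(
    "validate_rdf", {"success": True, "result": {"conforms": True}, "error": None}
)
print(frame, end="")
```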

Testing Patterns

Unit Test Templates

# Test SHACL shape generation
def test_shape_generation():
    """
    Test that TSV rows correctly generate SHACL shapes.
    Include: basic properties, cardinality, value shapes
    """
    # Copilot: generate comprehensive test cases

# Test RDF validation
def test_rdf_validation():
    """
    Test validation with various RDF inputs.
    Include: valid, invalid, edge cases
    """
    # Copilot: create test data and assertions

Integration Test Patterns

# Test MCP server endpoints
async def test_mcp_endpoints():
    """
    Test all MCP tools with realistic inputs.
    Verify: response format, error handling, performance
    """
    # Copilot: implement async test scenarios

Performance Optimization

Caching Strategies

from functools import lru_cache

# Cache compiled SHACL graphs
@lru_cache(maxsize=10)
def get_compiled_shacl_graph(template_name):
    """
    Cache parsed SHACL graphs to avoid repeated parsing.
    """
    # Copilot: implement with proper cache invalidation

# Cache namespace resolutions
@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id, namespace_json):
    """
    Cache URI resolutions to improve performance.
    """
    # Copilot: implement with hashable inputs

Batch Processing

# Process multiple RDF documents efficiently
from typing import List

async def batch_validate_rdf(rdf_documents: List[str]):
    """
    Validate multiple RDF documents in parallel.
    Use asyncio for concurrent processing.
    """
    # Copilot: implement with progress tracking

Common Pitfalls to Avoid

  1. Namespace Conflicts: Always use override=True when binding namespaces
  2. Graph Parsing: Specify format explicitly, don't rely on auto-detection
  3. SPARQL Queries: Escape special characters in URIs
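  4. Async/Await: Don't mix synchronous and asynchronous code; run blocking calls such as pyshacl's validate() in a worker thread (e.g., asyncio.to_thread) instead of calling them directly inside async functions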
  5. Error Messages: Always include context for debugging

Gradio UI Enhancements

Adding New UI Components

# Template for new Gradio components
def create_validation_interface():
    """
    Create Gradio interface with:
    - File upload for RDF
    - Template selection
    - Real-time validation
    - Export functionality
    """
    # Copilot: implement with proper event handlers

Custom CSS/Theming

# Apply custom styling to Gradio components
custom_css = """
    .validation-error { color: red; font-weight: bold; }
    .validation-warning { color: orange; }
    .validation-info { color: blue; }
"""
# Copilot: suggest CSS for better UX

Deployment Considerations

Hugging Face Spaces Configuration

# Environment variable handling
import logging
import os

logger = logging.getLogger(__name__)

HF_API_KEY = os.environ.get("HF_API_KEY")
if not HF_API_KEY:
    logger.warning("HF_API_KEY not set, AI features disabled")
    
# Gradio launch configuration for Spaces
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False  # Don't use share=True on Spaces
)

Error Recovery

# Implement graceful degradation
def safe_ai_call(func):
    """
    Decorator for AI calls that falls back gracefully.
    """
    # Copilot: implement with fallback behavior
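A sketch of graceful degradation for async AI calls. The fallback text is an assumption, and the decorated get_ai_suggestion here is a stand-in that always fails rather than the project's real function. Catching Exception leaves asyncio.CancelledError (a BaseException since Python 3.8) untouched, so task cancellation still propagates.

```python
import asyncio
import functools
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical user-facing text; adjust to the app's wording
FALLBACK_MESSAGE = "AI suggestions are currently unavailable."


def safe_ai_call(func: Callable[..., Awaitable[str]]) -> Callable[..., Awaitable[str]]:
    """Return FALLBACK_MESSAGE instead of raising when an async AI call fails."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return await func(*args, **kwargs)
        except Exception:  # CancelledError is a BaseException, so it still propagates
            logger.exception("AI call %s failed; returning fallback", func.__name__)
            return FALLBACK_MESSAGE

    return wrapper


@safe_ai_call
async def get_ai_suggestion(error_context: str, rdf_snippet: str) -> str:
    # Stand-in body: simulates an unreachable inference endpoint
    raise ConnectionError("inference endpoint unreachable")


print(asyncio.run(get_ai_suggestion("missing bf:title", "<bf:Work/>")))
```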

Quick Reference

Essential Imports

import gradio as gr
import rdflib
from rdflib import Graph, URIRef, Literal, Namespace, RDF, RDFS
from pyshacl import validate
import pandas as pd
import logging
import asyncio
from typing import Optional, Dict, List, Any

Debugging Commands

# Log graph contents
logger.debug(f"Graph has {len(graph)} triples")
logger.debug(graph.serialize(format='turtle'))

# Run validation with pyshacl's debug output enabled, then log the text report
conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes,
    debug=True,
    inference='rdfs'
)
logger.debug(results_text)

Common SHACL Properties

  • sh:targetClass - Define which RDF types to validate
  • sh:path - Property to validate
  • sh:minCount - Minimum occurrences (1 for mandatory)
  • sh:maxCount - Maximum occurrences (1 for non-repeatable)
  • sh:datatype - Expected datatype
  • sh:node - Link to another shape (valueShape)
  • sh:severity - sh:Violation, sh:Warning, or sh:Info

Remember: Always test with real BIBFRAME data and verify MCP endpoints are accessible!