# GitHub Copilot Instructions for MCP4RDF Project

## Project Context

This is an RDF validation tool with AI features, deployed on Hugging Face Spaces. It validates RDF/XML against SHACL shapes and provides AI-powered suggestions for fixing validation errors.

### Key Technologies

- **Frontend**: Gradio 5.33.0
- **RDF Processing**: rdflib, pyshacl
- **AI Integration**: Hugging Face Inference API
- **Protocol**: MCP (Model Context Protocol)
- **Deployment**: Hugging Face Spaces

### Project Structure

```
mcp4rdf-hf-space/
├── app.py                      # Main Gradio application
├── validator.py                # Core SHACL validation logic
├── mcp_server_gradio.py        # MCP server implementation
├── MonographDCTAP/             # TSV files with SHACL definitions
├── electronic_MonographDCTAP/  # Electronic format SHACL definitions
└── requirements.txt            # Python dependencies
```

## Code Style Guidelines

### Python Standards

- Use type hints for function parameters and return values
- Follow PEP 8 naming conventions
- Add docstrings for all public functions
- Use logging instead of print statements

### RDF/SHACL Patterns

```python
from rdflib import URIRef, Literal, Graph

# Always bind common namespaces
STANDARD_NAMESPACES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "bflc": "http://id.loc.gov/ontologies/bflc/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#"
}

# Use URIRef for RDF predicates
sh_path = URIRef("http://www.w3.org/ns/shacl#path")
```

## Common Tasks and Templates

### 1. Adding New SHACL Validation Rules

```python
# Template for adding a new property constraint
def add_property_constraint(shape_node, property_id, constraints):
    """
    Add SHACL property constraints to a shape.

    Args:
        shape_node: RDF node representing the shape
        property_id: Property identifier (e.g., "bf:title")
        constraints: Dict with keys like 'mandatory', 'repeatable', 'datatype'
    """
    # Copilot: implement SHACL property shape creation
```

### 2. Parsing TSV to SHACL

```python
# When converting TSV rows to SHACL shapes, use this pattern:
def tsv_row_to_shacl(row, graph, prefixes):
    """
    Convert a TSV row to SHACL constraints.
    Expected columns: shapeID, propertyID, mandatory, repeatable, valueShape
    """
    # Copilot: handle prefix expansion and constraint mapping
```

### 3. Error Message Formatting

```python
# Format validation errors for user display
def format_validation_error(result):
    """
    Format pyshacl validation result for Gradio display.
    Include: severity, focus node, property path, and message
    """
    # Copilot: create user-friendly error messages with context
```

### 4. AI Integration Patterns

```python
# Template for AI API calls
async def get_ai_suggestion(error_context, rdf_snippet):
    """
    Get AI suggestions for fixing RDF validation errors.
    Uses Hugging Face Inference API with proper error handling.
    """
    # Copilot: implement with retry logic and timeout handling
```

## Debugging Helpers

### SHACL Validation Issues

```python
# Debug template for missing validations
def debug_shacl_targeting():
    """
    Log all target classes and matching nodes in the data graph.
    Helps diagnose why validations aren't triggering.
    """
    # Copilot: implement comprehensive logging of shapes and targets
```

### Namespace Resolution

```python
# Helper for namespace issues
def resolve_prefixed_uri(prefixed_id, namespace_map):
    """
    Resolve prefixed identifiers like 'bf:Work' to full URIs.
    Handle edge cases: no prefix, already full URI, unknown prefix
    """
    # Copilot: implement robust prefix resolution
```

## MCP Server Implementation

### Tool Registration Pattern

```python
from typing import Optional

# MCP tool definition template
# (`mcp_server` is the project's MCP server instance, defined elsewhere)
@mcp_server.tool()
async def new_mcp_tool(param1: str, param2: Optional[str] = None) -> dict:
    """
    MCP tool implementation.
    Returns: {"success": bool, "result": Any, "error": Optional[str]}
    """
    # Copilot: implement with proper error handling and logging
```

### SSE Event Formatting

```python
# Server-Sent Events response pattern
def format_sse_response(tool_name, result):
    """
    Format MCP tool response as SSE event.
    Include proper event type and JSON encoding.
    """
    # Copilot: implement SSE formatting with error states
```

## Testing Patterns

### Unit Test Templates

```python
# Test SHACL shape generation
def test_shape_generation():
    """
    Test that TSV rows correctly generate SHACL shapes.
    Include: basic properties, cardinality, value shapes
    """
    # Copilot: generate comprehensive test cases

# Test RDF validation
def test_rdf_validation():
    """
    Test validation with various RDF inputs.
    Include: valid, invalid, edge cases
    """
    # Copilot: create test data and assertions
```

### Integration Test Patterns

```python
# Test MCP server endpoints
async def test_mcp_endpoints():
    """
    Test all MCP tools with realistic inputs.
    Verify: response format, error handling, performance
    """
    # Copilot: implement async test scenarios
```

## Performance Optimization

### Caching Strategies

```python
from functools import lru_cache

# Cache compiled SHACL graphs
@lru_cache(maxsize=10)
def get_compiled_shacl_graph(template_name):
    """
    Cache parsed SHACL graphs to avoid repeated parsing.
    """
    # Copilot: implement with proper cache invalidation

# Cache namespace resolutions
@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id, namespace_json):
    """
    Cache URI resolutions to improve performance.
    """
    # Copilot: implement with hashable inputs
```

### Batch Processing

```python
from typing import List

# Process multiple RDF documents efficiently
async def batch_validate_rdf(rdf_documents: List[str]):
    """
    Validate multiple RDF documents in parallel.
    Use asyncio for concurrent processing.
    """
    # Copilot: implement with progress tracking
```

## Common Pitfalls to Avoid

1. **Namespace Conflicts**: Always use `override=True` when binding namespaces
2. **Graph Parsing**: Specify the format explicitly; don't rely on auto-detection
3. **SPARQL Queries**: Escape special characters in URIs
4. **Async/Await**: Don't call blocking synchronous code (such as pyshacl validation) directly from async functions; run it in a worker thread instead
5. **Error Messages**: Always include context for debugging

## Gradio UI Enhancements

### Adding New UI Components

```python
# Template for new Gradio components
def create_validation_interface():
    """
    Create Gradio interface with:
    - File upload for RDF
    - Template selection
    - Real-time validation
    - Export functionality
    """
    # Copilot: implement with proper event handlers
```

### Custom CSS/Theming

```python
# Apply custom styling to Gradio components
custom_css = """
.validation-error { color: red; font-weight: bold; }
.validation-warning { color: orange; }
.validation-info { color: blue; }
"""
# Copilot: suggest CSS for better UX
```

## Deployment Considerations

### Hugging Face Spaces Configuration

```python
import logging
import os

logger = logging.getLogger(__name__)

# Environment variable handling
HF_API_KEY = os.environ.get("HF_API_KEY")
if not HF_API_KEY:
    logger.warning("HF_API_KEY not set, AI features disabled")

# Gradio launch configuration for Spaces
# (`demo` is the Gradio app object built in app.py)
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False  # Don't use share=True on Spaces
)
```

### Error Recovery

```python
# Implement graceful degradation
def safe_ai_call(func):
    """
    Decorator for AI calls that falls back gracefully.
    """
    # Copilot: implement with fallback behavior
```

## Quick Reference

### Essential Imports

```python
import asyncio
import logging
import os
from functools import lru_cache
from typing import Optional, Dict, List, Any

import gradio as gr
import pandas as pd
import rdflib
from rdflib import Graph, URIRef, Literal, Namespace, RDF, RDFS
from pyshacl import validate
```

### Debugging Commands

```python
# Log graph contents
logger.debug(f"Graph has {len(graph)} triples")
logger.debug(graph.serialize(format='turtle'))

# Log validation details
conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes,
    debug=True,
    inference='rdfs'
)
```

### Common SHACL Properties

- `sh:targetClass` - Define which RDF types to validate
- `sh:path` - Property to validate
- `sh:minCount` - Minimum occurrences (1 for mandatory)
- `sh:maxCount` - Maximum occurrences (1 for non-repeatable)
- `sh:datatype` - Expected datatype
- `sh:node` - Link to another shape (valueShape)
- `sh:severity` - `sh:Violation`, `sh:Warning`, or `sh:Info`

Remember: Always test with real BIBFRAME data and verify MCP endpoints are accessible!
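
The `resolve_prefixed_uri` template under Namespace Resolution names three edge cases: no prefix, already a full URI, and unknown prefix. A minimal sketch of one way to handle them follows. Raising `ValueError` for the two failure cases and treating `http://`, `https://`, and `urn:` as "already full" are assumptions for illustration, not the project's confirmed behavior.

```python
from typing import Dict

# Subset of the namespaces listed in this document, used for the demo below.
NAMESPACES: Dict[str, str] = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "sh": "http://www.w3.org/ns/shacl#",
}


def resolve_prefixed_uri(prefixed_id: str, namespace_map: Dict[str, str]) -> str:
    """Expand identifiers such as 'bf:Work' to full URIs (illustrative sketch)."""
    value = prefixed_id.strip()
    # Edge case: already a full URI, so return it unchanged.
    if value.startswith(("http://", "https://", "urn:")):
        return value
    # Edge case: no prefix separator, so there is nothing to expand.
    if ":" not in value:
        raise ValueError(f"No prefix in identifier: {value!r}")
    prefix, local_name = value.split(":", 1)
    # Edge case: the prefix is not in the namespace map.
    if prefix not in namespace_map:
        raise ValueError(f"Unknown prefix {prefix!r} in {value!r}")
    return namespace_map[prefix] + local_name


print(resolve_prefixed_uri("bf:Work", NAMESPACES))
# → http://id.loc.gov/ontologies/bibframe/Work
```

Raising on unknown prefixes keeps typos in TSV files visible; a project that prefers to skip bad rows could return `None` and log a warning instead.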
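
The SSE Event Formatting template leaves the wire format open. The sketch below shows only the generic Server-Sent Events framing: an `event:` line, a `data:` line, and a terminating blank line. The event names `tool_result` and `error`, the extra `error` parameter, and the payload keys are assumptions made for illustration; the MCP transport actually used by the project may frame messages differently.

```python
import json
from typing import Any, Optional


def format_sse_response(tool_name: str, result: Any, error: Optional[str] = None) -> str:
    """Frame a tool result as one SSE event (illustrative sketch)."""
    failed = error is not None
    payload = {
        "tool": tool_name,
        "success": not failed,
        "result": None if failed else result,
        "error": error,
    }
    # json.dumps without indent emits a single line, so one data: field suffices.
    data = json.dumps(payload, ensure_ascii=False)
    event_type = "error" if failed else "tool_result"  # hypothetical event names
    # A blank line terminates the event in the SSE format.
    return f"event: {event_type}\ndata: {data}\n\n"


print(format_sse_response("validate_rdf", {"conforms": True}), end="")
```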
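
The `safe_ai_call` decorator under Error Recovery is described only as "falls back gracefully". One possible reading, assuming the wrapped function is an `async` function that returns a string (as `get_ai_suggestion` appears to), is sketched here. The fallback text and the simulated timeout are hypothetical.

```python
import asyncio
import functools
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical fallback text shown when the AI backend is unavailable.
FALLBACK_MESSAGE = "AI suggestions are unavailable; showing validation results only."


def safe_ai_call(func: Callable[..., Awaitable[str]]) -> Callable[..., Awaitable[str]]:
    """Wrap an async AI call so failures return a fallback instead of raising."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> str:
        try:
            return await func(*args, **kwargs)
        except Exception:
            # Log the full traceback, then degrade to the fallback message.
            logger.exception("AI call %s failed; using fallback", func.__name__)
            return FALLBACK_MESSAGE

    return wrapper


@safe_ai_call
async def get_ai_suggestion(error_context: str, rdf_snippet: str) -> str:
    # Stand-in for a real Inference API call; always fails for the demo.
    raise TimeoutError("simulated API timeout")


print(asyncio.run(get_ai_suggestion("missing bf:title", "<rdf:RDF/>")))
```

Catching `Exception` rather than `BaseException` lets task cancellation propagate, so a cancelled request is not silently turned into a fallback answer.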
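
The Batch Processing template asks for concurrent validation with asyncio, while pyshacl's `validate` is a synchronous call. A minimal sketch of bridging the two uses `asyncio.to_thread` (Python 3.9+) with a semaphore to cap concurrency. The `fake_validate` function is a hypothetical stand-in for the project's real validator, and `max_concurrency` is an assumed parameter.

```python
import asyncio
from typing import Callable, List, Tuple

ValidationResult = Tuple[bool, str]


def fake_validate(rdf_xml: str) -> ValidationResult:
    """Hypothetical stand-in for the project's synchronous SHACL validator."""
    conforms = "bf:title" in rdf_xml
    return conforms, ("ok" if conforms else "missing bf:title")


async def batch_validate_rdf(
    rdf_documents: List[str],
    validate_one: Callable[[str], ValidationResult] = fake_validate,
    max_concurrency: int = 4,
) -> List[ValidationResult]:
    """Run a blocking validator over many documents without blocking the event loop."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(doc: str) -> ValidationResult:
        async with semaphore:
            # Offload the synchronous call to a worker thread.
            return await asyncio.to_thread(validate_one, doc)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run_one(doc) for doc in rdf_documents))


results = asyncio.run(batch_validate_rdf(["<bf:title/>", "<bf:Work/>"]))
print(results)
# → [(True, 'ok'), (False, 'missing bf:title')]
```

Worker threads keep the Gradio event loop responsive, but CPU-bound validation may not run faster this way; a process pool is one alternative if throughput matters.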