# GitHub Copilot Instructions for MCP4RDF Project

## Project Context

This is an RDF validation tool with AI features, deployed on Hugging Face Spaces. It validates RDF/XML against SHACL shapes and provides AI-powered suggestions for fixing validation errors.

### Key Technologies

- **Frontend**: Gradio 5.33.0
- **RDF Processing**: rdflib, pyshacl
- **AI Integration**: Hugging Face Inference API
- **Protocol**: MCP (Model Context Protocol)
- **Deployment**: Hugging Face Spaces

### Project Structure

```
mcp4rdf-hf-space/
├── app.py                      # Main Gradio application
├── validator.py                # Core SHACL validation logic
├── mcp_server_gradio.py        # MCP server implementation
├── MonographDCTAP/             # TSV files with SHACL definitions
├── electronic_MonographDCTAP/  # Electronic format SHACL definitions
└── requirements.txt            # Python dependencies
```

## Code Style Guidelines

### Python Standards

- Use type hints for function parameters and return values
- Follow PEP 8 naming conventions
- Add docstrings for all public functions
- Use logging instead of print statements

### RDF/SHACL Patterns

```python
from rdflib import URIRef, Literal, Graph

# Always bind common namespaces
STANDARD_NAMESPACES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "bflc": "http://id.loc.gov/ontologies/bflc/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#"
}

# Use URIRef for RDF predicates
sh_path = URIRef("http://www.w3.org/ns/shacl#path")
```

## Common Tasks and Templates

### 1. Adding New SHACL Validation Rules

```python
# Template for adding a new property constraint
def add_property_constraint(shape_node, property_id, constraints):
    """
    Add SHACL property constraints to a shape.

    Args:
        shape_node: RDF node representing the shape
        property_id: Property identifier (e.g., "bf:title")
        constraints: Dict with keys like 'mandatory', 'repeatable', 'datatype'
    """
    # Copilot: implement SHACL property shape creation
```
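
As a rough illustration of how the `constraints` dict could map onto SHACL terms, the sketch below renders a property shape as a Turtle fragment using only the standard library. The helper name `build_property_shape_turtle` is hypothetical, and it assumes `property_id` and `datatype` are already valid prefixed names; the project's own implementation may instead add triples to an rdflib graph.

```python
from typing import Dict, Union


def build_property_shape_turtle(
    property_id: str, constraints: Dict[str, Union[bool, str]]
) -> str:
    """Render a SHACL property shape as a Turtle fragment (illustrative only)."""
    lines = [f"sh:path {property_id}"]
    # mandatory -> at least one value
    if constraints.get("mandatory"):
        lines.append("sh:minCount 1")
    # explicitly non-repeatable -> at most one value
    if constraints.get("repeatable") is False:
        lines.append("sh:maxCount 1")
    datatype = constraints.get("datatype")
    if datatype:
        lines.append(f"sh:datatype {datatype}")
    body = " ;\n        ".join(lines)
    return f"sh:property [\n        {body}\n    ] ;"


print(build_property_shape_turtle(
    "bf:title", {"mandatory": True, "repeatable": False, "datatype": "xsd:string"}
))
```

A missing `repeatable` key is treated as "unspecified" here, so no `sh:maxCount` is emitted; that is a design assumption, not something the templates above dictate.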

### 2. Parsing TSV to SHACL

```python
# When converting TSV rows to SHACL shapes, use this pattern:
def tsv_row_to_shacl(row, graph, prefixes):
    """
    Convert a TSV row to SHACL constraints.
    Expected columns: shapeID, propertyID, mandatory, repeatable, valueShape
    """
    # Copilot: handle prefix expansion and constraint mapping
```
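
The constraint-mapping half of this task can be sketched without rdflib. The example below reads TSV text with the columns listed above and produces plain dicts keyed by SHACL term names. The function names, the accepted flag spellings (`TRUE`/`FALSE`, `yes`/`no`, `1`/`0`), and the sample shape identifiers are assumptions for illustration; the real TSV files may encode flags differently.

```python
import csv
import io
from typing import Dict, List, Optional


def parse_flag(value: Optional[str]) -> Optional[bool]:
    """Interpret a TSV flag cell; blank or unrecognized cells mean 'unspecified'."""
    text = (value or "").strip().lower()
    if text in {"true", "yes", "1"}:
        return True
    if text in {"false", "no", "0"}:
        return False
    return None


def tsv_rows_to_constraints(tsv_text: str) -> List[Dict[str, object]]:
    """Map each TSV row to a dict of SHACL-style constraints."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    constraints = []
    for row in reader:
        entry: Dict[str, object] = {
            "shape": row["shapeID"].strip(),
            "path": row["propertyID"].strip(),
        }
        if parse_flag(row.get("mandatory")) is True:
            entry["minCount"] = 1  # mandatory -> sh:minCount 1
        if parse_flag(row.get("repeatable")) is False:
            entry["maxCount"] = 1  # non-repeatable -> sh:maxCount 1
        value_shape = (row.get("valueShape") or "").strip()
        if value_shape:
            entry["node"] = value_shape  # valueShape -> sh:node
        constraints.append(entry)
    return constraints
```

Prefix expansion is deliberately left out; it would be applied to `shape`, `path`, and `node` before the values are written into a graph.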

### 3. Error Message Formatting

```python
# Format validation errors for user display
def format_validation_error(result):
    """
    Format pyshacl validation result for Gradio display.
    Include: severity, focus node, property path, and message
    """
    # Copilot: create user-friendly error messages with context
```
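
One possible shape for this formatter is sketched below. It assumes each validation result has already been extracted from the pyshacl results graph into a plain dict with the hypothetical keys `severity`, `focus_node`, `path`, and `message`; that extraction step is not shown. A missing severity is treated as `sh:Violation`, which is the SHACL default.

```python
from typing import Mapping, Optional

# Display labels for the three SHACL severities
SEVERITY_LABELS = {"Violation": "ERROR", "Warning": "WARNING", "Info": "INFO"}


def format_validation_error(result: Mapping[str, Optional[str]]) -> str:
    """Build a one-line, user-facing message from an extracted result dict."""
    severity_uri = result.get("severity") or "http://www.w3.org/ns/shacl#Violation"
    local_name = severity_uri.rsplit("#", 1)[-1]
    label = SEVERITY_LABELS.get(local_name, local_name.upper())
    focus = result.get("focus_node") or "(unknown node)"
    path = result.get("path") or "(no property path)"
    message = result.get("message") or "No message provided."
    return f"[{label}] {focus} / {path}: {message}"
```

Keeping the placeholders explicit ("(unknown node)") makes it obvious in the UI when a result lacks context, rather than silently dropping the field.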

### 4. AI Integration Patterns

```python
# Template for AI API calls
async def get_ai_suggestion(error_context, rdf_snippet):
    """
    Get AI suggestions for fixing RDF validation errors.
    Uses Hugging Face Inference API with proper error handling.
    """
    # Copilot: implement with retry logic and timeout handling
```
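
The retry and timeout handling can be kept separate from the API client itself. The sketch below wraps any coroutine factory with a per-attempt timeout and exponential backoff. `call_with_retry` and its parameters are hypothetical names, and the actual Hugging Face client call is assumed to be supplied by the caller as `make_call`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_retry(
    make_call: Callable[[], Awaitable[str]],
    *,
    attempts: int = 3,
    timeout: float = 10.0,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Await make_call() with a timeout, retrying transient failures.

    Returns None when every attempt fails, so callers can fall back gracefully.
    """
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            # ConnectionError is a subclass of OSError, so network drops land here
            if attempt == attempts - 1:
                return None
            await asyncio.sleep(base_delay * 2 ** attempt)
    return None
```

Only timeouts and OS-level network errors are retried here; errors that indicate a bad request would propagate to the caller, which seems preferable to retrying a call that cannot succeed.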

## Debugging Helpers

### SHACL Validation Issues

```python
# Debug template for missing validations
def debug_shacl_targeting():
    """
    Log all target classes and matching nodes in the data graph.
    Helps diagnose why validations aren't triggering.
    """
    # Copilot: implement comprehensive logging of shapes and targets
```

### Namespace Resolution

```python
# Helper for namespace issues
def resolve_prefixed_uri(prefixed_id, namespace_map):
    """
    Resolve prefixed identifiers like 'bf:Work' to full URIs.
    Handle edge cases: no prefix, already full URI, unknown prefix
    """
    # Copilot: implement robust prefix resolution
```
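
A minimal sketch of the resolver, covering the edge cases named in the docstring, might look like the following. Raising `ValueError` for a missing or unknown prefix is an assumption; returning the input unchanged or logging a warning would be equally defensible depending on how strict the TSV parsing should be.

```python
from typing import Mapping


def resolve_prefixed_uri(prefixed_id: str, namespace_map: Mapping[str, str]) -> str:
    """Expand 'prefix:local' to a full URI using namespace_map."""
    value = prefixed_id.strip()
    # Already a full URI written in angle brackets
    if value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    # Already a full URI (scheme://...) or a URN
    if "://" in value or value.startswith("urn:"):
        return value
    if ":" not in value:
        raise ValueError(f"No prefix in identifier {value!r}")
    prefix, local = value.split(":", 1)
    if prefix not in namespace_map:
        raise ValueError(f"Unknown prefix {prefix!r} in identifier {value!r}")
    return namespace_map[prefix] + local
```

For example, with `{"bf": "http://id.loc.gov/ontologies/bibframe/"}` as the map, `"bf:Work"` expands to `http://id.loc.gov/ontologies/bibframe/Work`.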

## MCP Server Implementation

### Tool Registration Pattern

```python
# MCP tool definition template
# `mcp_server` is assumed to be the MCP server instance created elsewhere in the project
from typing import Optional

@mcp_server.tool()
async def new_mcp_tool(param1: str, param2: Optional[str] = None) -> dict:
    """
    MCP tool implementation.
    Returns: {"success": bool, "result": Any, "error": Optional[str]}
    """
    # Copilot: implement with proper error handling and logging
```

### SSE Event Formatting

```python
# Server-Sent Events response pattern
def format_sse_response(tool_name, result):
    """
    Format MCP tool response as SSE event.
    Include proper event type and JSON encoding.
    """
    # Copilot: implement SSE formatting with error states
```
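
For orientation, a Server-Sent Events frame is a block of `field: value` lines terminated by a blank line. The sketch below emits an `event:` line and a single `data:` line carrying JSON. The event names `result` and `error`, the extra `error` parameter, and the payload layout (which mirrors the tool return envelope above) are assumptions for illustration, not the format mandated by the MCP transport.

```python
import json
from typing import Any, Optional


def format_sse_response(tool_name: str, result: Any, error: Optional[str] = None) -> str:
    """Serialize a tool outcome as one SSE frame."""
    failed = error is not None
    payload = {
        "tool": tool_name,
        "success": not failed,
        "result": None if failed else result,
        "error": error,
    }
    # json.dumps escapes newlines inside strings, so the data stays on one line
    data = json.dumps(payload, ensure_ascii=False)
    event_type = "error" if failed else "result"
    # The trailing blank line terminates the SSE frame
    return f"event: {event_type}\ndata: {data}\n\n"
```

Keeping the JSON on a single `data:` line avoids having to split multi-line payloads across several `data:` fields.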

## Testing Patterns

### Unit Test Templates

```python
# Test SHACL shape generation
def test_shape_generation():
    """
    Test that TSV rows correctly generate SHACL shapes.
    Include: basic properties, cardinality, value shapes
    """
    # Copilot: generate comprehensive test cases


# Test RDF validation
def test_rdf_validation():
    """
    Test validation with various RDF inputs.
    Include: valid, invalid, edge cases
    """
    # Copilot: create test data and assertions
```

### Integration Test Patterns

```python
# Test MCP server endpoints
async def test_mcp_endpoints():
    """
    Test all MCP tools with realistic inputs.
    Verify: response format, error handling, performance
    """
    # Copilot: implement async test scenarios
```

## Performance Optimization

### Caching Strategies

```python
from functools import lru_cache

# Cache compiled SHACL graphs
@lru_cache(maxsize=10)
def get_compiled_shacl_graph(template_name):
    """
    Cache parsed SHACL graphs to avoid repeated parsing.
    """
    # Copilot: implement with proper cache invalidation


# Cache namespace resolutions
@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id, namespace_json):
    """
    Cache URI resolutions to improve performance.
    """
    # Copilot: implement with hashable inputs
```
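
`lru_cache` requires every argument to be hashable, which is why the template takes `namespace_json` (a string) rather than a dict. A sketch of that idea follows; the `namespace_key` helper is a hypothetical addition that produces a stable JSON string for a namespace map.

```python
import json
from functools import lru_cache
from typing import Dict


def namespace_key(namespace_map: Dict[str, str]) -> str:
    """Serialize a namespace map to a stable, hashable cache key."""
    return json.dumps(namespace_map, sort_keys=True)


@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id: str, namespace_json: str) -> str:
    """Expand 'prefix:local' using a JSON-encoded namespace map.

    Raises KeyError for an unknown prefix; failed calls are not cached.
    """
    namespaces = json.loads(namespace_json)
    prefix, _, local = prefixed_id.partition(":")
    return namespaces[prefix] + local


key = namespace_key({"bf": "http://id.loc.gov/ontologies/bibframe/"})
cached_uri_resolution("bf:Work", key)   # computed
cached_uri_resolution("bf:Work", key)   # served from the cache
```

Sorting the keys matters: two dicts with the same content but different insertion order would otherwise produce different cache keys. When namespaces change at runtime, `cached_uri_resolution.cache_clear()` resets the cache.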

### Batch Processing

```python
from typing import List

# Process multiple RDF documents efficiently
async def batch_validate_rdf(rdf_documents: List[str]):
    """
    Validate multiple RDF documents in parallel.
    Use asyncio for concurrent processing.
    """
    # Copilot: implement with progress tracking
```
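
Because SHACL validation is blocking work, one way to run a batch from async code is to push each document into a worker thread and cap concurrency with a semaphore. In the sketch below the validator is injected as `validate_one`, a hypothetical synchronous callable returning `(conforms, report)`; that parameter and `max_concurrency` are additions for illustration, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from typing import Callable, List, Tuple


async def batch_validate_rdf(
    rdf_documents: List[str],
    validate_one: Callable[[str], Tuple[bool, str]],
    max_concurrency: int = 4,
) -> List[Tuple[bool, str]]:
    """Validate documents concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(index: int, document: str) -> Tuple[bool, str]:
        async with semaphore:
            try:
                # Run the blocking validator off the event loop
                return await asyncio.to_thread(validate_one, document)
            except Exception as exc:
                # One bad document should not abort the whole batch
                return False, f"Document {index} failed: {exc}"

    tasks = (run(i, doc) for i, doc in enumerate(rdf_documents))
    return list(await asyncio.gather(*tasks))
```

Threads keep the event loop responsive but do not give true CPU parallelism for pure-Python work, so the practical gain depends on how much time the validator spends in I/O or native code.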
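
## Common Pitfalls to Avoid

1. **Namespace Conflicts**: Always pass `override=True` when binding namespaces (e.g. `graph.bind("bf", BF, override=True)`) so your prefix replaces any prefix already bound to that namespace
2. **Graph Parsing**: Specify the format explicitly (e.g. `format="xml"` for RDF/XML); don't rely on auto-detection
3. **SPARQL Queries**: Escape special characters in URIs before interpolating them into query strings
4. **Async/Await**: Don't mix synchronous and asynchronous code; a blocking call such as `validate()` inside an `async def` stalls the event loop, so offload it (e.g. with `asyncio.to_thread`)
5. **Error Messages**: Always include context for debugging (focus node, property path, template name)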

## Gradio UI Enhancements

### Adding New UI Components

```python
# Template for new Gradio components
def create_validation_interface():
    """
    Create Gradio interface with:
    - File upload for RDF
    - Template selection
    - Real-time validation
    - Export functionality
    """
    # Copilot: implement with proper event handlers
```

### Custom CSS/Theming

```python
# Apply custom styling to Gradio components
custom_css = """
.validation-error { color: red; font-weight: bold; }
.validation-warning { color: orange; }
.validation-info { color: blue; }
"""
# Copilot: suggest CSS for better UX
```

## Deployment Considerations

### Hugging Face Spaces Configuration

```python
import logging
import os

logger = logging.getLogger(__name__)

# Environment variable handling
HF_API_KEY = os.environ.get("HF_API_KEY")
if not HF_API_KEY:
    logger.warning("HF_API_KEY not set, AI features disabled")

# Gradio launch configuration for Spaces
# (`demo` is assumed to be the Gradio app object built earlier in app.py)
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False  # Don't use share=True on Spaces
)
```

### Error Recovery

```python
# Implement graceful degradation
def safe_ai_call(func):
    """
    Decorator for AI calls that falls back gracefully.
    """
    # Copilot: implement with fallback behavior
```
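
A minimal sketch of such a decorator, assuming the wrapped AI functions are coroutines (as in the `get_ai_suggestion` template) and that a fixed fallback string is acceptable to show in the UI. `FALLBACK_MESSAGE` is a hypothetical constant.

```python
import asyncio
import functools
import logging

logger = logging.getLogger(__name__)

# Hypothetical text shown to the user when the AI backend is unavailable
FALLBACK_MESSAGE = "AI suggestions are currently unavailable."


def safe_ai_call(func):
    """Decorator for async AI calls: log the failure and return a fallback."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception:
            # Record the full traceback, but keep validation usable
            logger.exception("AI call %s failed; using fallback", func.__name__)
            return FALLBACK_MESSAGE

    return wrapper


@safe_ai_call
async def suggest_fix(error_text: str) -> str:
    raise RuntimeError("inference API unreachable")  # simulate an outage


print(asyncio.run(suggest_fix("missing bf:title")))
```

With this in place, core validation results can still be displayed when the AI backend is down, and the log keeps the traceback for debugging.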

## Quick Reference

### Essential Imports

```python
import gradio as gr
import rdflib
from rdflib import Graph, URIRef, Literal, Namespace, RDF, RDFS
from pyshacl import validate
import pandas as pd
import logging
import asyncio
from typing import Optional, Dict, List, Any
```

### Debugging Commands

```python
# Log graph contents
logger.debug(f"Graph has {len(graph)} triples")
logger.debug(graph.serialize(format='turtle'))

# Log validation details
conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes,
    debug=True,
    inference='rdfs'
)
```

### Common SHACL Properties

- `sh:targetClass` - Define which RDF types to validate
- `sh:path` - Property to validate
- `sh:minCount` - Minimum occurrences (1 for mandatory)
- `sh:maxCount` - Maximum occurrences (1 for non-repeatable)
- `sh:datatype` - Expected datatype
- `sh:node` - Link to another shape (valueShape)
- `sh:severity` - `sh:Violation`, `sh:Warning`, or `sh:Info`

Remember: Always test with real BIBFRAME data and verify MCP endpoints are accessible!