GitHub Copilot Instructions for MCP4RDF Project
Project Context
This is an RDF validation tool with AI features, deployed on Hugging Face Spaces. It validates RDF/XML against SHACL shapes and uses AI to suggest fixes for validation errors.
Key Technologies
- Frontend: Gradio 5.33.0
- RDF Processing: rdflib, pyshacl
- AI Integration: Hugging Face Inference API
- Protocol: MCP (Model Context Protocol)
- Deployment: Hugging Face Spaces
Project Structure
mcp4rdf-hf-space/
├── app.py # Main Gradio application
├── validator.py # Core SHACL validation logic
├── mcp_server_gradio.py # MCP server implementation
├── MonographDCTAP/ # TSV files with SHACL definitions
├── electronic_MonographDCTAP/ # Electronic format SHACL definitions
└── requirements.txt # Python dependencies
Code Style Guidelines
Python Standards
- Use type hints for function parameters and return values
- Follow PEP 8 naming conventions
- Add docstrings for all public functions
- Use logging instead of print statements
RDF/SHACL Patterns
# Always bind common namespaces
STANDARD_NAMESPACES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "bflc": "http://id.loc.gov/ontologies/bflc/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#",
}
# Use URIRef for RDF predicates
from rdflib import URIRef, Literal, Graph
sh_path = URIRef("http://www.w3.org/ns/shacl#path")
Common Tasks and Templates
1. Adding New SHACL Validation Rules
# Template for adding a new property constraint
def add_property_constraint(shape_node, property_id, constraints):
    """
    Add SHACL property constraints to a shape.
    Args:
        shape_node: RDF node representing the shape
        property_id: Property identifier (e.g., "bf:title")
        constraints: Dict with keys like 'mandatory', 'repeatable', 'datatype'
    """
    # Copilot: implement SHACL property shape creation
2. Parsing TSV to SHACL
# When converting TSV rows to SHACL shapes, use this pattern:
def tsv_row_to_shacl(row, graph, prefixes):
    """
    Convert a TSV row to SHACL constraints.
    Expected columns: shapeID, propertyID, mandatory, repeatable, valueShape
    """
    # Copilot: handle prefix expansion and constraint mapping
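The constraint-mapping half of this task can be sketched without rdflib. The example below turns the mandatory, repeatable, and valueShape columns into SHACL constraint values held in a plain dict. The helper names (`row_to_constraints`, `_is_true`), the set of accepted truthy strings, and the choice to treat an empty repeatable cell as non-repeatable are assumptions made for illustration, not the project's actual behavior.

```python
from typing import Dict, Union

# Assumption: the TSV encodes booleans as one of these strings (case-insensitive).
TRUTHY = {"true", "yes", "y", "1"}


def _is_true(value: str) -> bool:
    """Interpret a TSV cell as a boolean flag."""
    return value.strip().lower() in TRUTHY


def row_to_constraints(row: Dict[str, str]) -> Dict[str, Union[int, str]]:
    """Map one TSV row to SHACL constraint values keyed by prefixed SHACL term."""
    constraints: Dict[str, Union[int, str]] = {"sh:path": row["propertyID"].strip()}

    # mandatory -> at least one value
    if _is_true(row.get("mandatory", "")):
        constraints["sh:minCount"] = 1

    # not repeatable -> at most one value
    # (assumption: an empty cell is treated as non-repeatable)
    if not _is_true(row.get("repeatable", "")):
        constraints["sh:maxCount"] = 1

    # valueShape -> link to another shape
    value_shape = row.get("valueShape", "").strip()
    if value_shape:
        constraints["sh:node"] = value_shape

    return constraints
```

The resulting dict still uses prefixed names; expanding them to full URIs and adding the triples to the graph would be a separate step.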
3. Error Message Formatting
# Format validation errors for user display
def format_validation_error(result):
    """
    Format pyshacl validation result for Gradio display.
    Include: severity, focus node, property path, and message
    """
    # Copilot: create user-friendly error messages with context
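A minimal sketch of the formatting step follows. It assumes the fields of a single validation result have already been extracted from the pyshacl results graph into a plain dict; the dict keys (`severity`, `focus_node`, `path`, `message`) and the function name `format_error_line` are hypothetical.

```python
from typing import Dict, Optional


def format_error_line(result: Dict[str, Optional[str]]) -> str:
    """Render one validation result as a single human-readable line."""
    raw_severity = result.get("severity") or "Violation"
    # Accept either a full IRI (…shacl#Violation) or a prefixed name (sh:Violation).
    severity = raw_severity.rsplit("#", 1)[-1].rsplit(":", 1)[-1]

    # Fall back to placeholders so the message never silently drops context.
    focus = result.get("focus_node") or "(unknown node)"
    path = result.get("path") or "(no property path)"
    message = result.get("message") or "No message provided."

    return f"[{severity}] {focus} -> {path}: {message}"
```

Keeping the formatter independent of rdflib types makes it easy to unit test and to reuse for both the Gradio display and MCP responses.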
4. AI Integration Patterns
# Template for AI API calls
async def get_ai_suggestion(error_context, rdf_snippet):
    """
    Get AI suggestions for fixing RDF validation errors.
    Uses Hugging Face Inference API with proper error handling.
    """
    # Copilot: implement with retry logic and timeout handling
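The retry and timeout handling can be sketched independently of the Hugging Face client. In the example below, `make_call` is a hypothetical zero-argument coroutine factory standing in for the real API request; the function name `call_with_retry`, its parameters, and the default values are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_retry(
    make_call: Callable[[], Awaitable[str]],
    retries: int = 3,
    timeout_s: float = 10.0,
    base_delay_s: float = 0.5,
) -> Optional[str]:
    """Await make_call() with a per-attempt timeout; return None if every attempt fails."""
    for attempt in range(retries):
        try:
            # Each attempt gets its own timeout budget.
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, OSError):
            # OSError covers ConnectionError and similar network failures.
            if attempt < retries - 1:
                # Exponential backoff: base, 2 * base, 4 * base, ...
                await asyncio.sleep(base_delay_s * (2 ** attempt))
    return None
```

Returning `None` instead of raising lets the caller decide how to degrade, for example by showing the validation report without AI suggestions.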
Debugging Helpers
SHACL Validation Issues
# Debug template for missing validations
def debug_shacl_targeting():
    """
    Log all target classes and matching nodes in the data graph.
    Helps diagnose why validations aren't triggering.
    """
    # Copilot: implement comprehensive logging of shapes and targets
Namespace Resolution
# Helper for namespace issues
def resolve_prefixed_uri(prefixed_id, namespace_map):
    """
    Resolve prefixed identifiers like 'bf:Work' to full URIs.
    Handle edge cases: no prefix, already full URI, unknown prefix
    """
    # Copilot: implement robust prefix resolution
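One possible shape for this helper is shown below, covering the three edge cases named in the docstring. Returning `None` for an identifier with no prefix or an unknown prefix is an assumption; the project may prefer to raise or to log a warning instead. The `NS` mapping is a small illustrative subset of the standard namespaces.

```python
from typing import Dict, Optional

# Illustrative subset of the namespace map.
NS = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "sh": "http://www.w3.org/ns/shacl#",
}


def resolve_prefixed_uri(prefixed_id: str, namespace_map: Dict[str, str]) -> Optional[str]:
    """Expand 'prefix:local' to a full URI, or return None if it cannot be resolved."""
    value = prefixed_id.strip()

    # Already a full URI: return unchanged.
    if value.startswith(("http://", "https://", "urn:")):
        return value

    # No prefix at all.
    if ":" not in value:
        return None

    prefix, local = value.split(":", 1)
    base = namespace_map.get(prefix)

    # Unknown prefix.
    if base is None:
        return None

    return base + local
```

Checking for a full URI first matters because `http://...` also contains a colon and would otherwise be treated as the unknown prefix `http`.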
MCP Server Implementation
Tool Registration Pattern
# MCP tool definition template
@mcp_server.tool()
async def new_mcp_tool(param1: str, param2: Optional[str] = None) -> dict:
    """
    MCP tool implementation.
    Returns: {"success": bool, "result": Any, "error": Optional[str]}
    """
    # Copilot: implement with proper error handling and logging
SSE Event Formatting
# Server-Sent Events response pattern
def format_sse_response(tool_name, result):
    """
    Format MCP tool response as SSE event.
    Include proper event type and JSON encoding.
    """
    # Copilot: implement SSE formatting with error states
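The sketch below shows generic Server-Sent Events framing: an `event:` line, a `data:` line carrying JSON, and a blank line that terminates the event. The event names (`tool_result`, `error`), the payload shape, and the function name `format_sse_event` are assumptions for illustration and are not taken from the MCP transport specification.

```python
import json
from typing import Any


def format_sse_event(tool_name: str, result: Any, is_error: bool = False) -> str:
    """Serialize a tool result as one SSE event string."""
    # Hypothetical event names; adjust to whatever the client expects.
    event_type = "error" if is_error else "tool_result"

    # json.dumps emits a single line by default, so one data: line is enough.
    payload = json.dumps({"tool": tool_name, "result": result}, ensure_ascii=False)

    # A blank line (double newline) marks the end of the event.
    return f"event: {event_type}\ndata: {payload}\n\n"
```

If the payload were ever pretty-printed across several lines, each line would need its own `data:` prefix to remain valid SSE.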
Testing Patterns
Unit Test Templates
# Test SHACL shape generation
def test_shape_generation():
    """
    Test that TSV rows correctly generate SHACL shapes.
    Include: basic properties, cardinality, value shapes
    """
    # Copilot: generate comprehensive test cases
# Test RDF validation
def test_rdf_validation():
    """
    Test validation with various RDF inputs.
    Include: valid, invalid, edge cases
    """
    # Copilot: create test data and assertions
Integration Test Patterns
# Test MCP server endpoints
async def test_mcp_endpoints():
    """
    Test all MCP tools with realistic inputs.
    Verify: response format, error handling, performance
    """
    # Copilot: implement async test scenarios
Performance Optimization
Caching Strategies
from functools import lru_cache

# Cache compiled SHACL graphs
@lru_cache(maxsize=10)
def get_compiled_shacl_graph(template_name):
    """
    Cache parsed SHACL graphs to avoid repeated parsing.
    """
    # Copilot: implement with proper cache invalidation

# Cache namespace resolutions
@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id, namespace_json):
    """
    Cache URI resolutions to improve performance.
    """
    # Copilot: implement with hashable inputs
Batch Processing
# Process multiple RDF documents efficiently
async def batch_validate_rdf(rdf_documents: List[str]):
    """
    Validate multiple RDF documents in parallel.
    Use asyncio for concurrent processing.
    """
    # Copilot: implement with progress tracking
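A minimal sketch of concurrent batch validation with progress tracking follows. `validate_one` is a hypothetical blocking callable standing in for the real validator, and the function name `validate_many`, the `on_progress` callback, and the concurrency limit are assumptions. The example relies on `asyncio.to_thread`, which requires Python 3.9 or later.

```python
import asyncio
from typing import Callable, List, Optional, Tuple


async def validate_many(
    rdf_documents: List[str],
    validate_one: Callable[[str], bool],
    max_concurrency: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[Tuple[int, bool]]:
    """Validate documents concurrently; return (index, conforms) pairs in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)
    total = len(rdf_documents)
    done = 0

    async def run(index: int, doc: str) -> Tuple[int, bool]:
        nonlocal done
        async with semaphore:
            # validate_one is blocking, so run it in a worker thread.
            ok = await asyncio.to_thread(validate_one, doc)
        # Back on the event loop thread, so updating the counter is safe.
        done += 1
        if on_progress is not None:
            on_progress(done, total)
        return index, ok

    # gather preserves the order of the awaitables it was given.
    return await asyncio.gather(*(run(i, d) for i, d in enumerate(rdf_documents)))
```

The semaphore caps how many validations run at once, which helps keep memory use predictable on a shared host.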
Common Pitfalls to Avoid
- Namespace Conflicts: Always use override=True when binding namespaces
- Graph Parsing: Specify format explicitly, don't rely on auto-detection
- SPARQL Queries: Escape special characters in URIs
- Async/Await: Don't call blocking synchronous code directly from async functions; run it in a thread or executor instead
- Error Messages: Always include context for debugging
Gradio UI Enhancements
Adding New UI Components
# Template for new Gradio components
def create_validation_interface():
    """
    Create Gradio interface with:
    - File upload for RDF
    - Template selection
    - Real-time validation
    - Export functionality
    """
    # Copilot: implement with proper event handlers
Custom CSS/Theming
# Apply custom styling to Gradio components
custom_css = """
.validation-error { color: red; font-weight: bold; }
.validation-warning { color: orange; }
.validation-info { color: blue; }
"""
# Copilot: suggest CSS for better UX
Deployment Considerations
Hugging Face Spaces Configuration
# Environment variable handling
import os

HF_API_KEY = os.environ.get("HF_API_KEY")
if not HF_API_KEY:
    logger.warning("HF_API_KEY not set, AI features disabled")
# Gradio launch configuration for Spaces
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False,  # Don't use share=True on Spaces
)
Error Recovery
# Implement graceful degradation
def safe_ai_call(func):
    """
    Decorator for AI calls that falls back gracefully.
    """
    # Copilot: implement with fallback behavior
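One way such a decorator could look is sketched below. It assumes the wrapped function is a coroutine and that a fixed fallback string is an acceptable degraded result; the `FALLBACK_MESSAGE` constant and its wording are hypothetical.

```python
import asyncio
import functools
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical message shown when the AI backend is unavailable.
FALLBACK_MESSAGE = "AI suggestions are currently unavailable."


def safe_ai_call(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    """Wrap an async AI call so that any failure returns a fallback value."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return await func(*args, **kwargs)
        except Exception:
            # Log the full traceback, then degrade instead of propagating.
            logger.exception("AI call %s failed; using fallback", func.__name__)
            return FALLBACK_MESSAGE

    return wrapper
```

Because the exception is logged with its traceback before the fallback is returned, failures stay visible in the Space logs while the validation UI keeps working.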
Quick Reference
Essential Imports
import gradio as gr
import rdflib
from rdflib import Graph, URIRef, Literal, Namespace, RDF, RDFS
from pyshacl import validate
import pandas as pd
import logging
import asyncio
from typing import Optional, Dict, List, Any
Debugging Commands
# Log graph contents
logger.debug(f"Graph has {len(graph)} triples")
logger.debug(graph.serialize(format='turtle'))
# Log validation details
conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes,
    debug=True,
    inference='rdfs',
)
Common SHACL Properties
- sh:targetClass - Define which RDF types to validate
- sh:path - Property to validate
- sh:minCount - Minimum occurrences (1 for mandatory)
- sh:maxCount - Maximum occurrences (1 for non-repeatable)
- sh:datatype - Expected datatype
- sh:node - Link to another shape (valueShape)
- sh:severity - sh:Violation, sh:Warning, or sh:Info
Remember: Always test with real BIBFRAME data and verify MCP endpoints are accessible!