# GitHub Copilot Instructions for MCP4RDF Project
## Project Context
This is an RDF validation tool with AI features, deployed on Hugging Face Spaces. It validates RDF/XML against SHACL schemas and provides AI-powered suggestions for fixing validation errors.
### Key Technologies
- **Frontend**: Gradio 5.33.0
- **RDF Processing**: rdflib, pyshacl
- **AI Integration**: Hugging Face Inference API
- **Protocol**: MCP (Model Context Protocol)
- **Deployment**: Hugging Face Spaces
### Project Structure
```
mcp4rdf-hf-space/
├── app.py                      # Main Gradio application
├── validator.py                # Core SHACL validation logic
├── mcp_server_gradio.py        # MCP server implementation
├── MonographDCTAP/             # TSV files with SHACL definitions
├── electronic_MonographDCTAP/  # Electronic format SHACL definitions
└── requirements.txt            # Python dependencies
```
## Code Style Guidelines
### Python Standards
- Use type hints for function parameters and return values
- Follow PEP 8 naming conventions
- Add docstrings for all public functions
- Use logging instead of print statements
### RDF/SHACL Patterns
```python
# Always bind common namespaces
STANDARD_NAMESPACES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "bflc": "http://id.loc.gov/ontologies/bflc/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#"
}

# Use URIRef for RDF predicates
from rdflib import URIRef, Literal, Graph
sh_path = URIRef("http://www.w3.org/ns/shacl#path")
```
## Common Tasks and Templates
### 1. Adding New SHACL Validation Rules
```python
# Template for adding a new property constraint
def add_property_constraint(shape_node, property_id, constraints):
    """
    Add SHACL property constraints to a shape.
    Args:
        shape_node: RDF node representing the shape
        property_id: Property identifier (e.g., "bf:title")
        constraints: Dict with keys like 'mandatory', 'repeatable', 'datatype'
    """
    # Copilot: implement SHACL property shape creation
```
### 2. Parsing TSV to SHACL
```python
# When converting TSV rows to SHACL shapes, use this pattern:
def tsv_row_to_shacl(row, graph, prefixes):
    """
    Convert a TSV row to SHACL constraints.
    Expected columns: shapeID, propertyID, mandatory, repeatable, valueShape
    """
    # Copilot: handle prefix expansion and constraint mapping
```
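
The cardinality part of this mapping can be sketched without rdflib. The example below is a minimal sketch, assuming `mandatory` and `repeatable` are stored as text flags such as `true`/`false` (the actual TSV encoding is not specified here). `row_to_constraints`, `expand_prefix`, and the sample shape names are hypothetical. It returns plain Python values keyed by SHACL property names rather than adding triples to a graph.

```python
import csv
import io
from typing import Dict

TRUE_VALUES = {"true", "yes", "1"}  # assumed spellings of a "set" flag


def expand_prefix(prefixed_id: str, prefixes: Dict[str, str]) -> str:
    """Expand 'bf:title' to a full URI; raises KeyError on an unknown prefix."""
    prefix, _, local = prefixed_id.partition(":")
    return prefixes[prefix] + local


def row_to_constraints(row: Dict[str, str], prefixes: Dict[str, str]) -> Dict[str, object]:
    """Map one TSV row to SHACL-style constraints (hypothetical helper)."""
    constraints: Dict[str, object] = {
        "sh:path": expand_prefix(row["propertyID"], prefixes)
    }
    # mandatory -> sh:minCount 1
    if row.get("mandatory", "").strip().lower() in TRUE_VALUES:
        constraints["sh:minCount"] = 1
    # not repeatable (or flag left empty, by assumption) -> sh:maxCount 1
    if row.get("repeatable", "").strip().lower() not in TRUE_VALUES:
        constraints["sh:maxCount"] = 1
    # valueShape -> sh:node, kept as the raw identifier from the TSV
    value_shape = row.get("valueShape", "").strip()
    if value_shape:
        constraints["sh:node"] = value_shape
    return constraints


# Made-up sample data for illustration only
SAMPLE_TSV = (
    "shapeID\tpropertyID\tmandatory\trepeatable\tvalueShape\n"
    "ex:WorkShape\tbf:title\ttrue\tfalse\tex:TitleShape\n"
)
PREFIXES = {"bf": "http://id.loc.gov/ontologies/bibframe/"}

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
result = row_to_constraints(rows[0], PREFIXES)
```

In a real implementation each key/value pair would become a triple on a property shape node in the SHACL graph.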
### 3. Error Message Formatting
```python
# Format validation errors for user display
def format_validation_error(result):
    """
    Format pyshacl validation result for Gradio display.
    Include: severity, focus node, property path, and message
    """
    # Copilot: create user-friendly error messages with context
```
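
One possible shape for this formatter is sketched below. It assumes each validation result has already been extracted from the results graph into a plain dict with the hypothetical keys `severity`, `focus_node`, `path`, and `message`; the extraction step itself is not shown.

```python
from typing import Dict

# Map SHACL severity IRIs to short display labels
SEVERITY_LABELS = {
    "http://www.w3.org/ns/shacl#Violation": "ERROR",
    "http://www.w3.org/ns/shacl#Warning": "WARNING",
    "http://www.w3.org/ns/shacl#Info": "INFO",
}


def format_validation_error(result: Dict[str, str]) -> str:
    """Build a one-line, user-facing message from an extracted result dict."""
    # Unknown or missing severities are shown as errors (an assumption)
    label = SEVERITY_LABELS.get(result.get("severity", ""), "ERROR")
    focus = result.get("focus_node", "(unknown node)")
    path = result.get("path", "(no property path)")
    message = result.get("message", "No message provided")
    return f"[{label}] {focus} | {path}: {message}"


example = format_validation_error({
    "severity": "http://www.w3.org/ns/shacl#Warning",
    "focus_node": "http://example.org/work1",
    "path": "bf:title",
    "message": "Less than 1 value",
})
```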
### 4. AI Integration Patterns
```python
# Template for AI API calls
async def get_ai_suggestion(error_context, rdf_snippet):
    """
    Get AI suggestions for fixing RDF validation errors.
    Uses Hugging Face Inference API with proper error handling.
    """
    # Copilot: implement with retry logic and timeout handling
```
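
The retry and timeout handling can be kept separate from the API client. The sketch below is one way to do it with the standard library only; `call_with_retry` and its parameters are hypothetical, and `make_call` stands in for whatever coroutine actually contacts the inference API.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_retry(
    make_call: Callable[[], Awaitable[str]],
    retries: int = 3,
    timeout: float = 10.0,
    base_delay: float = 0.5,
) -> Optional[str]:
    """Await make_call() with a per-attempt timeout and exponential backoff.

    Returns None when every attempt fails, so callers can degrade gracefully.
    """
    for attempt in range(retries):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, OSError):
            # OSError covers ConnectionError and similar network failures
            if attempt < retries - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    return None
```

Returning `None` instead of raising lets the UI show the validation report even when no suggestion is available.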
## Debugging Helpers
### SHACL Validation Issues
```python
# Debug template for missing validations
def debug_shacl_targeting():
    """
    Log all target classes and matching nodes in the data graph.
    Helps diagnose why validations aren't triggering.
    """
    # Copilot: implement comprehensive logging of shapes and targets
```
### Namespace Resolution
```python
# Helper for namespace issues
def resolve_prefixed_uri(prefixed_id, namespace_map):
    """
    Resolve prefixed identifiers like 'bf:Work' to full URIs.
    Handle edge cases: no prefix, already full URI, unknown prefix
    """
    # Copilot: implement robust prefix resolution
```
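
A minimal sketch of the edge-case handling follows. Raising `ValueError` for a missing or unknown prefix is one possible policy, not necessarily the project's; returning the input unchanged or logging a warning would be equally valid choices.

```python
from typing import Dict


def resolve_prefixed_uri(prefixed_id: str, namespace_map: Dict[str, str]) -> str:
    """Resolve 'bf:Work' to a full URI using namespace_map."""
    value = prefixed_id.strip()
    # Angle-bracketed IRI, e.g. '<http://example.org/x>'
    if value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    # Already a full URI: return unchanged
    if value.startswith(("http://", "https://", "urn:")):
        return value
    # No prefix at all
    if ":" not in value:
        raise ValueError(f"No prefix in identifier: {value!r}")
    prefix, local = value.split(":", 1)
    # Unknown prefix
    if prefix not in namespace_map:
        raise ValueError(f"Unknown prefix {prefix!r} in {value!r}")
    return namespace_map[prefix] + local


NAMESPACES = {"bf": "http://id.loc.gov/ontologies/bibframe/"}
work_uri = resolve_prefixed_uri("bf:Work", NAMESPACES)
```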
## MCP Server Implementation
### Tool Registration Pattern
```python
# MCP tool definition template
@mcp_server.tool()
async def new_mcp_tool(param1: str, param2: Optional[str] = None) -> dict:
    """
    MCP tool implementation.
    Returns: {"success": bool, "result": Any, "error": Optional[str]}
    """
    # Copilot: implement with proper error handling and logging
```
### SSE Event Formatting
```python
# Server-Sent Events response pattern
def format_sse_response(tool_name, result):
    """
    Format MCP tool response as SSE event.
    Include proper event type and JSON encoding.
    """
    # Copilot: implement SSE formatting with error states
```
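
As a rough illustration, an SSE event is a block of `field: value` lines terminated by a blank line. The sketch below assumes the event names `tool_result` and `error` and adds a hypothetical `error` parameter; neither is taken from the MCP server code.

```python
import json
from typing import Any, Optional


def format_sse_response(tool_name: str, result: Any, error: Optional[str] = None) -> str:
    """Serialize a tool response as a single Server-Sent Event."""
    failed = error is not None
    payload = {
        "tool": tool_name,
        "success": not failed,
        "result": result,
        "error": error,
    }
    # json.dumps without indent emits one line, so one data: field is enough
    data = json.dumps(payload, ensure_ascii=False)
    event_type = "error" if failed else "tool_result"
    # The trailing blank line marks the end of the event
    return f"event: {event_type}\ndata: {data}\n\n"


ok_event = format_sse_response("validate_rdf", {"conforms": True})
```

The payload mirrors the `success` / `result` / `error` envelope used by the tool template above.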
## Testing Patterns
### Unit Test Templates
```python
# Test SHACL shape generation
def test_shape_generation():
    """
    Test that TSV rows correctly generate SHACL shapes.
    Include: basic properties, cardinality, value shapes
    """
    # Copilot: generate comprehensive test cases

# Test RDF validation
def test_rdf_validation():
    """
    Test validation with various RDF inputs.
    Include: valid, invalid, edge cases
    """
    # Copilot: create test data and assertions
```
### Integration Test Patterns
```python
# Test MCP server endpoints
async def test_mcp_endpoints():
    """
    Test all MCP tools with realistic inputs.
    Verify: response format, error handling, performance
    """
    # Copilot: implement async test scenarios
```
## Performance Optimization
### Caching Strategies
```python
from functools import lru_cache

# Cache compiled SHACL graphs
@lru_cache(maxsize=10)
def get_compiled_shacl_graph(template_name):
    """
    Cache parsed SHACL graphs to avoid repeated parsing.
    """
    # Copilot: implement with proper cache invalidation

# Cache namespace resolutions
@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id, namespace_json):
    """
    Cache URI resolutions to improve performance.
    """
    # Copilot: implement with hashable inputs
```
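
`lru_cache` requires hashable arguments, which is why the namespace map is passed as a JSON string rather than a dict. A minimal sketch of that idea, with `freeze_namespaces` as a hypothetical helper:

```python
import json
from functools import lru_cache
from typing import Dict


def freeze_namespaces(namespace_map: Dict[str, str]) -> str:
    """Serialize the map deterministically so equal maps give equal cache keys."""
    return json.dumps(namespace_map, sort_keys=True)


@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id: str, namespace_json: str) -> str:
    """Resolve a prefixed id; both arguments are strings and therefore hashable."""
    namespace_map = json.loads(namespace_json)
    prefix, _, local = prefixed_id.partition(":")
    return namespace_map[prefix] + local


ns_json = freeze_namespaces({"bf": "http://id.loc.gov/ontologies/bibframe/"})
first = cached_uri_resolution("bf:Work", ns_json)   # computed
second = cached_uri_resolution("bf:Work", ns_json)  # served from the cache
```

Calling `cached_uri_resolution.cache_clear()` is the simplest form of invalidation, for example after the namespace definitions are reloaded.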
### Batch Processing
```python
# Process multiple RDF documents efficiently
async def batch_validate_rdf(rdf_documents: List[str]):
    """
    Validate multiple RDF documents in parallel.
    Use asyncio for concurrent processing.
    """
    # Copilot: implement with progress tracking
```
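
One way to structure this is sketched below, assuming validation is a blocking call that can be pushed to a worker thread with `asyncio.to_thread` (Python 3.9+). The `validate_one`, `max_concurrency`, and `on_progress` parameters are hypothetical additions to the template's signature. Threads keep the event loop responsive, but they may not speed up CPU-bound validation.

```python
import asyncio
from typing import Callable, List, Optional


async def batch_validate_rdf(
    rdf_documents: List[str],
    validate_one: Callable[[str], bool],
    max_concurrency: int = 4,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[bool]:
    """Validate documents concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)
    total = len(rdf_documents)
    completed = 0

    async def run_one(document: str) -> bool:
        nonlocal completed
        # Limit how many blocking validations run at the same time
        async with semaphore:
            result = await asyncio.to_thread(validate_one, document)
        completed += 1
        if on_progress is not None:
            on_progress(completed, total)
        return result

    return list(await asyncio.gather(*(run_one(doc) for doc in rdf_documents)))
```

In a Gradio app, `on_progress` could forward the counts to a progress indicator.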
## Common Pitfalls to Avoid
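1. **Namespace Conflicts**: Bind namespaces explicitly with `graph.bind(prefix, uri, override=True)` so a stale or auto-generated prefix does not take precedence
2. **Graph Parsing**: Specify `format` explicitly when parsing (for example `format="xml"` for RDF/XML); don't rely on auto-detection
3. **SPARQL Queries**: Escape special characters in URIs
4. **Async/Await**: Don't call blocking code directly from async functions; run it in a thread or executor instead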
5. **Error Messages**: Always include context for debugging
## Gradio UI Enhancements
### Adding New UI Components
```python
# Template for new Gradio components
def create_validation_interface():
    """
    Create Gradio interface with:
    - File upload for RDF
    - Template selection
    - Real-time validation
    - Export functionality
    """
    # Copilot: implement with proper event handlers
```
### Custom CSS/Theming
```python
# Apply custom styling to Gradio components
custom_css = """
.validation-error { color: red; font-weight: bold; }
.validation-warning { color: orange; }
.validation-info { color: blue; }
"""
# Copilot: suggest CSS for better UX
```
## Deployment Considerations
### Hugging Face Spaces Configuration
```python
import os

# Environment variable handling
HF_API_KEY = os.environ.get("HF_API_KEY")
if not HF_API_KEY:
    logger.warning("HF_API_KEY not set, AI features disabled")

# Gradio launch configuration for Spaces
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False  # Don't use share=True on Spaces
)
```
### Error Recovery
```python
# Implement graceful degradation
def safe_ai_call(func):
    """
    Decorator for AI calls that falls back gracefully.
    """
    # Copilot: implement with fallback behavior
```
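
A minimal sketch of such a decorator for synchronous functions is shown below. The fallback text and the choice to catch every `Exception` are assumptions; an async variant would need an `async def` wrapper that awaits `func`.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Assumed fallback text shown when the AI call fails
FALLBACK_MESSAGE = "AI suggestions are currently unavailable."


def safe_ai_call(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap an AI call so failures are logged and a fallback value is returned."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log the traceback, then degrade instead of crashing the UI
            logger.exception("AI call %s failed; using fallback", func.__name__)
            return FALLBACK_MESSAGE
    return wrapper


@safe_ai_call
def suggest_fix(error_text: str) -> str:
    """Hypothetical AI call that always fails, to demonstrate the fallback."""
    raise RuntimeError("inference endpoint unreachable")


suggestion = suggest_fix("Less than 1 value on bf:title")
```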
## Quick Reference
### Essential Imports
```python
import gradio as gr
import rdflib
from rdflib import Graph, URIRef, Literal, Namespace, RDF, RDFS
from pyshacl import validate
import pandas as pd
import logging
import asyncio
from typing import Optional, Dict, List, Any
```
### Debugging Commands
```python
# Log graph contents
logger.debug(f"Graph has {len(graph)} triples")
logger.debug(graph.serialize(format='turtle'))
# Log validation details
conforms, results_graph, results_text = validate(
    data_graph,
    shacl_graph=shapes,
    debug=True,
    inference='rdfs'
)
```
### Common SHACL Properties
- `sh:targetClass` - Define which RDF types to validate
- `sh:path` - Property to validate
- `sh:minCount` - Minimum occurrences (1 for mandatory)
- `sh:maxCount` - Maximum occurrences (1 for non-repeatable)
- `sh:datatype` - Expected datatype
- `sh:node` - Link to another shape (valueShape)
- `sh:severity` - sh:Violation, sh:Warning, or sh:Info
Remember: Always test with real BIBFRAME data and verify MCP endpoints are accessible!