# GitHub Copilot Instructions for MCP4RDF Project

## Project Context

This is an RDF validation tool with AI features, deployed on Hugging Face Spaces. It validates RDF/XML against SHACL shapes and provides AI-powered suggestions for fixing validation errors.

### Key Technologies
- **Frontend**: Gradio 5.33.0
- **RDF Processing**: rdflib, pyshacl
- **AI Integration**: Hugging Face Inference API
- **Protocol**: MCP (Model Context Protocol)
- **Deployment**: Hugging Face Spaces

### Project Structure
```
mcp4rdf-hf-space/
├── app.py                    # Main Gradio application
├── validator.py              # Core SHACL validation logic
├── mcp_server_gradio.py      # MCP server implementation
├── MonographDCTAP/           # TSV files with SHACL definitions
├── electronic_MonographDCTAP/ # Electronic format SHACL definitions
└── requirements.txt          # Python dependencies
```

## Code Style Guidelines

### Python Standards
- Use type hints for function parameters and return values
- Follow PEP 8 naming conventions
- Add docstrings for all public functions
- Use logging instead of print statements

### RDF/SHACL Patterns
```python
from rdflib import URIRef, Literal, Graph

# Always bind common namespaces
STANDARD_NAMESPACES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "bflc": "http://id.loc.gov/ontologies/bflc/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sh": "http://www.w3.org/ns/shacl#"
}

# Use URIRef for RDF predicates
sh_path = URIRef("http://www.w3.org/ns/shacl#path")
```

## Common Tasks and Templates

### 1. Adding New SHACL Validation Rules
```python
# Template for adding a new property constraint
def add_property_constraint(shape_node, property_id, constraints):
    """
    Add SHACL property constraints to a shape.
    
    Args:
        shape_node: RDF node representing the shape
        property_id: Property identifier (e.g., "bf:title")
        constraints: Dict with keys like 'mandatory', 'repeatable', 'datatype'
    """
    # Copilot: implement SHACL property shape creation
```

### 2. Parsing TSV to SHACL
```python
# When converting TSV rows to SHACL shapes, use this pattern:
def tsv_row_to_shacl(row, graph, prefixes):
    """
    Convert a TSV row to SHACL constraints.
    Expected columns: shapeID, propertyID, mandatory, repeatable, valueShape
    """
    # Copilot: handle prefix expansion and constraint mapping
```
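
As a rough illustration of the constraint mapping, the sketch below reads TSV text with the standard library and turns each row into a plain dictionary of SHACL constraints, grouped by `shapeID`. It deliberately stops short of building an rdflib graph. The helper names (`row_to_constraints`, `tsv_to_constraints`), the accepted true/false spellings, and the choice to emit no cardinality constraint for blank cells are assumptions made for this example, not project conventions.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Assumed spellings for boolean cells; adjust to match the real TSV files.
TRUE_VALUES = {"true", "yes", "1"}
FALSE_VALUES = {"false", "no", "0"}


def _cell(row: Dict[str, str], key: str) -> str:
    """Return a stripped cell value, treating missing cells as empty."""
    return (row.get(key) or "").strip()


def row_to_constraints(row: Dict[str, str]) -> Dict[str, object]:
    """Map one TSV row to a dict of SHACL constraint names and values."""
    constraints: Dict[str, object] = {"sh:path": _cell(row, "propertyID")}
    if _cell(row, "mandatory").lower() in TRUE_VALUES:
        constraints["sh:minCount"] = 1  # mandatory -> at least one value
    if _cell(row, "repeatable").lower() in FALSE_VALUES:
        constraints["sh:maxCount"] = 1  # non-repeatable -> at most one value
    value_shape = _cell(row, "valueShape")
    if value_shape:
        constraints["sh:node"] = value_shape
    return constraints


def tsv_to_constraints(tsv_text: str) -> Dict[str, List[Dict[str, object]]]:
    """Group the constraints of every row under its shapeID."""
    shapes = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        shapes[_cell(row, "shapeID")].append(row_to_constraints(row))
    return dict(shapes)
```

Prefix expansion and the conversion of these dictionaries into RDF triples are left to the real `tsv_row_to_shacl` implementation.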

### 3. Error Message Formatting
```python
# Format validation errors for user display
def format_validation_error(result):
    """
    Format pyshacl validation result for Gradio display.
    Include: severity, focus node, property path, and message
    """
    # Copilot: create user-friendly error messages with context
```
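
One possible shape for such a formatter is sketched below. It assumes the validation result has already been flattened into a dictionary; the keys `severity`, `focus_node`, `path`, and `message` and the `SEVERITY_LABELS` table are hypothetical names chosen for illustration, not pyshacl's own output format.

```python
from typing import Mapping

# Hypothetical display labels for the three SHACL severities.
SEVERITY_LABELS = {
    "sh:Violation": "ERROR",
    "sh:Warning": "WARNING",
    "sh:Info": "INFO",
}


def format_validation_error(result: Mapping[str, str]) -> str:
    """Render one flattened validation result as a single readable line."""
    # Unknown or missing severities are shown as errors so they are not overlooked.
    severity = SEVERITY_LABELS.get(result.get("severity", ""), "ERROR")
    focus = result.get("focus_node", "(unknown node)")
    path = result.get("path", "(no property path)")
    message = result.get("message", "No message provided")
    return f"[{severity}] {focus} / {path}: {message}"
```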

### 4. AI Integration Patterns
```python
# Template for AI API calls
async def get_ai_suggestion(error_context, rdf_snippet):
    """
    Get AI suggestions for fixing RDF validation errors.
    Uses Hugging Face Inference API with proper error handling.
    """
    # Copilot: implement with retry logic and timeout handling
```
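
The retry and timeout handling can be kept separate from the API client itself. The sketch below is one way to do that with `asyncio` alone: `call_with_retry` and its parameters are hypothetical, and `make_request` stands in for whatever coroutine actually calls the Hugging Face Inference API.

```python
import asyncio
from typing import Awaitable, Callable


async def call_with_retry(
    make_request: Callable[[], Awaitable[str]],
    attempts: int = 3,
    timeout_s: float = 10.0,
    base_delay_s: float = 0.5,
) -> str:
    """Await make_request() with a per-attempt timeout and exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            # wait_for cancels the request if it exceeds the timeout.
            return await asyncio.wait_for(make_request(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off 0.5 s, 1 s, 2 s, ... before the next attempt.
                await asyncio.sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError(f"AI request failed after {attempts} attempts") from last_error
```

Only timeouts and connection errors are retried here; which exceptions are worth retrying depends on the HTTP client in use.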

## Debugging Helpers

### SHACL Validation Issues
```python
# Debug template for missing validations
def debug_shacl_targeting():
    """
    Log all target classes and matching nodes in the data graph.
    Helps diagnose why validations aren't triggering.
    """
    # Copilot: implement comprehensive logging of shapes and targets
```

### Namespace Resolution
```python
# Helper for namespace issues
def resolve_prefixed_uri(prefixed_id, namespace_map):
    """
    Resolve prefixed identifiers like 'bf:Work' to full URIs.
    Handle edge cases: no prefix, already full URI, unknown prefix
    """
    # Copilot: implement robust prefix resolution
```
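
A minimal sketch of the three edge cases named in the docstring follows. Raising `ValueError` for a missing or unknown prefix is an assumption made for this example; the project may prefer to return `None` or log a warning instead.

```python
from typing import Mapping


def resolve_prefixed_uri(prefixed_id: str, namespace_map: Mapping[str, str]) -> str:
    """Resolve 'bf:Work' to a full URI; raise ValueError if that is not possible."""
    value = prefixed_id.strip()
    # Already a full URI wrapped in angle brackets: strip the brackets.
    if value.startswith("<") and value.endswith(">"):
        return value[1:-1]
    # Already a full URI: return it unchanged.
    if "://" in value or value.startswith("urn:"):
        return value
    # No colon means there is no prefix to expand.
    if ":" not in value:
        raise ValueError(f"No prefix in identifier: {prefixed_id!r}")
    prefix, local_name = value.split(":", 1)
    if prefix not in namespace_map:
        raise ValueError(f"Unknown prefix {prefix!r} in {prefixed_id!r}")
    return namespace_map[prefix] + local_name
```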

## MCP Server Implementation

### Tool Registration Pattern
```python
from typing import Optional

# MCP tool definition template
# (`mcp_server` is assumed to be an already-created server instance)
@mcp_server.tool()
async def new_mcp_tool(param1: str, param2: Optional[str] = None) -> dict:
    """
    MCP tool implementation.
    Returns: {"success": bool, "result": Any, "error": Optional[str]}
    """
    # Copilot: implement with proper error handling and logging
```

### SSE Event Formatting
```python
# Server-Sent Events response pattern
def format_sse_response(tool_name, result):
    """
    Format MCP tool response as SSE event.
    Include proper event type and JSON encoding.
    """
    # Copilot: implement SSE formatting with error states
```
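
The wire format of a Server-Sent Events message is simple enough to sketch with the standard library. In the example below, the event names `tool_result` and `tool_error`, the extra `error` parameter, and the payload layout are assumptions for illustration; they are not taken from the MCP specification or from this project's server.

```python
import json
from typing import Any, Optional


def format_sse_response(tool_name: str, result: Any, error: Optional[str] = None) -> str:
    """Serialize a tool result (or an error) as one Server-Sent Events message."""
    payload = {
        "tool": tool_name,
        "success": error is None,
        "result": result if error is None else None,
        "error": error,
    }
    event_type = "tool_result" if error is None else "tool_error"
    # json.dumps emits a single line by default, so one data field is enough.
    data = json.dumps(payload, ensure_ascii=False)
    # A blank line terminates the SSE message.
    return f"event: {event_type}\ndata: {data}\n\n"
```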

## Testing Patterns

### Unit Test Templates
```python
# Test SHACL shape generation
def test_shape_generation():
    """
    Test that TSV rows correctly generate SHACL shapes.
    Include: basic properties, cardinality, value shapes
    """
    # Copilot: generate comprehensive test cases

# Test RDF validation
def test_rdf_validation():
    """
    Test validation with various RDF inputs.
    Include: valid, invalid, edge cases
    """
    # Copilot: create test data and assertions
```

### Integration Test Patterns
```python
# Test MCP server endpoints
async def test_mcp_endpoints():
    """
    Test all MCP tools with realistic inputs.
    Verify: response format, error handling, performance
    """
    # Copilot: implement async test scenarios
```

## Performance Optimization

### Caching Strategies
```python
from functools import lru_cache

# Cache compiled SHACL graphs
@lru_cache(maxsize=10)
def get_compiled_shacl_graph(template_name):
    """
    Cache parsed SHACL graphs to avoid repeated parsing.
    """
    # Copilot: implement with proper cache invalidation

# Cache namespace resolutions
@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id, namespace_json):
    """
    Cache URI resolutions to improve performance.
    """
    # Copilot: implement with hashable inputs
```
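
`lru_cache` requires hashable arguments, which is why the template passes the namespace map as a JSON string rather than a dict. A minimal sketch of that idea, with a hypothetical `resolve` wrapper that does the serialization:

```python
import json
from functools import lru_cache


@lru_cache(maxsize=1000)
def cached_uri_resolution(prefixed_id: str, namespace_json: str) -> str:
    """Expand a prefixed ID; the namespace map arrives as a hashable JSON string."""
    namespaces = json.loads(namespace_json)
    prefix, _, local_name = prefixed_id.partition(":")
    return namespaces[prefix] + local_name  # KeyError for an unknown prefix


def resolve(prefixed_id: str, namespaces: dict) -> str:
    """Hypothetical convenience wrapper around the cached function."""
    # sort_keys makes equal dicts serialize identically, so they share cache entries.
    return cached_uri_resolution(prefixed_id, json.dumps(namespaces, sort_keys=True))
```

`cached_uri_resolution.cache_info()` reports hits and misses, and `cached_uri_resolution.cache_clear()` resets the cache if the namespace definitions change at runtime.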

### Batch Processing
```python
from typing import List

# Process multiple RDF documents efficiently
async def batch_validate_rdf(rdf_documents: List[str]):
    """
    Validate multiple RDF documents in parallel.
    Use asyncio for concurrent processing.
    """
    # Copilot: implement with progress tracking
```
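
A sketch of concurrent batch validation is shown below, assuming the per-document validator is a blocking function. The extra `validate_one` and `max_concurrency` parameters are hypothetical additions to the template's signature, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


async def batch_validate_rdf(
    rdf_documents: List[str],
    validate_one: Callable[[str], dict],
    max_concurrency: int = 4,
) -> List[dict]:
    """Run a blocking validator over many documents with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)
    total = len(rdf_documents)
    completed = 0

    async def run(document: str) -> dict:
        nonlocal completed
        async with semaphore:
            # to_thread keeps the blocking call off the event loop.
            result = await asyncio.to_thread(validate_one, document)
        completed += 1
        logger.info("Validated %d/%d documents", completed, total)
        return result

    # gather returns results in the same order as the input documents.
    return await asyncio.gather(*(run(doc) for doc in rdf_documents))
```

Worker threads keep the event loop responsive, but for CPU-bound validation they may not speed things up much; a process pool is an alternative worth measuring.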

## Common Pitfalls to Avoid

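1. **Namespace Conflicts**: Pass `override=True` to `graph.bind()` so the project's prefixes take precedence over any prefix already bound to the same namespace
2. **Graph Parsing**: Specify the format explicitly (for example `format="xml"` for RDF/XML); don't rely on auto-detection
3. **SPARQL Queries**: Escape special characters in URIs
4. **Async/Await**: Don't call blocking functions (such as `pyshacl.validate`) directly inside `async` code; run them in a thread or executor
5. **Error Messages**: Always include context for debugging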

## Gradio UI Enhancements

### Adding New UI Components
```python
# Template for new Gradio components
def create_validation_interface():
    """
    Create Gradio interface with:
    - File upload for RDF
    - Template selection
    - Real-time validation
    - Export functionality
    """
    # Copilot: implement with proper event handlers
```

### Custom CSS/Theming
```python
# Apply custom styling to Gradio components
custom_css = """
    .validation-error { color: red; font-weight: bold; }
    .validation-warning { color: orange; }
    .validation-info { color: blue; }
"""
# Copilot: suggest CSS for better UX
```

## Deployment Considerations

### Hugging Face Spaces Configuration
```python
import logging
import os

logger = logging.getLogger(__name__)

# Environment variable handling
HF_API_KEY = os.environ.get("HF_API_KEY")
if not HF_API_KEY:
    logger.warning("HF_API_KEY not set, AI features disabled")

# Gradio launch configuration for Spaces
demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False  # Don't use share=True on Spaces
)
```

### Error Recovery
```python
# Implement graceful degradation
def safe_ai_call(func):
    """
    Decorator for AI calls that falls back gracefully.
    """
    # Copilot: implement with fallback behavior
```
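
One way such a decorator could look is sketched below for synchronous functions. The `FALLBACK_MESSAGE` constant is a hypothetical placeholder, and catching every `Exception` is a deliberately broad assumption; an `async` function would need an `async` wrapper instead.

```python
import functools
import logging

logger = logging.getLogger(__name__)

# Hypothetical text shown to the user when the AI backend is unavailable.
FALLBACK_MESSAGE = "AI suggestions are unavailable; validation results are unaffected."


def safe_ai_call(func):
    """Decorator that returns a fallback message instead of raising."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log the full traceback, then degrade gracefully.
            logger.exception("AI call %s failed; using fallback", func.__name__)
            return FALLBACK_MESSAGE
    return wrapper
```

With this in place, a failure in the AI layer leaves the validation output usable and only replaces the suggestion text.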

## Quick Reference

### Essential Imports
```python
import gradio as gr
import rdflib
from rdflib import Graph, URIRef, Literal, Namespace, RDF, RDFS
from pyshacl import validate
import pandas as pd
import logging
import os
import asyncio
from functools import lru_cache
from typing import Optional, Dict, List, Any
```

### Debugging Commands
```python
# Log graph contents
logger.debug(f"Graph has {len(graph)} triples")
logger.debug(graph.serialize(format='turtle'))

# Log validation details
conforms, results_graph, results_text = validate(
    data_graph, 
    shacl_graph=shapes, 
    debug=True,
    inference='rdfs'
)
```

### Common SHACL Properties
- `sh:targetClass` - Define which RDF types to validate
- `sh:path` - Property to validate
- `sh:minCount` - Minimum occurrences (1 for mandatory)
- `sh:maxCount` - Maximum occurrences (1 for non-repeatable)
- `sh:datatype` - Expected datatype
- `sh:node` - Link to another shape (valueShape)
- `sh:severity` - sh:Violation, sh:Warning, or sh:Info

Remember: Always test with real BIBFRAME data and verify MCP endpoints are accessible!