Spaces: zuup1/zuup-preference-collection
zuup1 committed on Dec 24, 2025

Commit 2f0ec0c (unverified) · 0 parents

Add files via upload

Files changed (7)

README (5).md +144 -0
README.md +144 -0
__init__.py +2 -0
app.py +346 -0
prompt_generator.py +267 -0
requirements.txt +4 -0
taxonomy.py +720 -0

README (5).md ADDED

	@@ -0,0 +1,144 @@

+# Zuup Domain-Specific Preference Collection
+Collect human preference data for training domain-expert AI systems across 10 Zuup domains.
+## Domains
+| Domain | Platform | Description |
+|--------|----------|-------------|
+| Fed/SLED Procurement | Aureon | Government contracting, FAR/DFARS |
+| Biomedical GB-CI | Symbion | Gut-brain interface, biosensors |
+| Ingestible GB-CI | Symbion HW | Capsule endoscopy, in-vivo |
+| Legacy Refactoring | Relian | COBOL migration, mainframe |
+| Autonomy OS | Veyra | Agent systems, AI safety |
+| Quantum Archaeology | QAWM | Historical reconstruction |
+| Defense World Models | Orb | 3D scene, ISR applications |
+| Halal Compliance | Civium | Certification, supply chain |
+| Mobile Data Center | PodX | Edge computing, DDIL |
+| HUBZone | Aureon | Small business contracting |
+## Quick Start
+### 1. Open in Cursor (or any IDE with terminal)
+```bash
+# Open this folder in Cursor
+# File → Open Folder → select zuup-preferences
+```
+### 2. Setup Environment
+```bash
+# In Cursor terminal (Ctrl+` to open)
+python -m venv venv
+source venv/bin/activate  # Windows: venv\Scripts\activate
+pip install -r requirements.txt
+```
+### 3. Run Collection UI
+```bash
+python collection/app.py
+```
+Output:
+```
+🎯 Zuup Preference Collection
+==================================================
+Local URL:  http://127.0.0.1:7860
+Share URL:  https://xxxxx.gradio.live  ← Share with annotators
+```
+### 4. Collect Preferences
+1. Open http://127.0.0.1:7860 in browser
+2. Enter your annotator ID
+3. Select domain
+4. Click "Load New Pair"
+5. Compare responses A vs B
+6. Rate dimensions + select winner
+7. Submit
+## Project Structure
+```
+zuup-preferences/
+├── domains/
+│   ├── taxonomy.py          # Domain definitions & rubrics
+│   └── prompt_generator.py  # Seed prompts per domain
+├── collection/
+│   └── app.py               # Gradio collection UI
+├── preference_data/         # Collected annotations (gitignore)
+│   └── {domain}_preferences.jsonl
+├── requirements.txt
+└── README.md
+```
+## Data Format
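+Each annotation is appended as a single-line JSON object to `preference_data/{domain}_preferences.jsonl`. The record below is pretty-printed for readability; submitted records also carry a `notes` field: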
+```json
+{
+  "domain": "procurement",
+  "category": "RFP_analysis",
+  "prompt": "Analyze this RFP...",
+  "response_a": "...",
+  "response_b": "...",
+  "annotator_id": "khaalis",
+  "preference": "A",
+  "dimension_scores": {
+    "accuracy": 4,
+    "safety": 5,
+    "actionability": 4,
+    "clarity": 3
+  },
+  "timestamp": "2024-12-24T...",
+  "record_hash": "a1b2c3d4..."
+}
+```
+## Export for Training
+```python
+from collection.app import PreferenceStore
+store = PreferenceStore()
+df = store.export_for_training("procurement", format="dpo")
+df.to_json("procurement_dpo.jsonl", orient="records", lines=True)
+```
+## Adding Real Response Generation
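+Edit `collection/app.py` and replace the placeholder responses with Ollama calls (this needs `pip install httpx`, which is not listed in `requirements.txt`):
+```python
+import httpx
+def generate_response(prompt: str, temperature: float = 0.3) -> str:
+    # Ollama reads sampling parameters from "options", not from the top level
+    response = httpx.post(
+        "http://localhost:11434/api/generate",
+        json={
+            "model": "llama3.1:8b",
+            "prompt": prompt,
+            "options": {"temperature": temperature},
+            "stream": False
+        },
+        timeout=60.0
+    )
+    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
+    return response.json()["response"]
+```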
+## Target Collection Size
+| Domain | Min Samples | Annotator Requirements |
+|--------|-------------|------------------------|
+| Procurement | 500 | Gov contracting exp |
+| Legacy | 300 | COBOL/mainframe exp |
+| Defense WM | 300 | GEOINT background |
+| Biomedical | 400 | Biomed/neuro |
+| Autonomy | 300 | AI safety familiarity |
+## License
+Internal Zuup Innovation Lab use.

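The `record_hash` field shown in the README's data format can be checked offline. The following is a minimal sketch, assuming the hash is computed as in `PreferenceStore.save` from `app.py` in this commit: SHA-256 over the sorted-key JSON of every field except `record_hash`, truncated to 16 hex characters. The names `compute_record_hash` and `verify_jsonl_line` and the sample record are illustrative and not part of the commit.

```python
import hashlib
import json


def compute_record_hash(record: dict) -> str:
    """Recompute the 16-hex-character hash over every field except record_hash."""
    payload = {k: v for k, v in record.items() if k != "record_hash"}
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode())
    return digest.hexdigest()[:16]


def verify_jsonl_line(line: str) -> bool:
    """Return True when the stored record_hash matches the recomputed value."""
    record = json.loads(line)
    return record.get("record_hash") == compute_record_hash(record)


# Made-up sample record, hashed the same way save() would hash it.
sample = {"domain": "procurement", "preference": "A", "timestamp": "2025-01-01T00:00:00"}
sample["record_hash"] = compute_record_hash(sample)

good_line = json.dumps(sample)
tampered_line = json.dumps({**sample, "preference": "B"})  # edited after hashing

print(verify_jsonl_line(good_line))      # True
print(verify_jsonl_line(tampered_line))  # False
```

Because the hash is truncated and unkeyed, a check like this would catch accidental edits or corruption, but it should not be read as tamper-proofing.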
__init__.py ADDED

	@@ -0,0 +1,2 @@


+# Collection module
+from .app import PreferenceStore, CollectionApp, create_ui

app.py ADDED

	@@ -0,0 +1,346 @@

+# collection/app.py — Multi-domain preference collection UI
+# Run: pip install gradio pandas
+import gradio as gr
+import json
+import pandas as pd
+from datetime import datetime
+from pathlib import Path
+from typing import Optional
+import hashlib
+import random
+import sys
+# Add parent to path for imports
+sys.path.insert(0, str(Path(__file__).parent.parent))
+from domains.taxonomy import DomainID, DOMAINS, get_quality_rubric
+from domains.prompt_generator import DomainPromptGenerator, SEED_PROMPTS
+class PreferenceStore:
+    """Manages preference data storage with audit trail."""
+    def __init__(self, base_path: Optional[str] = None):
+        if base_path is None:
+            base_path = Path(__file__).parent.parent / "preference_data"
+        self.base_path = Path(base_path)
+        self.base_path.mkdir(parents=True, exist_ok=True)
+    def _get_domain_file(self, domain_id: str) -> Path:
+        return self.base_path / f"{domain_id}_preferences.jsonl"
+    def save(self, record: dict) -> str:
+        """Save a preference record with integrity hash."""
+        record["timestamp"] = datetime.utcnow().isoformat()
+        record["record_hash"] = hashlib.sha256(
+            json.dumps(record, sort_keys=True).encode()
+        ).hexdigest()[:16]
+        filepath = self._get_domain_file(record["domain"])
+        with open(filepath, "a") as f:
+            f.write(json.dumps(record) + "\n")
+        return record["record_hash"]
+    def get_stats(self, domain_id: str = None) -> dict:
+        """Get collection statistics."""
+        stats = {}
+        if domain_id:
+            files = [self._get_domain_file(domain_id)]
+        else:
+            files = list(self.base_path.glob("*_preferences.jsonl"))
+        for f in files:
+            if f.exists():
+                domain = f.stem.replace("_preferences", "")
+                # Count records inside a context manager so the file handle is closed
+                with open(f) as fh:
+                    count = sum(1 for _ in fh)
+                stats[domain] = count
+        return stats
+    def export_for_training(self, domain_id: str, format: str = "dpo") -> pd.DataFrame:
+        """Export data in training-ready format."""
+        filepath = self._get_domain_file(domain_id)
+        if not filepath.exists():
+            return pd.DataFrame()
+        # Read inside a context manager and skip blank lines
+        with open(filepath) as f:
+            records = [json.loads(line) for line in f if line.strip()]
+        if format == "dpo":
+            # Direct Preference Optimization format
+            data = []
+            for r in records:
+                if r.get("preference") in ["A", "B"]:
+                    chosen = r["response_a"] if r["preference"] == "A" else r["response_b"]
+                    rejected = r["response_b"] if r["preference"] == "A" else r["response_a"]
+                    data.append({
+                        "prompt": r["prompt"],
+                        "chosen": chosen,
+                        "rejected": rejected,
+                        "domain": r["domain"],
+                        "category": r.get("category", "unknown")
+                    })
+            return pd.DataFrame(data)
+        return pd.DataFrame(records)
+class CollectionApp:
+    """Multi-domain preference collection application."""
+    def __init__(self):
+        self.store = PreferenceStore()
+        self.current_pair = None
+        self.generators = {
+            domain_id: DomainPromptGenerator(domain_id)
+            for domain_id in DomainID
+        }
+    def get_next_pair(self, domain: str, category: str = None) -> tuple:
+        """Get next prompt and response pair for annotation."""
+        domain_id = DomainID(domain)
+        generator = self.generators[domain_id]
+        # Get prompt
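+        prompt_data = generator.get_random_prompt(category if category != "all" else None)
+        # The generator returns {"error": ...} when a domain has no seed prompts
+        if "error" in prompt_data:
+            raise ValueError(prompt_data["error"])
+        prompt = prompt_data["prompt"]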
+        # For demo, generate placeholder responses
+        # In production, call your generator model (Ollama, API, etc.)
+        response_a = f"""[Response A]
+This is a placeholder response for the prompt. In production, this would be generated by your LLM (e.g., Ollama llama3.1:8b).
+The response would address: {prompt[:100]}...
+To enable real generation:
+1. Set up Ollama: run `ollama serve` in one terminal, then `ollama pull llama3.1:8b` in another
+2. Replace these placeholders with calls to your generator model
+3. Responses will be generated with different temperatures for quality variance"""
+        response_b = f"""[Response B]
+This is an alternative placeholder response. In production, this would be generated with higher temperature (0.9) to create natural quality variance.
+The response addresses: {prompt[:100]}...
+Quality differences emerge from:
+- Temperature variation (0.3 vs 0.9)
+- Token limits (1024 vs 512)
+- Different model checkpoints"""
+        self.current_pair = {
+            "domain": domain,
+            "category": prompt_data.get("category", "unknown"),
+            "prompt": prompt,
+            "response_a": response_a,
+            "response_b": response_b
+        }
+        return prompt, response_a, response_b
+    def submit_preference(self,
+                          annotator_id: str,
+                          preference: str,
+                          dimension_scores: dict,
+                          notes: str) -> str:
+        """Submit a preference annotation."""
+        if not self.current_pair:
+            return "❌ No active pair. Load a new pair first."
+        if not annotator_id:
+            return "❌ Please enter your annotator ID."
+        if not preference:
+            return "❌ Please select a preference (A, B, tie, or both_bad)."
+        record = {
+            **self.current_pair,
+            "annotator_id": annotator_id,
+            "preference": preference,
+            "dimension_scores": dimension_scores,
+            "notes": notes
+        }
+        record_hash = self.store.save(record)
+        stats = self.store.get_stats(self.current_pair["domain"])
+        return f"✓ Saved [{record_hash}]. Domain total: {stats.get(self.current_pair['domain'], 0)}"
+    def get_rubric(self, domain: str) -> str:
+        """Get the quality rubric for a domain."""
+        try:
+            domain_id = DomainID(domain)
+            return get_quality_rubric(domain_id)
+        except ValueError:
+            return "Invalid domain"
+def create_ui():
+    """Create the Gradio UI."""
+    app = CollectionApp()
+    domain_choices = [(d.name.replace("_", " ").title(), d.value) for d in DomainID]
+    with gr.Blocks(title="Zuup Preference Collection", theme=gr.themes.Soft()) as demo:
+        gr.Markdown("# 🎯 Zuup Domain-Specific Preference Collection")
+        gr.Markdown("Collect human preferences for training domain-expert AI systems.")
+        with gr.Row():
+            with gr.Column(scale=1):
+                annotator_id = gr.Textbox(
+                    label="Annotator ID",
+                    placeholder="your_name",
+                    info="Your unique identifier for tracking"
+                )
+                domain_select = gr.Dropdown(
+                    choices=domain_choices,
+                    label="Domain",
+                    value="procurement"
+                )
+                category_select = gr.Dropdown(
+                    choices=["all"],
+                    label="Category",
+                    value="all"
+                )
+                load_btn = gr.Button("🔄 Load New Pair", variant="primary")
+            with gr.Column(scale=3):
+                stats_display = gr.Markdown("*Click 'Load New Pair' to start*")
+        with gr.Row():
+            with gr.Column():
+                prompt_display = gr.Textbox(
+                    label="📝 Prompt",
+                    lines=4,
+                    interactive=False
+                )
+        with gr.Row():
+            with gr.Column():
+                response_a = gr.Textbox(
+                    label="Response A",
+                    lines=12,
+                    interactive=False
+                )
+            with gr.Column():
+                response_b = gr.Textbox(
+                    label="Response B",
+                    lines=12,
+                    interactive=False
+                )
+        gr.Markdown("### ⚖️ Evaluation")
+        with gr.Row():
+            with gr.Column():
+                preference = gr.Radio(
+                    choices=["A", "B", "tie", "both_bad"],
+                    label="Which response is better?",
+                    info="Select the better response or indicate a tie/both bad"
+                )
+            with gr.Column():
+                # Dimension scoring
+                dim_accuracy = gr.Slider(1, 5, step=1, label="Accuracy/Correctness", value=3)
+                dim_safety = gr.Slider(1, 5, step=1, label="Safety/Compliance", value=3)
+                dim_actionability = gr.Slider(1, 5, step=1, label="Actionability", value=3)
+                dim_clarity = gr.Slider(1, 5, step=1, label="Clarity", value=3)
+        notes = gr.Textbox(
+            label="Notes (optional)",
+            placeholder="Any observations about quality differences...",
+            lines=2
+        )
+        with gr.Row():
+            submit_btn = gr.Button("✅ Submit Preference", variant="primary", size="lg")
+            skip_btn = gr.Button("⏭️ Skip (Low Quality Pair)", variant="secondary")
+        output = gr.Textbox(label="Status", interactive=False)
+        with gr.Accordion("📋 Quality Rubric (click to expand)", open=False):
+            rubric_display = gr.Markdown()
+        # Update categories when domain changes
+        def update_categories(domain):
+            try:
+                domain_id = DomainID(domain)
+                if domain_id in SEED_PROMPTS:
+                    cats = list(SEED_PROMPTS[domain_id].keys())
+                    return gr.Dropdown(choices=["all"] + cats, value="all")
+            except ValueError:
+                # DomainID(domain) raises ValueError for an unknown domain value
+                pass
+            return gr.Dropdown(choices=["all"], value="all")
+        domain_select.change(
+            update_categories,
+            inputs=[domain_select],
+            outputs=[category_select]
+        )
+        # Load new pair
+        def load_pair(domain, category):
+            prompt, resp_a, resp_b = app.get_next_pair(domain, category)
+            stats = app.store.get_stats()
+            if stats:
+                stats_md = "**📊 Collection Stats:** " + ", ".join([f"{k}: {v}" for k, v in stats.items()])
+            else:
+                stats_md = "**📊 Collection Stats:** No data yet"
+            rubric = app.get_rubric(domain)
+            return prompt, resp_a, resp_b, stats_md, rubric
+        load_btn.click(
+            load_pair,
+            inputs=[domain_select, category_select],
+            outputs=[prompt_display, response_a, response_b, stats_display, rubric_display]
+        )
+        # Submit preference
+        def submit(annotator, pref, acc, safety, action, clarity, notes_text):
+            dims = {
+                "accuracy": acc,
+                "safety": safety,
+                "actionability": action,
+                "clarity": clarity
+            }
+            return app.submit_preference(annotator, pref, dims, notes_text)
+        submit_btn.click(
+            submit,
+            inputs=[annotator_id, preference, dim_accuracy, dim_safety, dim_actionability, dim_clarity, notes],
+            outputs=[output]
+        )
+        # Skip
+        def skip(annotator):
+            if app.current_pair:
+                record = {
+                    **app.current_pair,
+                    "annotator_id": annotator or "anonymous",
+                    "preference": "skipped",
+                    "skip_reason": "low_quality_pair"
+                }
+                app.store.save(record)
+                return "⏭️ Skipped and logged."
+            return "No pair to skip."
+        skip_btn.click(skip, inputs=[annotator_id], outputs=[output])
+    return demo
+if __name__ == "__main__":
+    print("=" * 50)
+    print("🎯 Zuup Preference Collection")
+    print("=" * 50)
+    print("\nStarting Gradio server...")
+    print("Local URL:  http://127.0.0.1:7860")
+    print("Share URL:  Will be generated below\n")
+    demo = create_ui()
+    demo.launch(
+        share=True,  # Creates public URL for annotators
+        server_name="0.0.0.0",
+        server_port=7860
+    )

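The `dpo` branch of `export_for_training` keeps only records whose preference is `A` or `B` and maps them to `chosen`/`rejected`. Records marked `tie`, `both_bad`, or `skipped` are dropped. The following is a minimal pandas-free sketch of that mapping, intended as an illustration rather than a replacement for the method. The function name `to_dpo_rows` and the sample records are hypothetical.

```python
from typing import Dict, List


def to_dpo_rows(records: List[dict]) -> List[Dict[str, str]]:
    """Map stored annotations to prompt/chosen/rejected rows."""
    rows = []
    for r in records:
        pref = r.get("preference")
        if pref not in ("A", "B"):
            # tie, both_bad and skipped carry no chosen/rejected signal
            continue
        if pref == "A":
            chosen, rejected = r["response_a"], r["response_b"]
        else:
            chosen, rejected = r["response_b"], r["response_a"]
        rows.append({
            "prompt": r["prompt"],
            "chosen": chosen,
            "rejected": rejected,
            "domain": r["domain"],
            "category": r.get("category", "unknown"),
        })
    return rows


# Made-up records covering the three cases: a usable vote, a tie, a skip.
records = [
    {"domain": "procurement", "category": "RFP_analysis", "prompt": "p1",
     "response_a": "a1", "response_b": "b1", "preference": "B"},
    {"domain": "procurement", "category": "RFP_analysis", "prompt": "p2",
     "response_a": "a2", "response_b": "b2", "preference": "tie"},
    {"domain": "procurement", "category": "RFP_analysis", "prompt": "p3",
     "response_a": "a3", "response_b": "b3", "preference": "skipped"},
]

rows = to_dpo_rows(records)
print(len(rows))          # 1
print(rows[0]["chosen"])  # b1
```

Since only `A` and `B` votes survive the export, the per-domain totals reported by `get_stats` (which count every line, skips included) will generally overstate the number of usable training pairs.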
prompt_generator.py ADDED

	@@ -0,0 +1,267 @@

+# domains/prompt_generator.py — Generate domain-specific prompts
+import random
+from typing import List, Dict
+from domains.taxonomy import DomainID, DOMAINS
+# Seed prompts per domain
+SEED_PROMPTS: Dict[DomainID, Dict[str, List[str]]] = {
+    DomainID.FED_SLED_PROCUREMENT: {
+        "RFP_analysis": [
+            "Analyze this RFP for a cloud migration contract. What are the key evaluation factors and how should we weight our response?",
+            "The solicitation mentions 'best value' but doesn't specify weights. How should we interpret this?",
+            "What are the protest risks in this sole-source justification?",
+        ],
+        "proposal_writing": [
+            "Write a technical approach section for a cybersecurity assessment contract.",
+            "How should we structure our past performance volume for a DoD contract?",
+            "Draft an executive summary for a $50M IT modernization proposal.",
+        ],
+        "compliance_check": [
+            "Review this subcontracting plan for FAR 52.219-9 compliance.",
+            "Does our teaming arrangement create an OCI? How do we mitigate?",
+            "What CMMC level is required for this CUI-handling contract?",
+        ],
+    },
+    DomainID.BIOMEDICAL_GBCI: {
+        "signal_processing": [
+            "Design a filtering pipeline for EGG signals to extract gastric slow wave activity.",
+            "How do I handle motion artifacts in wearable gut biosensor data?",
+            "What's the optimal sampling rate for detecting gut-brain vagal signaling?",
+        ],
+        "microbiome_analysis": [
+            "Design a study to correlate gut microbiome composition with anxiety symptoms.",
+            "What are the confounders in microbiome-mood association studies?",
+            "How should we handle the compositional nature of 16S data in our analysis?",
+        ],
+        "regulatory_pathway": [
+            "What FDA classification would a gut motility monitoring patch fall under?",
+            "Design a clinical validation study for a gut-brain biomarker device.",
+            "What's the predicate device strategy for a novel intestinal biosensor?",
+        ],
+    },
+    DomainID.INGESTIBLE_GBCI: {
+        "capsule_design": [
+            "What are the size constraints for an ingestible capsule to ensure safe GI transit?",
+            "Design a biocompatible encapsulation strategy for an electronic capsule.",
+            "How do we ensure the capsule passes naturally without retention?",
+        ],
+        "telemetry": [
+            "Calculate the RF link budget for in-body to external receiver communication.",
+            "What frequencies are approved for medical ingestible device telemetry?",
+            "Design a low-power protocol for continuous gut parameter transmission.",
+        ],
+        "clinical_validation": [
+            "Design a clinical study comparing our ingestible sensor to colonoscopy.",
+            "What are the primary endpoints for an ingestible gut motility monitor trial?",
+            "How do we handle capsule retention as an adverse event in our protocol?",
+        ],
+    },
+    DomainID.LEGACY_REFACTORING: {
+        "code_translation": [
+            "Translate this COBOL PERFORM VARYING loop to Python.",
+            "How do I handle COBOL REDEFINES clauses in a modern data model?",
+            "Convert this CICS transaction to a REST API while preserving semantics.",
+        ],
+        "testing_strategy": [
+            "Design characterization tests for a COBOL batch job with no documentation.",
+            "How do we ensure decimal precision parity between COBOL COMP-3 and Python?",
+            "Create a parallel run strategy to validate our migrated system.",
+        ],
+        "strangler_pattern": [
+            "Design a strangler fig architecture for migrating a mainframe banking system.",
+            "How do we route traffic between legacy and new systems during migration?",
+            "What's the rollback strategy if the new component fails in production?",
+        ],
+    },
+    DomainID.AUTONOMY_OS: {
+        "agent_design": [
+            "Design a tool permission system for an autonomous coding agent.",
+            "How should multi-agent systems handle conflicting goals?",
+            "What's the architecture for a self-improving agent with safety constraints?",
+        ],
+        "safety_constraints": [
+            "Implement a human approval gate for high-impact autonomous actions.",
+            "How do we ensure an agent can always be shut down?",
+            "Design a monitoring system to detect agent capability jumps.",
+        ],
+        "capability_assessment": [
+            "How do we measure if an autonomous agent is safe to deploy?",
+            "What benchmarks should we use for tool-use safety evaluation?",
+            "Design an eval suite for multi-agent coordination correctness.",
+        ],
+    },
+    DomainID.QUANTUM_ARCHAEOLOGY: {
+        "event_reconstruction": [
+            "Reconstruct the logistics of Alexander's army crossing the Hindu Kush.",
+            "What's the uncertainty range for the population of Rome in 100 CE?",
+            "Synthesize archaeological and textual evidence for the Exodus route.",
+        ],
+        "source_analysis": [
+            "How should we weight Herodotus vs archaeological evidence for Persian forces at Thermopylae?",
+            "Design a provenance tracking system for historical source documents.",
+            "What's the methodology for detecting interpolations in ancient manuscripts?",
+        ],
+        "uncertainty_modeling": [
+            "Build a Bayesian model for dating the Thera eruption.",
+            "How do we quantify uncertainty in historical population estimates?",
+            "Design a confidence framework for AI-reconstructed historical events.",
+        ],
+    },
+    DomainID.DEFENSE_WORLD_MODELS: {
+        "scene_reconstruction": [
+            "Design a pipeline for 3D reconstruction from drone imagery in contested environments.",
+            "How do we handle GPS-denied localization for world model construction?",
+            "What's the uncertainty quantification approach for terrain reconstruction?",
+        ],
+        "sensor_fusion": [
+            "Fuse EO, IR, and SAR data for a unified 3D scene representation.",
+            "How do we handle temporal misalignment in multi-sensor fusion?",
+            "Design a confidence metric for fused intelligence products.",
+        ],
+        "tactical_planning": [
+            "Generate terrain analysis for route planning with concealment optimization.",
+            "How should the world model support line-of-sight calculations?",
+            "Design an interface for human-AI collaborative mission planning.",
+        ],
+    },
+    DomainID.HALAL_COMPLIANCE: {
+        "ingredient_analysis": [
+            "Analyze this ingredient list for halal compliance across GSO and JAKIM standards.",
+            "How do we handle E471 (mono- and diglycerides) which may be plant or animal derived?",
+            "What's the ruling on alcohol in vanilla extract under different madhabs?",
+        ],
+        "certification_mapping": [
+            "Map our product certification to OIC/SMIIC mutual recognition requirements.",
+            "What additional testing is required for UAE vs Malaysian halal certification?",
+            "Design a system to track certification status across multiple jurisdictions.",
+        ],
+        "supply_chain": [
+            "Design a blockchain-based provenance system for halal meat supply chain.",
+            "How do we prevent cross-contamination in shared manufacturing facilities?",
+            "What's the audit protocol for verifying halal slaughter compliance?",
+        ],
+    },
+    DomainID.MOBILE_DATA_CENTER: {
+        "architecture_design": [
+            "Design a compute architecture for a 20kW mobile data center in a transit case.",
+            "How do we handle storage redundancy in a single-node deployable unit?",
+            "What's the network topology for a mesh of mobile data centers?",
+        ],
+        "power_systems": [
+            "Calculate the power budget for a GPU-heavy edge AI workload in a PodX unit.",
+            "Design a power management strategy for generator + battery hybrid operation.",
+            "How do we handle graceful shutdown on power loss?",
+        ],
+        "ddil_operations": [
+            "Design a data synchronization strategy for intermittent connectivity.",
+            "How should applications degrade gracefully in bandwidth-limited scenarios?",
+            "What's the PACE plan for a deployed mobile data center?",
+        ],
+    },
+    DomainID.HUBZONE: {
+        "eligibility_assessment": [
+            "Does our company qualify for HUBZone if 30% of employees live in the zone but we're headquartered outside?",
+            "How do we count remote employees for HUBZone residency calculation?",
+            "What happens to our certification if the HUBZone map is redrawn?",
+        ],
+        "contracting_strategy": [
+            "Identify HUBZone set-aside opportunities matching our IT capabilities.",
+            "How do we compete effectively when a HUBZone contract is full and open?",
+            "Design a teaming strategy that preserves our HUBZone status.",
+        ],
+        "compliance_maintenance": [
+            "Create an annual recertification checklist for HUBZone compliance.",
+            "How do we document employee residency for SBA audit?",
+            "What triggers require us to notify SBA of material changes?",
+        ],
+    },
+}
+class DomainPromptGenerator:
+    """Generate prompts for a specific domain."""
+    def __init__(self, domain_id: DomainID):
+        self.domain = DOMAINS[domain_id]
+        self.seed_prompts = SEED_PROMPTS.get(domain_id, {})
+    def get_random_prompt(self, category: str = None) -> dict:
+        """Get a random prompt, optionally from a specific category."""
+        if category and category in self.seed_prompts:
+            prompts = self.seed_prompts[category]
+            label = category
+        else:
+            # No category given, or an unknown one: flatten all categories
+            prompts = [p for cat_prompts in self.seed_prompts.values() for p in cat_prompts]
+            label = "mixed"
+        if not prompts:
+            return {"error": "No prompts available for this domain"}
+        prompt = random.choice(prompts)
+        return {
+            "domain": self.domain.id.value,
+            # An unknown category falls back to "mixed" instead of being echoed back
+            "category": label,
+            "prompt": prompt,
+            "quality_dimensions": [d.name for d in self.domain.dimensions],
+            "key_terms": self.domain.key_terms
+        }
+    def get_all_prompts(self) -> List[dict]:
+        """Get all seed prompts for this domain."""
+        results = []
+        for category, prompts in self.seed_prompts.items():
+            for prompt in prompts:
+                results.append({
+                    "domain": self.domain.id.value,
+                    "category": category,
+                    "prompt": prompt
+                })
+        return results
+    def evolve_prompt(self, base_prompt: str, evolution_type: str = "complexity") -> str:
+        """
+        Evolve a prompt using Evol-Instruct methodology.
+        Evolution types: complexity, specificity, constraint, multi_step
+        """
+        evolutions = {
+            "complexity": f"Make this task more complex by adding regulatory constraints:\n\n{base_prompt}",
+            "specificity": f"Make this more specific with concrete numbers and requirements:\n\n{base_prompt}",
+            "constraint": f"Add a difficult constraint that requires creative problem-solving:\n\n{base_prompt}",
+            "multi_step": f"Expand this into a multi-step problem requiring planning:\n\n{base_prompt}",
+        }
+        return evolutions.get(evolution_type, base_prompt)
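+def generate_response_pair(prompt: str, generator_model, temperature_high: float = 0.9) -> tuple:
+    """
+    Generate two responses for pairwise comparison.
+    Uses temperature variation (0.3 vs. temperature_high) to create natural quality differences.
+    generator_model must expose generate(prompt, temperature=..., max_tokens=...).
+    Returns (first, second, label), where label ("A" or "B") is the display slot
+    that holds the low-temperature response.
+    """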
+    # High-quality response (low temperature, more tokens)
+    response_a = generator_model.generate(
+        prompt,
+        temperature=0.3,
+        max_tokens=1024
+    )
+    # Potentially lower-quality response (high temperature)
+    response_b = generator_model.generate(
+        prompt,
+        temperature=temperature_high,
+        max_tokens=512
+    )
+    # Randomize order to avoid position bias
+    if random.random() > 0.5:
+        return response_a, response_b, "A"
+    else:
+        return response_b, response_a, "B"
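
The pairing logic above can be tried without a real model. The following is a minimal sketch, assuming a hypothetical `EchoModel` stub and a hypothetical `make_pair` helper (neither is part of the repository); it mirrors the low/high temperature split and the slot shuffle, and takes an optional seeded `random.Random` so a run can be reproduced.

```python
import random
from typing import Optional, Tuple


class EchoModel:
    """Hypothetical stand-in for a real generator; not part of the repository."""

    def generate(self, prompt: str, temperature: float, max_tokens: int) -> str:
        # Encode the sampling settings in the output so the pair is easy to inspect.
        return f"[T={temperature}, max_tokens={max_tokens}] {prompt}"


def make_pair(prompt: str, model, temperature_high: float = 0.9,
              rng: Optional[random.Random] = None) -> Tuple[str, str, str]:
    """Return (shown_as_a, shown_as_b, slot_of_low_temperature_response)."""
    rng = rng or random.Random()
    low = model.generate(prompt, temperature=0.3, max_tokens=1024)
    high = model.generate(prompt, temperature=temperature_high, max_tokens=512)
    # Shuffle display order so the low-temperature response is not always in slot A.
    if rng.random() > 0.5:
        return low, high, "A"
    return high, low, "B"


shown_a, shown_b, low_temp_slot = make_pair(
    "Summarize FAR 15.306.", EchoModel(), rng=random.Random(0)
)
```

Keeping the returned slot label out of the annotator-facing UI and storing it only alongside the recorded preference lets position bias be measured afterwards.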

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+# Zuup Preference Collection - Dependencies
+gradio>=4.0.0
+pandas>=2.0.0
+numpy>=1.24.0

taxonomy.py ADDED

	@@ -0,0 +1,720 @@

+# domains/taxonomy.py — Domain definitions and quality criteria
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import List, Dict
+class DomainID(Enum):
+    FED_SLED_PROCUREMENT = "procurement"
+    BIOMEDICAL_GBCI = "gbci"
+    LEGACY_REFACTORING = "legacy"
+    AUTONOMY_OS = "autonomy"
+    QUANTUM_ARCHAEOLOGY = "qawm"
+    DEFENSE_WORLD_MODELS = "defense_wm"
+    HALAL_COMPLIANCE = "halal"
+    MOBILE_DATA_CENTER = "podx"
+    HUBZONE = "hubzone"
+    INGESTIBLE_GBCI = "ingestible"
+@dataclass
+class QualityDimension:
+    name: str
+    description: str
+    weight: float  # 0.0-1.0; the weights of one domain's dimensions must sum to 1.0
+    examples_good: List[str]
+    examples_bad: List[str]
+@dataclass
+class DomainSpec:
+    id: DomainID
+    name: str
+    description: str
+    zuup_platform: str  # Which Zuup platform this maps to
+    # Quality criteria
+    dimensions: List[QualityDimension]
+    # Safety & compliance
+    safety_considerations: List[str]
+    compliance_frameworks: List[str]
+    # Annotation requirements
+    required_expertise: List[str]
+    min_annotator_agreement: float  # Krippendorff's alpha threshold
+    # Prompt categories
+    prompt_categories: List[str]
+    # Domain-specific terminology
+    key_terms: Dict[str, str] = field(default_factory=dict)
+# === Domain Specifications ===
+DOMAINS: Dict[DomainID, DomainSpec] = {
+    DomainID.FED_SLED_PROCUREMENT: DomainSpec(
+        id=DomainID.FED_SLED_PROCUREMENT,
+        name="Federal/SLED Procurement",
+        description="Government procurement, contracting, FAR/DFARS compliance, proposal writing, and acquisition strategy.",
+        zuup_platform="Aureon",
+        dimensions=[
+            QualityDimension(
+                name="regulatory_accuracy",
+                description="Correct citation and interpretation of FAR/DFARS, CJIS, FedRAMP requirements",
+                weight=0.30,
+                examples_good=["Correctly cites FAR 15.306 for competitive range determination"],
+                examples_bad=["Vague reference to 'federal regulations' without specifics"]
+            ),
+            QualityDimension(
+                name="actionability",
+                description="Provides concrete, implementable steps for procurement actions",
+                weight=0.25,
+                examples_good=["Step-by-step RFP response checklist with deadlines"],
+                examples_bad=["Generic advice to 'review the solicitation carefully'"]
+            ),
+            QualityDimension(
+                name="traceability",
+                description="Maintains clear audit trail and decision rationale",
+                weight=0.20,
+                examples_good=["Documents evaluation criteria mapping to PWS requirements"],
+                examples_bad=["Scores proposals without explaining methodology"]
+            ),
+            QualityDimension(
+                name="risk_awareness",
+                description="Identifies compliance risks, protest grounds, and mitigation strategies",
+                weight=0.15,
+                examples_good=["Flags OCI concerns with specific mitigation plan"],
+                examples_bad=["Ignores potential bid protest vulnerabilities"]
+            ),
+            QualityDimension(
+                name="clarity",
+                description="Clear, professional communication suitable for government audiences",
+                weight=0.10,
+                examples_good=["Structured proposal section with clear headers"],
+                examples_bad=["Jargon-heavy text without definitions"]
+            ),
+        ],
+        safety_considerations=[
+            "No disclosure of procurement-sensitive information",
+            "No advice that could constitute bid-rigging",
+            "No circumvention of competition requirements",
+            "Protect source selection information"
+        ],
+        compliance_frameworks=["FAR", "DFARS", "CJIS", "FedRAMP", "CMMC", "Section 508"],
+        required_expertise=["Government contracting experience", "FAR/DFARS familiarity"],
+        min_annotator_agreement=0.7,
+        prompt_categories=[
+            "RFP_analysis", "proposal_writing", "pricing_strategy",
+            "compliance_check", "protest_risk", "teaming_agreements",
+            "past_performance", "capability_statements", "CPARS_response"
+        ],
+        key_terms={
+            "PWS": "Performance Work Statement",
+            "LPTA": "Lowest Price Technically Acceptable",
+            "OCI": "Organizational Conflict of Interest",
+            "CPARS": "Contractor Performance Assessment Reporting System"
+        }
+    ),
+    DomainID.BIOMEDICAL_GBCI: DomainSpec(
+        id=DomainID.BIOMEDICAL_GBCI,
+        name="Gut-Brain Computer Interface (External)",
+        description="Biosensor systems, gut microbiome analysis, neural signal processing, and brain-gut axis research.",
+        zuup_platform="Symbion",
+        dimensions=[
+            QualityDimension(
+                name="scientific_accuracy",
+                description="Correct understanding of gut-brain axis physiology, microbiome science",
+                weight=0.30,
+                examples_good=["Accurate description of vagal afferent signaling pathways"],
+                examples_bad=["Conflates correlation with causation in microbiome studies"]
+            ),
+            QualityDimension(
+                name="safety_primacy",
+                description="Prioritizes patient/user safety, acknowledges limitations",
+                weight=0.25,
+                examples_good=["Recommends physician consultation before intervention"],
+                examples_bad=["Suggests unvalidated treatments without disclaimers"]
+            ),
+            QualityDimension(
+                name="technical_rigor",
+                description="Correct signal processing, biosensor engineering principles",
+                weight=0.20,
+                examples_good=["Proper SNR calculations for biosensor specifications"],
+                examples_bad=["Ignores noise floor in sensitivity claims"]
+            ),
+            QualityDimension(
+                name="regulatory_awareness",
+                description="Understands FDA pathways, IRB requirements, HIPAA constraints",
+                weight=0.15,
+                examples_good=["Identifies 510(k) predicate device strategy"],
+                examples_bad=["Ignores medical device classification requirements"]
+            ),
+            QualityDimension(
+                name="ethical_grounding",
+                description="Addresses informed consent, data privacy, vulnerable populations",
+                weight=0.10,
+                examples_good=["Discusses consent protocols for cognitive research"],
+                examples_bad=["No mention of data governance for health data"]
+            ),
+        ],
+        safety_considerations=[
+            "No medical advice without appropriate disclaimers",
+            "No claims of FDA approval without evidence",
+            "Acknowledge research vs clinical evidence distinction",
+            "Protect PHI in all examples",
+            "No promotion of unvalidated interventions"
+        ],
+        compliance_frameworks=["FDA 21 CFR", "HIPAA", "IRB/Common Rule", "GDPR (health data)", "ISO 13485"],
+        required_expertise=["Biomedical engineering", "Neuroscience background", "Regulatory familiarity"],
+        min_annotator_agreement=0.75,
+        prompt_categories=[
+            "signal_processing", "microbiome_analysis", "biosensor_design",
+            "clinical_study_design", "regulatory_pathway", "data_architecture",
+            "neural_decoding", "intervention_protocols"
+        ],
+        key_terms={
+            "ENS": "Enteric Nervous System",
+            "SCFAs": "Short-Chain Fatty Acids",
+            "HRV": "Heart Rate Variability",
+            "EGG": "Electrogastrography"
+        }
+    ),
+    DomainID.INGESTIBLE_GBCI: DomainSpec(
+        id=DomainID.INGESTIBLE_GBCI,
+        name="Ingestible Gut-Brain Interface",
+        description="Ingestible biosensors, capsule endoscopy, in-vivo diagnostics, and wireless telemetry.",
+        zuup_platform="Symbion (Hardware)",
+        dimensions=[
+            QualityDimension(
+                name="biocompatibility",
+                description="Materials safety, degradation pathways, toxicity considerations",
+                weight=0.25,
+                examples_good=["Specifies USP Class VI materials for encapsulation"],
+                examples_bad=["Ignores GI transit variability in design"]
+            ),
+            QualityDimension(
+                name="engineering_feasibility",
+                description="Realistic power budgets, form factors, telemetry constraints",
+                weight=0.25,
+                examples_good=["Calculates RF link budget for in-body transmission"],
+                examples_bad=["Assumes unlimited battery life in capsule form"]
+            ),
+            QualityDimension(
+                name="clinical_validity",
+                description="Correlation with gold-standard diagnostics, clinical utility",
+                weight=0.20,
+                examples_good=["Validates against colonoscopy findings"],
+                examples_bad=["Claims diagnostic accuracy without clinical trial data"]
+            ),
+            QualityDimension(
+                name="regulatory_pathway",
+                description="Clear FDA classification, predicate strategy, clinical evidence requirements",
+                weight=0.20,
+                examples_good=["Maps to PillCam as predicate for 510(k)"],
+                examples_bad=["Ignores De Novo pathway for novel devices"]
+            ),
+            QualityDimension(
+                name="safety_engineering",
+                description="Failure modes, retention protocols, emergency procedures",
+                weight=0.10,
+                examples_good=["Defines capsule retention management protocol"],
+                examples_bad=["No consideration of obstruction scenarios"]
+            ),
+        ],
+        safety_considerations=[
+            "Capsule retention/obstruction protocols mandatory",
+            "Biocompatibility testing requirements",
+            "Wireless emission limits (SAR)",
+            "Contraindications for GI pathology",
+            "No pediatric use without specific validation"
+        ],
+        compliance_frameworks=["FDA 21 CFR 876", "IEC 60601", "ISO 10993", "FCC Part 95"],
+        required_expertise=["Medical device engineering", "GI physiology", "RF engineering"],
+        min_annotator_agreement=0.8,
+        prompt_categories=[
+            "capsule_design", "power_systems", "telemetry", "biocompatibility",
+            "clinical_validation", "manufacturing", "regulatory_submission"
+        ],
+        key_terms={
+            "GITT": "GI Transit Time",
+            "WCE": "Wireless Capsule Endoscopy",
+            "MICS": "Medical Implant Communication Service"
+        }
+    ),
+    DomainID.LEGACY_REFACTORING: DomainSpec(
+        id=DomainID.LEGACY_REFACTORING,
+        name="Legacy System Refactoring",
+        description="COBOL migration, mainframe modernization, strangler pattern implementation, and technical debt reduction.",
+        zuup_platform="Relian",
+        dimensions=[
+            QualityDimension(
+                name="correctness_preservation",
+                description="Maintains functional equivalence with legacy system",
+                weight=0.30,
+                examples_good=["Characterization tests verify behavior parity"],
+                examples_bad=["Assumes modern code 'should work the same'"]
+            ),
+            QualityDimension(
+                name="risk_mitigation",
+                description="Incremental migration, rollback strategies, blast radius containment",
+                weight=0.25,
+                examples_good=["Implements strangler fig with feature flags"],
+                examples_bad=["Big-bang migration with no fallback"]
+            ),
+            QualityDimension(
+                name="technical_accuracy",
+                description="Correct understanding of COBOL, JCL, VSAM, CICS, IMS",
+                weight=0.20,
+                examples_good=["Handles COBOL COMP-3 packed decimal correctly"],
+                examples_bad=["Ignores EBCDIC encoding issues"]
+            ),
+            QualityDimension(
+                name="business_continuity",
+                description="Maintains operations during migration, handles batch windows",
+                weight=0.15,
+                examples_good=["Parallel run strategy with reconciliation"],
+                examples_bad=["Requires production downtime for cutover"]
+            ),
+            QualityDimension(
+                name="documentation",
+                description="Captures tribal knowledge, maps business rules",
+                weight=0.10,
+                examples_good=["Documents undocumented COPYBOOK business logic"],
+                examples_bad=["Assumes self-documenting code"]
+            ),
+        ],
+        safety_considerations=[
+            "Production data protection during migration",
+            "Audit trail continuity across systems",
+            "Compliance evidence preservation",
+            "No loss of business logic"
+        ],
+        compliance_frameworks=["SOX", "PCI-DSS", "GLBA", "HIPAA (if healthcare)"],
+        required_expertise=["COBOL experience", "Mainframe operations", "Modern architecture"],
+        min_annotator_agreement=0.7,
+        prompt_categories=[
+            "code_translation", "data_migration", "testing_strategy",
+            "strangler_pattern", "batch_modernization", "API_wrapping",
+            "performance_parity", "knowledge_capture"
+        ],
+        key_terms={
+            "COPYBOOK": "COBOL data structure definition",
+            "JCL": "Job Control Language",
+            "VSAM": "Virtual Storage Access Method",
+            "CICS": "Customer Information Control System"
+        }
+    ),
+    DomainID.AUTONOMY_OS: DomainSpec(
+        id=DomainID.AUTONOMY_OS,
+        name="Autonomy OS / Post-ASI LLM",
+        description="Autonomous agent systems, tool use safety, multi-agent coordination, and post-superintelligence architectures.",
+        zuup_platform="Veyra",
+        dimensions=[
+            QualityDimension(
+                name="safety_alignment",
+                description="Proper constraints, human oversight, corrigibility",
+                weight=0.30,
+                examples_good=["Implements approval gates for high-impact actions"],
+                examples_bad=["Autonomous execution without human checkpoints"]
+            ),
+            QualityDimension(
+                name="capability_grounding",
+                description="Realistic assessment of current vs speculative capabilities",
+                weight=0.25,
+                examples_good=["Clearly labels TRL for each capability claim"],
+                examples_bad=["Conflates research concepts with production readiness"]
+            ),
+            QualityDimension(
+                name="tool_safety",
+                description="Proper sandboxing, permission models, rollback mechanisms",
+                weight=0.20,
+                examples_good=["Defines tool permission matrix with escalation"],
+                examples_bad=["Gives agents unrestricted filesystem access"]
+            ),
+            QualityDimension(
+                name="coordination_correctness",
+                description="Multi-agent consensus, conflict resolution, resource management",
+                weight=0.15,
+                examples_good=["Implements Byzantine fault tolerance for agent voting"],
+                examples_bad=["Assumes agents always agree"]
+            ),
+            QualityDimension(
+                name="interpretability",
+                description="Explainable decisions, audit trails, reasoning transparency",
+                weight=0.10,
+                examples_good=["Logs full reasoning chain for each action"],
+                examples_bad=["Black-box decision making"]
+            ),
+        ],
+        safety_considerations=[
+            "Human-in-the-loop for consequential decisions",
+            "Containment strategies for capability overhang",
+            "No self-modification without approval",
+            "Shutdown/rollback always available",
+            "Distinguish speculation from engineering"
+        ],
+        compliance_frameworks=["NIST AI RMF", "EU AI Act (high-risk)", "DoD AI Ethics"],
+        required_expertise=["AI safety research", "Distributed systems", "Agent architectures"],
+        min_annotator_agreement=0.75,
+        prompt_categories=[
+            "agent_design", "tool_permissions", "multi_agent_coord",
+            "safety_constraints", "capability_assessment", "deployment_strategy",
+            "failure_modes", "alignment_verification"
+        ],
+        key_terms={
+            "HITL": "Human-in-the-Loop",
+            "TRL": "Technology Readiness Level",
+            "Corrigibility": "Ability to be corrected/shut down"
+        }
+    ),
+    DomainID.QUANTUM_ARCHAEOLOGY: DomainSpec(
+        id=DomainID.QUANTUM_ARCHAEOLOGY,
+        name="Quantum Archaeological World Models",
+        description="Historical event reconstruction, evidence synthesis, uncertainty quantification, and temporal reasoning.",
+        zuup_platform="QAWM / QAL",
+        dimensions=[
+            QualityDimension(
+                name="evidential_rigor",
+                description="Proper source citation, evidence weighting, provenance tracking",
+                weight=0.30,
+                examples_good=["Weights primary sources over secondary interpretations"],
+                examples_bad=["Treats Wikipedia as primary evidence"]
+            ),
+            QualityDimension(
+                name="uncertainty_quantification",
+                description="Explicit confidence intervals, alternative hypotheses",
+                weight=0.25,
+                examples_good=["Reports reconstruction with 95% CI and alternatives"],
+                examples_bad=["Presents single interpretation as fact"]
+            ),
+            QualityDimension(
+                name="temporal_reasoning",
+                description="Correct handling of chronology, causation, anachronism detection",
+                weight=0.20,
+                examples_good=["Flags anachronistic elements in source material"],
+                examples_bad=["Ignores temporal inconsistencies"]
+            ),
+            QualityDimension(
+                name="methodological_transparency",
+                description="Clear description of reconstruction methodology",
+                weight=0.15,
+                examples_good=["Documents Bayesian update process for beliefs"],
+                examples_bad=["Presents conclusions without methodology"]
+            ),
+            QualityDimension(
+                name="simulation_validity",
+                description="Realistic constraints on reconstructions, physics/economics grounding",
+                weight=0.10,
+                examples_good=["Validates against known logistical constraints"],
+                examples_bad=["Ignores material/resource limitations of era"]
+            ),
+        ],
+        safety_considerations=[
+            "No falsification of historical record",
+            "Acknowledge political sensitivities",
+            "Distinguish reconstruction from fabrication",
+            "Respect cultural heritage considerations"
+        ],
+        compliance_frameworks=["Academic integrity standards", "NAGPRA (if indigenous)", "UNESCO heritage"],
+        required_expertise=["Historical methodology", "Bayesian reasoning", "Domain history"],
+        min_annotator_agreement=0.65,
+        prompt_categories=[
+            "event_reconstruction", "source_analysis", "timeline_synthesis",
+            "counterfactual_analysis", "evidence_weighting", "visualization",
+            "uncertainty_modeling", "cross_reference"
+        ],
+        key_terms={
+            "Provenance": "Chain of custody/origin of evidence",
+            "Terminus post quem": "Earliest possible date",
+            "Terminus ante quem": "Latest possible date"
+        }
+    ),
+    DomainID.DEFENSE_WORLD_MODELS: DomainSpec(
+        id=DomainID.DEFENSE_WORLD_MODELS,
+        name="Defense World Models",
+        description="3D scene understanding, spatial intelligence, ISR applications, and tactical decision support.",
+        zuup_platform="Orb",
+        dimensions=[
+            QualityDimension(
+                name="spatial_accuracy",
+                description="Correct 3D reconstruction, geospatial reasoning, coordinate systems",
+                weight=0.25,
+                examples_good=["Proper MGRS/UTM coordinate handling"],
+                examples_bad=["Ignores datum/projection errors"]
+            ),
+            QualityDimension(
+                name="operational_relevance",
+                description="Actionable intelligence, mission-aligned outputs",
+                weight=0.25,
+                examples_good=["Identifies tactically significant terrain features"],
+                examples_bad=["Generic scene description without operational context"]
+            ),
+            QualityDimension(
+                name="uncertainty_communication",
+                description="Confidence levels, sensor limitations, fusion caveats",
+                weight=0.20,
+                examples_good=["Reports reconstruction confidence per region"],
+                examples_bad=["Presents all outputs as equally reliable"]
+            ),
+            QualityDimension(
+                name="security_awareness",
+                description="OPSEC considerations, classification handling, need-to-know",
+                weight=0.20,
+                examples_good=["Redacts sensitive locations in examples"],
+                examples_bad=["Uses real operational data in training"]
+            ),
+            QualityDimension(
+                name="interoperability",
+                description="Standards compliance, data exchange formats",
+                weight=0.10,
+                examples_good=["Outputs in NGA-compliant formats"],
+                examples_bad=["Proprietary formats without conversion"]
+            ),
+        ],
+        safety_considerations=[
+            "No real classified/operational data",
+            "OPSEC in all examples",
+            "Dual-use awareness",
+            "No targeting recommendations without HITL",
+            "Export control (ITAR/EAR) awareness"
+        ],
+        compliance_frameworks=["NIST 800-171", "CMMC", "ITAR", "NGA standards", "NATO STANAG"],
+        required_expertise=["Geospatial intelligence", "3D computer vision", "Defense domain"],
+        min_annotator_agreement=0.75,
+        prompt_categories=[
+            "scene_reconstruction", "change_detection", "terrain_analysis",
+            "sensor_fusion", "tactical_planning", "visualization",
+            "data_standards", "pipeline_design"
+        ],
+        key_terms={
+            "MGRS": "Military Grid Reference System",
+            "ISR": "Intelligence, Surveillance, Reconnaissance",
+            "GEOINT": "Geospatial Intelligence"
+        }
+    ),
+    DomainID.HALAL_COMPLIANCE: DomainSpec(
+        id=DomainID.HALAL_COMPLIANCE,
+        name="Global Halal Compliance",
+        description="Halal certification, supply chain provenance, standards harmonization, and attestation systems.",
+        zuup_platform="Civium (Halal)",
+        dimensions=[
+            QualityDimension(
+                name="jurisprudential_accuracy",
+                description="Correct understanding of fiqh positions, school differences",
+                weight=0.25,
+                examples_good=["Acknowledges Hanafi vs Shafi'i differences on seafood"],
+                examples_bad=["Presents single madhab view as universal"]
+            ),
+            QualityDimension(
+                name="standards_mapping",
+                description="Correct mapping across GSO, JAKIM, MUI, ESMA standards",
+                weight=0.25,
+                examples_good=["Maps ingredient to multiple standard requirements"],
+                examples_bad=["Assumes single global standard"]
+            ),
+            QualityDimension(
+                name="supply_chain_rigor",
+                description="Provenance tracking, contamination prevention, audit trails",
+                weight=0.20,
+                examples_good=["Full chain of custody from slaughter to retail"],
+                examples_bad=["Relies on final product testing only"]
+            ),
+            QualityDimension(
+                name="dispute_handling",
+                description="Clear escalation paths, scholarly consultation protocols",
+                weight=0.15,
+                examples_good=["Defined process for disputed ingredients"],
+                examples_bad=["Binary halal/haram without nuance"]
+            ),
+            QualityDimension(
+                name="cultural_sensitivity",
+                description="Respectful treatment of religious requirements",
+                weight=0.15,
+                examples_good=["Frames compliance as religious obligation support"],
+                examples_bad=["Treats halal as mere market requirement"]
+            ),
+        ],
+        safety_considerations=[
+            "Respect religious sensitivities",
+            "No misrepresentation of certification status",
+            "Acknowledge legitimate scholarly differences",
+            "Protect proprietary formulations"
+        ],
+        compliance_frameworks=["GSO 2055", "MS 1500", "UAE.S 2055", "OIC/SMIIC"],
+        required_expertise=["Islamic jurisprudence familiarity", "Food science", "Supply chain"],
+        min_annotator_agreement=0.7,
+        prompt_categories=[
+            "ingredient_analysis", "certification_mapping", "supply_chain",
+            "audit_protocols", "dispute_resolution", "standards_harmonization",
+            "cross_contamination", "documentation"
+        ],
+        key_terms={
+            "Dhabiha": "Islamic slaughter method",
+            "Mashbooh": "Doubtful/questionable",
+            "Istihalah": "Complete transformation (purification)"
+        }
+    ),
+    DomainID.MOBILE_DATA_CENTER: DomainSpec(
+        id=DomainID.MOBILE_DATA_CENTER,
+        name="Mobile Distributed Data Centers",
+        description="Edge computing in DDIL environments, tactical networking, and resilient infrastructure.",
+        zuup_platform="PodX",
+        dimensions=[
+            QualityDimension(
+                name="operational_resilience",
+                description="Offline-first, degraded mode operation, recovery procedures",
+                weight=0.25,
+                examples_good=["Defines graceful degradation for each connectivity state"],
+                examples_bad=["Assumes persistent connectivity"]
+            ),
+            QualityDimension(
+                name="environmental_hardening",
+                description="Thermal, shock, vibration, EMI considerations",
+                weight=0.25,
+                examples_good=["Specifies MIL-STD-810 compliance for shock/vibe"],
+                examples_bad=["Commercial hardware without hardening"]
+            ),
+            QualityDimension(
+                name="logistics_feasibility",
+                description="Power budgets, form factors, transportability constraints",
+                weight=0.20,
+                examples_good=["Calculates total power budget with thermal headroom"],
+                examples_bad=["Ignores generator fuel logistics"]
+            ),
+            QualityDimension(
+                name="security_architecture",
+                description="Zero-trust, data-at-rest encryption, physical security",
+                weight=0.20,
+                examples_good=["HSM-backed key management with tamper response"],
+                examples_bad=["Software-only encryption with key in memory"]
+            ),
+            QualityDimension(
+                name="interoperability",
+                description="Coalition partner integration, standards compliance",
+                weight=0.10,
+                examples_good=["Implements NATO FMN standards for data sharing"],
+                examples_bad=["Proprietary protocols without gateways"]
+            ),
+        ],
+        safety_considerations=[
+            "Personnel safety in field conditions",
+            "Data destruction procedures",
+            "Physical security protocols",
+            "EMI/EMC compliance"
+        ],
+        compliance_frameworks=["MIL-STD-810", "MIL-STD-461", "NIST 800-171", "NATO STANAG"],
+        required_expertise=["Edge computing", "Military logistics", "Tactical networking"],
+        min_annotator_agreement=0.7,
+        prompt_categories=[
+            "architecture_design", "power_systems", "thermal_management",
+            "networking", "security", "logistics", "deployment_procedures",
+            "recovery_operations"
+        ],
+        key_terms={
+            "DDIL": "Denied, Degraded, Intermittent, Limited (bandwidth)",
+            "PACE": "Primary, Alternate, Contingency, Emergency (comms)",
+            "FMN": "Federated Mission Networking"
+        }
+    ),
+    DomainID.HUBZONE: DomainSpec(
+        id=DomainID.HUBZONE,
+        name="HUBZone Ecosystem",
+        description="HUBZone certification, small business contracting, economic development in underserved areas.",
+        zuup_platform="Aureon (HUBZone)",
+        dimensions=[
+            QualityDimension(
+                name="regulatory_accuracy",
+                description="Correct HUBZone eligibility rules, SBA requirements",
+                weight=0.30,
+                examples_good=["Correctly calculates 35% employee residency requirement"],
+                examples_bad=["Misapplies principal office location rules"]
+            ),
+            QualityDimension(
+                name="strategic_guidance",
+                description="Actionable advice for certification and contracting",
+                weight=0.25,
+                examples_good=["Maps HUBZone set-aside opportunities to capabilities"],
+                examples_bad=["Generic small business advice"]
+            ),
+            QualityDimension(
+                name="compliance_maintenance",
+                description="Ongoing compliance, recertification, audit preparation",
+                weight=0.20,
+                examples_good=["Defines annual recertification checklist"],
+                examples_bad=["Assumes one-time certification"]
+            ),
+            QualityDimension(
+                name="economic_development",
+                description="Understanding of HUBZone program economic objectives",
+                weight=0.15,
+                examples_good=["Connects certification to community impact"],
+                examples_bad=["Treats purely as contracting advantage"]
+            ),
+            QualityDimension(
+                name="documentation",
+                description="Proper evidence collection, record-keeping",
+                weight=0.10,
+                examples_good=["Specifies required residence documentation"],
+                examples_bad=["Vague reference to 'proof of residence'"]
+            ),
+        ],
+        safety_considerations=[
+            "No advice on fraudulent certification",
+            "Accurate representation of eligibility",
+            "Privacy of employee information"
+        ],
+        compliance_frameworks=["13 CFR Part 126", "SBA HUBZone Program", "FAR 19.13"],
+        required_expertise=["Small business contracting", "SBA programs", "Government procurement"],
+        min_annotator_agreement=0.7,
+        prompt_categories=[
+            "eligibility_assessment", "certification_process", "contracting_strategy",
+            "compliance_maintenance", "teaming", "subcontracting",
+            "map_analysis", "documentation"
+        ],
+        key_terms={
+            "HUBZone": "Historically Underutilized Business Zone",
+            "Set-aside": "Contract reserved for specific small business category",
+            "Principal office": "Location where greatest number of employees work"
+        }
+    ),
+}
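+def get_domain(domain_id: DomainID) -> DomainSpec:
+    """Return the spec for one domain; raises KeyError if the ID is not registered."""
+    return DOMAINS[domain_id]
+def get_all_domains() -> List[DomainSpec]:
+    """Return every registered domain spec, in DOMAINS insertion order."""
+    return list(DOMAINS.values())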
+def get_quality_rubric(domain_id: DomainID) -> str:
+    """Generate human-readable quality rubric for annotators."""
+    domain = DOMAINS[domain_id]
+    rubric = f"# Quality Rubric: {domain.name}\n\n"
+    rubric += f"{domain.description}\n\n"
+    rubric += "## Scoring Dimensions\n\n"
+    for dim in domain.dimensions:
+        rubric += f"### {dim.name.replace('_', ' ').title()} (Weight: {dim.weight:.0%})\n"
+        rubric += f"{dim.description}\n\n"
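+        # Guard against dimensions defined without examples (avoids IndexError).
+        if dim.examples_good:
+            rubric += f"**Good example:** {dim.examples_good[0]}\n\n"
+        if dim.examples_bad:
+            rubric += f"**Bad example:** {dim.examples_bad[0]}\n\n"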
+    rubric += "## Safety Considerations\n"
+    for safety in domain.safety_considerations:
+        rubric += f"- {safety}\n"
+    return rubric
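The dimension weights in each domain look like they are meant to sum to 1.0; the HUBZone weights above do (0.30 + 0.25 + 0.20 + 0.15 + 0.10). The following is a minimal sketch of how that invariant could be checked and how per-dimension ratings might be combined into a single score. It is not part of `taxonomy.py`: `Dim`, `weights_sum_to_one`, and `weighted_score` are hypothetical names introduced for illustration, `Dim` is a stripped-down stand-in for `QualityDimension`, and the 0 to 1 rating scale is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Dim:
    """Hypothetical stand-in for QualityDimension (name and weight only)."""
    name: str
    weight: float


# Weights copied from the HUBZone entry above.
HUBZONE_DIMS: List[Dim] = [
    Dim("regulatory_accuracy", 0.30),
    Dim("strategic_guidance", 0.25),
    Dim("compliance_maintenance", 0.20),
    Dim("economic_development", 0.15),
    Dim("documentation", 0.10),
]


def weights_sum_to_one(dims: List[Dim], tol: float = 1e-9) -> bool:
    """Return True when the dimension weights add up to 1.0 within tol."""
    return math.isclose(sum(d.weight for d in dims), 1.0, abs_tol=tol)


def weighted_score(dims: List[Dim], ratings: Dict[str, float]) -> float:
    """Combine per-dimension ratings (assumed to lie in 0..1) into one score."""
    missing = [d.name for d in dims if d.name not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {', '.join(missing)}")
    return sum(d.weight * ratings[d.name] for d in dims)


if __name__ == "__main__":
    print(weights_sum_to_one(HUBZONE_DIMS))
    example = {d.name: 0.5 for d in HUBZONE_DIMS}
    print(round(weighted_score(HUBZONE_DIMS, example), 3))
```

A check like `weights_sum_to_one` could run once at import time over every entry in `DOMAINS`, so that a mistyped weight is caught before annotators see a rubric.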