
Databricks Agent Bricks Refactoring - Summary

What Was Done

This system has been refactored to support Databricks Agent Bricks (Mosaic AI Agent Framework), enabling production-ready deployment on Databricks with full governance, monitoring, and scalability.

New Files Created

1. Core Agent Infrastructure

  • agents/mlflow_base.py - MLflow Pyfunc base classes for agents

    • MLflowAgentBase: Base class with tracing, Model Serving compatibility
    • MLflowChainAgent: LangChain integration with automatic logging
    • Automatic signature inference
    • Unity Catalog registration methods
    • Model Serving deployment helpers
  • agents/mlflow_classifier.py - Production classifier agent

    • Hybrid keyword + LLM classification
    • MLflow tracing for all calls
    • Unity Catalog ready
    • Can be deployed to Model Serving
    • Includes registration script
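The hybrid keyword + LLM approach listed above can be sketched as follows. This is a minimal illustration, not the code in agents/mlflow_classifier.py: the keyword list, the `classify` function, and the `stub_llm` callback are hypothetical names, and the LLM is replaced by a stub so the example runs offline.

```python
from typing import Callable

# Hypothetical keyword list; the real agent's keywords are not shown in this document.
KEYWORDS = ("fluoride", "dental", "oral health")


def classify(text: str, llm_classify: Callable[[str], bool]) -> dict:
    """Return a label, calling the LLM only when a keyword matches."""
    lowered = text.lower()
    hits = [kw for kw in KEYWORDS if kw in lowered]
    if not hits:
        # Cheap path: no keyword match, so the LLM is never called.
        return {"relevant": False, "keywords": [], "used_llm": False}
    # Expensive path: let the LLM confirm or reject the keyword match.
    return {"relevant": llm_classify(text), "keywords": hits, "used_llm": True}


def stub_llm(text: str) -> bool:
    """Stand-in for a real LLM call; always answers 'relevant'."""
    return True


print(classify("Budget review for road repairs", stub_llm))
print(classify("Fluoride discussion at the council meeting", stub_llm))
```

Only the second document reaches the LLM stub, which is where the cost saving would come from.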

2. Deployment & Operations

  • databricks/deployment.py - Deployment automation

    • AgentDeploymentManager class
    • Register agents to Unity Catalog
    • Deploy to Model Serving endpoints
    • Multi-agent endpoints with traffic splitting
    • Endpoint testing and monitoring
    • Auto-scaling configuration
  • databricks/evaluation.py - Quality assurance

    • AgentEvaluator class
    • Automated evaluation pipelines
    • A/B testing between versions
    • Metrics: accuracy, precision, recall, F1, latency
    • Confusion matrix generation
    • Feedback loop integration with Delta Lake
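For reference, the listed quality metrics all derive from the confusion matrix. The sketch below assumes a binary relevant/not-relevant task; `binary_metrics` is a hypothetical helper, not the `AgentEvaluator` implementation.

```python
from collections import Counter


def binary_metrics(y_true: list[bool], y_pred: list[bool]) -> dict:
    """Compute confusion-matrix counts and the metrics derived from them."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter(zip(y_true, y_pred))  # keys are (actual, predicted) pairs
    tp, fp = counts[(True, True)], counts[(False, True)]
    fn, tn = counts[(True, False)], counts[(False, False)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = binary_metrics([True, True, False, False], [True, False, True, False])
print(metrics)
```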

3. Interactive Development

  • databricks/notebooks/01_agent_bricks_quickstart.py - Databricks notebook

    • Step-by-step deployment guide
    • Local testing examples
    • Unity Catalog registration
    • Model Serving deployment
    • Evaluation examples
    • Delta Lake queries
    • Monitoring and observability
  • databricks/README.md - Comprehensive documentation

    • Architecture diagrams
    • Deployment workflows
    • API usage examples
    • Cost considerations
    • Troubleshooting guide
    • Best practices

4. Dependencies

  • Updated requirements-cpu.txt with:
    • mlflow>=2.10.0 - MLflow tracking and serving
    • databricks-agents>=0.1.0 - Agent Framework
    • databricks-vectorsearch>=0.22.0 - Vector search
    • langgraph>=0.0.20 - Stateful agent graphs
    • databricks-sdk>=0.18.0 - Databricks API client
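A quick way to confirm which of these packages are present in the current environment is sketched below. It uses only the standard library; `installed_versions` is a hypothetical helper, and it reports versions without checking them against the minimums listed above.

```python
from importlib import metadata
from typing import Optional

# Package names copied from the requirements list above.
REQUIRED = [
    "mlflow",
    "databricks-agents",
    "databricks-vectorsearch",
    "langgraph",
    "databricks-sdk",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```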

5. Updated Existing Files

  • README.md - Added Databricks Agent Bricks section
  • install.sh - Detects and uses requirements-cpu.txt

Key Features Added

1. MLflow Integration

  • ✅ Automatic tracing of all agent calls
  • ✅ LLM request/response logging
  • ✅ Metrics tracking (latency, tokens, cost)
  • ✅ Experiment tracking and versioning
  • ✅ Model registry integration

2. Unity Catalog Governance

  • ✅ Centralized model registration
  • ✅ Permissions and access control
  • ✅ Data lineage tracking
  • ✅ Version management
  • ✅ Tag-based organization

3. Model Serving

  • ✅ REST API endpoints
  • ✅ Auto-scaling (scale-to-zero capable)
  • ✅ A/B testing with traffic splitting
  • ✅ Multi-agent pipelines
  • ✅ Monitoring and alerting
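To make the traffic-splitting feature concrete, the sketch below builds an endpoint configuration that routes traffic across two versions of one model. The field names follow the Databricks serving-endpoints REST API as best understood here and should be checked against the current API reference; `split_config` is a hypothetical helper, not a method of `AgentDeploymentManager`.

```python
import json


def split_config(model_name: str, versions: dict[str, int]) -> dict:
    """Build an endpoint config that splits traffic across model versions.

    `versions` maps a model version to its share of traffic in percent.
    """
    if sum(versions.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    served = [
        {
            "name": f"v{version}",
            "entity_name": model_name,
            "entity_version": version,
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }
        for version in versions
    ]
    routes = [
        {"served_model_name": f"v{version}", "traffic_percentage": pct}
        for version, pct in versions.items()
    ]
    return {"served_entities": served, "traffic_config": {"routes": routes}}


# Send 90% of requests to version 1 and 10% to version 2 (illustrative values).
config = split_config("main.agents.policy_classifier", {"1": 90, "2": 10})
print(json.dumps(config, indent=2))
```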

4. Evaluation Framework

  • ✅ Automated quality metrics
  • ✅ Regression detection
  • ✅ Version comparison
  • ✅ Confusion matrices
  • ✅ Feedback loop from production
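Regression detection and version comparison come down to comparing two sets of metrics. A minimal sketch, assuming higher is better for every metric (so latency would need the comparison reversed); `find_regressions` and the metric values are made up for illustration.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.01) -> dict:
    """Return metrics where the candidate is worse than the baseline by more than tolerance."""
    regressions = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is not None and old - new > tolerance:
            regressions[name] = (old, new)
    return regressions


# Illustrative numbers only, not measured results.
v1 = {"accuracy": 0.91, "precision": 0.88, "recall": 0.90}
v2 = {"accuracy": 0.92, "precision": 0.80, "recall": 0.895}
print(find_regressions(v1, v2))
```

Here only precision is flagged: accuracy improved, and the drop in recall is within the tolerance.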

5. Production Ready

  • ✅ CPU-only compatibility (no GPU needed)
  • ✅ Enterprise monitoring
  • ✅ Cost optimization (keyword filtering before LLM)
  • ✅ Error handling and retries
  • ✅ Comprehensive logging
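As an illustration of the error handling and retries item, here is a generic retry helper with exponential backoff. It is a sketch under assumptions, not the project's code: `call_with_retries` is a hypothetical name, and the `sleep` parameter is injectable so the example runs without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying after any exception with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # delays double on each retry
    raise AssertionError("unreachable")


state = {"calls": 0}


def flaky() -> str:
    """Fails twice, then succeeds; stands in for a transient endpoint error."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


delays: list[float] = []
result = call_with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```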

Deployment Architecture

┌────────────────────────────────────────────────────────────┐
│                    Databricks Workspace                    │
│                                                            │
│  ┌──────────────────┐      ┌────────────────────┐          │
│  │  Unity Catalog   │◄─────│  MLflow Tracking   │          │
│  │  - Policy Class. │      │  - Experiments     │          │
│  │  - Sentiment An. │      │  - Traces          │          │
│  │  - Advocacy Gen. │      │  - Metrics         │          │
│  └────────┬─────────┘      └────────────────────┘          │
│           │                                                │
│           ▼                                                │
│  ┌───────────────────────────────────────────┐             │
│  │          Model Serving Endpoints          │             │
│  │  ┌────────────┐  ┌─────────────────────┐  │             │
│  │  │ Classifier │  │ Sentiment Analyzer  │  │             │
│  │  │ (Small)    │  │ (Small)             │  │             │
│  │  │ Scale-to-0 │  │ Scale-to-0          │  │             │
│  │  └────────────┘  └─────────────────────┘  │             │
│  └─────────────┬─────────────────────────────┘             │
│                │                                           │
└────────────────┼───────────────────────────────────────────┘
                 │
                 ▼
         ┌────────────────┐
         │  External API  │
         │  FastAPI App   │
         └────────────────┘

Usage Examples

Register Agent to Unity Catalog

from agents.mlflow_classifier import PolicyClassifierAgent
from databricks.deployment import AgentDeploymentManager

manager = AgentDeploymentManager()

version = manager.register_agent(
    agent_class=PolicyClassifierAgent,
    agent_name="policy_classifier",
    description="Classifies documents for oral health topics",
    tags={"team": "advocacy"}
)

Deploy to Model Serving

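# Reuses the `manager` created in the registration example above.
endpoint_url = manager.deploy_agent(
    agent_name="policy_classifier",
    endpoint_name="policy-classifier-prod",
    workload_size="Small",   # smallest serving size; scale up if latency requires it
    scale_to_zero=True       # let the endpoint shut down when idle
)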

Evaluate Agent

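from databricks.evaluation import AgentEvaluator

evaluator = AgentEvaluator("policy_classifier")

# test_docs and labels must be supplied by the caller: the documents to
# classify and their expected labels (assumed here to be parallel lists).
metrics = evaluator.evaluate_classifier(
    model_uri="models:/main.agents.policy_classifier/1",
    test_documents=test_docs,
    ground_truth=labels
)

print(f"Accuracy: {metrics.accuracy:.2%}")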
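The `model_uri` above combines the MLflow `models:/` scheme with a three-level Unity Catalog name (catalog, schema, model) and a version. The sketch below splits such a URI into its parts; `parse_model_uri` and `ModelRef` are hypothetical names, and alias-style URIs are deliberately not handled.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    catalog: str
    schema: str
    model: str
    version: str


def parse_model_uri(uri: str) -> ModelRef:
    """Split models:/<catalog>.<schema>.<model>/<version> into its components."""
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}")
    name, _, version = uri[len(prefix):].partition("/")
    parts = name.split(".")
    if len(parts) != 3 or not all(parts) or not version:
        raise ValueError("expected models:/<catalog>.<schema>.<model>/<version>")
    return ModelRef(parts[0], parts[1], parts[2], version)


ref = parse_model_uri("models:/main.agents.policy_classifier/1")
print(ref)
```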

Invoke via API

curl -X POST https://workspace.cloud.databricks.com/serving-endpoints/policy-classifier-prod/invocations \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "dataframe_records": [{
      "document_id": "doc_001",
      "title": "Meeting",
      "content": "Fluoride discussion..."
    }]
  }'
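The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl call above: `build_request` is a hypothetical helper, the host is the placeholder from the curl example, and the request is constructed but not sent.

```python
import json
import os
import urllib.request


def build_request(host: str, endpoint: str, token: str, records: list[dict]) -> urllib.request.Request:
    """Build a POST request for a Model Serving endpoint's invocations URL."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_request(
    host="https://workspace.cloud.databricks.com",  # placeholder workspace URL
    endpoint="policy-classifier-prod",
    token=os.environ.get("DATABRICKS_TOKEN", ""),
    records=[{"document_id": "doc_001", "title": "Meeting", "content": "Fluoride discussion..."}],
)
print(request.get_method(), request.full_url)
# To send it: pass `request` to urllib.request.urlopen() and json.load() the response.
```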

Benefits

Before (Custom Implementation)

  • ❌ Manual deployment and versioning
  • ❌ No built-in observability
  • ❌ Limited scalability
  • ❌ No governance or lineage
  • ❌ Manual evaluation pipelines
  • ❌ Complex monitoring setup

After (Databricks Agent Bricks)

  • ✅ One-command deployment
  • ✅ Automatic tracing and logging
  • ✅ Auto-scaling Model Serving
  • ✅ Unity Catalog governance
  • ✅ Built-in evaluation framework
  • ✅ Enterprise monitoring included

Cost Optimization

The refactored system includes several cost optimizations:

  1. Hybrid Classification: Uses keyword matching before expensive LLM calls
  2. Scale-to-Zero: Endpoints scale down when idle
  3. Batch Processing: Supports bulk document classification
  4. Caching: Frequently requested results can be cached
  5. Small Workloads: Starts with small endpoints, scales on demand
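The batch processing and caching items can be combined in a few lines. The sketch below uses `functools.lru_cache` as one possible in-memory cache; the function names are hypothetical, and `expensive_classify` is a stand-in for the real classifier.

```python
from functools import lru_cache


def expensive_classify(content: str) -> bool:
    """Stand-in for the real classifier (keyword check plus LLM call)."""
    return "fluoride" in content.lower()


@lru_cache(maxsize=1024)
def classify_cached(content: str) -> bool:
    """Classify a document, reusing the result for identical content."""
    return expensive_classify(content)


def classify_batch(documents: list[str]) -> list[bool]:
    """Classify many documents in one call; repeats are served from the cache."""
    return [classify_cached(doc) for doc in documents]


results = classify_batch(["Fluoride plan", "Road repairs", "Fluoride plan"])
info = classify_cached.cache_info()
print(results, f"hits={info.hits} misses={info.misses}")
```

An in-process cache like this is lost on restart and is not shared between endpoint replicas, so a shared store may be a better fit in production.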

Estimated cost: roughly $0.10 to $0.50 per hour while an endpoint is active, and much less overall when scale-to-zero shuts down idle endpoints.
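To turn the hourly estimate into a monthly figure, multiply by the hours the endpoint is actually running. The sketch below assumes an idle scale-to-zero endpoint costs nothing, which is a simplification; `monthly_cost` and the 10% activity figure are illustrative.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(hourly_rate: float, active_fraction: float = 1.0) -> float:
    """Estimate monthly cost for an endpoint that is active a fraction of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return hourly_rate * HOURS_PER_MONTH * active_fraction


for rate in (0.10, 0.50):
    always_on = monthly_cost(rate)
    mostly_idle = monthly_cost(rate, active_fraction=0.10)  # assumed 10% activity
    print(f"${rate:.2f}/hour: ${always_on:.2f}/month always on, ${mostly_idle:.2f}/month at 10% active")
```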

Next Steps

  1. Deploy to Databricks:

    python -m databricks.deployment
    
  2. Run Evaluation:

    python -m databricks.evaluation
    
  3. Test in Notebook: Open databricks/notebooks/01_agent_bricks_quickstart.py

  4. Monitor Production: Set up alerts in Databricks UI

  5. Add Feedback Loop: Collect corrections and retrain

Migration Path

For existing users:

  1. ✅ Standalone mode still works - No breaking changes to existing code
  2. 🔄 Gradual migration - Can use both modes simultaneously
  3. ☁️ Databricks optional - Only needed for production scale
  4. 🎯 Choose your path:
    • Small projects: Use standalone mode
    • Production/Enterprise: Use Databricks Agent Bricks

Questions?

  • See databricks/README.md for detailed docs
  • Run databricks/notebooks/01_agent_bricks_quickstart.py for hands-on tutorial
  • Check examples in databricks/deployment.py and databricks/evaluation.py

Summary

This refactoring transforms the Oral Health Policy Pulse from a standalone multi-agent system into a production-ready, enterprise-grade application that leverages Databricks' full stack for AI governance, deployment, and monitoring. The system now has:

  • 🏒 Enterprise deployment via Model Serving
  • πŸ“Š Automatic observability with MLflow tracing
  • πŸ” Data governance through Unity Catalog
  • πŸ“ˆ Quality assurance with evaluation framework
  • πŸ’° Cost optimization with scale-to-zero and hybrid approach
  • πŸš€ Production readiness out of the box

All of this maintains backward compatibility with standalone mode. 🎉