---
sidebar_position: 2
---
# Databricks Agent Bricks Refactoring - Summary
## What Was Done
This system has been refactored to support **Databricks Agent Bricks** (Mosaic AI Agent Framework), enabling production-ready deployment on Databricks with full governance, monitoring, and scalability.
## New Files Created
### 1. **Core Agent Infrastructure**
- `agents/mlflow_base.py` - MLflow Pyfunc base classes for agents
  - `MLflowAgentBase`: Base class with tracing, Model Serving compatibility
  - `MLflowChainAgent`: LangChain integration with automatic logging
  - Automatic signature inference
  - Unity Catalog registration methods
  - Model Serving deployment helpers
- `agents/mlflow_classifier.py` - Production classifier agent
  - Hybrid keyword + LLM classification
  - MLflow tracing for all calls
  - Unity Catalog ready
  - Can be deployed to Model Serving
  - Includes registration script
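
The hybrid keyword + LLM classification listed for `agents/mlflow_classifier.py` can be pictured as a two-stage filter. The following is a minimal sketch, not the repository's implementation: the `KEYWORDS` list, the `classify` function, and the `llm_classify` callable are hypothetical names, and the LLM is replaced by a stub so the example runs without credentials.

```python
from typing import Callable

# Hypothetical keyword list; the real agent's vocabulary is not shown in this summary.
KEYWORDS = ("fluoride", "dental", "oral health")


def classify(text: str, llm_classify: Callable[[str], bool]) -> dict:
    """Return a label, calling the LLM only when a keyword matches."""
    lowered = text.lower()
    hits = [kw for kw in KEYWORDS if kw in lowered]
    if not hits:
        # Cheap path: no keyword matched, so the LLM call is skipped entirely.
        return {"relevant": False, "stage": "keyword", "keywords": []}
    # Expensive path: let the LLM confirm or reject the keyword match.
    return {"relevant": llm_classify(text), "stage": "llm", "keywords": hits}


def stub_llm(text: str) -> bool:
    """Stand-in for a real LLM call; treats any mention of 'fluoride' as relevant."""
    return "fluoride" in text.lower()


result = classify("Council discussed water fluoride levels.", stub_llm)
skipped = classify("Road resurfacing budget approved.", stub_llm)
```

In this sketch only documents that pass the keyword stage reach the LLM, which is where the cost saving comes from.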
### 2. **Deployment & Operations**
- `databricks/deployment.py` - Deployment automation
  - `AgentDeploymentManager` class
  - Register agents to Unity Catalog
  - Deploy to Model Serving endpoints
  - Multi-agent endpoints with traffic splitting
  - Endpoint testing and monitoring
  - Auto-scaling configuration
- `databricks/evaluation.py` - Quality assurance
  - `AgentEvaluator` class
  - Automated evaluation pipelines
  - A/B testing between versions
  - Metrics: accuracy, precision, recall, F1, latency
  - Confusion matrix generation
  - Feedback loop integration with Delta Lake
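
To make the quality metrics listed above concrete, here is a small self-contained sketch of how accuracy, precision, recall, F1, and a confusion matrix relate for a binary classifier. It does not reproduce `AgentEvaluator`; the `binary_metrics` function and the sample labels are assumptions made for illustration.

```python
from collections import Counter


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Compute confusion-matrix counts and derived metrics for labels 0/1."""
    counts = Counter(zip(y_true, y_pred))
    tp, fp = counts[(1, 1)], counts[(0, 1)]
    fn, tn = counts[(1, 0)], counts[(0, 0)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, cols: predicted
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = relevant to oral health policy, 0 = not relevant.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```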
### 3. **Interactive Development**
- `databricks/notebooks/01_agent_bricks_quickstart.py` - Databricks notebook
  - Step-by-step deployment guide
  - Local testing examples
  - Unity Catalog registration
  - Model Serving deployment
  - Evaluation examples
  - Delta Lake queries
  - Monitoring and observability
- `databricks/README.md` - Comprehensive documentation
  - Architecture diagrams
  - Deployment workflows
  - API usage examples
  - Cost considerations
  - Troubleshooting guide
  - Best practices
### 4. **Dependencies**
- Updated `requirements-cpu.txt` with:
  - `mlflow>=2.10.0` - MLflow tracking and serving
  - `databricks-agents>=0.1.0` - Agent Framework
  - `databricks-vectorsearch>=0.22.0` - Vector search
  - `langgraph>=0.0.20` - Stateful agent graphs
  - `databricks-sdk>=0.18.0` - Databricks API client
### 5. **Updated Existing Files**
- `README.md` - Added Databricks Agent Bricks section
- `install.sh` - Detects and uses `requirements-cpu.txt`
## Key Features Added
### 1. **MLflow Integration**
- ✅ Automatic tracing of all agent calls
- ✅ LLM request/response logging
- ✅ Metrics tracking (latency, tokens, cost)
- ✅ Experiment tracking and versioning
- ✅ Model registry integration
### 2. **Unity Catalog Governance**
- ✅ Centralized model registration
- ✅ Permissions and access control
- ✅ Data lineage tracking
- ✅ Version management
- ✅ Tag-based organization
### 3. **Model Serving**
- ✅ REST API endpoints
- ✅ Auto-scaling (scale-to-zero capable)
- ✅ A/B testing with traffic splitting
- ✅ Multi-agent pipelines
- ✅ Monitoring and alerting
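
A/B testing with traffic splitting comes down to an endpoint configuration that lists several served versions and assigns each a share of requests. The sketch below only builds such a configuration as a plain dictionary. The `build_ab_config` helper is hypothetical, and the field names are meant to mirror the Databricks serving endpoints REST API, so they should be checked against the current API reference before use.

```python
import json


def build_ab_config(model: str, versions: dict[str, int]) -> dict:
    """Build an endpoint config that splits traffic across model versions.

    `versions` maps a model version to its traffic percentage.
    """
    if sum(versions.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    short_name = model.split(".")[-1]
    entities, routes = [], []
    for version, percent in versions.items():
        served_name = f"{short_name}-{version}"
        entities.append({
            "name": served_name,
            "entity_name": model,
            "entity_version": version,
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        })
        routes.append({"served_model_name": served_name, "traffic_percentage": percent})
    return {"served_entities": entities, "traffic_config": {"routes": routes}}


# Send 90% of requests to version 1 and 10% to a candidate version 2.
config = build_ab_config("main.agents.policy_classifier", {"1": 90, "2": 10})
print(json.dumps(config["traffic_config"], indent=2))
```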
### 4. **Evaluation Framework**
- ✅ Automated quality metrics
- ✅ Regression detection
- ✅ Version comparison
- ✅ Confusion matrices
- ✅ Feedback loop from production
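
Regression detection and version comparison can be as simple as checking whether a candidate version scores noticeably worse than the current one. This is an illustrative sketch only: `find_regressions`, the tolerance value, and the scores are assumptions, and it applies to higher-is-better metrics (not latency).

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.01) -> dict:
    """Return metrics where the candidate is worse than baseline by more than `tolerance`."""
    regressions = {}
    for name, base_value in baseline.items():
        drop = base_value - candidate.get(name, 0.0)
        if drop > tolerance:
            regressions[name] = round(drop, 4)
    return regressions


# Hypothetical scores for two registered versions of the same agent.
v1 = {"accuracy": 0.91, "precision": 0.88, "recall": 0.90, "f1": 0.89}
v2 = {"accuracy": 0.92, "precision": 0.93, "recall": 0.85, "f1": 0.89}

regressions = find_regressions(v1, v2)
```

A non-empty result, such as the recall drop in this example, would be a reason to hold back the candidate version.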
### 5. **Production Ready**
- ✅ CPU-only compatibility (no GPU needed)
- ✅ Enterprise monitoring
- ✅ Cost optimization (keyword filtering before LLM)
- ✅ Error handling and retries
- ✅ Comprehensive logging
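
Error handling with retries is typically implemented as a wrapper around the call that can fail. The sketch below shows one common variant, exponential backoff. The `call_with_retries` helper, its defaults, and the `flaky` demo function are hypothetical and not taken from the refactored code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on any exception with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo: a flaky call that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record the delays instead of sleeping so the demo finishes instantly.
delays: list[float] = []
outcome = call_with_retries(flaky, sleep=delays.append)
```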
## Deployment Architecture
```
┌───────────────────────────────────────────────────────────┐
│                   Databricks Workspace                    │
│                                                           │
│  ┌──────────────────┐        ┌────────────────────┐       │
│  │ Unity Catalog    │◄───────│ MLflow Tracking    │       │
│  │ - Policy Class.  │        │ - Experiments      │       │
│  │ - Sentiment An.  │        │ - Traces           │       │
│  │ - Advocacy Gen.  │        │ - Metrics          │       │
│  └────────┬─────────┘        └────────────────────┘       │
│           │                                               │
│           ▼                                               │
│  ┌──────────────────────────────────────────┐             │
│  │         Model Serving Endpoints          │             │
│  │  ┌────────────┐  ┌─────────────────────┐ │             │
│  │  │ Classifier │  │ Sentiment Analyzer  │ │             │
│  │  │ (Small)    │  │ (Small)             │ │             │
│  │  │ Scale-to-0 │  │ Scale-to-0          │ │             │
│  │  └────────────┘  └─────────────────────┘ │             │
│  └─────────────┬────────────────────────────┘             │
│                │                                          │
└────────────────┼──────────────────────────────────────────┘
                 │
                 ▼
         ┌────────────────┐
         │  External API  │
         │  FastAPI App   │
         └────────────────┘
```
## Usage Examples
### Register Agent to Unity Catalog
```python
from agents.mlflow_classifier import PolicyClassifierAgent
from databricks.deployment import AgentDeploymentManager

# Register the agent class as a new model version in Unity Catalog
manager = AgentDeploymentManager()
version = manager.register_agent(
    agent_class=PolicyClassifierAgent,
    agent_name="policy_classifier",
    description="Classifies documents for oral health topics",
    tags={"team": "advocacy"},
)
```
### Deploy to Model Serving
```python
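# Reuses the `manager` created in the registration example
endpoint_url = manager.deploy_agent(
    agent_name="policy_classifier",
    endpoint_name="policy-classifier-prod",
    workload_size="Small",
    scale_to_zero=True,  # let the endpoint scale down when idle
)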
```
### Evaluate Agent
```python
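from databricks.evaluation import AgentEvaluator

# `test_docs` and `labels` are placeholders for your own test documents
# and their expected labels; define them before running this snippet
evaluator = AgentEvaluator("policy_classifier")
metrics = evaluator.evaluate_classifier(
    model_uri="models:/main.agents.policy_classifier/1",
    test_documents=test_docs,
    ground_truth=labels,
)
print(f"Accuracy: {metrics.accuracy:.2%}")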
```
### Invoke via API
```bash
curl -X POST https://workspace.cloud.databricks.com/serving-endpoints/policy-classifier-prod/invocations \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "dataframe_records": [{
      "document_id": "doc_001",
      "title": "Meeting",
      "content": "Fluoride discussion..."
    }]
  }'
```
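
For callers that prefer Python over curl, the same request can be assembled with the standard library. This is a sketch that mirrors the curl example above: the host and endpoint name are the same placeholders, `build_request` is a hypothetical helper, and the request is built but not sent so the snippet runs without a workspace.

```python
import json
import os
import urllib.request

# Same placeholder host and endpoint name as the curl example.
URL = ("https://workspace.cloud.databricks.com"
       "/serving-endpoints/policy-classifier-prod/invocations")


def build_request(records: list[dict], token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the serving endpoint."""
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_request(
    [{"document_id": "doc_001", "title": "Meeting", "content": "Fluoride discussion..."}],
    token=os.environ.get("DATABRICKS_TOKEN", "<token>"),
)

# To send it against a real workspace, uncomment:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```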
## Benefits
### Before (Custom Implementation)
- ❌ Manual deployment and versioning
- ❌ No built-in observability
- ❌ Limited scalability
- ❌ No governance or lineage
- ❌ Manual evaluation pipelines
- ❌ Complex monitoring setup
### After (Databricks Agent Bricks)
- ✅ One-command deployment
- ✅ Automatic tracing and logging
- ✅ Auto-scaling Model Serving
- ✅ Unity Catalog governance
- ✅ Built-in evaluation framework
- ✅ Enterprise monitoring included
## Cost Optimization
The refactored system includes several cost optimizations:
1. **Hybrid Classification**: Uses keyword matching before expensive LLM calls
2. **Scale-to-Zero**: Endpoints scale down when idle
3. **Batch Processing**: Supports bulk document classification
4. **Caching**: Frequently requested results can be cached
5. **Small Workloads**: Starts with small endpoints, scales on demand
Estimated cost: **~$0.10-0.50/hour for active endpoints** (much less with scale-to-zero)
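
The caching idea in item 4 can be sketched as a thin wrapper that remembers results by content hash. This is a minimal in-memory illustration under stated assumptions: `CachedClassifier` and `expensive_classify` are hypothetical names, and a production setup would more likely use a shared store with expiry.

```python
import hashlib
from typing import Callable


class CachedClassifier:
    """Wrap a classify function and reuse results for identical content."""

    def __init__(self, classify: Callable[[str], dict]):
        self._classify = classify
        self._cache: dict[str, dict] = {}
        self.misses = 0

    def __call__(self, content: str) -> dict:
        # Key on a content hash so large documents are not stored as keys.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._classify(content)
        return self._cache[key]


def expensive_classify(content: str) -> dict:
    """Hypothetical stand-in for an expensive endpoint or LLM call."""
    return {"relevant": "fluoride" in content.lower()}


cached = CachedClassifier(expensive_classify)
first = cached("Fluoride discussion...")
second = cached("Fluoride discussion...")  # served from the cache
```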
## Next Steps
1. **Deploy to Databricks**:
   ```bash
   python -m databricks.deployment
   ```
2. **Run Evaluation**:
   ```bash
   python -m databricks.evaluation
   ```
3. **Test in Notebook**: Open `databricks/notebooks/01_agent_bricks_quickstart.py`
4. **Monitor Production**: Set up alerts in Databricks UI
5. **Add Feedback Loop**: Collect corrections and retrain
## Migration Path
For existing users:
1. ✅ **Standalone mode still works** - No breaking changes to existing code
2. 🔄 **Gradual migration** - Can use both modes simultaneously
3. ☁️ **Databricks optional** - Only needed for production scale
4. 🎯 **Choose your path**:
   - Small projects: Use standalone mode
   - Production/Enterprise: Use Databricks Agent Bricks
## Questions?
- See [`databricks/README.md`](databricks/README.md) for detailed docs
- Run `databricks/notebooks/01_agent_bricks_quickstart.py` for hands-on tutorial
- Check examples in `databricks/deployment.py` and `databricks/evaluation.py`
## Summary
This refactoring transforms the Oral Health Policy Pulse from a standalone multi-agent system into a **production-ready, enterprise-grade application** that leverages Databricks' full stack for AI governance, deployment, and monitoring. The system now has:
- 🏒 **Enterprise deployment** via Model Serving
- πŸ“Š **Automatic observability** with MLflow tracing
- πŸ” **Data governance** through Unity Catalog
- πŸ“ˆ **Quality assurance** with evaluation framework
- πŸ’° **Cost optimization** with scale-to-zero and hybrid approach
- πŸš€ **Production readiness** out of the box
All while maintaining backward compatibility with the standalone mode! πŸŽ‰