Version Control Strategy - BDR Agent Factory
Overview
This document defines the versioning strategy for AI capabilities, models, and system components, with the goals of backward compatibility, traceability, and controlled rollouts.
Semantic Versioning
Version Format: MAJOR.MINOR.PATCH
v1.2.3
 │ │ │
 │ │ └─ PATCH: Bug fixes, minor improvements (backward compatible)
 │ └─── MINOR: New features, enhancements (backward compatible)
 └───── MAJOR: Breaking changes (not backward compatible)
Version Increment Rules
MAJOR Version (X.0.0)
Increment when:
- Breaking API changes
- Incompatible capability interface changes
- Major model architecture changes
- Removal of deprecated features
- Significant governance requirement changes
Example: 1.5.2 → 2.0.0
MINOR Version (x.Y.0)
Increment when:
- New capabilities added
- New features in existing capabilities
- Model performance improvements
- New compliance framework support
- Backward-compatible API enhancements
Example: 1.5.2 → 1.6.0
PATCH Version (x.y.Z)
Increment when:
- Bug fixes
- Security patches
- Performance optimizations
- Documentation updates
- Minor model fine-tuning
Example: 1.5.2 → 1.5.3
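The increment rules above can be sketched as a small helper. This is a minimal illustration, not part of the BDR Agent Factory codebase; the function name `bump_version` and the `change` labels are assumptions chosen for this example.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change.

    Hypothetical helper: the name and the change labels are illustrative.
    """
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change == "major":
        # Breaking change: bump MAJOR, reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible feature: bump MINOR, reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible fix: bump PATCH only
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Called with `"1.5.2"`, the three change types reproduce the three examples given in the rules above.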
Capability Versioning
Capability Version Structure
id: cap_text_classification
name: Text Classification
version: 2.1.0
model_version: 2.1.0-bert-large
api_version: v1
status: production
released_at: "2026-01-03T00:00:00Z"
previous_versions:
  - version: 2.0.0
    status: deprecated
    deprecated_at: "2025-12-01T00:00:00Z"
    sunset_at: "2026-06-01T00:00:00Z"
  - version: 1.5.0
    status: retired
    retired_at: "2025-11-01T00:00:00Z"
Version Lifecycle
┌──────────────────────────────────────────────────────────────────┐
│                        Version Lifecycle                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Development → Beta → Production → Deprecated → Retired          │
│       ↓         ↓         ↓            ↓           ↓             │
│   Internal   Limited   General      Sunset      Removed          │
│    Testing   Access   Available     Warning                      │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
Status Definitions
Development (dev)
- Internal testing only
- Unstable, subject to change
- No SLA guarantees
- Duration: Variable
Beta (beta)
- Limited external access
- Feature-complete but may have bugs
- Limited SLA (95% uptime)
- Duration: 2-4 weeks
Production (production)
- Generally available
- Full SLA guarantees (99.9% uptime)
- Fully supported
- Duration: Until deprecated
Deprecated (deprecated)
- Still available but not recommended
- Security updates only
- Sunset date announced
- Duration: 6 months minimum
Retired (retired)
- No longer available
- Removed from production
- Historical reference only
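The lifecycle can be enforced with a small transition table. The sketch below assumes a strictly linear progression, as drawn in the lifecycle diagram; the document does not say whether a version may move backward (for example from beta to dev), so treat the table and the `can_transition` helper as illustrative assumptions.

```python
# Allowed status transitions, assuming the strictly linear lifecycle
# dev -> beta -> production -> deprecated -> retired (an assumption).
ALLOWED_TRANSITIONS = {
    "dev": {"beta"},
    "beta": {"production"},
    "production": {"deprecated"},
    "deprecated": {"retired"},
    "retired": set(),  # terminal state
}


def can_transition(current: str, target: str) -> bool:
    """Return True if a version may move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

A status update routine could call `can_transition` before writing the new status, which would reject a jump such as production to retired that skips the deprecation period.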
Deprecation Policy
from datetime import datetime, timedelta, timezone


class DeprecationPolicy:
    # Minimum notice periods, in days
    MAJOR_VERSION_NOTICE = 180  # 6 months
    MINOR_VERSION_NOTICE = 90   # 3 months
    PATCH_VERSION_NOTICE = 30   # 1 month

    @staticmethod
    def deprecate_version(capability_id, version, reason):
        """
        Deprecate a capability version

        Args:
            capability_id: Capability identifier
            version: Version to deprecate (MAJOR.MINOR.PATCH)
            reason: Reason for deprecation
        """
        # Calculate sunset date based on version type: X.0.0 is a major
        # release, x.Y.0 a minor release, anything else a patch release
        _major, minor, patch = (int(part) for part in version.split('.'))
        if minor == 0 and patch == 0:
            sunset_days = DeprecationPolicy.MAJOR_VERSION_NOTICE
        elif patch == 0:
            sunset_days = DeprecationPolicy.MINOR_VERSION_NOTICE
        else:
            sunset_days = DeprecationPolicy.PATCH_VERSION_NOTICE

        # Take a single UTC timestamp so deprecated_at and sunset_at agree
        deprecated_at = datetime.now(timezone.utc)
        sunset_date = deprecated_at + timedelta(days=sunset_days)

        # The three helpers called below are assumed to be defined elsewhere

        # Update capability status
        update_capability_status(
            capability_id=capability_id,
            version=version,
            status='deprecated',
            deprecated_at=deprecated_at,
            sunset_at=sunset_date,
            deprecation_reason=reason
        )

        # Notify users
        notify_deprecation(
            capability_id=capability_id,
            version=version,
            sunset_date=sunset_date,
            reason=reason
        )

        # Add deprecation warning to API responses
        add_deprecation_header(
            capability_id=capability_id,
            version=version,
            sunset_date=sunset_date
        )
Deprecation Headers
HTTP/1.1 200 OK
Deprecation: true
Sunset: Mon, 01 Jun 2026 00:00:00 GMT
Link: <https://docs.bdragentfactory.com/migration/v2>; rel="deprecation"
Warning: 299 - "This capability version is deprecated and will be retired on 2026-06-01"
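One way to produce these headers is sketched below. The function name `deprecation_headers` and its parameters are assumptions for illustration; only `email.utils.format_datetime` from the Python standard library is relied on, to render the `Sunset` value in HTTP date format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset_at: datetime, migration_url: str) -> dict[str, str]:
    """Build the deprecation response headers (hypothetical helper).

    `sunset_at` is expected to be a timezone-aware datetime.
    """
    sunset_utc = sunset_at.astimezone(timezone.utc)
    return {
        "Deprecation": "true",
        # usegmt=True renders the zone as "GMT", as HTTP dates require
        "Sunset": format_datetime(sunset_utc, usegmt=True),
        "Link": f'<{migration_url}>; rel="deprecation"',
        "Warning": (
            '299 - "This capability version is deprecated and will be '
            f'retired on {sunset_utc.date().isoformat()}"'
        ),
    }
```

Deriving the `Sunset` string from a datetime, rather than writing it by hand, keeps the weekday and the date consistent with each other.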
Model Versioning
Model Version Format
version: 2.1.0-bert-large-20260103
         │ │ │     │         │
         │ │ │     │         └─ Training date (YYYYMMDD)
         │ │ │     └─────────── Model architecture
         │ │ └───────────────── Patch version
         │ └─────────────────── Minor version
         └───────────────────── Major version
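A model version string in this format can be split into its parts with a regular expression. The following is a minimal sketch; `ModelVersion` and `parse_model_version` are names introduced here for illustration only.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class ModelVersion(NamedTuple):
    major: int
    minor: int
    patch: int
    architecture: str
    training_date: date


# MAJOR.MINOR.PATCH-<architecture>-<YYYYMMDD>; the architecture may contain hyphens
_MODEL_VERSION = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)-(?P<arch>.+)-(?P<date>\d{8})$"
)


def parse_model_version(text: str) -> ModelVersion:
    """Parse a model version string, raising ValueError if it is malformed."""
    match = _MODEL_VERSION.match(text)
    if match is None:
        raise ValueError(f"not a valid model version: {text!r}")
    return ModelVersion(
        major=int(match["major"]),
        minor=int(match["minor"]),
        patch=int(match["patch"]),
        architecture=match["arch"],
        training_date=datetime.strptime(match["date"], "%Y%m%d").date(),
    )
```

Because the result is a tuple whose first three fields are integers, parsed versions compare numerically, which avoids the string-ordering pitfall where "10.0.0" sorts before "9.0.0".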
Model Registry
import re
from datetime import datetime


class ModelRegistry:
    def __init__(self):
        self.models = {}

    def register_model(self, capability_id, version, model_info):
        """
        Register a new model version

        Args:
            capability_id: Capability identifier
            version: Model version
            model_info: Model metadata
        """
        model_record = {
            'capability_id': capability_id,
            'version': version,
            'architecture': model_info['architecture'],
            'training_date': model_info['training_date'],
            'training_data_size': model_info['training_data_size'],
            'performance_metrics': model_info['metrics'],
            'model_path': model_info['path'],
            'checksum': model_info['checksum'],
            'status': 'registered',
            'registered_at': datetime.now()
        }
        self.models[f"{capability_id}:{version}"] = model_record
        return model_record

    @staticmethod
    def _version_key(record):
        # Compare MAJOR.MINOR.PATCH numerically so that 10.0.0 sorts after
        # 9.0.0; a suffix such as "-bert-large-20260103" is ignored
        match = re.match(r'(\d+)\.(\d+)\.(\d+)', record['version'])
        return tuple(int(part) for part in match.groups()) if match else (0, 0, 0)

    def get_model(self, capability_id, version='latest'):
        """
        Retrieve model by version

        Args:
            capability_id: Capability identifier
            version: Model version or 'latest'
        """
        if version == 'latest':
            # Get latest production version
            versions = [
                v for k, v in self.models.items()
                if k.startswith(f"{capability_id}:") and v['status'] == 'production'
            ]
            # None if no version of this capability is in production
            return max(versions, key=self._version_key, default=None)
        return self.models.get(f"{capability_id}:{version}")
Model Performance Tracking
from datetime import datetime
from statistics import mean


class ModelPerformanceTracker:
    METRIC_NAMES = (
        'accuracy', 'precision', 'recall',
        'f1_score', 'latency_ms', 'throughput_rps',
    )

    def __init__(self):
        self.metrics = {}

    def track_performance(self, capability_id, version, metrics):
        """
        Track model performance metrics

        Args:
            capability_id: Capability identifier
            version: Model version
            metrics: Performance metrics
        """
        key = f"{capability_id}:{version}"
        sample = {name: metrics.get(name) for name in self.METRIC_NAMES}
        sample['timestamp'] = datetime.now()
        self.metrics.setdefault(key, []).append(sample)

    def get_average_metrics(self, capability_id, version):
        """Average each metric over all recorded samples, skipping missing values"""
        samples = self.metrics.get(f"{capability_id}:{version}", [])
        averages = {}
        for name in self.METRIC_NAMES:
            values = [s[name] for s in samples if s[name] is not None]
            if values:
                averages[name] = mean(values)
        return averages

    def compare_versions(self, capability_id, version1, version2):
        """
        Compare performance between two versions

        Args:
            capability_id: Capability identifier
            version1: First version (baseline)
            version2: Second version
        """
        metrics1 = self.get_average_metrics(capability_id, version1)
        metrics2 = self.get_average_metrics(capability_id, version2)
        comparison = {}
        for metric, baseline in metrics1.items():
            if metric in metrics2:
                diff = metrics2[metric] - baseline
                # Percent change is undefined for a zero baseline
                pct_change = (diff / baseline) * 100 if baseline != 0 else None
                comparison[metric] = {
                    'version1': baseline,
                    'version2': metrics2[metric],
                    'difference': diff,
                    'percent_change': pct_change
                }
        return comparison
Change Management
Change Request Process
┌─────────────────────────────────────────────────────────────────┐
│                     Change Request Workflow                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Submit Change Request                                       │
│        ↓                                                        │
│  2. Technical Review                                            │
│        ↓                                                        │
│  3. Impact Assessment                                           │
│        ↓                                                        │
│  4. Governance Approval                                         │
│        ↓                                                        │
│  5. Implementation                                              │
│        ↓                                                        │
│  6. Testing & Validation                                        │
│        ↓                                                        │
│  7. Deployment                                                  │
│        ↓                                                        │
│  8. Post-Deployment Verification                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Change Request Template
change_request:
  id: CR-2026-001
  title: "Upgrade Text Classification to BERT-Large"
  type: minor_version  # major_version, minor_version, patch
  capability_id: cap_text_classification
  current_version: 2.0.0
  proposed_version: 2.1.0
  description: |
    Upgrade text classification model from BERT-Base to BERT-Large
    to improve accuracy on complex insurance claim descriptions.
  justification: |
    Current model accuracy is 92%. BERT-Large achieves 95% accuracy
    in testing, reducing misclassification rate by 37.5%.
  impact_assessment:
    breaking_changes: false
    backward_compatible: true
    affected_systems:
      - ClaimsGPT
      - CustomerServiceAgent
    estimated_downtime: 0 minutes
    rollback_plan: "Revert to v2.0.0 via feature flag"
  testing:
    unit_tests: passed
    integration_tests: passed
    performance_tests: passed
    compliance_tests: passed
  approvals:
    technical_lead: approved
    security_team: approved
    compliance_team: approved
    product_owner: approved
  deployment:
    strategy: canary  # blue_green, rolling, canary
    rollout_percentage: 10%
    monitoring_period: 24 hours
    success_criteria:
      - error_rate < 0.1%
      - p95_latency < 300ms
      - accuracy > 94%
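The success criteria in the template are plain strings, so a deployment gate needs to parse them before comparing against observed metrics. The sketch below is one possible way to do that; `criterion_met` and `all_criteria_met` are hypothetical helpers, and observed values are assumed to be expressed in the same unit as each criterion (percent for `error_rate` and `accuracy`, milliseconds for `p95_latency`).

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# "<metric> <op> <number>[unit]", e.g. "error_rate < 0.1%" or "p95_latency < 300ms"
_CRITERION = re.compile(r"^\s*(\w+)\s*(<=|>=|<|>)\s*([\d.]+)\s*(%|ms)?\s*$")


def criterion_met(criterion: str, observed: dict[str, float]) -> bool:
    """Check one criterion string against observed metric values."""
    match = _CRITERION.match(criterion)
    if match is None:
        raise ValueError(f"cannot parse criterion: {criterion!r}")
    metric, op, threshold, _unit = match.groups()
    # The unit is only validated by the pattern; values are compared as-is
    return _OPS[op](observed[metric], float(threshold))


def all_criteria_met(criteria: list[str], observed: dict[str, float]) -> bool:
    """Return True only if every criterion holds."""
    return all(criterion_met(c, observed) for c in criteria)
```

A canary controller could call `all_criteria_met` at the end of the monitoring period and proceed with the rollout only when it returns `True`.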
Rollback Procedures
Automated Rollback
import logging

logger = logging.getLogger(__name__)


class RollbackManager:
    # Helper methods such as get_current_metrics, get_baseline_accuracy,
    # update_version_routing, create_rollback_incident, and notify_rollback
    # are assumed to be implemented elsewhere.

    def __init__(self):
        self.rollback_triggers = {
            'error_rate': 0.05,     # 5% error rate
            'latency_p95': 500,     # 500ms P95 latency
            'accuracy_drop': 0.02,  # 2 percentage point accuracy drop
        }

    def monitor_deployment(self, capability_id, new_version, old_version):
        """
        Monitor deployment and trigger rollback if needed

        Args:
            capability_id: Capability identifier
            new_version: Newly deployed version
            old_version: Previous version
        """
        metrics = self.get_current_metrics(capability_id, new_version)

        # Check error rate
        if metrics['error_rate'] > self.rollback_triggers['error_rate']:
            self.trigger_rollback(
                capability_id,
                new_version,
                old_version,
                reason='High error rate'
            )
            return

        # Check latency
        if metrics['latency_p95'] > self.rollback_triggers['latency_p95']:
            self.trigger_rollback(
                capability_id,
                new_version,
                old_version,
                reason='High latency'
            )
            return

        # Check accuracy against the previous version's baseline
        baseline_accuracy = self.get_baseline_accuracy(capability_id, old_version)
        if metrics['accuracy'] < baseline_accuracy - self.rollback_triggers['accuracy_drop']:
            self.trigger_rollback(
                capability_id,
                new_version,
                old_version,
                reason='Accuracy degradation'
            )

    def trigger_rollback(self, capability_id, from_version, to_version, reason):
        """
        Trigger automatic rollback

        Args:
            capability_id: Capability identifier
            from_version: Version to roll back from
            to_version: Version to roll back to
            reason: Reason for rollback
        """
        # The standard logging module does not accept arbitrary keyword
        # arguments, so the details are passed as format arguments
        logger.warning(
            "Triggering rollback for %s: from_version=%s to_version=%s reason=%s",
            capability_id, from_version, to_version, reason
        )

        # Update feature flag to route to old version
        self.update_version_routing(
            capability_id=capability_id,
            version=to_version,
            percentage=100
        )

        # Create incident
        self.create_rollback_incident(
            capability_id=capability_id,
            from_version=from_version,
            to_version=to_version,
            reason=reason
        )

        # Notify team
        self.notify_rollback(
            capability_id=capability_id,
            from_version=from_version,
            to_version=to_version,
            reason=reason
        )
Manual Rollback
# Rollback capability to previous version
./scripts/rollback.sh cap_text_classification 2.0.0
# Verify rollback
curl -X GET "https://api.bdragentfactory.com/v1/capabilities/cap_text_classification" \
-H "Authorization: Bearer $TOKEN" | jq '.version'
Deployment Strategies
1. Blue-Green Deployment
class BlueGreenDeployment:
    def deploy(self, capability_id, new_version):
        """
        Deploy new version using blue-green strategy

        Args:
            capability_id: Capability identifier
            new_version: New version to deploy
        """
        # Deploy to green environment
        self.deploy_to_environment(
            capability_id=capability_id,
            version=new_version,
            environment='green'
        )

        # Run smoke tests
        if not self.run_smoke_tests('green'):
            raise RuntimeError('Smoke tests failed')

        # Switch traffic to green
        self.switch_traffic('green')

        # Monitor for issues and switch back to blue if the new version is
        # unhealthy (assumes monitor_deployment returns True when healthy)
        if not self.monitor_deployment(capability_id, new_version):
            self.switch_traffic('blue')
            raise RuntimeError('Deployment unhealthy; traffic switched back to blue')

        # If successful, green becomes blue
        self.promote_environment('green', 'blue')
2. Canary Deployment
import time


class CanaryDeployment:
    def deploy(self, capability_id, new_version, canary_percentage=10):
        """
        Deploy new version using canary strategy

        Args:
            capability_id: Capability identifier
            new_version: New version to deploy
            canary_percentage: Percentage of traffic to route to new version
        """
        # Deploy canary
        self.deploy_canary(
            capability_id=capability_id,
            version=new_version
        )

        # Route small percentage of traffic
        self.update_traffic_split(
            capability_id=capability_id,
            canary_version=new_version,
            canary_percentage=canary_percentage
        )

        # Monitor canary
        canary_healthy = self.monitor_canary(
            capability_id=capability_id,
            version=new_version,
            duration_minutes=30
        )
        if not canary_healthy:
            self.rollback(capability_id, new_version)
            return False

        # Gradually increase traffic, skipping stages at or below the
        # initial canary percentage so traffic never steps backward
        for percentage in [p for p in (25, 50, 75, 100) if p > canary_percentage]:
            self.update_traffic_split(
                capability_id=capability_id,
                canary_version=new_version,
                canary_percentage=percentage
            )
            time.sleep(600)  # Wait 10 minutes
            if not self.monitor_canary(capability_id, new_version, 10):
                self.rollback(capability_id, new_version)
                return False
        return True
3. Rolling Deployment
import time


class RollingDeployment:
    def deploy(self, capability_id, new_version, batch_size=1):
        """
        Deploy new version using rolling strategy

        Args:
            capability_id: Capability identifier
            new_version: New version to deploy
            batch_size: Number of instances to update at once
        """
        instances = self.get_instances(capability_id)
        for i in range(0, len(instances), batch_size):
            batch = instances[i:i + batch_size]

            # Update batch
            for instance in batch:
                self.update_instance(
                    instance_id=instance.id,
                    version=new_version
                )

            # Wait for health check
            if not self.wait_for_healthy(batch):
                # Only the failing batch is rolled back here; batches
                # updated earlier remain on new_version
                self.rollback_batch(batch)
                raise RuntimeError('Deployment failed')

            # Monitor batch
            time.sleep(60)  # Wait 1 minute between batches
Version Compatibility Matrix
compatibility_matrix:
  api_v1:
    compatible_capability_versions:
      - 1.x.x
      - 2.x.x
  api_v2:
    compatible_capability_versions:
      - 2.x.x
      - 3.x.x
  capability_v2:
    compatible_systems:
      - ClaimsGPT: ">=2.0.0"
      - FraudDetectionAgent: ">=1.5.0"
      - PolicyIntelligenceAgent: ">=1.0.0"
    compatible_models:
      - bert-base: ">=1.0.0"
      - bert-large: ">=2.0.0"
      - roberta: ">=2.1.0"
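The matrix uses two notations: wildcard patterns such as `2.x.x` and minimum-version constraints such as `>=2.0.0`. A minimal sketch of how each could be checked follows; the helper names are assumptions, and only the `>=` operator is handled because it is the only one that appears in the matrix.

```python
def _parse(version: str) -> tuple[int, ...]:
    """Turn '2.1.0' into (2, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def matches_wildcard(version: str, pattern: str) -> bool:
    """Return True if `version` matches a pattern such as '2.x.x'."""
    version_parts = version.split(".")
    pattern_parts = pattern.split(".")
    if len(version_parts) != len(pattern_parts):
        return False
    return all(p == "x" or p == v for v, p in zip(version_parts, pattern_parts))


def satisfies_minimum(version: str, constraint: str) -> bool:
    """Return True if `version` satisfies a constraint such as '>=2.0.0'."""
    if not constraint.startswith(">="):
        raise ValueError(f"unsupported constraint: {constraint!r}")
    return _parse(version) >= _parse(constraint[2:])
```

For example, capability version `2.1.0` matches `2.x.x` under both `api_v1` and `api_v2`, and a ClaimsGPT release `2.3.0` satisfies `>=2.0.0`.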
Migration Guides
Migration from v1 to v2
# Migration Guide: v1.x to v2.x
## Breaking Changes
1. **API Endpoint Changes**
   - Old: `/capabilities/{id}/classify`
   - New: `/capabilities/{id}/invoke`
2. **Request Format**
   - Old: `{"text": "..."}`
   - New: `{"input": {"text": "..."}}`
3. **Response Format**
   - Old: `{"class": "...", "score": 0.95}`
   - New: `{"result": {"predicted_class": "...", "confidence": 0.95}}`
## Migration Steps
1. Update API endpoint URLs
2. Update request payload structure
3. Update response parsing logic
4. Test with v2 in staging environment
5. Deploy to production
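
Steps 2 and 3 could be isolated in a thin adapter so that existing v1 call sites keep working while the migration is in progress. The two functions below are a hypothetical sketch based only on the request and response formats listed under Breaking Changes; they are not part of any published client library.

```python
def to_v2_request(v1_payload: dict) -> dict:
    """Wrap a v1 request body in the v2 'input' envelope."""
    return {"input": {"text": v1_payload["text"]}}


def to_v1_response(v2_body: dict) -> dict:
    """Map a v2 response body back to the v1 field names."""
    result = v2_body["result"]
    return {"class": result["predicted_class"], "score": result["confidence"]}
```

With these in place, a caller can send `to_v2_request(payload)` to the `/invoke` endpoint and pass the JSON response through `to_v1_response` until its own parsing logic has been updated.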
## Code Examples
### Before (v1)
```python
response = client.post(
    f"/capabilities/{capability_id}/classify",
    json={"text": "Claim description"}
)
result_class = response.json()["class"]
```
### After (v2)
```python
response = client.post(
    f"/capabilities/{capability_id}/invoke",
    json={"input": {"text": "Claim description"}}
)
result_class = response.json()["result"]["predicted_class"]
```
Version Documentation
CHANGELOG.md
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.1.0] - 2026-01-03
### Added
- New BERT-Large model for improved accuracy
- Support for batch processing
- Enhanced explainability features
### Changed
- Improved P95 latency from 300ms to 250ms
- Updated model accuracy from 92% to 95%
### Fixed
- Fixed edge case with special characters in input
- Resolved memory leak in batch processing
### Security
- Updated dependencies to patch CVE-2025-12345
## [2.0.0] - 2025-12-01
### Added
- New API v2 with improved request/response format
- Support for multiple compliance frameworks
### Changed
- **BREAKING**: Changed API endpoint from `/classify` to `/invoke`
- **BREAKING**: Updated request/response format
### Deprecated
- API v1 (sunset date: 2026-06-01)
### Removed
- Legacy authentication method
Best Practices
- Always use semantic versioning
- Maintain backward compatibility in minor versions
- Provide migration guides for major versions
- Give adequate deprecation notice (6 months minimum)
- Test thoroughly before releasing
- Monitor deployments closely
- Have rollback procedures ready
- Document all changes in CHANGELOG
- Version models separately from capabilities
- Track performance across versions
Support
For version control questions:
- Documentation: https://docs.bdragentfactory.com/versioning
- Email: engineering@bdragentfactory.com