# Version Control Strategy - BDR Agent Factory

## Overview

This document defines the versioning strategy for AI capabilities, models, and system components. Its goals are backward compatibility, traceability, and controlled rollouts.

---

## Semantic Versioning

### Version Format: MAJOR.MINOR.PATCH

```
v1.2.3
 │ │ │
 │ │ └─ PATCH: Bug fixes, minor improvements (backward compatible)
 │ └─── MINOR: New features, enhancements (backward compatible)
 └───── MAJOR: Breaking changes (not backward compatible)
```

### Version Increment Rules

#### MAJOR Version (X.0.0)

Increment for:
- Breaking API changes
- Incompatible capability interface changes
- Major model architecture changes
- Removal of deprecated features
- Significant governance requirement changes

**Example**: `1.5.2` → `2.0.0`

#### MINOR Version (x.Y.0)

Increment for:
- New capabilities added
- New features in existing capabilities
- Model performance improvements
- New compliance framework support
- Backward-compatible API enhancements

**Example**: `1.5.2` → `1.6.0`

#### PATCH Version (x.y.Z)

Increment for:
- Bug fixes
- Security patches
- Performance optimizations
- Documentation updates
- Minor model fine-tuning

**Example**: `1.5.2` → `1.5.3`

---

## Capability Versioning

### Capability Version Structure

```yaml
id: cap_text_classification
name: Text Classification
version: 2.1.0
model_version: 2.1.0-bert-large
api_version: v1
status: production
released_at: "2026-01-03T00:00:00Z"

previous_versions:
  - version: 2.0.0
    status: deprecated
    deprecated_at: "2025-12-01T00:00:00Z"
    sunset_at: "2026-06-01T00:00:00Z"
  - version: 1.5.0
    status: retired
    retired_at: "2025-11-01T00:00:00Z"
```

### Version Lifecycle

```
┌──────────────────────────────────────────────────────────────────┐
│                        Version Lifecycle                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Development → Beta → Production → Deprecated → Retired          │
│       ↓         ↓         ↓            ↓           ↓             │
│   Internal   Limited   General      Sunset      Removed          │
│    Testing   Access   Available     Warning                      │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```

#### Status Definitions

1. **Development** (`dev`)
   - Internal testing only
   - Unstable, subject to change
   - No SLA guarantees
   - Duration: Variable

2. **Beta** (`beta`)
   - Limited external access
   - Feature-complete but may have bugs
   - Limited SLA (95% uptime)
   - Duration: 2-4 weeks

3. **Production** (`production`)
   - Generally available
   - Full SLA guarantees (99.9% uptime)
   - Fully supported
   - Duration: Until deprecated

4. **Deprecated** (`deprecated`)
   - Still available but not recommended
   - Security updates only
   - Sunset date announced
   - Duration: 6 months minimum

5. **Retired** (`retired`)
   - No longer available
   - Removed from production
   - Historical reference only

### Deprecation Policy

```python
from datetime import datetime, timedelta, timezone


class DeprecationPolicy:
    # Minimum notice periods, in days
    MAJOR_VERSION_NOTICE = 180  # 6 months
    MINOR_VERSION_NOTICE = 90   # 3 months
    PATCH_VERSION_NOTICE = 30   # 1 month

    @staticmethod
    def notice_days(version):
        """Return the notice period for a MAJOR.MINOR.PATCH version string."""
        # Assumption: the release type is inferred from trailing zeros.
        _major, minor, patch = (int(part) for part in version.split('.'))
        if minor == 0 and patch == 0:
            return DeprecationPolicy.MAJOR_VERSION_NOTICE  # X.0.0 release
        if patch == 0:
            return DeprecationPolicy.MINOR_VERSION_NOTICE  # x.Y.0 release
        return DeprecationPolicy.PATCH_VERSION_NOTICE      # x.y.Z release

    @staticmethod
    def deprecate_version(capability_id, version, reason):
        """
        Deprecate a capability version

        Args:
            capability_id: Capability identifier
            version: Version to deprecate
            reason: Reason for deprecation
        """
        # update_capability_status, notify_deprecation, and
        # add_deprecation_header are assumed to be provided elsewhere.

        # Calculate sunset date based on version type
        deprecated_at = datetime.now(timezone.utc)
        sunset_date = deprecated_at + timedelta(
            days=DeprecationPolicy.notice_days(version)
        )

        # Update capability status
        update_capability_status(
            capability_id=capability_id,
            version=version,
            status='deprecated',
            deprecated_at=deprecated_at,
            sunset_at=sunset_date,
            deprecation_reason=reason
        )

        # Notify users
        notify_deprecation(
            capability_id=capability_id,
            version=version,
            sunset_date=sunset_date,
            reason=reason
        )

        # Add deprecation warning to API responses
        add_deprecation_header(
            capability_id=capability_id,
            version=version,
            sunset_date=sunset_date
        )
```

### Deprecation Headers

```http
HTTP/1.1 200 OK
Deprecation: true
Sunset: Mon, 01 Jun 2026 00:00:00 GMT
Link: ; rel="deprecation"
Warning: 299 - "This capability version is deprecated and will be retired on 2026-06-01"
```

---

## Model Versioning

### Model Version Format

```
version: 2.1.0-bert-large-20260103
         │ │ │ │          │
         │ │ │ │          └─ Training date (YYYYMMDD)
         │ │ │ └──────────── Model architecture
         │ │ └────────────── Patch version
         │ └──────────────── Minor version
         └────────────────── Major version
```

### Model Registry

```python
from datetime import datetime, timezone


def version_key(version):
    """Numeric sort key for 'MAJOR.MINOR.PATCH[-suffix]' version strings."""
    return tuple(int(part) for part in version.split('-')[0].split('.'))


class ModelRegistry:
    def __init__(self):
        self.models = {}

    def register_model(self, capability_id, version, model_info):
        """
        Register a new model version

        Args:
            capability_id: Capability identifier
            version: Model version
            model_info: Model metadata
        """
        model_record = {
            'capability_id': capability_id,
            'version': version,
            'architecture': model_info['architecture'],
            'training_date': model_info['training_date'],
            'training_data_size': model_info['training_data_size'],
            'performance_metrics': model_info['metrics'],
            'model_path': model_info['path'],
            'checksum': model_info['checksum'],
            # Promotion to 'production' is assumed to happen elsewhere
            'status': 'registered',
            'registered_at': datetime.now(timezone.utc)
        }

        self.models[f"{capability_id}:{version}"] = model_record
        return model_record

    def get_model(self, capability_id, version='latest'):
        """
        Retrieve model by version

        Args:
            capability_id: Capability identifier
            version: Model version or 'latest'
        """
        if version == 'latest':
            # Get latest production version
            candidates = [
                record for key, record in self.models.items()
                if key.startswith(f"{capability_id}:")
                and record['status'] == 'production'
            ]
            if not candidates:
                return None
            # Compare numerically so that 2.10.0 sorts after 2.9.0
            return max(candidates, key=lambda r: version_key(r['version']))

        return self.models.get(f"{capability_id}:{version}")
```

### Model Performance Tracking

```python
from datetime import datetime, timezone


class ModelPerformanceTracker:
    METRIC_NAMES = (
        'accuracy', 'precision', 'recall',
        'f1_score', 'latency_ms', 'throughput_rps',
    )

    def __init__(self):
        self.metrics = {}

    def track_performance(self, capability_id, version, metrics):
        """
        Track model performance metrics

        Args:
            capability_id: Capability identifier
            version: Model version
            metrics: Performance metrics
        """
        key = f"{capability_id}:{version}"
        sample = {'timestamp': datetime.now(timezone.utc)}
        # Metrics missing from the input are stored as None
        sample.update({name: metrics.get(name) for name in self.METRIC_NAMES})
        self.metrics.setdefault(key, []).append(sample)

    def get_average_metrics(self, capability_id, version):
        """Average each metric over all samples recorded for a version."""
        samples = self.metrics.get(f"{capability_id}:{version}", [])
        averages = {}
        for name in self.METRIC_NAMES:
            values = [s[name] for s in samples if s[name] is not None]
            if values:
                averages[name] = sum(values) / len(values)
        return averages

    def compare_versions(self, capability_id, version1, version2):
        """
        Compare performance between two versions

        Args:
            capability_id: Capability identifier
            version1: First version
            version2: Second version
        """
        metrics1 = self.get_average_metrics(capability_id, version1)
        metrics2 = self.get_average_metrics(capability_id, version2)

        comparison = {}
        for metric, value1 in metrics1.items():
            if metric not in metrics2:
                continue
            diff = metrics2[metric] - value1
            # Report 0% change when the baseline is 0 to avoid dividing by zero
            pct_change = (diff / value1) * 100 if value1 != 0 else 0
            comparison[metric] = {
                'version1': value1,
                'version2': metrics2[metric],
                'difference': diff,
                'percent_change': pct_change
            }

        return comparison
```

---

## Change Management

### Change Request Process

```
┌─────────────────────────────────────────────────────────────────┐
│                     Change Request Workflow                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Submit Change Request                                       │
│     ↓                                                           │
│  2. Technical Review                                            │
│     ↓                                                           │
│  3. Impact Assessment                                           │
│     ↓                                                           │
│  4. Governance Approval                                         │
│     ↓                                                           │
│  5. Implementation                                              │
│     ↓                                                           │
│  6. Testing & Validation                                        │
│     ↓                                                           │
│  7. Deployment                                                  │
│     ↓                                                           │
│  8. Post-Deployment Verification                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Change Request Template

```yaml
change_request:
  id: CR-2026-001
  title: "Upgrade Text Classification to BERT-Large"
  type: minor_version  # major_version, minor_version, patch_version
  capability_id: cap_text_classification
  current_version: 2.0.0
  proposed_version: 2.1.0

  description: |
    Upgrade text classification model from BERT-Base to BERT-Large
    to improve accuracy on complex insurance claim descriptions.

  justification: |
    Current model accuracy is 92%. BERT-Large achieves 95% accuracy
    in testing, reducing misclassification rate by 37.5%.

  impact_assessment:
    breaking_changes: false
    backward_compatible: true
    affected_systems:
      - ClaimsGPT
      - CustomerServiceAgent
    estimated_downtime: 0 minutes
    rollback_plan: "Revert to v2.0.0 via feature flag"

  testing:
    unit_tests: passed
    integration_tests: passed
    performance_tests: passed
    compliance_tests: passed

  approvals:
    technical_lead: approved
    security_team: approved
    compliance_team: approved
    product_owner: approved

  deployment:
    strategy: canary  # blue_green, rolling, canary
    rollout_percentage: 10%
    monitoring_period: 24 hours
    success_criteria:
      - "error_rate < 0.1%"
      - "p95_latency < 300ms"
      - "accuracy > 94%"
```

---

## Rollback Procedures

### Automated Rollback

```python
import logging

logger = logging.getLogger(__name__)


class RollbackManager:
    # Helpers such as get_current_metrics() and update_version_routing()
    # are assumed to be implemented elsewhere.

    def __init__(self):
        self.rollback_triggers = {
            'error_rate': 0.05,     # 5% error rate
            'latency_p95': 500,     # 500ms P95 latency
            'accuracy_drop': 0.02,  # 2 percentage point accuracy drop
        }

    def monitor_deployment(self, capability_id, new_version, old_version):
        """
        Monitor deployment and trigger rollback if needed

        Args:
            capability_id: Capability identifier
            new_version: Newly deployed version
            old_version: Previous version
        """
        metrics = self.get_current_metrics(capability_id, new_version)

        # Check error rate
        if metrics['error_rate'] > self.rollback_triggers['error_rate']:
            self.trigger_rollback(
                capability_id, new_version, old_version,
                reason='High error rate'
            )
            return

        # Check latency
        if metrics['latency_p95'] > self.rollback_triggers['latency_p95']:
            self.trigger_rollback(
                capability_id, new_version, old_version,
                reason='High latency'
            )
            return

        # Check accuracy against the previous version's baseline
        baseline_accuracy = self.get_baseline_accuracy(capability_id, old_version)
        min_accuracy = baseline_accuracy - self.rollback_triggers['accuracy_drop']
        if metrics['accuracy'] < min_accuracy:
            self.trigger_rollback(
                capability_id, new_version, old_version,
                reason='Accuracy degradation'
            )
            return

    def trigger_rollback(self, capability_id, from_version,
                         to_version, reason):
        """
        Trigger automatic rollback

        Args:
            capability_id: Capability identifier
            from_version: Version to roll back from
            to_version: Version to roll back to
            reason: Reason for rollback
        """
        # Positional %-style arguments work with the standard logging module
        logger.warning(
            "Triggering rollback for %s: %s -> %s (%s)",
            capability_id, from_version, to_version, reason
        )

        # Update feature flag to route to old version
        self.update_version_routing(
            capability_id=capability_id,
            version=to_version,
            percentage=100
        )

        # Create incident
        self.create_rollback_incident(
            capability_id=capability_id,
            from_version=from_version,
            to_version=to_version,
            reason=reason
        )

        # Notify team
        self.notify_rollback(
            capability_id=capability_id,
            from_version=from_version,
            to_version=to_version,
            reason=reason
        )
```

### Manual Rollback

```bash
# Roll back capability to previous version
./scripts/rollback.sh cap_text_classification 2.0.0

# Verify rollback
curl -X GET "https://api.bdragentfactory.com/v1/capabilities/cap_text_classification" \
  -H "Authorization: Bearer $TOKEN" | jq '.version'
```

---

## Deployment Strategies

### 1. Blue-Green Deployment

```python
class BlueGreenDeployment:
    def deploy(self, capability_id, new_version):
        """
        Deploy new version using blue-green strategy

        Args:
            capability_id: Capability identifier
            new_version: New version to deploy
        """
        # Deploy to green environment
        self.deploy_to_environment(
            capability_id=capability_id,
            version=new_version,
            environment='green'
        )

        # Run smoke tests before any traffic reaches green
        if not self.run_smoke_tests('green'):
            raise RuntimeError('Smoke tests failed')

        # Switch traffic to green
        self.switch_traffic('green')

        # Monitor for issues
        self.monitor_deployment(capability_id, new_version)

        # If successful, green becomes blue
        self.promote_environment('green', 'blue')
```

### 2. Canary Deployment

```python
import time


class CanaryDeployment:
    def deploy(self, capability_id, new_version, canary_percentage=10):
        """
        Deploy new version using canary strategy

        Args:
            capability_id: Capability identifier
            new_version: New version to deploy
            canary_percentage: Percentage of traffic to route to new version
        """
        # Deploy canary
        self.deploy_canary(
            capability_id=capability_id,
            version=new_version
        )

        # Route small percentage of traffic
        self.update_traffic_split(
            capability_id=capability_id,
            canary_version=new_version,
            canary_percentage=canary_percentage
        )

        # Monitor canary
        canary_healthy = self.monitor_canary(
            capability_id=capability_id,
            version=new_version,
            duration_minutes=30
        )
        if not canary_healthy:
            self.rollback(capability_id, new_version)
            return False

        # Gradually increase traffic
        for percentage in [25, 50, 75, 100]:
            self.update_traffic_split(
                capability_id=capability_id,
                canary_version=new_version,
                canary_percentage=percentage
            )
            time.sleep(600)  # Wait 10 minutes

            if not self.monitor_canary(capability_id, new_version, 10):
                self.rollback(capability_id, new_version)
                return False

        return True
```

### 3. Rolling Deployment

```python
import time


class RollingDeployment:
    def deploy(self, capability_id, new_version, batch_size=1):
        """
        Deploy new version using rolling strategy

        Args:
            capability_id: Capability identifier
            new_version: New version to deploy
            batch_size: Number of instances to update at once
        """
        instances = self.get_instances(capability_id)

        for i in range(0, len(instances), batch_size):
            batch = instances[i:i + batch_size]

            # Update batch
            for instance in batch:
                self.update_instance(
                    instance_id=instance.id,
                    version=new_version
                )

            # Wait for health check
            if not self.wait_for_healthy(batch):
                # Only the failing batch is rolled back here; batches
                # updated earlier remain on new_version.
                self.rollback_batch(batch)
                raise RuntimeError('Deployment failed')

            # Monitor batch
            time.sleep(60)  # Wait 1 minute between batches
```

---

## Version Compatibility Matrix

```yaml
compatibility_matrix:
  api_v1:
    compatible_capability_versions:
      - 1.x.x
      - 2.x.x
  api_v2:
    compatible_capability_versions:
      - 2.x.x
      - 3.x.x

  capability_v2:
    compatible_systems:
      - ClaimsGPT: ">=2.0.0"
      - FraudDetectionAgent: ">=1.5.0"
      - PolicyIntelligenceAgent: ">=1.0.0"
    compatible_models:
      - bert-base: ">=1.0.0"
      - bert-large: ">=2.0.0"
      - roberta: ">=2.1.0"
```

---

## Migration Guides

### Migration from v1 to v2

````markdown
# Migration Guide: v1.x to v2.x

## Breaking Changes

1. **API Endpoint Changes**
   - Old: `/capabilities/{id}/classify`
   - New: `/capabilities/{id}/invoke`

2. **Request Format**
   - Old: `{"text": "..."}`
   - New: `{"input": {"text": "..."}}`

3. **Response Format**
   - Old: `{"class": "...", "score": 0.95}`
   - New: `{"result": {"predicted_class": "...", "confidence": 0.95}}`

## Migration Steps

1. Update API endpoint URLs
2. Update request payload structure
3. Update response parsing logic
4. Test with v2 in staging environment
5. Deploy to production

## Code Examples

### Before (v1)

```python
response = client.post(
    f"/capabilities/{capability_id}/classify",
    json={"text": "Claim description"}
)
result_class = response.json()["class"]
```

### After (v2)

```python
response = client.post(
    f"/capabilities/{capability_id}/invoke",
    json={"input": {"text": "Claim description"}}
)
result_class = response.json()["result"]["predicted_class"]
```
````

---

## Version Documentation

### CHANGELOG.md

```markdown
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.1.0] - 2026-01-03

### Added
- New BERT-Large model for improved accuracy
- Support for batch processing
- Enhanced explainability features

### Changed
- Improved P95 latency from 300ms to 250ms
- Improved model accuracy from 92% to 95%

### Fixed
- Fixed edge case with special characters in input
- Resolved memory leak in batch processing

### Security
- Updated dependencies to patch CVE-2025-12345

## [2.0.0] - 2025-12-01

### Added
- New API v2 with improved request/response format
- Support for multiple compliance frameworks

### Changed
- **BREAKING**: Changed API endpoint from `/classify` to `/invoke`
- **BREAKING**: Updated request/response format

### Deprecated
- API v1 (sunset date: 2026-06-01)

### Removed
- Legacy authentication method
```

---

## Best Practices

1. **Always use semantic versioning**
2. **Maintain backward compatibility in minor versions**
3. **Provide migration guides for major versions**
4. **Give adequate deprecation notice (6 months minimum)**
5. **Test thoroughly before releasing**
6. **Monitor deployments closely**
7. **Have rollback procedures ready**
8. **Document all changes in CHANGELOG**
9. **Version models separately from capabilities**
10. **Track performance across versions**

---

## Support

For version control questions:
- Documentation: https://docs.bdragentfactory.com/versioning
- Email: engineering@bdragentfactory.com
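
---

## Appendix: Version Increment Sketch

The increment rules in the Semantic Versioning section can be illustrated with a short sketch. The `Version` class and its `parse` and `bump` methods are hypothetical names introduced here for illustration; they are not part of any BDR Agent Factory API. The sketch assumes plain `MAJOR.MINOR.PATCH` strings, optionally prefixed with `v` or followed by a `-suffix` such as the architecture and training date of a model version.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical MAJOR.MINOR.PATCH value; compares numerically as a tuple."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Drop an optional leading "v" and anything after the first "-"
        core = text.lstrip("v").split("-", 1)[0]
        major, minor, patch = (int(part) for part in core.split("."))
        return cls(major, minor, patch)

    def bump(self, change: str) -> "Version":
        # A major bump resets minor and patch; a minor bump resets patch
        if change == "major":
            return Version(self.major + 1, 0, 0)
        if change == "minor":
            return Version(self.major, self.minor + 1, 0)
        if change == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("1.5.2")
print(current.bump("major"))  # 2.0.0
print(current.bump("minor"))  # 1.6.0
print(current.bump("patch"))  # 1.5.3

# Numeric comparison, unlike string comparison, orders 2.10.0 after 2.9.0
print(Version.parse("2.10.0") > Version.parse("2.9.0"))  # True
```

Representing a version as a tuple of integers, rather than a string, is one way to keep ordering correct once any component reaches two digits.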