---
language: en
license: mit
library_name: pytorch
tags:
- task-routing
- multi-task-learning
- foundation-model
- synthetic-data
- balanced-training
- software-engineering
metrics:
- accuracy
model-index:
- name: corch-v13-balanced
  results:
  - task:
      type: text-classification
      name: Task Routing
    metrics:
    - type: accuracy
      value: 87.30
      name: Average Accuracy
    - type: accuracy
      value: 100.00
      name: Domain Accuracy
    - type: accuracy
      value: 100.00
      name: Capability Accuracy
---

# Corch V13 Balanced: Task Routing Foundation Model

**87.30% Average Accuracy** | Perfect Domain & Capability Classification

A multi-task foundation model for routing software engineering tasks. It reaches 87.30% average accuracy across four classification heads after training on a class-balanced dataset augmented with synthetic examples.

## Model Description

Corch V13 Balanced is an 805K-parameter neural network that classifies software engineering tasks along 4 dimensions:

1. **Domain** (19 classes): frontend, backend, machine_learning, etc.
   - **100% accuracy** 🎯
2. **Capability** (8 classes): code_generation, debugging, testing, etc.
   - **100% accuracy** 🎯
3. **Strategy** (2 classes): DIRECT vs ORCHESTRATE
   - **85.98% accuracy**
4. **Execution Type** (5 classes): single_task, multi_step, etc.
   - **63.20% accuracy**

## Performance

| Task | Accuracy | Improvement over V10 (percentage points) |
|------|----------|------------------------------------------|
| **Average** | **87.30%** | +20.46 |
| **Domain** | **100.00%** 🎯 | +14.59 |
| **Capability** | **100.00%** 🎯 | +39.61 |
| **Strategy** | **85.98%** | +12.55 |
| **Execution** | **63.20%** | +7.94 |

The average is the unweighted mean of the four per-task accuracies.

## Key Innovation: Balanced Synthetic Data

The improvement came from fixing severe class imbalance (324:1 ratio):

- Generated **49,307 synthetic examples** using GPT-5-Pro
- Balanced the dataset to ~10K examples per domain
- Eliminated the zero-accuracy problem for rare classes

**Before balancing:**

- `machine_learning` domain: 88 examples → 0% accuracy
- `other` domain: 57 examples → 0% accuracy

**After balancing:**

- All domains: ~10K examples → 100% accuracy ✅

## Architecture

```
Input Text → BGE-large-en-v1.5 Embedding (1024d)
    ↓
Shared Layers:
  - Linear(1024 → 512) + ReLU + Dropout(0.3)
  - Linear(512 → 512) + ReLU + Dropout(0.3)
    ↓
Task-Specific Heads:
  ├─ Strategy Head → Linear(512 → 2)
  ├─ Capability Head → Linear(512 → 8)
  ├─ Domain Head → Linear(512 → 19)
  └─ Execution Head → Linear(512 → 5)
```

**Parameters:** 804,898 (shared layers and heads only; the BGE embedding model is separate)
**Training Time:** ~1 minute (30 epochs, early stopped)
**Hardware:** AMD MI300X GPU

## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModel

# Load the BGE embedding model that produces the 1024-d input vectors
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model.eval()

# Download the Corch V13 Balanced checkpoint
model_path = hf_hub_download(repo_id="bledden/corch-v13-balanced",
                             filename="model_v13_balanced.pt")

# Label order for each head (see "Label Mappings" below)
STRATEGY_LABELS = ["DIRECT", "ORCHESTRATE"]
CAPABILITY_LABELS = ["code_generation", "debugging", "documentation", "optimization",
                     "refactoring", "testing", "design", "data_analysis"]
DOMAIN_LABELS = ["frontend", "backend", "data_processing", "machine_learning", "devops",
                 "testing", "security", "mobile", "data_engineering", "cloud", "database",
                 "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded",
                 "other"]
EXECUTION_LABELS = ["single_task", "multi_step", "iterative", "parallel", "sequential"]


# Model definition; layer names and shapes must match the checkpoint
class FoundationModelV13(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = torch.nn.Sequential(
            torch.nn.Linear(1024, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(512, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3)
        )
        self.strategy_head = torch.nn.Linear(512, 2)
        self.capability_head = torch.nn.Linear(512, 8)
        self.domain_head = torch.nn.Linear(512, 19)
        self.execution_head = torch.nn.Linear(512, 5)

    def forward(self, x):
        shared = self.shared(x)
        return {
            'strategy': self.strategy_head(shared),
            'capability': self.capability_head(shared),
            'domain': self.domain_head(shared),
            'execution': self.execution_head(shared)
        }


model = FoundationModelV13()
# map_location="cpu" lets the checkpoint load on machines without a GPU
checkpoint = torch.load(model_path, map_location="cpu", weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # disables dropout for inference


def route_task(task_text):
    """Embed a single task description and return the predicted label per head."""
    inputs = tokenizer(task_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # Use the [CLS] token representation as the sentence embedding
        embedding = embedding_model(**inputs).last_hidden_state[:, 0, :]
        outputs = model(embedding)

    return {
        "strategy": STRATEGY_LABELS[outputs['strategy'].argmax().item()],
        "capability": CAPABILITY_LABELS[outputs['capability'].argmax().item()],
        "domain": DOMAIN_LABELS[outputs['domain'].argmax().item()],
        "execution_type": EXECUTION_LABELS[outputs['execution'].argmax().item()]
    }


# Example
result = route_task("Build a CNN image classifier using PyTorch for medical imaging")
print(result)
# Example output:
# {
#   'strategy': 'ORCHESTRATE',
#   'capability': 'code_generation',
#   'domain': 'machine_learning',
#   'execution_type': 'multi_step'
# }
```

## Training Data

- **Training set:** 31,592 examples (balanced)
- **Validation set:** 3,495 examples
- **Synthetic examples:** 49,307 (generated via GPT-5-Pro)
- **Real examples:** ~550K (existing dataset)
- **Final dataset:** Balanced to ~10K per domain

### Synthetic Data Generation

Synthetic examples were generated with GPT-5-Pro using domain-specific prompts:

```
Generate a realistic software engineering task for: {domain}
Required: {capability}, {execution_type}, {strategy}
Output: 1-3 sentence task description with realistic terminology
```

**Cost:** ~$500 for 49,307 examples
**Quality:** 100% unique, zero duplicates, validated schemas

## Label Mappings

**Strategy (2):** DIRECT, ORCHESTRATE

**Capability (8):** code_generation, debugging, documentation, optimization, refactoring, testing, design, data_analysis

**Domain (19):** frontend, backend, data_processing, machine_learning, devops, testing, security, mobile, data_engineering, cloud, database, api, ui_ux, general, iot, blockchain, game_dev, embedded, other

**Execution (5):** single_task, multi_step, iterative, parallel, sequential

## Comparison to Baselines

| Model | Architecture | Data | Avg Acc | Domain Acc |
|-------|--------------|------|---------|------------|
| Logistic Regression | Single-task | Imbalanced | 74.61% | 74.61% |
| V10 | Multi-task | Imbalanced | 66.84% | 85.41% |
| **V13 Balanced** | **Multi-task** | **Balanced** | **87.30%** | **100.00%** |

## Limitations

- Execution type prediction (63.20%) still has room for improvement
- Context-independent (doesn't use conversation history yet)
- English-only
- Focused on software engineering tasks

## Citation

```bibtex
@software{corch_v13_balanced_2024,
  title = {Corch V13 Balanced: Task Routing Foundation Model},
  author = {Bledden, Team},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/bledden/corch-v13-balanced},
  note = {87.30% accuracy via balanced synthetic data generation}
}
```

## License

MIT License

## Links

- **GitHub:** https://github.com/bledden/Corch_by_Fac
- **Release Notes:** [RELEASE_V13_BALANCED.md](https://github.com/bledden/Corch_by_Fac/blob/main/RELEASE_V13_BALANCED.md)
- **Training Script:** [train_v13_option5_balanced.py](https://github.com/bledden/Corch_by_Fac/blob/main/training/scripts/train_v13_option5_balanced.py)

---

Built with ❤️ by the Corch Team | Powered by balanced synthetic data generation
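
## Appendix: Checking the Parameter Count

The 804,898 figure quoted in the Architecture section can be reproduced from the layer shapes alone. The sketch below is plain Python arithmetic, not code from the training script; the helper name `linear_params` and the two constants are illustrative. It assumes every `Linear` layer has a bias term (the PyTorch default) and that ReLU and Dropout contribute no parameters.

```python
# Layer shapes copied from the Architecture section: (in_features, out_features).
SHARED_LAYERS = [(1024, 512), (512, 512)]
HEADS = {
    "strategy": (512, 2),
    "capability": (512, 8),
    "domain": (512, 19),
    "execution": (512, 5),
}


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer (bias assumed present)."""
    return in_features * out_features + out_features


shared_params = sum(linear_params(i, o) for i, o in SHARED_LAYERS)
head_params = {name: linear_params(i, o) for name, (i, o) in HEADS.items()}
total_params = shared_params + sum(head_params.values())

print(f"shared layers: {shared_params:,}")
for name, count in head_params.items():
    print(f"{name} head: {count:,}")
print(f"total: {total_params:,}")  # total: 804,898
```

The shared layers account for 787,456 of the parameters, so the four heads together add only 17,442. With the model from the Usage section loaded, the same total should come out of `sum(p.numel() for p in model.parameters())`. The BGE-large-en-v1.5 encoder is not included in this count.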