---
language: en
license: mit
library_name: pytorch
tags:
- task-routing
- multi-task-learning
- foundation-model
- synthetic-data
- balanced-training
- software-engineering
metrics:
- accuracy
model-index:
- name: corch-v13-balanced
  results:
  - task:
      type: text-classification
      name: Task Routing
    metrics:
    - type: accuracy
      value: 87.30
      name: Average Accuracy
    - type: accuracy
      value: 100.00
      name: Domain Accuracy
    - type: accuracy
      value: 100.00
      name: Capability Accuracy
---
# Corch V13 Balanced: Task Routing Foundation Model
**87.30% Average Accuracy** | Perfect Domain & Capability Classification
A multi-task foundation model for routing software engineering tasks. It reaches 87.30% average accuracy across four classification heads, a gain attributed to rebalancing the training data with synthetic examples.
## Model Description
Corch V13 Balanced is an 805K-parameter neural network that classifies software engineering tasks along 4 dimensions:
1. **Domain** (19 classes): frontend, backend, machine_learning, etc. - **100% accuracy** 🎯
2. **Capability** (8 classes): code_generation, debugging, testing, etc. - **100% accuracy** 🎯
3. **Strategy** (2 classes): DIRECT vs ORCHESTRATE - **85.98% accuracy**
4. **Execution Type** (5 classes): single_task, multi_step, etc. - **63.20% accuracy**
## Performance
| Task | Accuracy | Improvement from V10 |
|------|----------|---------------------|
| **Average** | **87.30%** | +20.46% |
| **Domain** | **100.00%** 🎯 | +14.59% |
| **Capability** | **100.00%** 🎯 | +39.61% |
| **Strategy** | **85.98%** | +12.55% |
| **Execution** | **63.20%** | +7.94% |
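The headline average matches the unweighted mean of the four per-head accuracies. The card does not state how the average was computed, so the averaging scheme below is an assumption; the check only shows that the numbers in the table are consistent with it.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-head accuracies (%) copied from the table above.
head_accuracy = {
    "domain": Decimal("100.00"),
    "capability": Decimal("100.00"),
    "strategy": Decimal("85.98"),
    "execution": Decimal("63.20"),
}

# Unweighted mean over the four heads (assumed averaging scheme).
mean = sum(head_accuracy.values()) / len(head_accuracy)
average = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(average)  # 87.30
```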
## Key Innovation: Balanced Synthetic Data
The gain came from fixing a severe class imbalance (324:1 ratio):
- Generated **49,307 synthetic examples** using GPT-5-Pro
- Balanced dataset to ~10K examples per domain
- Eliminated the zero-accuracy problem on rare classes
**Before balancing:**
- `machine_learning` domain: 88 examples → 0% accuracy
- `other` domain: 57 examples → 0% accuracy
**After balancing:**
- All domains: ~10K examples → 100% accuracy ✅
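The balancing step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function names `imbalance_ratio` and `balance_by_domain` are hypothetical, the imbalance ratio is assumed to mean the largest class count divided by the smallest, and the policy (downsample real examples, then fill any gap with synthetic ones) is an assumption about how a per-domain target could be reached.

```python
import random
from collections import Counter, defaultdict

def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def balance_by_domain(real, synthetic, target, seed=0):
    """Return up to `target` examples per domain.

    `real` and `synthetic` are lists of (text, domain) pairs. Real examples
    are downsampled when a domain has more than `target`; synthetic examples
    fill the gap when it has fewer.
    """
    rng = random.Random(seed)
    real_by_domain = defaultdict(list)
    synth_by_domain = defaultdict(list)
    for text, domain in real:
        real_by_domain[domain].append((text, domain))
    for text, domain in synthetic:
        synth_by_domain[domain].append((text, domain))

    balanced = []
    for domain in sorted(set(real_by_domain) | set(synth_by_domain)):
        pool = real_by_domain[domain]
        picked = rng.sample(pool, min(target, len(pool)))
        missing = target - len(picked)
        extra = synth_by_domain[domain]
        picked += rng.sample(extra, min(missing, len(extra)))
        balanced.extend(picked)
    return balanced

# Toy data: six real backend tasks, one real ML task, three synthetic ML tasks.
real = [(f"backend task {i}", "backend") for i in range(6)]
real.append(("ml task 0", "machine_learning"))
synthetic = [(f"synthetic ml task {i}", "machine_learning") for i in range(3)]

balanced = balance_by_domain(real, synthetic, target=3)
print(imbalance_ratio([d for _, d in real]))      # 6.0
print(imbalance_ratio([d for _, d in balanced]))  # 1.0
```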
## Architecture
```
Input Text → BGE-large-en-v1.5 Embedding (1024d)
Shared Layers:
- Linear(1024 → 512) + ReLU + Dropout(0.3)
- Linear(512 → 512) + ReLU + Dropout(0.3)
Task-Specific Heads:
├─ Strategy Head → Linear(512 → 2)
├─ Capability Head → Linear(512 → 8)
├─ Domain Head → Linear(512 → 19)
└─ Execution Head → Linear(512 → 5)
```
**Parameters:** 804,898
**Training Time:** ~1 minute (30 epochs, early stopped)
**Hardware:** AMD MI300X GPU
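The parameter count can be reproduced from the layer sizes in the diagram. The helper `linear_params` is introduced here for illustration; it counts the weights and biases of one fully connected layer. The BGE encoder is not part of this count.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Two shared layers: 1024 -> 512 and 512 -> 512.
shared = linear_params(1024, 512) + linear_params(512, 512)
# Four heads on the 512-d shared representation: strategy, capability, domain, execution.
heads = sum(linear_params(512, n) for n in (2, 8, 19, 5))
total = shared + heads
print(shared, heads, total)  # 787456 17442 804898
```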
## Usage
```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoModel

# Load BGE embedding model
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")

# Load Corch V13 Balanced model
model_path = hf_hub_download(repo_id="bledden/corch-v13-balanced", filename="model_v13_balanced.pt")

# Initialize model
class FoundationModelV13(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = torch.nn.Sequential(
            torch.nn.Linear(1024, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(512, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3)
        )
        self.strategy_head = torch.nn.Linear(512, 2)
        self.capability_head = torch.nn.Linear(512, 8)
        self.domain_head = torch.nn.Linear(512, 19)
        self.execution_head = torch.nn.Linear(512, 5)

    def forward(self, x):
        shared = self.shared(x)
        return {
            'strategy': self.strategy_head(shared),
            'capability': self.capability_head(shared),
            'domain': self.domain_head(shared),
            'execution': self.execution_head(shared)
        }

model = FoundationModelV13()
# map_location="cpu" lets the checkpoint load on machines without a GPU
checkpoint = torch.load(model_path, map_location="cpu", weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # disable dropout for inference

# Embed and predict
def route_task(task_text):
    # Generate embedding (hidden state of the first token, [CLS])
    inputs = tokenizer(task_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        embedding = embedding_model(**inputs).last_hidden_state[:, 0, :]

    # Get predictions
    with torch.no_grad():
        outputs = model(embedding)

    # Map each head's highest-scoring index to its label
    strategy = ["DIRECT", "ORCHESTRATE"][outputs['strategy'].argmax().item()]
    capability = ["code_generation", "debugging", "documentation", "optimization",
                  "refactoring", "testing", "design", "data_analysis"][outputs['capability'].argmax().item()]
    domain = ["frontend", "backend", "data_processing", "machine_learning", "devops",
              "testing", "security", "mobile", "data_engineering", "cloud", "database",
              "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded",
              "other"][outputs['domain'].argmax().item()]
    execution = ["single_task", "multi_step", "iterative", "parallel",
                 "sequential"][outputs['execution'].argmax().item()]
    return {
        "strategy": strategy,
        "capability": capability,
        "domain": domain,
        "execution_type": execution
    }

# Example
result = route_task("Build a CNN image classifier using PyTorch for medical imaging")
print(result)
# {
#   'strategy': 'ORCHESTRATE',
#   'capability': 'code_generation',
#   'domain': 'machine_learning',
#   'execution_type': 'multi_step'
# }
```
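`route_task` returns only the top label per head. Since the execution head is the least accurate, a caller may want a probability alongside each label. The sketch below shows one way to derive it from a head's logits. The helper names, the example logits, and the 0.5 threshold are all illustrative assumptions, not part of the released model. It is written in plain Python so it runs without PyTorch; on tensors, `torch.softmax(outputs['execution'], dim=-1)` yields the same probabilities.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits, labels, threshold=0.5):
    """Return (label, probability, is_confident) for one head."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best], probs[best] >= threshold

# Made-up logits for the 5-class execution head, for illustration only.
execution_labels = ["single_task", "multi_step", "iterative", "parallel", "sequential"]
example_logits = [0.2, 1.1, 0.9, -0.3, 0.1]
label, prob, confident = predict_with_confidence(example_logits, execution_labels)
print(label, round(prob, 2), confident)
```

A router could fall back to a default execution type, or ask for clarification, whenever `confident` is false.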
## Training Data
- **Training set:** 31,592 examples (balanced)
- **Validation set:** 3,495 examples
- **Synthetic examples:** 49,307 (generated via GPT-5-Pro)
- **Real examples:** ~550K (existing dataset)
- **Final dataset:** Balanced to ~10K per domain
### Synthetic Data Generation
Used GPT-5-Pro with domain-specific prompts:
```
Generate a realistic software engineering task for: {domain}
Required: {capability}, {execution_type}, {strategy}
Output: 1-3 sentence task description with realistic terminology
```
**Cost:** ~$500 for 49,307 examples
**Quality:** all examples unique (zero duplicates), schemas validated
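Filling the template and checking for duplicates might look like the sketch below. The card does not say how label combinations were chosen or how uniqueness was verified, so the exhaustive product over labels and the case- and whitespace-insensitive duplicate check are assumptions, and `build_prompts` and `drop_duplicates` are hypothetical names. The call to the generation model is left out.

```python
from itertools import product

PROMPT_TEMPLATE = (
    "Generate a realistic software engineering task for: {domain}\n"
    "Required: {capability}, {execution_type}, {strategy}\n"
    "Output: 1-3 sentence task description with realistic terminology"
)

def build_prompts(domains, capabilities, execution_types, strategies):
    """Yield (labels, prompt) for every label combination."""
    for d, c, e, s in product(domains, capabilities, execution_types, strategies):
        labels = {"domain": d, "capability": c, "execution_type": e, "strategy": s}
        yield labels, PROMPT_TEMPLATE.format(**labels)

def drop_duplicates(texts):
    """Keep the first occurrence of each text, ignoring case and extra whitespace."""
    seen, unique = set(), []
    for text in texts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

prompts = list(build_prompts(["iot"], ["testing"], ["parallel"], ["DIRECT", "ORCHESTRATE"]))
print(len(prompts))   # 2
print(prompts[0][1])  # first filled-in prompt
```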
## Label Mappings
**Strategy (2):** DIRECT, ORCHESTRATE
**Capability (8):** code_generation, debugging, documentation, optimization, refactoring, testing, design, data_analysis
**Domain (19):** frontend, backend, data_processing, machine_learning, devops, testing, security, mobile, data_engineering, cloud, database, api, ui_ux, general, iot, blockchain, game_dev, embedded, other
**Execution (5):** single_task, multi_step, iterative, parallel, sequential
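For programmatic use, the mappings can be kept in one table. The sketch assumes that class indices follow the order listed above, which is the order the usage snippet relies on. `LABELS` and `decode` are illustrative names.

```python
LABELS = {
    "strategy": ["DIRECT", "ORCHESTRATE"],
    "capability": ["code_generation", "debugging", "documentation", "optimization",
                   "refactoring", "testing", "design", "data_analysis"],
    "domain": ["frontend", "backend", "data_processing", "machine_learning", "devops",
               "testing", "security", "mobile", "data_engineering", "cloud", "database",
               "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded",
               "other"],
    "execution": ["single_task", "multi_step", "iterative", "parallel", "sequential"],
}

def decode(indices):
    """Map {head: class index} to {head: label name}."""
    return {head: LABELS[head][i] for head, i in indices.items()}

print({head: len(names) for head, names in LABELS.items()})
# {'strategy': 2, 'capability': 8, 'domain': 19, 'execution': 5}
print(decode({"strategy": 1, "domain": 3}))
# {'strategy': 'ORCHESTRATE', 'domain': 'machine_learning'}
```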
## Comparison to Baselines
| Model | Architecture | Data | Avg Acc | Domain Acc |
|-------|--------------|------|---------|------------|
| Logistic Regression | Single-task | Imbalanced | 74.61% | 74.61% |
| V10 | Multi-task | Imbalanced | 66.84% | 85.41% |
| **V13 Balanced** | **Multi-task** | **Balanced** | **87.30%** | **100.00%** |
## Limitations
- Execution type prediction (63.20%) is the weakest head and has room for improvement
- Context-independent (doesn't use conversation history yet)
- English-only
- Focused on software engineering tasks
## Citation
```bibtex
@software{corch_v13_balanced_2024,
  title = {Corch V13 Balanced: Task Routing Foundation Model},
  author = {Bledden, Team},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/bledden/corch-v13-balanced},
  note = {87.30% accuracy via balanced synthetic data generation}
}
```
## License
MIT License
## Links
- **GitHub:** https://github.com/bledden/Corch_by_Fac
- **Release Notes:** [RELEASE_V13_BALANCED.md](https://github.com/bledden/Corch_by_Fac/blob/main/RELEASE_V13_BALANCED.md)
- **Training Script:** [train_v13_option5_balanced.py](https://github.com/bledden/Corch_by_Fac/blob/main/training/scripts/train_v13_option5_balanced.py)
---
Built with ❤️ by the Corch Team | Powered by balanced synthetic data generation