	---
	language: en
	license: mit
	library_name: pytorch
	tags:
	- task-routing
	- multi-task-learning
	- foundation-model
	- synthetic-data
	- balanced-training
	- software-engineering
	metrics:
	- accuracy
	model-index:
	- name: corch-v13-balanced
	  results:
	  - task:
	      type: text-classification
	      name: Task Routing
	    metrics:
	    - type: accuracy
	      value: 87.30
	      name: Average Accuracy
	    - type: accuracy
	      value: 100.00
	      name: Domain Accuracy
	    - type: accuracy
	      value: 100.00
	      name: Capability Accuracy
	---

	# Corch V13 Balanced: Task Routing Foundation Model

	87.30% Average Accuracy | Perfect Domain & Capability Classification

	A multi-task foundation model for intelligent software engineering task routing. Training on a class-balanced dataset, built with synthetic data generation, raised average accuracy to 87.30%.

	## Model Description

	Corch V13 Balanced is an 805K-parameter neural network that classifies software engineering tasks along 4 dimensions:

	1. Domain (19 classes): frontend, backend, machine_learning, etc. - 100% accuracy 🎯
	2. Capability (8 classes): code_generation, debugging, testing, etc. - 100% accuracy 🎯
	3. Strategy (2 classes): DIRECT vs ORCHESTRATE - 85.98% accuracy
	4. Execution Type (5 classes): single_task, multi_step, etc. - 63.20% accuracy

	## Performance

	| Task | Accuracy | Improvement over V10 (percentage points) |
	|------|----------|------------------------------------------|
	| Average | 87.30% | +20.46 |
	| Domain | 100.00% 🎯 | +14.59 |
	| Capability | 100.00% 🎯 | +39.61 |
	| Strategy | 85.98% | +12.55 |
	| Execution | 63.20% | +7.94 |
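The reported average appears to be the unweighted mean of the four per-task accuracies. A quick check of that arithmetic, assuming equal weighting (the weighting is not stated explicitly here):

```python
# Per-task accuracies (%) as reported in the performance table.
task_accuracy = {
    "domain": 100.00,
    "capability": 100.00,
    "strategy": 85.98,
    "execution": 63.20,
}

# Assumption: the average is an unweighted mean over the four tasks.
mean_accuracy = sum(task_accuracy.values()) / len(task_accuracy)

print(f"mean = {mean_accuracy:.3f}%")
```

The result is 87.295, which agrees with the reported 87.30% up to rounding.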

	## Key Innovation: Balanced Synthetic Data

	Most of the gain came from correcting a severe class imbalance (324:1 ratio):
	- Generated 49,307 synthetic examples using GPT-5-Pro
	- Balanced the dataset to ~10K examples per domain
	- Eliminated the zero-accuracy problem on rare classes

	Before balancing:
	- `machine_learning` domain: 88 examples → 0% accuracy
	- `other` domain: 57 examples → 0% accuracy

	After balancing:
	- All domains: ~10K examples → 100% accuracy ✅
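The balancing step can be sketched as follows. This is only an illustration under stated assumptions: the `frontend` and `backend` counts are invented, `TARGET_PER_DOMAIN` is a hypothetical name for the ~10K target, and the actual pipeline may differ.

```python
from collections import Counter

# Hypothetical per-domain counts. Only the machine_learning and other
# figures come from the numbers reported above; the rest are invented.
domain_counts = Counter({
    "frontend": 18_000,
    "backend": 12_500,
    "machine_learning": 88,
    "other": 57,
})

TARGET_PER_DOMAIN = 10_000  # assumed target, matching "~10K examples per domain"

def imbalance_ratio(counts):
    """Ratio between the largest and smallest class sizes."""
    return max(counts.values()) / min(counts.values())

def balancing_plan(counts, target):
    """For each class: how many synthetic examples to add, or real ones to drop."""
    plan = {}
    for label, n in counts.items():
        if n < target:
            plan[label] = ("generate", target - n)
        else:
            plan[label] = ("downsample", n - target)
    return plan

plan = balancing_plan(domain_counts, TARGET_PER_DOMAIN)
print(f"imbalance ratio: {imbalance_ratio(domain_counts):.0f}:1")
for label, (action, amount) in plan.items():
    print(f"{label}: {action} {amount}")
```

With these inputs, the two rare domains each need close to 10K generated examples, which is where synthetic data does most of the work.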

	## Architecture

	```
	Input Text → BGE-large-en-v1.5 Embedding (1024d)
	↓
	Shared Layers:
	- Linear(1024 → 512) + ReLU + Dropout(0.3)
	- Linear(512 → 512) + ReLU + Dropout(0.3)
	↓
	Task-Specific Heads:
	├─ Strategy Head → Linear(512 → 2)
	├─ Capability Head → Linear(512 → 8)
	├─ Domain Head → Linear(512 → 19)
	└─ Execution Head → Linear(512 → 5)
	```

	- Parameters: 804,898
	- Training Time: ~1 minute (30 epochs, early stopped)
	- Hardware: AMD MI300X GPU
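The parameter count can be checked by hand from the layer sizes in the diagram: each `Linear(in, out)` layer holds `in * out` weights plus `out` biases, while ReLU and Dropout add none. A minimal check in plain Python (`linear_params` is a helper name introduced here):

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# Shared trunk: Linear(1024 -> 512) and Linear(512 -> 512).
shared = linear_params(1024, 512) + linear_params(512, 512)

# One linear head per task, sized by its number of classes.
heads = {
    "strategy": linear_params(512, 2),
    "capability": linear_params(512, 8),
    "domain": linear_params(512, 19),
    "execution": linear_params(512, 5),
}

total = shared + sum(heads.values())
print(total)  # 804898
```

Almost all of the parameters (787,456) sit in the shared layers; the four heads together account for 17,442.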

	## Usage

	```python
	import torch
	from huggingface_hub import hf_hub_download
	from transformers import AutoTokenizer, AutoModel

	# Load BGE embedding model
	tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
	embedding_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")

	# Download the Corch V13 Balanced checkpoint
	model_path = hf_hub_download(repo_id="bledden/corch-v13-balanced", filename="model_v13_balanced.pt")

	# Define the model: shared layers plus one linear head per task
	class FoundationModelV13(torch.nn.Module):
	    def __init__(self):
	        super().__init__()
	        self.shared = torch.nn.Sequential(
	            torch.nn.Linear(1024, 512),
	            torch.nn.ReLU(),
	            torch.nn.Dropout(0.3),
	            torch.nn.Linear(512, 512),
	            torch.nn.ReLU(),
	            torch.nn.Dropout(0.3)
	        )
	        self.strategy_head = torch.nn.Linear(512, 2)
	        self.capability_head = torch.nn.Linear(512, 8)
	        self.domain_head = torch.nn.Linear(512, 19)
	        self.execution_head = torch.nn.Linear(512, 5)

	    def forward(self, x):
	        shared = self.shared(x)
	        return {
	            'strategy': self.strategy_head(shared),
	            'capability': self.capability_head(shared),
	            'domain': self.domain_head(shared),
	            'execution': self.execution_head(shared)
	        }

	model = FoundationModelV13()
	# map_location lets the checkpoint load on machines without a GPU
	checkpoint = torch.load(model_path, map_location="cpu", weights_only=True)
	model.load_state_dict(checkpoint['model_state_dict'])
	model.eval()  # disables dropout for inference

	# Embed and predict
	def route_task(task_text):
	    # Generate embedding: first-token (CLS) vector of the last hidden state
	    inputs = tokenizer(task_text, return_tensors="pt", truncation=True, max_length=512)
	    with torch.no_grad():
	        embedding = embedding_model(**inputs).last_hidden_state[:, 0, :]

	    # Get predictions (one logit vector per head)
	    with torch.no_grad():
	        outputs = model(embedding)

	    # Map each head's argmax index to its label
	    strategy = ["DIRECT", "ORCHESTRATE"][outputs['strategy'].argmax().item()]
	    capability = ["code_generation", "debugging", "documentation", "optimization",
	                  "refactoring", "testing", "design", "data_analysis"][outputs['capability'].argmax().item()]
	    domain = ["frontend", "backend", "data_processing", "machine_learning", "devops",
	              "testing", "security", "mobile", "data_engineering", "cloud", "database",
	              "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded",
	              "other"][outputs['domain'].argmax().item()]
	    execution = ["single_task", "multi_step", "iterative", "parallel",
	                 "sequential"][outputs['execution'].argmax().item()]

	    return {
	        "strategy": strategy,
	        "capability": capability,
	        "domain": domain,
	        "execution_type": execution
	    }

	# Example
	result = route_task("Build a CNN image classifier using PyTorch for medical imaging")
	print(result)
	# {
	#     'strategy': 'ORCHESTRATE',
	#     'capability': 'code_generation',
	#     'domain': 'machine_learning',
	#     'execution_type': 'multi_step'
	# }
	```
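`route_task` returns only the top label for each head. If a confidence score is also wanted, one common option is a softmax over a head's logits. The sketch below uses plain Python and made-up logits so that it runs without the model; with PyTorch, `torch.softmax(logits, dim=-1)` computes the same quantity. The helper names are illustrative, and softmax scores from a classifier like this are not necessarily well calibrated.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, labels):
    """Return the most likely label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits for the 2-class strategy head.
label, confidence = top_label([0.4, 2.1], ["DIRECT", "ORCHESTRATE"])
print(label, round(confidence, 3))
```

A router could use such a score to fall back to a default path when no class is clearly ahead, though any threshold would need to be tuned on validation data.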

	## Training Data

	- Training set: 31,592 examples (balanced)
	- Validation set: 3,495 examples
	- Synthetic examples: 49,307 (generated via GPT-5-Pro)
	- Real examples: ~550K (existing dataset)
	- Final dataset: Balanced to ~10K per domain

	### Synthetic Data Generation

	Used GPT-5-Pro with domain-specific prompts:

	```
	Generate a realistic software engineering task for: {domain}
	Required: {capability}, {execution_type}, {strategy}
	Output: 1-3 sentence task description with realistic terminology
	```
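Filling the template for one label combination might look like the sketch below. `PROMPT_TEMPLATE` and `build_prompt` are illustrative names, not taken from the training code, and the call to the generation model is omitted.

```python
# The template text is copied from the prompt shown above.
PROMPT_TEMPLATE = (
    "Generate a realistic software engineering task for: {domain}\n"
    "Required: {capability}, {execution_type}, {strategy}\n"
    "Output: 1-3 sentence task description with realistic terminology"
)

def build_prompt(domain, capability, execution_type, strategy):
    """Fill the template with one target label per classification head."""
    return PROMPT_TEMPLATE.format(
        domain=domain,
        capability=capability,
        execution_type=execution_type,
        strategy=strategy,
    )

prompt = build_prompt("machine_learning", "debugging", "multi_step", "ORCHESTRATE")
print(prompt)
```

Because the labels are part of the request, each generated description presumably arrives already paired with its four target labels.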

	- Cost: ~$500 for 49,307 examples
	- Quality: no duplicate examples, validated schemas
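Quality checks of this kind could be implemented roughly as below. The record layout (`text` plus one field per head), the partial schema, and the helper names are assumptions for illustration; the actual validation code is not shown in this card.

```python
STRATEGIES = {"DIRECT", "ORCHESTRATE"}
EXECUTION_TYPES = {"single_task", "multi_step", "iterative", "parallel", "sequential"}

def is_valid(record):
    """Check for non-empty text and known labels (a partial schema check)."""
    return (
        isinstance(record.get("text"), str)
        and record["text"].strip() != ""
        and record.get("strategy") in STRATEGIES
        and record.get("execution_type") in EXECUTION_TYPES
    )

def deduplicate(records):
    """Drop records whose text repeats, ignoring case and extra whitespace."""
    seen = set()
    unique = []
    for record in records:
        key = " ".join(record["text"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical records: one near-duplicate and one with an unknown label.
records = [
    {"text": "Fix the flaky login test", "strategy": "DIRECT", "execution_type": "single_task"},
    {"text": "fix the  flaky login test", "strategy": "DIRECT", "execution_type": "single_task"},
    {"text": "Design a data pipeline", "strategy": "PLAN", "execution_type": "multi_step"},
]
clean = deduplicate([r for r in records if is_valid(r)])
print(len(clean))  # 1
```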

	## Label Mappings

	- Strategy (2): DIRECT, ORCHESTRATE
	- Capability (8): code_generation, debugging, documentation, optimization, refactoring, testing, design, data_analysis
	- Domain (19): frontend, backend, data_processing, machine_learning, devops, testing, security, mobile, data_engineering, cloud, database, api, ui_ux, general, iot, blockchain, game_dev, embedded, other
	- Execution (5): single_task, multi_step, iterative, parallel, sequential
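For code that decodes head outputs, the mappings can be kept in one place. The index order below simply follows the order in which the labels are listed in this card; it is assumed, not confirmed here, to match the label encoding used in training. `LABELS` and `decode` are illustrative names.

```python
LABELS = {
    "strategy": ["DIRECT", "ORCHESTRATE"],
    "capability": [
        "code_generation", "debugging", "documentation", "optimization",
        "refactoring", "testing", "design", "data_analysis",
    ],
    "domain": [
        "frontend", "backend", "data_processing", "machine_learning", "devops",
        "testing", "security", "mobile", "data_engineering", "cloud", "database",
        "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded",
        "other",
    ],
    "execution": ["single_task", "multi_step", "iterative", "parallel", "sequential"],
}

def decode(head, index):
    """Map a predicted class index for the given head to its label."""
    return LABELS[head][index]

# Sanity check: list sizes should match the output width of each head.
sizes = {head: len(names) for head, names in LABELS.items()}
print(sizes)  # {'strategy': 2, 'capability': 8, 'domain': 19, 'execution': 5}
```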

	## Comparison to Baselines

	| Model | Architecture | Data | Avg Acc | Domain Acc |
	|-------|--------------|------|---------|------------|
	| Logistic Regression | Single-task | Imbalanced | 74.61% | 74.61% |
	| V10 | Multi-task | Imbalanced | 66.84% | 85.41% |
	| V13 Balanced | Multi-task | Balanced | 87.30% | 100.00% |

	## Limitations

	- Execution type prediction (63.20%) still has room for improvement
	- Context-independent (doesn't use conversation history yet)
	- English-only
	- Focused on software engineering tasks

	## Citation

	```bibtex
	@software{corch_v13_balanced_2024,
	  title = {Corch V13 Balanced: Task Routing Foundation Model},
	  author = {Bledden, Team},
	  year = {2024},
	  publisher = {Hugging Face},
	  url = {https://huggingface.co/bledden/corch-v13-balanced},
	  note = {87.30% accuracy via balanced synthetic data generation}
	}
	```

	## License

	MIT License

	## Links

	- GitHub: https://github.com/bledden/Corch_by_Fac
	- Release Notes: [RELEASE_V13_BALANCED.md](https://github.com/bledden/Corch_by_Fac/blob/main/RELEASE_V13_BALANCED.md)
	- Training Script: [train_v13_option5_balanced.py](https://github.com/bledden/Corch_by_Fac/blob/main/training/scripts/train_v13_option5_balanced.py)

	---

	Built with ❤️ by the Corch Team | Powered by balanced synthetic data generation