Facilitair CodeBERT Routing Model v1

  • Accuracy: 99.93% (validation)
  • Task: multi-task routing for software development tasks
  • License: MIT
  • Base Model: microsoft/codebert-base (125M parameters)


Model Description

This model routes software development tasks to appropriate domains, strategies, capabilities, and execution types with 99.93% accuracy on technical tasks.

Capabilities

The model performs 4 simultaneous predictions:

  1. Domain Classification (19 classes):

    • frontend, backend, data, ml, devops, mobile, cloud, security
    • general, testing, database, infrastructure, api, microservices
    • blockchain, networking, embedded, gaming, system_design
  2. Strategy Classification (2 classes):

    • DIRECT: Execute immediately
    • ORCHESTRATE: Complex multi-step execution
  3. Capability Detection (8 multi-label):

    • code_generation, debugging, testing, refactoring
    • optimization, documentation, deployment, data_analysis
  4. Execution Type (5 classes):

    • single_task, multi_step, iterative, parallel, sequential

Performance

Metric                        Score
Overall Accuracy              99.93%
Minimum Per-Domain Accuracy   99.1% (backend)
Domains at 100%               16/19
Training Time                 4.7 hours on AMD MI300X
Model Size                    477 MB

Usage

Python (Transformers)

import torch
import torch.nn as nn
from transformers import RobertaTokenizer, RobertaModel

# The checkpoint contains the multi-head routing model, so a plain RobertaModel
# cannot load it directly. This wrapper follows the "Model Architecture" section
# (shared CodeBERT encoder plus four 768 -> 256 -> N heads); the exact layer
# names and activations are assumptions -- adjust them to match the checkpoint's
# state-dict keys if they differ.
class CodeBERTRoutingModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("microsoft/codebert-base")
        def head(n_out):
            return nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, n_out))
        self.domain_head = head(19)
        self.strategy_head = head(2)
        self.capability_head = head(8)   # multi-label
        self.execution_head = head(5)

    def forward(self, input_ids, attention_mask):
        pooled = self.encoder(input_ids, attention_mask=attention_mask).pooler_output
        return (self.domain_head(pooled), self.strategy_head(pooled),
                self.capability_head(pooled), self.execution_head(pooled))

# Build model and tokenizer
model = CodeBERTRoutingModel()
tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")

# Load trained weights
checkpoint = torch.load("codebert_best_model.pt", map_location="cpu")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Tokenize input
task = "Build a React component for user login"
encoding = tokenizer(task, max_length=512, padding='max_length', truncation=True, return_tensors='pt')

# Predict
with torch.no_grad():
    domain_logits, strategy_logits, capability_logits, execution_logits = model(
        encoding['input_ids'],
        encoding['attention_mask']
    )

    # Get domain prediction
    domain_idx = torch.argmax(domain_logits, dim=1).item()
    domains = ["frontend", "backend", "data", "ml", "devops", "mobile", "cloud", "security",
               "general", "testing", "database", "infrastructure", "api", "microservices",
               "blockchain", "networking", "embedded", "gaming", "system_design"]
    print(f"Domain: {domains[domain_idx]}")
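The example above decodes only the domain head. The remaining heads follow the same pattern: argmax for the single-label strategy and execution heads, and a sigmoid threshold for the multi-label capability head. This is a sketch; the label orderings and the 0.5 threshold are assumptions consistent with the Capabilities section, not confirmed by the checkpoint.

```python
import torch

STRATEGIES = ["DIRECT", "ORCHESTRATE"]
CAPABILITIES = ["code_generation", "debugging", "testing", "refactoring",
                "optimization", "documentation", "deployment", "data_analysis"]
EXECUTION_TYPES = ["single_task", "multi_step", "iterative", "parallel", "sequential"]

def decode_heads(strategy_logits, capability_logits, execution_logits, cap_threshold=0.5):
    # Single-label heads: take the most likely class.
    strategy = STRATEGIES[strategy_logits.argmax(dim=-1).item()]
    execution = EXECUTION_TYPES[execution_logits.argmax(dim=-1).item()]
    # Multi-label head: keep every capability whose sigmoid probability clears the threshold.
    cap_probs = torch.sigmoid(capability_logits).squeeze(0)
    capabilities = [c for c, p in zip(CAPABILITIES, cap_probs.tolist()) if p >= cap_threshold]
    return strategy, capabilities, execution

# Hand-built logits: strongly DIRECT, only code_generation active, single_task.
s, c, e = decode_heads(
    torch.tensor([[2.0, -2.0]]),
    torch.tensor([[3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0]]),
    torch.tensor([[2.0, -1.0, -1.0, -1.0, -1.0]]),
)
print(s, c, e)  # DIRECT ['code_generation'] single_task
```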

Using Facilitair Inference API

from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="somethingobscurefordevstuff/facilitair-codebert-routing-v1",
    filename="codebert_best_model.pt"
)

# Use with Facilitair's inference code
from facilitair_inference import CodeBERTRouter

router = CodeBERTRouter(model_path=model_path)
result = router.route_task("Build a React component")

print(f"Domain: {result['domain']}")  # frontend
print(f"Confidence: {result['domain_confidence']:.1%}")  # 95.8%
print(f"Strategy: {result['strategy']}")  # DIRECT
print(f"Capabilities: {result['capabilities']}")  # ['code_generation']

Training Data

  • Size: 149,986 examples
  • Distribution: Perfectly balanced across 19 domains (7,894 per domain)
  • Task Types:
    • 66.6% short (3-8 words)
    • 33.3% medium (10-20 words)
    • 0.1% long (30-50 words)
  • Domains: All technical domains (frontend, backend, DevOps, ML, etc.)
  • Note: Not trained on non-coding tasks (meetings, business analysis, etc.)

Model Architecture

CodeBERT Base (microsoft/codebert-base)
β”œβ”€β”€ 12 transformer layers
β”œβ”€β”€ 768 hidden size
β”œβ”€β”€ 12 attention heads
└── 125M total parameters

Classification Heads:
β”œβ”€β”€ Domain Head: 768 β†’ 256 β†’ 19
β”œβ”€β”€ Strategy Head: 768 β†’ 256 β†’ 2
β”œβ”€β”€ Capability Head: 768 β†’ 256 β†’ 8 (multi-label)
└── Execution Head: 768 β†’ 256 β†’ 5

Training Details

  • Base Model: microsoft/codebert-base
  • Training Examples: 149,986 (135K train, 15K validation)
  • Epochs: 10 (early stopping triggered)
  • Best Epoch: 4 (validation loss: 0.2146)
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Optimizer: AdamW with warmup
  • Hardware: AMD MI300X (192GB HBM3)
  • Training Time: 4.7 hours

Loss Weighting

  • Domain: 50%
  • Capability: 25%
  • Strategy: 15%
  • Execution: 10%
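A sketch of how these weights might combine the four per-head losses during training. The choice of loss functions is an assumption (cross-entropy for the single-label heads, BCE-with-logits for the multi-label capability head); only the weights come from the card.

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()        # single-label heads
bce = nn.BCEWithLogitsLoss()      # multi-label capability head

def routing_loss(domain_logits, strategy_logits, capability_logits, execution_logits,
                 domain_y, strategy_y, capability_y, execution_y):
    # Weights from the card: domain 50%, capability 25%, strategy 15%, execution 10%.
    return (0.50 * ce(domain_logits, domain_y)
            + 0.25 * bce(capability_logits, capability_y)
            + 0.15 * ce(strategy_logits, strategy_y)
            + 0.10 * ce(execution_logits, execution_y))

# Dummy batch of 2 to show the call shape (19/2/8/5 classes per head).
loss = routing_loss(torch.randn(2, 19), torch.randn(2, 2),
                    torch.randn(2, 8), torch.randn(2, 5),
                    torch.tensor([0, 8]), torch.tensor([0, 1]),
                    torch.rand(2, 8).round(), torch.tensor([0, 4]))
print(loss.item())
```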

Evaluation Results

Per-Domain Accuracy (Validation Set)

Domain           Accuracy   Examples
frontend         100.0%     790
backend          99.1%      790
data             100.0%     790
ml               100.0%     790
devops           99.6%      790
mobile           100.0%     790
cloud            100.0%     790
security         100.0%     790
general          100.0%     790
testing          100.0%     790
database         100.0%     790
infrastructure   99.8%      790
api              100.0%     790
microservices    100.0%     790
blockchain       100.0%     790
networking       100.0%     790
embedded         100.0%     790
gaming           100.0%     790
system_design    100.0%     790

Summary: 16/19 domains at 100% accuracy; minimum 99.1% (backend)


Limitations

  1. Non-Coding Tasks: Model is trained exclusively on technical software development tasks. It may misclassify:

    • Business analysis tasks
    • Meeting scheduling
    • Document writing
    • General Q&A
  2. Confidence Thresholds: For production use, consider applying a confidence threshold (e.g., 70%) and falling back to the "general" domain for uncertain predictions.

  3. Domain Overlap: Some tasks may legitimately belong to multiple domains; the model predicts only the single most likely one.
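The thresholding suggested in limitation 2 can be sketched as follows. The helper name is illustrative; the 70% threshold and the "general" fallback come from the text above.

```python
import torch

DOMAINS = ["frontend", "backend", "data", "ml", "devops", "mobile", "cloud", "security",
           "general", "testing", "database", "infrastructure", "api", "microservices",
           "blockchain", "networking", "embedded", "gaming", "system_design"]

def pick_domain(domain_logits: torch.Tensor, threshold: float = 0.70) -> str:
    """Return the top domain, falling back to 'general' when the model is uncertain."""
    probs = torch.softmax(domain_logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    if conf.item() < threshold:
        return "general"
    return DOMAINS[idx.item()]

# A peaked logit vector routes normally; a flat one (uniform ~5% per class) falls back.
confident = torch.full((19,), -4.0)
confident[0] = 4.0
print(pick_domain(confident))        # frontend
print(pick_domain(torch.zeros(19)))  # general
```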


Citation

If you use this model, please cite:

@software{facilitair_codebert_routing_2025,
  title={Facilitair CodeBERT Routing Model v1},
  author={Facilitair Team},
  year={2025},
  url={https://huggingface.co/somethingobscurefordevstuff/facilitair-codebert-routing-v1}
}

License

MIT License - Free for commercial use


Version History

v1.0.0 (2025-11-17)

  • Initial release
  • 99.93% validation accuracy
  • 19 domains, 2 strategies, 8 capabilities, 5 execution types
  • Trained on 150K balanced examples

