Innovate AIDE – Intent Classifier

This Hugging Face repository hosts the intent classification models used by the Innovate AIDE project.

This repository does not contain the full Innovate AIDE application. Instead, it provides the trained intent classifier models used by the application.

When the Innovate AIDE app launches, it downloads the trained model from this Hugging Face repository and uses it to classify user instructions.

The classifier determines what type of action the assistant should perform based on a natural language instruction.
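The download-and-classify flow can be sketched with the Hugging Face `transformers` library. The repository id below is a placeholder and the example output is illustrative — this is a sketch of the pattern, not the application's actual loading code.

```python
from transformers import pipeline


def load_intent_classifier(repo_id: str):
    """Download the fine-tuned DistilBERT classifier from the Hub
    (cached locally after the first call) and wrap it in an
    inference pipeline."""
    return pipeline("text-classification", model=repo_id)


# Hypothetical usage (substitute this repository's actual id):
# classifier = load_intent_classifier("<org>/<this-repo>")
# classifier("switch to main.py")
# e.g. [{"label": "switch_file", "score": 0.99}]
```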


Role in Innovate AIDE

Within the Innovate AIDE system, the intent classifier is responsible for routing user instructions to the correct subsystem.

Example:

User instruction: switch to main.py

Intent classifier output: switch_file

The application then executes the corresponding VS Code command.

The model therefore acts as the decision layer between user instructions and system actions.
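As a concrete sketch of this decision layer, the routing can be a plain lookup from predicted intent to an editor action. The command ids below are standard VS Code command ids chosen for illustration — they are not necessarily the ones Innovate AIDE actually issues.

```python
# Hypothetical mapping from classifier output to VS Code command ids.
INTENT_TO_COMMAND = {
    "switch_file": "workbench.action.quickOpen",
    "create_new_file": "workbench.action.files.newUntitledFile",
    "undo": "undo",
    "redo": "redo",
}


def route(intent: str) -> str:
    """Return the editor command for a recognized intent,
    or a no-op fallback for anything else."""
    return INTENT_TO_COMMAND.get(intent, "noop")


print(route("switch_file"))  # workbench.action.quickOpen
```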


Model Architecture

The models use DistilBERT fine-tuned for text classification.

Two architectures are implemented.


1. Single-Call Model

A single DistilBERT model directly predicts the final intent.

Possible outputs:

  • llm call
  • smart dictation
  • switch_file
  • create_new_file
  • undo
  • redo
  • other

This approach prioritizes speed and simplicity.
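For illustration, decoding a single-call prediction amounts to a softmax and argmax over the seven label logits. The logit values below are made up; only the label set comes from this card.

```python
import math

LABELS = ["llm call", "smart dictation", "switch_file",
          "create_new_file", "undo", "redo", "other"]


def decode(logits):
    """Softmax the raw logits and return (label, probability)."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, prob = decode([0.1, -1.2, 4.3, 0.5, -0.7, -0.9, 0.2])
print(label)  # switch_file
```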


2. Two-Call (Hierarchical) Model

The hierarchical model improves accuracy for VS Code command classification.

Stage 1 – High-Level Category

The first classifier predicts one of the following:

  • llm call
  • smart dictation
  • vscode command

Stage 2 – VS Code Sub-Intent

If Stage 1 predicts vscode command, a second classifier determines the specific command.

Possible outputs:

  • switch_file
  • create_new_file
  • undo
  • redo
  • other

This two-stage design reduces confusion between editor commands and other instruction types.
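The two-stage control flow can be sketched as follows. The two classifiers are stubbed out as plain keyword functions here, standing in for the trained DistilBERT models.

```python
def classify_hierarchical(text, stage1, stage2):
    """Run stage 1; only invoke the VS Code sub-intent model when needed."""
    category = stage1(text)
    if category == "vscode command":
        # one of: switch_file, create_new_file, undo, redo, other
        return stage2(text)
    return category  # llm call or smart dictation


# Keyword stubs standing in for the trained models:
def stage1_stub(text):
    editor_words = ("switch", "file", "undo", "redo")
    return "vscode command" if any(w in text for w in editor_words) else "llm call"


def stage2_stub(text):
    return "switch_file" if "switch" in text else "other"


print(classify_hierarchical("switch to main.py", stage1_stub, stage2_stub))
# switch_file
```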


Repository Structure

single_call/
β”œβ”€β”€ data_train_single.csv
β”œβ”€β”€ data_test_single.csv
β”œβ”€β”€ intent_classifier_single.py
β”œβ”€β”€ benchmark_single.py
└── intent_model_single/   (saved trained model, not committed)

double_call/
β”œβ”€β”€ data_train.csv
β”œβ”€β”€ data_test.csv
β”œβ”€β”€ data_train_vscommand.csv
β”œβ”€β”€ data_test_vscommand.csv
β”œβ”€β”€ stage2_vscode_train.csv
β”œβ”€β”€ stage2_vscode_test.csv
β”œβ”€β”€ intent_classifier_stage1_classifier.py
β”œβ”€β”€ intent_classifier_stage2.py
β”œβ”€β”€ demo_hierarchical.py
β”œβ”€β”€ benchmark_for_both
β”œβ”€β”€ benchmark_subintent.py
└── intent_vscode_sub_model/   (saved trained sub-intent model, not committed)

.gitignore
README.md

The repository contains the training scripts and benchmarking tools used to build the models.

The Innovate AIDE application itself typically only downloads the trained model weights from Hugging Face.


Training

Train the Single-Call Model

Run:

python intent_classifier_single.py

This will:

  • load training and test data from CSV files
  • train a DistilBERT classifier
  • save the trained model to:

./intent_model_single

This folder is not committed to the repository.
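The exact CSV schema is defined by the training scripts; a plausible layout (an assumption — the actual column names may differ) is one text column and one label column:

```csv
text,label
"switch to main.py",switch_file
"make a new file called utils.py",create_new_file
"undo that",undo
"write a function that sorts a list",llm call
```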


Train the Hierarchical Model

The hierarchical system requires two models.

Train the Stage 1 classifier:

python intent_classifier_stage1_classifier.py

Train the Stage 2 VS Code sub-intent classifier:

python intent_classifier_stage2.py


Benchmarking

Benchmark scripts evaluate model performance using:

  • confusion matrix
  • F1 scores
    • macro
    • micro
    • weighted
    • per-class
  • latency metrics
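These F1 and confusion-matrix metrics can be computed with scikit-learn, which the benchmark scripts presumably use; the labels below are toy data for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy ground truth and predictions:
y_true = ["switch_file", "undo", "redo", "switch_file", "other"]
y_pred = ["switch_file", "undo", "undo", "switch_file", "other"]
labels = ["switch_file", "create_new_file", "undo", "redo", "other"]

print(f1_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="micro"))     # equals accuracy here
print(f1_score(y_true, y_pred, average="weighted"))
print(f1_score(y_true, y_pred, labels=labels, average=None))  # per-class
print(confusion_matrix(y_true, y_pred, labels=labels))
```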

Single-Call Benchmark

Run:

python benchmark_single.py

This script will:

  1. load the trained model from ./intent_model_single
  2. run predictions on data_test_single.csv
  3. compute F1 scores
  4. measure latency
  5. generate a confusion matrix visualization

Example output file:

confusion_matrix_data_test.png


Hierarchical Model Benchmarks

Two benchmark scripts are provided.

Full Pipeline Benchmark

benchmark_for_both

This evaluates:

  • Stage 1 classification
  • Stage 2 VS Code sub-intent classification
  • combined pipeline performance

Sub-Intent Benchmark

Run from the double_call directory:

cd double_call
python benchmark_subintent.py

This expects:

  • a trained model in ./intent_vscode_sub_model
  • the test dataset stage2_vscode_test.csv

The script will:

  • compute macro / micro / weighted F1 scores
  • measure latency
  • generate a confusion matrix image

confusion_matrix_subintent.png


Example Benchmark Output

================================================================================
F1 SCORES

F1 Score (Macro):    0.9833
F1 Score (Micro):    0.9833
F1 Score (Weighted): 0.9833

F1 Score per class:
  smart dictation : 0.9756
  vscode command  : 0.9744
  llm call        : 1.0000

Latency per Sample:
  Sample 1: 'print length of list' - 23.76 ms
  Sample 2: 'import json' - 25.51 ms

Overall Latency Statistics:
  Mean: 30.70 ms
  Std:  2.65 ms
  Min:  23.76 ms
  Max:  35.17 ms
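The per-sample latency figures above can be gathered with a simple timing loop; `predict` here is a stand-in for the real model call.

```python
import statistics
import time


def predict(text):
    # Stand-in for the DistilBERT forward pass.
    return "other"


samples = ["print length of list", "import json", "switch to main.py"]
latencies_ms = []
for s in samples:
    start = time.perf_counter()
    predict(s)
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"Mean: {statistics.mean(latencies_ms):.2f} ms")
print(f"Std:  {statistics.stdev(latencies_ms):.2f} ms")
print(f"Min:  {min(latencies_ms):.2f} ms")
print(f"Max:  {max(latencies_ms):.2f} ms")
```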


Intended Use

This model is designed for use inside the Innovate AIDE development assistant.

Its primary function is to classify developer instructions so the assistant can route the request to the correct system component.

Typical instruction types include:

  • switching files
  • creating new files
  • undo / redo
  • calling an LLM for code generation
  • dictating code
  • other editor commands

Notes

  • This repository hosts the intent classification model used by the Innovate AIDE project.
  • The Innovate AIDE application downloads the model from this Hugging Face repository at runtime.
  • Model directories generated during training are not committed to the repository.
Model size: 67M parameters (Safetensors, F32 tensors).