JaneGPT v2 β€” Intent Classification Model

A lightweight, fast, and accurate intent classification model built from scratch for virtual assistant command understanding.

7.8M parameters | 22 intent classes | 88.6% validation accuracy | ~50ms inference on GPU

Loss Curves

(figure: training and validation loss curves)
Why I Built This

I'm building JANE β€” a fully offline, privacy-first AI voice assistant. Llama 3 8B was causing 10–22 second delays for simple commands like "turn up the volume."

That's not a voice assistant. That's a waiting game.

So I designed JaneGPT v2 from scratch β€” a model that does exactly one job, does it fast, and runs on consumer hardware without any cloud dependency.


Model Details

| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer + Classification Head |
| Parameters | ~7.8M |
| Embedding dim | 256 |
| Attention heads | 8 |
| KV heads (GQA) | 4 |
| Layers | 8 |
| FF hidden dim | 672 |
| Max sequence length | 256 |
| Vocab size | 8,192 |
| Tokenizer | Custom BPE |
| Training accuracy | ~96.7% |
| Validation accuracy | 88.6% |
| Checkpoint size | ~30MB |
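As a sanity check, the ~30MB checkpoint is consistent with storing ~7.8M parameters in fp32 (4 bytes each). A quick back-of-envelope calculation:

```python
# Back-of-envelope check: 7.8M fp32 parameters at 4 bytes each.
params = 7_800_000
bytes_fp32 = params * 4
print(f"{bytes_fp32 / 1024**2:.1f} MiB")  # 29.8 MiB, matching the ~30MB checkpoint
```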

Architecture Decisions & Why

| Choice | Reason |
|---|---|
| GQA (4 KV heads, 8 attention heads) | Reduces KV-cache memory without losing expressiveness |
| RoPE positional encoding | Better length generalization than learned embeddings |
| SwiGLU activation | Smoother gradients than ReLU at this model size |
| RMSNorm | Simpler and faster than LayerNorm |
| Custom BPE tokenizer | Trained specifically on command-style text |
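To make the GQA row concrete, here is a minimal illustrative sketch (not the actual model code) of how 8 query heads can share 4 KV heads: each KV head serves a contiguous group of query heads, which is what halves the KV-cache footprint.

```python
# Illustrative GQA head-sharing map: 8 query heads grouped onto 4 KV heads.
N_Q_HEADS = 8
N_KV_HEADS = 4
GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # 2 query heads per KV head

def kv_head_for(query_head: int) -> int:
    """Map a query-head index to the KV head it attends with."""
    return query_head // GROUP_SIZE

mapping = {q: kv_head_for(q) for q in range(N_Q_HEADS)}
print(mapping)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
```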

Supported Intents (22 classes)

| Category | Intents |
|---|---|
| Volume | volume_up, volume_down, volume_set, volume_mute |
| Brightness | brightness_up, brightness_down, brightness_set |
| Media | media_play, media_pause, media_next, media_previous |
| Apps | app_launch, app_close, app_switch |
| Browser | browser_search |
| Productivity | set_reminder, screenshot |
| Screen | read_screen, explain_screen |
| Control | undo, quit_jane |
| Conversation | chat |
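One way a caller might route these labels to actions is a simple dispatch table. This is a hedged sketch, not part of the model: the handler functions below are hypothetical, and unknown labels fall through to the catch-all chat handler.

```python
# Illustrative intent routing table (handler functions are hypothetical).
def volume_up():  print("raising volume")
def media_play(): print("starting playback")
def chat():       print("falling through to conversation")

HANDLERS = {
    "volume_up": volume_up,
    "media_play": media_play,
    "chat": chat,
    # ... one entry per supported intent
}

def dispatch(intent: str):
    # Labels not wired up above fall back to the chat handler.
    HANDLERS.get(intent, chat)()

dispatch("volume_up")   # raising volume
dispatch("volume_set")  # not in the table above -> chat fallback
```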

Performance

| Input | Predicted Intent | Confidence |
|---|---|---|
| "increase the volume" | volume_up | 86% |
| "make it louder" | volume_up | 90% |
| "turn down the brightness" | brightness_down | 80% |
| "open chrome" | app_launch | 98% |
| "play some music" | media_play | 96% |
| "search for cats on youtube" | browser_search | 94% |
| "set a reminder for 5 minutes" | set_reminder | 96% |
| "take a screenshot" | screenshot | 88% |
| "undo that" | undo | 92% |
| "hello" | chat | 97% |
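Since the confidences in the table above range from 80% to 98%, a caller may want a rejection threshold before acting on a prediction. A minimal sketch; the threshold value is an assumption for illustration, not something shipped with the model:

```python
# Illustrative confidence gating (threshold chosen arbitrarily for the sketch).
CONFIDENCE_THRESHOLD = 0.75

def gate(intent: str, confidence: float) -> str:
    """Act on the prediction only if it clears the threshold;
    otherwise fall back to the catch-all chat intent."""
    return intent if confidence >= CONFIDENCE_THRESHOLD else "chat"

print(gate("brightness_down", 0.80))  # brightness_down
print(gate("volume_up", 0.42))        # chat
```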

Quick Start

Installation

```shell
git clone https://huggingface.co/RavinduSen/JaneGPT-v2
cd JaneGPT-v2
pip install -r requirements.txt
```

Basic Usage

```python
from classifier import JaneGPTClassifier

classifier = JaneGPTClassifier()

intent, confidence = classifier.predict("turn up the volume")
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: volume_up, Confidence: 86.10%

intent, confidence = classifier.predict("open chrome")
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: app_launch, Confidence: 98.10%
```

With Conversation Context

```python
intent, confidence = classifier.predict(
    "not enough",
    context={"last_intent": "volume_up"}
)
# Output: Intent: volume_up, Confidence: 79.00%
```
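The context mechanism suggests a simple conversational loop where the caller threads the previous turn's intent into the next call. A hedged sketch of that loop; JaneGPTClassifier is replaced with a stub here so the snippet runs standalone, and the stub's behavior is invented for illustration:

```python
# Sketch of threading conversation context between turns.
# The real classifier comes from `classifier.JaneGPTClassifier`; this stub
# only mimics the predict(text, context=...) -> (intent, confidence) shape.
class StubClassifier:
    def predict(self, text, context=None):
        if context and text in {"not enough", "more", "again"}:
            return context["last_intent"], 0.79  # follow-up resolved via context
        return ("volume_up", 0.86) if "volume" in text else ("chat", 0.97)

classifier = StubClassifier()
last_intent = None
for utterance in ["turn up the volume", "not enough"]:
    context = {"last_intent": last_intent} if last_intent else None
    intent, conf = classifier.predict(utterance, context=context)
    print(f"{utterance!r} -> {intent} ({conf:.0%})")
    last_intent = intent  # carry this turn's result into the next turn
```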

Training Setup

| Component | Details |
|---|---|
| Hardware | NVIDIA RTX 3050 Ti (4GB VRAM) |
| CPU | AMD Ryzen 9 5900HX |
| RAM | 16GB |
| Additional | Google Colab (extended training runs) |
| Framework | PyTorch 2.0+ |
| Training data | Custom command dataset (Claude-assisted generation under author supervision) |

Limitations

  • Intent classification only β€” does not generate text
  • 22 classes — commands outside the supported set are classified as chat
  • English only
  • Optimized for short inputs (1–15 words)
  • No entity extraction β€” returns intent label only

Use Cases

  • Virtual assistant command routing
  • Smart home intent classification
  • Voice command understanding
  • Chatbot intent detection
  • Edge device deployment (small enough for embedded systems)

Part of the JANE Project

This model is the intelligence core of JANE β€” a fully offline, privacy-first AI voice assistant.

πŸ”— JANE AI Assistant on GitHub πŸ”— JaneGPT-v2 on GitHub


Created By

Ravindu Senanayake β€” Computer Science Undergraduate, Sri Lanka

Built from scratch β€” architecture, tokenizer, and training pipeline designed and implemented by the author.

GitHub
