# JaneGPT v2: Intent Classification Model
A lightweight, fast, and accurate intent classification model built from scratch for virtual assistant command understanding.
7.8M parameters | 22 intent classes | 88.6% validation accuracy | ~50ms inference on GPU
## Why I Built This
I'm building JANE, a fully offline, privacy-first AI voice assistant. Llama 3 8B was causing 10-22 second delays for simple commands like "turn up the volume."

That's not a voice assistant. That's a waiting game.

So I designed JaneGPT v2 from scratch: a model that does exactly one job, does it fast, and runs on consumer hardware without any cloud dependency.
## Model Details
| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer + Classification Head |
| Parameters | ~7.8M |
| Embedding dim | 256 |
| Attention heads | 8 |
| KV heads (GQA) | 4 |
| Layers | 8 |
| FF hidden dim | 672 |
| Max sequence length | 256 |
| Vocab size | 8,192 |
| Tokenizer | Custom BPE |
| Training accuracy | ~96.7% |
| Validation accuracy | 88.6% |
| Checkpoint size | ~30MB |
## Architecture Decisions & Why
| Choice | Reason |
|---|---|
| GQA (4 KV heads, 8 attention heads) | Reduces memory without losing expressiveness |
| RoPE positional encoding | Better length generalization than learned embeddings |
| SwiGLU activation | Smoother gradients than ReLU at this model size |
| RMSNorm | Simpler and faster than LayerNorm |
| Custom BPE tokenizer | Trained specifically on command-style text |
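To make the RMSNorm choice concrete, here is a minimal dependency-free sketch of the operation. The real model presumably uses a PyTorch implementation; this function is illustrative only.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: rescale x by its root-mean-square, then apply a learned gain.

    Unlike LayerNorm, it skips mean subtraction and the bias term entirely,
    which is why it is simpler and slightly cheaper at this model size.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

# With a unit gain, the output has RMS ≈ 1 regardless of input scale.
out = rms_norm([3.0, 4.0], [1.0, 1.0])
```

Fewer reductions per token (no mean pass, no bias add) is the whole appeal over LayerNorm here.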
## Supported Intents (22 classes)
| Category | Intents |
|---|---|
| Volume | volume_up, volume_down, volume_set, volume_mute |
| Brightness | brightness_up, brightness_down, brightness_set |
| Media | media_play, media_pause, media_next, media_previous |
| Apps | app_launch, app_close, app_switch |
| Browser | browser_search |
| Productivity | set_reminder, screenshot |
| Screen | read_screen, explain_screen |
| Control | undo, quit_jane |
| Conversation | chat |
## Performance
| Input | Predicted Intent | Confidence |
|---|---|---|
| "increase the volume" | volume_up | 86% |
| "make it louder" | volume_up | 90% |
| "turn down the brightness" | brightness_down | 80% |
| "open chrome" | app_launch | 98% |
| "play some music" | media_play | 96% |
| "search for cats on youtube" | browser_search | 94% |
| "set a reminder for 5 minutes" | set_reminder | 96% |
| "take a screenshot" | screenshot | 88% |
| "undo that" | undo | 92% |
| "hello" | chat | 97% |
## Quick Start

### Installation

```bash
git clone https://huggingface.co/RavinduSen/JaneGPT-v2
cd JaneGPT-v2
pip install -r requirements.txt
```
### Basic Usage

```python
from classifier import JaneGPTClassifier

classifier = JaneGPTClassifier()

intent, confidence = classifier.predict("turn up the volume")
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: volume_up, Confidence: 86.10%

intent, confidence = classifier.predict("open chrome")
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: app_launch, Confidence: 98.10%
```
### With Conversation Context

```python
intent, confidence = classifier.predict(
    "not enough",
    context={"last_intent": "volume_up"}
)
# Output: Intent: volume_up, Confidence: 79.00%
```
## Training Setup
| Component | Details |
|---|---|
| Hardware | NVIDIA RTX 3050Ti (4GB VRAM) |
| CPU | AMD Ryzen 9 5900HX |
| RAM | 16GB |
| Additional | Google Colab (extended training runs) |
| Framework | PyTorch 2.0+ |
| Training data | Custom command dataset (Claude-assisted generation under author supervision) |
## Limitations

- Intent classification only: does not generate text
- 22 classes: commands outside the supported set are classified as `chat`
- English only
- Optimized for short inputs (1-15 words)
- No entity extraction: returns the intent label only
## Use Cases
- Virtual assistant command routing
- Smart home intent classification
- Voice command understanding
- Chatbot intent detection
- Edge device deployment (small enough for embedded systems)
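For command routing, a thin dispatcher over the classifier's (intent, confidence) pair is usually enough. The sketch below is hypothetical glue code, not part of this repository: the `HANDLERS` table, the handler bodies, and the 0.7 threshold are all assumptions chosen for illustration.

```python
# Hypothetical routing layer over (intent, confidence) output.
# Handler names and the 0.7 threshold are illustrative, not part of this repo.
HANDLERS = {
    "volume_up": lambda: "raising volume",
    "app_launch": lambda: "launching app",
    "chat": lambda: "handing off to conversation model",
}

def route(intent: str, confidence: float, threshold: float = 0.7) -> str:
    # Unknown or low-confidence intents fall back to the chat path,
    # mirroring the model's own behavior for out-of-set commands.
    if confidence < threshold or intent not in HANDLERS:
        intent = "chat"
    return HANDLERS[intent]()

print(route("volume_up", 0.86))  # raising volume
print(route("volume_up", 0.40))  # handing off to conversation model
```

In JANE itself, something like this would sit between `classifier.predict()` and the action layer.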
## Part of the JANE Project

This model is the intelligence core of JANE, a fully offline, privacy-first AI voice assistant.

- JANE AI Assistant on GitHub
- JaneGPT-v2 on GitHub
## Created By

Ravindu Senanayake, Computer Science Undergraduate, Sri Lanka

Built from scratch: architecture, tokenizer, and training pipeline designed and implemented by the author.