Whissle Agent LLM — LoRA Adapter for Qwen2.5-3B-Instruct
A LoRA adapter fine-tuned on top of Qwen/Qwen2.5-3B-Instruct for the Whissle AI voice assistant pipeline.
This model converts structured ASR perception (transcript + emotion + intent + entities) into structured JSON agent responses with:
- SSML prosody tags (emotion, rate, pitch, emphasis, break markers) for TTS
- Tool calls (calendar, weather, reminders, search, etc.)
- Motivational Interviewing (MI) codes for empathetic, behaviorally-informed responses
- Turn control markers for conversation flow management
- Reasoning field for chain-of-thought (hidden from user, used for quality)
Built by Whissle as part of the PromptingNemo framework.
Try It
Open the Colab notebook to test the model on a free T4 GPU — no local setup needed.
Evaluation Results — Base vs LoRA Fine-tuned
| Metric | Base (Qwen2.5-3B-Instruct) | LoRA fine-tuned | Delta |
|---|---|---|---|
| JSON valid | 96% | 100% | +4 pp |
| Has reasoning | 42% | 96% | +54 pp |
| Has SSML | 48% | 100% | +52 pp |
| Has prosody/emotion | 0% | 98% | +98 pp |
| Has break tags | 8% | 90% | +82 pp |
| Has tool calls | 2% | 66% | +64 pp |
| Tool match (vs ground truth) | 0.0% | 54.5% | +54.5 pp |
| Has MI codes | 82% | 100% | +18 pp |
| Voice appropriate | 70% | 96% | +26 pp |
| Avg response words | 27.9 | 19.5 | -8.4 words (more concise) |
| Inference time (50 samples) | 338s | 414s | +22% (slower) |
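The timing row can be converted into per-sample latency with a little arithmetic. The sketch below uses only the two totals and the sample count from the table; the variable names are illustrative.

```python
# Figures taken from the evaluation table (50 samples per run).
NUM_SAMPLES = 50
base_total_s = 338.0
lora_total_s = 414.0

base_per_sample = base_total_s / NUM_SAMPLES  # seconds per response, base model
lora_per_sample = lora_total_s / NUM_SAMPLES  # seconds per response, with adapter
slowdown_pct = (lora_total_s / base_total_s - 1.0) * 100.0

print(f"base: {base_per_sample:.2f} s/sample")
print(f"lora: {lora_per_sample:.2f} s/sample")
print(f"relative slowdown: {slowdown_pct:.1f}%")
```

This works out to roughly 6.76 s versus 8.28 s per sample, a slowdown of about 22.5%, which the table rounds to 22%.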
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA (PEFT) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| LoRA dropout | 0.05 |
| Epochs | 3 |
| Learning rate | 0.0002 |
| Max sequence length | 2048 |
| Training samples | 5,171 |
| Validation samples | 272 |
| Precision | bf16 |
| Domains | general, finance, sales |
| Hardware | NVIDIA A100 40GB (GKE) |
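For anyone trying to reproduce a similar adapter, the LoRA-specific rows of the table can be collected into keyword arguments. This is a sketch under assumptions, not the actual training configuration: the dictionary name is illustrative, and the card does not state the quantization settings, batch size, or scheduler used for QLoRA.

```python
# Hyperparameters copied from the Training Details table.
lora_kwargs = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {scaling}")

# Assumption: with `peft` installed, these would typically be passed as
#   from peft import LoraConfig
#   config = LoraConfig(task_type="CAUSAL_LM", **lora_kwargs)
# The 4-bit quantization settings for QLoRA are not given in this card.
```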
How to Use
Quick Start — LoRA Adapter
```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER = "WhissleAI/whissle-agent-lora-3b-test"

# Load the base model in bf16, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

SYSTEM_PROMPT = (
    "You are Lulu, a helpful AI voice assistant by Whissle. "
    "You receive structured perception from the ASR system showing the user's "
    "transcript, emotion, intent, and context. Generate a JSON response with "
    "turn_control, reasoning, response (with SSML prosody tags), tool_calls, "
    "and mi_codes_used. Keep responses under 2 sentences for voice. "
    "Available tools: search_web, set_reminder, check_calendar, send_message, "
    "play_music, get_weather, set_alarm, create_note, make_call, get_directions, "
    "add_to_list, set_timer, translate."
)

# Structured perception as produced by the ASR system.
perception = {
    "transcript": "What's the weather like in San Francisco?",
    "emotion": "NEUTRAL",
    "speech_act": "QUESTION",
    "generic_intent": "REQUEST",
    "agent_intent": "WEATHER_CHECK",
    "urgency": "LOW",
    "language": "en",
    "domain": "general",
}

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"<|perception|>\n{json.dumps(perception, indent=2)}\n<|/perception|>"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
agent_response = json.loads(response)  # raises json.JSONDecodeError on malformed output
print(json.dumps(agent_response, indent=2))
```
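Because the Quick Start samples with `temperature=0.7`, an occasional response may arrive wrapped in extra text or a Markdown fence, in which case a bare `json.loads` call would raise. The helper below is a minimal, hypothetical sketch (the name `parse_agent_json` is not part of the model or any library) that scans for the first parseable JSON object using the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Optional


def parse_agent_json(text: str) -> Optional[dict]:
    """Return the first JSON object found in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


wrapped = '```json\n{"turn_control": "RESPOND", "tool_calls": []}\n```'
print(parse_agent_json(wrapped))
```

A caller could fall back to a retry or a canned reply when the function returns `None`.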
Merge Adapter Weights (for faster inference)
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model and attach the adapter.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, "WhissleAI/whissle-agent-lora-3b-test")

# Fold the LoRA weights into the base weights and drop the adapter wrappers.
model = model.merge_and_unload()
model.save_pretrained("./whissle-agent-merged")
```
Domain-Specific System Prompts
The model supports three domains, each with a specialized system prompt:
- General — Personal assistant (alarms, weather, reminders, search)
- Finance — Collections & payments with MI-adherent empathetic handling
- Sales — Consultative selling with objection handling
See the training code for the full domain-specific system prompts.
Response Format
The model outputs structured JSON:
```json
{
  "turn_control": "RESPOND",
  "reasoning": "Simple on/off request. Confirm action.",
  "response": "<prosody emotion='professional' rate='medium'>Turning off the lamp in the living room.</prosody>",
  "mi_codes_used": ["GI"],
  "tool_calls": [
    {
      "tool": "turn_off",
      "parameters": {"device": "living_room_lamp"}
    }
  ]
}
```
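The `response` field in the example above carries markup intended for the TTS engine, so a plain-text version can be handy for logs or on-screen captions. The following is a small sketch, assuming tags never contain a literal `>` inside attribute values; `strip_ssml` is a hypothetical helper, not part of the model's tooling.

```python
import re

# Matches any opening, closing, or self-closing tag such as <prosody ...> or <break .../>.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_ssml(ssml: str) -> str:
    """Remove markup tags and collapse whitespace, leaving the spoken text."""
    text = _TAG_RE.sub(" ", ssml)
    return " ".join(text.split())


example = (
    "<prosody emotion='professional' rate='medium'>"
    "Turning off the lamp in the living room.</prosody>"
)
print(strip_ssml(example))
```

A regular expression is enough for flat markup like this; deeply nested or malformed SSML would be better handled by an XML parser.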
Fields
| Field | Description |
|---|---|
| turn_control | One of RESPOND, SELF, YIELD, INTERRUPT; controls conversation flow |
| reasoning | Chain-of-thought (hidden from user, used for quality monitoring) |
| response | SSML-tagged text for TTS with prosody emotion, rate, pitch, and break markers |
| tool_calls | Array of tool invocations, each with a `tool` name and a `parameters` object |
| mi_codes_used | Motivational Interviewing behavior codes (AFFIRM, REFLECT, SUPPORT, GI, etc.) |
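Before handing a parsed response to downstream components, it can help to check it against the fields described above. The sketch below is one possible check; the function name is hypothetical, and treating all five fields as required is an assumption based on the Quick Start system prompt rather than a documented schema.

```python
from typing import Any, List

# Assumption: all five fields are expected in every response.
REQUIRED_FIELDS = ("turn_control", "reasoning", "response", "tool_calls", "mi_codes_used")
TURN_CONTROL_VALUES = {"RESPOND", "SELF", "YIELD", "INTERRUPT"}


def validate_agent_response(obj: Any) -> List[str]:
    """Return a list of problems; an empty list means the object looks well-formed."""
    if not isinstance(obj, dict):
        return ["response is not a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in obj]
    if "turn_control" in obj:
        value = obj["turn_control"]
        if not isinstance(value, str) or value not in TURN_CONTROL_VALUES:
            problems.append(f"unknown turn_control: {value!r}")
    if "response" in obj and not isinstance(obj["response"], str):
        problems.append("response must be a string")
    for key in ("tool_calls", "mi_codes_used"):
        if key in obj and not isinstance(obj[key], list):
            problems.append(f"{key} must be a list")
    return problems


sample = {
    "turn_control": "RESPOND",
    "reasoning": "Simple on/off request. Confirm action.",
    "response": "<prosody rate='medium'>Turning off the lamp.</prosody>",
    "mi_codes_used": ["GI"],
    "tool_calls": [{"tool": "turn_off", "parameters": {"device": "living_room_lamp"}}],
}
print(validate_agent_response(sample))
```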
Perception Input Format
The model expects ASR perception as a structured JSON block wrapped in <|perception|> tags:
```json
{
  "transcript": "user's speech transcript",
  "emotion": "NEUTRAL|HAPPY|SAD|ANGRY|FEARFUL|SURPRISED|DISGUSTED",
  "speech_act": "QUESTION|COMMAND|STATEMENT|GREETING|FAREWELL",
  "generic_intent": "REQUEST|INFORM|CONFIRM|DENY|GREET",
  "agent_intent": "WEATHER_CHECK|ALARM_SET|REMINDER_SET|...",
  "urgency": "LOW|MEDIUM|HIGH|CRITICAL",
  "language": "en|hi|...",
  "domain": "general|finance|sales",
  "entities": [{"entity": "type", "value": "extracted_value"}],
  "mi_behavior": "DIRECT|REFLECT|AFFIRM|..."
}
```
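Wrapping a perception dictionary in the `<|perception|>` tags can be factored into a small helper that mirrors the formatting used in the Quick Start. The sketch below also flags values outside the enumerations shown above; both function names are hypothetical, and treating those enumerations as exhaustive is an assumption (fields whose lists end in `...` are left unchecked).

```python
import json
from typing import Dict, List, Set

# Assumption: these value lists are complete as written in the format above.
CLOSED_FIELDS: Dict[str, Set[str]] = {
    "emotion": {"NEUTRAL", "HAPPY", "SAD", "ANGRY", "FEARFUL", "SURPRISED", "DISGUSTED"},
    "speech_act": {"QUESTION", "COMMAND", "STATEMENT", "GREETING", "FAREWELL"},
    "generic_intent": {"REQUEST", "INFORM", "CONFIRM", "DENY", "GREET"},
    "urgency": {"LOW", "MEDIUM", "HIGH", "CRITICAL"},
    "domain": {"general", "finance", "sales"},
}


def check_perception(perception: dict) -> List[str]:
    """Return the names of present fields whose value is outside the listed set."""
    unexpected = []
    for name, allowed in CLOSED_FIELDS.items():
        if name in perception:
            value = perception[name]
            if not isinstance(value, str) or value not in allowed:
                unexpected.append(name)
    return unexpected


def build_perception_message(perception: dict) -> dict:
    """Build the user chat message with the perception JSON inside the tags."""
    body = json.dumps(perception, indent=2)
    return {"role": "user", "content": f"<|perception|>\n{body}\n<|/perception|>"}


message = build_perception_message({"transcript": "Set a timer", "urgency": "LOW"})
print(message["content"])
```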
Limitations
- Inference is ~22% slower than base model due to LoRA adapter overhead (use merged weights for production)
- Tool call accuracy is 54.5% vs ground truth — complex multi-tool scenarios need improvement
- Trained primarily on English and Hinglish; other languages may produce degraded output
- Break tags are present in 90% of responses; some edge cases may have suboptimal pause timing
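Given the tool-match rate noted above, a caller may want to check each proposed tool call against an allowlist before executing anything. The sketch below is one way to do that; `split_tool_calls` is a hypothetical helper, and `ALLOWED_TOOLS` is copied from the Quick Start system prompt, so deployments using other domain prompts would need their own list.

```python
from typing import Any, List, Tuple

# Tool names listed in the Quick Start system prompt.
ALLOWED_TOOLS = {
    "search_web", "set_reminder", "check_calendar", "send_message",
    "play_music", "get_weather", "set_alarm", "create_note", "make_call",
    "get_directions", "add_to_list", "set_timer", "translate",
}


def split_tool_calls(tool_calls: List[Any]) -> Tuple[List[Any], List[Any]]:
    """Separate calls to known tools from everything else."""
    accepted, rejected = [], []
    for call in tool_calls:
        name = call.get("tool") if isinstance(call, dict) else None
        if isinstance(name, str) and name in ALLOWED_TOOLS:
            accepted.append(call)
        else:
            rejected.append(call)
    return accepted, rejected


calls = [
    {"tool": "get_weather", "parameters": {"location": "San Francisco"}},
    {"tool": "turn_off", "parameters": {"device": "living_room_lamp"}},
]
accepted, rejected = split_tool_calls(calls)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Rejected calls can be logged for later review rather than silently dropped, which also gives a signal for where further fine-tuning might help.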
Citation
```bibtex
@misc{whissle-agent-lora-2026,
  title={Whissle Agent LLM: LoRA Fine-tuned Qwen2.5-3B for Voice AI Agents},
  author={Whissle AI},
  year={2026},
  url={https://huggingface.co/WhissleAI/whissle-agent-lora-3b-test},
  note={Part of the PromptingNemo framework}
}
```
License
Apache 2.0