Whissle Agent LLM — LoRA Adapter for Qwen2.5-3B-Instruct

A LoRA adapter fine-tuned on top of Qwen/Qwen2.5-3B-Instruct for the Whissle AI voice assistant pipeline.

This model converts structured ASR perception (transcript + emotion + intent + entities) into structured JSON agent responses with:

  • SSML prosody tags (emotion, rate, pitch, emphasis, break markers) for TTS
  • Tool calls (calendar, weather, reminders, search, etc.)
  • Motivational Interviewing (MI) codes for empathetic, behaviorally-informed responses
  • Turn control markers for conversation flow management
  • A reasoning field for chain-of-thought (hidden from the user, used for quality monitoring)

Built by Whissle as part of the PromptingNemo framework.

Try It

Open the Colab notebook to test the model on a free T4 GPU — no local setup needed.

Evaluation Results — Base vs LoRA Fine-tuned

| Metric | Base (Qwen 2.5 3B) | LoRA Fine-tuned | Delta |
|---|---|---|---|
| JSON valid | 96% | 100% | +4% |
| Has reasoning | 42% | 96% | +54% |
| Has SSML | 48% | 100% | +52% |
| Has prosody/emotion | 0% | 98% | +98% |
| Has break tags | 8% | 90% | +82% |
| Has tool calls | 2% | 66% | +64% |
| Tool match (vs GT) | 0.0% | 54.5% | +54.5% |
| Has MI codes | 82% | 100% | +18% |
| Voice appropriate | 70% | 96% | +26% |
| Avg response words | 27.9 | 19.5 | -8.4 (more concise) |
| Inference time (50 samples) | 338s | 414s | +22% slower |
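
The card does not describe how these metrics were computed. As a rough sketch only (not the authors' evaluation script), the surface-level checks such as "JSON valid", "Has reasoning", "Has prosody/emotion", "Has break tags", and "Has tool calls" could be approximated as below. The function name `score_outputs`, the regular expressions, and the `<break time='200ms'/>` syntax in the sample are assumptions for illustration.

```python
import json
import re


def score_outputs(raw_outputs):
    """Return the percentage of raw model outputs that pass each check."""
    counts = {
        "json_valid": 0,
        "has_reasoning": 0,
        "has_prosody": 0,
        "has_break": 0,
        "has_tool_calls": 0,
    }
    for raw in raw_outputs:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output fails every check
        if not isinstance(obj, dict):
            continue  # a JSON object is expected, so other JSON types count as invalid here
        counts["json_valid"] += 1
        response = str(obj.get("response", ""))
        counts["has_reasoning"] += bool(obj.get("reasoning"))
        counts["has_prosody"] += bool(re.search(r"<prosody\b", response))
        counts["has_break"] += bool(re.search(r"<break\b", response))
        counts["has_tool_calls"] += bool(obj.get("tool_calls"))
    total = len(raw_outputs) or 1  # avoid division by zero on an empty list
    return {name: 100.0 * count / total for name, count in counts.items()}


# Two illustrative outputs: one well-formed response and one plain-text reply.
good = json.dumps({
    "turn_control": "RESPOND",
    "reasoning": "Weather query.",
    "response": "<prosody rate='medium'>Checking now.<break time='200ms'/></prosody>",
    "tool_calls": [{"tool": "get_weather", "parameters": {"city": "San Francisco"}}],
    "mi_codes_used": ["GI"],
})
scores = score_outputs([good, "Sure, here is the weather."])
print(scores)
```

Metrics such as "Tool match (vs GT)" and "Voice appropriate" need reference data or a judgment step and are not covered by this sketch.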

Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA (PEFT) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| LoRA dropout | 0.05 |
| Epochs | 3 |
| Learning rate | 0.0002 |
| Max sequence length | 2048 |
| Training samples | 5,171 |
| Validation samples | 272 |
| Precision | bf16 |
| Domains | general, finance, sales |
| Hardware | NVIDIA A100 40GB (GKE) |
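
To get a feel for what rank 32 on the four attention projections means in practice, the following back-of-the-envelope calculation estimates the number of trainable adapter parameters and the LoRA scaling factor (alpha / r). The model dimensions below are not stated in this card; they are assumed from the published Qwen2.5-3B configuration and should be checked against the base model's `config.json`.

```python
# Assumed Qwen2.5-3B dimensions (NOT from this card; verify against config.json).
HIDDEN_SIZE = 2048
NUM_LAYERS = 36
NUM_KV_HEADS = 2
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # grouped-query attention shrinks k_proj/v_proj outputs

# Values taken from the training table.
RANK = 32
ALPHA = 64

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_DIM),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_param_count(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per module."""
    return rank * (in_features + out_features)


params_per_layer = sum(
    lora_param_count(i, o, RANK) for i, o in TARGET_SHAPES.values()
)
total_params = params_per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # standard LoRA scaling applied to the low-rank update

print(f"trainable adapter parameters: {total_params:,}")  # 14,745,600 under these assumptions
print(f"LoRA scaling (alpha / r): {scaling}")  # 2.0
```

Under these assumed dimensions the adapter trains roughly 14.7M parameters, a small fraction of the roughly 3B parameters in the base model.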

How to Use

Quick Start — LoRA Adapter

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER = "WhissleAI/whissle-agent-lora-3b-test"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

SYSTEM_PROMPT = (
    "You are Lulu, a helpful AI voice assistant by Whissle. "
    "You receive structured perception from the ASR system showing the user's "
    "transcript, emotion, intent, and context. Generate a JSON response with "
    "turn_control, reasoning, response (with SSML prosody tags), tool_calls, "
    "and mi_codes_used. Keep responses under 2 sentences for voice. "
    "Available tools: search_web, set_reminder, check_calendar, send_message, "
    "play_music, get_weather, set_alarm, create_note, make_call, get_directions, "
    "add_to_list, set_timer, translate."
)

perception = {
    "transcript": "What's the weather like in San Francisco?",
    "emotion": "NEUTRAL",
    "speech_act": "QUESTION",
    "generic_intent": "REQUEST",
    "agent_intent": "WEATHER_CHECK",
    "urgency": "LOW",
    "language": "en",
    "domain": "general",
}

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"<|perception|>\n{json.dumps(perception, indent=2)}\n<|/perception|>"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

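# Sample a response; with do_sample=True the output can vary between runs.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens (everything after the prompt).
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# The model is trained to emit JSON, but a sampled output can still be malformed.
try:
    agent_response = json.loads(response)
    print(json.dumps(agent_response, indent=2))
except json.JSONDecodeError:
    print("Model output was not valid JSON:\n" + response)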
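
Because generation is sampled, the raw output may occasionally carry extra text around the JSON object. A minimal, more tolerant parser is sketched below; `extract_json_object` is a hypothetical helper, not part of the Whissle codebase, and it only uses the standard-library `json.JSONDecoder.raw_decode`.

```python
import json


def extract_json_object(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


noisy = 'Here you go: {"turn_control": "RESPOND", "tool_calls": []} Hope that helps.'
parsed = extract_json_object(noisy)
print(parsed)
```

In the Quick Start above, this could stand in for the plain `json.loads(response)` call when stray text around the JSON needs to be tolerated.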

Merge Adapter Weights (for faster inference)

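import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
OUTPUT_DIR = "./whissle-agent-merged"

# Load the base model and attach the LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, "WhissleAI/whissle-agent-lora-3b-test")

# Fold the adapter weights into the base weights and drop the PEFT wrappers.
model = model.merge_and_unload()
model.save_pretrained(OUTPUT_DIR)

# Save the tokenizer alongside the weights so the merged directory is self-contained.
AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True).save_pretrained(OUTPUT_DIR)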

Domain-Specific System Prompts

The model supports three domains, each with a specialized system prompt:

  • General — Personal assistant (alarms, weather, reminders, search)
  • Finance — Collections & payments with MI-adherent empathetic handling
  • Sales — Consultative selling with objection handling

See the training code for the full domain-specific system prompts.

Response Format

The model outputs structured JSON:

{
  "turn_control": "RESPOND",
  "reasoning": "Simple on/off request. Confirm action.",
  "response": "<prosody emotion='professional' rate='medium'>Turning off the lamp in the living room.</prosody>",
  "mi_codes_used": ["GI"],
  "tool_calls": [
    {
      "tool": "turn_off",
      "parameters": {"device": "living_room_lamp"}
    }
  ]
}

Fields

| Field | Description |
|---|---|
| turn_control | RESPOND, SELF, YIELD, or INTERRUPT; controls conversation flow |
| reasoning | Chain-of-thought (hidden from user, used for quality monitoring) |
| response | SSML-tagged text for TTS with prosody emotion, rate, pitch, and break markers |
| tool_calls | Array of tool invocations, each with a tool name and a parameters object |
| mi_codes_used | Motivational Interviewing behavior codes (AFFIRM, REFLECT, SUPPORT, GI, etc.) |
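
One way a downstream application might consume these fields is sketched below: strip the SSML tags to get display text, and route each entry in `tool_calls` to a local handler. The helper names (`strip_ssml`, `dispatch_tool_calls`) and the `turn_off` handler are hypothetical, and the sketch assumes each call carries `tool` and `parameters` keys as in the example response above.

```python
import re


def strip_ssml(text):
    """Remove SSML-style tags, leaving plain text for on-screen display."""
    return re.sub(r"<[^>]+>", "", text).strip()


def dispatch_tool_calls(agent_response, handlers):
    """Run each tool call through its handler; unknown tools are reported, not raised."""
    results = []
    for call in agent_response.get("tool_calls") or []:
        name = call.get("tool")
        handler = handlers.get(name)
        if handler is None:
            results.append((name, "unknown tool"))
            continue
        results.append((name, handler(**call.get("parameters", {}))))
    return results


# The example response from this card, as a Python dict.
agent_response = {
    "turn_control": "RESPOND",
    "reasoning": "Simple on/off request. Confirm action.",
    "response": "<prosody emotion='professional' rate='medium'>Turning off the lamp in the living room.</prosody>",
    "mi_codes_used": ["GI"],
    "tool_calls": [{"tool": "turn_off", "parameters": {"device": "living_room_lamp"}}],
}

# Hypothetical handler registry; a real application would call its own device or service APIs.
handlers = {"turn_off": lambda device: f"{device} off"}

display_text = strip_ssml(agent_response["response"])
tool_results = dispatch_tool_calls(agent_response, handlers)
print(display_text)
print(tool_results)
```

The tagged `response` string would still go to the TTS engine unchanged; only the display copy is stripped.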

Perception Input Format

The model expects ASR perception as a structured JSON block wrapped between <|perception|> and <|/perception|> tags:

{
  "transcript": "user's speech transcript",
  "emotion": "NEUTRAL|HAPPY|SAD|ANGRY|FEARFUL|SURPRISED|DISGUSTED",
  "speech_act": "QUESTION|COMMAND|STATEMENT|GREETING|FAREWELL",
  "generic_intent": "REQUEST|INFORM|CONFIRM|DENY|GREET",
  "agent_intent": "WEATHER_CHECK|ALARM_SET|REMINDER_SET|...",
  "urgency": "LOW|MEDIUM|HIGH|CRITICAL",
  "language": "en|hi|...",
  "domain": "general|finance|sales",
  "entities": [{"entity": "type", "value": "extracted_value"}],
  "mi_behavior": "DIRECT|REFLECT|AFFIRM|..."
}
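
A small helper can build the user message from a perception dict and catch obviously wrong field values before they reach the model. This is a sketch under assumptions: `build_perception_message` is a hypothetical name, and it treats the value lists shown above without a trailing "..." as exhaustive, which the card does not guarantee. Open-ended fields (`agent_intent`, `language`, `mi_behavior`) are passed through unchecked.

```python
import json

# Value sets copied from the schema above; assumed (not confirmed) to be complete.
ALLOWED_VALUES = {
    "emotion": {"NEUTRAL", "HAPPY", "SAD", "ANGRY", "FEARFUL", "SURPRISED", "DISGUSTED"},
    "speech_act": {"QUESTION", "COMMAND", "STATEMENT", "GREETING", "FAREWELL"},
    "generic_intent": {"REQUEST", "INFORM", "CONFIRM", "DENY", "GREET"},
    "urgency": {"LOW", "MEDIUM", "HIGH", "CRITICAL"},
    "domain": {"general", "finance", "sales"},
}


def build_perception_message(perception):
    """Validate a perception dict and wrap it in the <|perception|> tags."""
    if not perception.get("transcript"):
        raise ValueError("perception needs a non-empty 'transcript'")
    for field, allowed in ALLOWED_VALUES.items():
        value = perception.get(field)
        if value is not None and value not in allowed:
            raise ValueError(f"unexpected {field!r} value: {value!r}")
    body = json.dumps(perception, indent=2)
    return {"role": "user", "content": f"<|perception|>\n{body}\n<|/perception|>"}


message = build_perception_message({
    "transcript": "Remind me to call the bank tomorrow.",
    "emotion": "NEUTRAL",
    "speech_act": "COMMAND",
    "urgency": "MEDIUM",
    "domain": "general",
})
print(message["content"])
```

The returned dict has the same shape as the user message in the Quick Start, so it can be appended to `messages` directly.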

Limitations

  • Inference is ~22% slower than the base model due to LoRA adapter overhead (use merged weights for production)
  • Tool calls match the ground truth only 54.5% of the time; complex multi-tool scenarios need improvement
  • Trained primarily on English and Hinglish; other languages may produce degraded output
  • Break tags appear in 90% of responses; some edge cases may lack pauses or have suboptimal pause timing

Citation

@misc{whissle-agent-lora-2026,
  title={Whissle Agent LLM: LoRA Fine-tuned Qwen2.5-3B for Voice AI Agents},
  author={Whissle AI},
  year={2026},
  url={https://huggingface.co/WhissleAI/whissle-agent-lora-3b-test},
  note={Part of the PromptingNemo framework}
}

License

Apache 2.0
