Whissle Agent LLM — LoRA Adapter for Qwen2.5-3B-Instruct

A LoRA adapter fine-tuned on top of Qwen/Qwen2.5-3B-Instruct for the Whissle AI voice assistant pipeline.

This model converts structured ASR perception (transcript + emotion + intent + entities) into structured JSON agent responses with:

SSML prosody tags (emotion, rate, pitch, emphasis, break markers) for TTS
Tool calls (calendar, weather, reminders, search, etc.)
Motivational Interviewing (MI) codes for empathetic, behaviorally-informed responses
Turn control markers for conversation flow management
Reasoning field for chain-of-thought (hidden from the user, used for quality monitoring)

Built by Whissle as part of the PromptingNemo framework.

Try It

Open the Colab notebook to test the model on a free T4 GPU — no local setup needed.

Evaluation Results — Base vs LoRA Fine-tuned

Metric	Base (Qwen2.5-3B-Instruct)	LoRA Fine-tuned	Delta
JSON valid	96%	100%	+4 pp
Has reasoning	42%	96%	+54 pp
Has SSML	48%	100%	+52 pp
Has prosody/emotion	0%	98%	+98 pp
Has break tags	8%	90%	+82 pp
Has tool calls	2%	66%	+64 pp
Tool match (vs GT)	0.0%	54.5%	+54.5 pp
Has MI codes	82%	100%	+18 pp
Voice appropriate	70%	96%	+26 pp
Avg response words	27.9	19.5	-8.4 (more concise)
Inference time (50 samples)	338s	414s	+22% (slower)
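
The structural metrics in the table (JSON valid, Has reasoning, Has SSML, Has break tags, Has tool calls) can be approximated with simple checks on the raw model output. The sketch below is an assumption about how such checks might look, not the evaluation script that produced the numbers above; the function name `structural_metrics` and the regex patterns are illustrative.

```python
import json
import re

def structural_metrics(raw_outputs):
    """Return the fraction of outputs that pass each structural check."""
    counts = {"json_valid": 0, "has_reasoning": 0, "has_ssml": 0,
              "has_break": 0, "has_tool_calls": 0}
    for raw in raw_outputs:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON fails every check
        if not isinstance(obj, dict):
            continue
        counts["json_valid"] += 1
        response = str(obj.get("response", ""))
        if obj.get("reasoning"):
            counts["has_reasoning"] += 1
        # Assumption: any of these tags counts as "has SSML".
        if re.search(r"<(prosody|break|emphasis)\b", response):
            counts["has_ssml"] += 1
        if re.search(r"<break\b", response):
            counts["has_break"] += 1
        if obj.get("tool_calls"):
            counts["has_tool_calls"] += 1
    total = len(raw_outputs) or 1  # avoid division by zero on empty input
    return {name: n / total for name, n in counts.items()}
```

Tool match against ground truth and voice appropriateness need reference data and a length policy, so they are left out of this sketch.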

Training Details

Parameter	Value
Base model	`Qwen/Qwen2.5-3B-Instruct`
Method	QLoRA (PEFT)
LoRA rank (r)	32
LoRA alpha	64
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`
LoRA dropout	0.05
Epochs	3
Learning rate	0.0002
Max sequence length	2048
Training samples	5,171
Validation samples	272
Precision	bf16
Domains	general, finance, sales
Hardware	NVIDIA A100 40GB (GKE)
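
For anyone reproducing the adapter setup, the LoRA rows of the table map onto keyword arguments of `peft.LoraConfig`. This is a minimal sketch, not the training script: `task_type` is an assumption (it is not listed in the table), and the quantization settings used for QLoRA are not stated here, so they are omitted.

```python
# Keyword arguments mirroring the LoRA rows of the Training Details table.
lora_kwargs = {
    "r": 32,
    "lora_alpha": 64,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.05,
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}

# With peft installed, the config would be built as:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {scaling}")
```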

How to Use

Quick Start — LoRA Adapter

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER = "WhissleAI/whissle-agent-lora-3b-test"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

SYSTEM_PROMPT = (
    "You are Lulu, a helpful AI voice assistant by Whissle. "
    "You receive structured perception from the ASR system showing the user's "
    "transcript, emotion, intent, and context. Generate a JSON response with "
    "turn_control, reasoning, response (with SSML prosody tags), tool_calls, "
    "and mi_codes_used. Keep responses under 2 sentences for voice. "
    "Available tools: search_web, set_reminder, check_calendar, send_message, "
    "play_music, get_weather, set_alarm, create_note, make_call, get_directions, "
    "add_to_list, set_timer, translate."
)

perception = {
    "transcript": "What's the weather like in San Francisco?",
    "emotion": "NEUTRAL",
    "speech_act": "QUESTION",
    "generic_intent": "REQUEST",
    "agent_intent": "WEATHER_CHECK",
    "urgency": "LOW",
    "language": "en",
    "domain": "general",
}

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"<|perception|>\n{json.dumps(perception, indent=2)}\n<|/perception|>"},
]

# Render the chat template to a prompt string, then tokenize it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# json.loads raises json.JSONDecodeError if the output is not valid JSON.
agent_response = json.loads(response)
print(json.dumps(agent_response, indent=2))
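
Because generation is sampled, a caller may want a fallback for output that carries extra text around the JSON object. The helper below is a hedged sketch, not part of the Whissle pipeline; `extract_json_object` is a hypothetical name, and it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json

def extract_json_object(text):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Try the next opening brace if this one did not decode.
        start = text.find("{", start + 1)
    return None

print(extract_json_object('Sure! {"turn_control": "RESPOND"} Done.'))
```

Returning `None` instead of raising lets the caller decide whether to retry generation or fall back to a default reply.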

Merge Adapter Weights (for faster inference)

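import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, "WhissleAI/whissle-agent-lora-3b-test")
# Fold the LoRA deltas into the base weights and drop the adapter wrappers.
model = model.merge_and_unload()
model.save_pretrained("./whissle-agent-merged")
# Save the tokenizer alongside so the directory loads without the base repo.
AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained("./whissle-agent-merged")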

Domain-Specific System Prompts

The model supports three domains, each with a specialized system prompt:

General — Personal assistant (alarms, weather, reminders, search)
Finance — Collections & payments with MI-adherent empathetic handling
Sales — Consultative selling with objection handling

See the training code for the full domain-specific system prompts.

Response Format

The model outputs structured JSON:

{
  "turn_control": "RESPOND",
  "reasoning": "Simple on/off request. Confirm action.",
  "response": "<prosody emotion='professional' rate='medium'>Turning off the lamp in the living room.</prosody>",
  "mi_codes_used": ["GI"],
  "tool_calls": [
    {
      "tool": "turn_off",
      "parameters": {"device": "living_room_lamp"}
    }
  ]
}
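
One way an application might act on `tool_calls` is to route each entry to a registered handler. The sketch below assumes the `tool` and `parameters` keys shown in the example above; `dispatch_tool_calls` and the `turn_off` handler are hypothetical names for illustration, not part of the released code.

```python
def dispatch_tool_calls(agent_response, handlers):
    """Run each tool call through its handler and collect results in order."""
    results = []
    for call in agent_response.get("tool_calls") or []:
        name = call.get("tool")
        params = call.get("parameters") or {}
        handler = handlers.get(name)
        if handler is None:
            # Unknown tools are reported rather than raising.
            results.append({"tool": name, "error": "no handler registered"})
            continue
        results.append({"tool": name, "result": handler(**params)})
    return results

# Hypothetical handler matching the example response above.
def turn_off(device):
    return f"{device} is off"

example = {
    "turn_control": "RESPOND",
    "tool_calls": [
        {"tool": "turn_off", "parameters": {"device": "living_room_lamp"}}
    ],
}
print(dispatch_tool_calls(example, {"turn_off": turn_off}))
```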

Fields

Field	Description
`turn_control`	`RESPOND`, `SELF`, `YIELD`, `INTERRUPT` — controls conversation flow
`reasoning`	Chain-of-thought (hidden from user, used for quality monitoring)
`response`	SSML-tagged text for TTS with prosody emotion, rate, pitch, and break markers
`tool_calls`	Array of tool invocations, each with a `tool` name and a `parameters` object
`mi_codes_used`	Motivational Interviewing behavior codes (AFFIRM, REFLECT, SUPPORT, GI, etc.)
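
Before passing a parsed response to TTS or tool execution, a light structural check against the fields above can catch malformed output. This is a sketch under stated assumptions: it treats all five fields as required (the model may emit an empty `tool_calls` list), and `validate_agent_response` and `strip_ssml` are hypothetical helper names.

```python
import re

REQUIRED_FIELDS = {
    "turn_control": str,
    "reasoning": str,
    "response": str,
    "tool_calls": list,
    "mi_codes_used": list,
}
TURN_CONTROL_VALUES = {"RESPOND", "SELF", "YIELD", "INTERRUPT"}

def validate_agent_response(obj):
    """Return a list of problems; an empty list means the object looks well-formed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif not isinstance(obj[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    turn_control = obj.get("turn_control")
    if isinstance(turn_control, str) and turn_control not in TURN_CONTROL_VALUES:
        problems.append(f"unknown turn_control: {turn_control}")
    return problems

def strip_ssml(text):
    """Remove SSML tags to get plain text, e.g. for logs or captions."""
    return re.sub(r"<[^>]+>", "", text).strip()
```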

Perception Input Format

The model expects ASR perception as a structured JSON block wrapped between <|perception|> and <|/perception|> tags. In the schema below, | separates the allowed values for a field:

{
  "transcript": "user's speech transcript",
  "emotion": "NEUTRAL|HAPPY|SAD|ANGRY|FEARFUL|SURPRISED|DISGUSTED",
  "speech_act": "QUESTION|COMMAND|STATEMENT|GREETING|FAREWELL",
  "generic_intent": "REQUEST|INFORM|CONFIRM|DENY|GREET",
  "agent_intent": "WEATHER_CHECK|ALARM_SET|REMINDER_SET|...",
  "urgency": "LOW|MEDIUM|HIGH|CRITICAL",
  "language": "en|hi|...",
  "domain": "general|finance|sales",
  "entities": [{"entity": "type", "value": "extracted_value"}],
  "mi_behavior": "DIRECT|REFLECT|AFFIRM|..."
}
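
A small helper can check a perception dict and wrap it in the tags used in the Quick Start. This is a sketch: `build_perception_message` is a hypothetical name, and it assumes the value lists shown above without a trailing "..." are exhaustive. Fields whose lists are elided (`agent_intent`, `language`, `mi_behavior`) are passed through unchecked.

```python
import json

# Closed value sets taken from the schema above (assumed exhaustive).
ALLOWED_VALUES = {
    "emotion": {"NEUTRAL", "HAPPY", "SAD", "ANGRY", "FEARFUL", "SURPRISED", "DISGUSTED"},
    "speech_act": {"QUESTION", "COMMAND", "STATEMENT", "GREETING", "FAREWELL"},
    "generic_intent": {"REQUEST", "INFORM", "CONFIRM", "DENY", "GREET"},
    "urgency": {"LOW", "MEDIUM", "HIGH", "CRITICAL"},
    "domain": {"general", "finance", "sales"},
}

def build_perception_message(perception):
    """Validate a perception dict and return it as a chat 'user' message."""
    if not perception.get("transcript"):
        raise ValueError("transcript is required")
    for field, allowed in ALLOWED_VALUES.items():
        value = perception.get(field)
        if value is not None and value not in allowed:
            raise ValueError(f"unexpected {field}: {value!r}")
    body = json.dumps(perception, indent=2)
    return {"role": "user", "content": f"<|perception|>\n{body}\n<|/perception|>"}
```

The returned dict can be appended after the system message in the `messages` list from the Quick Start.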

Limitations

Inference is ~22% slower than the base model (414s vs 338s for 50 samples) due to LoRA adapter overhead; use merged weights for production
Tool match against ground truth is 54.5%; complex multi-tool scenarios need improvement
Trained primarily on English and Hinglish; other languages may produce degraded output
Break tags are present in 90% of responses; some edge cases may have suboptimal pause timing

Citation

@misc{whissle-agent-lora-2026,
  title={Whissle Agent LLM: LoRA Fine-tuned Qwen2.5-3B for Voice AI Agents},
  author={Whissle AI},
  year={2026},
  url={https://huggingface.co/WhissleAI/whissle-agent-lora-3b-test},
  note={Part of the PromptingNemo framework}
}

License

Apache 2.0
