# Gemma VAD Adapter
This is a text-based Voice Activity Detection (VAD) model that determines whether a given speech fragment is complete enough for a smart-speaker assistant to act on. It lets smart speakers move from detecting the end of voice input with fixed silence timeouts (300 ms–1000 ms pauses) to asking this model whether the transcribed input is semantically complete.
## Example
- "Hey" -> no
- "Hey Juno" -> no
- "Hey Juno can you" -> no
- "Hey Juno can you set" -> no
- "Hey Juno can you set the" -> no
- "Hey Juno can you set the temperature" -> no
- "Hey Juno can you set the temperature to" -> no
- "Hey Juno can you set the temperature to 65" -> yes
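To illustrate how a smart speaker would consume these growing prefixes, here is a minimal sketch of an endpointing loop. The `endpoint` and `stub_is_complete` names are illustrative, and the stub classifier stands in for the real model call shown in the Usage section:

```python
def endpoint(words, is_complete):
    """Feed a growing transcript to the classifier, word by word,
    and return the first fragment the classifier reports as complete."""
    fragment = ""
    for word in words:
        fragment = (fragment + " " + word).strip()
        if is_complete(fragment):
            return fragment
    return None  # utterance never completed; keep listening


# Stub classifier for demonstration only; in production this would
# run the model with the prompts described below.
def stub_is_complete(fragment):
    return fragment == "Hey Juno can you set the temperature to 65"


result = endpoint(
    "Hey Juno can you set the temperature to 65".split(),
    stub_is_complete,
)
```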
## Model prompting requirements
- Required system prompt: "You are a Voice Activity Detection system. Determine if the given speech fragment is complete enough for processing. Answer with only 'yes' if complete or 'no' if incomplete."
- Required user prompt: "Is this sentence fragment complete for processing: '{fragment}'"
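The two required prompts can be assembled into a chat-message list with a small helper. The prompt strings below are copied verbatim from the requirements above; the function name `build_vad_messages` is our own illustrative choice:

```python
# System prompt copied verbatim from the model's prompting requirements.
SYSTEM_PROMPT = (
    "You are a Voice Activity Detection system. Determine if the given "
    "speech fragment is complete enough for processing. Answer with only "
    "'yes' if complete or 'no' if incomplete."
)


def build_vad_messages(fragment: str) -> list[dict]:
    """Build the chat messages for one fragment, in the required format."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"Is this sentence fragment complete for processing: '{fragment}'",
        },
    ]
```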
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

BASE_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "adamjuhasz/gemma-vad-adapter"

# 1) Load the base model + attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float32,  # use float32 on CPU/MPS; bfloat16 on CUDA if you like
    device_map="auto",          # picks GPU/MPS if available
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# 2) Tokenizer (from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID, use_fast=True)
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3) Build a chat prompt using the model's chat template
messages = [
    {
        "role": "system",
        "content": (
            "You are a Voice Activity Detection system. Determine if the given "
            "speech fragment is complete enough for processing. Answer with only "
            "'yes' if complete or 'no' if incomplete."
        ),
    },
    {
        "role": "user",
        "content": "Is this sentence fragment complete for processing: 'Set the temperature'",
    },
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# 4A) Raw generate()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    gen = model.generate(
        **inputs,
        max_new_tokens=1,
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
print(
    tokenizer.decode(gen[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
)

# 4B) Or use pipeline("text-generation") -- pass the rendered string, not the messages list.
# The model is already placed on a device, so don't pass device_map here again.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
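Since the decoded reply may vary in case or surrounding whitespace, a small helper can map it to a boolean. The name `parse_vad_answer` is our own; treating anything other than "yes" as incomplete is a deliberately conservative default for an endpointing system (keep listening):

```python
def parse_vad_answer(decoded: str) -> bool:
    """Map the model's decoded single-token reply to a boolean.

    Returns True only when the reply starts with 'yes' (case-insensitive);
    anything else, including an empty string, counts as incomplete.
    """
    return decoded.strip().lower().startswith("yes")
```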
## Framework versions
- PEFT 0.17.1