# Gemma VAD Adapter
This is a text-based Voice Activity Detection (VAD) model that determines whether a given speech fragment is complete enough for a smart-speaker assistant to act on. It lets smart speakers move from detecting the end of voice input with fixed silence timeouts (300 ms–1000 ms pauses) to asking this model whether the transcribed input is semantically complete.
## Example
- "Hey" -> no
- "Hey Juno" -> no
- "Hey Juno can you" -> no
- "Hey Juno can you set" -> no
- "Hey Juno can you set the" -> no
- "Hey Juno can you set the temperature" -> no
- "Hey Juno can you set the temperature to" -> no
- "Hey Juno can you set the temperature to 65" -> yes
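To illustrate how a smart speaker would consume these growing prefixes, here is a minimal sketch of an endpointing loop. The `endpoint` and `stub_is_complete` names are illustrative, and the stub classifier stands in for the real model call shown in the Usage section:

```python
def endpoint(words, is_complete):
    """Feed a growing transcript to the classifier, word by word,
    and return the first fragment the classifier reports as complete."""
    fragment = ""
    for word in words:
        fragment = (fragment + " " + word).strip()
        if is_complete(fragment):
            return fragment
    return None  # utterance never completed; keep listening


# Stub classifier for demonstration only; in production this would
# run the model with the prompts described below.
def stub_is_complete(fragment):
    return fragment == "Hey Juno can you set the temperature to 65"


result = endpoint(
    "Hey Juno can you set the temperature to 65".split(),
    stub_is_complete,
)
```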
## Model prompting requirements
- Required system prompt: "You are a Voice Activity Detection system. Determine if the given speech fragment is complete enough for processing. Answer with only 'yes' if complete or 'no' if incomplete."
- Required user prompt: "Is this sentence fragment complete for processing: '{fragment}'"
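The two required prompts can be assembled into a chat-message list with a small helper. The prompt strings below are copied verbatim from the requirements above; the function name `build_vad_messages` is our own illustrative choice:

```python
# System prompt copied verbatim from the model's prompting requirements.
SYSTEM_PROMPT = (
    "You are a Voice Activity Detection system. Determine if the given "
    "speech fragment is complete enough for processing. Answer with only "
    "'yes' if complete or 'no' if incomplete."
)


def build_vad_messages(fragment: str) -> list[dict]:
    """Build the chat messages for one fragment, in the required format."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"Is this sentence fragment complete for processing: '{fragment}'",
        },
    ]
```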
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

BASE_ID = "unsloth/gemma-3-270m-it"
ADAPTER_ID = "adamjuhasz/gemma-vad-adapter"

# 1) Load the base model + attach the LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.float32,  # use float32 on CPU/MPS; bfloat16 on CUDA if you like
    device_map="auto",          # picks GPU/MPS if available
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

# 2) Tokenizer (from the base model)
tokenizer = AutoTokenizer.from_pretrained(BASE_ID, use_fast=True)
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 3) Build a chat prompt using the model's chat template
messages = [
    {
        "role": "system",
        "content": (
            "You are a Voice Activity Detection system. Determine if the given "
            "speech fragment is complete enough for processing. Answer with only "
            "'yes' if complete or 'no' if incomplete."
        ),
    },
    {
        "role": "user",
        "content": "Is this sentence fragment complete for processing: 'Set the temperature'",
    },
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# 4A) Raw generate()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    gen = model.generate(
        **inputs,
        max_new_tokens=1,
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
print(
    tokenizer.decode(gen[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
)

# 4B) Or use pipeline("text-generation") -- pass the rendered string, not the messages list.
# The model is already placed on a device, so don't pass device_map here again.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
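Since the decoded reply may vary in case or surrounding whitespace, a small helper can map it to a boolean. The name `parse_vad_answer` is our own; treating anything other than "yes" as incomplete is a deliberately conservative default for an endpointing system (keep listening):

```python
def parse_vad_answer(decoded: str) -> bool:
    """Map the model's decoded single-token reply to a boolean.

    Returns True only when the reply starts with 'yes' (case-insensitive);
    anything else, including an empty string, counts as incomplete.
    """
    return decoded.strip().lower().startswith("yes")
```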
## Framework versions
- PEFT 0.17.1