juno-labs/text-voice-activity-detection
How to use juno-labs/gemma-text-vad with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("juno-labs/gemma-text-vad", dtype="auto")

This is a text-based Voice Activity Detection model that determines whether a given speech fragment is complete enough for a smart speaker assistant to process. Instead of waiting for a fixed time-based pause (300 ms to 1000 ms) to detect the end of voice input, a smart speaker can use this model to decide whether the voice input is complete.
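To make the idea concrete, here is a minimal sketch of how an endpointing decision might combine the model's verdict with a pause timer. Everything in it is an assumption for illustration: the function `should_finalize`, the `is_complete` callback (which would wrap a call to this model), the `toy_is_complete` stand-in, and the choice to keep the 1000 ms pause as a fallback are not part of the model card.

```python
from typing import Callable


def should_finalize(
    transcript: str,
    pause_ms: float,
    is_complete: Callable[[str], bool],
    max_pause_ms: float = 1000.0,  # assumed fallback, taken from the upper pause bound
) -> bool:
    """Decide whether to stop listening and hand the transcript to the assistant."""
    if not transcript.strip():
        return False  # nothing has been said yet
    if pause_ms >= max_pause_ms:
        return True  # long silence ends the turn regardless of the verdict
    return is_complete(transcript)  # otherwise defer to the completeness check


# Stand-in classifier for illustration only; a real system would query the model.
def toy_is_complete(text: str) -> bool:
    return not text.rstrip().lower().endswith((" to", " the", " and"))


print(should_finalize("Hey Juno can you set the temperature to", 200, toy_is_complete))
print(should_finalize("Set the temperature to 68", 200, toy_is_complete))
```

Keeping a maximum pause as a safety net is one possible design choice: it bounds latency if the completeness check keeps answering "no".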
Model prompting requirements:
System prompt: "You are a Voice Activity Detection system. Determine if the given speech fragment is complete enough for processing. Answer with only 'yes' if complete or 'no' if incomplete."
User prompt: "Is this sentence fragment complete for processing: '{fragment}'"

To use with pipeline from transformers:
from transformers import pipeline
pipe = pipeline("text-generation", model="juno-labs/gemma-text-vad")
SYSTEM_PROMPT = (
"You are a Voice Activity Detection system. "
"Determine if the given speech fragment is complete enough for processing. Answer with only 'yes' if complete or 'no' if incomplete."
)
SENTENCE = "Hey Juno can you set the temperature to"
messages = [
    {'content': SYSTEM_PROMPT, 'role': 'system'},
    {'content': f"Is this sentence fragment complete for processing: '{SENTENCE}'", 'role': 'user'}
]
generated = pipe(messages)
classification = generated[0]["generated_text"][2]["content"]
print(f"Classification: {classification}") # "yes" or "no"
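Because the prompt format is fixed, it can be convenient to wrap it in small helpers. The sketch below is a hypothetical convenience layer, not part of the model card: `build_messages` and `parse_classification` are illustrative names, and returning `None` for anything other than "yes" or "no" is an assumed way to handle unexpected output.

```python
from typing import Dict, List, Optional

SYSTEM_PROMPT = (
    "You are a Voice Activity Detection system. "
    "Determine if the given speech fragment is complete enough for processing. "
    "Answer with only 'yes' if complete or 'no' if incomplete."
)


def build_messages(fragment: str) -> List[Dict[str, str]]:
    """Build the chat messages in the format the model expects."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"Is this sentence fragment complete for processing: '{fragment}'",
        },
    ]


def parse_classification(text: str) -> Optional[bool]:
    """Map the generated text to True ('yes'), False ('no'), or None (anything else)."""
    answer = text.strip().strip(".!'\"").lower()
    if answer == "yes":
        return True
    if answer == "no":
        return False
    return None


print(build_messages("Set the temperature to 68")[1]["content"])
print(parse_classification(" Yes\n"))
```

The list returned by `build_messages` can be passed where `messages` is used above, and `parse_classification` can be applied to the `classification` string.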
To use with transformers:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
MODEL_ID = "juno-labs/gemma-text-vad"
# Load model + tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
SYSTEM_PROMPT = (
"You are a Voice Activity Detection system. "
"Determine if the given speech fragment is complete enough for processing. Answer with only 'yes' if complete or 'no' if incomplete."
)
SENTENCE = "Set the temperature to 68"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Is this sentence fragment complete for processing: '{SENTENCE}'"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1,  # only 1 token
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.eos_token_id,
    )
decoded = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(f"Classification: {decoded}") # "yes" or "no"
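Since only one token is generated, a caller might want a confidence score rather than a hard "yes" or "no". The following is a sketch under stated assumptions, not something the model card describes: it presumes the logits for the "yes" and "no" tokens have already been extracted (for example from the `scores` returned when `generate` is called with `output_scores=True` and `return_dict_in_generate=True`), and the token IDs needed for that lookup are not given in the source. `yes_probability` is a hypothetical helper name.

```python
import math


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the 'yes' and 'no' logits, computed stably."""
    m = max(yes_logit, no_logit)  # subtract the max to avoid overflow in exp
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


# Illustrative logits only; real values would come from the model.
print(yes_probability(2.0, 0.0))
```

A threshold on this value (chosen by the integrator) could then trade off cutting the user off early against waiting too long.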