Model Card for gemma3-1b-IT-ConvFill

This model is a finetune of google/gemma-3-1b-it for the conversational infill task described in the ConvFill paper.

Model Details

Use of this model is subject to the original license of the base model, google/gemma-3-1b-it. The dataset used to finetune this model can be found here.

Model Description

Deploying responsive, multi-turn conversational voice agents with large language models poses a critical challenge: cloud-based foundation models use reasoning, information retrieval, and tool use for high-value tasks, but introduce latency that disrupts natural conversation. In contrast, small models can respond quickly but lack the capabilities needed for real-world tasks. We propose conversational infill, a task in which a small, local model generates timely, contextually appropriate dialogue and seamlessly incorporates delayed external knowledge produced in parallel by a foundation model backend. This finetune trains google/gemma-3-1b-it to perform the conversational infill task.
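The interaction pattern described above can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions, not the ConvFill implementation: `local_reply` and `backend_answer` are hypothetical stand-ins for the finetuned small model and the foundation model backend, the reply strings are templated placeholders, and the delays are arbitrary simulated values.

```python
import asyncio
from typing import List, Optional


async def backend_answer(query: str, delay: float) -> str:
    """Hypothetical stand-in for a slow foundation-model backend call."""
    await asyncio.sleep(delay)  # simulate reasoning / retrieval / tool-use latency
    return f"[knowledge for: {query}]"


def local_reply(history: List[str], knowledge: Optional[str]) -> str:
    """Hypothetical stand-in for the small local model (templated, not generated).

    Before the backend result arrives it produces a short holding turn;
    once knowledge is available it folds that knowledge into its reply.
    """
    if knowledge is not None:
        return f"Here is what I found: {knowledge}"
    already_spoke = any(turn.startswith("agent:") for turn in history)
    return "Still checking, one moment." if already_spoke else "Let me look into that."


async def converse(query: str, backend_delay: float, turn_interval: float) -> List[str]:
    """Keep the dialogue moving locally while the backend runs in parallel."""
    history = [f"user: {query}"]
    # Launch the backend request without waiting for it before speaking.
    pending = asyncio.create_task(backend_answer(query, backend_delay))
    while not pending.done():
        history.append("agent: " + local_reply(history, None))
        # Wait up to turn_interval; return early if the backend finishes.
        await asyncio.wait({pending}, timeout=turn_interval)
    # The backend result is now available: incorporate it into the next turn.
    history.append("agent: " + local_reply(history, pending.result()))
    return history
```

Calling `asyncio.run(converse("weather in Seattle", backend_delay=0.05, turn_interval=1.0))` yields an immediate holding turn followed by a turn that incorporates the backend result. In conversational infill as described above, the small model itself generates these turns rather than using fixed templates.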

Finetuned from model: google/gemma-3-1b-it
License: Gemma

Model Sources

Repository: https://github.com/vysri/conversational-infill
Paper: Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents
Demo: TBD

Direct Use

This model is intended to be used with the infrastructure in the ConvFill repository.

Use Restrictions

You must not use any of the Gemma Services:

- for the restricted uses set forth in the Gemma Prohibited Use Policy at ai.google.dev/gemma/prohibited_use_policy ("Prohibited Use Policy"), which is hereby incorporated by reference into this Agreement; or
- in violation of applicable laws and regulations.

To the maximum extent permitted by law, Google reserves the right to restrict (remotely or otherwise) usage of any of the Gemma Services that Google reasonably believes are in violation of this Agreement.

See the license for more details.

Bias, Risks, and Limitations

This model has not been explicitly tuned for guardrailed (safety-constrained) behavior, so it may produce inappropriate or unsafe outputs. Use it with caution.

How to Get Started with the Model

Use the code in the ConvFill repository to get started with this model.

Training Data

The training data for this model can be found here, and the dataset generation procedure is described here. Training procedures are described in the ConvFill paper, and training code and scripts are available in the ConvFill repository.

Citation

@misc{srinivas2026thinkingspeakinginferencetimeknowledge,
      title={Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents}, 
      author={Vidya Srinivas and Zachary Englhardt and Shwetak Patel and Vikram Iyer},
      year={2026},
      eprint={2511.07397},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.07397}, 
}