---
license: mit
language: en
base_model: microsoft/phi-2
tags:
- text-generation
- voice-assistant
- automotive
- fine-tuned
- peft
- lora
datasets:
- synthetic
widget:
- text: "Navigate to the nearest EV charging station."
- text: "Set the temperature to 22 degrees."
---

# 🚗 Fine-tuned MBUX Voice Assistant (phi-2)

This repository contains a fine-tuned version of Microsoft's **`microsoft/phi-2`** model, specifically adapted to function as an in-car voice assistant similar to MBUX. The model is trained to understand and respond to common automotive commands.

This model was created as part of an end-to-end MLOps project, from data creation and fine-tuning to deployment in an interactive application.

## ✨ Live Demo

You can interact with this model in a live, voice-to-voice application on Hugging Face Spaces:

**➡️ [Live MBUX Gradio Demo](https://huggingface.co/spaces/MrunangG/mbux-gradio-demo)**

---
## 📋 Model Details

* **Base Model:** `microsoft/phi-2`
* **Fine-tuning Method:** Parameter-Efficient Fine-Tuning (PEFT) using LoRA.
* **Training Data:** A synthetic, instruction-based dataset of in-car commands covering navigation, climate control, media, and vehicle settings.
* **Frameworks:** PyTorch, Transformers, PEFT, TRL.
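
The dataset itself is not published in this repository. Purely as an illustration, a single training pair in the `[INST]` prompt format used by the inference example below might look like the following; the `text` field name follows the usual `SFTTrainer` convention and is an assumption:

```python
# Hypothetical shape of one synthetic training record; the real dataset's
# field names and response wording may differ.
sample = {
    "text": "[INST] Set the temperature to 22 degrees. [/INST] "
            "Okay, setting the cabin temperature to 22 degrees."
}
```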
|
|
|
|
|
### Intended Use

This model is a proof of concept built for demonstration purposes. It is intended to serve as the "brain" of a voice assistant application in an automotive context, handling commands such as:

* "Navigate to the office."
* "Set the fan speed to maximum."
* "Play my 'Morning Commute' playlist."

---
## 🚀 How to Use

While the model's core function is text generation, its primary intended use is within a full voice-to-voice pipeline.

### Interactive Voice Demo

For the complete, interactive experience, including Speech-to-Text and Text-to-Speech, please visit the live application hosted on Hugging Face Spaces:

**➡️ [Live MBUX Gradio Demo](https://huggingface.co/spaces/MrunangG/mbux-gradio-demo)**

### Programmatic Use (Text-Only)

The following Python code shows how to use the fine-tuned model for its core text-generation task.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the model repository IDs
base_model_id = "microsoft/phi-2"
peft_model_id = "MrunangG/phi-2-mbux-assistant"

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={"": device}
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapter to the base model (the adapter weights are
# applied on top of the base weights, not merged into them)
model = PeftModel.from_pretrained(base_model, peft_model_id)

# --- Inference ---
prompt = "Set the temperature to 21 degrees."
formatted_prompt = f"[INST] {prompt} [/INST]"

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

# Keep only the assistant's reply after the [/INST] tag
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
cleaned_response = response.split('[/INST]')[-1].strip()

print(cleaned_response)
# Expected output: Okay, setting the cabin temperature to 21 degrees.
```
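
If you prefer to run inference without the adapter indirection, PEFT's standard `merge_and_unload()` call can optionally fold the LoRA weights into the base model after loading; this step is not part of the original example:

```python
# Optional: merge the LoRA weights into the base model and drop the
# adapter wrappers, leaving a plain transformers model for generation.
model = model.merge_and_unload()
```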
|
|
|
|
|
---

## 🛠️ Training Procedure

The model was fine-tuned using the `SFTTrainer` from the TRL library. Key training parameters included a learning rate of `2e-4`, the `paged_adamw_8bit` optimizer, and 4-bit quantization to ensure efficient training on consumer hardware.
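
The full training script is not included in this repository. The sketch below reconstructs a plausible setup from the details above; the LoRA rank, alpha, target modules, batch size, epoch count, and dataset path are illustrative assumptions, not the actual values used:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base_model_id = "microsoft/phi-2"

# Load the base model in 4-bit, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# LoRA configuration; rank, alpha, and target modules are assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)

# "train.jsonl" is a placeholder for the synthetic command dataset
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Learning rate and optimizer are stated in this card;
# batch size and epoch count are assumptions
args = SFTConfig(
    output_dir="phi-2-mbux-assistant",
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    per_device_train_batch_size=2,
    num_train_epochs=3,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```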
|
|
|
|
|
### Framework versions

- PEFT: 0.17.1
- TRL: 0.22.1
- Transformers: 4.56.0
- PyTorch: 2.8.0
- Datasets: 4.0.0
- Tokenizers: 0.22.0