---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# TD-Llama-OP

## TL;DR

**TD (ToolDial)-Llama-OP (OverallPerformance)** is the model used for the **Overall Performance Task** in the [ToolDial](https://arxiv.org/abs/2503.00564) paper. We encourage you to use this model to reproduce the results. Please refer to the **Experiments** section of our [GitHub page](https://github.com/holi-lab/ToolDial) to see how our evaluation proceeded.

**[Model Summary]**
- Trained with QLoRA quantization; the LoRA adapters are merged into the original weights.
- Trained for 1 epoch with the Adam-8bit optimizer, learning rate 1e-5, and betas of 0.9 and 0.995.

**[How to load the model]**
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

device = "cuda:0"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

## 1. Load the base model (we use Llama-3-8B-Instruct) with the given quantization config.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": device},
)

tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id

## 2. Load the LoRA adapter with PeftModel
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
```
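Once the model and tokenizer are loaded, generation follows the standard `transformers` chat-template workflow. The sketch below is an illustrative example, not an official inference script from the paper: the dialogue message is a made-up placeholder, and the generation parameters (`max_new_tokens`, greedy decoding) are assumptions you may need to adjust to match the ToolDial evaluation setup.

```python
import torch

# Hypothetical usage sketch: assumes `model` and `tokenizer` were created
# as in the loading snippet above. The user message is a placeholder.
messages = [
    {"role": "user", "content": "Hello, can you help me find an API?"},
]

# Format the dialogue with the Llama-3 chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=256,          # assumed budget; adjust as needed
        do_sample=False,             # greedy decoding for reproducibility
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, dropping the prompt.
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

Note that running this requires a CUDA device and access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` weights on the Hugging Face Hub.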