---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# TD-Llama-OP

## TL;DR
**TD (ToolDial)-Llama-OP (OverallPerformance)** is the model used for the **Overall Performance Task** in the [ToolDial](https://arxiv.org/abs/2503.00564) paper. We encourage you to use this model to reproduce the results.

Please refer to the **Experiments** section of our [GitHub page](https://github.com/holi-lab/ToolDial) to see how our evaluation was conducted.
**[Model Summary]**

- Trained with QLoRA (4-bit) quantization; the LoRA adapters are merged into the original weights.
- Trained for 1 epoch with the 8-bit Adam optimizer, a learning rate of 1e-5, and betas of (0.9, 0.995); a rough sketch of this setup is shown below.
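The exact training script is not reproduced here, but a minimal sketch of the configuration above might look as follows. Only the optimizer, learning rate, betas, and epoch count come from the summary; the LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`) are illustrative assumptions.

```python
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization for QLoRA-style training.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": "cuda:0"},
)
base = prepare_model_for_kbit_training(base)

# LoRA adapter on top of the frozen 4-bit base model.
# r / lora_alpha / target_modules are assumed values, not taken from the paper.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# 8-bit Adam with the reported hyperparameters; train for 1 epoch.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-5, betas=(0.9, 0.995))
```
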
**[How to load the model]**
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

device = "cuda:0"

# 4-bit NF4 quantization config matching the training setup.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# 1. Load the base model (we use llama3-8b-inst) with the given quantization config.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id

# 2. Load the LoRA adapter with PeftModel.
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
```
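
Once loaded, the model can be queried like any other Hugging Face causal LM. Below is a minimal generation sketch assuming a plain chat-style prompt; the actual dialogue-state input format used for the Overall Performance Task is described in the paper and the GitHub repository.

```python
messages = [{"role": "user", "content": "Hello, what can you do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```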