---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# TD-Llama-OP
## TL;DR
**TD (ToolDial)-Llama-OP (Overall Performance)** is the model used for the **Overall Performance Task** in the [ToolDial](https://arxiv.org/abs/2503.00564) paper. We encourage you to use this model to reproduce the results.
Please refer to the **Experiments** section of our [GitHub page](https://github.com/holi-lab/ToolDial) to see how the evaluation was carried out.
**[Model Summary]**
- Trained with QLoRA quantization; the LoRA adapters are merged into the original weights.
- Trained for 1 epoch with the 8-bit Adam optimizer, learning rate 1e-5, betas 0.9 and 0.995.
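For quick reference, the training hyperparameters above can be collected in one place. This is only a summary of what the model card states, not a full training script; the optimizer name string is an assumption about the usual bitsandbytes/`transformers` naming:

```python
# Training hyperparameters as reported in the model summary above.
# "adamw_bnb_8bit" is an assumption about the optimizer identifier;
# the card only says "Adam-8bit".
TRAIN_CONFIG = {
    "num_train_epochs": 1,
    "optimizer": "adamw_bnb_8bit",
    "learning_rate": 1e-5,
    "adam_beta1": 0.9,
    "adam_beta2": 0.995,
    "quantization": "QLoRA (4-bit NF4)",
}
```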
**[How to load the model]**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

device = "cuda:0"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# 1. Load the base model (we use Llama-3-8B-Instruct) with the given quantization config.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id

# 2. Load the LoRA adapter with PeftModel.
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
```
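Once the model is loaded, it can be queried like any causal LM. The sketch below is a minimal, hedged example of running inference; `make_dialogue_prompt` and `generate_reply` are hypothetical helpers written for illustration (the Llama-3 special-token layout shown is the standard one for the base model, but the exact prompt format used in the ToolDial experiments is documented on the GitHub page):

```python
def make_dialogue_prompt(system: str, user: str) -> str:
    # Hypothetical helper: format a single-turn prompt in the standard
    # Llama-3 chat style with special header/end tokens.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> str:
    # Greedy decoding on the quantized model; return only the newly
    # generated text, with special tokens stripped.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage would look like `generate_reply(model, tokenizer, make_dialogue_prompt("You are a tool-use assistant.", "Find me a weather API."))`, run on the `model` and `tokenizer` loaded above.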