Update model card with comprehensive documentation

e0396b0 verified 8 months ago

2.83 kB

	---
	language:
	- en
	license: mit
	tags:
	- lora
	- tool-calling
	- llama3
	- instruction-tuning
	- json-generation
	base_model: meta-llama/Meta-Llama-3-8B-Instruct
	---

	# Tool-Calling LoRA for LLaMA-3-8B-Instruct

	This is a LoRA (Low-Rank Adaptation) model fine-tuned on tool-calling datasets to enhance the model's ability to generate structured JSON responses for tool execution.

	## Model Details

	- Base Model: meta-llama/Meta-Llama-3-8B-Instruct
	- Fine-tuning Method: LoRA (Low-Rank Adaptation)
	- LoRA Rank: 16
	- LoRA Alpha: 32
	- Training Dataset: Custom tool-calling dataset with 357 samples
	- Training Epochs: 5
	- Learning Rate: 5.0e-5

	## Usage

	### Load the Model

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import PeftModel

	# Load base model and tokenizer
	base_model = AutoModelForCausalLM.from_pretrained(
	"meta-llama/Meta-Llama-3-8B-Instruct",
	torch_dtype=torch.bfloat16,
	device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

	# Load and merge LoRA
	model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
	model = model.merge_and_unload()

	# Generate tool-calling responses
	def generate_tool_call(prompt):
	inputs = tokenizer(prompt, return_tensors="pt")
	outputs = model.generate(
	**inputs,
	max_new_tokens=512,
	temperature=0.7,
	do_sample=True,
	pad_token_id=tokenizer.eos_token_id
	)
	return tokenizer.decode(outputs[0], skip_special_tokens=True)

	# Example usage
	prompt = "Check the weather in New York"
	response = generate_tool_call(prompt)
	print(response)
	```

	### Expected Output Format

	The model generates structured JSON responses like:
	```json
	{
	"trace_id": "002",
	"steps": [
	{
	"action": "call_api",
	"api": "weather_api",
	"arguments": {"location": "New York"}
	},
	{
	"action": "respond",
	"message": "The weather in New York is currently sunny with a temperature of 72°F."
	}
	]
	}
	```

	## Training Details

	- Dataset: Custom tool-calling dataset with instruction/input/output format
	- Template: llama3 chat template
	- Cutoff Length: 4096 tokens
	- Batch Size: 2 (effective batch size: 8 with gradient accumulation)
	- Optimizer: AdamW with cosine learning rate scheduling
	- Warmup Ratio: 0.1

	## Performance

	The model shows improved capability in:
	- Generating structured JSON responses
	- Following tool-calling patterns
	- Maintaining context for multi-step tool execution
	- Producing consistent output formats

	## Limitations

	- Requires the base LLaMA-3-8B-Instruct model to function
	- May generate invalid JSON in some edge cases
	- Performance depends on the quality of the training data

	## License

	This model is released under the MIT License.