---
language:
- en
license: mit
tags:
- lora
- tool-calling
- llama3
- instruction-tuning
- json-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct
---

# Tool-Calling LoRA for LLaMA-3-8B-Instruct

This is a LoRA (Low-Rank Adaptation) adapter fine-tuned on tool-calling data to improve the base model's ability to generate structured JSON responses for tool execution.

## Model Details

- **Base Model**: meta-llama/Meta-Llama-3-8B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Training Dataset**: Custom tool-calling dataset with 357 samples
- **Training Epochs**: 5
- **Learning Rate**: 5.0e-5

## Usage

### Load the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Load the LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
model = model.merge_and_unload()

# Generate tool-calling responses
def generate_tool_call(prompt):
    # Move inputs to the same device as the model (required with device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Check the weather in New York"
response = generate_tool_call(prompt)
print(response)
```

### Expected Output Format

The model generates structured JSON responses like:

```json
{
  "trace_id": "002",
  "steps": [
    {
      "action": "call_api",
      "api": "weather_api",
      "arguments": {"location": "New York"}
    },
    {
      "action": "respond",
      "message": "The weather in New York is currently sunny with a temperature of 72°F."
    }
  ]
}
```

## Training Details

- **Dataset**: Custom tool-calling dataset with instruction/input/output format
- **Template**: llama3 chat template
- **Cutoff Length**: 4096 tokens
- **Batch Size**: 2 (effective batch size: 8 with gradient accumulation)
- **Optimizer**: AdamW with cosine learning-rate scheduling
- **Warmup Ratio**: 0.1

## Performance

The model shows improved capability in:

- Generating structured JSON responses
- Following tool-calling patterns
- Maintaining context for multi-step tool execution
- Producing consistent output formats

## Limitations

- Requires the base LLaMA-3-8B-Instruct model to function
- May generate invalid JSON in some edge cases
- Performance depends on the quality of the training data

## License

This model is released under the MIT License.
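Since the model may emit invalid JSON in some edge cases (see Limitations), downstream code should parse its output defensively. Below is a minimal, hedged sketch of such a parser; the `extract_json` helper is a hypothetical example, not part of this release or its tooling. It scans the generated text for the first `{`, then tries progressively shorter spans ending at a `}` until one parses.

```python
import json

def extract_json(text):
    """Extract and parse the first JSON object found in generated text.

    Returns the parsed dict, or None if no valid JSON object is present.
    Hypothetical helper for defensive parsing of model output.
    """
    # Locate the first opening brace; bail out if there is none.
    start = text.find("{")
    if start == -1:
        return None
    # Try candidate spans from longest to shortest, each ending at a
    # closing brace, and return the first one that parses cleanly.
    for end in range(len(text), start, -1):
        if text[end - 1] != "}":
            continue
        try:
            return json.loads(text[start:end])
        except json.JSONDecodeError:
            continue
    return None

# Example: recover the trace object even when the model adds surrounding prose
raw = 'Sure, here is the trace: {"trace_id": "002", "steps": []} Done.'
trace = extract_json(raw)
```

Trying the longest span first means nested objects (such as the `steps` array above) are kept intact rather than truncated at an inner closing brace.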