---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen/Qwen3-4B-Instruct-2507/LoRA-combined_datasets--highLR--CleansedSQLdatasets-lessLORA

This repository provides a **LoRA adapter** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.

This repository contains **LoRA adapter weights only**. The base model must be loaded separately.

## Training Objective

This adapter is trained to improve **multi-turn agent task performance** on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to **all assistant turns** in the multi-turn trajectory, so the model learns to interpret environment observations, select actions, use tools, and recover from errors.

Training consists of two steps. First, a LoRA adapter is trained on the database (SQL) data. Second, a separate LoRA adapter is trained on the ALFWorld data. The adapters are then merged into the base model sequentially: first the DB adapter, then the ALF adapter.

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"  # replace with this adapter's repository ID

# Load the tokenizer and the base model.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```

## Sources & Terms (IMPORTANT)

Training data: u-10bei/dbbench_sft_dataset_react_v4

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.
Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.
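
## Merge Arithmetic (Sketch)

The Training Objective describes merging the DB adapter and then the ALF adapter into the base model. The toy sketch below illustrates the arithmetic of a standard (non-rsLoRA) LoRA merge, `W + (alpha / r) * B @ A`, using the r=64, alpha=128 values from the Training Configuration. It is not the actual training or merge script: the layer shape, the random matrices, and the helper names `lora_delta` and `merge` are hypothetical stand-ins for real model weights. In PEFT, `PeftModel.merge_and_unload()` performs this kind of fold for each LoRA-wrapped layer.

```python
import numpy as np

# LoRA hyperparameters taken from the Training Configuration.
R = 64
ALPHA = 128
SCALING = ALPHA / R  # standard LoRA scaling alpha / r, here 2.0


def lora_delta(b: np.ndarray, a: np.ndarray, scaling: float = SCALING) -> np.ndarray:
    """Weight update contributed by one adapter: scaling * B @ A (hypothetical helper)."""
    return scaling * (b @ a)


def merge(w: np.ndarray, b: np.ndarray, a: np.ndarray, scaling: float = SCALING) -> np.ndarray:
    """Fold one adapter's update into a dense weight matrix (hypothetical helper)."""
    return w + lora_delta(b, a, scaling)


rng = np.random.default_rng(0)
d_out, d_in = 256, 128  # hypothetical layer shape, not the real Qwen3 dimensions

# Random stand-in for one base weight matrix.
w_base = rng.standard_normal((d_out, d_in))

# Random stand-ins for the DB and ALF adapter factors (B: d_out x r, A: r x d_in).
b_db = rng.standard_normal((d_out, R)) * 0.01
a_db = rng.standard_normal((R, d_in)) * 0.01
b_alf = rng.standard_normal((d_out, R)) * 0.01
a_alf = rng.standard_normal((R, d_in)) * 0.01

# Sequential merge: DB adapter first, then the ALF adapter on the DB-merged weights.
w_after_db = merge(w_base, b_db, a_db)
w_final = merge(w_after_db, b_alf, a_alf)

# Each merge is a plain addition, so the result is the base weight plus both deltas.
assert np.allclose(w_final, w_base + lora_delta(b_db, a_db) + lora_delta(b_alf, a_alf))
print(w_final.shape)  # prints (256, 128)
```

Because each merge only adds a low-rank delta, the merged weights do not depend on the order in which fixed adapter factors are folded in. What the order can matter for is training: the card does not state whether the ALF adapter was trained against the original base model or against the DB-merged model.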