---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---
# Qwen/Qwen3-4B-Instruct-2507/LoRA-combined_datasets--highLR--CleansedSQLdatasets-lessLORA
This repository provides a **LoRA adapter** fine-tuned from
**Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.
It contains the **LoRA adapter weights only**;
the base model must be loaded separately.
## Training Objective
This adapter is trained to improve **multi-turn agent task performance**
on ALFWorld (household tasks) and DBBench (database operations).
Loss is applied to **all assistant turns** in the multi-turn trajectory,
enabling the model to learn environment observation, action selection,
tool use, and recovery from errors.
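In practice, applying loss to all assistant turns usually means label masking: tokens outside assistant spans are set to the ignore index (-100) so cross-entropy skips them. A minimal sketch with toy token IDs and role spans (all names and values are illustrative, not this repository's actual preprocessing):

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def build_labels(token_ids, role_spans):
    """role_spans: list of (start, end, role). Loss applies only to assistant spans."""
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end, role in role_spans:
        if role == "assistant":
            labels[start:end] = token_ids[start:end]
    return labels

# Toy conversation: system prompt, user turn, two assistant turns.
ids = list(range(12))
spans = [(0, 3, "system"), (3, 5, "user"), (5, 8, "assistant"),
         (8, 9, "user"), (9, 12, "assistant")]
labels = build_labels(ids, spans)
# Only the two assistant spans (positions 5-7 and 9-11) contribute to the loss.
assert labels == [-100] * 5 + [5, 6, 7] + [-100] + [9, 10, 11]
```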
The training process consists of two steps.
First, a LoRA adapter is trained on the DBBench SQL data;
then a second adapter is trained separately on the ALFWorld data.
Finally, the adapters are merged into the base model sequentially:
the DB adapter first, then the ALFWorld adapter.
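Merging a LoRA adapter folds its low-rank update into the base weights: W ← W + (alpha/r)·B·A. A toy NumPy sketch of the sequential merge described above (shapes and values are illustrative; in practice the merge is applied per projection matrix, e.g. via PEFT's `merge_and_unload`):

```python
import numpy as np

def merge_lora(W, A, B, r, alpha):
    """Fold one LoRA update into a weight matrix: W + (alpha / r) * B @ A."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4  # toy sizes; the released adapter uses r=64, alpha=128

W = rng.normal(size=(d, d))                                      # base weight
A_db, B_db = rng.normal(size=(r, d)), rng.normal(size=(d, r))    # DB adapter
A_alf, B_alf = rng.normal(size=(r, d)), rng.normal(size=(d, r))  # ALFWorld adapter

# Merge sequentially: DB adapter first, then the ALFWorld adapter.
W1 = merge_lora(W, A_db, B_db, r, alpha)
W2 = merge_lora(W1, A_alf, B_alf, r, alpha)

# Because each merge is an additive update, the final weights are the same
# regardless of merge order (up to floating-point rounding).
W_alt = merge_lora(merge_lora(W, A_alf, B_alf, r, alpha), A_db, B_db, r, alpha)
assert np.allclose(W2, W_alt)
```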
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128
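The hyperparameters above can be expressed as a PEFT `LoraConfig`. A hedged sketch; `target_modules` is an assumption (typical attention/MLP projections for Qwen-family models), not confirmed by this card:

```python
from peft import LoraConfig

# Reconstruction of the listed hyperparameters; target_modules are assumed.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```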
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"  # replace with this repository's ID

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```
## Sources & Terms (IMPORTANT)
- Training data: u-10bei/dbbench_sft_dataset_react_v4
- Dataset license: MIT License. The dataset is used and distributed under the terms of the MIT License.
- Compliance: users must comply with the MIT License (including preservation of the copyright notice) and with the base model's original terms of use.