# qwen2.5-7b-agent-trajectory-lora
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen2.5-7B-Instruct using LoRA + Unsloth.

This repository contains the LoRA adapter weights only; the base model must be loaded separately.
## Training Objective
This adapter is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).
Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.
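Supervising all assistant turns is typically done by setting every non-assistant label position to -100, which Hugging Face's cross-entropy loss ignores. A minimal sketch of the idea (the `assistant_spans` token ranges are a hypothetical input; the training script's actual collator is not shown in this card):

```python
# Sketch: supervise only assistant tokens in a multi-turn trajectory.
# Label positions set to -100 are ignored by the loss, so user/system/
# observation tokens contribute nothing to training.

def build_labels(input_ids, assistant_spans):
    """assistant_spans: list of (start, end) token index ranges, end exclusive."""
    labels = [-100] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels

ids = list(range(10))  # dummy token ids 0..9
labels = build_labels(ids, [(3, 6), (8, 10)])
# positions 3-5 and 8-9 are supervised; the rest are masked with -100
```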
## Training Configuration
- Base model: Qwen/Qwen2.5-7B-Instruct
- Method: LoRA (full precision base)
- Max sequence length: 4096
- Epochs: 1
- Learning rate: 2e-06
- LoRA: r=64, alpha=128
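For intuition, LoRA adds a low-rank update to each target weight, scaled by alpha/r; with r=64 and alpha=128 that scale is 2.0. A tiny pure-Python sketch of the forward pass with toy rank-1 shapes (not the actual peft/Unsloth implementation):

```python
# LoRA forward: h = W x + (alpha / r) * B(A x), where A is r x d_in
# and B is d_out x r. Toy shapes below keep the same alpha/r ratio.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

r, alpha = 1, 2          # stand-ins for r=64, alpha=128 (same ratio)
scale = alpha / r        # 2.0, matching 128/64

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[0.5, 0.5]]               # r x d_in (trainable)
B = [[1.0], [0.0]]             # d_out x r (trainable)
x = [1.0, 1.0]

h = [w + scale * b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# h = Wx + 2.0 * B(Ax) → [3.0, 1.0]
```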
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-7B-Instruct"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```
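For inference, prompts should be built with `tokenizer.apply_chat_template`. For reference, Qwen2.5 uses a ChatML-style template, which looks roughly like this hand-rolled sketch (use the tokenizer's own template in practice; the messages here are illustrative):

```python
# Rough shape of the ChatML-style prompt that apply_chat_template produces
# for Qwen models: each turn is wrapped in <|im_start|>/<|im_end|>, and a
# trailing assistant header acts as the generation prompt.

def chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    return "".join(parts) + "<|im_start|>assistant\n"

prompt = chatml([
    {"role": "system", "content": "You are a household agent."},
    {"role": "user", "content": "Your task is to: put a mug on the desk."},
])
```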
## Sources & Terms (IMPORTANT)
- Training data: u-10bei/sft_alfworld_trajectory_dataset, u-10bei/sft_alfworld_trajectory_dataset_v2, u-10bei/sft_alfworld_trajectory_dataset_v3, u-10bei/sft_alfworld_trajectory_dataset_v4, u-10bei/sft_alfworld_trajectory_dataset_v5
- Dataset license: MIT. The datasets are used and distributed under the terms of the MIT License.
- Compliance: users must comply with the MIT License (including retention of the copyright notice) and the base model's original terms of use.
## Notes (memo)

### Environment variable settings

You can change the configuration without editing the code itself by rewriting the values below.
#### 1. Model and dataset settings

```python
import os

os.environ["SFT_BASE_MODEL"] = "Qwen/Qwen2.5-7B-Instruct"
# To use multiple datasets, list them comma-separated: "dataset1,dataset2,dataset3"
os.environ["SFT_DATASET_ID"] = (
    "u-10bei/sft_alfworld_trajectory_dataset,"
    "u-10bei/sft_alfworld_trajectory_dataset_v2,"
    "u-10bei/sft_alfworld_trajectory_dataset_v3,"
    "u-10bei/sft_alfworld_trajectory_dataset_v4,"
    "u-10bei/sft_alfworld_trajectory_dataset_v5,"
    "u-10bei/dbbench_sft_dataset_react,"
    "u-10bei/dbbench_sft_dataset_react_v2,"
    "u-10bei/dbbench_sft_dataset_react_v3,"
    "u-10bei/dbbench_sft_dataset_react_v4"
)
os.environ["SFT_OUT_LORA_DIR"] = "/content/lora_agentbench_qwen3_4b"
```
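`SFT_DATASET_ID` packs several Hub repo IDs into one comma-separated string; the training script is expected to split it and load each dataset in turn (presumably via `datasets.load_dataset` plus `concatenate_datasets`; that pipeline is an assumption, as it is not shown in this card). Splitting the value back is straightforward:

```python
import os

# Two IDs used here for brevity; the real value lists all nine datasets.
os.environ["SFT_DATASET_ID"] = (
    "u-10bei/sft_alfworld_trajectory_dataset,"
    "u-10bei/sft_alfworld_trajectory_dataset_v2"
)

# Split on commas and drop any empty fragments.
dataset_ids = [d.strip() for d in os.environ["SFT_DATASET_ID"].split(",") if d.strip()]
# dataset_ids now holds one Hub repo ID per entry
```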
#### 2. Basic training parameters

```python
os.environ["SFT_SEED"] = "3407"
os.environ["SFT_VAL_RATIO"] = "0.05"
os.environ["SFT_MAX_SEQ_LEN"] = "4096"
```
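With `SFT_VAL_RATIO=0.05` and `SFT_SEED=3407`, the held-out validation set is a deterministic 5% sample. The arithmetic can be sketched as follows (the actual script presumably uses the `datasets` library's `train_test_split`; this only illustrates the split):

```python
import random

# Deterministic 5% validation split over a toy pool of 1000 examples.
random.seed(3407)
indices = list(range(1000))
random.shuffle(indices)

n_val = int(len(indices) * 0.05)        # 50 examples
val_idx, train_idx = indices[:n_val], indices[n_val:]
```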
#### 3. LoRA (adapter) settings

```python
os.environ["SFT_LORA_R"] = "64"
os.environ["SFT_LORA_ALPHA"] = "128"
os.environ["SFT_LORA_DROPOUT"] = "0"
os.environ["SFT_LORA_TARGET_MODULES"] = "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
```
#### 4. Hyperparameters

```python
os.environ["SFT_EPOCHS"] = "1"
os.environ["SFT_PER_DEVICE_TRAIN_BS"] = "16"
os.environ["SFT_PER_DEVICE_EVAL_BS"] = "16"
os.environ["SFT_GRAD_ACCUM"] = "1"
os.environ["SFT_LR"] = "2e-6"
os.environ["SFT_WARMUP_RATIO"] = "0.1"
os.environ["SFT_WEIGHT_DECAY"] = "0.05"
```
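These values fix the effective global batch size; a quick sanity check (the single-GPU device count is an assumption):

```python
# Effective batch size per optimizer step =
#   per-device batch * gradient accumulation steps * number of devices.
per_device_bs = 16   # SFT_PER_DEVICE_TRAIN_BS
grad_accum = 1       # SFT_GRAD_ACCUM
num_devices = 1      # assumption: single GPU
effective_bs = per_device_bs * grad_accum * num_devices
# → 16 samples contribute to each optimizer step
```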
#### 5. Step and checkpoint settings

```python
os.environ["SFT_MAX_STEPS"] = "-1"  # -1 = epoch-based; set to e.g. 10 for a quick smoke test
os.environ["SFT_LOGGING_STEPS"] = "10"
os.environ["SFT_EVAL_STEPS"] = "30"
os.environ["SFT_SAVE_STEPS"] = "100"
os.environ["SFT_SAVE_TOTAL_LIMIT"] = "2"
```
#### 6. Special training settings (CoT masking / upsampling)

```python
os.environ["SFT_MASK_COT"] = "0"           # "1" to enable, "0" to disable
os.environ["SFT_TRUNCATE_COT"] = "0"       # "1" to enable (replace pre-marker tokens with pad_token), "0" to disable
os.environ["SFT_OUTPUT_MARKERS"] = "Action:,Output:,OUTPUT:,Final:,Answer:,Result:,Response:"
os.environ["SFT_OUTPUT_LEARN_MODE"] = "from_marker"  # "after_marker" or "from_marker"
os.environ["SFT_USE_UPSAMPLING"] = "0"     # "1" to enable, "0" to disable
# Dataset 2 only
os.environ["SFT_UPSAMPLE_RULES"] = '{"dbbench":{"aggregation-MAX":3.0,"counting":2.5,"comparison":2.5,"aggregation-SUM":2.0,"INSERT":2.0,"ranking":1.5}}'

print("Environment variable setup complete.")
```
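The marker-based modes above can be pictured as finding the first output marker in an assistant message and starting supervision there: `from_marker` keeps the marker itself, `after_marker` starts just past it. A character-level sketch (the real script operates on tokens; `learn_start` is a hypothetical helper):

```python
# A subset of SFT_OUTPUT_MARKERS, for illustration.
MARKERS = ["Action:", "Output:", "Final:", "Answer:"]

def learn_start(text, mode="from_marker"):
    """Return the char offset where supervision begins, or 0 if no marker found."""
    positions = [(text.find(m), m) for m in MARKERS if m in text]
    if not positions:
        return 0
    pos, marker = min(positions)           # earliest marker wins
    return pos if mode == "from_marker" else pos + len(marker)

text = "Thought: check the fridge. Action: go to fridge 1"
learn_start(text, "from_marker")   # offset of "Action:" (marker included in loss)
learn_start(text, "after_marker")  # offset just past "Action:"
```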
### Select which task to train

```python
TASK = "alfworld"  # "dbbench", "alfworld", or "mixed"
```
### Per-task configuration

```python
TASK_CONFIGS = {
    "dbbench": {
        "datasets": [
            "u-10bei/dbbench_sft_dataset_react",
            "u-10bei/dbbench_sft_dataset_react_v2",
            "u-10bei/dbbench_sft_dataset_react_v3",
            "u-10bei/dbbench_sft_dataset_react_v4",
        ],
        "output_dir": "./lora_adapters/dbbench",
        "include_types": [
            "INSERT", "UPDATE", "other", "comparison", "counting", "ranking",
            "aggregation-SUM", "aggregation-MIN", "aggregation-MAX", "aggregation-AVG",
        ],
    },
    "alfworld": {
        "datasets": [
            "u-10bei/sft_alfworld_trajectory_dataset",
            "u-10bei/sft_alfworld_trajectory_dataset_v2",
            "u-10bei/sft_alfworld_trajectory_dataset_v3",
            "u-10bei/sft_alfworld_trajectory_dataset_v4",
            "u-10bei/sft_alfworld_trajectory_dataset_v5",
        ],
        "output_dir": "./lora_adapters/alfworld",
    },
    "mixed": {
        "datasets": [
            # DBBench
            "u-10bei/dbbench_sft_dataset_react",
            "u-10bei/dbbench_sft_dataset_react_v2",
            "u-10bei/dbbench_sft_dataset_react_v3",
            "u-10bei/dbbench_sft_dataset_react_v4",
            # ALFWorld
            "u-10bei/sft_alfworld_trajectory_dataset",
            "u-10bei/sft_alfworld_trajectory_dataset_v2",
            "u-10bei/sft_alfworld_trajectory_dataset_v3",
            "u-10bei/sft_alfworld_trajectory_dataset_v4",
            "u-10bei/sft_alfworld_trajectory_dataset_v5",
        ],
        "output_dir": "./lora_adapters/mixed",
    },
}
```
### Apply the selected task's environment variables

```python
config = TASK_CONFIGS[TASK]
os.environ["SFT_DATASET_ID"] = ",".join(config["datasets"])
os.environ["SFT_OUT_LORA_DIR"] = config["output_dir"]
```
### Metadata filtering settings (optional)

```python
import json

if "exclude_types" in config:
    os.environ["SFT_EXCLUDE_TYPES"] = json.dumps(config["exclude_types"])
    print(f" Excluding types: {config['exclude_types']}")
elif "include_types" in config:
    os.environ["SFT_INCLUDE_TYPES"] = json.dumps(config["include_types"])
    print(f" Including only types: {config['include_types']}")
else:
    # No filtering
    os.environ["SFT_EXCLUDE_TYPES"] = ""
    os.environ["SFT_INCLUDE_TYPES"] = ""
```
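On the data-loading side, the include/exclude variables can be consumed roughly like this (the `"type"` field name mirrors the categories in the configuration above; how the real training script reads these variables is an assumption):

```python
import json
import os

# Hypothetical consumer of SFT_INCLUDE_TYPES: keep only examples whose
# "type" metadata field is listed; an empty value means no filtering.
os.environ["SFT_INCLUDE_TYPES"] = json.dumps(["INSERT", "counting"])

def keep(example):
    include = json.loads(os.environ.get("SFT_INCLUDE_TYPES") or "[]")
    return not include or example.get("type") in include

rows = [{"type": "INSERT"}, {"type": "ranking"}, {"type": "counting"}]
filtered = [r for r in rows if keep(r)]
# → only the INSERT and counting rows remain
```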
```python
print(f"✅ Task selected: {TASK}")
print(f" Datasets: {len(config['datasets'])} datasets")
print(f" Output: {config['output_dir']}")
```