---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
tags:
- dbbench
---

# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

This repository provides a **LoRA adapter** fine-tuned from
**unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.

Loss is applied to **all assistant turns** in the multi-turn trajectory,
enabling the model to learn environment observation, action selection,
tool use, and recovery from errors.
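
To make the objective concrete, here is a minimal sketch of label masking for an all-assistant-turns loss, assuming tokenized conversations where the token spans of assistant turns are already known; the helper name and span format are illustrative, not the actual training code.

```python
# Minimal sketch (assumed, not the actual training code): mask every token
# outside assistant turns so the loss is computed on assistant tokens only.
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss

def mask_non_assistant(input_ids: list[int],
                       assistant_spans: list[tuple[int, int]]) -> list[int]:
    labels = list(input_ids)
    for i in range(len(labels)):
        # Keep the label only if position i falls inside an assistant turn
        if not any(start <= i < end for start, end in assistant_spans):
            labels[i] = IGNORE_INDEX
    return labels
```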

## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Preprocessing steps applied (a sketch of these filters is shown after the list):

1. Structural validation (removes empty / single-turn samples)
2. Chat template tag contamination removal (`htags` pattern)
3. Hallucinated object ID removal (ALFWorld only, e.g. `bowl 99`)
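
A minimal sketch of these three filters, assuming samples in a `messages` chat format; both regex patterns are illustrative assumptions (the actual `htags` pattern is not reproduced here):

```python
import re

# Illustrative patterns only; the real cleaning script and its `htags`
# pattern are not reproduced here.
TAG_PATTERN = re.compile(r"<\|im_(?:start|end)\|>")  # leaked chat-template tags
FAKE_ID_PATTERN = re.compile(r"\b[a-z]+ (?:[5-9]\d|\d{3,})\b")  # implausible IDs like "bowl 99"

def keep_sample(sample: dict, is_alfworld: bool) -> bool:
    msgs = sample.get("messages", [])
    # 1. Structural validation: drop empty or single-turn samples
    if len(msgs) < 2:
        return False
    text = " ".join(m.get("content", "") for m in msgs)
    # 2. Drop samples contaminated with raw chat-template tags
    if TAG_PATTERN.search(text):
        return False
    # 3. ALFWorld only: drop samples containing hallucinated object IDs
    return not (is_alfworld and FAKE_ID_PATTERN.search(text))
```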

Category-level upsampling was applied to reinforce weak task types
identified from evaluation results of a prior model (a sketch of this step
follows the table):

| Category | Multiplier | Reason |
|---|---|---|
| ALFWorld multi-object | ×3 | 0% success rate in prior eval |
| ALFWorld cool | ×2 | 12% success rate |
| ALFWorld examine | ×1.5 | 12% success rate |
| DBBench aggregation-MAX | ×3 | 17% accuracy in prior eval |
| DBBench INSERT | ×2 | 32% accuracy |
| DBBench counting | ×2 | 36% accuracy |

Final dataset size after mixing and upsampling: **5,169 samples**
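
A sketch of the upsampling step, under the assumption that each sample carries a category label; the label strings are hypothetical, and fractional multipliers (the ×1.5 row) are handled by duplicating a random subset:

```python
import random

# Hypothetical category keys mapped to the multipliers in the table above.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples: list, category: str, seed: int = 3407) -> list:
    """Repeat samples by the integer part of the multiplier, then add a
    random subset to cover the fractional part (x1.5 adds half again)."""
    m = MULTIPLIERS.get(category, 1.0)
    whole, frac = int(m), m - int(m)
    rng = random.Random(seed)
    out = samples * whole
    out += rng.sample(samples, round(len(samples) * frac))
    return out
```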

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Per-device batch size | 4 |
| Gradient accumulation | 4 (effective batch size: 16) |
| Warmup ratio | 0.1 |
| Weight decay | 0.05 |
| Seed | 3407 |
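
For orientation, the table maps onto an Unsloth + TRL run roughly as follows. This is an assumed reconstruction, not the actual training script: the dataset shown is a stand-in for the mixed 5,169-sample set described above, and chat-template formatting is elided.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model and attach a LoRA adapter with the table's settings.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# Stand-in: one source dataset; the real run used the mixed, upsampled set.
dataset = load_dataset("u-10bei/sft_alfworld_trajectory_dataset_v5", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # effective batch size 16
        num_train_epochs=3,
        learning_rate=8e-6,
        warmup_ratio=0.1,
        weight_decay=0.05,
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```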

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "unsloth/Qwen2.5-7B-Instruct"
adapter = "UtsuSl0th/your-repo-name"  # placeholder: replace with this adapter's repo ID

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```
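
A short generation example on top of the loaded model; the prompt content is illustrative, and real use should follow the agent environment's observation/action format.

```python
# Illustrative single-turn prompt; agent evaluation uses multi-turn trajectories.
messages = [{"role": "user", "content": "You are in a kitchen. Find a mug and put it on the table."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```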

## Sources & Terms (IMPORTANT)

Training data:
- `u-10bei/sft_alfworld_trajectory_dataset_v5`
- `u-10bei/dbbench_sft_dataset_react_v4`

Dataset license: both datasets are used and distributed under the terms of the MIT License.
Compliance: users must comply with the MIT License (including its copyright-notice requirement) and with the base model's original terms of use.