KOUJI039 commited on
Commit
c27a65a
·
verified ·
1 Parent(s): 8352583

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +17 -16
README.md CHANGED
@@ -4,7 +4,7 @@ datasets:
4
  - u-10bei/sft_alfworld_trajectory_dataset_v5
5
  language:
6
  - en
7
- license: apache-2.0
8
  library_name: peft
9
  pipeline_tag: text-generation
10
  tags:
@@ -12,35 +12,35 @@ tags:
12
  - agent
13
  - tool-use
14
  - alfworld
15
- - dbbench
16
  ---
17
 
18
- (2026-02-26) README practice edit by KOUJI039
19
-
20
- # <【課題】ここは自分で記入して下さい>
21
 
22
  This repository provides a **LoRA adapter** fine-tuned from
23
- **Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.
24
 
25
  This repository contains **LoRA adapter weights only**.
26
  The base model must be loaded separately.
27
 
28
  ## Training Objective
29
 
30
- This adapter is trained to improve **multi-turn agent task performance**
31
- on ALFWorld (household tasks) and DBBench (database operations).
32
 
33
- Loss is applied to **all assistant turns** in the multi-turn trajectory,
34
- enabling the model to learn environment observation, action selection,
35
  tool use, and recovery from errors.
36
 
 
 
 
37
  ## Training Configuration
38
 
39
  - Base model: Qwen/Qwen3-4B-Instruct-2507
40
- - Method: LoRA (full precision base)
41
  - Max sequence length: 2048
42
  - Epochs: 2
43
- - Learning rate: 2e-06
44
  - LoRA: r=64, alpha=128
45
 
46
  ## Usage
@@ -51,7 +51,7 @@ from peft import PeftModel
51
  import torch
52
 
53
  base = "Qwen/Qwen3-4B-Instruct-2507"
54
- adapter = "your_id/your-repo"
55
 
56
  tokenizer = AutoTokenizer.from_pretrained(base)
57
  model = AutoModelForCausalLM.from_pretrained(
@@ -64,7 +64,8 @@ model = PeftModel.from_pretrained(model, adapter)
64
 
65
  ## Sources & Terms (IMPORTANT)
66
 
67
- Training data: u-10bei/sft_alfworld_trajectory_dataset_v5
 
68
 
69
- Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.
70
- Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.
 
4
  - u-10bei/sft_alfworld_trajectory_dataset_v5
5
  language:
6
  - en
7
+ license: mit
8
  library_name: peft
9
  pipeline_tag: text-generation
10
  tags:
 
12
  - agent
13
  - tool-use
14
  - alfworld
 
15
  ---
16
 
17
+ # LLM Lecture 2025 Advanced Competition (AgentBench: DBBench + ALFWorld)
 
 
18
 
19
  This repository provides a **LoRA adapter** fine-tuned from
20
+ **Qwen/Qwen3-4B-Instruct-2507** using LoRA + Unsloth.
21
 
22
  This repository contains **LoRA adapter weights only**.
23
  The base model must be loaded separately.
24
 
25
  ## Training Objective
26
 
27
+ This adapter is trained to improve multi-turn agent task performance
28
+ on ALFWorld (household tasks).
29
 
30
+ Loss is applied to all assistant turns in the multi-turn trajectory,
31
+ enabling the model to learn observation grounding, action selection,
32
  tool use, and recovery from errors.
33
 
34
+ Training data used in this run is ALFWorld only (see `datasets` in YAML).
35
+ Evaluation in the competition includes AgentBench tasks (**DBBench + ALFWorld**) by the organizers.
36
+
37
  ## Training Configuration
38
 
39
  - Base model: Qwen/Qwen3-4B-Instruct-2507
40
+ - Method: LoRA (PEFT)
41
  - Max sequence length: 2048
42
  - Epochs: 2
43
+ - Learning rate: 1.5e-6
44
  - LoRA: r=64, alpha=128
45
 
46
  ## Usage
 
51
  import torch
52
 
53
  base = "Qwen/Qwen3-4B-Instruct-2507"
54
+ adapter = "KOUJI039/structeval-qwen3-4b-sft-try20"
55
 
56
  tokenizer = AutoTokenizer.from_pretrained(base)
57
  model = AutoModelForCausalLM.from_pretrained(
 
64
 
65
  ## Sources & Terms (IMPORTANT)
66
 
67
+ Training data:
68
+ - u-10bei/sft_alfworld_trajectory_dataset_v5
69
 
70
+ This repository does NOT redistribute the dataset.
71
+ Users must comply with the dataset license and base model terms.