choco800 committed · Commit 6882dca · verified · 1 parent: 8adadf6

Upload README.md with huggingface_hub

---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react
- u-10bei/dbbench_sft_dataset_react_v2
- u-10bei/dbbench_sft_dataset_react_v3
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- unsloth
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen3-4B Agent Trajectory (v3)

This repository provides a **fully merged model** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using Unsloth.

Unlike standard adapter repositories, this repository contains the **merged weights**, so you do not need to load the base model separately.

## Training Objective

This model is trained to improve **multi-turn agent task performance** on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to **all assistant turns** in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.

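A multi-turn trajectory here is a chat in which assistant actions alternate with environment observations fed back as user turns. A minimal sketch of that layout (the task and actions below are invented for illustration, not taken from the training data):

```python
# Hypothetical ALFWorld-style trajectory; only the role layout matters here.
trajectory = [
    {"role": "system", "content": "You are an agent in a household environment."},
    {"role": "user", "content": "Task: put a mug on the desk."},
    {"role": "assistant", "content": "Thought: I should find a mug.\nAction: go to countertop 1"},
    {"role": "user", "content": "Observation: you see a mug 1 and a plate 2."},
    {"role": "assistant", "content": "Action: take mug 1 from countertop 1"},
]

# Loss is applied to every assistant turn; system/user turns are masked.
loss_turns = [m for m in trajectory if m["role"] == "assistant"]
print(len(loss_turns))  # 2
```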
## Data Processing

- Train/validation split: 95% / 5%
- Random seed: 3407 (used for shuffling and initialization)
- Loss masking: loss was computed only on the assistant's responses; user prompts and observations were masked during training (`train_on_responses_only` was applied to `<|im_start|>assistant\n`).

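The response-only masking can be pictured with a simplified pure-Python sketch (not the actual Unsloth implementation): only the spans after the `<|im_start|>assistant\n` delimiter and before the next `<|im_end|>` contribute to the loss.

```python
# Simplified sketch of response-only loss masking (not the Unsloth code).
# Spans after "<|im_start|>assistant\n" and before the next "<|im_end|>"
# keep their labels; everything else is masked (label -100 in practice).
ASSISTANT_OPEN = "<|im_start|>assistant\n"
END = "<|im_end|>"

def loss_segments(text: str) -> list[str]:
    """Return the substrings of `text` that would receive a training loss."""
    segments = []
    pos = 0
    while (start := text.find(ASSISTANT_OPEN, pos)) != -1:
        start += len(ASSISTANT_OPEN)
        end = text.find(END, start)
        if end == -1:
            end = len(text)
        segments.append(text[start:end])
        pos = end
    return segments

chat = (
    "<|im_start|>user\nTask: open the drawer.<|im_end|>\n"
    "<|im_start|>assistant\nAction: open drawer 1<|im_end|>\n"
    "<|im_start|>user\nObservation: the drawer is open.<|im_end|>\n"
    "<|im_start|>assistant\nAction: look in drawer 1<|im_end|>\n"
)
print(loss_segments(chat))  # only the two assistant actions
```

Actual training does this at the token level by setting masked positions' labels to -100, but the selection logic is the same.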
## Training Configuration

- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Method**: LoRA + Unsloth (merged in 16-bit)
- **Max sequence length**: 8192
- **Epochs**: 1
- **Learning rate**: 1e-05
- **LoRA**: r=32, alpha=32
- **Per-device train batch size**: 8
- **Gradient accumulation steps**: 4
- **Warmup ratio**: 0.1
- **Weight decay**: 0.05
- **NEFTune noise alpha**: 5.0
- **Validation ratio**: 0.05

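For reference, these settings imply an effective batch size of per-device batch × gradient accumulation steps × number of devices; the device count is not stated on this card, so it is assumed to be 1 below.

```python
# Effective batch size implied by the configuration above.
per_device_train_batch_size = 8
grad_accum = 4
num_devices = 1  # assumption: the card does not state the GPU count

effective_batch_size = per_device_train_batch_size * grad_accum * num_devices
print(effective_batch_size)  # 32 sequences per optimizer step (per assumed device count)
```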
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "choco800/qwen3-4b-agent-v3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

## Sources & Terms (IMPORTANT)

Training data:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react
- u-10bei/dbbench_sft_dataset_react_v2
- u-10bei/dbbench_sft_dataset_react_v3
- u-10bei/dbbench_sft_dataset_react_v4

Dataset license: the datasets above are used and redistributed under the terms of the MIT License.
Compliance: users must comply with the dataset licenses and the base model's original terms of use (Apache 2.0).