hara-CU committed (verified)
Commit f4ffc73 · 1 parent: e5c174f

Upload README.md with huggingface_hub

Files changed (1): README.md (+56 −13)
README.md CHANGED
@@ -1,21 +1,64 @@
  ---
- base_model: unsloth/Qwen3-4B-Instruct-2507
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen3
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** hara-CU
- - **License:** apache-2.0
- - **Finetuned from model:** unsloth/Qwen3-4B-Instruct-2507

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
 
  ---
+ base_model: Qwen/Qwen3-4B-Instruct-2507
+ datasets:
+ - hara-CU/LLM2025_DB_base_AW_345NoEAd_ALFformat_QH5L4R5_1392
  language:
  - en
+ license: mit
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - lora
+ - agent
+ - tool-use
+ - alfworld
+ - dbbench
  ---

+ # Qwen3-4B-DBbase_AW_345NoEAd_ALFformat_QH5L4R5_1392-r16a32-B16-2ep-5e6
+
+ This repository provides a **LoRA adapter** for
+ **Qwen/Qwen3-4B-Instruct-2507**, fine-tuned with **Unsloth**.
+
+ ## Training Objective
+
+ This model is trained to improve **multi-turn agent task performance**
+ on ALFWorld (household tasks) and DBBench (database operations).
+
+ Loss is applied to **all assistant turns** in the multi-turn trajectory,
+ enabling the model to learn environment observation, action selection,
+ tool use, and recovery from errors.
+
+ ## Training Configuration
+
+ - Base model: Qwen/Qwen3-4B-Instruct-2507
+ - Method: LoRA (full-precision base)
+ - Max sequence length: 8192
+ - Epochs: 2
+ - Learning rate: 5e-06
+ - LoRA: r=16, alpha=32, use_rslora=False
+ - Total batch size: 16
+
+ ## Usage
+
+ ```python
+ # Note: this repository hosts a LoRA adapter. Loading it directly with
+ # AutoModelForCausalLM requires `peft` to be installed; transformers then
+ # fetches the base model and attaches the adapter automatically.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "hara-CU/Qwen3-4B-DBbase_AW_345NoEAd_ALFformat_QH5L4R5_1392-r16a32-B16-2ep-5e6"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ ```
+
+ ## Sources & Terms (IMPORTANT)
+
+ Training data: hara-CU/LLM2025_DB_base_AW_345NoEAd_ALFformat_QH5L4R5_1392
+
+ Dataset license: MIT License. This dataset is used and distributed under the terms of the MIT License.
+
+ Compliance: Users must comply with the MIT License (including preservation of the copyright notice) and with the base model's original terms of use.
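
The updated README states that loss is applied to all assistant turns of each multi-turn trajectory. A common way to implement this is to copy the input token ids into the labels and replace every non-assistant token with `-100`, the index that PyTorch's cross-entropy loss ignores. A minimal sketch, assuming per-turn role annotations are available; the role names, helper function, and token ids below are illustrative, not taken from the actual training code:

```python
# Sketch: build a label sequence so loss covers only assistant tokens.
# Roles and the trajectory structure here are illustrative assumptions.
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss by default

def build_labels(token_spans):
    """token_spans: list of (role, token_ids) for one multi-turn trajectory.

    Returns (input_ids, labels) where labels equal the input ids inside
    assistant turns and IGNORE_INDEX everywhere else, so every assistant
    turn is supervised while user/tool turns contribute no loss."""
    input_ids, labels = [], []
    for role, ids in token_spans:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)                      # supervise this turn
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # mask out of the loss
    return input_ids, labels

# Hypothetical 4-turn trajectory (user -> assistant -> tool -> assistant).
trajectory = [
    ("user", [101, 102]),
    ("assistant", [201, 202, 203]),
    ("tool", [301]),
    ("assistant", [401, 402]),
]
ids, labels = build_labels(trajectory)
```

Masking this way is what lets the model learn from both assistant turns (action selection and error recovery) without being trained to reproduce environment observations.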
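
The hyperparameters in the Training Configuration section map directly onto a PEFT `LoraConfig`. A hedged sketch of how that configuration might look; the `target_modules` list and `lora_dropout` value are assumptions (a common choice for Qwen-family models) and are not stated in the README:

```python
from peft import LoraConfig

# r, lora_alpha, and use_rslora come from the README's Training
# Configuration section; target_modules and lora_dropout are assumed.
lora_config = LoraConfig(
    r=16,                # LoRA rank
    lora_alpha=32,       # scaling factor alpha
    use_rslora=False,    # plain (non-rank-stabilized) LoRA scaling
    lora_dropout=0.0,    # assumed; not stated in the README
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

With r=16 and alpha=32, LoRA updates are scaled by alpha/r = 2 (since `use_rslora=False` uses the plain alpha/r scaling rather than alpha/sqrt(r)).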