moo100 committed on
Commit 09a9e5b · verified · 1 Parent(s): fa57495

Update README.md

Files changed (1):
  1. README.md +176 -16

README.md CHANGED

---
base_model: unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

## Model Description

This is a **preference-tuned** version of **DeepSeek-R1-Distill-Llama-8B**, optimized for **telecom-related queries**. The model has been fine-tuned to provide **concise and factual answers** while **role-playing as a customer service agent** and incorporating user-specific preferences.

- **Developed by:** Mohamed Abdulaziz
- **Model type:** Fine-tuned DeepSeek-R1-Distill-Llama-8B
- **Frameworks used:** Unsloth for fine-tuning and wandb for performance monitoring (see the logging sketch below)
- **Preference tuning:** The model has been additionally fine-tuned using **Direct Preference Optimization (DPO)** to align with user preferences, enhancing response quality and relevance.

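Since wandb is named as the monitoring tool, here is a minimal, hypothetical sketch of the kind of metric logging involved. The project name is invented, and the values are taken from the Training Details section below:

```python
import wandb

# Illustrative: stream DPO training metrics to Weights & Biases.
wandb.init(project="deepseek-r1-telecom-dpo")  # hypothetical project name

# Metric names mirror the quantities reported under "Training Procedure".
wandb.log({"train/loss": 0.6997}, step=10)
wandb.log({"train/loss": 0.0966, "rewards/chosen": 1.4318}, step=360)
```
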
## Benefits of Preference Tuning with DPO

Direct Preference Optimization (DPO) improves model performance by directly optimizing for user-preferred responses, without requiring reinforcement learning from human feedback (RLHF). The objective and a minimal training sketch follow.

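Concretely, DPO casts preference alignment as a supervised objective over preference pairs $(x, y_w, y_l)$ (prompt, chosen response, rejected response), trained against a frozen reference model $\pi_{\text{ref}}$. This is the standard objective from the original DPO paper, not a formula specific to this model:

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where $\sigma$ is the sigmoid and $\beta$ controls how far the tuned policy $\pi_\theta$ may drift from the reference model.
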
### **Advantages of DPO in this Model:**

- **Higher Quality Responses:** The model generates more accurate and contextually relevant answers by aligning with preferred outputs.
- **Better User Experience:** By fine-tuning responses based on preferred interactions, the model enhances satisfaction and usability.
- **Efficient Fine-Tuning:** DPO achieves preference alignment with lower computational overhead than RLHF, making it more scalable and adaptable.
- **Improved Robustness:** The model avoids inconsistencies by learning from direct preference feedback rather than an implicit reward function.

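As a concrete illustration, a minimal DPO stage with TRL's `DPOTrainer` might look like the sketch below. This is an assumption-laden outline (the dataset file, hyperparameters, and recent-TRL argument names are illustrative), not the exact training script used for this model:

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer
from unsloth import FastLanguageModel

# Load the checkpoint to be preference-tuned (path illustrative).
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Preference data with "prompt", "chosen", and "rejected" columns (file name illustrative).
train_dataset = load_dataset("json", data_files="telecom_preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        output_dir="dpo-telecom",
        beta=0.1,                      # KL-penalty strength (illustrative value)
        per_device_train_batch_size=2,
        learning_rate=5e-6,
        lr_scheduler_type="linear",    # matches the linear decay noted in Training Details
    ),
    train_dataset=train_dataset,
    processing_class=tokenizer,        # older TRL versions take tokenizer= instead
)
trainer.train()
```
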
## Uses

This model is designed for **customer support automation in the telecom industry**. It assists in:

- Answering common user queries about **5G, network issues, billing, and services**.
- Providing **concise and factually correct responses**.
- Reducing **workload on human support agents** by handling routine inquiries.
- **Adapting to user-specific preferences** to improve interaction quality.

### **Who can use this model?**

- **Telecom companies**: Automate customer service via chatbots.
- **Developers & researchers**: Fine-tune and adapt the model for different use cases.
- **Call centers**: Support agents in handling user requests efficiently.

### **Who might be affected?**

- **End-users** interacting with telecom chatbots.
- **Support agents** using AI-assisted tools.
- **Developers & data scientists** fine-tuning and deploying the model.

## How to Get Started with the Model

### **1️⃣ Import necessary libraries**

```python
import torch
from unsloth import FastLanguageModel
```

### **2️⃣ Define the model path**

```python
model_path = "moo100/DeepSeek-R1-telecom-chatbot-v2"
```

### **3️⃣ Load the model and tokenizer**

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_path,
    max_seq_length=1024,
    dtype=None,             # auto-detect dtype (float16 / bfloat16)
    device_map="auto",
    load_in_4bit=True,      # 4-bit quantization keeps memory usage low
    trust_remote_code=True,
)
```

### **4️⃣ Optimize the model for fast inference with Unsloth**

```python
model = FastLanguageModel.for_inference(model)
```

### **5️⃣ Move the model to GPU if available, otherwise use CPU**

```python
# With device_map="auto" the weights are usually already placed;
# this is a safeguard for setups where they are not.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```

### **6️⃣ Define a system instruction to guide model responses**

```python
system_instruction = """You are an AI assistant. Answer user questions concisely and factually.
Do NOT add extra details unless necessary."""
```

### **7️⃣ Define the user input (replace with any query)**

```python
user_input = "Hello"
```

### **8️⃣ Construct the full prompt from the instruction and the user query**

```python
full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
```

### **9️⃣ Tokenize the input prompt**

```python
inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
```

### **🔟 Generate the model response with controlled stopping criteria**

```python
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,
)
```

### **1️⃣1️⃣ Decode and extract only the newly generated response**

```python
# Slice off the prompt tokens so only the new completion is decoded.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```

### **1️⃣2️⃣ Print the AI-generated response**

```python
# Drop any hallucinated follow-up "User:" turn before printing.
print(response.split("User: ")[0])
```

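The steps above can also be wrapped into a single helper for repeated queries. This is a convenience sketch built only from the code already shown; the `ask` function name and the sample question are illustrative:

```python
def ask(user_input: str, max_new_tokens: int = 100) -> str:
    """Send one question to the chatbot, reusing the objects created above."""
    full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
    inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=50,
        eos_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(
        outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
    return response.split("User: ")[0].strip()

# Illustrative telecom query:
print(ask("Why is my 5G connection slower than 4G in some areas?"))
```
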
## Training Details

### Training Data

A **preference dataset** was created using **Phi-4**, with the **chosen column** generated via evaluation prompts. Fine-tuning on this dataset aligns the model's responses closely with user preferences.

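The DPO stage consumes this data as (prompt, chosen, rejected) triples. Below is a hypothetical row illustrating the expected format; all texts are invented for illustration, not real training data:

```python
# Hypothetical preference-dataset row (illustrative, not from the real dataset).
preference_row = {
    "prompt": "My mobile data stopped working after I changed plans. What should I do?",
    # Preferred answer, e.g. generated with Phi-4 via evaluation prompts: concise and factual.
    "chosen": (
        "Restart your phone, confirm mobile data is enabled, and check that your "
        "new plan is active in your account. If the issue persists, contact support."
    ),
    # Dispreferred answer: vague and unhelpful.
    "rejected": "Mobile networks can be complicated, so it's hard to say.",
}
```
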
### Training Procedure

- **Training Loss:** Decreased steadily from **0.6997** at step 10 to **0.0966** at step 360, indicating stable model convergence.
- **Rewards / Chosen:** Improved consistently, reaching **1.4318**, showing enhanced preference alignment.
- **Rewards / Rejected:** Decreased over time, confirming the model's ability to distinguish between optimal and suboptimal responses.
- **Logits / Chosen & Rejected:** Demonstrated increasing separation, ensuring stronger differentiation between desirable and undesirable outputs.
- **Gradient Norm & Learning Rate Schedule:** Maintained stability with linear decay, preventing sudden fluctuations and ensuring smooth convergence.
- **Preference Alignment:** Fine-tuned using **DPO**, incorporating user feedback to optimize responses effectively.

## Evaluation

### Methodology

The chatbot was evaluated using **Meta-Llama-3.3-70B-Instruct** as an LLM judge, which assessed the relevance, correctness, and fluency of its responses.

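A minimal sketch of what such a judge query can look like follows; the rubric wording is an assumption, not the exact evaluation prompt used here:

```python
# Hypothetical judge prompt for Meta-Llama-3.3-70B-Instruct (wording illustrative).
def build_judge_prompt(user_message: str, chatbot_reply: str) -> str:
    return f"""You are grading a telecom customer-service chatbot.

User message: {user_message}
Chatbot reply: {chatbot_reply}

Score the reply from 1 to 10 on each criterion, with a one-sentence justification:
- Relevance
- Correctness
- Fluency"""
```
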
### Results

Meta-Llama-3.3-70B-Instruct's evaluation of the sample greeting exchange (scores out of 10):

- **Relevance: 10**
  The chatbot's response is highly relevant to the user's greeting: it acknowledges the greeting and opens the conversation for further discussion.

- **Correctness: 10**
  The chatbot's response is grammatically correct and uses proper language, making it an appropriate reply to the user's greeting.

- **Fluency: 10**
  The chatbot's response is fluent and natural-sounding, with a friendly and approachable tone that invites the user to engage in conversation.

Overall, the chatbot's response is a strong example of a well-crafted greeting that sets the tone for a helpful and engaging conversation.