---
library_name: transformers
tags: []
---

# Model Card

## Model Description

This is a **preference-tuned** version of **DeepSeek-R1-Distill-Llama-8B**, optimized for **telecom-related queries**. The model has been fine-tuned to provide **concise and factual answers** while **role-playing as a customer service agent** and incorporating user-specific preferences.

- **Developed by:** Mohamed Abdulaziz
- **Model type:** Fine-tuned DeepSeek-R1-Distill-Llama-8B
- **Frameworks used:** Unsloth for fine-tuning and wandb for performance monitoring
- **Preference tuning:** The model has additionally been fine-tuned using **Direct Preference Optimization (DPO)** to align with user preferences, enhancing response quality and relevance.

## Benefits of Preference Tuning with DPO

Direct Preference Optimization (DPO) improves model performance by directly optimizing for user-preferred responses without requiring reinforcement learning from human feedback (RLHF).

### **Advantages of DPO in this Model:**

- **Higher-quality responses:** The model generates more accurate and contextually relevant answers by aligning with preferred outputs.
- **Better user experience:** By fine-tuning responses based on preferred interactions, the model enhances satisfaction and usability.
- **Efficient fine-tuning:** DPO allows preference alignment with reduced computational overhead compared to RLHF, making it more scalable and adaptable.
- **Improved robustness:** The model avoids inconsistencies by learning directly from preference pairs rather than from a separately trained reward model.

## Uses

This model is designed for **customer support automation in the telecom industry**. It assists in:

- Answering common user queries about **5G, network issues, billing, and services**.
- Providing **concise and factually correct responses**.
- Reducing the **workload on human support agents** by handling routine inquiries.
- **Adapting to user-specific preferences** to improve interaction quality.
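To make the DPO objective described above concrete, the per-example loss can be sketched in plain Python. This is an illustration of the mechanism only, not the training code used for this model; the `beta` value and the log-probabilities in the usage example are made-up numbers.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model (hypothetical values here).
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# No margin gained over the reference -> loss is log(2) (~0.693)
weak = dpo_loss(-40.0, -38.0, -41.0, -39.0)
# Policy strongly prefers the chosen response -> loss shrinks
strong = dpo_loss(-35.0, -50.0, -41.0, -39.0)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is exactly the "rewards / chosen" trend reported in the training logs below.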
### **Who can use this model?**

- **Telecom companies**: Automate customer service via chatbots.
- **Developers & researchers**: Fine-tune and adapt for different use cases.
- **Call centers**: Support agents in handling user requests efficiently.

### **Who might be affected?**

- **End-users** interacting with telecom chatbots.
- **Support agents** using AI-assisted tools.
- **Developers & data scientists** fine-tuning and deploying the model.

## How to Get Started with the Model

### **1️⃣ Import necessary libraries**

```python
import torch
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
```

### **2️⃣ Define model path**

```python
model_path = "moo100/DeepSeek-R1-telecom-chatbot-v2"
```

### **3️⃣ Load the model and tokenizer**

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_path,
    max_seq_length=1024,
    dtype=None,
    device_map="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
```

### **4️⃣ Optimize model for fast inference with Unsloth**

```python
model = FastLanguageModel.for_inference(model)
```

### **5️⃣ Move model to GPU if available, otherwise use CPU**

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```

### **6️⃣ Define system instruction to guide model responses**

```python
system_instruction = """You are an AI assistant. Answer user questions concisely and factually.
Do NOT add extra details unless necessary."""
```

### **7️⃣ Define user input (replace with any query)**

```python
user_input = "Hello"
```

### **8️⃣ Construct full prompt with instructions and user query**

```python
full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
```

### **9️⃣ Tokenize input prompt**

```python
inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
```

### **🔟 Generate model response with controlled stopping criteria**

```python
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,
)
```

### **1️⃣1️⃣ Decode and extract only the newly generated response**

```python
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```

### **1️⃣2️⃣ Print the AI-generated response**

```python
# Drop any hallucinated follow-up "User:" turns before printing
print(response.split("User: ")[0])
```

## Training Details

### Training Data

A **preference dataset** was created using **Phi4**, where the **chosen** column was generated via evaluation prompts. This dataset ensures the model aligns closely with user preferences by fine-tuning responses accordingly.

### Training Procedure

- **Training loss:** Decreased steadily from **0.6997** at step 10 to **0.0966** at step 360, indicating stable model convergence.
- **Rewards / chosen:** Improved consistently, reaching **1.4318**, showing enhanced preference alignment.
- **Rewards / rejected:** Decreased over time, confirming the model's ability to distinguish between optimal and suboptimal responses.
- **Logits / chosen & rejected:** Showed increasing separation, indicating stronger differentiation between desirable and undesirable outputs.
- **Gradient norm & learning-rate schedule:** Remained stable with linear decay, preventing sudden fluctuations and ensuring smooth convergence.
- **Preference alignment:** Fine-tuned using **DPO**, incorporating preference feedback to optimize responses effectively.

## Evaluation

### Methodology

The chatbot was evaluated using **Meta-Llama-3.3-70B-Instruct** as a judge, assessing the relevance, correctness, and fluency of its responses.

### Results

Meta-Llama-3.3-70B-Instruct evaluation of a sample greeting exchange:

- **Relevance: 10.** The chatbot's response is highly relevant to the user's greeting, acknowledging it and opening the conversation for further discussion.
- **Correctness: 10.** The chatbot's response is grammatically correct and uses proper language, making it an appropriate reply to the user's greeting.
- **Fluency: 10.** The chatbot's response is fluent and natural-sounding, with a friendly, approachable tone that invites the user to engage in conversation.

Overall, the chatbot's response is a well-crafted greeting that sets the tone for a helpful and engaging conversation.
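An LLM-as-judge evaluation like the one above typically builds a scoring prompt for the judge model and parses its numeric ratings. The sketch below shows one way to do this; the prompt wording, helper names, and sample output are illustrative assumptions, not the exact harness used for this evaluation.

```python
import re

# Hypothetical judge prompt; the actual wording used with
# Meta-Llama-3.3-70B-Instruct may differ.
JUDGE_TEMPLATE = """You are an impartial judge. Rate the assistant's reply to the user
on Relevance, Correctness, and Fluency, each on a 1-10 scale.
Answer with lines of the form "Criterion: score".

User: {user_input}
Assistant: {response}"""

def build_judge_prompt(user_input: str, response: str) -> str:
    """Fill the judge template with one chatbot exchange."""
    return JUDGE_TEMPLATE.format(user_input=user_input, response=response)

def parse_scores(judge_output: str) -> dict:
    """Extract 'Criterion: N' pairs from the judge model's raw text."""
    scores = {}
    for criterion, value in re.findall(
            r"(Relevance|Correctness|Fluency)\s*:\s*(\d+)", judge_output):
        scores[criterion] = int(value)
    return scores

# Example judge output shaped like the scores reported above
sample_judgement = "Relevance: 10\nCorrectness: 10\nFluency: 10"
print(parse_scores(sample_judgement))  # {'Relevance': 10, 'Correctness': 10, 'Fluency': 10}
```

Parsing scores with a strict pattern keeps the evaluation reproducible even when the judge model adds free-form commentary around its ratings.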