|
|
--- |
|
|
library_name: transformers |
|
|
tags: [] |
|
|
--- |
|
|
|
|
|
# Model Card for DeepSeek-R1-telecom-chatbot-v2
|
|
|
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a **preference-tuned** version of **DeepSeek-R1-Distill-Llama-8B**, optimized for **telecom-related queries**. The model has been fine-tuned to provide **concise and factual answers** while **role-playing as a customer service agent** and incorporating user-specific preferences.
|
|
|
|
|
- **Developed by:** Mohamed Abdulaziz |
|
|
|
|
|
- **Model type:** Fine-tuned DeepSeek-R1-Distill-Llama-8B
|
|
|
|
|
- **Frameworks used:** Unsloth for fine-tuning and Weights & Biases (wandb) for performance monitoring
|
|
|
|
|
- **Preference Tuning:** The model has been additionally fine-tuned using **Direct Preference Optimization (DPO)** to align with user preferences, enhancing response quality and relevance. |
|
|
|
|
|
## Benefits of Preference Tuning with DPO |
|
|
|
|
|
Direct Preference Optimization (DPO) improves model performance by optimizing directly on pairs of preferred and rejected responses, without the separate reward model and reinforcement learning loop required by reinforcement learning from human feedback (RLHF). A minimal sketch of the objective follows the list below.
|
|
|
|
|
### **Advantages of DPO in this Model:** |
|
|
|
|
|
- **Higher Quality Responses:** The model generates more accurate and contextually relevant answers by aligning with preferred outputs. |
|
|
- **Better User Experience:** By fine-tuning responses based on preferred interactions, the model enhances satisfaction and usability. |
|
|
- **Efficient Fine-Tuning:** DPO allows preference alignment with reduced computational overhead compared to RLHF, making it more scalable and adaptable. |
|
|
- **Improved Robustness:** The model avoids inconsistencies by learning directly from preference data rather than from an implicitly learned reward function.
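
For reference, this is a minimal sketch of the DPO objective (Rafailov et al., 2023) as it is commonly implemented; the function name and `beta` value are illustrative assumptions, not taken from this model's actual training code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a tensor of summed log-probabilities of the
    chosen/rejected responses under the trainable policy or the frozen
    reference model."""
    # Implicit rewards: beta-scaled log-ratios between policy and reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Minimize the negative log-sigmoid of the chosen-vs-rejected margin
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

The "Rewards / Chosen" and "Rewards / Rejected" metrics reported under Training Details below correspond to these beta-scaled log-ratios.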
|
|
|
|
|
## Uses |
|
|
|
|
|
This model is designed for **customer support automation in the telecom industry**. It assists in: |
|
|
|
|
|
- Answering common user queries about **5G, network issues, billing, and services**. |
|
|
- Providing **concise and factually correct responses**. |
|
|
- Reducing **workload on human support agents** by handling routine inquiries. |
|
|
- **Adapting to user-specific preferences** to improve interaction quality. |
|
|
|
|
|
### **Who can use this model?** |
|
|
|
|
|
- **Telecom companies**: Automate customer service via chatbots. |
|
|
- **Developers & researchers**: Fine-tune and adapt for different use cases. |
|
|
- **Call centers**: Support agents in handling user requests efficiently. |
|
|
|
|
|
### **Who might be affected?** |
|
|
|
|
|
- **End-users** interacting with telecom chatbots. |
|
|
- **Support agents** using AI-assisted tools. |
|
|
- **Developers & data scientists** fine-tuning and deploying the model. |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
### **1️⃣ Import necessary libraries** |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from unsloth import FastLanguageModel |
|
|
from transformers import AutoTokenizer |
|
|
``` |
|
|
|
|
|
### **2️⃣ Define model path** |
|
|
|
|
|
```python |
|
|
model_path = "moo100/DeepSeek-R1-telecom-chatbot-v2" |
|
|
``` |
|
|
|
|
|
### **3️⃣ Load the model and tokenizer** |
|
|
|
|
|
```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_path,
    max_seq_length=1024,    # maximum context length for this session
    dtype=None,             # let Unsloth auto-select the best dtype
    device_map="auto",      # place the model on available devices
    load_in_4bit=True,      # 4-bit quantization to reduce VRAM usage
    trust_remote_code=True,
)
```
|
|
|
|
|
### **4️⃣ Optimize model for fast inference with Unsloth** |
|
|
|
|
|
```python |
|
|
model = FastLanguageModel.for_inference(model) |
|
|
``` |
|
|
|
|
|
### **5️⃣ Move model to GPU if available, otherwise use CPU** |
|
|
|
|
|
```python
# Note: with device_map="auto" above, the model is already placed on the
# GPU when one is available, so this step mainly covers CPU-only setups.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```
|
|
|
|
|
### **6️⃣ Define system instruction to guide model responses** |
|
|
|
|
|
```python |
|
|
system_instruction = """You are an AI assistant. Answer user questions concisely and factually. |
|
|
Do NOT add extra details unless necessary.""" |
|
|
``` |
|
|
|
|
|
### **7️⃣ Define user input (Replace with any query)** |
|
|
|
|
|
```python |
|
|
user_input = "Hello" |
|
|
``` |
|
|
|
|
|
### **8️⃣ Construct full prompt with instructions and user query** |
|
|
|
|
|
```python |
|
|
full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:" |
|
|
``` |
|
|
|
|
|
### **9️⃣ Tokenize input prompt** |
|
|
|
|
|
```python |
|
|
inputs = tokenizer(full_prompt, return_tensors="pt").to(device) |
|
|
``` |
|
|
|
|
|
### **🔟 Generate model response with controlled stopping criteria**
|
|
|
|
|
```python
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,                    # keep answers short and concise
    do_sample=True,
    temperature=0.5,                      # moderate randomness
    top_k=50,                             # sample only from the 50 most likely tokens
    eos_token_id=tokenizer.eos_token_id,  # stop cleanly at end-of-sequence
)
```
|
|
|
|
|
### **1️⃣1️⃣ Decode and extract only the newly generated response**
|
|
|
|
|
```python
# Slice off the prompt tokens so only newly generated text is decoded
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```
|
|
|
|
|
### **1️⃣2️⃣ Print the AI-generated response**
|
|
|
|
|
```python
# Drop anything after a follow-up "User: " turn the model may have invented
print(response.split("User: ")[0])
```
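
For convenience, steps 6️⃣ through 1️⃣2️⃣ can be wrapped into a single helper. This is a minimal sketch reusing the `model`, `tokenizer`, `device`, and `system_instruction` defined above; the `ask` name and sample query are illustrative, not part of the original card.

```python
def ask(user_input: str, max_new_tokens: int = 50) -> str:
    """Generate a concise answer for a single telecom query."""
    full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
    inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=50,
        eos_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(
        outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
    return response.split("User: ")[0].strip()

print(ask("Why is my 5G connection slower than 4G in some areas?"))
```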
|
|
|
|
|
|
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
A **preference dataset** was created using **Phi-4**, with the **chosen** column generated via evaluation prompts. Fine-tuning on this dataset aligns the model's responses closely with user preferences. A sketch of the expected record format is shown below.
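
DPO trainers typically expect prompt/chosen/rejected triples. The record below is purely illustrative, since the actual dataset is not published with this card.

```python
# Hypothetical preference record; field values are invented for illustration.
preference_example = {
    "prompt":   "Why is my 5G connection slower than 4G in some areas?",
    "chosen":   "5G coverage is still expanding, so speeds can drop where "
                "the signal is weak; your phone may also fall back to 4G indoors.",
    "rejected": "5G is always faster than 4G, so the problem must be your phone.",
}
```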
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
- **Training Loss:** Decreased steadily from **0.6997** at step 10 to **0.0966** at step 360, indicating stable model convergence. |
|
|
- **Rewards / Chosen:** Improved consistently, reaching **1.4318**, showcasing enhanced preference alignment. |
|
|
- **Rewards / Rejected:** Decreased over time, confirming the model's ability to distinguish between optimal and suboptimal responses. |
|
|
- **Logits / Chosen & Rejected:** Demonstrated increasing separation, ensuring stronger differentiation between desirable and undesirable outputs. |
|
|
- **Gradient Norm & Learning Rate Schedule:** Maintained stability with linear decay, preventing sudden fluctuations and ensuring smooth convergence. |
|
|
- **Preference Alignment:** Fine-tuned using **DPO**, incorporating user feedback to optimize responses effectively; a configuration sketch follows this list.
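
The exact training script is not published, but a run matching the details above could be configured roughly like this with TRL's `DPOTrainer` on the Unsloth-loaded model. Apart from `max_steps` and the linear schedule reported above, the hyperparameter values are assumptions.

```python
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="outputs",
    beta=0.1,                     # preference-penalty strength (assumed)
    learning_rate=5e-6,           # illustrative starting LR
    lr_scheduler_type="linear",   # linear decay, as reported above
    max_steps=360,                # matches the final step reported above
    report_to="wandb",            # training was monitored with wandb
)

trainer = DPOTrainer(
    model=model,                  # Unsloth-loaded model (typically with LoRA adapters)
    ref_model=None,               # with adapters, TRL derives the frozen reference
    args=config,
    train_dataset=preference_dataset,   # prompt / chosen / rejected columns
    processing_class=tokenizer,   # named `tokenizer=` in older TRL versions
)
trainer.train()
```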
|
|
|
|
|
|
|
|
|
|
|
# Evaluation |
|
|
|
|
|
## Methodology |
|
|
|
|
|
The chatbot was evaluated using **Meta-Llama-3.3-70B-Instruct** as an LLM judge, which scored the relevance, correctness, and fluency of its responses.
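
The exact judging prompt is not included in this card; a prompt along these lines would produce the per-criterion scores reported below (the wording is an assumption).

```python
# Hypothetical LLM-as-judge prompt; the actual evaluation prompt is not published.
judge_prompt = f"""You are grading a telecom support chatbot.

User query: {user_input}
Chatbot response: {response}

Score the response from 1 to 10 on each criterion and justify briefly:
- Relevance
- Correctness
- Fluency"""
```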
|
|
|
|
|
## Results |
|
|
|
|
|
Meta-Llama-3.3-70B-Instruct scores for the sample greeting exchange ("Hello") shown above:
|
|
|
|
|
- **Relevance: 10** |
|
|
The chatbot's response is highly relevant to the user's greeting, as it acknowledges the greeting and opens the conversation for further discussion. |
|
|
|
|
|
- **Correctness: 10** |
|
|
The chatbot's response is grammatically correct and uses proper language, making it a correct and appropriate response to the user's greeting. |
|
|
|
|
|
- **Fluency: 10** |
|
|
The chatbot's response is fluent and natural-sounding, with a friendly and approachable tone that invites the user to engage in conversation. |
|
|
|
|
|
Overall, the chatbot's response is a great example of a well-crafted greeting response that effectively sets the tone for a helpful and engaging conversation. |
|
|
|
|
|
|