Confirmation on domain
Hi Ronan,
I would like to confirm the domain for this specific model. It looks like it works only with code-based questions. How does it work on other specific domains, e.g. call center?
Note:
I have seen this: "DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese."
Please clarify on this.
Thanks,
santhosh
I just added a demo video to the model card.
Yeah, it's a coder model and is missing general knowledge, but it responds well to language - I guess that 13% is enough; it's about 260B tokens, so that's quite a bit.
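Back-of-the-envelope, that figure comes straight from the numbers in the quote (13% of 2T tokens):

```python
# Sanity check: the natural-language share of DeepSeek Coder's training data,
# using the figures from the model description (2T total tokens, 13% natural language).
total_tokens = 2 * 10**12   # 2T tokens
nl_share = 0.13             # 13% natural language
nl_tokens = total_tokens * nl_share
print(f"~{nl_tokens / 10**9:.0f}B natural-language tokens")
```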
I tried the prompt below and got the following response.
prompt = "hello, how to book a ticket?"
result = "I'm sorry, but as an AI Programming Assistant, I'm only able to provide assistance with computer science-related questions. I'm unable to help with booking tickets or other non-computer science related tasks."
Please help with this.
Can you post a full code example, including the prompt, how you tokenize the prompt, and how you send it to the model?
I can then replicate it to troubleshoot.
I tried with the open-source model to check the model's response.
Code: =======================
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-instruct", trust_remote_code=True).cuda()
messages = [
    {'role': 'user', 'content': "hello, how to book a ticket?"}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
Result: ==========================
I'm sorry, but as an AI Programming Assistant, I'm only able to provide assistance with computer science-related questions. I'm unable to help with booking tickets or other non-computer science related tasks.
Yup, that's because using apply_chat_template prepends a restrictive system message (it's a bit sneaky):
# Run a sample for Deepseek coder
messages = [
# {'role': 'system', 'content': "<FUNCTIONS>sample functions</FUNCTIONS>"},
{'role': 'user', 'content': "How are you today?"}
]
# Apply the chat template to the messages
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
# Decode the tokenized prompt to see exactly what the model receives
decoded_input = tokenizer.decode(inputs[0], skip_special_tokens=False)
print(decoded_input)
If you print those inputs, you'll get:
You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
How are you today?
### Response:
That's why it's better to use the prompt template I gave in the card and manually put the prompt together with:
# B_INST, E_INST = "\n### Instruction:\n", "\n### Response:\n" #DeepSeek Coder Style
prompt = f"{B_FUNC}{function_list.strip()}{E_FUNC}{B_INST}{user_prompt.strip()}{E_INST}\n\n"
or without functions:
prompt = f"{B_INST}{user_prompt.strip()}{E_INST}\n\n"
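For completeness, here's the without-functions assembly sketched out on its own, so you can see exactly what the model receives (tokenizer/model setup is the same as in your snippet):

```python
# Manual DeepSeek Coder prompt assembly: no apply_chat_template,
# so the restrictive "AI programming assistant" system message is never added.
B_INST, E_INST = "\n### Instruction:\n", "\n### Response:\n"  # DeepSeek Coder style

user_prompt = "hello, how to book a ticket?"
prompt = f"{B_INST}{user_prompt.strip()}{E_INST}\n\n"

# Only the instruction/response scaffold, nothing else:
print(prompt)
```

Then tokenize it directly, e.g. `inputs = tokenizer(prompt, return_tensors="pt").to(model.device)`, and call `model.generate(**inputs, max_new_tokens=512, eos_token_id=32021)` as in your snippet.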
If that still doesn't work (though I think it will), then I can fine-tune the non-instruct version for you.