Instructions to use SeaLLMs/SeaLLM-7B-v2.5 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use SeaLLMs/SeaLLM-7B-v2.5 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SeaLLMs/SeaLLM-7B-v2.5")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM-7B-v2.5")
model = AutoModelForCausalLM.from_pretrained("SeaLLMs/SeaLLM-7B-v2.5")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SeaLLMs/SeaLLM-7B-v2.5 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SeaLLMs/SeaLLM-7B-v2.5"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SeaLLMs/SeaLLM-7B-v2.5",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/SeaLLMs/SeaLLM-7B-v2.5
- SGLang
How to use SeaLLMs/SeaLLM-7B-v2.5 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SeaLLMs/SeaLLM-7B-v2.5" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SeaLLMs/SeaLLM-7B-v2.5",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SeaLLMs/SeaLLM-7B-v2.5" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SeaLLMs/SeaLLM-7B-v2.5",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use SeaLLMs/SeaLLM-7B-v2.5 with Docker Model Runner:
docker model run hf.co/SeaLLMs/SeaLLM-7B-v2.5
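The vLLM and SGLang servers above both expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. Below is a minimal standard-library sketch, assuming a vLLM server on `localhost:8000` (pass `http://localhost:30000` as `base_url` for the SGLang examples). `build_request` and `send_request` are hypothetical helpers, and the response is assumed to follow the OpenAI chat-completions shape (`choices[0].message.content`).

```python
# Sketch: the same chat-completions request as the curl examples, built with
# the standard library. ASSUMPTION: a server is listening on base_url;
# nothing is sent until send_request() is called.
import json
import urllib.request

MODEL = "SeaLLMs/SeaLLM-7B-v2.5"


def build_request(question, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(req):
    """Send the request (needs a running server) and return the reply text."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_request("What is the capital of France?")
print(req.full_url, req.get_method())
```

Keeping request construction separate from sending makes the payload easy to inspect before a server is up; `send_request(req)` performs the actual call.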
Prompt Template with Langchain
I'm trying to build a RAG system with LangChain and ChromaDB on top of this model, following the model's prompt template, but the output is gibberish. Here's how I define the model, tokenizer, ChromaDB, and the prompt template:
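# Load Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
model_id = "/home/model/SeaLLM-7B-v2.5/"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # device_map applies to the model, not the tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')
# ChromaDB (`pages` holds the documents loaded earlier; not shown)
db = Chroma.from_documents(pages, HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"), persist_directory='/home/playground/Triton/chromadb/')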
prompt_template = """
<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan. Anda diberikan
konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}<eos>
<|im_start|>assistant
ANSWER:"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
# ['<bos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '.', '▁Anda', '▁diberikan', '\n', 'kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']
I suspect something is wrong with my prompt template because I'm using LangChain, but I can't find what it is. Any help is really appreciated. Thanks for your hard work.
@Hanifahreza There should be no \n at the beginning, but I don't think that is the issue.
Can you format your full LangChain prompt into one complete prompt string and run the model directly with model.generate(**inputs, do_sample=True, temperature=0.7) to see whether it works normally?
Note that if you have set a repetition penalty, it must be set to 1.
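Following this suggestion, a complete prompt can be assembled by hand and passed straight to `model.generate`. The sketch below assumes the turn format used in the template in this thread: each turn is `<|im_start|>{role}\n{content}<eos>\n`, there is no leading newline, and the prompt ends with an open `<|im_start|>assistant\n` turn. `build_seallm_prompt` and `GEN_KWARGS` are hypothetical names, and the format should be checked against the model's own chat template (for example by comparing with `tokenizer.apply_chat_template` output).

```python
# Minimal sketch: assemble a SeaLLM-style chat prompt by hand.
# ASSUMPTION: each turn is "<|im_start|>{role}\n{content}<eos>\n", as in the
# template in this thread; verify against the model's own chat template.


def build_seallm_prompt(messages, add_generation_prompt=True):
    """Turn a list of {"role", "content"} dicts into one prompt string."""
    parts = []
    for msg in messages:
        # No leading newline: the string starts directly with <|im_start|>.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<eos>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model replies instead of continuing the user turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Sampling settings from the reply; repetition_penalty=1.0 means no penalty.
GEN_KWARGS = {"do_sample": True, "temperature": 0.7, "repetition_penalty": 1.0}

messages = [
    {"role": "system", "content": "Anda adalah sistem asisten."},
    {"role": "user", "content": "berapa net sales apple?"},
]
full_prompt = build_seallm_prompt(messages)
print(full_prompt)
```

With the model and tokenizer loaded, the string could then be run as `inputs = tokenizer(full_prompt, return_tensors="pt").to(model.device)` followed by `model.generate(**inputs, max_new_tokens=100, **GEN_KWARGS)`. The tokenizer adds `<bos>` itself (as the token dumps in this thread show), so the string does not include it.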
OK, so I tried crafting the LangChain prompt without the extra '\n', like this:
prompt_template = """<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia.
Anda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}
"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
#['<bos>', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '▁yang', '▁harus', '▁di', 'jawab', '▁dalam', '▁Bahasa', '▁Indonesia', '.', '▁', '\n', 'Anda', '▁diberikan', '▁kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']
Then I filled in a dummy context and an obvious question and fed the prompt to the model directly, like this:
inputs = {
    "context": 'net sales apple adalah 3 juta rupiah',
    "question": 'berapa net sales apple?'
}
full_prompt = prompt_template.format(**inputs)
generated_output = model.generate(input_ids=tokenizer.encode(full_prompt, return_tensors="pt"), max_length=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(generated_output[0], skip_special_tokens=True))
The result of that print is:
'<|im_start|>system\nAnda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. \nAnda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:\nCONTEXT: net sales apple adalah 3 juta rupiah\n<|im_start|>user\nQUESTION: berapa net sales apple?\nANSWER: Net sales Apple adalah 3 juta rupiah.'
So the model does work: it gives the correct result after ANSWER. After some investigation, I think I found the culprit behind the gibberish here:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

db = Chroma.from_documents(pages, HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"), persist_directory='/home/playground/Triton/chromadb/')
retriever = db.as_retriever()
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=4,
    return_messages=True, input_key='question', output_key='answer')
# `llm` (not defined in this snippet) is presumably a LangChain wrapper around the model loaded above
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_generated_question=True
)
question = "berapa net sales Apple?"
bot_result = qa({"question": question})
print(bot_result['generated_question'])
# 128011280112801128011280112801128011280112801128011280…
print(bot_result['answer'])
# 128011280112801128011280112801128011280112801128011280…
So I guess something goes wrong when the question is generated from the prompt template after the context and question are passed to it, but I don't understand what.
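Independently of the LangChain issue, both templates in this thread can be checked mechanically against the turn format they imitate. The sketch below is a hypothetical helper (`template_problems` is not part of LangChain or Transformers) that assumes turns look like `<|im_start|>{role}\n{content}<eos>\n` and that a prompt should end with an open `<|im_start|>assistant\n` turn. On the first template it flags the leading newline mentioned in the reply; on the second it reports the missing assistant turn and the missing `<eos>` after the question, which is consistent with the model writing its own `ANSWER:` line in the output above.

```python
# Hypothetical checker for hand-written SeaLLM-style templates.
# ASSUMPTION: turns look like "<|im_start|>{role}\n{content}<eos>\n" and the
# prompt ends with an open "<|im_start|>assistant\n" turn.
TURN = "<|im_start|>"


def template_problems(template):
    """Return a list of human-readable deviations from the assumed format."""
    problems = []
    if template[:1].isspace():
        problems.append("leading whitespace before the first turn")
    # Everything before the first marker is ignored; each remaining chunk is one turn.
    turns = template.split(TURN)[1:]
    if not turns:
        return problems + ["no <|im_start|> turns found"]
    if turns[-1].startswith("assistant\n"):
        to_check = turns[:-1]  # the final assistant turn is intentionally left open
    else:
        to_check = turns
        problems.append("prompt does not end with an open assistant turn")
    for chunk in to_check:
        role = chunk.split("\n", 1)[0]
        if not chunk.rstrip().endswith("<eos>"):
            problems.append(f"{role} turn does not end with <eos>")
    return problems


# The two templates from this thread, with the system text shortened.
template_v1 = (
    "\n<|im_start|>system\nAnda adalah sistem asisten.\nCONTEXT: {context}<eos>\n"
    "<|im_start|>user\nQUESTION: {question}<eos>\n"
    "<|im_start|>assistant\nANSWER:"
)
template_v2 = (
    "<|im_start|>system\nAnda adalah sistem asisten.\nCONTEXT: {context}<eos>\n"
    "<|im_start|>user\nQUESTION: {question}\n"
)
print(template_problems(template_v1))
# ['leading whitespace before the first turn']
print(template_problems(template_v2))
# ['prompt does not end with an open assistant turn', 'user turn does not end with <eos>']
```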
@Hanifahreza I remember this case. When you pass in llm=llm, LangChain does not follow the chat format; it injects the prompt/instruction directly as plain text, which causes the model to fail to follow the instruction. You will need to find a workaround for that.
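One possible workaround, following this explanation, is to make sure every prompt that reaches the model is chat-formatted, not only the custom answer prompt. The sketch assumes the same `<|im_start|>{role}\n{content}<eos>\n` turn format as the thread's template. `ensure_chat_format` is a hypothetical helper, not a LangChain API; the idea is that it would be called inside a custom LLM wrapper right before tokenization, and that wiring is not shown or tested. Depending on the LangChain version, passing a chat-formatted prompt through the `condense_question_prompt` argument of `ConversationalRetrievalChain.from_llm` may be another route.

```python
# Hypothetical workaround sketch: wrap any prompt that arrives as plain text
# into the chat format before it reaches the model.
# ASSUMPTION: turn format "<|im_start|>{role}\n{content}<eos>\n";
# ensure_chat_format is illustrative, not a LangChain API.
TURN = "<|im_start|>"


def ensure_chat_format(prompt, system=None):
    """Return prompt unchanged if already chat-formatted, else wrap it."""
    if prompt.lstrip().startswith(TURN):
        # Already formatted (e.g. the custom answer prompt): only drop leading whitespace.
        return prompt.lstrip()
    parts = []
    if system is not None:
        parts.append(f"{TURN}system\n{system}<eos>\n")
    # Treat the whole plain-text prompt as a single user turn.
    parts.append(f"{TURN}user\n{prompt.strip()}<eos>\n")
    # Open the assistant turn so generation starts as the assistant's reply.
    parts.append(f"{TURN}assistant\n")
    return "".join(parts)


# Illustrative plain-text prompt (hypothetical wording, not LangChain's default).
plain = "Rephrase the follow-up question as a standalone question.\nFollow Up Input: berapa net sales Apple?"
print(ensure_chat_format(plain))
```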
Did you find a solution for this problem? I have the same issue.
Unfortunately not.