Prompt Template with Langchain

by Hanifahreza - opened Apr 25, 2024

Discussion

Hanifahreza

Apr 25, 2024

I'm trying to make an LLM-RAG system using Langchain and ChromaDB by imitating the given prompt template with this model, but the output is gibberish. Here's how I define the model, tokenizer, ChromaDB, and the prompt template:

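# Imports assumed; the original post did not show them.
from transformers import AutoModelForCausalLM, AutoTokenizer
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.prompts import PromptTemplate

# Load Model
model_id = "/home/model/SeaLLM-7B-v2.5/"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # device_map has no effect on a tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

# ChromaDB (`pages` is the list of documents loaded earlier; not shown in the post)
db = Chroma.from_documents(pages, HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"), persist_directory='/home/playground/Triton/chromadb/')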

prompt_template = """
<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan. Anda diberikan
konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}<eos>
<|im_start|>assistant
ANSWER:"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
# ['<bos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '.', '▁Anda', '▁diberikan', '\n', 'kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']

I suspect there's something wrong with my prompt template because I use Langchain but I can't find what. Any help is really appreciated. Thanks for your hard work.

nxphi47

SeaLLMs - Language Models for Southeast Asian Languages org Apr 26, 2024

@Hanifahreza There should be no \n at the beginning, but I don't think that is the issue.

Can you craft your full langchain prompt into a complete prompt and run the model with model.generate(**inputs, do_sample=True, temperature=0.7) to see if it works normally?

Note that if you've set a repetition penalty, you must set it to 1.
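A minimal sketch of what "a complete prompt" could look like, assuming the turn layout used in the template earlier in this thread (`<|im_start|>role`, content, `<eos>`). The helper name `build_chat_prompt` and the `generation_kwargs` dict are hypothetical, and the layout is copied from the post rather than read from the tokenizer's own chat template, so treat it as an illustration only.

```python
# Sketch only: turn markers are assumed from the template posted in this thread.
TURN_START = "<|im_start|>"
TURN_END = "<eos>"


def build_chat_prompt(messages, add_generation_prompt=True):
    """Join role/content dicts into one prompt string with no leading newline."""
    parts = []
    for message in messages:
        parts.append(f"{TURN_START}{message['role']}\n{message['content']}{TURN_END}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the answer.
        parts.append(f"{TURN_START}assistant\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": "Anda adalah sistem asisten.\nCONTEXT: net sales apple adalah 3 juta rupiah",
    },
    {"role": "user", "content": "QUESTION: berapa net sales apple?"},
]
full_prompt = build_chat_prompt(messages)
print(full_prompt)

# Sampling settings suggested above; a repetition penalty of 1.0 disables the penalty.
generation_kwargs = {"do_sample": True, "temperature": 0.7, "repetition_penalty": 1.0}
```

The resulting string can be tokenized and passed to model.generate together with the keyword arguments in `generation_kwargs`.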

Hanifahreza

Apr 26, 2024

OK, so I tried crafting the Langchain prompt without the '\n' that followed the <bos> token, like this:

prompt_template = """<|im_start|>system
Anda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. 
Anda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:
CONTEXT: {context}<eos>
<|im_start|>user
QUESTION: {question}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt_template)))
#['<bos>', '<', '|', 'im', '_', 'start', '|>', 'system', '\n', 'Anda', '▁adalah', '▁sistem', '▁asisten', '.', '▁Anda', '▁akan', '▁diberikan', '▁sebuah', '▁pertanyaan', '▁yang', '▁harus', '▁di', 'jawab', '▁dalam', '▁Bahasa', '▁Indonesia', '.', '▁', '\n', 'Anda', '▁diberikan', '▁kon', 'teks', '▁berikut', '▁untuk', '▁membantu', '▁menjawab', '▁pertanyaan', '▁tersebut', ':', '\n', 'CONTEXT', ':', '▁{', 'context', '}', '<eos>', '\n', '<', '|', 'im', '_', 'start', '|>', 'user', '\n', 'QUESTION', ':', '▁{', 'question', '}', '\n']

Then I filled the prompt with a dummy context and a question whose answer is obvious from that context, and fed it to the model directly like this:

inputs = {
    "context": 'net sales apple adalah 3 juta rupiah',
    "question": 'berapa net sales apple?'
}

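full_prompt = prompt_template.format(**inputs)
# Move the input ids to the model's device, since the model was loaded with device_map='auto'
input_ids = tokenizer.encode(full_prompt, return_tensors="pt").to(model.device)
generated_output = model.generate(input_ids=input_ids, max_length=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(generated_output[0], skip_special_tokens=True))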

The result of that print is:

'<|im_start|>system\nAnda adalah sistem asisten. Anda akan diberikan sebuah pertanyaan yang harus dijawab dalam Bahasa Indonesia. \nAnda diberikan konteks berikut untuk membantu menjawab pertanyaan tersebut:\nCONTEXT: net sales apple adalah 3 juta rupiah\n<|im_start|>user\nQUESTION: berapa net sales apple?\nANSWER: Net sales Apple adalah 3 juta rupiah.'

It seems the model does work: it gives the correct result in the ANSWER. After some investigation, I think I found the culprit behind the gibberish here:

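# Imports assumed; the original post did not show them.
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

db = Chroma.from_documents(pages, HuggingFaceEmbeddings(model_name="/home/model/all-MiniLM-L6-v2/"), persist_directory='/home/playground/Triton/chromadb/')
retriever = db.as_retriever()
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=4,
    return_messages=True, input_key='question', output_key='answer')

# `llm` is the Langchain LLM wrapper around the model; its construction was not shown in the post.
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_generated_question=True,
)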

question = "berapa net sales Apple?"
bot_result = qa({"question": question})

print(bot_result['generated_question'])
# 128011280112801128011280112801128011280112801128011280…
print(bot_result['answer'])
# 128011280112801128011280112801128011280112801128011280…

So I guess something goes wrong when the question is generated from the prompt template after the context and question are passed to it, but I don't understand what.

nxphi47

SeaLLMs - Language Models for Southeast Asian Languages org Apr 26, 2024

@Hanifahreza I remember this case. When you pass in llm=llm, the chain doesn't follow the chat format; it injects the prompt/instruction directly as plain text, which causes the model to fail to follow the instruction. You will need to work out how to make the chain apply the chat format.
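One possible direction, offered as an untested sketch rather than a confirmed fix: wrap every prompt the chain sends to the model in the chat format, including the question-rewriting step that produces `generated_question`. The helper `chat_wrap`, the template names, and the English rewriting instruction are hypothetical, and the turn markers are assumed from the template earlier in this thread.

```python
# Sketch only: hypothetical helper; turn markers assumed from the template in this thread.
def chat_wrap(system_text, user_text):
    """Wrap instruction text in system/user turns and leave the assistant turn open."""
    return (
        f"<|im_start|>system\n{system_text}<eos>\n"
        f"<|im_start|>user\n{user_text}<eos>\n"
        "<|im_start|>assistant\n"
    )


# Template for the question-rewriting step (placeholders: chat_history, question).
condense_template = chat_wrap(
    "Rewrite the follow-up question as a standalone question.",
    "CHAT HISTORY:\n{chat_history}\nFOLLOW-UP QUESTION: {question}",
)

# Template for the answering step (placeholders: context, question).
answer_template = chat_wrap(
    "Anda adalah sistem asisten. Anda diberikan konteks berikut untuk membantu "
    "menjawab pertanyaan tersebut:\nCONTEXT: {context}",
    "QUESTION: {question}",
)

# Fill the placeholders to inspect what the model would actually receive.
filled = condense_template.format(
    chat_history="(empty)", question="berapa net sales Apple?"
)
print(filled)
```

In Langchain, strings like these could be wrapped in PromptTemplate objects and passed to ConversationalRetrievalChain.from_llm through its condense_question_prompt argument and through combine_docs_chain_kwargs={"prompt": ...}. Whether that removes the gibberish output was not verified in this thread.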

dinsky

Jul 4, 2024

Did you find a solution to this problem? I have the same issue.

Hanifahreza

Jul 5, 2024

Did you find a solution to this problem? I have the same issue.

Unfortunately not.
