How to use vilm/vinallama-12.5b-chat-DUS with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="vilm/vinallama-12.5b-chat-DUS")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("vilm/vinallama-12.5b-chat-DUS")
model = AutoModelForCausalLM.from_pretrained("vilm/vinallama-12.5b-chat-DUS")
How to use vilm/vinallama-12.5b-chat-DUS with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vilm/vinallama-12.5b-chat-DUS"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "vilm/vinallama-12.5b-chat-DUS",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
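The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response layout (`choices[0]["text"]`) is assumed to follow the OpenAI completions format that the server advertises compatibility with.

```python
import json
import urllib.request

MODEL_ID = "vilm/vinallama-12.5b-chat-DUS"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body with a `choices` list.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` would return the continuation; pass `base_url="http://localhost:30000"` to target a server listening on port 30000 instead.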
How to use vilm/vinallama-12.5b-chat-DUS with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "vilm/vinallama-12.5b-chat-DUS" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "vilm/vinallama-12.5b-chat-DUS",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "vilm/vinallama-12.5b-chat-DUS" \
--host 0.0.0.0 \
--port 30000
How to use vilm/vinallama-12.5b-chat-DUS with Docker Model Runner:
docker model run hf.co/vilm/vinallama-12.5b-chat-DUS
Read our Paper
Prompt Format (ChatML):
<|im_start|>system
Bạn là một trợ lí AI hữu ích. Hãy trả lời người dùng một cách chính xác.
<|im_end|>
<|im_start|>user
Hello world!<|im_end|>
<|im_start|>assistant
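
The template above can be assembled programmatically. This is a minimal sketch, not an official helper: `build_chatml_prompt` is a hypothetical name, and the trailing newline after the assistant tag is an assumption. The default system prompt is the Vietnamese one shown above (roughly: "You are a helpful AI assistant. Answer the user accurately.").

```python
DEFAULT_SYSTEM_PROMPT = (
    "Bạn là một trợ lí AI hữu ích. Hãy trả lời người dùng một cách chính xác."
)


def build_chatml_prompt(user_message, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Format a single-turn ChatML prompt matching the layout shown above.

    Hypothetical helper; the prompt ends with an open assistant turn so the
    model generates the reply.
    """
    return (
        "<|im_start|>system\n"
        f"{system_prompt}\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"  # trailing newline is an assumption
    )


prompt = build_chatml_prompt("Hello world!")
```

The resulting string can be passed as the `prompt` of a text-generation call.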
The following table is copied from VBD-LLaMA2, with results for VinaLLaMA-12.5B-chat-DUS added.
| Model | Model size | arc_vi (acc) | hellaswag_vi (acc) | mmlu_vi (acc) | truthfulqa_vi (acc) | Average |
|---|---|---|---|---|---|---|
| URA-LLaMA-13B | 13B | 0.3752 | 0.4830 | 0.3973 | 0.4574 | 0.4282 |
| BLOOMZ-7B | 7B | 0.3205 | 0.4930 | 0.3975 | 0.4523 | 0.4158 |
| PhoGPT-7B5-Instruct | 7B | 0.2470 | 0.2578 | 0.2413 | 0.4759 | 0.3055 |
| SeaLLM-7B-chat | 7B | 0.3607 | 0.5112 | 0.3339 | 0.4948 | 0.4252 |
| Vietcuna-7b-v3 | 7B | 0.3419 | 0.4939 | 0.3354 | 0.4807 | 0.4130 |
| VinaLLaMA-2.7B-chat | 2.7B | 0.3273 | 0.4814 | 0.3051 | 0.4972 | 0.4028 |
| VinaLLaMA-7B-chat | 7B | 0.4239 | 0.5407 | 0.3932 | 0.5251 | 0.4707 |
| VBD-LLaMA2-7B-50b | 7B | 0.3222 | 0.5195 | 0.2964 | 0.4614 | 0.3999 |
| VBD-LLaMA2-7B-50b-Chat | 7B | 0.3585 | 0.5207 | 0.3444 | 0.5179 | 0.4354 |
| VinaLLaMA-12.5B-chat-DUS | 12.5B | 0.4325 | 0.5816 | 0.3875 | 0.5850 | 0.4967 |
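
The Average column appears to be the unweighted mean of the four task accuracies. The sketch below re-derives it from the table values as a sanity check; the names `ROWS` and `max_average_gap` are introduced here for illustration only, and the tolerance allows for rounding to four decimals.

```python
from statistics import mean

# (arc_vi, hellaswag_vi, mmlu_vi, truthfulqa_vi), reported Average
ROWS = {
    "URA-LLaMA-13B": ((0.3752, 0.4830, 0.3973, 0.4574), 0.4282),
    "BLOOMZ-7B": ((0.3205, 0.4930, 0.3975, 0.4523), 0.4158),
    "PhoGPT-7B5-Instruct": ((0.2470, 0.2578, 0.2413, 0.4759), 0.3055),
    "SeaLLM-7B-chat": ((0.3607, 0.5112, 0.3339, 0.4948), 0.4252),
    "Vietcuna-7b-v3": ((0.3419, 0.4939, 0.3354, 0.4807), 0.4130),
    "VinaLLaMA-2.7B-chat": ((0.3273, 0.4814, 0.3051, 0.4972), 0.4028),
    "VinaLLaMA-7B-chat": ((0.4239, 0.5407, 0.3932, 0.5251), 0.4707),
    "VBD-LLaMA2-7B-50b": ((0.3222, 0.5195, 0.2964, 0.4614), 0.3999),
    "VBD-LLaMA2-7B-50b-Chat": ((0.3585, 0.5207, 0.3444, 0.5179), 0.4354),
    "VinaLLaMA-12.5B-chat-DUS": ((0.4325, 0.5816, 0.3875, 0.5850), 0.4967),
}


def max_average_gap(rows):
    """Largest absolute gap between the recomputed mean and the reported Average."""
    return max(abs(mean(scores) - reported) for scores, reported in rows.values())


# Model with the highest reported Average
best_model = max(ROWS, key=lambda name: ROWS[name][1])
```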
This model is a passthrough merge of layer slices from vilm/vinallama-7b-chat, made with LazyMergekit using the following configuration:
slices:
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [0, 16]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [8, 16]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [8, 16]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [16, 24]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [16, 24]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [24, 28]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [24, 28]
  - sources:
      - model: vilm/vinallama-7b-chat
        layer_range: [28, 32]
merge_method: passthrough
dtype: bfloat16
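
To see how deep the merged model is, the slices can be tallied by hand. The sketch below assumes each `layer_range` is half-open (`[start, end)`), which is how the ranges line up with a 32-layer source model; `count_layers` and `layer_usage` are hypothetical helper names, not part of any merge tool.

```python
from collections import Counter

# layer_range values copied from the merge configuration, in order
LAYER_RANGES = [
    (0, 16), (8, 16), (8, 16),
    (16, 24), (16, 24),
    (24, 28), (24, 28),
    (28, 32),
]


def count_layers(ranges):
    """Total number of layers in the merged stack (half-open ranges assumed)."""
    return sum(end - start for start, end in ranges)


def layer_usage(ranges):
    """How many times each source layer index appears in the merged stack."""
    usage = Counter()
    for start, end in ranges:
        usage.update(range(start, end))
    return usage


total_layers = count_layers(LAYER_RANGES)
usage = layer_usage(LAYER_RANGES)
```

Under that assumption the merged stack has 60 layers built from 32 source layers: layers 8 to 15 appear three times, layers 16 to 27 twice, and the first and last blocks once.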