Instructions for using vilm/vinallama-12.5b-chat-DUS with libraries and local apps.

Libraries

How to use vilm/vinallama-12.5b-chat-DUS with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="vilm/vinallama-12.5b-chat-DUS")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vilm/vinallama-12.5b-chat-DUS")
model = AutoModelForCausalLM.from_pretrained("vilm/vinallama-12.5b-chat-DUS")

Local Apps

vLLM

How to use vilm/vinallama-12.5b-chat-DUS with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vilm/vinallama-12.5b-chat-DUS"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vilm/vinallama-12.5b-chat-DUS",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
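
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM, and the response shape (`choices[0].text`) is assumed to follow the OpenAI completions format that the server advertises.

```python
import json
import urllib.request

MODEL_ID = "vilm/vinallama-12.5b-chat-DUS"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.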

SGLang

How to use vilm/vinallama-12.5b-chat-DUS with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vilm/vinallama-12.5b-chat-DUS" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vilm/vinallama-12.5b-chat-DUS",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vilm/vinallama-12.5b-chat-DUS" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip section above.

Docker Model Runner
How to use vilm/vinallama-12.5b-chat-DUS with Docker Model Runner:
```
docker model run hf.co/vilm/vinallama-12.5b-chat-DUS
```
To use this model in llama.cpp, Ollama, LM Studio, or any compatible app, browse its quantizations.

VinaLLaMA - State-of-the-art Vietnamese LLMs

Read our paper: VinaLLaMA: LLaMA-based Vietnamese Foundation Model.

Prompt Format (ChatML):

<|im_start|>system
Bạn là một trợ lí AI hữu ích. Hãy trả lời người dùng một cách chính xác.
<|im_end|>
<|im_start|>user
Hello world!<|im_end|>
<|im_start|>assistant
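
A minimal sketch of assembling this prompt in Python is shown below. The function name `build_chatml_prompt` is hypothetical, and the trailing newline after the assistant tag is an assumption; the line layout otherwise copies the template above exactly, including the system block's `<|im_end|>` on its own line. The default system prompt is kept in Vietnamese as given; in English it reads roughly "You are a helpful AI assistant. Answer the user accurately."

```python
# System prompt copied verbatim from the template above.
DEFAULT_SYSTEM_PROMPT = (
    "Bạn là một trợ lí AI hữu ích. "
    "Hãy trả lời người dùng một cách chính xác."
)


def build_chatml_prompt(user_message, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a single-turn ChatML prompt ending at the assistant tag."""
    return (
        f"<|im_start|>system\n{system_prompt}\n<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"  # trailing newline is an assumption
    )


prompt = build_chatml_prompt("Hello world!")
```

The resulting string can be passed as the `prompt` field of a completions request or tokenized directly for generation.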

Evaluation

This table is copied from VBD-Llama2 and extended with results for VinaLLaMA-12.5B-chat-DUS.

| Model | Model size | arc_vi (acc) | hellaswag_vi (acc) | mmlu_vi (acc) | truthfulqa_vi (acc) | Average |
|---|---|---|---|---|---|---|
| URA-LLaMA-13B | 13B | 0.3752 | 0.4830 | 0.3973 | 0.4574 | 0.4282 |
| BLOOMZ-7B | 7B | 0.3205 | 0.4930 | 0.3975 | 0.4523 | 0.4158 |
| PhoGPT-7B5-Instruct | 7B | 0.2470 | 0.2578 | 0.2413 | 0.4759 | 0.3055 |
| SeaLLM-7B-chat | 7B | 0.3607 | 0.5112 | 0.3339 | 0.4948 | 0.4252 |
| Vietcuna-7b-v3 | 7B | 0.3419 | 0.4939 | 0.3354 | 0.4807 | 0.4130 |
| VinaLLaMA-2.7B-chat | 2.7B | 0.3273 | 0.4814 | 0.3051 | 0.4972 | 0.4028 |
| VinaLLaMA-7B-chat | 7B | 0.4239 | 0.5407 | 0.3932 | 0.5251 | 0.4707 |
| VBD-LLaMA2-7B-50b | 7B | 0.3222 | 0.5195 | 0.2964 | 0.4614 | 0.3999 |
| VBD-LLaMA2-7B-50b-Chat | 7B | 0.3585 | 0.5207 | 0.3444 | 0.5179 | 0.4354 |
| VinaLLaMA-12.5B-chat-DUS | 12.5B | 0.4325 | 0.5816 | 0.3875 | 0.5850 | 0.4967 |
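
The Average column appears to be the unweighted mean of the four task accuracies. The sketch below checks that reading against the rows of the table; the mean-of-four interpretation is an assumption, not something the table states, and a small tolerance absorbs the four-decimal rounding.

```python
# (arc_vi, hellaswag_vi, mmlu_vi, truthfulqa_vi, reported average) per model,
# copied from the table above.
RESULTS = {
    "URA-LLaMA-13B": (0.3752, 0.4830, 0.3973, 0.4574, 0.4282),
    "BLOOMZ-7B": (0.3205, 0.4930, 0.3975, 0.4523, 0.4158),
    "PhoGPT-7B5-Instruct": (0.2470, 0.2578, 0.2413, 0.4759, 0.3055),
    "SeaLLM-7B-chat": (0.3607, 0.5112, 0.3339, 0.4948, 0.4252),
    "Vietcuna-7b-v3": (0.3419, 0.4939, 0.3354, 0.4807, 0.4130),
    "VinaLLaMA-2.7B-chat": (0.3273, 0.4814, 0.3051, 0.4972, 0.4028),
    "VinaLLaMA-7B-chat": (0.4239, 0.5407, 0.3932, 0.5251, 0.4707),
    "VBD-LLaMA2-7B-50b": (0.3222, 0.5195, 0.2964, 0.4614, 0.3999),
    "VBD-LLaMA2-7B-50b-Chat": (0.3585, 0.5207, 0.3444, 0.5179, 0.4354),
    "VinaLLaMA-12.5B-chat-DUS": (0.4325, 0.5816, 0.3875, 0.5850, 0.4967),
}

TOLERANCE = 1e-4  # reported averages are rounded to four decimals


def mean_accuracy(row):
    """Unweighted mean of the four task accuracies (ignores the last field)."""
    return sum(row[:4]) / 4


# Models whose reported average disagrees with the recomputed mean.
mismatches = [
    name for name, row in RESULTS.items()
    if abs(mean_accuracy(row) - row[4]) >= TOLERANCE
]

# Model with the highest reported average.
best_model = max(RESULTS, key=lambda name: RESULTS[name][4])
```

An empty `mismatches` list means every reported average agrees with the recomputed mean to within rounding.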

Merging Methods

This model is a merge of layer slices from the following model, made with LazyMergekit:

vilm/vinallama-7b-chat

🧩 Configuration

slices:
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [0, 16]
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [8, 16]
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [8, 16]      
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [16, 24]
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [16, 24]
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [24, 28]
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [24, 28]
  - sources:
    - model: vilm/vinallama-7b-chat
      layer_range: [28, 32]
merge_method: passthrough
dtype: bfloat16
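
To see how deep the merged model is, the slices can be expanded into a flat list of source-layer indices. This is a minimal sketch that assumes each `layer_range` is half-open (`[start, end)`), which is how mergekit ranges are generally read; the helper name `merged_layer_indices` is illustrative and not part of mergekit.

```python
from collections import Counter

# Layer ranges copied from the configuration above, in stacking order.
LAYER_RANGES = [
    (0, 16), (8, 16), (8, 16),
    (16, 24), (16, 24),
    (24, 28), (24, 28),
    (28, 32),
]


def merged_layer_indices(ranges):
    """Expand half-open (start, end) ranges into the stacked source-layer indices."""
    indices = []
    for start, end in ranges:
        indices.extend(range(start, end))
    return indices


layers = merged_layer_indices(LAYER_RANGES)
total_layers = len(layers)                            # depth of the merged model
highest_source_layer = max(end for _, end in LAYER_RANGES)
repeat_counts = Counter(layers)                       # copies of each source layer
```

Under that assumption the merge stacks 60 layers drawn from source layers 0 to 31: layers 8 to 15 appear three times, layers 16 to 27 twice, and the rest once.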

Safetensors

Model size: 13B params
Tensor type: BF16

Paper for vilm/vinallama-12.5b-chat-DUS

VinaLLaMA: LLaMA-based Vietnamese Foundation Model (paper 2312.11011, published Dec 18, 2023)