QuantFactory/openthaigpt1.5-7b-instruct-GGUF
This is a quantized version of openthaigpt/openthaigpt1.5-7b-instruct, created using llama.cpp.
Original Model Card
🇹🇭 OpenThaiGPT 7b 1.5 Instruct
🇹🇭 OpenThaiGPT 7b Version 1.5 is an advanced 7-billion-parameter Thai language chat model based on Qwen v2.5, released on September 30, 2024. It has been specifically fine-tuned on over 2,000,000 Thai instruction pairs and is capable of answering Thai-specific domain questions.
Online Demo:
Example code for API Calling
https://github.com/OpenThaiGPT/openthaigpt1.5_api_examples
Highlights
- State-of-the-art Thai language LLM, achieving the highest average scores across various Thai language exams compared to other open-source Thai LLMs.
- Multi-turn conversation support for extended dialogues.
- Retrieval Augmented Generation (RAG) compatibility for enhanced response generation.
- Impressive context handling: Processes up to 131,072 tokens of input and generates up to 8,192 tokens, enabling detailed and complex interactions.
- Tool calling support: Enables users to efficiently call various functions through intelligent responses.
Benchmark on OpenThaiGPT Eval
** Please take a look at openthaigpt/openthaigpt1.5-7b-instruct for this model's evaluation result.
| Exam names | scb10x/llama-3-typhoon-v1.5x-8b-instruct | meta-llama/Llama-3.1-8B-Instruct | Qwen/Qwen2.5-7B-Instruct_stat | openthaigpt/openthaigpt1.5-7b |
|---|---|---|---|---|
| 01_a_level | 46.67% | 47.50% | 58.33% | 60.00% |
| 02_tgat | 32.00% | 36.00% | 32.00% | 36.00% |
| 03_tpat1 | 52.50% | 55.00% | 57.50% | 57.50% |
| 04_investment_consult | 56.00% | 48.00% | 68.00% | 76.00% |
| 05_facebook_beleble_th_200 | 78.00% | 73.00% | 79.00% | 81.00% |
| 06_xcopa_th_200 | 79.50% | 69.00% | 80.50% | 81.00% |
| 07_xnli2.0_th_200 | 56.50% | 55.00% | 53.00% | 54.50% |
| 08_onet_m3_thai | 48.00% | 32.00% | 72.00% | 64.00% |
| 09_onet_m3_social | 75.00% | 50.00% | 90.00% | 80.00% |
| 10_onet_m3_math | 25.00% | 18.75% | 31.25% | 31.25% |
| 11_onet_m3_science | 46.15% | 42.31% | 46.15% | 46.15% |
| 12_onet_m3_english | 70.00% | 76.67% | 86.67% | 83.33% |
| 13_onet_m6_thai | 47.69% | 29.23% | 46.15% | 53.85% |
| 14_onet_m6_math | 29.41% | 17.65% | 29.41% | 29.41% |
| 15_onet_m6_social | 50.91% | 43.64% | 56.36% | 58.18% |
| 16_onet_m6_science | 42.86% | 32.14% | 57.14% | 57.14% |
| 17_onet_m6_english | 65.38% | 71.15% | 78.85% | 80.77% |
| Micro Average | 60.65% | 55.60% | 64.41% | 65.78% |
Thai-language multiple-choice exams, evaluated zero-shot on an unseen test set. Benchmark source code and exam information: https://github.com/OpenThaiGPT/openthaigpt_eval
(Updated on: 30 September 2024)
Benchmark on scb10x/thai_exam
| Models | Thai Exam (Acc) |
|---|---|
| api/claude-3-5-sonnet-20240620 | 69.2 |
| openthaigpt/openthaigpt1.5-72b-instruct* | 64.07 |
| api/gpt-4o-2024-05-13 | 63.89 |
| hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 | 63.54 |
| Qwen/Qwen2-72B-Instruct | 58.23 |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 58.23 |
| scb10x/llama-3-typhoon-v1.5x-70b-instruct | 58.76 |
| Qwen/Qwen2.5-14B-Instruct | 57.35 |
| api/gpt-4o-mini-2024-07-18 | 54.51 |
| openthaigpt/openthaigpt1.5-7b-instruct* | 52.04 |
| SeaLLMs/SeaLLMs-v3-7B-Chat | 51.33 |
| openthaigpt/openthaigpt-1.0.0-70b-chat | 50.09 |
* Evaluated by OpenThaiGPT team using scb10x/thai_exam.
Licenses
- Built with Qwen
- Qwen License: Research and commercial use are allowed, but if your user base exceeds 100 million monthly active users, you must negotiate a separate commercial license. Please see the LICENSE file for more information.
Support
- Official website: https://openthaigpt.aieat.or.th
- Facebook page: https://web.facebook.com/groups/openthaigpt
- Discord server for discussion and support
- E-mail: kobkrit@aieat.or.th
Prompt Format
Prompt format is based on ChatML.
<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
System prompt (Thai for "You are a smart and honest question-answering assistant"):
คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์
Examples
Single Turn Conversation Example
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\n
Single Turn Conversation with Context (RAG) Example
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nกรุงเทพมหานคร เป็นเมืองหลวง นครและมหานครที่มีประชากรมากที่สุดของประเทศไทย กรุงเทพมหานครมีพื้นที่ทั้งหมด 1,568.737 ตร.กม. มีประชากรตามทะเบียนราษฎรกว่า 8 ล้านคน\nกรุงเทพมหานครมีพื้นที่เท่าไร่<|im_end|>\n<|im_start|>assistant\n
Multi Turn Conversation Example
First turn
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\n
Second turn
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\nสวัสดีครับ ยินดีต้อนรับครับ คุณต้องการให้ฉันช่วยอะไรครับ?<|im_end|>\n<|im_start|>user\nกรุงเทพมหานคร ชื่อเต็มยาวๆคืออะไร<|im_end|>\n<|im_start|>assistant\n
Result
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\nสวัสดีครับ ยินดีต้อนรับครับ คุณต้องการให้ฉันช่วยอะไรครับ?<|im_end|>\n<|im_start|>user\nกรุงเทพมหานคร ชื่อเต็มยาวๆคืออะไร<|im_end|>\n<|im_start|>assistant\nชื่อเต็มของกรุงเทพมหานครคือ \"กรุงเทพมหานคร อมรรัตนโกสินทร์ มหินทรายุธยา มหาดิลกภพ นพรัตนราชธานีบูรีรมย์ อุดมราชนิเวศน์มหาสถาน อมรพิมานอวตารสถิต สักกะทัตติยวิษณุกรรมประสิทธิ์\"
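The prompt strings in the examples above all follow the same ChatML pattern, so they can be assembled programmatically. The following is a minimal sketch, not part of the original model card: the helper name `build_chatml_prompt` and the constant `DEFAULT_SYSTEM_PROMPT` are hypothetical, and the function emits real newline characters where the examples show the escaped `\n`.

```python
from typing import Dict, List

# Default system prompt from this model card (Thai for
# "You are a smart and honest question-answering assistant").
DEFAULT_SYSTEM_PROMPT = "คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์"


def build_chatml_prompt(messages: List[Dict[str, str]],
                        system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Render a conversation as a ChatML prompt string.

    `messages` is a list of {"role": "user" | "assistant", "content": str}
    dicts in chronological order. The returned string ends with an open
    assistant turn so that the model generates the next reply.
    """
    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>\n"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Leave the assistant turn open for generation.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Single-turn example: the user says "สวัสดีครับ" (hello).
single_turn = build_chatml_prompt([{"role": "user", "content": "สวัสดีครับ"}])
print(single_turn)
```

For a multi-turn conversation, pass the earlier user and assistant messages in order, followed by the newest user message; the system turn is always emitted first.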
How to use
Free API Service (hosted by Siam.Ai and Float16.cloud)
Siam.AI
curl https://api.aieat.or.th/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy" \
  -d '{
    "model": ".",
    "prompt": "<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nกรุงเทพมหานครคืออะไร<|im_end|>\n<|im_start|>assistant\n",
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 40,
    "stop": ["<|im_end|>"]
  }'
Float16
curl -X POST https://api.float16.cloud/dedicate/78y8fJLuzE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer float16-AG0F8yNce5s1DiXm1ujcNrTaZquEdaikLwhZBRhyZQNeS7Dv0X" \
  -d '{
    "model": "openthaigpt/openthaigpt1.5-7b-instruct",
    "messages": [
      {
        "role": "system",
        "content": "คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์"
      },
      {
        "role": "user",
        "content": "สวัสดี"
      }
    ]
  }'
OpenAI Client Library (served by vLLM; see the vLLM section below)
from openai import OpenAI  # requires openai>=1.0

# Point the OpenAI client at the local vLLM server
client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="dummy",  # vLLM doesn't require a real API key
)
prompt = "<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nกรุงเทพมหานครคืออะไร<|im_end|>\n<|im_start|>assistant\n"
try:
    response = client.completions.create(
        model=".",  # Specify the model you're using with vLLM
        prompt=prompt,
        max_tokens=512,
        temperature=0.7,
        top_p=0.8,
        stop=["<|im_end|>"],
        extra_body={"top_k": 40},  # top_k is a vLLM extension, not a standard OpenAI parameter
    )
    print("Generated Text:", response.choices[0].text)
except Exception as e:
    print("Error:", str(e))
Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openthaigpt/openthaigpt1.5-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "ประเทศไทยคืออะไร"
messages = [
    {"role": "system", "content": "คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์"},
    {"role": "user", "content": prompt}
]
# Render the messages with the model's chat template (ChatML)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens, dropping the prompt tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
vLLM
Install vLLM (https://github.com/vllm-project/vllm)
Run server
vllm serve openthaigpt/openthaigpt1.5-7b-instruct --tensor-parallel-size 4
- Note: change --tensor-parallel-size 4 to the number of available GPUs.
- Run inference (curl example)
curl -X POST 'http://127.0.0.1:8000/v1/completions' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": ".",
    "prompt": "<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nสวัสดีครับ<|im_end|>\n<|im_start|>assistant\n",
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 40,
    "stop": ["<|im_end|>"]
  }'
Processing Long Texts
The current config.json is set for a context length of up to 32,768 tokens.
To handle inputs exceeding 32,768 tokens, we use YaRN, a technique for enhancing model length extrapolation, to maintain performance on lengthy texts.
For supported frameworks, you can add the following to config.json to enable YaRN:
{
  ...
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
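The config change above can also be applied with a short script. This is a minimal sketch under stated assumptions: the helper names `with_yarn`, `enable_yarn_in_file`, and `extended_context_length` are hypothetical, and the script assumes the framework in use reads `rope_scaling` from config.json. It also shows the arithmetic behind the long-context figure: a factor of 4.0 applied to 32,768 original positions gives 131,072 tokens.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Values copied from the rope_scaling snippet above.
YARN_ROPE_SCALING: Dict[str, Any] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}


def with_yarn(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a model config dict with YaRN rope scaling added."""
    updated = dict(config)
    updated["rope_scaling"] = dict(YARN_ROPE_SCALING)
    return updated


def enable_yarn_in_file(config_path: str) -> None:
    """Rewrite a config.json in place so that it includes the YaRN block."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(json.dumps(with_yarn(config), indent=2) + "\n", encoding="utf-8")


def extended_context_length() -> int:
    """Context length implied by the scaling: factor x original length."""
    return int(
        YARN_ROPE_SCALING["factor"]
        * YARN_ROPE_SCALING["original_max_position_embeddings"]
    )


print(extended_context_length())  # 131072
```

Calling `enable_yarn_in_file` on a local copy of the model's config.json keeps all existing keys and only adds or overwrites `rope_scaling`; back up the file first if the original settings matter.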
Tool Calling
The Tool Calling feature in OpenThaiGPT 1.5 enables users to efficiently call various functions through intelligent responses. This includes making external API calls to retrieve real-time data, such as current temperature information, or predicting future data simply by submitting a query. For example, a user can ask OpenThaiGPT, "What is the current temperature in San Francisco?" and the AI will execute a pre-defined function to provide an immediate response without the need for additional coding. This feature also allows for broader applications with external data sources, including the ability to call APIs for services such as weather updates, stock market information, or data from within the user's own system.
Example:
from openai import OpenAI  # requires openai>=1.0

def get_temperature(location, date=None, unit="celsius"):
    """Get temperature for a location (current or specific date)."""
    # Stub: returns fixed values instead of calling a real weather service
    if date:
        return {"temperature": 25.9, "location": location, "date": date, "unit": unit}
    return {"temperature": 26.1, "location": location, "unit": unit}

# Tool description in the OpenAI-compatible function-calling schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get temperature for a location (current or by date).",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "date": {"type": "string", "description": "Optional date"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "อุณหภูมิที่ San Francisco วันนี้และพรุ่งนี้คือเท่าไร่?"}]
# Assumes an OpenAI-compatible server with tool calling enabled at this address
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="dummy")
response = client.chat.completions.create(
    model=".", messages=messages, tools=tools, temperature=0.7, max_tokens=512
)
print(response)
Full example: https://github.com/OpenThaiGPT/openthaigpt1.5_api_examples/blob/main/api_tool_calling_powered_by_siamai.py
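Once the model replies with a tool call, the application still has to run the named function and send the result back. The sketch below illustrates that dispatch step offline, without contacting any server. It is an assumption-laden illustration rather than the project's implementation: `TOOL_REGISTRY`, `run_tool_call`, and the `simulated_call` dict are hypothetical, and the exact fields a server expects in the returned `tool` message may differ.

```python
import json
from typing import Any, Callable, Dict, Optional


def get_temperature(location: str, date: Optional[str] = None,
                    unit: str = "celsius") -> Dict[str, Any]:
    """Stub copied from the example above; returns fixed values."""
    if date:
        return {"temperature": 25.9, "location": location, "date": date, "unit": unit}
    return {"temperature": 26.1, "location": location, "unit": unit}


# Registry mapping tool names (as advertised to the model) to local functions.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_temperature": get_temperature,
}


def run_tool_call(name: str, arguments_json: str) -> Dict[str, str]:
    """Execute one tool call and wrap the result as a `tool` message."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    result = TOOL_REGISTRY[name](**arguments)
    return {
        "role": "tool",
        "name": name,
        "content": json.dumps(result, ensure_ascii=False),
    }


# Hypothetical tool call, shaped like what an OpenAI-compatible server returns:
# a function name plus a JSON-encoded argument string.
simulated_call = {
    "name": "get_temperature",
    "arguments": '{"location": "San Francisco", "unit": "celsius"}',
}
tool_message = run_tool_call(simulated_call["name"], simulated_call["arguments"])
print(tool_message)
```

In a real exchange, the returned message would be appended to `messages` and the chat completion requested again so the model can phrase the final answer from the tool output.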
GPU Memory Requirements
| Number of Parameters | FP16 | 8-bit (Quantized) | 4-bit (Quantized) | Example Graphics Card for 4-bit |
|---|---|---|---|---|
| 7b | 24 GB | 12 GB | 6 GB | Nvidia RTX 4060 8GB |
| 13b | 48 GB | 24 GB | 12 GB | Nvidia RTX 4070 16GB |
| 72b | 192 GB | 96 GB | 48 GB | Nvidia RTX 4090 24GB x 2 cards |
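The table can be turned into a small lookup for choosing a precision that fits a given GPU memory budget. This is only a sketch: the figures are copied from the table above as approximate requirements, and the names `MEMORY_GB` and `best_precision` are hypothetical. Actual usage also depends on context length and runtime overhead.

```python
from typing import Optional

# Approximate GPU memory (GB) copied from the table above.
MEMORY_GB = {
    "7b": {"fp16": 24, "8bit": 12, "4bit": 6},
    "13b": {"fp16": 48, "8bit": 24, "4bit": 12},
    "72b": {"fp16": 192, "8bit": 96, "4bit": 48},
}


def best_precision(model_size: str, available_gb: float) -> Optional[str]:
    """Return the highest precision whose listed requirement fits in
    `available_gb`, or None if even 4-bit does not fit."""
    for precision in ("fp16", "8bit", "4bit"):  # highest precision first
        if MEMORY_GB[model_size][precision] <= available_gb:
            return precision
    return None


print(best_precision("7b", 8))    # 4bit
print(best_precision("7b", 24))   # fp16
print(best_precision("72b", 24))  # None
```

For example, an 8 GB card is matched to the 4-bit variant of the 7b model, which agrees with the example card listed in the table.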
Authors
- Sumeth Yuenyong (sumeth.yue@mahidol.edu)
- Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)
- Apivadee Piyatumrong (apivadee.piy@nectec.or.th)
- Jillaphat Jaroenkantasima (autsadang41@gmail.com)
- Thaweewat Rugsujarit (thaweewr@scg.com)
- Norapat Buppodom (new@norapat.com)
- Koravich Sangkaew (kwankoravich@gmail.com)
- Peerawat Rojratchadakorn (peerawat.roj@gmail.com)
- Surapon Nonesung (nonesungsurapon@gmail.com)
- Chanon Utupon (chanon.utupon@gmail.com)
- Sadhis Wongprayoon (sadhis.tae@gmail.com)
- Nucharee Thongthungwong (nuchhub@hotmail.com)
- Chawakorn Phiantham (mondcha1507@gmail.com)
- Patteera Triamamornwooth (patt.patteera@gmail.com)
- Nattarika Juntarapaoraya (natt.juntara@gmail.com)
- Kriangkrai Saetan (kraitan.ss21@gmail.com)
- Pitikorn Khlaisamniang (pitikorn32@gmail.com)
Disclaimer: Provided responses are not guaranteed.
