Instructions for using ilsp/Meltemi-7B-Instruct-v1-AWQ with libraries and local apps.

Libraries

How to use ilsp/Meltemi-7B-Instruct-v1-AWQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ilsp/Meltemi-7B-Instruct-v1-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ilsp/Meltemi-7B-Instruct-v1-AWQ")
model = AutoModelForCausalLM.from_pretrained("ilsp/Meltemi-7B-Instruct-v1-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ilsp/Meltemi-7B-Instruct-v1-AWQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ilsp/Meltemi-7B-Instruct-v1-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ilsp/Meltemi-7B-Instruct-v1-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
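
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the default base URL assumes the vLLM server from the previous step is listening on port 8000 (for SGLang, pass the port 30000 URL instead).

```python
import json
import urllib.request

MODEL_ID = "ilsp/Meltemi-7B-Instruct-v1-AWQ"


def build_chat_request(user_content, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers place the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the generated reply as a string.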

SGLang

How to use ilsp/Meltemi-7B-Instruct-v1-AWQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ilsp/Meltemi-7B-Instruct-v1-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ilsp/Meltemi-7B-Instruct-v1-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ilsp/Meltemi-7B-Instruct-v1-AWQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ilsp/Meltemi-7B-Instruct-v1-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ilsp/Meltemi-7B-Instruct-v1-AWQ with Docker Model Runner:
```
docker model run hf.co/ilsp/Meltemi-7B-Instruct-v1-AWQ
```

Meltemi Instruct Large Language Model for the Greek language (4-bit AWQ quantization)

We present the Meltemi-7B-Instruct-v1 Large Language Model (LLM), an instruction fine-tuned version of Meltemi-7B-v1. This quantized version was produced using AutoAWQ.

Instruction format

The prompt format is the same as the Zephyr format:

<s><|system|>
Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
<|user|>
Πες μου αν έχεις συνείδηση.</s>
<|assistant|>
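
To make the layout above concrete, here is a small sketch that assembles the same string by hand. The function name `build_zephyr_prompt` is hypothetical, and the newline after the final `<|assistant|>` tag is an assumption; in practice the tokenizer's chat template is the authoritative source for the exact format.

```python
def build_zephyr_prompt(messages, bos="<s>", eos="</s>"):
    """Render a list of {"role", "content"} dicts in the Zephyr-style layout."""
    parts = [bos]
    for message in messages:
        # Each turn is a role tag on its own line, the content, then the EOS token.
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos}\n")
    # Open the assistant turn so the model continues from here (trailing newline assumed).
    parts.append("<|assistant|>\n")
    return "".join(parts)
```

For example, passing a system message and a user message yields a string beginning with `<s><|system|>` and ending with the open `<|assistant|>` tag.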

Using the model with Hugging Face

First, install the dependencies:

pip install autoawq transformers

The quantized model can be used through the tokenizer's chat template as follows:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoAWQForCausalLM.from_quantized(
  "ilsp/Meltemi-7B-Instruct-v1-AWQ",
  fuse_layers=True,
  trust_remote_code=False,
  safetensors=True
)
tokenizer = AutoTokenizer.from_pretrained(
  "ilsp/Meltemi-7B-Instruct-v1-AWQ",
  trust_remote_code=False
)

model.to(device)

messages = [
  {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
  {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, add_special_tokens=True, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)

print(tokenizer.batch_decode(outputs)[0])
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.

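# Keep only the newly generated tokens as the assistant turn; decoding `outputs` in full would repeat the prompt.
response = tokenizer.decode(outputs[0][input_prompt.shape[-1]:], skip_special_tokens=True)
messages.extend([
  {"role": "assistant", "content": response},
  {"role": "user", "content": "Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;"}
])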


prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, add_special_tokens=True, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)

print(tokenizer.batch_decode(outputs)[0])
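
Because `batch_decode(outputs)[0]` decodes the whole sequence, the printed text contains the prompt and the special tokens as well as the reply. As a rough sketch, the newest reply can be cut out of such a string by looking for the last `<|assistant|>` tag; the helper name `extract_last_reply` is hypothetical, and it assumes the Zephyr-style markers shown in the instruction format survive decoding unchanged.

```python
def extract_last_reply(decoded, assistant_tag="<|assistant|>", eos="</s>"):
    """Return the text of the final assistant turn in a decoded conversation."""
    # Everything after the last assistant tag belongs to the newest reply.
    _, found, tail = decoded.rpartition(assistant_tag)
    if not found:
        # No tag present: assume the string is already a bare reply.
        return decoded.strip()
    # Drop the end-of-sequence token and anything after it, if present.
    return tail.split(eos, 1)[0].strip()
```

Slicing the generated token ids before decoding is usually more robust; this text-based variant is only useful when the decoded string is all that is available.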

Using the model with vLLM

Install vLLM:

pip install vllm

Then use it through the Python API:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained(
  "ilsp/Meltemi-7B-Instruct-v1-AWQ",
  trust_remote_code=False
)

prompts = [
  [
    {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
  ]
]

# add bos token since apply_chat_template does not include it automatically
prompts = ["<s>" + tokenizer.apply_chat_template(p, add_generation_prompt=True, tokenize=False) for p in prompts]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
llm = LLM(model="ilsp/Meltemi-7B-Instruct-v1-AWQ", tokenizer="ilsp/Meltemi-7B-Instruct-v1-AWQ", quantization="awq")

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
  prompt = output.prompt
  generated_text = output.outputs[0].text
  print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
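
The example above prepends `<s>` unconditionally, which relies on the chat template not emitting a BOS token itself. As a defensive sketch (the helper name `ensure_bos` is hypothetical, and `<s>` is assumed to be the BOS token as in the instruction format), the prefix can be added only when it is missing:

```python
def ensure_bos(prompt, bos="<s>"):
    """Prepend the BOS token only if the prompt does not already start with it."""
    return prompt if prompt.startswith(bos) else bos + prompt
```

It could stand in for the `"<s>" + ...` expression when building `prompts`, so that a template which already adds BOS does not end up with a doubled token in the prompt string.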

Ethical Considerations

This model has not been aligned with human preferences, and therefore might generate misleading, harmful, or toxic content.

Acknowledgements

The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the OCRE Cloud framework, providing Amazon Web Services for the Greek Academic and Research Community.

Safetensors

Model size: 7B params
Tensor types: I32, F16
