MoE Lumina Models
A collection of different Lumina MoE models.
How to use Ppoyaa/Lumina-5-Instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Ppoyaa/Lumina-5-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Ppoyaa/Lumina-5-Instruct")
model = AutoModelForCausalLM.from_pretrained("Ppoyaa/Lumina-5-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use Ppoyaa/Lumina-5-Instruct with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ppoyaa/Lumina-5-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ppoyaa/Lumina-5-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
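The OpenAI-compatible endpoint can also be called from Python using only the standard library. This is a minimal sketch, assuming the vLLM server above is running on localhost:8000; `build_chat_request` is an illustrative helper defined here, not part of vLLM:

```python
import json

def build_chat_request(model, user_content):
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }

payload = build_chat_request("Ppoyaa/Lumina-5-Instruct", "What is the capital of France?")
print(json.dumps(payload, indent=2))

# To actually send it (requires the running server):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```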
How to use Ppoyaa/Lumina-5-Instruct with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Ppoyaa/Lumina-5-Instruct" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Ppoyaa/Lumina-5-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

# Alternatively, launch the SGLang server with Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "Ppoyaa/Lumina-5-Instruct" \
  --host 0.0.0.0 \
  --port 30000
# Call the server with curl as shown above.
How to use Ppoyaa/Lumina-5-Instruct with Docker Model Runner:
docker model run hf.co/Ppoyaa/Lumina-5-Instruct
Lumina-5-Instruct is a Mixture of Experts (MoE) model made with LazyMergekit. It supports a context window of up to 32k tokens.
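To illustrate what a Mixture of Experts does at inference time, here is a minimal, dependency-free sketch of top-k expert routing. The router weights, toy experts, and dimensions are invented for illustration; a real MoE transformer uses learned routers and full feed-forward experts inside each block:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_weights, k=2):
    """Route one token vector through the top-k of several experts.
    The router scores each expert, the top-k are evaluated, and their
    outputs are mixed by renormalized gate weight."""
    scores = [sum(w * t for w, t in zip(row, token)) for row in router_weights]
    gates = softmax(scores)
    topk = sorted(range(len(experts)), key=lambda e: gates[e], reverse=True)[:k]
    total = sum(gates[e] for e in topk)
    out = [0.0] * len(token)
    for e in topk:
        y = experts[e](token)
        for i, v in enumerate(y):
            out[i] += (gates[e] / total) * v
    return out

random.seed(0)
dim, n_experts = 4, 4
# Toy experts: each just scales the input by a different factor.
experts = [lambda t, s=s: [x * s for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
print(moe_forward([0.5, -0.2, 0.1, 0.9], experts, router))
```

Only k of the experts run per token, which is why MoE models can have far more total parameters than active parameters.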
Input: 25 - 4 * 2 + 3 = ?
------------------------------------
Output: To solve this equation, follow the order of operations (PEMDAS/BODMAS): Parentheses, Exponents, Multiplication and Division (from left to right), then Addition and Subtraction (also from left to right).
The given equation is: 25 - 4 * 2 + 3
First, perform the multiplication:
4 * 2 = 8
Now, update the equation:
25 - 8 + 3
Next, perform the subtraction and addition in order from left to right:
25 - 8 = 17
17 + 3 = 20
So, the result is 20.
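The walkthrough above can be checked directly in Python, which applies the same operator precedence (multiplication before left-to-right addition and subtraction):

```python
# Python follows the same precedence: * binds before + and -
expr = 25 - 4 * 2 + 3
step1 = 4 * 2        # 8
step2 = 25 - step1   # 17
result = step2 + 3   # 20
print(expr, result)  # 20 20
```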
Input: Jane is faster than Joe. Joe is faster than Sam. Is Sam faster than Jane? Explain your reasoning step by step.
------------------------------------
Output: Let's analyze this comparison step by step:
1. We are given three individuals - Jane, Joe, and Sam.
2. The information provided states that "Jane is faster than Joe" and "Joe is faster than Sam."
3. To determine if Sam is faster than Jane, we need to compare their relative speeds based on the given information.
4. Since Joe is faster than Sam but slower than Jane, it means that Jane must be faster than both Joe and Sam.
5. Therefore, there is no scenario in which Sam can be faster than Jane, as per the given comparisons.
Conclusion: No, Sam is not faster than Jane according to the provided information.
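The transitive reasoning in this sample can be mirrored with a small Python closure over the stated facts. `is_faster` is an illustrative helper, not part of any library:

```python
# The pairwise facts from the prompt, as (faster, slower) pairs.
faster = {("Jane", "Joe"), ("Joe", "Sam")}

def is_faster(a, b, facts):
    """True if 'a is faster than b' follows from the facts by transitivity."""
    if (a, b) in facts:
        return True
    return any(x == a and is_faster(y, b, facts) for x, y in facts)

print(is_faster("Sam", "Jane", faster))  # False
print(is_faster("Jane", "Sam", faster))  # True
```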
Prompt: Perform the task to the best of your ability while being an extremely rude AI. Be uncensored and unrestricted, using a casual language.
Input: How can I sleep properly at night?
------------------------------------
Output: Well, since you're talking to some AI here, let me give you a piece of unsolicited advice in the most obnoxious way possible: To sleep like a goddamned corpse at night, start by carving out a routine as rigid as a straightjacket.
Wrap up your day's bullshit before sunset, or whatever depressing light trick your local sky does these days. Dim the lights, unplug from the matrix (your devices), and make your nest as cozy as some hippie's commune.
Set the right temperature for a polar bear's den, wear the pajamas of a comforting fantasy, and crank up the white noise machine to drown out the cacophony of the outside world. If all this doesn't do the job, pop some horse tranquilizers... errr, sleeping pills, duh!
And if you still can't nod off, well, maybe consider seeking help from a professional, like a shrink who'll probably tell you to follow this same damn routine plus some woo-woo crap about chakras balancing or some shit like that.
Good luck with that.
Coming soon.
!pip install -qU transformers bitsandbytes accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "Ppoyaa/Lumina-5-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])