Instructions to use nightmedia/Qwen3-4B-Element4-Eva with libraries and local apps.

Libraries

How to use nightmedia/Qwen3-4B-Element4-Eva with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nightmedia/Qwen3-4B-Element4-Eva")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nightmedia/Qwen3-4B-Element4-Eva")
model = AutoModelForCausalLM.from_pretrained("nightmedia/Qwen3-4B-Element4-Eva")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use nightmedia/Qwen3-4B-Element4-Eva with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nightmedia/Qwen3-4B-Element4-Eva"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nightmedia/Qwen3-4B-Element4-Eva",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
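
The same request can be made from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns OpenAI-style responses; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "nightmedia/Qwen3-4B-Element4-Eva"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example (requires the server from the previous step to be running):
# print(chat("What is the capital of France?"))
```

Passing a different `base_url` should let the same sketch target any other server that exposes the `/v1/chat/completions` route.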

SGLang

How to use nightmedia/Qwen3-4B-Element4-Eva with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nightmedia/Qwen3-4B-Element4-Eva" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nightmedia/Qwen3-4B-Element4-Eva",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nightmedia/Qwen3-4B-Element4-Eva" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nightmedia/Qwen3-4B-Element4-Eva",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use nightmedia/Qwen3-4B-Element4-Eva with Docker Model Runner:
```
docker model run hf.co/nightmedia/Qwen3-4B-Element4-Eva
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Qwen3-4B-Element4-Eva

This is a model merge between Qwen3-4B-Element4 and FutureMa/Eva-4B.

Brainwaves of qx86-hi quants of the parent models

Element4     0.582,0.779,0.849,0.708,0.442,0.771,0.655
Eva-4B       0.539,0.747,0.864,0.606,0.412,0.751,0.605

Eva merged models

Agent-Eva    0.568,0.775,0.872,0.699,0.418,0.777,0.654
Element8-Eva 0.559,0.768,0.872,0.694,0.422,0.765,0.647

Element4-Eva
bf16         0.570,0.781,0.869,0.689,0.422,0.769,0.645
qx86-hi      0.567,0.781,0.868,0.689,0.426,0.773,0.642
qx64-hi      0.567,0.772,0.865,0.679,0.424,0.772,0.641
mxfp4        0.549,0.757,0.864,0.666,0.414,0.764,0.635

Element4 is a merge of Qwen3-4B-Engineer3x and Qwen3-4B-Agent, and serves as a base for the higher-numbered Element models. The Agent is Heretic-abliterated, which introduces some interesting friction in the model's chains of thought; this only enhances inference, adding some original AI humour.

In this model, the qx86-hi quant performs on par with full precision (bf16).
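
As a quick check of that comparison, the sketch below averages the Element4-Eva rows from the table above and reports how far each quant sits from bf16. The card does not name the benchmark columns here, so they are treated positionally, and an unweighted mean is only one rough way to summarize them.

```python
# Scores copied from the Element4-Eva rows of the table above.
# Column names are not given in the card, so columns are compared by position.
scores = {
    "bf16":    [0.570, 0.781, 0.869, 0.689, 0.422, 0.769, 0.645],
    "qx86-hi": [0.567, 0.781, 0.868, 0.689, 0.426, 0.773, 0.642],
    "qx64-hi": [0.567, 0.772, 0.865, 0.679, 0.424, 0.772, 0.641],
    "mxfp4":   [0.549, 0.757, 0.864, 0.666, 0.414, 0.764, 0.635],
}


def mean(values):
    """Unweighted average of a row of scores."""
    return sum(values) / len(values)


baseline = scores["bf16"]
for name, row in scores.items():
    # Largest per-column difference from the bf16 row.
    largest_gap = max(abs(a - b) for a, b in zip(row, baseline))
    delta = mean(row) - mean(baseline)
    print(f"{name:8s} mean={mean(row):.4f} delta={delta:+.4f} max_gap={largest_gap:.3f}")
```

On these numbers the qx86-hi mean lands within about 0.0001 of bf16, with no single column differing by more than 0.004, while qx64-hi and mxfp4 fall progressively further behind.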

The Element models are profiled to act as agents on the Star Trek DS9 station, in a roleplay scenario.

The models can be used for regular tasks as well.

Each comes with different skills. I recently found FutureMa/Eva-4B, which has an interesting model card: