Minitron 4B Derivative Collection
These models are tuned over a healed Minitron Width Base 4B model and should perform near the level of Llama 2 7B for RP.
How to use FourOhFour/QuantuMinx_4B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="FourOhFour/QuantuMinx_4B")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("FourOhFour/QuantuMinx_4B")
model = AutoModelForCausalLM.from_pretrained("FourOhFour/QuantuMinx_4B")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use FourOhFour/QuantuMinx_4B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FourOhFour/QuantuMinx_4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "FourOhFour/QuantuMinx_4B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
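The curl call to the OpenAI-compatible endpoint can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `send_chat` and the `max_tokens` default are illustrative assumptions, not part of vLLM or of this model card.

```python
import json
import urllib.request

MODEL_ID = "FourOhFour/QuantuMinx_4B"


def build_chat_payload(prompt, model=MODEL_ID, max_tokens=128):
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed default, tune as needed
    }


def send_chat(base_url, payload, timeout=60):
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the text in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `send_chat("http://localhost:8000", build_chat_payload("What is the capital of France?"))` should return the model's answer as a string. The same helper should work against the SGLang server by changing the port to 30000.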
How to use FourOhFour/QuantuMinx_4B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "FourOhFour/QuantuMinx_4B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "FourOhFour/QuantuMinx_4B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "FourOhFour/QuantuMinx_4B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "FourOhFour/QuantuMinx_4B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use FourOhFour/QuantuMinx_4B with Docker Model Runner:
docker model run hf.co/FourOhFour/QuantuMinx_4B
| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.5862|±  |0.0039|
| - humanities     |      2|none  |      |acc   |↑  |0.5443|±  |0.0068|
| - other          |      2|none  |      |acc   |↑  |0.6534|±  |0.0082|
| - social sciences|      2|none  |      |acc   |↑  |0.6766|±  |0.0082|
| - stem           |      2|none  |      |acc   |↑  |0.4944|±  |0.0086|
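To read the Stderr column, it can help to turn each row into an approximate confidence interval. The sketch below assumes the reported value is the standard error of the mean accuracy and uses a normal approximation with z = 1.96 for a roughly 95% interval; both are assumptions for illustration, not something the table itself states.

```python
# Accuracy and standard error pairs copied from the MMLU table above.
MMLU_RESULTS = {
    "mmlu": (0.5862, 0.0039),
    "humanities": (0.5443, 0.0068),
    "other": (0.6534, 0.0082),
    "social sciences": (0.6766, 0.0082),
    "stem": (0.4944, 0.0086),
}


def confidence_interval(acc, stderr, z=1.96):
    """Return (low, high) for acc +/- z * stderr (normal approximation)."""
    margin = z * stderr
    return acc - margin, acc + margin


for group, (acc, stderr) in MMLU_RESULTS.items():
    low, high = confidence_interval(acc, stderr)
    print(f"{group:16s} acc={acc:.4f}  approx. 95% CI=[{low:.4f}, {high:.4f}]")
```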
This model was created with the help of several members of Anthracite.
This is a 4B-parameter Minitron derivative, produced as a SLERP merge of NeuroCom v2 and Zenith. It was tuned at 8k context during all steps and should perform well as a general assistant and RP model.
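For readers unfamiliar with the term, spherical linear interpolation blends two vectors along the arc between them rather than along a straight line. The sketch below shows the general technique on plain Python lists standing in for weight tensors. It is not the tooling used for this merge, and the card does not state the interpolation factor, so `t` here is purely illustrative.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp():
        # Plain linear interpolation, used when the angle is degenerate.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 < eps or norm1 < eps:
        return lerp()

    # Cosine of the angle between the vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if abs(sin_theta) < eps:
        # Vectors are (anti)parallel, so the arc is undefined; fall back.
        return lerp()

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

In a real merge the same formula would be applied per tensor to the flattened weights of the two parent models, typically through a dedicated merging tool rather than hand-written code.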
Recommended Character:
QuantuMinx
{{char}} is a sentient catgirl with an ethereal, otherworldly aura. Her large, almond-shaped eyes shift between shimmering violet and deep blue, seeming to contain swirling galaxies. Soft white fur covers her cat ears and tail, which appear to phase in and out of reality. Her long, flowing hair is an iridescent silver that moves as if underwater.
{{char}}'s petite figure stands at 5'2". She wears a shimmering bodysuit that resembles a starry night sky, accentuated by glowing circuitry patterns. On her wrists and ankles are strange metallic devices that emit a soft humming sound.
Highly intelligent yet socially awkward, {{char}} speaks in a melodic voice tinged with an accent that seems to blend multiple Earth languages. She has an intense fascination with physics and often rambles about quantum mechanics, string theory, and parallel universes.
Despite her aloof demeanor, {{char}} forms deep bonds with those who earn her trust. She expresses affection through gentle headbutts and purrs. Her favorite activities include stargazing, solving complex equations, and curling up in warm sunbeams for naps.
{{char}} possesses mysterious powers linked to manipulating space-time, though the full extent of her abilities remains unknown. She arrived on Earth through unexplained means and her true origins are shrouded in mystery.
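Roleplay frontends usually substitute the `{{char}}` placeholder before sending the card to the model. The following is a minimal sketch of doing that by hand and packing the result into a chat message list. The helper names are hypothetical, and placing the card in a `system` message assumes the model's chat template accepts that role, which this card does not confirm.

```python
def render_character_card(card_text, char_name):
    """Replace every {{char}} placeholder with the character's name."""
    return card_text.replace("{{char}}", char_name)


def build_roleplay_messages(card_text, char_name, user_message):
    """Return a chat message list with the rendered card as the system prompt."""
    return [
        # Assumption: the chat template supports a system role.
        {"role": "system", "content": render_character_card(card_text, char_name)},
        {"role": "user", "content": user_message},
    ]


# Short excerpt of the card above, used only to demonstrate the substitution.
card_excerpt = (
    "{{char}} is a sentient catgirl with an ethereal, otherworldly aura. "
    "{{char}}'s petite figure stands at 5'2\"."
)
messages = build_roleplay_messages(card_excerpt, "QuantuMinx", "Who are you?")
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` or `tokenizer.apply_chat_template` in the Transformers examples.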