Instructions for using Locutusque/Hyperion-3.0-Mixtral-3x7B with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Locutusque/Hyperion-3.0-Mixtral-3x7B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Locutusque/Hyperion-3.0-Mixtral-3x7B")
model = AutoModelForCausalLM.from_pretrained("Locutusque/Hyperion-3.0-Mixtral-3x7B")
```


vLLM

How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Locutusque/Hyperion-3.0-Mixtral-3x7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Locutusque/Hyperion-3.0-Mixtral-3x7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
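The same request can also be sent from Python. The following is a minimal sketch that uses only the standard library and mirrors the curl call above. The helper names `build_completion_request` and `complete` are hypothetical, and the response is assumed to follow the OpenAI completions layout (`choices[0].text`).

```python
import json
import urllib.request

MODEL_ID = "Locutusque/Hyperion-3.0-Mixtral-3x7B"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the server answers in the OpenAI completions format.
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the generated continuation. For a server listening on port 30000, pass `base_url="http://localhost:30000"`.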

Use Docker

```shell
docker model run hf.co/Locutusque/Hyperion-3.0-Mixtral-3x7B
```

SGLang

How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Locutusque/Hyperion-3.0-Mixtral-3x7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Locutusque/Hyperion-3.0-Mixtral-3x7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Locutusque/Hyperion-3.0-Mixtral-3x7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Locutusque/Hyperion-3.0-Mixtral-3x7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Docker Model Runner
How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with Docker Model Runner:
```
docker model run hf.co/Locutusque/Hyperion-3.0-Mixtral-3x7B
```

Hyperion-3.0-Mixtral-3x7B

Model Details

This is an experimental first attempt at building a Mixture of Experts (MoE) language model by combining several Mistral-based expert models. It uses hyperion-3.0-beta as the base, with bfloat16 as the output dtype. The gate mode is set to hidden, and two experts are consulted per token (experts_per_token: 2).

The model incorporates three expert models:

hyperion-3.0-beta: Focuses on science, math, and coding tasks.
dibt-mistral-7b: Handles open-ended questions, summarization, and stream of consciousness.
rp-mistral-7b: Specializes in roleplaying and character-based conversations.

Each expert is trained on a set of positive and negative prompts to guide its specialization.
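To illustrate what consulting two experts per token means, here is a toy sketch of top-2 routing in plain Python. It is not this model's implementation: the experts are stand-in scalar functions, the gate logits are made up, and renormalizing the softmax over only the selected experts (as Mixtral-style layers do) is an assumption about how this model combines expert outputs.

```python
import math

EXPERT_NAMES = ["hyperion-3.0-beta", "dibt-mistral-7b", "rp-mistral-7b"]


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, gate_logits, experts, k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


# Hypothetical stand-ins for the three expert feed-forward blocks.
toy_experts = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: -x]

# Made-up router scores for a single token.
routes = top_k_route([2.0, 1.0, -1.0])
for index, weight in routes:
    print(f"{EXPERT_NAMES[index]}: weight {weight:.3f}")
print("combined output:", moe_forward(3.0, [2.0, 1.0, -1.0], toy_experts))
```

In this toy case the third expert is never evaluated for the token, so only two of the three expert blocks contribute to the output.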

Intended Use and Limitations

This MoE model is an early prototype and may not exhibit optimal performance. It is intended for research and experimentation purposes only, and should not be used in production environments or for critical applications.

Please note that the expert models mentioned in the configuration have not been publicly released yet. They are expected to be made available in the near future, at which point this MoE model can be fully instantiated and evaluated.

Training Details

The base model and experts were trained using QLoRA and SFT. However, the specific details of the training data, hyperparameters, and optimization techniques used for this MoE model are not available at this time.

Feedback and Future Updates

As this is an experimental model, feedback and suggestions are welcome. Future updates may include improvements to the gating mechanism, fine-tuning of the expert models, and the incorporation of additional experts to enhance the model's performance and breadth of knowledge.

Safetensors model size: 19B params
Tensor type: BF16
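The reported size is below 3 x 7B because a MoE of this kind duplicates only the feed-forward blocks, while attention, embeddings, and norms are shared. The following back-of-the-envelope count is a sketch that assumes the published Mistral-7B-v0.1 dimensions (hidden size 4096, intermediate size 14336, 32 layers, vocabulary 32000, key/value projection width 1024) and one router row per expert in each layer. The card does not state these values, so they are assumptions, as is the function name `moe_param_count`.

```python
def moe_param_count(num_experts, hidden=4096, intermediate=14336, layers=32,
                    vocab=32000, kv_dim=1024):
    """Approximate parameter count for a Mistral-style model with N experts."""
    # Query/output projections are hidden x hidden; key/value use the smaller kv_dim.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # Each expert MLP has gate, up, and down projections.
    expert_mlp = 3 * hidden * intermediate
    # A dense model (one expert) has no router.
    router = hidden * num_experts if num_experts > 1 else 0
    norms = 2 * hidden
    per_layer = attention + num_experts * expert_mlp + router + norms
    # Input embeddings plus an untied output head, then the final norm.
    return layers * per_layer + 2 * vocab * hidden + hidden


dense = moe_param_count(1)
merged = moe_param_count(3)
print(f"dense 7B model: {dense / 1e9:.2f}B parameters")
print(f"3-expert MoE:   {merged / 1e9:.2f}B parameters")
# bfloat16 stores each parameter in 2 bytes (weights only, no activations or KV cache).
print(f"bfloat16 weights: {merged * 2 / 1e9:.1f} GB")
```

Under these assumptions the three-expert count comes to roughly 18.5 billion parameters, which is consistent with the rounded 19B figure.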
