Instructions to use Locutusque/Hyperion-3.0-Mixtral-3x7B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Locutusque/Hyperion-3.0-Mixtral-3x7B")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("Locutusque/Hyperion-3.0-Mixtral-3x7B")
    model = AutoModelForCausalLM.from_pretrained("Locutusque/Hyperion-3.0-Mixtral-3x7B")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "Locutusque/Hyperion-3.0-Mixtral-3x7B"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Locutusque/Hyperion-3.0-Mixtral-3x7B",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
    docker model run hf.co/Locutusque/Hyperion-3.0-Mixtral-3x7B
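The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming a server is already running locally and exposes the OpenAI-compatible `/v1/completions` route shown above. The helper names `build_completion_request` and `complete` are hypothetical and chosen for illustration only.

```python
import json
import urllib.request

MODEL_ID = "Locutusque/Hyperion-3.0-Mixtral-3x7B"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # assumed local server address
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation as a string. Passing `base_url="http://localhost:30000"` targets a server listening on port 30000 instead.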
- SGLang
How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "Locutusque/Hyperion-3.0-Mixtral-3x7B" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Locutusque/Hyperion-3.0-Mixtral-3x7B",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "Locutusque/Hyperion-3.0-Mixtral-3x7B" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Locutusque/Hyperion-3.0-Mixtral-3x7B",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use Locutusque/Hyperion-3.0-Mixtral-3x7B with Docker Model Runner:
    docker model run hf.co/Locutusque/Hyperion-3.0-Mixtral-3x7B
Hyperion-3.0-Mixtral-3x7B
Model Details
This is an experimental first attempt at creating a Mixture of Experts (MoE) language model by combining several Mistral expert models. The model uses the hyperion-3.0-beta architecture as the base, with a bfloat16 output dtype. The gating mechanism is set to hidden and two experts are consulted per token (experts_per_token: 2).
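To give a rough sense of what "3x7B" with two experts per token means in practice, the sketch below estimates parameter counts and bfloat16 weight size. It rests on assumptions that this card does not state: that each expert has standard Mistral-7B dimensions, and that the merged model follows a Mixtral-style layout in which attention and embeddings are shared while only the feed-forward blocks are replicated per expert. Treat the numbers as an estimate, not a specification.

```python
# Assumed Mistral-7B dimensions (not stated in this model card).
HIDDEN = 4096          # hidden size
INTERMEDIATE = 14336   # feed-forward inner size
LAYERS = 32            # transformer layers
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
VOCAB = 32000          # vocabulary size

NUM_EXPERTS = 3        # from the card: three expert models
EXPERTS_PER_TOKEN = 2  # from the card: experts_per_token: 2
BYTES_PER_PARAM = 2    # bfloat16


def moe_param_count(experts_in_use: int) -> int:
    """Count parameters when `experts_in_use` feed-forward experts are included."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o and k, v projections
    expert_mlp = 3 * HIDDEN * INTERMEDIATE                 # gate, up, down projections
    router = HIDDEN * NUM_EXPERTS                          # one gate vector per expert
    norms = 2 * HIDDEN                                     # two RMSNorm weights per layer
    per_layer = attention + experts_in_use * expert_mlp + router + norms
    embeddings = 2 * VOCAB * HIDDEN                        # input embeddings + output head
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


total = moe_param_count(NUM_EXPERTS)
active = moe_param_count(EXPERTS_PER_TOKEN)

print(f"total parameters: {total / 1e9:.1f}B")
print(f"active per token: {active / 1e9:.1f}B")
print(f"bfloat16 weights: {total * BYTES_PER_PARAM / 1e9:.1f} GB")
```

Under these assumptions the total is well below 3 x 7B, because the experts share everything except their feed-forward blocks, and each token only passes through two of the three experts.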
The model incorporates three expert models:
- hyperion-3.0-beta: Focused on science, math, and coding tasks.
- dibt-mistral-7b: Handles open-ended questions, summarization, and stream of consciousness.
- rp-mistral-7b: Specializes in roleplaying and character-based conversations.
Each expert is trained on a set of positive and negative prompts to guide its specialization.
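The routing described above, where two of the three experts are consulted for each token, can be sketched in a few lines. This is a toy illustration of generic top-k sparse routing and not the model's actual implementation: the gate vectors, the two-dimensional hidden state, and the expert functions are made-up values, and `route_top_k` and `moe_forward` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    hidden: Vector, gate_vectors: Sequence[Vector], k: int = 2
) -> List[Tuple[int, float]]:
    """Score every expert against the hidden state and keep the k best."""
    logits = [dot(hidden, g) for g in gate_vectors]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # renormalise over the chosen experts
    return list(zip(top, weights))


def moe_forward(
    hidden: Vector,
    gate_vectors: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Weighted sum of the outputs of the k selected experts."""
    out = [0.0] * len(hidden)
    for index, weight in route_top_k(hidden, gate_vectors, k):
        expert_out = experts[index](hidden)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy setup: three "experts" that simply scale their input (illustrative only).
toy_experts = [
    lambda h: [1.0 * x for x in h],
    lambda h: [2.0 * x for x in h],
    lambda h: [3.0 * x for x in h],
]
toy_gates = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
token_state = [1.0, 0.0]

print(route_top_k(token_state, toy_gates))
print(moe_forward(token_state, toy_gates, toy_experts))
```

In this toy run the third expert scores lowest and is skipped entirely, which is the point of sparse routing: only the selected experts do any work for a given token.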
Intended Use and Limitations
This MoE model is an early prototype and may not exhibit optimal performance. It is intended for research and experimentation purposes only, and should not be used in production environments or for critical applications.
Please note that the expert models mentioned in the configuration have not been publicly released yet. They are expected to be made available in the near future, at which point this MoE model can be fully instantiated and evaluated.
Training Details
The base model and experts were trained using QLoRA and SFT. However, the specific details of the training data, hyperparameters, and optimization techniques used for this MoE model are not available at this time.
Feedback and Future Updates
As this is an experimental model, feedback and suggestions are welcome. Future updates may include improvements to the gating mechanism, fine-tuning of the expert models, and the incorporation of additional experts to enhance the model's performance and breadth of knowledge.