Instructions to use rizla/rizla-17 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use rizla/rizla-17 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="rizla/rizla-17")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("rizla/rizla-17")
    model = AutoModelForCausalLM.from_pretrained("rizla/rizla-17")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use rizla/rizla-17 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "rizla/rizla-17"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "rizla/rizla-17",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/rizla/rizla-17
- SGLang
How to use rizla/rizla-17 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "rizla/rizla-17" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "rizla/rizla-17",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "rizla/rizla-17" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "rizla/rizla-17",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use rizla/rizla-17 with Docker Model Runner:
docker model run hf.co/rizla/rizla-17
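The curl calls shown for vLLM and SGLang send the same JSON body to a `/v1/completions` endpoint, so the same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server replies in the OpenAI completions shape (a `choices` list whose items carry a `text` field).

```python
import json
import urllib.request

# Ports taken from the commands above: the vLLM curl example targets 8000,
# and the SGLang launch command uses --port 30000.
VLLM_URL = "http://localhost:8000/v1/completions"
SGLANG_URL = "http://localhost:30000/v1/completions"


def build_completion_request(url, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl examples."""
    payload = {
        "model": "rizla/rizla-17",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(url, prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumption: the reply is a JSON object shaped like
    {"choices": [{"text": "..."}]}, as in the OpenAI completions API.
    """
    request = build_completion_request(url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With one of the servers running locally, `complete(VLLM_URL, "Once upon a time,")` or `complete(SGLANG_URL, "Once upon a time,")` should return the generated continuation as a string.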
rizla been cooking while singing
This is an experimental model that I made by merging two 2expmixtrals. The mergekitty is a tool that lets me mix and match different models into one big model, keeping all the smarts and skills of the original models. The llama70b is a huge language model that can make words for all kinds of things and ways, based on the GPT-4 thingy.
The merged model has 17 billion parameters and was made to run on a minimum of 8 GB of RAM as a Q3_K_L GGUF quantization.
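As a rough sanity check on that memory figure, the size of the quantized weights can be estimated as parameters × bits per weight ÷ 8. The sketch below is only back-of-the-envelope arithmetic: the bits-per-weight values are hypothetical illustrations, not measured figures for this model's GGUF file, and the estimate covers the weights alone (no KV cache or runtime overhead).

```python
def weights_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 17e9  # parameter count stated in the model card

# Hypothetical average bits per weight, for illustration only; the real
# average for a given GGUF quantization depends on the tensor mix.
for bpw in (3.0, 3.5, 4.0):
    print(f"{bpw} bits/weight -> {weights_size_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions, an average somewhere in the 3 to 4 bits-per-weight range puts the weights in the neighborhood of the 8 GB figure quoted above.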
Merge me baby one more time
Sending this contraption out straight to mergeland, wwhheeeeeeeeeeeee LFG
Model tree for rizla/rizla-17
Base model: mistralai/Mixtral-8x7B-v0.1
Finetuned: mistralai/Mixtral-8x7B-Instruct-v0.1