[Community] Please share your prompts and experiences: positive or negative!
I do a lot of my own testing, but it'd be nice to hear from other users if they've had any significantly positive or negative experiences using this model. When posting, please be sure to include the quant used (as quants below a static Q8_0 may have precision loss) and inference settings. Thanks, GLHF!
Hello, I'm testing LLMs by giving each one a theme, having it come up with a story, and then having it generate a caption-style image prompt. It's a pretty inefficient way of testing prompt generation, but it's fun to see how each LLM responds. The theme is given in NSFW Japanese and the prompt is output in English, so the test also requires multilingual performance. It's not really fair, but it's a kind of benchmark.
Fewer than half of the models make it all the way to a final image prompt. Some refuse to respond, but more often they fail because the task requires overcoming several challenges at once: language comprehension, instruction comprehension, real-world knowledge, vocabulary, and creativity. I always give a Like to an LLM that overcomes these challenges. That does not mean I never give Likes otherwise.
The ZEUS series consistently produces a prompt at the end of the test. If I notice anything special during future testing, I will come back here to write about it.
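For anyone who wants to try a similar check, the theme-to-story-to-prompt test described above could be sketched roughly as follows. This is only an illustration, not the poster's actual setup: `run_theme_test`, `looks_like_refusal`, the instruction wording, and the refusal markers are all hypothetical, and `generate` stands for whatever function sends a prompt to the model being tested and returns its reply.

```python
from typing import Callable, Optional

# Hypothetical refusal markers; a real check would need tuning per model.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")


def looks_like_refusal(text: str) -> bool:
    """Crude heuristic: does the reply contain a typical refusal phrase?"""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def run_theme_test(theme: str, generate: Callable[[str], str]) -> Optional[str]:
    """Theme -> story -> English image prompt. Returns None if a step fails."""
    # Step 1: ask the model for a story based on the theme.
    story = generate(f"Write a short story based on this theme: {theme}")
    if not story.strip() or looks_like_refusal(story):
        return None

    # Step 2: ask the model to turn the story into an image prompt.
    image_prompt = generate(
        "Write one English caption-style image prompt for a scene "
        "from this story:\n" + story
    )
    if not image_prompt.strip() or looks_like_refusal(image_prompt):
        return None

    return image_prompt.strip()
```

A model "passes" in this sketch when the function returns a non-empty prompt. Judging the quality of that prompt is still a manual step.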
Interesting; do you have a more SFW example you could show? If it's in JP that's OK. I'd be particularly interested if you have an example that you've used across a couple of models! V2, V8, and V2-ORPO are my current recommendations for regular use.
I haven't tried many SFW examples...
I think some LLM leaderboards will evaluate them numerically.
When a model can't complete the task above after a few attempts, I sometimes try a very simple SFW system prompt and user prompt to see whether the model itself is broken. ZEUS completed the task, so I have not run that check on it.
I don't have much opportunity to evaluate individual models in detail...
Of course I could test ZEUS's Japanese language capabilities, but I have no reliable baseline in my head to compare against.
Looking at the number of downloads of GGUF on HF in general, I think that someone somewhere in the world is using it...
There is very little feedback on the Hub. Well, that's how OSS is, including AI.
There are also model authors who have set up their own demos and collect the usage history in their datasets. I haven't read the source code in detail, but I think that would be useful for testing how the model behaves.