Model Capabilities, Sampler Presets, Glazing
Hello! This was my first ever local LLM as a new SillyTavern user, and I'm quite happy with how developed it is compared to company chat models from places like C.ai and Spicychat! (I've put a whole month into working on one single character card and this merge model, hhh)
For fellow noobs interested in what this model can do, here is a list from a Q4_K_M quant user!
- Capable of separating a character's internal dialogue from narration and speech (something I finally figured out after a while)
- Capable of applying world info seamlessly within dialogue and narration, such as quickly explaining in internal dialogue why the character would avoid speaking about something in their lore. Actual example: "He'd never want anything to do with me if he knew"
- Understands relative positions of characters, such as sitting on a couch, and somehow keeps reminding itself of it across following responses, which impresses me quite a lot
- The character often hesitated and gave the user further options to think over their actions, likely from the Radiance model in the merge.
-----------!!SAMPLER SETTINGS!!--(KoboldCpp, SillyTavern, Context: 16K, NO quantization of the KV cache, no ContextShift, Text Completion)-------------
Temp: 3.03 (The high temp really emphasizes creative writing, which allowed the separation of Narration / Char. Dialogue / Char. Internal Dialogue while staying perfectly coherent for most generations. Going higher [near 5.0 temp] introduced the use of "~" within words that are supposed to sound syrupy or soft, without any prompting or text examples to do so; I'm still trying to figure out how to apply that on purpose, but at that temp the output consistently becomes illogical and incoherent.)
Min P: 0.05
Repetition Penalty: 1.14
Repetition Penalty Range: 64
Adaptive-P: Target = 0.54, Decay = 0.99 (Going lower on Target breaks the established formatting; going higher introduced repeating strings of text. This gave me the highly separated formatting I wanted rather than bland paragraphs.)
Smoothing Factor: 2.14 (Very important!! This sampler monumentally affected coherency, likely because the temp is set so weirdly high.)
Exclude Top Choices (XTC): Threshold = 0.1, Probability = 0.2
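For fellow noobs wondering what these settings actually do to the token probabilities, here's a rough Python sketch of the three I understand best (temperature, Min P, and XTC), based on the commonly published descriptions of these samplers. The exact math and ordering inside KoboldCpp/SillyTavern may differ, so treat this as illustration, not the real implementation:

```python
import math

def apply_temperature(logits, temp):
    """Softmax over logits divided by temperature; higher temp flattens the
    distribution, which is why temp 3.03 makes word choice so much wilder."""
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p):
    """Min P: keep token indices whose probability is at least min_p times
    the top token's probability (0.05 = keep anything within 5% of the best)."""
    cutoff = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= cutoff]

def xtc_filter(probs, keep, threshold):
    """Exclude Top Choices: if two or more surviving tokens clear the
    threshold, drop all of them except the least probable one. (The real
    sampler only fires with the configured Probability; this sketch assumes
    that roll already succeeded.)"""
    above = sorted((i for i in keep if probs[i] >= threshold),
                   key=lambda i: probs[i])
    if len(above) > 1:
        dropped = set(above[1:])  # keep only the weakest above-threshold token
        return [i for i in keep if i not in dropped]
    return keep
```

So with my settings, a token only survives if it's within 5% of the top choice, and when several "safe" picks all clear 0.1 probability, XTC sometimes kicks the obvious ones out, which is probably why the word choice stopped feeling samey.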
Possible sampler tweaks for improvement:
Uncommonly, generations may still not make full sense, but not enough that you can't stay immersed ...
-- Smoothing Factor: if coherency improved because of this value, experimenting a bit lower or higher may increase per-generation accuracy
-- Exclude Top Choices: this was a random pick. I didn't like some of the words responses used, so I turned it up a bit. Now it feels good. Could be better, though!
-- Introducing TFS: I've heard this sampling method is a lot more effective than Top-P, which I couldn't use since it generated less accurate responses even at 0.99 for some reason. So I might experiment with TFS later.
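Since I mentioned maybe trying TFS: from what I've read, tail-free sampling looks at the curvature (second difference) of the sorted probability curve and cuts the tail where it flattens out. Here's a rough sketch of my understanding of it; real implementations may differ in the edge cases, so this is just to show the idea:

```python
def tfs_keep_count(probs, z):
    """Tail-free sampling sketch: sort probabilities descending, take the
    absolute second differences of the curve, normalize them to sum to 1,
    and keep top tokens until the cumulative curvature mass exceeds z.
    Returns how many of the top tokens would survive."""
    p = sorted(probs, reverse=True)
    if len(p) < 3:
        return len(p)  # too few tokens to measure curvature
    d2 = [abs(p[i] - 2 * p[i + 1] + p[i + 2]) for i in range(len(p) - 2)]
    total = sum(d2)
    if total == 0:
        return len(p)  # perfectly flat curve: nothing to cut
    cum = 0.0
    for i, w in enumerate(d2):
        cum += w / total
        if cum > z:
            return max(1, i + 1)  # always keep at least the top token
    return len(p)
```

Intuitively, when one token dominates, almost all the curvature sits at the front of the curve, so a z like 0.95 prunes hard; a z near 1.0 keeps nearly everything.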
Additional Stuff:
I really tried to get a [time system] established as a header before each response, but it gets quite incoherent in most generations, like skipping hours or changing dates. I did place the time instruction in the "post-history prompt", though, which may have affected it?
Glazing the Creator:
Thank you so much for merging these together!! With how each model acts, the way it responds felt very nice, especially in the light novel format I really enjoy.