Instructions for using TeeZee/Kyllene-34B-v1.1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use TeeZee/Kyllene-34B-v1.1 with Transformers:

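```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# device_map="auto" (needs the accelerate package) spreads the 34B weights over available GPUs
pipe = pipeline(
    "text-generation",
    model="TeeZee/Kyllene-34B-v1.1",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages, max_new_tokens=40))
```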

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TeeZee/Kyllene-34B-v1.1")
# torch_dtype="auto" keeps the checkpoint dtype; device_map="auto" needs the accelerate package
model = AutoModelForCausalLM.from_pretrained(
    "TeeZee/Kyllene-34B-v1.1", torch_dtype="auto", device_map="auto"
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```


vLLM

How to use TeeZee/Kyllene-34B-v1.1 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TeeZee/Kyllene-34B-v1.1"
# In another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TeeZee/Kyllene-34B-v1.1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
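To script against the server without curl, the same OpenAI-compatible request can be built with nothing but the Python standard library. This is a minimal sketch: it assumes the vLLM default port 8000 used above (the SGLang server below listens on 30000 instead), and `build_chat_request` and `send` are illustrative helper names, not part of vLLM or SGLang.

```python
import json
import urllib.request

# Assumption: vLLM's default address; pass base_url="http://localhost:30000/v1" for SGLang.
BASE_URL = "http://localhost:8000/v1"
MODEL = "TeeZee/Kyllene-34B-v1.1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat completions endpoint."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With a server running, `print(send(build_chat_request("What is the capital of France?")))` prints the model's reply.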


SGLang

How to use TeeZee/Kyllene-34B-v1.1 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TeeZee/Kyllene-34B-v1.1" \
    --host 0.0.0.0 \
    --port 30000
# In another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TeeZee/Kyllene-34B-v1.1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TeeZee/Kyllene-34B-v1.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TeeZee/Kyllene-34B-v1.1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use TeeZee/Kyllene-34B-v1.1 with Docker Model Runner:
```
docker model run hf.co/TeeZee/Kyllene-34B-v1.1
```

What’s the prompt format?

by MarinaraSpaghetti - opened Feb 4, 2024

Discussion

MarinaraSpaghetti

Feb 4, 2024

Howdy! Would love to test this model for my roleplay. :) What prompt format does it use? Thank you in advance for your answer!

TeeZee

Owner Feb 5, 2024

Alpaca works best, or the SillyTavern roleplay or story templates. The LimaRP-specific format works too. Have fun :)
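Since Alpaca is the recommended format, here is a minimal sketch of the widely used Alpaca template as a Python function. The preamble wording is an assumption taken from the original Stanford Alpaca template; SillyTavern's Alpaca presets may word it differently, and `alpaca_prompt` is a hypothetical helper name.

```python
from typing import Optional

# Assumed preambles from the original Alpaca template.
PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
PREAMBLE_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request."
)


def alpaca_prompt(instruction: str, context: Optional[str] = None) -> str:
    """Format a single-turn Alpaca prompt; the model continues after '### Response:'."""
    if context:
        return (
            f"{PREAMBLE_WITH_INPUT}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return f"{PREAMBLE}\n\n### Instruction:\n{instruction}\n\n### Response:\n"
```

A string built this way can be passed straight to the Transformers `pipe` object, e.g. `pipe(alpaca_prompt("Who are you?"), max_new_tokens=200)`; a plain string input is not run through the tokenizer's chat template.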

MarinaraSpaghetti

Feb 5, 2024

Hey, thank you so much! I'm jumping to testing then! :D

MarinaraSpaghetti changed discussion status to closed Feb 5, 2024

MarinaraSpaghetti changed discussion status to open Feb 5, 2024

MarinaraSpaghetti

Feb 5, 2024

Oop, reopening, since I ran some tests and noticed a few problems with this model. Perhaps the issue is with my settings, though? Firstly, the characters keep misspelling my name as "Marianne" or "Mariana" instead of "Marianna". Secondly, they sometimes produce weird, unexpected tokens. And lastly, the outputs seem weirdly short? Unless the model was trained for shorter roleplay formats, in which case it works as intended, ha ha.

For example, below is the answer produced by Nous-Capybara-LimaRPv3:

And this is an answer produced by Kyllene with the same context, etc.:

And an example of misspells and weird token outputs:

I'm using the same settings as for my current favorite model, Nous-Capybara-LimaRPv3; here they are:
Settings: https://files.catbox.moe/ate9pa.json
Story String: https://files.catbox.moe/g28n4f.json
Instruct: https://files.catbox.moe/7x3jod.json

I'm curious how the model fares for others. I suspect my setup may be at fault.

TeeZee

Owner Feb 6, 2024

Thanks for the tests; yeah, I'm also curious. I once had the name Malice switched to Malcolm. Strange tokens: do you mean '[For context purposes]' at the end? If so, I also had that when the character card had lots of instructions in [], so it seems the model sometimes follows patterns too well. Shorter output, no idea, but I'll check, probably next weekend.
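If trailing bracketed notes like this keep appearing, one possible workaround (not something suggested in this thread) is to strip them after generation. A minimal sketch, assuming the unwanted note always sits at the very end of the reply; `strip_trailing_note` is a hypothetical helper, and brackets in the middle of the text are left alone.

```python
import re

# Matches one bracketed note, e.g. "[For context purposes]" or "[]", at the end of the text.
TRAILING_NOTE = re.compile(r"\s*\[[^\[\]]*\]\s*$")


def strip_trailing_note(reply: str) -> str:
    """Remove bracketed notes from the end of a reply, repeating while any remain."""
    while True:
        cleaned = TRAILING_NOTE.sub("", reply)
        if cleaned == reply:
            return cleaned
        reply = cleaned
```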

MarinaraSpaghetti

Feb 6, 2024

Yea, that’s what I meant. Well, the thing with my instructions is that I don’t use the brackets [] for them, ha ha. The shorter outputs seem to be happening mostly when I test the model on my 45k context group chat conversation, even though the previous messages are all, well, long.

TeeZee

Owner Feb 9, 2024

Confirmed: compared to Capybara, it produces output that is 10-30% shorter on average. I tested a few different scenarios on both models with shorter contexts (4k and 8k). The extra '[]' output also happened once in one conversation, so that's confirmed too. The next iteration of this model will be better ;). Thanks again for testing, and btw, what hardware are you using for 45k context?
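A comparison like this can be reproduced with a small helper that computes, per scenario, how much shorter one model's reply is than the baseline's and then averages the results. This is a hypothetical sketch: the thread does not say how length was measured, so whitespace-separated word counts stand in for tokens here.

```python
from statistics import mean
from typing import List, Tuple


def word_count(text: str) -> int:
    """Approximate reply length as the number of whitespace-separated words."""
    return len(text.split())


def percent_shorter(candidate: str, baseline: str) -> float:
    """How much shorter `candidate` is than `baseline`, in percent (negative if longer)."""
    base = word_count(baseline)
    if base == 0:
        raise ValueError("baseline reply is empty")
    return 100.0 * (1 - word_count(candidate) / base)


def average_percent_shorter(pairs: List[Tuple[str, str]]) -> float:
    """Average the per-scenario difference over (candidate, baseline) reply pairs."""
    return mean(percent_shorter(c, b) for c, b in pairs)
```

Counting tokens with the model's own tokenizer instead of words would make the numbers closer to what the backend actually generates.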

MarinaraSpaghetti

Feb 9, 2024 (edited)

Awesome, can't wait to test the next iteration then, thanks for letting me know! I have a 3090 with 24GB of VRAM and can easily run up to 45k of context with 4.0bpw exl2 quants. :) I also recommend dropping the bagel model from the merge if you want it to be better suited for longer-context roleplays, since bagel is known for its quality and context handling dropping off at higher context lengths. Recently, I've been using brucethemoose's newest RPMerge, and it has been absolutely wonderful; I highly recommend checking it out for some inspiration!
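For a rough sense of how 45k of context fits on a 24GB card, here is a back-of-the-envelope estimate of weight and KV-cache memory. The architecture numbers are assumptions taken from the Yi-34B base configuration (60 layers, 8 key/value heads, head size 128, about 34.4B parameters), not from this thread; 4.0 bpw is treated as uniform across all weights, and activations and runtime overhead are ignored.

```python
# ASSUMED Yi-34B-style architecture values (not stated in this thread).
N_PARAMS = 34.4e9   # approximate parameter count
N_LAYERS = 60       # transformer layers
N_KV_HEADS = 8      # key/value heads (grouped-query attention)
HEAD_DIM = 128      # size of each attention head
GIB = 1024 ** 3


def weight_bytes(bits_per_weight: float) -> float:
    """Memory for the weights if every parameter uses `bits_per_weight` bits."""
    return N_PARAMS * bits_per_weight / 8


def kv_cache_bytes(n_tokens: int, bytes_per_value: float) -> float:
    """Memory for cached keys and values (hence the factor 2) across all layers."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value * n_tokens


for label, bytes_per_value in [("FP16 cache", 2), ("8-bit cache", 1)]:
    total = weight_bytes(4.0) + kv_cache_bytes(45_000, bytes_per_value)
    print(f"{label}: ~{total / GIB:.1f} GiB for 45k tokens at 4.0 bpw")
```

Under these assumptions the estimate comes to about 26.3 GiB with an FP16 cache and about 21.2 GiB with an 8-bit cache, so whether 45k tokens fit in 24 GiB likely depends on the loader's cache precision as well as on the 4.0bpw weights.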
https://huggingface.co/brucethemoose/Yi-34B-200K-RPMerge

vurkan

Feb 23, 2024

Is it possible to get the settings without catbox? It doesn't work in most places.

MarinaraSpaghetti

Feb 23, 2024

Oh dear, these settings are old and I don’t use them anymore, but here they are.
```json
{
  "temp": 1,
  "temperature_last": false,
  "top_p": 1,
  "top_k": 0,
  "top_a": 0,
  "tfs": 1,
  "epsilon_cutoff": 0,
  "eta_cutoff": 0,
  "typical_p": 1,
  "min_p": 0.1,
  "rep_pen": 1.2,
  "rep_pen_range": 4096,
  "no_repeat_ngram_size": 0,
  "penalty_alpha": 0,
  "num_beams": 1,
  "length_penalty": 0,
  "min_length": 0,
  "encoder_rep_pen": 1,
  "freq_pen": 0,
  "presence_pen": 0,
  "do_sample": true,
  "early_stopping": false,
  "dynatemp": false,
  "min_temp": 1,
  "max_temp": 1.4,
  "dynatemp_exponent": 1,
  "smoothing_factor": 0,
  "add_bos_token": false,
  "truncation_length": 2048,
  "ban_eos_token": false,
  "skip_special_tokens": true,
  "streaming": true,
  "mirostat_mode": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "guidance_scale": 1,
  "negative_prompt": "",
  "grammar_string": "",
  "banned_tokens": "",
  "ignore_eos_token_aphrodite": false,
  "spaces_between_special_tokens_aphrodite": true,
  "sampler_order": [6, 0, 1, 3, 4, 2, 5],
  "logit_bias": [],
  "n": 1,
  "rep_pen_size": 0,
  "genamt": 400,
  "max_length": 44032
}
```
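To make the key sampler values in these settings concrete, here is a minimal sketch of what `sampler_order`, `"rep_pen": 1.2` with `"rep_pen_range": 4096`, and `"min_p": 0.1` do to a list of logits. It assumes the KoboldAI sampler numbering used by SillyTavern's `sampler_order`, and uses the CTRL-style repetition penalty (the formulation in Hugging Face transformers); individual backends may implement these samplers differently. All function names are hypothetical.

```python
import math

# ASSUMED KoboldAI-style sampler IDs behind "sampler_order".
SAMPLER_NAMES = {
    0: "top_k", 1: "top_a", 2: "top_p", 3: "tfs",
    4: "typical_p", 5: "temperature", 6: "repetition_penalty",
}


def describe_order(order):
    """Translate a sampler_order list into sampler names."""
    return [SAMPLER_NAMES[i] for i in order]


def apply_rep_pen(logits, context_ids, penalty, window):
    """CTRL-style penalty on tokens seen in the last `window` context tokens."""
    out = list(logits)
    for i in set(context_ids[-window:]):
        # Shrink positive logits, push negative ones further down.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def min_p_filter(logits, min_p):
    """Indices of tokens whose probability is at least min_p times the top probability."""
    top = max(logits)
    # p_i / p_max == exp(logit_i - logit_max), so no full softmax is needed.
    return [i for i, x in enumerate(logits) if math.exp(x - top) >= min_p]
```

With this numbering, `describe_order([6, 0, 1, 3, 4, 2, 5])` lists repetition penalty first and temperature last. The `top_k` 0, `top_a` 0, `top_p` 1, `tfs` 1, and `typical_p` 1 values are generally treated as disabled, which leaves repetition penalty, min-p, and temperature 1 doing the work.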
