Request

by bartowski - opened Aug 10, 2024

Discussion

bartowski

Aug 10, 2024

@dnhkng repo privated, feel free to reply here with any other details

dnhkng

Owner Aug 10, 2024

Thanks! Could you do the same for any other RYS models you have prepped?

I have another method, which is the one that will be in the paper, and these models were released a bit early. The next models will be better, and I don't want to have a huge number of suboptimal models out there :)

bartowski

Aug 10, 2024

all other RYS? sure thing

bartowski

Aug 10, 2024

speaking of, because your Gemma introduces new layers, llama.cpp doesn't recognize it properly, so we'll need to either add a fix upstream or remove the extra layers

dnhkng

Owner Aug 11, 2024

makes sense! yeah, this was prematurely released. I didn't expect it to spread so fast 😥

Some models are not optimal yet, and others are broken.

dnhkng changed discussion status to closed Sep 4, 2024
